How Can SEO Testing Increase Traffic and Profits?

Quality is the most important criterion by which Google ranks websites in the SERP. The more convenient, useful, and interesting a web application is, the more users turn to it. A growing number of visitors signals to the search engine that the website has value and should be ranked higher so that as many people as possible can see it. Being in the top 10 of the search results is a paramount task: statistically, only 0.78% of users reach the second page of Google. Therefore, companies should give consideration not only to an SEO audit but also to SEO testing, because it directly affects an organization's traffic and profits. Let's figure out how it works.

What SEO testing is and why it is important

SEO testing is not much different from other types of testing. It involves searching for errors: 404 pages, broken links, irrelevant code, problems with page loading, visual defects, and so on. All of these can also be found during UX testing, performance testing, or cross-platform testing. The goal of SEO testing is to catch problems introduced by changes to a website before they affect the quality of the web application and its organic traffic.

From a tester's point of view, a 404 error is not a significant bug: it doesn't break the website, and the visitor can go to another page and continue searching. But from the SEO perspective, hundreds of 404 pages can significantly reduce organic traffic. According to Google, 61% of users will leave a website if it has access issues. The search engine sees many low-quality pages with duplicate content and pushes the offending site lower in the ranking.

Here's another example of the direct impact of errors on traffic. Suppose developers have updated an important page and accidentally deleted its H1 heading. The H1 is one of the mandatory ranking factors from which the search engine determines what the page is about, and it is the heading of the page that users see. If the H1 is deleted, this important page will simply drop out of the search results, which will lead to a drop in traffic.

From these examples, it becomes clear that any change can lead to the accidental loss of data important for SEO, which will affect the quality of the website and its ranking in the search engine. That's why testers should devote attention to SEO. Doing so allows a business to maintain an unbroken chain:

  1. The higher the quality of the website that QA helps deliver, the higher the web application is ranked.
  2. The higher the search engine ranking, the more organic traffic comes to the website.
  3. The higher the organic traffic, the more orders for products or paid services the business receives.
  4. The more leads the business has, the better.

No website is immune to SEO problems. Google has over 200 ranking factors, and the search engine changes its algorithms up to 1,000 times a year. You needn't strive to please every change. What's important is to create a high-quality, user-friendly website and check it after each change in order to secure top positions in the search results.

An SEO audit and SEO testing: what’s the difference?

Sometimes non-experts get confused about the difference between an SEO audit and SEO testing. An SEO audit is an analysis of the current state of the website, done manually or with a specialized tool. This procedure helps to understand whether the pages are indexed, whether meta tags are written for them, whether the images are optimized, and so on. In other words, such research helps to identify content gaps or deficiencies in the information architecture.

SEO testing involves tracking the results after changes to assess their impact or effectiveness. A well-tuned QA process for SEO includes:

  • Benchmark testing, in which two versions of the source code (staging and production) are compared;
  • Testing elements important for SEO (for example, metadata);
  • Automation (using tools that collect all changes between staging and production);
  • Monitoring changes when the application is in production;
  • An archive of web pages, which contains a history of changes and source code you can return to in the case of a traffic drop.

QA testing helps to identify problems with a website before a product hits the market. Good QA practice for SEO works as a safety net, eliminating potential problems and reducing the number of bug fixes.

How often should SEO testing be done?

SEO testing is worth doing every time your website is updated. As QA practitioners note, identifying SEO bugs is quite difficult: they may not affect the overall functionality of the website. It also takes time for search engines to re-index the website after bugs are found and fixed, so if an error is not eliminated quickly, the website can fall out of the top search results.

Fortunately, catching these bugs is getting easier, since experts have access to tools that automate data collection and flag the quality problems that need to be solved. This makes testers' jobs much easier, as they no longer need to manually check every page, link, or image.

SEO testing helps to prevent failed migrations, fraudulent redirects, unintentional indexing, disappearing tags, and more. It allows specialists to check elements important for ranking: broken links, missing content, page load speed, missing metadata, and other issues that affect SEO performance.
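For illustration, here is a minimal sketch of such an automated check in Python, assuming the requests and beautifulsoup4 packages are installed; the list of pages is hypothetical and would normally come from your sitemap.

# Minimal SEO smoke check: status code, H1, title, and meta description.
import requests
from bs4 import BeautifulSoup

pages = ["https://example.com/", "https://example.com/pricing"]  # hypothetical URLs

for url in pages:
    resp = requests.get(url, timeout=10)
    issues = []
    if resp.status_code != 200:                       # e.g., 404s hurt organic traffic
        issues.append("status %d" % resp.status_code)
    soup = BeautifulSoup(resp.text, "html.parser")
    if soup.find("h1") is None:                       # missing H1 heading
        issues.append("missing <h1>")
    if soup.find("title") is None:                    # missing <title> tag
        issues.append("missing <title>")
    if soup.find("meta", attrs={"name": "description"}) is None:
        issues.append("missing meta description")
    print(url, "OK" if not issues else "; ".join(issues))

A check like this can run after every deployment, so a deleted H1 or an accidental 404 is caught before the search engine re-crawls the site.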

With so many hands working on a website (developers, designers, project managers, and so on), every new update poses a risk. Since these updates directly affect the sales and success of a business, SEO testing should be an important part of increasing organic search traffic on Google. Therefore, as part of SEO promotion, it is worth tapping into the services of QA specialists who focus on finding defects and improving the quality of software.


Machine learning with H2O in R / Python

In this blog, we shall discuss how to use H2O to build a few supervised machine learning models. H2O is Java-based software for data modeling and general computing, whose primary purpose is to serve as a distributed, parallel, in-memory processing engine. It needs to be installed first (instructions), and by default an H2O instance runs on localhost:54321. Additionally, one needs to install the R/Python client to communicate with the H2O instance. Every new R/Python session first needs to initialize a connection between the client and the H2O cluster.
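As a minimal sketch, connecting from the Python client looks like the following (the host and port shown are simply the defaults mentioned above):

# Connect the Python client to a local H2O instance.
import h2o

h2o.init()                            # starts or attaches to H2O at localhost:54321 by default

# to attach to an already-running cluster on another host/port instead:
# h2o.init(ip="localhost", port=54321)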

The problems described in this blog appeared in the exercises / projects of the Coursera course “Practical Machine Learning on H2O,” by H2O. The problem statements / descriptions / steps are taken from the course itself. We shall use the concepts from the course in order to:

  • build a few machine learning / deep learning models using different algorithms (such as Gradient Boosting, Random Forest, Neural Net, Elastic Net GLM, etc.),
  • review the classic bias-variance tradeoff (overfitting),
  • tune hyper-parameters using Grid Search,
  • use AutoML to automatically find a set of well-performing models, and
  • use Stacked Ensembles of models to improve performance.

Problem 1

In this problem we will create an artificial data set, then run random forest / GBM on it with H2O to create two supervised models, one that is reasonable and another that shows clear over-fitting. We will use the R client (package) for H2O for this problem.

  1. Let’s first create a data set to predict an employee’s job satisfaction in an organization. Let’s say an employee’s job satisfaction depends on the following factors (there are several other factors in general, but we shall limit ourselves to the following few):
    • work environment
    • pay
    • flexibility
    • relationship with manager
    • age
set.seed(321)
# Let's say an employee's job satisfaction depends on the work environment,
# pay, flexibility, relationship with manager and age.
N <- 1000                                        # number of samples
d <- data.frame(id = 1:N)
d$workEnvironment <- sample(1:5, N, replace=TRUE)  # on a scale of 1-5, 1 being bad and 5 being good
v <- round(rnorm(N, mean=60000, sd=20000))       # 68% are 40-80k
v <- pmax(v, 20000)
v <- pmin(v, 100000)
#table(v)
d$pay <- v
d$flexibility <- sample(1:5, N, replace=TRUE)    # on a scale of 1-5, 1 being bad and 5 being good
d$managerRel <- sample(1:5, N, replace=TRUE)     # on a scale of 1-5, 1 being bad and 5 being good
d$age <- round(runif(N, min=20, max=60))
head(d)
#  id workEnvironment   pay flexibility managerRel age
#1  1               2 20000           2          2  21
#2  2               5 75817           1          2  31
#3  3               5 45649           5          3  25
#4  4               1 47157           1          5  55
#5  5               2 69729           2          4  33
#6  6               1 75101           2          2  39
v <- 125 * (d$pay/1000)^2     # e.g., job satisfaction score is proportional to square of pay (hypothetically)
v <- v + 250 / log(d$age)     # e.g., inversely proportional to log of age
v <- v + 5 * d$flexibility
v <- v + 200 * d$workEnvironment
v <- v + 1000 * d$managerRel^3
v <- v + runif(N, 0, 5000)
v <- 100 * (v - 0) / (max(v) - min(v))  # min-max normalization to bring the score into 0-100
d$jobSatScore <- round(v)               # round to nearest integer (percentage)

2. Let’s start h2o, and import the data.

library(h2o)
h2o.init()
as.h2o(d, destination_frame = "jobsatisfaction")
jobsat <- h2o.getFrame("jobsatisfaction")
# |======================================================================| 100%
#  id workEnvironment   pay flexibility managerRel age jobSatScore
#1  1               2 20000           2          2  21           5
#2  2               5 75817           1          2  31          55
#3  3               5 45649           5          3  25          22
#4  4               1 47157           1          5  55          30
#5  5               2 69729           2          4  33          51
#6  6               1 75101           2          2  39          54

3. Let’s split the data. Here we plan to use cross-validation.

parts <- h2o.splitFrame(
  jobsat,
  ratios = 0.8,
  destination_frames = c("jobsat_train", "jobsat_test"),
  seed = 321)
train <- h2o.getFrame("jobsat_train")
test <- h2o.getFrame("jobsat_test")
nrow(train)  # 794 rows
nrow(test)   # 206 rows
y <- "jobSatScore"
x <- setdiff(names(train), c("id", y))

4. Let’s choose the gradient boosting model (GBM) and create a model. It’s a regression model, since the output variable is treated as continuous.

# the reasonable model with 10-fold cross-validation
m_res <- h2o.gbm(x, y, train,
                 model_id = "model10foldsreasonable",
                 ntrees = 20,
                 nfolds = 10,
                 seed = 123)

> h2o.performance(m_res, train = TRUE)   # RMSE 2.840688
#H2ORegressionMetrics: gbm
#** Reported on training data. **
#MSE:  8.069509
#RMSE: 2.840688
#MAE:  2.266134
#RMSLE: 0.1357181
#Mean Residual Deviance : 8.069509

> h2o.performance(m_res, xval = TRUE)    # RMSE 2.973807
#H2ORegressionMetrics: gbm
#** Reported on cross-validation data. **
#** 10-fold cross-validation on training data (Metrics computed for combined holdout predictions) **
#MSE:  8.84353
#RMSE: 2.973807
#MAE:  2.320899
#RMSLE: 0.1384746
#Mean Residual Deviance : 8.84353

> h2o.performance(m_res, test)           # RMSE 3.299601
#H2ORegressionMetrics: gbm
#MSE:  10.88737
#RMSE: 3.299601
#MAE:  2.524492
#RMSLE: 0.1409274
#Mean Residual Deviance : 10.88737

5. Let’s try some alternative parameters, to build a different model, and show how the results differ.

# the overfitting model with 10-fold cross-validation
m_ovf <- h2o.gbm(x, y, train,
                 model_id = "model10foldsoverfitting",
                 ntrees = 2000,
                 max_depth = 20,
                 nfolds = 10,
                 seed = 123)

> h2o.performance(m_ovf, train = TRUE)   # RMSE 0.004474786
#H2ORegressionMetrics: gbm
#** Reported on training data. **
#MSE:  2.002371e-05
#RMSE: 0.004474786
#MAE:  0.0007455944
#RMSLE: 5.032019e-05
#Mean Residual Deviance : 2.002371e-05

> h2o.performance(m_ovf, xval = TRUE)    # RMSE 0.6801615
#H2ORegressionMetrics: gbm
#** Reported on cross-validation data. **
#** 10-fold cross-validation on training data (Metrics computed for combined holdout predictions) **
#MSE:  0.4626197
#RMSE: 0.6801615
#MAE:  0.4820542
#RMSLE: 0.02323415
#Mean Residual Deviance : 0.4626197

> h2o.performance(m_ovf, test)           # RMSE 0.4969761
#H2ORegressionMetrics: gbm
#MSE:  0.2469853
#RMSE: 0.4969761
#MAE:  0.3749822
#RMSLE: 0.01698435
#Mean Residual Deviance : 0.2469853

Problem 2

Predict Chocolate Makers Location with Deep Learning Model with H2O

The data is available here: http://coursera.h2o.ai/cacao.882.csv

This is a classification problem. We need to predict “Maker Location.” In other words, using the rating and the other fields, how accurately can we identify whether it is Belgian chocolate, French chocolate, and so on? We shall use the Python client (library) for H2O for this problem.

  1. Let’s start H2O, load the data set, and split it. By the end of this stage we should have
    three variables, pointing to three data frames on H2O: train, valid, test. However, if you are choosing to use
    cross-validation, you will only have two: train and test.
import h2o
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

df = pd.read_csv('http://coursera.h2o.ai/cacao.882.csv')
print(df.shape)
# (1795, 9)
df.head()
Maker Origin REF Review Date Cocoa Percent Maker Location Rating Bean Type Bean Origin
0 A. Morin Agua Grande 1876 2016 63% France 3.75 Sao Tome
1 A. Morin Kpime 1676 2015 70% France 2.75 Togo
2 A. Morin Atsane 1676 2015 70% France 3.00 Togo
3 A. Morin Akata 1680 2015 70% France 3.50 Togo
4 A. Morin Quilla 1704 2015 70% France 3.50 Peru
print(df['Maker Location'].unique())
# ['France' 'U.S.A.' 'Fiji' 'Ecuador' 'Mexico' 'Switzerland' 'Netherlands'
#  'Spain' 'Peru' 'Canada' 'Italy' 'Brazil' 'U.K.' 'Australia' 'Wales'
#  'Belgium' 'Germany' 'Russia' 'Puerto Rico' 'Venezuela' 'Colombia' 'Japan'
#  'New Zealand' 'Costa Rica' 'South Korea' 'Amsterdam' 'Scotland'
#  'Martinique' 'Sao Tome' 'Argentina' 'Guatemala' 'South Africa' 'Bolivia'
#  'St. Lucia' 'Portugal' 'Singapore' 'Denmark' 'Vietnam' 'Grenada' 'Israel'
#  'India' 'Czech Republic' 'Domincan Republic' 'Finland' 'Madagascar'
#  'Philippines' 'Sweden' 'Poland' 'Austria' 'Honduras' 'Nicaragua'
#  'Lithuania' 'Niacragua' 'Chile' 'Ghana' 'Iceland' 'Eucador' 'Hungary'
#  'Suriname' 'Ireland']

print(len(df['Maker Location'].unique()))
# 60

loc_table = df['Maker Location'].value_counts()
print(loc_table)
# U.S.A. 764, France 156, Canada 125, U.K. 96, Italy 63, Ecuador 54,
# Australia 49, Belgium 40, Switzerland 38, Germany 35, Austria 26,
# Spain 25, Colombia 23, Hungary 22, Venezuela 20, Madagascar 17,
# Japan 17, New Zealand 17, Brazil 17, Peru 17, Denmark 15, Vietnam 11,
# Scotland 10, Guatemala 10, Costa Rica 9, Israel 9, Argentina 9,
# Poland 8, Honduras 6, Lithuania 6, Sweden 5, Nicaragua 5,
# Domincan Republic 5, South Korea 5, Netherlands 4, Amsterdam 4,
# Puerto Rico 4, Fiji 4, Sao Tome 4, Mexico 4, Ireland 4, Portugal 3,
# Singapore 3, Iceland 3, South Africa 3, Grenada 3, Chile 2,
# St. Lucia 2, Bolivia 2, Finland 2, Martinique 1, Eucador 1, Wales 1,
# Czech Republic 1, Suriname 1, Ghana 1, India 1, Niacragua 1,
# Philippines 1, Russia 1
# Name: Maker Location, dtype: int64

loc_table.hist()

As can be seen from the above table, some locations have too few records, which will hurt the accuracy of any model learned after the dataset is split into train, validation, and test sets. Let’s drop the locations with a small number of examples (fewer than 40) to make the results easier to interpret by reducing the number of categories in the output variable.

## filter out the countries for which there are < 40 examples present in the dataset
loc_gt_40_recs = loc_table[loc_table >= 40].index.tolist()
df_sub = df[df['Maker Location'].isin(loc_gt_40_recs)]

# now connect to H2O
h2o.init()
# h2o.clusterStatus()
H2O cluster uptime: 1 day 14 hours 48 mins
H2O cluster version: 3.13.0.3978
H2O cluster version age: 4 years and 9 days !!!
H2O cluster name: H2O_started_from_R_Sandipan.Dey_kpl973
H2O cluster total nodes: 1
H2O cluster free memory: 2.530 Gb
H2O cluster total cores: 4
H2O cluster allowed cores: 4
H2O cluster status: locked, healthy
H2O connection url: http://localhost:54321
H2O connection proxy: None
H2O internal security: False
H2O API Extensions: Algos, AutoML, Core V3, Core V4
Python version: 3.7.6 final
h2o_df = h2o.H2OFrame(
    df_sub.values,
    destination_frame = "cacao_882",
    column_names = [x.replace(' ', '_') for x in df.columns.tolist()])
#h2o_df.head()
#h2o_df.summary()

df_cacao_882 = h2o.get_frame('cacao_882')
# df_cacao_882.as_data_frame()
#df_cacao_882.head()
df_cacao_882.describe()
Maker Origin REF Review_Date Cocoa_Percent Maker_Location Rating Bean_Type Bean_Origin
type enum enum int int enum enum real enum enum
mins 5.0 2006.0 1.0
mean 1025.8849294729039 2012.273942093541 3.1818856718633928
maxs 1952.0 2017.0 5.0
sigma 553.7812013716441 2.978615633185091 0.4911459825968248
zeros 0 0 0
missing 0 0 0 0 0 0 0 0 0
0 A. Morin Agua Grande 1876.0 2016.0 63% France 3.75  Sao Tome
1 A. Morin Kpime 1676.0 2015.0 70% France 2.75  Togo
2 A. Morin Atsane 1676.0 2015.0 70% France 3.0  Togo
3 A. Morin Akata 1680.0 2015.0 70% France 3.5  Togo
4 A. Morin Quilla 1704.0 2015.0 70% France 3.5  Peru
5 A. Morin Carenero 1315.0 2014.0 70% France 2.75 Criollo Venezuela
6 A. Morin Cuba 1315.0 2014.0 70% France 3.5  Cuba
7 A. Morin Sur del Lago 1315.0 2014.0 70% France 3.5 Criollo Venezuela
8 A. Morin Puerto Cabello 1319.0 2014.0 70% France 3.75 Criollo Venezuela
9 A. Morin Pablino 1319.0 2014.0 70% France 4.0  Peru
df_cacao_882['Maker_Location'].table()
# Maker_Location  Count
# Australia          49
# Belgium            40
# Canada            125
# Ecuador            54
# France            156
# Italy              63
# U.K.               96
# U.S.A.            764

train, valid, test = df_cacao_882.split_frame(
    ratios = [0.8, 0.1],
    destination_frames = ['train', 'valid', 'test'],
    seed = 321)
print("%d/%d/%d" % (train.nrows, valid.nrows, test.nrows))
# 1082/138/127

  2. Let’s set x to the list of columns we shall train on, and y to the column we shall learn. Here it’s going to be a multi-class classification problem.

ignore_fields = ['Review_Date', 'Bean_Type', 'Maker_Location']
# Specify the response and predictor columns
y = 'Maker_Location'   # multinomial classification
x = [i for i in train.names if not i in ignore_fields]

3. Let’s now create a baseline deep learning model. It is recommended to use all default settings (remembering to
specify either nfolds or validation_frame) for the baseline model.

from h2o.estimators.deeplearning import H2ODeepLearningEstimator

model = H2ODeepLearningEstimator()
%time model.train(x = x, y = y, training_frame = train, validation_frame = valid)
# deeplearning Model Build progress: |██████████████████████████████████████| 100%
# Wall time: 6.44 s

model.model_performance(train).mean_per_class_error()
# 0.05118279569892473
model.model_performance(valid).mean_per_class_error()
# 0.26888404593884047

perf_test = model.model_performance(test)
print('Mean class error', perf_test.mean_per_class_error())
# Mean class error 0.2149184149184149
print('log loss', perf_test.logloss())
# log loss 0.48864148412056846
print('MSE', perf_test.mse())
# MSE 0.11940531127368789
print('RMSE', perf_test.rmse())
# RMSE 0.3455507361787671

perf_test.hit_ratio_table()
Top-8 Hit Ratios: 
k hit_ratio
1 0.8897638
2 0.9291338
3 0.9527559
4 0.9685039
5 0.9763779
6 0.9921259
7 0.9999999
8 0.9999999
perf_test.confusion_matrix().as_data_frame() 
Australia Belgium Canada Ecuador France Italy U.K. U.S.A. Error Rate
0 3.0 0.0 0.0 0.0 0.0 0.0 0.0 2.0 0.400000 2 / 5
1 0.0 2.0 0.0 0.0 0.0 1.0 0.0 0.0 0.333333 1 / 3
2 0.0 0.0 12.0 0.0 0.0 0.0 0.0 1.0 0.076923 1 / 13
3 0.0 0.0 0.0 3.0 0.0 0.0 0.0 0.0 0.000000 0 / 3
4 0.0 0.0 0.0 0.0 8.0 2.0 0.0 1.0 0.272727 3 / 11
5 0.0 0.0 0.0 0.0 0.0 10.0 0.0 0.0 0.000000 0 / 10
6 0.0 0.0 0.0 1.0 0.0 2.0 4.0 4.0 0.636364 7 / 11
7 0.0 0.0 0.0 0.0 0.0 0.0 0.0 71.0 0.000000 0 / 71
8 3.0 2.0 12.0 4.0 8.0 15.0 4.0 79.0 0.110236 14 / 127
model.plot() 

4. Now, let’s create a tuned model that gives superior performance. However, we should use no more than 10 times
the running time of the baseline model, so again our script should time the training.

model_tuned = H2ODeepLearningEstimator(
    epochs=200,
    distribution="multinomial",
    activation="RectifierWithDropout",
    stopping_rounds=5,
    stopping_tolerance=0,
    stopping_metric="logloss",
    input_dropout_ratio=0.2,
    l1=1e-5,
    hidden=[200, 200, 200])
%time model_tuned.train(x, y, training_frame = train, validation_frame = valid)
#deeplearning Model Build progress: |██████████████████████████████████████| 100%
#Wall time: 30.8 s

model_tuned.model_performance(train).mean_per_class_error()
#0.0
model_tuned.model_performance(valid).mean_per_class_error()
#0.07696485401964853

perf_test = model_tuned.model_performance(test)
print('Mean class error', perf_test.mean_per_class_error())
#Mean class error 0.05909090909090909
print('log loss', perf_test.logloss())
#log loss 0.14153784501504524
print('MSE', perf_test.mse())
#MSE 0.03497231075826773
print('RMSE', perf_test.rmse())
#RMSE 0.18700885208531637

perf_test.hit_ratio_table()
Top-8 Hit Ratios: 
k hit_ratio
1 0.9606299
2 0.984252
3 0.984252
4 0.992126
5 0.992126
6 0.992126
7 1.0
8 1.0
perf_test.confusion_matrix().as_data_frame()
Australia Belgium Canada Ecuador France Italy U.K. U.S.A. Error Rate
0 5.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.000000 0 / 5
1 0.0 3.0 0.0 0.0 0.0 0.0 0.0 0.0 0.000000 0 / 3
2 0.0 0.0 13.0 0.0 0.0 0.0 0.0 0.0 0.000000 0 / 13
3 0.0 0.0 0.0 3.0 0.0 0.0 0.0 0.0 0.000000 0 / 3
4 0.0 0.0 0.0 0.0 11.0 0.0 0.0 0.0 0.000000 0 / 11
5 0.0 0.0 0.0 0.0 1.0 8.0 0.0 1.0 0.200000 2 / 10
6 0.0 0.0 0.0 0.0 0.0 0.0 8.0 3.0 0.272727 3 / 11
7 0.0 0.0 0.0 0.0 0.0 0.0 0.0 71.0 0.000000 0 / 71
8 5.0 3.0 13.0 3.0 12.0 8.0 8.0 75.0 0.039370 5 / 127
model_tuned.plot() 

As can be seen from the above plot, the early-stopping strategy prevented the model from overfitting, and the tuned model achieves better accuracy on the test dataset.

5. Let’s save both the models, to the local disk, using save_model(), to export the binary version of the model. (Do not export a POJO.)

h2o.save_model(model, 'base_model')
h2o.save_model(model_tuned, 'tuned_model')

We may want to include a seed in the model function above to get reproducible results.
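As a sketch, the relevant arguments would look like the following (the seed value 123 and the variable name are illustrative choices, not taken from the run above):

# Sketch: the same kind of estimator with an explicit seed for reproducibility.
model_tuned_seeded = H2ODeepLearningEstimator(
    epochs=200,
    hidden=[200, 200, 200],
    seed=123,            # illustrative seed value
    reproducible=True)   # forces single-threaded, deterministic (slower) training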

Problem 3

Predict Price of a house with Stacked Ensemble model with H2O

The data is available at http://coursera.h2o.ai/house_data.3487.csv. This is a regression problem. We have to predict the “price” of a house given different feature values. We shall use python client for H2O again for this problem.

The data needs to be split into train and test, using 0.9 for the ratio, and a seed of 123. That should give 19,462 training rows and 2,151 test rows. The target is an RMSE below $123,000.

  1. Let’s start H2O, load the chosen dataset, and follow the data manipulation steps. For example, we can split the date into year and month columns. We can then optionally combine them into a numeric date column. At the end of this step we shall have train, test, x, and y variables, and possibly valid also. The code snippet below shows how to do this.
import h2o
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import random
from time import time

h2o.init()

url = "http://coursera.h2o.ai/house_data.3487.csv"
house_df = h2o.import_file(url, destination_frame = "house_data")
# Parse progress: |█████████████████████████████████████████████████████████| 100%

Preprocessing

house_df['year'] = house_df['date'].substring(0, 4).asnumeric()
house_df['month'] = house_df['date'].substring(4, 6).asnumeric()
house_df['day'] = house_df['date'].substring(6, 8).asnumeric()
house_df = house_df.drop('date')
house_df.head()
id price bedrooms bathrooms sqft_living sqft_lot floors waterfront view condition grade sqft_above sqft_basement yr_built yr_renovated zipcode lat long sqft_living15 sqft_lot15 year month day
7.1293e+09 221900 3 1 1180 5650 1 0 0 3 7 1180 0 1955 0 98178 47.5112 -122.257 1340 5650 2014 10 13
6.4141e+09 538000 3 2.25 2570 7242 2 0 0 3 7 2170 400 1951 1991 98125 47.721 -122.319 1690 7639 2014 12 9
5.6315e+09 180000 2 1 770 10000 1 0 0 3 6 770 0 1933 0 98028 47.7379 -122.233 2720 8062 2015 2 25
2.4872e+09 604000 4 3 1960 5000 1 0 0 5 7 1050 910 1965 0 98136 47.5208 -122.393 1360 5000 2014 12 9
1.9544e+09 510000 3 2 1680 8080 1 0 0 3 8 1680 0 1987 0 98074 47.6168 -122.045 1800 7503 2015 2 18
7.23755e+09 1.225e+06 4 4.5 5420 101930 1 0 0 3 11 3890 1530 2001 0 98053 47.6561 -122.005 4760 101930 2014 5 12
1.3214e+09 257500 3 2.25 1715 6819 2 0 0 3 7 1715 0 1995 0 98003 47.3097 -122.327 2238 6819 2014 6 27
2.008e+09 291850 3 1.5 1060 9711 1 0 0 3 7 1060 0 1963 0 98198 47.4095 -122.315 1650 9711 2015 1 15
2.4146e+09 229500 3 1 1780 7470 1 0 0 3 7 1050 730 1960 0 98146 47.5123 -122.337 1780 8113 2015 4 15
3.7935e+09 323000 3 2.5 1890 6560 2 0 0 3 7 1890 0 2003 0 98038 47.3684 -122.031 2390 7570 2015 3 12
house_df.describe() 
id price bedrooms bathrooms sqft_living sqft_lot floors waterfront view condition grade sqft_above sqft_basement yr_built yr_renovated zipcode lat long sqft_living15 sqft_lot15 year month day
type int int int real int int real int int int int int int int int int real real int int int int int
mins 1000102.0 75000.0 0.0 0.0 290.0 520.0 1.0 0.0 0.0 1.0 1.0 290.0 0.0 1900.0 0.0 98001.0 47.1559 -122.519 399.0 651.0 2014.0 1.0 1.0
mean 4580301520.864987 540088.1417665284 3.370841623097218 2.114757321982139 2079.899736269819 15106.96756581695 1.4943089807060526 0.007541757275713691 0.23430342849211097 3.4094295100171164 7.6568731781798105 1788.3906907879518 291.50904548188555 1971.0051357979064 84.4022579003377 98077.93980474674 47.56005251931665 -122.21389640494158 1986.5524915560036 12768.45565169118 2014.3229537778102 6.574422801091883 15.688196918521294
maxs 9900000190.0 7700000.0 33.0 8.0 13540.0 1651359.0 3.5 1.0 4.0 5.0 13.0 9410.0 4820.0 2015.0 2015.0 98199.0 47.7776 -121.315 6210.0 871200.0 2015.0 12.0 31.0
sigma 2876565571.3120522 367127.19648270035 0.930061831147451 0.7701631572177408 918.4408970468095 41420.51151513551 0.5399888951423489 0.08651719772788766 0.7663175692736117 0.6507430463662044 1.1754587569743344 828.0909776519175 442.57504267746685 29.373410802386235 401.67924001917555 53.50502625747248 0.13856371024192368 0.14082834238139297 685.3913042527788 27304.179631338524 0.4676160310451536 3.1153077787263648 8.635062534286034
zeros 0 0 13 10 0 0 0 21450 19489 0 0 0 13126 0 20699 0 0 0 0 0 0 0 0
missing 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 7129300520.0 221900.0 3.0 1.0 1180.0 5650.0 1.0 0.0 0.0 3.0 7.0 1180.0 0.0 1955.0 0.0 98178.0 47.5112 -122.257 1340.0 5650.0 2014.0 10.0 13.0
1 6414100192.0 538000.0 3.0 2.25 2570.0 7242.0 2.0 0.0 0.0 3.0 7.0 2170.0 400.0 1951.0 1991.0 98125.0 47.721000000000004 -122.319 1690.0 7639.0 2014.0 12.0 9.0
2 5631500400.0 180000.0 2.0 1.0 770.0 10000.0 1.0 0.0 0.0 3.0 6.0 770.0 0.0 1933.0 0.0 98028.0 47.7379 -122.233 2720.0 8062.0 2015.0 2.0 25.0
3 2487200875.0 604000.0 4.0 3.0 1960.0 5000.0 1.0 0.0 0.0 5.0 7.0 1050.0 910.0 1965.0 0.0 98136.0 47.5208 -122.393 1360.0 5000.0 2014.0 12.0 9.0
4 1954400510.0 510000.0 3.0 2.0 1680.0 8080.0 1.0 0.0 0.0 3.0 8.0 1680.0 0.0 1987.0 0.0 98074.0 47.616800000000005 -122.045 1800.0 7503.0 2015.0 2.0 18.0
5 7237550310.0 1225000.0 4.0 4.5 5420.0 101930.0 1.0 0.0 0.0 3.0 11.0 3890.0 1530.0 2001.0 0.0 98053.0 47.6561 -122.005 4760.0 101930.0 2014.0 5.0 12.0
6 1321400060.0 257500.0 3.0 2.25 1715.0 6819.0 2.0 0.0 0.0 3.0 7.0 1715.0 0.0 1995.0 0.0 98003.0 47.3097 -122.327 2238.0 6819.0 2014.0 6.0 27.0
7 2008000270.0 291850.0 3.0 1.5 1060.0 9711.0 1.0 0.0 0.0 3.0 7.0 1060.0 0.0 1963.0 0.0 98198.0 47.4095 -122.315 1650.0 9711.0 2015.0 1.0 15.0
8 2414600126.0 229500.0 3.0 1.0 1780.0 7470.0 1.0 0.0 0.0 3.0 7.0 1050.0 730.0 1960.0 0.0 98146.0 47.5123 -122.337 1780.0 8113.0 2015.0 4.0 15.0
9 3793500160.0 323000.0 3.0 2.5 1890.0 6560.0 2.0 0.0 0.0 3.0 7.0 1890.0 0.0 2003.0 0.0 98038.0 47.3684 -122.031 2390.0 7570.0 2015.0 3.0 12.0
plt.hist(house_df.as_data_frame()['price'].tolist(), bins=np.linspace(0, 10**6, 1000))
plt.show()

We shall use cross-validation and not a validation dataset.

train, test = house_df.split_frame(
    ratios = [0.9],
    destination_frames = ['train', 'test'],
    seed = 123)
print("%d/%d" % (train.nrows, test.nrows))
# 19462/2151

ignore_fields = ['id', 'price']
x = [i for i in train.names if not i in ignore_fields]
y = 'price'

  2. Let’s now train at least four different models on the preprocessed dataset, using at least three different supervised algorithms. Let’s save all the models.

from h2o.estimators.gbm import H2OGradientBoostingEstimator
from h2o.estimators.random_forest import H2ORandomForestEstimator
from h2o.estimators.glm import H2OGeneralizedLinearEstimator
from h2o.estimators.deeplearning import H2ODeepLearningEstimator
from h2o.estimators.stackedensemble import H2OStackedEnsembleEstimator

nfolds = 5  # for cross-validation

Let’s first fit a GLM model. The best performing α hyperparameter value (for controlling L1 vs. L2 regularization) for GLM will be found using GridSearch, as shown in the below code snippet.

g = h2o.grid.H2OGridSearch(
    H2OGeneralizedLinearEstimator(
        family="gaussian",
        nfolds=nfolds,
        fold_assignment="Modulo",
        keep_cross_validation_predictions=True,
        lambda_search=True),
    hyper_params={
        "alpha": [x * 0.01 for x in range(0, 100)],
    },
    search_criteria={
        "strategy": "RandomDiscrete",
        "max_models": 8,
        "stopping_metric": "rmse",
        "max_runtime_secs": 60
    }
)
g.train(x, y, train)
g
#glm Grid Build progress: |████████████████████████████████████████████████| 100%
#    alpha                  model_ids                                               residual_deviance
#0  [0.61]                  Grid_GLM_train_model_python_1628864392402_41_model_3   2.626981989511134E15
#1  [0.78]                  Grid_GLM_train_model_python_1628864392402_41_model_6   2.626981989511134E15
#2  [0.65]                  Grid_GLM_train_model_python_1628864392402_41_model_5   2.626981989511134E15
#3  [0.13]                  Grid_GLM_train_model_python_1628864392402_41_model_2   2.626981989511134E15
#4  [0.35000000000000003]   Grid_GLM_train_model_python_1628864392402_41_model_4   2.626981989511134E15
#5  [0.05]                  Grid_GLM_train_model_python_1628864392402_41_model_7   2.626981989511134E15
#6  [0.32]                  Grid_GLM_train_model_python_1628864392402_41_model_0   2.626981989511134E15
#7  [0.55]                  Grid_GLM_train_model_python_1628864392402_41_model_1   2.626981989511134E15

Model 1

model_GLM = H2OGeneralizedLinearEstimator(
    family='gaussian',  #'gamma',
    model_id='glm_house',
    nfolds=nfolds,
    alpha=0.61,
    fold_assignment="Modulo",
    keep_cross_validation_predictions=True)
%time model_GLM.train(x, y, train)
#glm Model Build progress: |███████████████████████████████████████████████| 100%
#Wall time: 259 ms

model_GLM.cross_validation_metrics_summary().as_data_frame()
mean sd cv_1_valid cv_2_valid cv_3_valid cv_4_valid cv_5_valid
0 mae 230053.23 715.8795 229225.16 230969.69 228503.45 230529.47 231038.42
1 mean_residual_deviance 1.31780157E11 4.5671977E9 1.32968604E11 1.41431144E11 1.31364495E11 1.32024402E11 1.21112134E11
2 mse 1.31780157E11 4.5671977E9 1.32968604E11 1.41431144E11 1.31364495E11 1.32024402E11 1.21112134E11
3 null_deviance 5.25455325E14 1.80834544E13 5.3056184E14 5.636807E14 5.23549568E14 5.26203388E14 4.83281095E14
4 r2 0.023522535 4.801036E-4 0.024299357 0.023168933 0.022531934 0.023340257 0.024272196
5 residual_deviance 5.12943247E14 1.7808912E13 5.17646773E14 5.5059142E14 5.11270625E14 5.13838982E14 4.71368433E14
6 rmse 362905.53 6314.0225 364648.6 376073.3 362442.4 363351.62 348011.7
7 rmsle 0.53911585 0.0047404445 0.54277176 0.5389013 0.5275475 0.53846484 0.54789394
model_GLM.model_performance(test)
#ModelMetricsRegressionGLM: glm
#** Reported on test data. **
#MSE: 128806123545.59714
#RMSE: 358895.7000934911
#MAE: 233890.6933813204
#RMSLE: 0.5456714021880726
#R^2: 0.03102347771355851
#Mean Residual Deviance: 128806123545.59714
#Null degrees of freedom: 2150
#Residual degrees of freedom: 2129
#Null deviance: 285935013037402.7
#Residual deviance: 277061971746579.44
#AIC: 61176.23965800522

As can be seen from above, GLM could not achieve the target of an RMSE below $123k, either on cross-validation or on the test dataset.

The models below (GBM, DRF, and DL) and their corresponding parameters were found using the AutoML leaderboard and
GridSearch, along with some manual tuning.

from h2o.automl import H2OAutoML

model_auto = H2OAutoML(max_runtime_secs=60, seed=123)
model_auto.train(x, y, train)
# AutoML progress: |████████████████████████████████████████████████████████| 100%
# Parse progress: |█████████████████████████████████████████████████████████| 100%
model_auto.leaderboard
model_id mean_residual_deviance rmse mae rmsle
GBM_grid_0_AutoML_20210814_005121_model_0 2.01725e+10 142030 77779.1 0.184269
GBM_grid_0_AutoML_20210814_005121_model_1 2.6037e+10 161360 93068.1 0.218365
DRF_0_AutoML_20210814_005121 3.27251e+10 180901 102782 0.243474
XRT_0_AutoML_20210814_005121 3.53492e+10 188014 104259 0.246899
GBM_grid_0_AutoML_20210813_201225_model_0 5.99803e+10 244909 153548 0.351959
GBM_grid_0_AutoML_20210813_201225_model_2 6.09613e+10 246903 152570 0.349919
GBM_grid_0_AutoML_20210813_201225_model_1 6.09941e+10 246970 153096 0.350852
GBM_grid_0_AutoML_20210813_201225_model_3 6.22174e+10 249434 153105 0.350598
DeepLearning_0_AutoML_20210813_201225 6.39672e+10 252917 163993 0.378761
DRF_0_AutoML_20210813_201225 6.76936e+10 260180 158078 0.360337
model_auto.leader.model_performance(test)
# model_auto.leader.explain(test)
#ModelMetricsRegression: gbm
#** Reported on test data. **
#MSE: 17456681023.716145
#RMSE: 132123.73376390839
#MAE: 77000.00253466706
#RMSLE: 0.1899899418603569
#Mean Residual Deviance: 17456681023.716145

model = h2o.get_model(model_auto.leaderboard[4, 'model_id'])  # get model by model_id
print(model.params['model_id']['actual']['name'])
print(model.model_performance(test).rmse())
[(k, v) for (k, v) in model.params.items() if v['default'] != v['actual'] and \
 not k in ['model_id', 'training_frame', 'validation_frame', 'nfolds',
           'keep_cross_validation_predictions', 'seed', 'response_column',
           'fold_assignment', 'ignored_columns']]
# GBM_grid_0_AutoML_20210813_201225_model_0
# 235011.60404473927
# [('score_tree_interval', {'default': 0, 'actual': 5}),
#  ('ntrees', {'default': 50, 'actual': 60}),
#  ('max_depth', {'default': 5, 'actual': 6}),
#  ('min_rows', {'default': 10.0, 'actual': 1.0}),
#  ('stopping_tolerance', {'default': 0.001, 'actual': 0.008577452408351779}),
#  ('seed', {'default': -1, 'actual': 123}),
#  ('distribution', {'default': 'AUTO', 'actual': 'gaussian'}),
#  ('sample_rate', {'default': 1.0, 'actual': 0.8}),
#  ('col_sample_rate', {'default': 1.0, 'actual': 0.8}),
#  ('col_sample_rate_per_tree', {'default': 1.0, 'actual': 0.8})]

Model 2

model_GBM = H2OGradientBoostingEstimator(
    model_id='gbm_house',
    nfolds=nfolds,
    ntrees=500,
    fold_assignment="Modulo",
    keep_cross_validation_predictions=True,
    seed=123)
%time model_GBM.train(x, y, train)
#gbm Model Build progress: |███████████████████████████████████████████████| 100%
#Wall time: 54.9 s

model_GBM.cross_validation_metrics_summary().as_data_frame()
mean sd cv_1_valid cv_2_valid cv_3_valid cv_4_valid cv_5_valid
0 mae 64136.496 912.2387 62751.688 66573.63 63946.31 63873.707 63537.137
1 mean_residual_deviance 1.38268457E10 1.43582912E9 1.24595825E10 1.75283814E10 1.2894718E10 1.43893801E10 1.18621655E10
2 mse 1.38268457E10 1.43582912E9 1.24595825E10 1.75283814E10 1.2894718E10 1.43893801E10 1.18621655E10
3 r2 0.8979097 0.0075696795 0.90857375 0.87893564 0.9040519 0.89355356 0.90443367
4 residual_deviance 1.38268457E10 1.43582912E9 1.24595825E10 1.75283814E10 1.2894718E10 1.43893801E10 1.18621655E10
5 rmse 117288.305 5928.7188 111622.5 132394.8 113554.914 119955.74 108913.57
6 rmsle 0.16441989 0.0025737707 0.16231671 0.17041409 0.15941188 0.16528262 0.16467415

As can be seen from the above table (row 5, column 1), the mean RMSE for cross-validation is 117288.305, which is below $123k.

model_GBM.model_performance(test)
#ModelMetricsRegression: gbm
#** Reported on test data. **
#MSE: 14243079402.729088
#RMSE: 119344.37315068142
#MAE: 65050.344749203745
#RMSLE: 0.16421689257411975
#Mean Residual Deviance: 14243079402.729088

As can be seen from above, GBM could achieve the target of an RMSE below $123k on the test dataset.

Now, let’s try a random forest model, finding the best parameters with Grid Search:

g = h2o.grid.H2OGridSearch(
    H2ORandomForestEstimator(
        nfolds=nfolds,
        fold_assignment="Modulo",
        keep_cross_validation_predictions=True,
        seed=123),
    hyper_params={
        "ntrees": [20, 25, 30],
        "stopping_tolerance": [0.005, 0.006, 0.0075],
        "max_depth": [20, 50, 100],
        "min_rows": [5, 7, 10]
    },
    search_criteria={
        "strategy": "RandomDiscrete",
        "max_models": 10,
        "stopping_metric": "rmse",
        "max_runtime_secs": 60
    }
)
g.train(x, y, train)
#drf Grid Build progress: |████████████████████████████████████████████████| 100%
g
#   max_depth  min_rows  ntrees  stopping_tolerance  model_ids                                              residual_deviance
#0        100       5.0      20               0.006  Grid_DRF_train_model_python_1628864392402_40_model_0  2.0205038467456142E10
#1        100       5.0      20               0.005  Grid_DRF_train_model_python_1628864392402_40_model_5  2.0205038467456142E10
#2        100       5.0      20               0.005  Grid_DRF_train_model_python_1628864392402_40_model_1  2.0205038467456142E10
#3        100       7.0      30               0.006  Grid_DRF_train_model_python_1628864392402_40_model_3  2.099520493338354E10
#4         50      10.0      25               0.006  Grid_DRF_train_model_python_1628864392402_40_model_2  2.260686283035833E10
#5         50      10.0      20               0.005  Grid_DRF_train_model_python_1628864392402_40_model_4  2.279037520277947E10

Model 3

model_RF = H2ORandomForestEstimator(
    model_id='rf_house',
    nfolds=nfolds,
    ntrees=20,
    fold_assignment="Modulo",
    keep_cross_validation_predictions=True,
    seed=123)
%time model_RF.train(x, y, train)
#drf Model Build progress: |███████████████████████████████████████████████| 100%
#Wall time: 13.2 s

model_RF.cross_validation_metrics_summary().as_data_frame()
mean sd cv_1_valid cv_2_valid cv_3_valid cv_4_valid cv_5_valid
0 mae 72734.0 1162.9153 73242.26 75062.21 73461.65 71646.195 70257.7
1 mean_residual_deviance 1.8545494E10 2.2018921E9 1.79095654E10 2.45911347E10 1.74433321E10 1.71117425E10 1.56716954E10
2 mse 1.8545494E10 2.2018921E9 1.79095654E10 2.45911347E10 1.74433321E10 1.71117425E10 1.56716954E10
3 r2 0.8632202 0.011770816 0.8685827 0.8301549 0.8702062 0.8734147 0.8737426
4 residual_deviance 1.8545494E10 2.2018921E9 1.79095654E10 2.45911347E10 1.74433321E10 1.71117425E10 1.56716954E10
5 rmse 135742.78 7726.2373 133826.62 156815.61 132073.2 130811.86 125186.64
6 rmsle 0.18275535 0.0020155373 0.18441868 0.18689767 0.17945778 0.1833288 0.17967385
model_RF.model_performance(test)
#ModelMetricsRegression: drf
#** Reported on test data. **
#MSE: 16405336914.530426
#RMSE: 128083.3202041953
#MAE: 71572.37981480274
#RMSLE: 0.17712324625977907
#Mean Residual Deviance: 16405336914.530426

As can be seen from above, DRF just missed the target of an RMSE below $123k, both on cross-validation and on the test dataset.

Now, let’s try to fit a deep learning model, again tuning the parameters with Grid Search.

g = h2o.grid.H2OGridSearch(
    H2ODeepLearningEstimator(
        nfolds=nfolds,
        fold_assignment="Modulo",
        keep_cross_validation_predictions=True,
        reproducible=True,
        seed=123),
    hyper_params={
        "epochs": [20, 25],
        "hidden": [[20, 20, 20], [25, 25, 25]],
        "stopping_rounds": [0, 5],
        "stopping_tolerance": [0.006]
    },
    search_criteria={
        "strategy": "RandomDiscrete",
        "max_models": 10,
        "stopping_metric": "rmse",
        "max_runtime_secs": 60
    }
)
g.train(x, y, train)
g
#deeplearning Grid Build progress: |███████████████████████████████████████| 100%
#               epochs        hidden  stopping_rounds  stopping_tolerance  model_ids                                                       residual_deviance
#0  16.79120554889533   [25, 25, 25]                0               0.006  Grid_DeepLearning_train_model_python_1628864392402_55_model_0  1.6484562934855278E10
#1  3.1976799968879086  [25, 25, 25]                0               0.006  Grid_DeepLearning_train_model_python_1628864392402_55_model_1  2.1652538389322113E10

Model 4

model_DL = H2ODeepLearningEstimator(
    epochs=30,
    model_id='dl_house',
    nfolds=nfolds,
    stopping_rounds=7,
    stopping_tolerance=0.006,
    hidden=[30, 30, 30],
    reproducible=True,
    fold_assignment="Modulo",
    keep_cross_validation_predictions=True,
    seed=123)
%time model_DL.train(x, y, train)
#deeplearning Model Build progress: |██████████████████████████████████████| 100%
#Wall time: 55.7 s

model_DL.cross_validation_metrics_summary().as_data_frame()
mean sd cv_1_valid cv_2_valid cv_3_valid cv_4_valid cv_5_valid
0 mae 72458.19 1241.8936 71992.18 73569.984 75272.75 70553.38 70902.65
1 mean_residual_deviance 1.48438886E10 5.5005555E8 1.42477005E10 1.59033723E10 1.54513889E10 1.48586271E10 1.37583514E10
2 mse 1.48438886E10 5.5005555E8 1.42477005E10 1.59033723E10 1.54513889E10 1.48586271E10 1.37583514E10
3 r2 0.8899759 0.0023493338 0.89545286 0.8901592 0.885028 0.89008224 0.88915724
4 residual_deviance 1.48438886E10 5.5005555E8 1.42477005E10 1.59033723E10 1.54513889E10 1.48586271E10 1.37583514E10
5 rmse 121793.58 2259.6975 119363.734 126108.58 124303.62 121895.97 117296.0
6 rmsle 0.18431115 0.0011469581 0.18251595 0.18650953 0.18453318 0.18555655 0.18244053

As can be seen from the above table (row 5, column 1), the mean RMSE for cross-validation is 121793.58, which is below $123k.

model_DL.model_performance(test)
#ModelMetricsRegression: deeplearning
#** Reported on test data. **
#MSE: 14781990070.095192
#RMSE: 121581.20771770278
#MAE: 72522.60487846025
#RMSLE: 0.1834924698171073
#Mean Residual Deviance: 14781990070.095192

As can be seen from above, the deep learning model could achieve the target of an RMSE below $123k on the test dataset.

  3. Finally, let’s train a stacked ensemble of the models created in the earlier steps. We may need to repeat steps two and three until the best model (which is usually the ensemble model, but does not have to be) reaches the minimum required performance on the cross-validation dataset. Note: only one model has to achieve the minimum required performance. If multiple models achieve it, we choose the best-performing one.

models = [model_GBM.model_id, model_RF.model_id, model_DL.model_id]  #model_GLM.model_id,
model_SE = H2OStackedEnsembleEstimator(model_id = 'se_gbm_dl_house', base_models=models)
%time model_SE.train(x, y, train)
#stackedensemble Model Build progress: |███████████████████████████████████| 100%
#Wall time: 2.67 s

#model_SE.model_performance(test)
#ModelMetricsRegressionGLM: stackedensemble
#** Reported on test data. **
#MSE: 130916347835.45828
#RMSE: 361823.6418967924
#MAE: 236448.3672215734
#RMSLE: 0.5514878971097109
#R^2: 0.015148783736682492
#Mean Residual Deviance: 130916347835.45828
#Null degrees of freedom: 2150
#Residual degrees of freedom: 2147
#Null deviance: 285935013037402.7
#Residual deviance: 281601064194070.75
#AIC: 61175.193832813566

As can be seen from above, the stacked ensemble model could not reach the required performance, either on cross-validation or on the test dataset.

4. Now let’s get the performance on the test data of the chosen model/ensemble, and confirm that this also reaches the minimum target on the test data.

Best Model

The model that performs best in terms of mean cross-validation RMSE and RMSE on the test dataset (both of them below the minimum target of $123k) is the gradient boosting model (GBM), which is Model 2 above.

model_GBM.model_performance(test)
#ModelMetricsRegression: gbm
#** Reported on test data. **
#MSE: 14243079402.729088
#RMSE: 119344.37315068142
#MAE: 65050.344749203745
#RMSLE: 0.16421689257411975
#Mean Residual Deviance: 14243079402.729088

# save the models
h2o.save_model(model_GBM, 'best_model (GBM)')  # the final best model
h2o.save_model(model_SE, 'SE_model')
h2o.save_model(model_GBM, 'GBM_model')
h2o.save_model(model_RF, 'RF_model')
h2o.save_model(model_GLM, 'GLM_model')
h2o.save_model(model_DL, 'DL_model')


Top Reasons to Use Python Language for Web Application Development

The reputed TIOBE index ranks Python among the most popular programming languages for web and web app development. It is an extremely powerful, flexible, and advanced language for web design and development. Python development services are gaining ground among entrepreneurs globally for these reasons. Let's discuss them in this post.

Python has an upper hand over other programming languages when it comes to developing highly functional enterprise websites and web applications. Thanks to its ongoing advancements, Python app development can handle complex and diverse business challenges. Python app developers can take advantage of the versatility of this language to build efficient web app solutions.

The fact that software giants like Google, Facebook, and Microsoft bank on Python speaks to its importance. Let's understand why Python is a preferred programming language for web application development. But before digging deep into these reasons, let's have a brief introduction to Python.

What is Python Language?

It is a highly adaptable and efficient programming language with dynamic typing capabilities. It is useful for developing robust web and web application solutions. As a versatile programming language, Python enables developers to create all sorts of applications including scientific applications, graphics-based system applications, games, command-line utilities, etc. Python consultants can shed light on its usage. 

As an open-source programming language, Python offers unrestricted copying, embedding, and distribution of the code. What’s more, Python developers can get all the coding information online with ease. As a result, a Python development company can come up with flexible and feature-rich web solutions. Python can give enterprises an edge over peers by offering seamless and future-ready solutions. 

Python app development is steadily gaining popularity among entrepreneurs who want to integrate advancements of emerging technologies including AI, ML, and IoT. It is possible to bring automation in certain processes with the help of Python-based websites. Companies can hire Python developers to achieve this objective and get success in this challenging time. Let us go through how different web app development domains use this language. 

Python Use Cases across Various Web Development Domains

AI (Artificial Intelligence) and ML (Machine Learning)

Python is one of the most preferred programming languages for integrating AI and ML into customized web solutions. It is useful for preparing data for ML and helping AI systems analyze large volumes of data. Python-based websites and web applications can easily deal with high web traffic and fetch user data.

Internet of Things (IoT)

In Python web applications, cameras and other built-in tools of laptops or smartphones can be easily connected to the Internet as and when necessary. Python-powered business websites are capable of managing the existing IoT network when it comes to fetching and sharing valuable data.

Deep Learning

Web applications based on Python support robotics and image recognition. Deep Learning is useful for processing data in a way similar to that of our brain. Python app development services can assist entrepreneurs to bring innovative and intelligent web applications. 

Today, hundreds of thousands of developers use Python for web and web app development. A recent Stack Overflow survey showed Python to be one of the most in-demand programming languages. Python is a preferred language among developers, and many web developers want to learn it.

Let’s dig deep into the reasons why Python is a preferred language for web and web app development projects. 

Top Reasons Why Developers Select Python for Web Development Projects

Python is not a new language. It has been around us since the 90s, but it has evolved in line with the market trends and changing expectations. 

Secure Language

When you hire Python developers, you can remain assured of the security and scalability of the web application. A thriving fintech sector prefers Python language for its high security and capability of handling large amounts of data. Senior and experienced Python developers can come up with a functional fintech app with military-level security. Also, developers can find solutions to common issues of Python web development thanks to a thriving community. 

Large and Robust Library

There is no exaggeration in saying that there is a Python library for everything. Whether entrepreneurs need an elegant website with seamless functionality or a secure, feature-rich web app, Python's libraries enable developers to build robust web solutions. Popular machine learning (ML) libraries let Python web developers integrate machine learning capabilities into a customized web app. The SQLAlchemy library brings the power of SQL into an app or website, and it supports enterprise web development patterns backed by a simple database.
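As an illustration, a minimal SQLAlchemy sketch looks like this (assuming SQLAlchemy 1.4+; the table, column, and database names are hypothetical):

# Define a table as a Python class and insert a row (hypothetical schema).
from sqlalchemy import create_engine, Column, Integer, String
from sqlalchemy.orm import declarative_base, Session

Base = declarative_base()

class Product(Base):
    __tablename__ = "products"          # hypothetical table name
    id = Column(Integer, primary_key=True)
    name = Column(String)

engine = create_engine("sqlite:///shop.db")  # simple file-backed database
Base.metadata.create_all(engine)             # create the table if it does not exist

with Session(engine) as session:
    session.add(Product(name="Sample product"))
    session.commit()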

Django Framework

This is one of the biggest reasons for choosing Python for developing complex web applications. Django is the main web development framework with a highly useful collection of libraries. As a flexible and comprehensive platform for developing any type of web apps, Django can build powerful apps for modern enterprises. You can hire Python Django developers for building user-friendly web apps for your business. Django takes away the pain of the development process and developers can readily focus on demanding tasks instead of basic issues. 

Python web development also offers Flask, a polar opposite of Django, as sketched below. Flask is a microframework and has far fewer ready-made parts than Django, so it does not offer as much out of the box. Talking about the differences: Django can save the developer's time, whereas Flask requires more time to adapt to changing requirements.
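For illustration, a minimal Flask sketch looks like this (the route and response are hypothetical):

# A tiny Flask app with a single JSON endpoint.
from flask import Flask, jsonify

app = Flask(__name__)

@app.route("/api/health")       # hypothetical route
def health():
    return jsonify(status="ok")

if __name__ == "__main__":
    app.run(debug=True)         # development server only

The same endpoint in Django would involve a project, an app, a urls.py entry, and a view, which is exactly the trade-off described above: more structure out of the box versus a smaller starting footprint.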

AI and ML Advantages

AI and machine learning technologies are the need of the hour. With Python, you can integrate the functionality of these emerging technologies. This is one of the major reasons for Python’s increasing popularity. It results in a large number of developers who have professional experience in integrating AI-based features into enterprise apps. You can also find many Python developers with ease. In other words, it is much easier to hire Python developers than to hire C++ or other web developers. 

Final Thoughts

When it comes to performance, Python is great. Availability of developers and rich libraries are other big reasons why you should prefer Python for your upcoming web project. A wider talent pool is available for the Python language as compared to other programming languages. You can soon initiate the MVP (Minimum Viable Product) or a big web project using Python. 

All you need to do is consult a reputed Python development company or meet experienced Python consultants to build a team quickly and start the development process as soon as possible for your enterprise.


Digital Transformation Through IoT

Times are changing minute by minute and day by day. At this point, a great product or service alone is no longer enough to satisfy your customers' requirements or retain your market.

One way to set your business apart from the competition is to support innovation and adopt new technologies. That is why organizations bet on digital transformation, trying to remain relevant and keep up with market demands.

With the emerging IoT technologies in digital transformation, there are various factors that enhance the IoT utility and drive its growth. Data is valuable and AI is making data actionable by supporting digital IoT apps to provide predictive and prospective analytics.

In this article, you will know how IoT affects digital transformation.

What Does Digital Transformation Mean For Business?

As indicated by the State of Digital Transformation research, market pressure is the major driver of digital transformation, as even well-known market leaders battle to compete with tech-empowered, agile businesses and startups.

Digital transformation is the best way to future-proof your business and survive tech disruption.

To keep up with customer expectations, more companies are required to change their current business processes (or create entirely new ones) with the help of technology, i.e., to embark on the path of digital transformation.

From the customer experience that is offered to how you handle your internal processes, digital transformation significantly affects all parts of your business, both internal and external.

Advantages Of Digital Transformation

 

Improves customer experience

Providing digital and advanced tools to the customers assists with making their lives simple and easy. It makes the business more appealing to potential customers. Organizations that offer obsolete tools and technologies will experience trouble competing with those who utilize new and updated technologies.

 

Empowers data-driven decision-making 

Digital transformation enables organizations to carry out data-driven management by utilizing an assortment of tools for tracking metrics and analyzing data. This, in turn, helps provide better outcomes and improves supply chain performance.

 

Improved efficiency 

Innovative software tools for process automation lead to improved efficiency, which in turn brings cost savings and reduces friction in the business.

 

Greater security 

By changing to modern software frameworks, organizations can secure their data in a better way. Today, customers are very much aware of data security issues, so this is the best method to win their trust and loyalty.

 

How Does The Internet Of Things Affect Digital Transformation?

There are numerous startups whose entire business model is built around an IoT product line. However, traditional organizations across various domains can likewise benefit from introducing emerging IoT technology solutions to fuel their established business processes.

There are various ways you can transform your business using IoT. Here are some of the methods in which IoT is driving digital transformation and increasing the demand for IoT App development:

  • Starting new business opportunities

By utilizing information generated by IoT devices, organizations can better understand their customers' requirements, adjust their product offerings accordingly, and introduce new products or services to reach a wider audience.

 

  • Delivering meaningful, tailored customer experience

By capitalizing on new sources of consumer information, such as IoT devices, organizations can gain deep insight into customer behavior and tailor the customer experience accordingly, through cutting-edge personalization and increased availability.

 

  • Boosting business efficiency

By merging rich data insights with autonomous sensors, Internet of Things mobile applications have the potential to boost business productivity through process automation. Many prominent processes can be streamlined, including stock management, logistics management, security, energy maintenance, and so on.

 

  • Reducing operating costs

Process automation will inevitably lead to cost savings and will allow you to use resources wisely. For instance, IoT energy solutions can help you manage utility consumption and waste disposal. This approach can be applied to heating, ventilation and air conditioning systems, lighting, water supply, and so on.

 

  • Improving employee productivity

Much like cloud and mobile technology advancements, IoT can help you engage your staff, offering better agility and making your business systems accessible anytime and anywhere. Smart sensors can keep employees connected at all times and deliver real-time insights for better productivity.

Bottom Line

The emerging IoT technology has led businesses to work in smart ways by connecting devices and placing real-time information to customers and employees, to provide a personalized and satisfying experience. With IoT transformation, there is secure integration into business processes and workflow. 

It is advised that organizations should get their technology stack in place to brace the impact that new technologies like IoT and Digital transformation will bring.


Understanding Probabilistic Programming

Even for many data scientists, Probabilistic Programming is a relatively unfamiliar territory. Yet, it is an area fast gaining in importance.

In this post, I briefly explain the exact problem being addressed by Probabilistic Programming.

We can think of Probabilistic Programming as a tool for statistical modelling.

Probabilistic Programming has randomization at its core, and its goal is to provide a statistical analysis that explains a phenomenon.

Probabilistic Programming is based on the idea of latent random variables, which allow us to model uncertainty in a phenomenon. Statistical inference, in turn, involves determining the values of these latent variables.

A probabilistic programming language (PPL) is based on a few primitives: primitives for drawing random numbers, primitives for computing probabilities and expectations by conditioning, and finally primitives for probabilistic inference.

A PPL works a bit differently from traditional machine learning frameworks. The prior distributions are encoded as assumptions in the model. In the inference stage, the posterior distributions of the model’s parameters are computed from the observed data, i.e., inference adjusts the prior probabilities in light of the observations.

All this sounds a bit abstract. But how do you use it?

One way is through Bayesian probabilistic graphical models, implemented through packages like pymc3.
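
As a concrete illustration, here is a minimal sketch of a Bayesian coin-bias model written for pymc3 (assuming a pymc3 3.x installation); the beta-Bernoulli setup and the simulated data are illustrative choices of mine, not something from this post. The prior encodes the assumption, and sampling computes the posterior from the observations:

```python
import numpy as np
import pymc3 as pm

# Hypothetical observed data: 100 coin flips from a coin whose true bias is 0.7
rng = np.random.default_rng(42)
data = rng.binomial(n=1, p=0.7, size=100)

with pm.Model() as model:
    # Prior: our assumption about the unknown bias before seeing any data
    p = pm.Beta("p", alpha=1.0, beta=1.0)
    # Likelihood: how the observed flips are generated given p
    obs = pm.Bernoulli("obs", p=p, observed=data)
    # Inference: draw samples from the posterior distribution of p
    trace = pm.sample(1000, tune=1000, cores=1)

print("Posterior mean of p:", trace["p"].mean())
```

The posterior mean should land near the true bias, which is exactly the sense in which inference updates the prior in light of the data.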

Another way is to combine deep learning with PPLs, i.e., deep PPLs, implemented through packages like TensorFlow Probability.

For more about Probabilistic deep learning, see

Probabilistic Deep Learning with Probabilistic Neural Networks and …

Finally, it’s important to emphasise that probabilistic programming takes a different approach from traditional model building.

In traditional CS/machine learning models, the model is defined by parameters which generate the output. In statistical/Bayesian programming, the parameters are not fixed or predetermined. Instead, we start with a generative process, and the parameters are determined as part of the inference based on the observed inputs.

In subsequent posts, we will expand on these ideas in detail.

Image source: TensorFlow Probability

References

https://www.cs.cornell.edu/courses/cs4110/2016fa/lectures/lecture33…

https://www.math.ucdavis.edu/~gravner/MAT135B/materials/ch11.pdf

https://medium.com/swlh/a-gentle-introduction-to-probabilistic-prog…

Source Prolead brokers usa

how to use ai for intelligent inventory management
How to Use AI for Intelligent Inventory Management

Artificial Intelligence (AI) is in high demand in practically every industry. Retailers and other e-commerce companies are the prime example of successful adoption of this top-notch technology, especially in their inventory management systems. AI provides organizations with powerful insights, such as trends identified from large volumes of analyzed data, so that business owners and their warehouse teams can better manage the daily tasks of inventory management.

Improved decision-making, reduced costs, lower risks, optimized warehouse operations, and increased productivity are just a few benefits of implementing AI technology. According to statistics, in 2020 about 45.1% of companies had already invested in warehouse automation and 40.1% in AI solutions.

5 Ways To Use Artificial Intelligence For Inventory Management

It’s estimated that AI could add $1.3 trillion to the global economy over the next twenty years if the technology is applied to supply chain and logistics management. That’s because AI can make supply chain management more efficient at every stage.

Nvidia, IBM, Amazon, Facebook, Microsoft, Salesforce, Alteryx, Twilio, Tencent, and Alphabet are a few big names among the companies that have already leveraged the benefits of AI. The following are 5 ways that AI is revolutionizing inventory management.

1. Data Mining and Turning It Into Solutions

AI is extremely helpful in data mining. AI solutions can not only gather data but also analyze it and transform it into timely actions. AI built into the inventory management system therefore helps the business evolve more rapidly and find more effective solutions to a given situation. By monitoring, gathering, recording, and processing data on every customer’s behavior and interests, businesses can understand their customers’ demands, build more effective strategies, and plan ahead for customer needs and stock levels.

2. Dealing with Forecasting, Planning, and Control Issues In The Inventory Management Process

Inventory management is not only about storing and delivering items; it’s also about forecasting, planning, and control. By implementing AI solutions, you minimize the risks of overstocking and understocking thanks to the technology’s ability to:

  • Accurately analyze and correlate demand insights;
  • Detect and respond to the change in demand for a specific product;
  • Consider location-specific demand.

AI-based solutions have the flexibility to analyze all the factors and situations that are vital for successful planning, stocking, and delivery scheduling. By reducing errors and issues in inventory management, the business can increase customer satisfaction and save costs.
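
To make the forecasting-and-control idea concrete, here is a minimal sketch of the classic reorder-point calculation that such a system might automate. The demand numbers, lead time, and service level below are hypothetical, and the safety-stock formula (z × standard deviation of daily demand × √lead time) is a standard textbook heuristic rather than anything described in this article:

```python
import math
from statistics import NormalDist, mean, stdev

# Hypothetical daily demand observations for a single SKU (units per day)
daily_demand = [42, 38, 55, 47, 51, 39, 44, 60, 48, 52]

lead_time_days = 5        # assumed supplier lead time
service_level = 0.95      # target probability of not stocking out

avg_demand = mean(daily_demand)
demand_std = stdev(daily_demand)

# z-score matching the target service level (about 1.645 for 95%)
z = NormalDist().inv_cdf(service_level)

safety_stock = z * demand_std * math.sqrt(lead_time_days)
reorder_point = avg_demand * lead_time_days + safety_stock

print(f"Average daily demand: {avg_demand:.1f} units")
print(f"Safety stock:         {safety_stock:.0f} units")
print(f"Reorder point:        {reorder_point:.0f} units")
```

An AI-driven system would replace the static averages with learned, location-specific demand forecasts, but the decision logic it feeds looks essentially like this.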

3. Stock Management and Delivery

Planning errors and/or inadequate stock monitoring can result in shortages, delays, and other issues that affect revenue. AI technology can be very helpful here. The technology can collect data about customers and analyze it to identify behavior patterns and other crucial factors that help:

  • Plan stocking correctly;
  • Automate the stocking and fulfillment processes;
  • Respond to incoming customer demand on time;
  • Establish efficient transportation, and more.

AI can also streamline deliveries and increase their efficiency. On-time delivery and transportation are fundamentals of supply chain management that have a huge impact on consumer satisfaction. AI analyzes and makes sense of all of a company’s telematics data and helps find the optimal routes to ensure the timely arrival of orders. Besides, the technology can identify patterns and draw conclusions about the company’s delivery processes so that you can improve them.

4. AI-Powered Robots to Optimize Warehouse Operations

AI-powered robots are not a new thing. Giants such as Amazon already use them for day-to-day tasks. It’s forecast that the robot automation market’s value will reach $10+ billion by 2023. There are a number of benefits that give AI-based robots an edge over human staff:

  • They can work 24/7 tirelessly;
  • Robots complete each action in less time;
  • They can locate goods and scan their condition, collecting the data needed for further analysis;
  • They provide real-time tracking of products;
  • Robots can pick and move orders, reducing manual errors;
  • They perform inventory optimization, and so on.

All of that can save a business a big chunk of its operational budget. Besides, AI-powered robots used in warehouses free up employees so that they can be allocated to more urgent and vital tasks that require human cognition.

5. Logistics Route Optimization 

One of the most critical components in logistics is route optimization. By implementing AI solutions, companies can reduce time lost in traffic, deliver faster, and thereby save costs. That’s because AI can help with the following (a small routing sketch follows this list):

  • Lowering shipping costs by evaluating all the possible options and finding the fastest and most cost-effective ways to deliver orders to customers.
  • Planning the most optimal routes. AI can learn traffic patterns over time, analyze the received data, and consider the different factors while routing. All that enables the drivers to avoid traffic jams more effectively.
  • Calculating more precise delivery time. Using complex algorithms, AI technology can calculate the delivery time more accurately by taking into account historical and real-time data, optimal routes, and other factors that can affect delivery efficiency.
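
The following is a deliberately small sketch of the routing idea, using a greedy nearest-neighbour heuristic over a made-up distance matrix. Production systems use live traffic data and far stronger solvers, so treat this only as an illustration of the underlying optimization problem, not as the algorithms the article alludes to:

```python
# Minimal nearest-neighbour routing heuristic over a hypothetical,
# symmetric distance matrix (distances in km between 5 delivery stops).
DIST = [
    [0, 12, 19, 8, 25],
    [12, 0, 9, 15, 22],
    [19, 9, 0, 11, 14],
    [8, 15, 11, 0, 17],
    [25, 22, 14, 17, 0],
]

def nearest_neighbour_route(dist, start=0):
    """Greedily visit the closest unvisited stop until all stops are visited."""
    unvisited = set(range(len(dist))) - {start}
    route, total, current = [start], 0, start
    while unvisited:
        nxt = min(unvisited, key=lambda j: dist[current][j])
        total += dist[current][nxt]
        route.append(nxt)
        unvisited.remove(nxt)
        current = nxt
    return route, total

route, km = nearest_neighbour_route(DIST)
print("Visit order:", route, "| total distance:", km, "km")
```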

Final Thoughts

AI has revolutionized and reshaped both inventory management and the way companies stock and store products. Implemented well, AI solutions make inventory management pre-planned, automated, driven by customer demand, and even carried out by robots. AI empowers companies to:

  • Enhance user experience and consumer satisfaction;
  • Increase sales;
  • Reduce costs;
  • Boost the overall productivity of the company.

AI is the future of the industry. Thus, if you want to stay competitive, you should implement the technology as soon as possible. The results can be outstanding.

Source Prolead brokers usa

synthetic image generation using gans
Synthetic Image Generation using GANs

Occasionally a novel neural network architecture comes along that enables a truly unique way of solving specific deep learning problems. This has certainly been the case with Generative Adversarial Networks (GANs), originally proposed by Ian Goodfellow et al. in a 2014 paper that has been cited more than 32,000 times since its publication. Among other applications, GANs have become the preferred method for synthetic image generation. The results of using GANs for creating realistic images of people who do not exist have raised many ethical issues along the way. 

In this blog post we focus on using GANs to generate synthetic images of skin lesions for medical image analysis in dermatology.

Figure 1 – How a generative adversarial network (GAN) works. 

A Quick GAN Lesson

Essentially, GANs consist of two neural network agents/models (called generator and discriminator) that compete with one another in a zero-sum game, where one agent’s gain is another agent’s loss. The generator is used to generate new plausible examples from the problem domain whereas the discriminator is used to classify examples as real (from the domain) or fake (generated). The discriminator is then updated to get better at discriminating real and fake samples in subsequent iterations, and the generator is updated based on how well the generated samples fooled the discriminator (Figure 1).
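
To make the generator/discriminator game concrete, here is a minimal sketch of one adversarial training step written in PyTorch (the worked example later in this post uses MATLAB, so this is only an illustrative stand-in); the generator, discriminator, their optimizers, and the latent size are assumed to be defined elsewhere:

```python
import torch
import torch.nn as nn

bce = nn.BCEWithLogitsLoss()   # assumes the discriminator outputs a single real/fake logit
latent_dim = 100               # assumed size of the generator's input noise vector

def train_step(generator, discriminator, g_opt, d_opt, real_images):
    batch = real_images.size(0)
    real_labels = torch.ones(batch, 1)
    fake_labels = torch.zeros(batch, 1)

    # Update the discriminator: get better at telling real images from generated ones
    z = torch.randn(batch, latent_dim)
    fake_images = generator(z).detach()   # no gradient flows into the generator here
    d_loss = bce(discriminator(real_images), real_labels) + \
             bce(discriminator(fake_images), fake_labels)
    d_opt.zero_grad()
    d_loss.backward()
    d_opt.step()

    # Update the generator: try to make the discriminator label fresh fakes as real
    z = torch.randn(batch, latent_dim)
    g_loss = bce(discriminator(generator(z)), real_labels)
    g_opt.zero_grad()
    g_loss.backward()
    g_opt.step()

    return d_loss.item(), g_loss.item()
```

Looping this step over mini-batches is the entire training procedure; much of GAN research since has been about architectures and tricks that keep the two losses in balance.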

During its history, numerous architectural variations and improvements over the original GAN idea have been proposed in the literature. Most GANs today are at least loosely based on the DCGAN (Deep Convolutional Generative Adversarial Networks) architecture, formalized by Alec Radford, Luke Metz and Soumith Chintala in their 2015 paper.

You’re likely to see DCGAN, LAPGAN, and PGAN used for unsupervised techniques like image synthesis, and cycleGAN and Pix2Pix used for cross-modality image-to-image translation.

GANs for Medical Images

The use of GANs to create synthetic medical images is motivated by the following aspects:

  1. Medical (imaging) datasets are heavily unbalanced, i.e., they contain many more images of healthy patients than any pathology. The ability to create synthetic images (in different modalities) of specific pathologies could help alleviate the problem and provide more and better samples for a deep learning model to learn from.
  2. Manual annotation of medical images is a costly process (compared to similar tasks for generic everyday images, which could be handled using crowdsourcing or smart image labeling tools). If a GAN-based solution were reliable enough to produce appropriate images requiring minimal labeling/annotation/validation by a medical expert, the time and cost savings would be appealing.
  3. Because the images are synthetically generated, there are no patient data or privacy concerns.

Some of the main challenges for using GANs to create synthetic medical images, however, are:

  1. Domain experts would still be needed to assess quality of synthetic images while the model is being refined, adding significant time to the process before a reliable synthetic medical image generator can be deployed.
  2. Since we are ultimately dealing with patient health, the stakes involved in training (or fine-tuning) predictive models using synthetic images are higher than using similar techniques for non-critical AI applications. Essentially, if models learn from data, we must trust the data that these models are trained on.

The popularity of using GANs for medical applications has been growing at a fast pace in the past few years. In addition to synthetic image generation in a variety of medical domains, specialties, and image modalities, other applications of GANs such as cross-modality image-to-image translation (usually among MRI, PET, CT, and MRA) are also being researched in prominent labs, universities, and research centers worldwide.

In the field of dermatology, unsupervised synthetic image generation methods have been used to create high resolution synthetic skin lesion samples, which have also been successfully used in the training of skin lesion classifiers. State-of-the-art (SOTA) algorithms have been able to synthesize high resolution images of skin lesions which expert dermatologists could not reliably tell apart from real samples. Figure 2 shows examples of synthetic images generated by a recently published solution as well as real images from the training dataset.

Figure 2 – (L) synthetically generated images using state-of-the-art techniques;
(R) actual skin lesion images from a typical training dataset.

An example

Here is an example of how to use MATLAB to generate synthetic images of skin lesions.

The training dataset consists of annotated images from the ISIC 2016 challenge, Task 3 (Lesion classification) data set, containing 900 dermoscopic lesion images in JPEG format.

The code is based on an example using a more generic dataset, and then customized for medical images. It highlights MATLAB’s recently added capabilities for handling more complex deep learning tasks, including the ability to:

  • Create deep neural networks with custom layers, in addition to commonly used built-in layers.
  • Train deep neural networks with a custom training loop and automatic differentiation.
  • Process and manage mini-batches of images using custom mini-batch processing functions.
  • Evaluate the model gradients for each mini-batch – and update the generator and discriminator parameters accordingly.

The code walks through creating synthetic images using GANs from start (loading and augmenting the dataset) to finish (training the model and generating new images).

One of the nicest features of using MATLAB to create synthetic images is the ability to visualize the generated images and score plots as the networks are trained (and, at the end of training, rewind and watch the entire process in a “movie player” type of interface embedded into the Live Script). Figure 3 shows a screenshot of the process after 600 epochs / 4200 iterations. The total training time for a 2021 M1 Mac mini with 16 GB of RAM and no GPU was close to 10 hours.

Figure 3 – Snapshot of the GAN after training for 600 epochs / 4200 iterations. On the left: 25 randomly selected generated images; on the right: generator (blue) and discriminator (red) curves showing the score (between 0 and 1, where 0.5 is best) at each iteration.

Figure 4 shows additional examples of 25 randomly selected synthetically generated images after training has completed. The resulting images resemble skin lesions but are not realistic enough to fool a layperson, much less a dermatologist. They indicate that the solution works (notice how the images are very diverse in nature, capturing the diversity of the training set used by the discriminator), but they display several imperfections, among them: a noisy periodic pattern (in what appears to be an 8×8 grid of blocks across the image) and other visible artifacts. It is worth mentioning that the network has also learned a few meaningful artifacts (such as colorful stickers) that are actually present in a significant number of images from the training set.

Figure 4 – Examples of synthetically generated images.

Practical hints and tips

If you choose to go down the path of improving, expanding, and adapting the example to your needs, keep in mind that:

  1. Image synthesis using GANs is a very time-consuming process (like most deep learning solutions). Be sure to secure as much computational power as you can.
  2. Some things can go wrong and could be detected by inspecting the training progress, among them: convergence failure (when the generator and discriminator do not reach a balance during training, with one of them overpowering the other) and mode collapse (when the GAN produces a small variety of images with many duplicates and little diversity in the output). Our example doesn’t suffer from either problem.
  3. Your results may not look “great” (contrast Figure 4 with Figure 2), but that is to be expected. After all, in this example we are basically using the standard DCGAN (deep convolutional generative adversarial network) architecture. Specialized work in synthetic skin lesion image generation has moved significantly beyond DCGAN; SOTA solutions (such as the one by Bissoto et al. and the one by Baur et al.) use more sophisticated architectures, normalization options, and validation strategies.

Key takeaways

GANs (and their numerous variations) are here to stay. They are, according to Yann LeCun, “the coolest thing since sliced bread.” Many different GAN architectures have been successfully used for generating realistic (i.e., semantically meaningful) synthetic images, which may help train deep learning models in cases where real images are rare, difficult to find, or expensive to annotate.

In this blog post we have used MATLAB to show how to generate synthetic images of skin lesions using a simple DCGAN and training images from the ISIC archive.

Medical image synthesis is a very active research area, and new examples of successful applications of GANs in different medical domains, specialties, and image modalities are likely to emerge in the near future.  If you’re interested in learning more about it, check out this review paper and use our example as a starting point for further experimentation.

Source Prolead brokers usa

no code ai no kidding aye part ii
No Code AI, No Kidding Aye – Part II

Challenges addressed by No Code AI platforms

An AI model building is challenging on three fundamental counts:

  1. Availability of relevant data in good quantity and quality: The less I rant about it, the better.
  2. Need for multiple skills: Building an effective and monetizable AI model is not just the realm of a data scientist alone. It needs data engineering skills and domain knowledge also.
  3. The constant evolution of the ecosystem in terms of new techniques, approaches, methodologies, and tools

There is no easy way out to address the first challenge, at least not so far. So, let us brush that under the carpet for now.

The need for multiple resources with complementary skills is an area where a no-code AI platform can add tremendous value. The average data scientist spends half of his/her time preparing and cleaning the data needed to build models and the other half fine-tuning the model for optimum performance. No Code AI platforms (such as Subex HyperSense) can step in with automated data engineering and ML programming accelerators that go a long way in alleviating the requirement of having a multi-skilled team. What’s more, they empower even Citizen Data Scientists with the ability to build competent AI models without needing to know any programming language or having any background in data engineering. Platforms like HyperSense provide advanced automated data exploration, data preparation, and multi-source data integration capabilities using simple drag-and-drop interfaces. They combine this ability with a rich visual representation of the results at every step of the process, so that one does not need to wait until the end to discover an error made in an early step and then go back and make changes everywhere.

As I briefly touched upon a while back, getting the data ready is one-half of the battle won. The plethora of options on the other half is still perplexing – Is it a bird? Is it a plane? Oh no, it is Superman! Well, in our context it would be more like – Is it DBSCAN? Is it a Gaussian Mixture? Oh no, it is K-Means! Feature engineering and experimenting with different algorithms to get the most optimal results is a specialized skill. It requires an in-depth understanding of the data set, domain knowledge, and principles of how various algorithms work. Here again, No Code AI platforms like HyperSense come to the table with significant value adds. With capabilities like autonomous feature engineering and multi-algorithm trial and benchmarking, I daresay that they make building models almost child’s play. Please do not get me wrong. I am not for a moment suggesting that these platforms will result in the extinction of the technical data scientist role; on the contrary, they will make data scientists more efficient and give them superpowers to solve greater problems in less time while managing and guiding teams of citizen data scientists to solve the more mundane, yet existentially important, problem statements.

So far, so good; and having brushed one challenge under the carpet and discussed the other one, there is one more – the constant evolution of AI techniques, methodologies, tools, and technologies. Today, just being able to build a model which performs well on a pre-defined set of metrics does not cut ice anymore. It is just not enough for a model to be simply accurate. As the AI landscape evolves, the chorus for explainability and accountability in models is reaching a fever pitch. Why did K-Means give you a better result than a Gaussian Mixture? Will you then get the same result if a feature is modified or a new one added? Why did the model predict a similar outcome for most customers belonging to a certain ethnicity? Is the model replicating the bias and vagaries present in the historical data set, or those of the person building the model? If there have been policies and practices in a business where any sort of decision bias crept into day-to-day functioning, it is only natural that the data sets you work on will carry those biases, and the model you build will continue to persuade you to make decisions with the same biases as before. As an organization that is striving to disrupt and transform your industry, it is pertinent that you identify and weed out such biases sooner rather than later, before your AI models hit scale and become a wild animal out of its cage.

As No Code AI platforms evolve, model explainability is something that is already being addressed. Platforms like HyperSense give you the option to open up the proverbial ‘black box’ and peep inside to see why a model behaved the way it did. They provide the analyst or the data scientist with an opportunity to tinker with advanced settings and fine-tune them to meet the objectives. Model accountability and ethics is a whole different ball game altogether. It is not restricted to technology alone; it also involves the frailties of human beings as a species. I am sure the evolving AI ecosystem will eventually figure out a way to make the world free of human biases – but hey, where’s the fun then? Human biases do make the world interesting and despicable in equal measure, and I believe the holy grail for AI will be to strike a balance between the two.

Until then, let us empower more and more creative and business stakeholders to explore and unleash the true power of AI using No Code platforms like HyperSense so that the world can be a better place for all life forms.

Source Prolead brokers usa

dsc weekly digest 10 august 2021
DSC Weekly Digest 10 August 2021

The most baleful aspects of the pandemic seem to be behind us, though the emergence of the Delta variant of the COVID-19 virus is causing companies to question whether it is perhaps too early to shift operations completely back to the office. As months turn into years, a hybrid work model looks more and more likely to emerge as the dominant approach to work.

This has a major impact upon the shape of work, especially for knowledge workers including data scientists, programmers, designers, and others who work primarily with information systems, as well as those who manage them, as machine learning systems become more integrated into day-to-day activities. Other areas being transformed include education in all its varied manifestations, entertainment, supply chain management, security, manufacturing, and even criminal activity.

As this process plays out, it is forcing a re-evaluation of nearly all aspects of work, including what productivity means in the AI era and whether such digital transformations (including Work From Home / Work from Anywhere) are beneficial or harmful to the economy. New DSC Columnist Michael Spencer, editor-in-chief of The Last Futurist, explores this theme in detail in this newsletter, asking whether the digital transformations that we’re seeing will come at the cost of local economies disappearing, especially in the entertainment and service sectors.

The entertainment sector is transforming in ways that would have been unthinkable ten years ago. Salesforce this week announced that they were launching their own business-oriented streaming service, even as companies such as Gamestop and AMC are on death watch on Wall Street. We are continuing the process of transforming atoms into bits, then making these virtualized atoms transmissible through ever-faster networks. Scarlett Johansson took Disney to court over royalty revenues lost to streaming, which is likely to send shockwaves through the entertainment sector as creators use the opportunity to renegotiate how such creativity is compensated as the traditional movie theater gives way to the virtualization of location. At the same time, Disney’s last major animated project, Raya and the Last Dragon, was completed almost entirely from the homes of the various animators, editors, and other creatives, to the extent that we may not be far from every actor having a green screen room in their house.

Even in the service sector, the skills required (and the demands upon workers) are changing. Delivery has become the next sector to face automation, requiring the coordination of thousands of drivers and fulfillment specialists through the use of highly complex networked systems, often managed through the same kind of tracking tools formerly reserved for large-scale software projects. There is a generation of DIY home manufacturers who are becoming adept at managing such supply chain and distribution issues, and that in turn is shaping how (and where) business gets done.

Ultimately, what is happening is that geolocation is ceasing to be as major a factor as it once was, while at the same time I think that we’ll see the pendulum swinging back towards where local business should be. In my town of Issaquah, here in the Pacific Northwest, the local restaurants along Main Street (or Front Street, in this case) are now seeing more and more patrons, as are the barbers and hair salons, and even a bookstore or two after a few decades of them being destroyed by the large chains (a trend I’m seeing in other sectors as well). I think we’ll find a balance again, but it will be a different equilibrium. We still need that third place, neither home nor work but common ground to re-establish community. 

In media res,

Kurt Cagle
Community Editor,
Data Science Central

To subscribe to the DSC Newsletter, go to Data Science Central and become a member today. It’s free! 

Source Prolead brokers usa

how to digitally transform a company from scratch
How to Digitally Transform a company from scratch?

Consumers want fast solutions to their problems. With the help of unprecedented innovation in technology, digital transformation empowers businesses to improve the overall business structure and, most notably, the customer experience. While it has always made sense to adopt digitization across companies, the adoption of digital transformation has still been slow.

Amid the pandemic, the need for transforming digitally has never been more urgent. Businesses that neglect the transformation will likely be left behind and risk losing their market position.

How can you embrace digital transformation successfully? Consider these ideas:

Switch from a product-focused to a customer-focused mindset

Embracing digital transformation holds special significance for customer experience. The primary focus should not be your product features; instead, more emphasis should be put on understanding and catering to your target customer’s wants and requirements.

If you clearly understand your customers’ problems and extend them a customized experience that resolves those problems, they will become your loyal customers.


Scale up the creation of innovative digital experiences

 

With technology pacing forward continuously, customers expect businesses to produce personalized digital content faster and cheaper (or even free).

Accordingly, businesses must adapt to this trend and swiftly scale their digital designs, content production, and collaborations to keep their customers engaged, interested, and responsive.

Create your customer journey without depending on technical teams

We know what has kept you wondering: is it even possible to create a digitally customized customer journey without a technical team?

It’s possible! When starting from scratch, businesses don’t necessarily have the complete skill sets and teams required to execute on their plans. However, the market now offers plenty of new technologies that entrepreneurs and businesses can leverage.

Also, with the boom of low-code or no-code technology, it has become hassle-free to find code-free platforms. For example: WordPress and Wix for hosting; Squarespace and Canva for content and website design; and Hotjar and Google Analytics for analytics visualization.

 

Low-code or no-code technologies are designed specifically so that entrepreneurs who may not have any technical or design background can efficiently create digital experiences without having to recruit a separate team.

 

Enable remote workforces and automation 

Amid the pandemic, businesses across the world shifted to and mastered the remote-work setup. Thanks to advances in technology and the thorough adoption of digital transformation across the organization, employees no longer need to work from a specific office location all the time. Robots can even take over some responsibilities; for example, in-store robots handle transactional tasks like checking inventory in store aisles and fulfilling small orders.

To enable remote workforces, rely on project management tools and stay connected with your team through virtual conferencing platforms.

In addition to a mindset that is open to understanding and implementing this new management concept, it is essential to adopt an effective strategy and use the right technology.


Digital Strategy:

 

A digital strategy requires creating a digital culture in the organization, following a clear and unified transformation strategy to reach digital maturity. It begins with the vision: think digital at all levels and across all departments, from senior managers to front-line employees, and include clients, middle managers, and external stakeholders or collaborators.

This digital strategy also requires continuous restructuring and rethinking of the business model, thus maintaining a culture that constantly adapts to market trends and the transformation of products and services.

 

Digital Technology:

A digitized company uses every applicable digital technology to optimize management and satisfy everyone involved in the business (employees, customers, suppliers, etc.): process automation, digital workstations and mobility, big data, electronic documentation, sensors, the Internet of Things (IoT), and so on.

Moreover, these technologies must be thoroughly integrated with business management to be truly useful (a factor that is usually overlooked and causes most digital transformation failures).

Although this might seem obvious at first, the majority of reports and interviews by consultants and experts in digitization fail to consider how essential task coordination is.

The time for digital transformation is now!

The pandemic was also a wake-up call for businesses to embrace their digital transformation journey. It won’t be easy to transform how you’ve been running your business so far, but know that the beginning is always the hardest part. With the practices we mentioned in this article, you can get ahead in your transformation journey and keep your business thriving!

Source Prolead brokers usa
