Ecommerce Use Case: How Machine Learning Can Increase Profits

Written by Taras on November 30th, 2016

Want to increase your sales? Here’s a detailed case study on how ecommerce companies can increase profits with machine learning algorithms.

Ecommerce is becoming a very crowded space. Competing businesses can sell their products all over the planet, and getting a good piece of the marketplace is harder and harder to accomplish.

Most ecommerce entrepreneurs have mastered content marketing. They understand the concepts of building relationships with customers and of keeping each content marketing platform engaging and up to date. They build their email lists and retain users with creative campaigns. They are even moving into geo-location and personalization with their content outreach. And still, for all their efforts, they are not able to increase sales.

The Answer Lies in Ecommerce Analytics and Data Science

When ecommerce companies are asked how they are using analytics and data, there seems to be a disconnect. More than half of Fortune 500 companies use big data to analyze their websites' traffic, user experience, and user behavior, and then use that information to influence how users behave. But small and mid-sized ecommerce businesses have not taken full advantage of the data science and big data analytics available to them. There are two reasons for this:

  • They may use Google Analytics and generate plenty of reports that show areas of weakness, but they are not sure how to correct those weaknesses effectively.
  • They may believe they have to hire a data scientist like the “big boys” do: someone who can not only collect and analyze data, but also collaborate with marketing staff to develop complex and expensive strategies. Such strategies address challenges (traffic patterns, bounce points and rates, etc.) and also analyze specific customer behaviors and how those behaviors can be targeted to increase future sales. It’s pretty amazing stuff, actually, and much of it is accomplished through machine learning, which lets machines use algorithms and math to solve specific problems better than humans can.

The truth is this: an ecommerce business of any size can take advantage of data science and use it to grow its customer base (and thus its profits). For small and mid-sized businesses, this does not mean hiring expensive in-house data science experts. It means contracting with a service whose data scientists can collect, organize, and analyze the data, develop models, and then work with the rest of their team to make recommendations to the business, including how to improve its conversion rate.

Why Data Science?

If you have purchased anything on Amazon recently, you will have noticed some interesting things pop up as you search for products and ultimately make a purchase. One of the most prominent is the statement: “Other customers who purchased this product also purchased these.” Additional products are then displayed for you to view.

Data science for ecommerce groups you with customers who may be in the same age range, of the same gender, and with the same interests as you. It tracks your behavior and suggests other potential purchases based on all of these factors. Chances are you will look at those other products, may purchase one or two, or will at least be aware that they exist so that you can return and purchase them later. Big data analysis lets Amazon customize its website in real time, just for you. And it can do much more.
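At its simplest, a “customers who purchased this also purchased” list can be built by counting how often other products appear in the purchase histories of customers who bought a given item. The Python sketch below is a simplified illustration of that idea, not Amazon's actual system; the order data and the `also_bought` function are hypothetical.

```python
from collections import Counter

def also_bought(orders, product, top_n=3):
    """Rank products by how many customers bought them together with `product`.

    `orders` maps a customer id to the set of product ids that customer bought.
    """
    co_counts = Counter()
    for purchased in orders.values():
        if product in purchased:
            # Count every other product this customer also bought.
            co_counts.update(p for p in purchased if p != product)
    # Highest count first; ties broken alphabetically for a stable order.
    ranked = sorted(co_counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [p for p, _ in ranked[:top_n]]

# Hypothetical purchase histories (customer id -> product ids).
orders = {
    "c1": {"running shoes", "sports socks", "water bottle"},
    "c2": {"running shoes", "sports socks"},
    "c3": {"running shoes", "cap"},
    "c4": {"sports socks", "cap"},
}
print(also_bought(orders, "running shoes"))  # ['sports socks', 'cap', 'water bottle']
```

Real recommenders also weigh in browsing behavior and customer attributes, but co-occurrence counting is the intuition behind this feature.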

Data science techniques, indeed, are powerful tools, and all ecommerce businesses should be using them. Let me show you how:

What Data Science Can Fix For Your Business

The problems that ecommerce businesses face are pretty typical – low conversion rates, high bounce rates, cart abandonment, lack of customer loyalty, etc. Their own analytics reports will show this. But those reports lack the deeper insight that data science can provide, insight that allows individual solutions to be developed and implemented.

Romexsoft has the team and the tools for in-depth analysis through data science – analysis that can drive what a business does to increase its revenue, user by user, customer by customer.

Case Study: Boosting Customer Loyalty and The Average Check With Big Data

Recently an online retailer contacted us with the following problem. He sells a large line of casual and sports clothing and shoes for people of all ages, both genders, and a range of style preferences.

What he was discovering was this: he could get a customer “in the door,” and often get a purchase. But most customers were not “coming back for more” and/or purchasing other products that would suit them.

What he wanted from Romexsoft was a full analysis of what he could do to change his customers’ behaviors and move them to purchase more.

Our process involved several steps, and in the end we were able to make recommendations that, when implemented, increased his sales almost immediately. Here was the process:

Analysis of the Site Structure Itself


After detailed research on the website, our team was able to make a few suggestions. Using basic analytics, we identified the least popular pages, the pages with the highest bounce rates, and the most and least popular products, based on the ratio of views to actual purchases.

For example, there were several shoe products that the retailer was considering discontinuing. While they had many views, the proportion of views that led to purchases was quite low. What our analytics showed was that the problem was not the product – the problem was the pricing.
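Spotting products like these comes down to comparing views with purchases. Here is a minimal Python sketch, assuming per-product view and purchase counts are already available; the thresholds, product names, and numbers are hypothetical:

```python
def flag_low_conversion(stats, min_views=500, max_rate=0.01):
    """Return products that are viewed often but rarely bought.

    `stats` maps product id -> (views, purchases). The default thresholds
    are illustrative, not values from the case study.
    """
    flagged = []
    for product, (views, purchases) in stats.items():
        if views >= min_views:               # ignore products with too little traffic
            rate = purchases / views         # view-to-purchase conversion
            if rate <= max_rate:
                flagged.append((product, round(rate, 4)))
    # Lowest conversion first: the most "viewed but not bought" items lead.
    return sorted(flagged, key=lambda item: item[1])

# Hypothetical counts for four shoe models: (views, purchases).
stats = {
    "shoe-A": (2400, 12),
    "shoe-B": (1800, 90),
    "shoe-C": (300, 1),
    "shoe-D": (950, 5),
}
print(flag_low_conversion(stats))  # [('shoe-A', 0.005), ('shoe-D', 0.0053)]
```

A ratio like this only flags candidates; it takes further analysis, as in the shoe example, to find out whether price or something else is the cause.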

Our developers restructured the site, revised the product groupings, and recommended the correct price points for the “low sale” products.

But the real work was just beginning. The job ahead of us was ultimately to analyze the behavior of each individual customer and determine how to change that behavior so it would translate into more purchases. This information would be valuable not only for existing customers but also for new visitors.

Generating The Test Data

To prepare for deep analysis, we first had to organize products by type (e.g., shirt, shoes), gender, age group, purpose (casual or sport), and brand/pricing, and compile a full history of the number of views of each product page along with the information provided on that page. In total, we generated more than 150,000 records of test data.
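The exact schema behind those records is not given here. As a rough illustration, the Python sketch below assumes hypothetical `Product` and `ViewEvent` records carrying the attributes listed above, and aggregates raw view events into a per-user, per-product click-through “rate”. All field names and values are illustrative.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Product:
    """Catalog attributes used to group products (field names are hypothetical)."""
    product_id: int
    brand: str
    category: str    # e.g. "shoes", "shirt"
    age_group: str   # "children" or "adult"
    gender: str      # "male" or "female"
    purpose: str     # "casual" or "sport"
    price: float

@dataclass(frozen=True)
class ViewEvent:
    """One visit by a user to a product page (hypothetical record)."""
    user_id: int
    product_id: int
    clicks: int      # click-throughs recorded during the visit

def click_rates(events):
    """Sum click-throughs per (user_id, product_id) pair."""
    rates = {}
    for e in events:
        key = (e.user_id, e.product_id)
        rates[key] = rates.get(key, 0) + e.clicks
    return rates

events = [ViewEvent(1, 42, 1), ViewEvent(1, 42, 2), ViewEvent(1, 45, 1)]
print(click_rates(events))  # {(1, 42): 3, (1, 45): 1}
```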

Statistical Analysis and Machine Learning

Using Java and Apache Spark, we applied item-to-item collaborative filtering, the recommendation approach popularized by Amazon. In practice, this meant the following:

  • Each product was described by its type, gender, age group, brand, and purpose.
  • We worked with three variables: the item code, the product code, and the “rate”, which we defined as the number of click-throughs to that product.
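The case study ran on Java and Apache Spark; the plain-Python sketch below only illustrates the core idea of item-to-item filtering at toy scale. It assumes each product is represented by the click-through rates users gave it, and it uses cosine similarity between those vectors, which is one common choice rather than a confirmed detail of the project. The sample rates are hypothetical.

```python
import math
from collections import defaultdict

def item_vectors(rates):
    """Turn {(user, product): rate} into {product: {user: rate}}."""
    vectors = defaultdict(dict)
    for (user, product), rate in rates.items():
        vectors[product][user] = rate
    return vectors

def cosine(a, b):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(a[u] * b[u] for u in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def similar_items(rates, product, top_n=5):
    """Products whose rate vectors are most similar to `product`'s."""
    vectors = item_vectors(rates)
    target = vectors[product]
    scores = [(other, cosine(target, vec))
              for other, vec in vectors.items() if other != product]
    scores.sort(key=lambda kv: (-kv[1], kv[0]))
    return scores[:top_n]

# Hypothetical (user, product) -> click-through rate.
rates = {(1, 42): 3, (1, 45): 2, (2, 42): 1, (2, 45): 1, (2, 717): 4, (3, 717): 2}
print(similar_items(rates, 42))  # product 45 ranks above 717
```

Products clicked by the same users in similar proportions score close to 1, which is what lets the system suggest “similar” items without any manual tagging.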

We were then able to generate data on actual customer taste. Here is a sampling of that data:

User id | Brand | Product id | Category | Age group | Gender | Sport or casual
--- | --- | --- | --- | --- | --- | ---
1 | Brand A | 42 | shoes | children | male | casual
1 | Brand A | 45 | shoes | children | male | casual
1 | Brand A | 48 | shoes | children | male | casual
1 | Brand B | 717 | jacket | children | male | sport
19761 | Brand H | 123 | shoes | children | female | casual
19761 | Brand B | 1186 | shorts | children | male | sport
19761 | Brand C | 1190 | shorts | children | male | sport
38335 | Brand H | 95 | shoes | adult | female | casual
38335 | Brand C | 1596 | cap | children | male | sport
38335 | Brand C | 1597 | cap | children | male | sport
39999 | Brand J | 41 | shoes | adult | male | casual
39999 | Brand E | 59 | shoes | children | male | casual
39999 | Brand E | 60 | shoes | children | male | casual
39999 | Brand E | 61 | shoes | children | male | casual
39999 | Brand E | 62 | shoes | children | male | casual
39999 | Brand E | 64 | shoes | children | male | casual

Establishing Predictions for Customer Rates Based Upon Actual Rates

Next, we wanted to generate data that would tell us the predicted rate (click-throughs) for customers who looked at more than one product, if they were shown similar products. This is a sampling of that data:

This first chart shows each customer, a product they viewed, and the actual product rate (the number of times the customer actually clicked through to it).

User id | Product id | Actual rate
--- | --- | ---
0004 | 0940 | 3
0005 | 1047 | 1
0007 | 1492 | 3
0010 | 0123 | 2
0011 | 0648 | 2
0012 | 0306 | 3
0014 | 0023 | 2
0017 | 0060 | 1
0019 | 0308 | 2
0020 | 0091 | 2
0021 | 0035 | 4
0025 | 0452 | 3

This next chart shows the same customers alongside the predicted product rate if they were shown similar items:

User id | Product id | Actual rate | Predicted rate
--- | --- | --- | ---
0004 | 0940 | 3 | 3.199
0005 | 1047 | 1 | 1.722
0007 | 1492 | 3 | 2.615
0010 | 0123 | 2 | 2.724
0011 | 0648 | 2 | 1.830
0012 | 0306 | 3 | 2.708
0014 | 0023 | 2 | 2.105
0017 | 0060 | 1 | 1.196
0019 | 0308 | 2 | 2.403
0020 | 0091 | 2 | 2.468
0021 | 0035 | 4 | 3.255
0025 | 0452 | 3 | 2.119
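How close is “close”? Two standard error measures, mean absolute error (MAE) and root-mean-square error (RMSE), can be computed directly from the twelve rows in the chart above. A short Python sketch:

```python
import math

# (actual rate, predicted rate) pairs copied from the chart above.
pairs = [
    (3, 3.199), (1, 1.722), (3, 2.615), (2, 2.724),
    (2, 1.830), (3, 2.708), (2, 2.105), (1, 1.196),
    (2, 2.403), (2, 2.468), (4, 3.255), (3, 2.119),
]

def mae(pairs):
    """Mean absolute error: the average size of a prediction miss."""
    return sum(abs(p - a) for a, p in pairs) / len(pairs)

def rmse(pairs):
    """Root-mean-square error: penalizes large misses more heavily."""
    return math.sqrt(sum((p - a) ** 2 for a, p in pairs) / len(pairs))

print(f"MAE  = {mae(pairs):.3f}")   # MAE  = 0.441
print(f"RMSE = {rmse(pairs):.3f}")  # RMSE = 0.509
```

On these rows the predictions miss by about 0.44 on average, with rates ranging from 1 to 4 in this sample. Twelve rows are far too few to judge a model; in practice such metrics are computed on a larger held-out set of data.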

The actual and predicted rates are close, and the predictions come from well-established predictive models. What this machine learning tells the business owner is that he should be showing individual customers similar products: products a customer may not even have heard of but which are likely to suit them best. And this is the value of using data science in retail – informing the retailer of the potential for customers to click through to other products when presented with them. And because the data puts customers into groups, customers with similar behavior and interests can be shown the same similar products.

Predictions of Product Presentations/Ratings Based Upon Customer Groups

Now that the retailer knows he will be presenting similar products to his customers, the next data science challenge is to determine the products to present. Again, machine learning takes over based upon customer groups and past product rates of those groups, and then generates a listing of the similar products to which customers should be exposed.

The following chart is an example of what this data report shows: for each customer, the six additional products that should be shown to them, along with their predicted ratings.

User id | Product 1 (rating) | Product 2 (rating) | Product 3 (rating) | Product 4 (rating) | Product 5 (rating) | Product 6 (rating)
--- | --- | --- | --- | --- | --- | ---
14 | 1027 (3.919) | 1316 (3.774) | 507 (3.745) | 861 (3.645) | 1154 (3.63) | 1686 (3.608)
11 | 1316 (3.042) | 1430 (2.958) | 890 (2.836) | 958 (2.809) | 1551 (2.807) | 1825 (2.804)
17 | 188 (4.517) | 890 (4.475) | 895 (4.372) | 177 (4.354) | 899 (4.284) | 209 (4.27)
4 | 1825 (4.276) | 497 (4.195) | 720 (4.137) | 786 (4.125) | 1796 (4.093) | 942 (4.01)
39 | 219 (3.794) | 709 (3.762) | 188 (3.762) | 1316 (3.728) | 890 (3.706) | 284 (3.698)
42 | 196 (3.168) | 891 (3.14) | 1238 (3.139) | 801 (3.072) | 371 (3.072) | 266 (3.059)
12 | 890 (4.72) | 507 (4.628) | 1554 (4.579) | 786 (4.552) | 1856 (4.519) | 127 (4.511)
33 | 1547 (4.249) | 1270 (4.176) | 801 (4.136) | 1649 (4.082) | 1152 (4.009) | 1480 (4.005)
7 | 482 (5.294) | 890 (5.129) | 1370 (5.055) | 1620 (5.01) | 149 (4.979) | 1647 (4.923)
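The article does not spell out how this listing is produced. One common item-based way to do it, sketched below in Python under that assumption, predicts a user's rate for an unseen product as the similarity-weighted average of the rates that user gave to similar products, then keeps the highest scores (six in the chart above). The product names, similarity values, and helper functions are hypothetical.

```python
def symmetric(pairs):
    """Store each similarity under both (a, b) and (b, a) for easy lookup."""
    sims = {}
    for (a, b), s in pairs.items():
        sims[(a, b)] = s
        sims[(b, a)] = s
    return sims

def predict(user_rates, sims, item):
    """Similarity-weighted average of the user's rates on products they already clicked."""
    num = den = 0.0
    for rated, rate in user_rates.items():
        s = sims.get((item, rated), 0.0)
        num += s * rate
        den += abs(s)
    return num / den if den else 0.0

def top_n(user_rates, sims, catalog, n=6):
    """Score every product the user has not rated yet and keep the n best."""
    scored = [(p, predict(user_rates, sims, p))
              for p in catalog if p not in user_rates]
    scored = [(p, s) for p, s in scored if s > 0]  # drop products with no similar history
    scored.sort(key=lambda ps: (-ps[1], ps[0]))
    return scored[:n]

# Hypothetical inputs: one user's click rates and precomputed item similarities.
user_rates = {"A": 4, "B": 2}
sims = symmetric({("A", "C"): 0.9, ("B", "C"): 0.1,
                  ("A", "D"): 0.2, ("B", "D"): 0.8})
print(top_n(user_rates, sims, catalog=["A", "B", "C", "D", "E"], n=2))
```

Product C is recommended first because it is most similar to A, the product this user clicked most; product E has no similarity data and is left out.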

Based on the existing data, we can also identify potential buyers for a certain group of products or a certain brand, even if they have not expressed any prior interest in that brand. Our model compares them with people who have similar shopping preferences and have previously purchased the brand in question. As a result, we can narrow down the segment of potential buyers likely to be interested in a certain group of products:

Product id | User 1 (rating) | User 2 (rating) | User 3 (rating) | User 4 (rating) | User 5 (rating) | User 6 (rating) | User 7
--- | --- | --- | --- | --- | --- | --- | ---
23 | 6444 (4.574) | 5032 (4.269) | 3161 (4.211) | 2534 (4.211) | 9964 (4.21) | 1430 (4.2) | 6645
648 | 6229 (4.727) | 4077 (4.564) | 4724 (4.399) | 4171 (4.28) | 9443 (4.229) | 1368 (4.185) | 2462
60 | 8784 (4.281) | 4019 (4.092) | 4165 (4.063) | 3912 (4.063) | 6893 (4.063) | 3935 (4.002) | 5063
940 | 2814 (4.955) | 9849 (4.893) | 6893 (4.832) | 3912 (4.832) | 4165 (4.832) | 1329 (4.821) | 4411
298 | 1605 (4.169) | 3149 (3.987) | 6133 (3.936) | 3227 (3.919) | 1767 (3.885) | 9366 (3.881) | 3125
374 | 7147 (4.496) | 4623 (3.973) | 2242 (3.903) | 2786 (3.82) | 5416 (3.781) | 7043 (3.732) | 861
306 | 557 (4.626) | 6105 (4.494) | 4003 (4.322) | 3689 (4.311) | 8077 (4.181) | 4567 (4.137) | 9104
1642 | 2209 (4.564) | 5941 (4.431) | 5846 (4.403) | 6772 (4.4) | 8862 (4.172) | 4991 (4.045) | 23
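Reading predictions from the product side amounts to inverting the per-user results: for each product, rank all users by predicted rating and keep the top few. A minimal Python sketch, assuming a dictionary of predicted ratings keyed by (user id, product id); the function name and sample values are hypothetical.

```python
from collections import defaultdict

def top_users_per_product(predictions, n=7):
    """Invert user-level predictions into a ranked list of likely buyers per product.

    predictions: {(user_id, product_id): predicted rating}
    """
    by_product = defaultdict(list)
    for (user, product), rating in predictions.items():
        by_product[product].append((user, rating))
    # Highest predicted rating first; ties broken by user id.
    return {
        product: sorted(users, key=lambda ur: (-ur[1], ur[0]))[:n]
        for product, users in by_product.items()
    }

# Hypothetical predicted ratings.
predictions = {(1, 23): 4.5, (2, 23): 3.1, (3, 23): 4.9, (1, 60): 2.0}
print(top_users_per_product(predictions, n=2))
```

The resulting user lists are the “potential buyer segment” for each product, which can then be targeted with the matching brand or product group.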

The concept is simple: if a customer's past purchases are similar to those of a group of other customers, that customer's future purchases can be predicted. By applying machine learning to real purchase data, the business owner can customize and personalize (and direct) each customer’s experience and journey on his site.

The Benefits of This Model

For our client, the benefits were obvious. He can increase the potential for purchases and, as a result, increase ecommerce sales by displaying a larger assortment of similar products to each customer – products the customer didn’t even realize were on the site and that best suit the customer’s needs.

Another advantage of this model is that sales can be forecast more accurately. The business owner can then manage his inventory better, which helps grow business profits. As outlined above, you can make more accurate predictions about the kinds of goods likely to be purchased. The predictions can be as specific as stating that your company will sell 100-120 pairs of Nike Air Max shoes in the next week with 90% probability.
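A statement like “100-120 units with 90% probability” is a prediction interval. The article does not say which forecasting model produced it; the Python sketch below assumes, purely for illustration, that weekly demand follows a Poisson distribution whose mean is the historical weekly average, and returns the central 90% interval. The sales history is hypothetical.

```python
import math

def poisson_cdf(k, lam):
    """P(X <= k) for X ~ Poisson(lam), summed term by term."""
    term = math.exp(-lam)        # P(X = 0)
    total = term
    for i in range(1, k + 1):
        term *= lam / i          # P(X = i) from P(X = i - 1)
        total += term
    return total

def sales_interval(weekly_sales, coverage=0.90):
    """Central `coverage` interval for next week's unit sales.

    Assumption: demand is Poisson with mean equal to the historical average.
    """
    lam = sum(weekly_sales) / len(weekly_sales)
    tail = (1 - coverage) / 2
    low = 0
    while poisson_cdf(low, lam) < tail:       # smallest k with P(X <= k) >= tail
        low += 1
    high = low
    while poisson_cdf(high, lam) < 1 - tail:  # smallest k with P(X <= k) >= 1 - tail
        high += 1
    return low, high

history = [104, 118, 97, 112, 109, 121, 106, 113]  # hypothetical weekly units
print(sales_interval(history))
```

The Poisson assumption is a simplification: real demand often varies more than a Poisson model allows, and a production forecast would also take inputs such as seasonality into account.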

What is more, our model can identify the factors that do (or do not) affect sales volumes. For instance, in most cases the frequency of visits to your website has no direct impact on sales: users may spend a lot of time browsing and comparing goods without committing to a purchase. Factors like age, seasonality, and past purchase history, on the other hand, have a significant impact on the probability of a purchase.
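A quick first check of which factors move together with purchases is a correlation screen. The Python sketch below computes the Pearson correlation between each candidate factor and a 0/1 purchase flag (the point-biserial correlation). It is not the model used in the case study, correlation alone does not establish cause and effect, and the toy numbers are made up purely to exercise the function.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Toy per-customer data, made up only to exercise the function.
bought = [0, 1, 0, 0, 1, 1]           # 1 = purchased this season
visits = [12, 3, 4, 8, 14, 7]         # site visits
past_purchases = [0, 2, 1, 0, 3, 2]   # earlier orders

for name, factor in [("visits", visits), ("past_purchases", past_purchases)]:
    print(name, round(pearson(factor, bought), 3))
```

With these toy numbers, visit counts show no correlation with purchasing (0.0), while past purchases show a strong one (about 0.9); on real data, a screen like this would only shortlist factors for a proper model.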

So What are Your Problems?

You may have the insight to know that you are not growing as you should. Knowing why is another matter. And that is where business analytics comes in. It is a complex matter, but data science case studies continue to show that big data and machine learning can provide the answers.

Romexsoft is ready to build a model for you, based upon your unique circumstances. Let’s discuss your problem today.



Hey! I’m Taras, Data Scientist at Romexsoft. Want to know more about big data, machine learning and other cool stuff? Then follow my posts on the Romexsoft blog.