Product recommender algorithms have (thankfully) moved past the Machine Learning/AI Hype curve in the past few years. There was a time when having a recommendation engine on your retail website was considered novel.

Retailers and their data science teams spent a lot of time understanding the maths behind popular product recommendation algorithms, and then implementing one for their website.

Whether it is simple collaborative filtering or advanced latent factor models, there are tons of articles, blogs and even book chapters explaining how product recommendation algorithms work.

The machine learning community has also seen fundamental and exciting research in recommendation algorithms. Popular open source big data/machine learning libraries like Apache Spark (MLlib) make it easy to build a recommendation system for your own site.

However, this deluge of information and the availability of API black boxes can be misleading. We have often seen small and medium scale retailers make the mistake of rolling their own in-house product recommender with a popular open source library, only to find that it does little to improve results. We believe it is time to take a hard look at why this happens.

There are a multitude of issues which can result in poor performance. Let’s look at a few here:

Data Collection

Available algorithms assume that the collection of user-product interaction data, along with product metadata attributes, is already instrumented. But as with any other data science project, data instrumentation and cleaning are often the part that consumes 80% of the time and resources.

Black Boxes Tuned for Different Use Case

The open source library that you use for building a recommender might be optimizing for an entirely different goal. For example, model assumptions built around a DVD rental use case may not match those of high-priced jewellery retailing.

Implicit Feedback Doesn’t Mean What You Think It Means

Early algorithms required explicit ratings given by a user to an item/product. On a retail website, there are generally no explicit ratings. Even implicit feedback events like adding to cart or purchasing are rare. Tuning implicit feedback recommendation models takes a lot of work.
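To make the tuning burden concrete, here is a minimal sketch of one common way to turn raw events into confidence scores for an implicit feedback model, using the form c = 1 + alpha * r. The event weights, the `alpha` value and the function name `build_confidence` are all assumptions made up for illustration; every one of them is a knob that would need tuning against your own data.

```python
from collections import defaultdict

# Hypothetical event weights, assumed purely for illustration.
EVENT_WEIGHTS = {"view": 1.0, "add_to_cart": 3.0, "purchase": 5.0}


def build_confidence(events, alpha=1.0):
    """Aggregate (user, product, event_type) tuples into confidence scores.

    r is the summed event weight for a (user, product) pair, and the
    confidence is c = 1 + alpha * r. Unknown event types contribute 0.
    """
    strength = defaultdict(float)
    for user, product, event_type in events:
        strength[(user, product)] += EVENT_WEIGHTS.get(event_type, 0.0)
    return {pair: 1.0 + alpha * r for pair, r in strength.items()}


events = [
    ("u1", "ring", "view"),
    ("u1", "ring", "view"),
    ("u1", "ring", "add_to_cart"),
    ("u2", "ring", "purchase"),
]
confidence = build_confidence(events, alpha=1.0)
```

Note that under these assumed weights, two views plus an add-to-cart end up with the same confidence as a single purchase. Whether that is the right trade-off depends entirely on the retailer, which is exactly why off-the-shelf defaults can mislead.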

Skewness of Data

Purchase transaction events may be too few for a small or mid scale retail website. The volume of top-of-the-funnel events, like viewing a product, differs significantly from category to category. In the electronics category, users may spend a lot of time carefully comparing the features of one model with another. But in categories like books or apparel, impulse purchases are not rare.
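A quick way to check for this kind of skew is to compare purchases against views per category before training anything. The sketch below assumes events are available as (category, event_type) pairs; the function name `conversion_by_category` and the sample numbers are invented for illustration only.

```python
from collections import Counter


def conversion_by_category(events):
    """Return purchases divided by views for each category that has views."""
    views, purchases = Counter(), Counter()
    for category, event_type in events:
        if event_type == "view":
            views[category] += 1
        elif event_type == "purchase":
            purchases[category] += 1
    return {c: purchases[c] / views[c] for c in views}


# Made-up sample data: many views per purchase in electronics, few in books.
sample = (
    [("electronics", "view")] * 8
    + [("electronics", "purchase")] * 1
    + [("books", "view")] * 2
    + [("books", "purchase")] * 1
)
rates = conversion_by_category(sample)
```

If the rates differ widely across categories, a single global weight for "view" events is unlikely to mean the same thing everywhere.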

Performance Measurements

A retailer might want to optimize for a different metric (say, net revenue from a single customer over time) than the conventional metrics used with recommender systems, such as precision and recall.
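The gap between the two kinds of metric is easy to see in a small sketch. Below, two recommendation lists score identically on precision and recall at k, yet differ greatly in the revenue they capture. The helper names, the product names and the prices are all hypothetical, and `revenue_at_k` is a deliberately simplified stand-in for a real business metric such as net revenue per customer over time.

```python
def precision_recall_at_k(recommended, purchased, k):
    """Precision and recall of the top-k recommendations against purchases."""
    hits = sum(1 for item in recommended[:k] if item in purchased)
    precision = hits / k if k else 0.0
    recall = hits / len(purchased) if purchased else 0.0
    return precision, recall


def revenue_at_k(recommended, purchased, prices, k):
    """Sum the prices of top-k recommended items that were actually purchased."""
    return sum(prices.get(item, 0.0) for item in recommended[:k] if item in purchased)


# Made-up data for illustration.
purchased = {"ring", "scarf"}
prices = {"ring": 900.0, "scarf": 20.0, "socks": 5.0}
list_a = ["ring", "socks"]
list_b = ["scarf", "socks"]

pr_a = precision_recall_at_k(list_a, purchased, k=2)
pr_b = precision_recall_at_k(list_b, purchased, k=2)
rev_a = revenue_at_k(list_a, purchased, prices, k=2)
rev_b = revenue_at_k(list_b, purchased, prices, k=2)
```

Both lists get one hit out of two, so a model tuned only for precision and recall has no reason to prefer the list that surfaces the high-value item.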

We have spent a considerable amount of time thinking hard about these problems. We would love to chat with you if you want to know more about our approach to managing product recommendation algorithms.


