Assortment optimization as data science: a training in retail

The client successfully operates a retail chain. In order to expand their analytical capabilities, a data science team was founded and a Hadoop-Spark platform was created. niologic trained the new team in retail analytics and the use of distributed systems such as Apache Spark.


The retailer wanted to use retail analytics to strengthen its market position and push new customer offers. The newly established data science team was to provide the analyses for this. A new technical platform based on Apache Spark was set up in parallel.


At the beginning of the project, niologic clarified the existing data structure, the data volumes (billions of transactions) and the existing IT infrastructure. Use cases for data science and retail analytics were prioritized together with the customer.

In workshops lasting several days, niologic and the new data scientists then worked through retail-specific questions on assortment optimization and compound purchases (products bought together).
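Questions about which products are bought together can be approached, for example, with a lift measure over shopping baskets. The following is a minimal sketch in plain Python, not the code used in the workshops: the function name `pair_lift` and the sample baskets are hypothetical, and at the data volumes mentioned above the same counting would have to run on Spark rather than in memory.

```python
from collections import Counter
from itertools import combinations


def pair_lift(baskets):
    """Return {(a, b): lift} for every product pair bought together at least once.

    lift = P(a and b) / (P(a) * P(b)); values above 1 mean the two products
    appear together more often than independent purchases would suggest.
    """
    n = len(baskets)
    item_counts = Counter()
    pair_counts = Counter()
    for basket in baskets:
        items = sorted(set(basket))  # de-duplicate and fix the pair ordering
        item_counts.update(items)
        pair_counts.update(combinations(items, 2))
    return {
        (a, b): (count / n) / ((item_counts[a] / n) * (item_counts[b] / n))
        for (a, b), count in pair_counts.items()
    }


# Hypothetical toy data: each inner list is one transaction.
baskets = [
    ["bread", "butter"],
    ["bread", "butter", "jam"],
    ["bread", "milk"],
    ["milk"],
]
lift = pair_lift(baskets)
```

In this toy data, bread and butter have a lift above 1 (bought together more often than chance), while bread and milk have a lift below 1.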

The general functionality of Hadoop and Spark was explained in a practical way. Algorithms were developed together using SparkR or SparkSQL, and the differences between Hive, Tez and Spark were pointed out. YARN queues were also tested under load and optimized together.

Using SparkML, a ProductRank was created that quantified network strength within the assortment through a network analysis, making individual purchasing patterns and product clusters identifiable.
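The case study does not say how the ProductRank was computed. One plausible reading, given the name and the network analysis, is a PageRank-style score on a graph whose edges are co-purchase counts. The sketch below illustrates that idea in plain Python under exactly this assumption; the function `product_rank`, its parameters and the sample edges are hypothetical and do not reproduce the SparkML implementation.

```python
def product_rank(edges, damping=0.85, iterations=50):
    """PageRank-style score on an undirected, weighted co-purchase graph.

    edges: {(product_a, product_b): positive co-purchase weight}
    Returns {product: score}; the scores sum to 1.
    """
    neighbours = {}
    for (a, b), weight in edges.items():
        neighbours.setdefault(a, {})[b] = weight
        neighbours.setdefault(b, {})[a] = weight
    nodes = sorted(neighbours)
    n = len(nodes)
    # Total edge weight per product, used to normalise outgoing score.
    strength = {p: sum(neighbours[p].values()) for p in nodes}
    rank = {p: 1.0 / n for p in nodes}
    for _ in range(iterations):
        new_rank = {}
        for p in nodes:
            # Each neighbour passes on its score in proportion to edge weight.
            incoming = sum(
                rank[q] * w / strength[q] for q, w in neighbours[p].items()
            )
            new_rank[p] = (1 - damping) / n + damping * incoming
        rank = new_rank
    return rank


# Hypothetical co-purchase counts between product pairs.
edges = {
    ("bread", "butter"): 2,
    ("bread", "jam"): 1,
    ("butter", "jam"): 1,
    ("bread", "milk"): 1,
}
rank = product_rank(edges)
top_product = max(rank, key=rank.get)
```

Here `top_product` is the product with the strongest connections in the toy graph. Clusters would need an additional step, such as community detection on the same graph.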

As part of the assortment optimization, an analysis of cannibalization effects was also implemented using R and Spark, so that similar and cannibalizing products became visible.
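The method behind the cannibalization analysis is not described here. A simple way to surface candidates, shown purely as an illustration, is to look for products in the same category whose sales over time are strongly negatively correlated. All names, the threshold and the sample figures below are hypothetical, and the sketch uses plain Python instead of R and Spark.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equally long series (0.0 if one is constant)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sd_x * sd_y) if sd_x and sd_y else 0.0


def cannibalization_candidates(weekly_sales, category, threshold=-0.5):
    """Return (a, b, r) for same-category pairs with correlation r <= threshold."""
    products = sorted(weekly_sales)
    candidates = []
    for i, a in enumerate(products):
        for b in products[i + 1:]:
            if category[a] != category[b]:
                continue  # only similar products can cannibalize each other
            r = pearson(weekly_sales[a], weekly_sales[b])
            if r <= threshold:
                candidates.append((a, b, r))
    return candidates


# Hypothetical weekly unit sales; cola_b is introduced in week 3.
weekly_sales = {
    "cola_a": [100, 98, 60, 62, 61],
    "cola_b": [0, 0, 40, 38, 41],
    "chips": [50, 52, 49, 51, 50],
}
category = {"cola_a": "soft drinks", "cola_b": "soft drinks", "chips": "snacks"}
candidates = cannibalization_candidates(weekly_sales, category)
```

A negative correlation only flags a pair for closer inspection. Seasonality, promotions or stock-outs can produce the same pattern, so a real analysis would need to control for them.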

Result and customer value

Through several workshops, the customer’s employees were trained in tackling retail questions and were quickly able to develop their own analyses further. The technical platform was further optimized in a first production test, and the team gained practice in operating it.

niologic successfully supported the customer as a mentor for data science and in the initial development of the new department.