Machine learning in retail: what you really need to consider for effective forecasting

Everyone wants to predict sales. But few people think about what the model actually sees when it looks at the data. Machine learning has become a must-have for retail: demand models and personalized promotions all sound great.

But despite the high expectations, most models either fail to deliver tangible results or only work in a limited environment. This is confirmed by a systematic review entitled Why Big Data Projects Fail, according to which more than 80% of big data and ML initiatives fail. And it is not about weak algorithms or incompetent teams: very often, models are simply built on data that ignores the real context.

In this article, we will examine why even the most advanced time series models can be wrong about basic things. And what needs to be taken into account so that forecasts really help in making decisions instead of imitating them.

Data illusion: why numbers are not everything

Let’s imagine a classic situation: an analyst receives a table with historical sales data. The data is daily, for each SKU, with the number of units. Formally, this is an ideal time series for forecasting: everything looks clean, logical, and structured.

At this stage, model development usually begins. And it doesn’t really matter what time series architecture is used for this, because what matters is something else entirely: what the model actually sees.

And it usually sees only one side of events — the result. Sales.

At the same time, it does not know that:

  • in some stores, the product was physically absent from the shelves on those days;
  • some SKUs ran a temporary “2 for the price of 1” promotion that is not recorded in the dataset;
  • competing products were missing, which significantly changed buyer behavior;
  • the product was moved closer to the checkout area or received banner support on the main page of the website;
  • the SKU code or packaging changed, but the data shows it as the same product.

From an algorithmic point of view, it’s all just numbers: there was a peak, there was a drop. Most ML models in retail are built around sales, which makes sense: sales are the end result of the interaction between a product and a buyer. But this is where the main mistake lies: sales are a consequence, not a cause. Sales fluctuations could be caused by circumstances that are in no way related to the actual demand for the product.

The model begins to predict behavior that will never be repeated in the real world. This is because it is based on exceptions rather than patterns.

What is important to consider for a quality forecast

1. Stock availability

If the product was not on the shelf, there were, logically, no sales. But this does not mean there was no demand; it simply means the buyer had no opportunity to make a purchase. And if the model “learns” from this period, it misinterprets the lack of sales as a lack of interest.

Therefore, the model must have the complete history of stock balances. Otherwise, the forecast will be distorted.
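One minimal way to act on this is to exclude out-of-stock days from the training set, so zero-sales rows caused by shelf gaps never reach the model. The row layout and field names below (`stock_on_hand`, `units_sold`) are illustrative assumptions, not a specific system's schema:

```python
# Sketch: drop out-of-stock days from training data, assuming daily
# rows of (date, sku, units_sold, stock_on_hand). Field names are
# hypothetical.

def training_rows(rows):
    """Keep only days when the product could actually be bought.

    A zero-sales day with zero stock is censored demand, not a signal
    of no interest; dropping it prevents the model from learning
    'demand = 0' from shelf gaps.
    """
    return [r for r in rows if r["stock_on_hand"] > 0]

rows = [
    {"date": "2024-03-01", "sku": "A1", "units_sold": 12, "stock_on_hand": 40},
    {"date": "2024-03-02", "sku": "A1", "units_sold": 0,  "stock_on_hand": 0},   # shelf gap
    {"date": "2024-03-03", "sku": "A1", "units_sold": 9,  "stock_on_hand": 35},
]

clean = training_rows(rows)
print(len(clean))  # 2 -- the out-of-stock day is excluded
```

Filtering is the simplest option; more elaborate approaches treat those days as censored observations instead of discarding them.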

2. Promotional activities and discounts

Promotions, sales, coupons, 2+1 offers, push notifications, banners, new packaging—all of these have a significant impact on consumer behavior. However, this data is often missing or stored separately from the main dataset.

Without information about promotional activities, the model will treat anomalies (sharp increases or decreases) as normal trends, and predict their recurrence where it will not happen.
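The usual fix is to join a promo calendar onto the sales history as an explicit feature, so a spike is attributed to the promotion rather than becoming a new baseline. The calendar format below is a hypothetical sketch:

```python
# Sketch: flag promo days so spikes are explained by the promotion,
# not mistaken for organic growth. The (sku, date) calendar is an
# illustrative assumption.

promo_calendar = {("A1", "2024-03-02")}  # days with a "2 for 1" offer

def add_promo_flag(rows, calendar):
    """Attach a binary on_promo feature to each daily sales row."""
    return [
        {**r, "on_promo": int((r["sku"], r["date"]) in calendar)}
        for r in rows
    ]

rows = [
    {"sku": "A1", "date": "2024-03-01", "units_sold": 10},
    {"sku": "A1", "date": "2024-03-02", "units_sold": 55},  # promo spike
]
flagged = add_promo_flag(rows, promo_calendar)
```

In practice the flag is often split further (discount depth, promo mechanics, media support), but even a single binary feature lets the model separate promo demand from the baseline.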

3. Category context and competitor behavior

If you are analyzing a specific product, this is not enough. Buyers do not make choices in isolation. They see several alternatives on the shelf (or on the catalog page) and respond to the entire context.

Therefore, it is necessary to take into account:

  • what other products were in the category during this period;
  • which ones were available;
  • whether there were promotions or shortages.

For example, if a competitor disappears from the shelf, this can give the product a short-term boost. But a model that doesn’t know this will attribute the boost to the product’s own popularity.
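One way to give the model this context is a daily feature counting how many alternatives were actually available in the category. The availability table below is an illustrative assumption:

```python
# Sketch: count available competing SKUs per day, assuming a table of
# which SKUs were on the shelf each day. A drop in this count can
# explain an otherwise "mysterious" sales boost.

availability = {
    "2024-03-01": {"A1", "B2", "C3"},
    "2024-03-02": {"A1"},            # both competitors out of stock
}

def competitor_count(day, sku, table):
    """Number of available alternatives to `sku` on a given day."""
    return len(table.get(day, set()) - {sku})

print(competitor_count("2024-03-01", "A1", availability))  # 2
print(competitor_count("2024-03-02", "A1", availability))  # 0
```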

4. Geography, channels, and logistics

The same product can be sold differently depending on the city, type of store, distribution model, shelf depth, and delivery. All of this is important.

For example:

  • In Kyiv, goods are delivered daily, and in the regions, once a week.
  • In one store, the goods are at eye level, in another, they are at the bottom of the shelf.
  • One network has a push campaign, while the other does not.
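Differences like these can be fed to a single national model by enriching each sales row with store-level attributes. The profile fields below (`delivery_days_per_week`, `shelf`) are hypothetical examples of such context:

```python
# Sketch: attach store-level context (delivery cadence, shelf
# position) to each sales row so one model can tell locations apart.
# Attribute names are illustrative assumptions.

store_profile = {
    "KYIV-01": {"delivery_days_per_week": 7, "shelf": "eye_level"},
    "REG-17":  {"delivery_days_per_week": 1, "shelf": "bottom"},
}

def with_store_context(row, profiles):
    """Merge a store's static attributes into a daily sales row."""
    return {**row, **profiles[row["store"]]}

row = {"store": "REG-17", "sku": "A1", "units_sold": 3}
enriched = with_store_context(row, store_profile)
```

Without such features, the model averages over very different selling conditions and fits none of them well.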

5. Physical changes to goods (SKU attributes)

Many models ignore the fact that an SKU is only a technical code. The product on the shelf can change without the code changing.

What can influence demand:

  • new packaging design;
  • change in size or volume;
  • updated photo or name on the website;
  • marking as “new” or “top seller.”
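A simple way to surface such changes is to keep attribute snapshots per SKU and flag the day any attribute differs from the previous snapshot, so the model can treat it as a quasi-relaunch. The snapshot format is a hypothetical sketch:

```python
# Sketch: flag the day an SKU's attributes change (packaging, volume,
# name) even though the code stays the same, assuming a chronological
# list of attribute snapshots per SKU.

def relaunch_flags(history):
    """Return one flag per snapshot: 1 where any attribute changed
    versus the previous snapshot, else 0."""
    flags = [0]  # the first snapshot has nothing to compare against
    for prev, curr in zip(history, history[1:]):
        flags.append(int(prev != curr))
    return flags

history = [
    {"volume_ml": 500, "pack": "v1"},
    {"volume_ml": 500, "pack": "v1"},
    {"volume_ml": 450, "pack": "v2"},  # downsized and redesigned
]
print(relaunch_flags(history))  # [0, 0, 1]
```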

How to tell if your ML model is sick: typical symptoms

Even without a deep audit, you can tell that the model is behaving suspiciously. It doesn’t necessarily mean it’s broken, but its predictions don’t match reality or are illogical.

Here are some signs to watch out for:

Sudden spikes or drops for no apparent reason
The model predicts a surge that has never happened before, or a sharp drop that cannot be explained. Most likely, it is learning from random coincidences.

Overly smoothed trends
The model plays it safe and predicts something close to the average everywhere. This is a sign that it does not see cause-and-effect relationships and is working too cautiously.

Low explainability
If you cannot answer the question, “Why did the model give this particular forecast?”, that is a problem. Business decisions must be based on logic.

High error rate for new SKUs
If the forecast for familiar products is more or less accurate but consistently wrong for new items, the model is unable to generalize.

Inconsistent response to obvious events
For example, a large campaign launches and the forecast does not change. Or, conversely, the model reacts to every little thing as if it were a breakthrough or a disaster.
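Some of these symptoms can be checked with a quick diagnostic. For the new-SKU symptom, compare the mean absolute error on new items against established ones; a wide gap suggests the model cannot generalize. The numbers below are illustrative, not real results:

```python
# Sketch: health check comparing forecast error on new SKUs vs
# established ones. (actual, forecast) pairs are made-up sample data.

def mae(pairs):
    """Mean absolute error over (actual, forecast) pairs."""
    return sum(abs(actual - forecast) for actual, forecast in pairs) / len(pairs)

established = [(100, 95), (80, 84), (60, 58)]  # small, steady errors
new_skus    = [(40, 12), (25, 70), (30, 5)]    # large, erratic errors

gap = mae(new_skus) / mae(established)
# A gap of several times is the "can't generalize" symptom above.
```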

Empty shelves: the main enemy of machine learning

Let’s look in more detail at one of the most common distortions in forecasting models: the model interprets the absence of sales as the absence of demand. This is logical from a numerical point of view, but completely wrong from a real business perspective. Zero sales is not a verdict on a product; it is often a sign that the product was not physically available to the buyer.

The problem is that ML models that do not take inventory into account cannot see this. To make the right decisions, a business needs to see two vectors simultaneously: what is being purchased (demand) and what is physically available (inventory).

Combining these data gives:

  • Reduction in out-of-stock incidents: when a product is unavailable but there was demand for it;
  • Production optimization: to avoid overproduction and tying up money in inventory;
  • Accurate forecasts, because the model does not confuse the absence of a product with a lack of interest in it;
  • Reduction of disposal costs, surpluses, losses from unsold promotions.
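One simple way to combine the two vectors is to treat sales on out-of-stock days as censored and fill them with a trailing average of in-stock days, so the model sees estimated demand rather than a false zero. This is an illustrative heuristic, not a description of any specific production method:

```python
# Sketch: reconstruct censored demand on zero-stock days using a
# trailing average of recent in-stock observations. Inputs are
# aligned daily series of sales and stock; data is made up.

def uncensor(sales, stock, window=3):
    """Replace sales on zero-stock days with the mean of the last
    `window` in-stock observations (or leave 0 if none yet)."""
    result, recent = [], []
    for sold, on_hand in zip(sales, stock):
        if on_hand > 0:
            result.append(sold)
            recent = (recent + [sold])[-window:]
        else:
            result.append(sum(recent) / len(recent) if recent else 0)
    return result

sales = [10, 12, 0, 11]   # third day: shelf gap, not zero demand
stock = [50, 40, 0, 30]
print(uncensor(sales, stock))  # [10, 12, 11.0, 11]
```

More rigorous alternatives model the censoring explicitly, but even this heuristic stops the model from learning a demand dip where there was only a supply gap.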

While developing a production forecasting model for Novi Produkty, our team ran into exactly this situation: the model predicted a decline in demand even though the product was simply absent from the shelves at some locations. In such cases, analytics only creates the illusion of control instead of genuinely supporting decisions. This became the critical point in setting up the model: we connected complete SKU inventory data so the model could clearly distinguish real fluctuations in demand from situations where there was simply nothing to sell. This improved forecast accuracy, helped avoid overproduction, reduced losses, and enabled informed decisions based on real data.

A few words in conclusion

Weak models produce weak predictions. But even worse is when a strong model works with incomplete data and creates confidence where there can be none.

Machine learning in retail only works when the model sees the whole picture. No architecture can compensate for a lack of understanding of what lies behind the numbers. A high-quality forecast is based not only on the data itself, but also on the context from which that data originated.

This is how we approach our work at IWIS: our models are effective not in theory, but in real-world conditions. If you want to find out how this can work for you, we are just a message away.
