This is a continuation of my multipart blog series on how retailers can adopt AI in their everyday operations. Advances in AI techniques, along with cheap compute hosted in the cloud, have made it possible to churn through terabytes of data at a very low price. The result is more powerful machine learning models capable of making predictions more accurately.
To keep things simple, we will only talk about sales forecasting in this blog. However, it must be noted that given enough data, almost any kind of forecasting can be modeled. Sales forecasting is done by every company – of course you are interested in knowing how much money you (and your investors) are making. At a very high level, the process of forecasting (with or without machine learning) starts by collating all the historical data you have, dividing it into two parts – one for building the model and another for testing it – and then creating a statistical model that makes the most sense of it. You then run your model against the test data to confirm that it works.
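The split described above can be sketched in a few lines of Python. The monthly sales figures here are made up purely for illustration – in practice this would be your own historical data:

```python
# Hypothetical monthly sales figures standing in for real history.
sales = [120, 135, 128, 150, 162, 158, 171, 180, 175, 190, 205, 198]

# Keep the first 70% for building the model and hold out the last 30%
# for testing. Sales history is time-ordered, so we split chronologically
# rather than shuffling.
split = int(len(sales) * 0.7)
train, test = sales[:split], sales[split:]
```

The key detail is that the test portion comes from the end of the timeline, so the model is always judged on data it has never seen.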
We begin by having conversations – we understand how your company works, your processes and what kind of data you have. We perform a basic EDA to determine which datasets (historical sales data, cancellation data, customer data, etc.) can be used to take things forward, and augment them with third-party data (demographic data, weather data, etc.) if required. Once the data is decided, we start the actual process.
The process starts by removing outliers and invalid records, filling in missing values and finally normalizing the data. This prepares the data for the further EDA and experimentation required to choose the correct model.
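Here is a minimal sketch of that cleaning step in plain Python, using invented numbers; a real pipeline would use statistical outlier rules (such as z-scores) and a library like pandas, but the three stages are the same:

```python
# Hypothetical daily sales with one missing value and one obvious outlier.
raw = [120.0, 135.0, None, 150.0, 5000.0, 158.0, 171.0]

# 1. Remove invalid/outlier values (here: a crude fixed threshold;
#    real pipelines use statistical rules instead).
cleaned = [x for x in raw if x is None or x < 1000]

# 2. Fill missing values with the mean of the known values.
known = [x for x in cleaned if x is not None]
mean = sum(known) / len(known)
filled = [mean if x is None else x for x in cleaned]

# 3. Min-max normalize everything into the [0, 1] range.
lo, hi = min(filled), max(filled)
normalized = [(x - lo) / (hi - lo) for x in filled]
```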
The characteristics of the data are identified by asking questions such as: Is there an overall upward or downward trend? Are there seasonal or repeating patterns? How far into the future do we need to forecast? The answers help us understand the characteristics of the data and select an appropriate model for it.
Chances are very high that a time series forecasting model will be chosen for the exercise, and which one depends on the findings from the step above. For example, we might choose a method like CNN-LSTM (ignore the technical term) if we find that the data has seasonal patterns. Practically speaking, 4 to 6 different models are chosen (sometimes even more) to compensate for each other's weaknesses, and the final forecast is calculated by combining their individual outputs based on how well each one scores.
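Combining several models can be as simple as a weighted average, where each model's forecast is weighted by how accurate that model has been. The model names, forecasts and error figures below are purely illustrative:

```python
# Hypothetical next-quarter forecasts from three different models.
forecasts = {"model_a": 210.0, "model_b": 198.0, "model_c": 225.0}

# Hypothetical past errors for each model (lower is better).
# Weight each model by the inverse of its error, so the historically
# more accurate models contribute more to the final number.
errors = {"model_a": 4.0, "model_b": 2.0, "model_c": 8.0}
weights = {name: 1.0 / err for name, err in errors.items()}
total = sum(weights.values())

combined = sum(forecasts[name] * weights[name] / total for name in forecasts)
```

The combined forecast always lands between the most pessimistic and most optimistic individual model, pulled toward whichever has been most reliable.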
The models also need to perform well in terms of both compute and accuracy on your data. So it is very important to look inside a model to understand why it predicts what it does. And finally, the selection of models also depends on whether we are doing short-term forecasting or trying to forecast the next few months.
The models are run on the data (or a subset of it) multiple times, and each time the "settings" are changed a little. All the outputs are then measured and compared against each other, and finally one set of "settings" is chosen and used. Multiple metrics, or combinations of them, are considered to determine performance. The most common ones are Mean Absolute Error (MAE), Root Mean Square Error (RMSE) and Mean Absolute Percentage Error (MAPE) (again, ignore the technical terms).
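For the curious, those three metrics take only a few lines to compute. The actual and predicted values here are made up; in practice they would be the test data and the model's output:

```python
import math

actual    = [100.0, 120.0, 130.0, 110.0]  # hypothetical observed sales
predicted = [ 98.0, 125.0, 128.0, 115.0]  # hypothetical model output

n = len(actual)
# MAE: average size of the error, in the same units as the sales figures.
mae  = sum(abs(a - p) for a, p in zip(actual, predicted)) / n
# RMSE: like MAE, but punishes large errors more heavily.
rmse = math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / n)
# MAPE: average error as a percentage of the actual value.
mape = 100 * sum(abs(a - p) / a for a, p in zip(actual, predicted)) / n
```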
Finally, we host the model in a cloud service of your choice (AWS, Microsoft Azure, etc.) or in your on-premise environment and start the training. Once training is complete, the model can be used for the actual forecasting.
The model and its effectiveness are finally tested against the test data set, which is a held-out part of the original data. For example, if the original dataset is split into two parts – 70% and 30% – the 70% is used for creating and training the model and the 30% is used to determine how accurately the model can predict the outcomes. Once the model gives satisfactory results and we are happy with its accuracy, the "actual" future is forecast, which you can incorporate into your decision-making processes.
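Putting the split and the accuracy check together, here is a toy end-to-end sketch. The "model" is a deliberately naive baseline (predicting the training average), and all the numbers are invented, but the shape of the evaluation is the same for a real model:

```python
# Hypothetical monthly sales history.
sales = [120, 135, 128, 150, 162, 158, 171, 180, 175, 190]

# 70/30 chronological split.
split = int(len(sales) * 0.7)
train, test = sales[:split], sales[split:]

# A deliberately naive "model": predict every future point as the
# mean of the training data. A real model would do far better.
prediction = sum(train) / len(train)

# Measure how far off the model is on the held-out 30%.
mae = sum(abs(actual - prediction) for actual in test) / len(test)
```

If the error on the held-out portion is acceptable, the same model is then trusted to forecast the real, unseen future.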
With this I would like to conclude this blog. Please reach out to me in case you have any queries. Would love to help 🙂