In this article, we will use statistic method to implement a price optimization using regression algorithm called Ordinary Least Square which is available on the statsmodel library. You can use any other machine learning regression algorithm like XGboost but I have chosen OLS as it provides substantial information relevant to price optimization like the R-squared values, P-values, F-statistic and especially the elasticity value. It is recommend to have a good understanding on those variables before moving forward into this article. There is a good explanation on those in this blog.

Apart from those statistical variables, you need to understand as well the principle of Price Elasticity. What are the different types of price elasticity and what are the conditions to be met to move forward to price estimation? You will find all types of price elasticity approaches here. We will use Level-Level price elasticity approach in this article.

For the dataset, I was wandering around to try to find a non-commercial price-quantity dataset and I couldn’t find better but only the online retail sales from UCI Machine Learning Repository. It is mostly used for time-series forecasting but we can use it as price optimization. The dataset contains online retail sales for various products from different countries around Europe. We will take only one product, WHITE HANGING HEART T-LIGHT HOLDER, on which we are having the highest number of records. We will also limit our simulation to UK.

Assuming you have already good knowledge on data handling with Pandas, I will jump directly to the lines which are relevant to our purpose.

First step is Exploratory Data analysis. After loading and filtering dataset, we need to aggregate data by Invoice date and take the mean values for Quantity and Price. We get below description of the dataset.

Fig 1 : Description of the dataset

Next, let’s plot the relation between Price & Quantity to visualize the density of the data. We can use seaborn.pairplot which will show below plot.

Fig 2 : PairPlot figure Quantity – Price

We can observe from both figures above that price’s data are mostly sitting between 2.8 and 3. Similarly, Quantity is mostly ranging from 5 to 18. We will take those values as baseline to tune up the dataset for an accurate model.

Fig 3 : Linear Plot Price – Quantity

As you can see, we are observing data sparsity here but as I said before, I couldn’t find a better dataset but at least you can grasp the general knowledge on price optimization and its principle.

Then, after running the OLS algorithm on the dataset, we get below result.

Fig 4 : OLS regression result

The most important values to look at this result are :

  • the R-squarred which shows the fitness of the model to the dataset. Usually, it should be as high as 90% at least but the best we could get here was just 59%.
  • the Price coef whose absolute value must be greater than 1 to comply to elasticity condition.

Now that we have the model, we can estimate the price for an optimal profit. We can plot the Quantity and Profit from a range of price of 1.8 to 2.8 and we can identify which price is giving the peak profit and what is the expected quantity.

Fig 5 : Profit – Quantity plot

The final result is shown below :

fig 6 : Final result

The full jupyter notebook is available on github.