Supply Chain Management with Fabric: Optimal Order Quantity

In a previous post, I applied PyTorch parallel computing to the newsvendor problem. A key takeaway was that, with thousands of SKUs and hundreds of thousands of order lines to analyze, distributed high-speed computing quickly comes into play.

For an explanation of what the newsvendor problem is, please see the beginning of the post Newsvendor problem. Briefly, if you assume normally distributed demand for a specific SKU, and you have determined the stock-out cost and holding cost per unit of that SKU, you can estimate the optimal order size for it.

One prominent platform for this kind of workload is Microsoft Fabric. Fabric builds on Spark, and we will use PySpark notebooks in the following.

In our example scenario we use 100,000 order lines, which is roughly one month's worth of order lines. In the period under consideration there was demand for 1,878 SKUs. As mentioned, the simplifying assumption is that the demand per SKU follows a normal distribution, so we need the mean and standard deviation for every SKU. In PySpark this can easily be achieved with a "Group By" expression and built-in SQL functions:

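A minimal PySpark sketch of this aggregation could look as follows (the DataFrame name order_lines and the column names sku and quantity are assumptions, since the original notebook code is not reproduced here):

```python
from pyspark.sql import functions as F

# Assumed input: an order-line DataFrame `order_lines` with columns "sku" and "quantity".
demand_stats = (
    order_lines
    .groupBy("sku")
    .agg(
        F.mean("quantity").alias("demand_mean"),
        F.stddev("quantity").alias("demand_std"),
    )
)
```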

For every SKU we add holding costs and stockout costs per unit. They have been calculated elsewhere and are not part of this consideration. The holding costs are the operational costs of keeping a unit in stock; the stockout costs are the opportunity costs of not selling a unit.
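Assuming the costs live in a lakehouse table with one row per SKU (the table name sku_costs and its columns holding_cost and stockout_cost are hypothetical), attaching them could be sketched like this:

```python
# Hypothetical cost table: one row per SKU with columns "sku", "holding_cost", "stockout_cost".
sku_costs = spark.read.table("sku_costs")

# Attach the unit costs to the per-SKU demand statistics from the previous step.
demand_stats = demand_stats.join(sku_costs, on="sku", how="inner")
```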

Per SKU we draw 1,000 values from a standard normal distribution and transform them with the SKU's mean and standard deviation into sampled demand values:

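One way to sketch this sampling step, reusing demand_stats from above, is to materialize the standard-normal draws as a small DataFrame and cross-join it to the per-SKU statistics (the sample size and column names are assumptions):

```python
import numpy as np
from pyspark.sql import functions as F

# One shared set of 1,000 standard-normal draws, reused for every SKU (see the side note below).
n_samples = 1000
z_values = np.random.standard_normal(n_samples)
samples = spark.createDataFrame(
    [(i, float(z)) for i, z in enumerate(z_values)], ["sample_id", "z"]
)

# Cross-join the draws to every SKU and rescale them with the SKU's mean and standard deviation.
sampled_demand = (
    demand_stats
    .crossJoin(samples)
    .withColumn("demand", F.col("demand_mean") + F.col("demand_std") * F.col("z"))
)
```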

Please note that the cross-join is a distributed operation, so we keep the benefits of the Spark setup.

On a side note, we use the same random numbers for every SKU and then transform them by mean and standard deviation. This is not an issue, because what matters is whether the sample size (here: 1,000) is sufficient, not which specific numbers were drawn.

At this stage we have sampled demand values per SKU. If our order size does not hit the demand exactly, we incur stockout costs when the order size is lower than the actual demand, or holding costs when the order size is higher than the actual demand:

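A sketch of this cost calculation, continuing from sampled_demand and using a hypothetical grid of candidate order sizes (in practice the range would be derived from the observed demand per SKU):

```python
from pyspark.sql import functions as F

# Hypothetical grid of candidate order sizes to evaluate per SKU.
order_sizes = spark.range(0, 500).withColumnRenamed("id", "order_size")

# For each SKU, sampled demand value and candidate order size, compute the resulting cost:
# stockout cost if we ordered too little, holding cost if we ordered too much.
costs = (
    sampled_demand
    .crossJoin(order_sizes)
    .withColumn(
        "total_cost",
        F.when(
            F.col("demand") > F.col("order_size"),
            (F.col("demand") - F.col("order_size")) * F.col("stockout_cost"),
        ).otherwise(
            (F.col("order_size") - F.col("demand")) * F.col("holding_cost")
        ),
    )
)
```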

Now the total cost is averaged per SKU and order size, and for each SKU the order size with the minimum average total cost is selected. We can write the result to a lakehouse delta table:

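The final aggregation and the write to the delta table might be sketched as follows (the target table name optimal_order_quantity is an assumption):

```python
from pyspark.sql import functions as F
from pyspark.sql import Window

# Average total cost per SKU and candidate order size over all sampled demand values.
avg_costs = (
    costs
    .groupBy("sku", "order_size")
    .agg(F.mean("total_cost").alias("avg_total_cost"))
)

# Per SKU, keep the order size with the lowest average total cost.
w = Window.partitionBy("sku").orderBy(F.col("avg_total_cost").asc())
optimal_orders = (
    avg_costs
    .withColumn("rn", F.row_number().over(w))
    .filter(F.col("rn") == 1)
    .drop("rn")
)

# Write the result to a lakehouse delta table.
optimal_orders.write.format("delta").mode("overwrite").saveAsTable("optimal_order_quantity")
```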

Finally, we can include the notebook in a Fabric data pipeline to schedule an update of the delta table.


Key Takeaways

Microsoft Fabric provides an efficient notebook setup with Spark to approach statistical supply chain problems in a scalable manner.
Notebooks can be scheduled in a pipeline to automate the calculation of optimal order quantities per SKU.

Although the newsvendor problem rests on simplified assumptions, it is a good entry point for this type of calculation.
