Programming BI in SCM

PyTorch and Supply Chain Management

This sequence of blogs conducts a few experiments on parallel treatment of optimized order quantities. The base question is how much complexity we can incorporate in the optimization process when exploiting parallelism. We start with the simple newsvendor problem. We will apply vectorization and parallel computing on the GPU and check how far this gets us.

The Newsvendor Problem

Here is the defintion of the vanilla newsvendor problem: a newsvendor sells newspapers and faces uncertain daily demand. The newsvendor needs to decide how many newspapers to order each day from the supplier. Key parameters are as follows:

Demand distribution: The demand is supposed to follow a demand distribution. We can imagine a demand being sampled from this given distribution
Order Quantity Q: The decision variable representing the number of newspapers the newsvendor orders from the supplier.
Inventory Level I: The initial inventory level on any given day, considering the previous day’s leftover newspapers and the new order quantity.
Shortage Cost C_s: The cost incurred for each newspaper not sold due to underordering.
Excess Cost C_e The cost incurred for each excess newspaper that remains unsold due to overordering.

The total cost equation is: \( \text{Total Cost} = C_s \cdot \Pr(D > I) + C_e \cdot \Pr(D < I) \)

The objective is to minimize this cost. The cost is obviously zero if the inventory at hand matches the demand. The optimization must be executed for every SKU. If we assume in our initial simplistic approach that different article demands are not correlated, this task can be parellalized. Before getting into this, we will try out parallelization in general.

Parallel Agents

In this section an abstract scenario will be examined, where use a parallelized RL agent environment in order to check if the parralel and vectorized execution gives the desired computing time advantages. If this test is positive, we will setup an
algorithm that calculates the optimal order quantity for the SKUs in parallel. A standard agent environment interaction is described
everwhere on the web e.g.

Agent & Environment. In a nutshell, the agent is in a certain state and takes an action. From the environment the agent receives the new state and a reward for the taken action. We call this interaction of the agent with the environment a step. The agent’s goal is to maximize the reward received during its life time.
One of the most prominent examples at the moment lets the agent play an Atari game. Obviously, the goal of the agent is to reach a maximum score during its life time.

We now let agents run in parallel: we do about a million steps in total and a change the number of agents.

Number Agents	Total Number Steps	Time Elapsed [s]
2	1.048.576	2.385
4	1.048.576	973
8	1.048.576	522
16	1.048.576	261

The quick check gives the desired result: doubling the numer of agents roughly halves the compute time.

Vectorization and Parallelization of the Newsvendor Problem

As we want to focus on the parallelization and vectorization of the problem, we assume a normal distribution of every SKU with different means and standard deviations. This is all a simplistic modeling. The main purpose is to verify if the parallel calculation per SKU brings us the desired benefit. More complicated models will be then treated in subsequent blogs. In the same spirit we assume that we do not carry over any inventory to the next period. Thus, we want to find the order quantity per SKU that matches the demand best.

RL environment

When doing this parallel approach, we want maximized speed, but have pay with space. Basically, we build up tensors that comprise the dimensions (number_skus, number_samples, number_posssible_order_quantities). When testing on a laptop with
a standard GPU, the space capacity on the GPU is quickly exhausted. So, we just check the time advantage of parallel approach with a small model with 500 SKUs, 100 samples and 1000 order quantity possibilities. Measuring the time we have less than 1 second computing time vs. 80 seconds computing time.

We know that parallel computing starts to shine with massive data and that the advantage will more accentuated when running bigger models. If we go to 2000 SKUs ceteris paribus, we reach the free limits of Google Colab GPU capacity. The computing time ratio is now a little more than one second vs. 420 seconds for the sequential CPU setup.

Without any real proof, we checked by experiment that the time benefit scales with the number of SKUs n, this is: t_seq = n * c * t_parallel. With some constant c. This will presumably hold as well for the other dimensions sample size and number of order possibilities.

How is this in the real world?

If we look at the requirements and architecture of top class advancecd supply chain planning systems like Example of state of the art SC planning software, you will recognize that the planning is dictated by the clockspeed of the sales and operations planning process: if planning results on a certain level are adjusted, let’s say daily, you run your computations over night and there is no need to have a super fast calculation. If you imagine a digital twin, where the scenario result should be rendered in seconds, this might be different. In the above mentioned, industry-grade scalable systems, the parallel and vectorized computing is done on distributed systems if necessary. Technologies like MapReduce come into play. Functions that one uses have to be parallelized. In the above code we use the Kronecker Product for tensors. This is a good example where you have to custom-design a function for computation on distributed GPUs.

Key Takeaways

Parallel computing brings the desired benefit, at the cost of GPU space: The parallel computed PyTorch tensor setup gives great computing time, but needs GPU space.
All PyTorch methods are available for SCM : as soon as we know that the PyTorch tensor approach is feasible, we can try out the entire tool set of the framework. As mentioned, the purpose of all this is to try out these concepts in the specific supply chain domain. We are ready to combine AI models with SC models.
Distributed computing might be necessary : For the time critical computation of larger models the single GPU approach is not sufficient. Distributed computing will be necessary.

This concludes part 1 of the investigation how to use PyTorch to run supply chain models. In the subsequent parts we will try more realistic models, other than the newsvendor model and tackle distributed systems.
I would like to thank Patrick Westkamp, CEO and founder of businessoutcome and his team for the fruitful discussions about this topic.