Supply Chain Management with Fabric: Demand Volatility Monitoring in Real-Time

In community forums there are numerous discussions about whether ABC-XYZ analysis makes sense. In this article we assume that the reader applies ABC-XYZ in a reasonable context.
The focus here is on applying ABC-XYZ with Fabric in near real time. Microsoft Fabric is meant to be an efficient tool set for business
process analytics with all the benefits of in-memory databases and distributed computing. Thus, we want to check whether we can quickly set up
near real-time monitoring of the volatility of our demand using this method.

For our reasoning we will take the following path:

We take real demand data and apply the usual ABC-XYZ methodology.
The method is standard in every supply planning or IBP system. We want to figure out how we can quickly set up
this method with near real-time events in Microsoft Fabric
Power Automate will pull the events, and Fabric will be used for the real-time analytics
to see how the demand classifies according to ABC-XYZ

So, this is a little exercise in simulation on the Power Platform and in Fabric.

Implementation

Custom Event Hub Setup in Microsoft Fabric

The event hub setup follows the standard custom event hub setup in Fabric.

Key implementation details:

You can set up a custom event hub in Fabric to process any streaming data source
Once you have specified the JSON of your order line event, a KQL database table is set up to capture the events as they come in

Order line polling with Power Automate

power_automate_order_polling

Key implementation details:

You have a REST API available from which you can poll a number of order lines
The HTTP step in Power Automate gives you a list of JSON objects with order line details
The order line objects are sent as events to the Fabric event hub
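
Since the flow itself is shown only as a screenshot, here is a rough Python sketch of the logic it implements; both URLs and the payload shape are placeholders, not the real endpoints:

```python
import requests

# Hypothetical endpoints: the order line REST API and the custom endpoint
# exposed by the Fabric event hub (both URLs are placeholders)
ORDER_API = "https://example.com/api/orderlines?count=10"
EVENT_ENDPOINT = "https://fabric.example.com/custom-event-hub"

# Poll a batch of order lines and forward each one as an event
order_lines = requests.get(ORDER_API, timeout=30).json()
for line in order_lines:
    # each event carries at least a SKU, a date, and a quantity
    requests.post(EVENT_ENDPOINT, json=line, timeout=30)
```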

Stream Analytics with KQL

Fabric uses KQL databases for event capturing. Naturally, KQL (Kusto Query Language) is the query language of choice for analyzing
the incoming data. KQL is tailored to stream analytics, and structure-wise it resembles Power Query or F# pipelines
(both functional languages, although F# has mutable features).
In this simple example the order line object has a SKU, a date, and a quantity.

abc_xyz_kql

Key implementation details:

The order event table in the code is just a placeholder for the real table
The ABC analysis is per unit, as in this simple example no turnover figures
per order line are submitted. A is the set of SKUs that account for 70%
of the turnover, descending by number of units in the regarded period
XYZ uses the standard deviation by SKU divided by the mean as the coefficient of variation.
A coefficient lower than 0.5 is X, and so on. “L” is used for “Launch,” when the first events
come in and a standard deviation cannot yet be calculated
The lookback period is one month
The result is a table with the columns SKU, ABC and XYZ
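
The KQL itself is only shown as a screenshot above, so here is a minimal pandas sketch of the same classification logic; the column names, the B cutoff of 95%, and the Y/Z boundary of 1.0 are assumptions:

```python
import pandas as pd

# Assumed input: one order line per row with columns sku and quantity,
# already filtered to the one-month lookback period.
def abc_xyz(orders: pd.DataFrame) -> pd.DataFrame:
    per_sku = orders.groupby("sku")["quantity"].agg(["sum", "mean", "std"])

    # ABC by units: A = SKUs covering the top 70% of units; the 95% cutoff for B is assumed
    per_sku = per_sku.sort_values("sum", ascending=False)
    cum_share = per_sku["sum"].cumsum() / per_sku["sum"].sum()
    per_sku["ABC"] = pd.cut(cum_share, [0, 0.7, 0.95, 1.0],
                            labels=["A", "B", "C"]).astype(object)

    # XYZ by coefficient of variation; X below 0.5, the Z boundary of 1.0 is assumed
    cv = per_sku["std"] / per_sku["mean"]
    per_sku["XYZ"] = pd.cut(cv, [0, 0.5, 1.0, float("inf")],
                            labels=["X", "Y", "Z"]).astype(object)
    per_sku.loc[cv.isna(), "XYZ"] = "L"   # "Launch": too few events for a standard deviation

    return per_sku.reset_index()[["sku", "ABC", "XYZ"]]
```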

abc_xyz_table

This can easily be used in a Power BI report showing the number of SKUs per
classification as a toy visualization:

abc_xyz_viz

In summary, the Fabric Data Activator workload lets you quickly set up real-time monitoring by means of the ABC-XYZ method.
This is just a simplistic starting point. You can easily imagine an alert system that fires when SKUs change to a more volatile classification.

As mentioned in the beginning, this is no judgement of the method as such.
The goal was to check whether such a method can be implemented swiftly in a real-time fashion in the Fabric framework.

Supply Chain Management with Fabric: Optimal Order Quantity

In a former post, I tried to use PyTorch parallel computing on the newsvendor problem. A key takeaway was that for thousands of SKUs and for analyzing hundreds of thousands of order lines, distributed high-speed computing quickly comes into play.

For an explanation of what the newsvendor problem is, please check the beginning of the post on the newsvendor problem. Briefly, if you assume a normal distribution for the demand of a specific SKU, and you have figured out the stock-out cost and holding cost for a unit of this SKU, you can estimate the optimal order size for this SKU.

One prominent choice for this is Microsoft Fabric. Fabric uses Spark, and we will use PySpark notebooks in the following.

In our example scenario we use 100,000 order lines, which is about one month’s worth of order lines. In the regarded period there was demand for 1878 SKUs. As mentioned, the simplifying assumption is that the demand per SKU follows a normal distribution. So, we need the mean and standard deviation for every SKU. In PySpark this is easily achieved with a “Group By” expression and built-in SQL functions:

pyspark_sku_mean_std
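
A minimal PySpark sketch of this step (the table and column names are assumptions):

```python
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()

# Hypothetical order line table with columns sku, order_date, quantity
order_lines = spark.read.table("order_lines")

# Mean and standard deviation of the demand per SKU
sku_stats = (order_lines
             .groupBy("sku")
             .agg(F.mean("quantity").alias("mean_qty"),
                  F.stddev("quantity").alias("std_qty")))
```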

For every SKU we add holding costs and stockout costs per unit. They have been calculated elsewhere and are not the subject of this consideration. The holding costs are the operational costs of keeping a unit in stock; the stockout costs are the opportunity costs of not selling a unit.

Per SKU we draw 1000 values from a standard normal distribution and transform these values according to the mean and standard deviation per SKU into a sampled demand value.

pyspark_demand_sampling
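
The sampling step could look roughly like this, reusing sku_stats from the sketch above; the shared standard-normal draws are generated once and transformed per SKU:

```python
import random

# 1000 shared draws from the standard normal distribution
samples = spark.createDataFrame(
    [(i, random.gauss(0.0, 1.0)) for i in range(1000)],
    ["sample_id", "z"])

# Cross join: every SKU gets every draw, transformed by its own mean and std
sampled_demand = (sku_stats.crossJoin(samples)
                  .withColumn("demand",
                              F.col("mean_qty") + F.col("std_qty") * F.col("z")))
```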

Please note that the cross-join operation is a distributed operation, and we keep the benefits of the Spark setup.

On a side note, we use the same random numbers for every SKU and then transform them by mean and standard deviation. This is not an issue: what matters is whether the sample size (here: 1000) is sufficient, not which numbers are specifically sampled.

At this stage we have samples of demand per SKU, and if our order size does not hit the demand exactly, we incur stockout costs when the order size is lower than the actual demand or holding costs when the order size is higher than the actual demand:

pyspark_cost_calculation

Now the average total cost per SKU and order size is computed, and the order size with the minimum average total cost is selected. We can write the result to a lakehouse delta table:

pyspark_optimal_order_quantity
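
Continuing the sketches above, the last two steps could look like this; the candidate order sizes and the per-unit cost constants are assumptions:

```python
from pyspark.sql import Window

# Hypothetical candidate order sizes, one row per quantity q
order_sizes = spark.range(1, 1001).withColumnRenamed("id", "q")

# Total cost per sample: stockout below demand, holding above demand
cost = (sampled_demand.crossJoin(order_sizes)
        .withColumn("total_cost",
                    F.when(F.col("q") < F.col("demand"),
                           (F.col("demand") - F.col("q")) * F.lit(2.0))     # stockout cost/unit (assumed)
                     .otherwise((F.col("q") - F.col("demand")) * F.lit(1.0))))  # holding cost/unit (assumed)

# Average total cost per SKU and order size, then the cost-minimal order size per SKU
avg_cost = cost.groupBy("sku", "q").agg(F.avg("total_cost").alias("avg_total_cost"))
best = Window.partitionBy("sku").orderBy("avg_total_cost")
optimal = (avg_cost.withColumn("rn", F.row_number().over(best))
           .filter("rn = 1").drop("rn"))

optimal.write.format("delta").mode("overwrite").saveAsTable("optimal_order_quantity")
```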

Finally, we can use the notebook in a Fabric data pipeline to schedule an update of the delta table:

fabric_data_pipeline

Key Takeaways

Microsoft Fabric provides an efficient notebook setup with Spark to approach statistical supply chain problems in a scalable manner
Notebooks can be scheduled in a pipeline to automate the creation of optimal order quantities per SKU

Although the newsvendor problem rests on simplified assumptions, it is a good entry point for this type of calculation.

How can AI be disruptive in Supply Chain Management?

Silicon Valley is known for bringing disruption to industries or segments by changing traditional business models. As supply chain management is not a business model, I was curious whether the Valley tackles this process-driven discipline. Reviewing blogs and articles of the past year, it did not look like anyone was out to disrupt this discipline. To my knowledge, the major fields of AI involvement have been:

Process automation: application of working AI methods to in-process optimization
Forecasting, planning, and sensing: using the digitization of the supply chain in the IoT-sense to provide decision support based on AI/ML methods
The advent of digital twins in subareas of the supply chain

This is all hardly disruptive. It leaves organizational structures where they are and provides support for these organizations. It does not introduce new ways of running processes.

There are significant reasons why this is the case:

Supply chain managers are risk-averse: the old saying is “inventory is your friend.” And although there are myriads of initiatives, technologies, and discussions about how to run a supply chain with less inventory while keeping the same service levels, inventory deployed in the right place, even if inefficient, will make the supply chain more robust. And supply chain disruption (not meant in the sense of the heading of this article) is very costly for an enterprise

Operations is not a top innovation field: I have seen many technology companies that had great products and systems on the market, but whose operations were nowhere near their products’ technology standard. Again, this is understandable: the talent and the investment go where money can be earned

However, if we believe in the raison d’être of supply chain management, it is meant to contribute to the competitive advantage of a company.

Apart from launches and innovative products, a company easily has 80% of products that are substitutable. If, furthermore, you consider the supply chain process from S&OP to cash received for products, then you are in sync with current IBP thinking, and you compete with your mid-life products against other supply chains.

In my view, the only way to be disruptive is to go one level of abstraction higher. At least to my knowledge all approaches up to now are a refinement of the old control tower idea:

Better and more real-time data through newer network technologies and IoT to get supply chain events
In-memory databases that make real-time analytics possible, and libraries of AI algorithms that are used to analyze and predict in the supply chain

The sacred cow of all these approaches is that the actual decisions made in a supply chain are not touched. There are metrics that measure the success of a decision directly or indirectly, but the ultimate decision is with the human decision maker. As an example, consider forecast value add: the value of the added forecasting steps in the S&OP process is measured, and then corrective actions are taken. The underlying assumption is still decision support: the decisions will be improved by a critical review of forecast value add.

So, no disruption: IBP helps to make the process and decisions better. Let us sketch a potentially disruptive approach:

So, Mr. Silicon Valley buys a mid-sized company. The unusual goal is not to grow the company and sell it at a higher price, but to bring the company’s supply chain to the next level. The company has the following characteristics:

It comes from a mature industry: relationships are long-standing, a lot of long-running products, few launches. In short, a sustainable business
The company is not overly innovative: there was an innovative core some time ago, and now this carries the company
It should have a lot of products: the higher the number of SKUs, the higher the chance that manual decisions, even with decision support, will lead to sub-optimal supply chain execution
The company has a standard supply chain execution system and is measuring the usual KPIs and service levels

In summary, it should be a mid-life, mid-size company. Why? This company will compete with its supply chain against other companies. There will be no defining moment, like the launch of a super product, that changes the market.

Here is the sketch of the program that Mr. Silicon Valley communicates to the staff:

We want to improve the competitiveness of the company by introducing an AI-driven supply chain. This means that AI algorithms will propose, and often make, the decisions after having learned sufficiently from the people
All this happens in interaction with the staff: the system’s decision can be overruled. This is just another learning input for the system
As before, everything is measured, and we want to see that the supply chain performance is improving. In addition, we want to see that the overall quality of decision making is improving

The disruptive point is the quality of decision making: our AI agent will have all relevant decision options as choices. This will be the big work: finding where the most relevant decisions in the supply chain are taken, either by systems or by the staff.

The transformation is from using AI to support decisions to letting AI learn a supply chain decision model.

On a more technical note, this resembles actor-critic methods: the human is still the critic, and the AI agent tries to be the actor.

Such an approach would be a big cultural change for the company. But humans can still overrule. This would change organizations and bring operations to the center of innovation.

Happy to discuss.

Thoughts about the evolution of Supply Chain Management, AI and ML

As AI is revolutionizing the world and bold statements like “Supply chain management will be fundamentally changed by AI” are everywhere, we will reflect and speculate about the future of supply chain management in the realm of AI. For a comprehensive consideration of the subject, the evolution of AI has to be regarded as well. There will be an informal consideration of three key areas:

State of AI/ML:

A short characterization of where AI stands. This is an informal collection of facts
that are relevant to getting an idea of the role of AI and machine learning in supply chain management.

AI/ML in SCM:

A brief review of where AI is and can be used in supply chain management.

Digital Twins:

We try to figure out the role of digital twins in supply chain management. At least in recent discussions, digital twins are supposed to be the upcoming decision support systems, not only for product development but also for all networks that give comprehensive feedback about their real-time state.

A short characterization of AI evolution

This is a rough, incomplete description of AI evolution, just to set the scene for our thinking about SC evolution (the timestamps
are illustrative, not precise):

20 years ago:

Computing capacity was expensive, and neural networks could not be fed with sufficient data to show great benefits. Traditional machine learning algorithms dominated the field with some satisfactory results, but in general far away from what we are getting today, especially for non-tabular data like images.

10 years ago:

Compute capacity was affordable. Big data was a buzzword. Neural networks
took over: data-hungry, but with the right architecture very performant for a lot of image recognition and NLP tasks. The saying was that everything a human can do, the machine does faster and more reliably.

Now:

The transformer architecture takes over. The replication and combination of conventional wisdom is impressive. Some people are hoping that we are on the brink of universal learning.

So, this is all great. Still, AI shines when mastering a specific problem (teach a robot to make a backflip, and it still cannot walk). Large language models need a lot of data, and then they excellently inform you about conventional wisdom. If you ask for a standard algorithm that is everywhere on the web, it just works great. If you ask about a rare problem that has only a few or no web citations, it will be completely lost, or even worse, tell you something that is just not right. To be fair, the newer versions will say “I don’t know.”, which is OK. This will all improve, but I have a challenging time imagining that the fundamental problem will go away: how should the system know things for which it has no data (interpolation vs. extrapolation)? So, surprise me. Even the praised Euclidean proofs fall under this category: this is a well-defined, strict field, where the evaluation of existing combinations can lead to new knowledge. The new alchemy is that models will create their own data and cascading models will learn from each other. Sounds like the derivatives of derivatives that were AAA-rated and led to the financial crisis. Again, surprise me.

If you think about supply chain management, systems can either produce solutions that are widely accepted and simply adopted, or they provide decision support, where the supply chain manager still makes the final decision. When a system proposes an action,
this is similar to reinforcement learning: find the optimal decision to get a maximum reward.

The rough state of reinforcement learning is as follows: impressive results when playing Atari games. It can be in danger of only
finding local optima. It might need a lot of data and long learning times. It is subject to a lot of tweaks that are supposed to enable quicker learning and learning of at least the best local optimum. These adjustments are (among others): reward shaping, model-based approaches, or reward learning. Long story short: the narrative is that the agent learns by itself, but in reality you model a lot.

Decision making under uncertainty

If you try to recognize a cat in a picture, the state space is very high-dimensional, but you find decision boundaries that are very clear in most cases. There might be raccoons or blurry pictures, but for a clearly identifiable cat you will get a high score from the model that this is a cat. If you have decisions in the supply chain that are cat-like, then you can just automate them, and perhaps leave the supply chain raccoon for human judgement. The rest of the decisions are all different: you get a certain confidence level for a decision, and these confidences are far away from 0 and 1 (1 implying full confidence). These systems will suggest a decision with a certain confidence level.

If you check the literature, there are a lot of discussions about probabilities in machine learning. The numbers between 0 and 1 that are produced by sigmoid and softmax are only proxies for probability measures. Alternative approaches just assume a probability distribution, and you learn its parameters.

We will examine below how these areas fit into supply chain management.

AI and ML in Supply Chain Management

Supply chain automation:

this is clearly the area where a system is faster and less error-prone than a human being (e.g. image recognition and classification of products, extraction of information from documents, …).
The technologies for these tasks are available. It is a question of organizational capabilities and economic efficiency whether the deployment of such technologies makes sense.

Decision support:

in the current discussion, one can get the impression that AI is a Swiss Army knife that automatically solves problems that could not be solved before. One should not forget decades of operations research, where solutions, heuristics, and approximations for typical business problems have been found. These solutions are excellent benchmarks for ML solutions. In general, all planning problems can be approached with ML techniques, but there might be traditional methods that do equally well or better. And never forget: a lot of methods stem from applied statistics, and it is highly debatable whether they count as machine learning or not. Demand forecasting on every supply chain level is a good candidate for big progress: short-term demand sensing is already used, and automated forecasting for thousands of SKUs with the right reinforcement learning mechanisms sounds promising.

Supply chain redesign:

the idea is simple: you get the state of your supply chain in real time, and the system uses the information to suggest modifications of the supply chain architecture, e.g. changes in the BTO/BTS cross-over, the number of warehouses, transportation modes, you name it. This seems far away. Geometric deep learning, which exploits symmetries of data, goes in that direction for network architectures. It seems to require a more abstract understanding of supply chain architectures in order to uncover attributes that are abstractly transformable. Having said that: if we keep the scope small and consider a clearly defined set of alternatives for a specific characteristic, we can have system support for a decision about an isolated redesign choice. How to re-examine the entire SC network is a whole different story.

Digital Twins

Now we are ready for digital twins. First we give a definition, and then we examine how AI/ML fits in. The working assumption is that
the digital twin is the future central decision support tool for a supply chain, responsible for decisions from operational through tactical to strategic.

Digital twin technology is the process of using real-time data to create a digital representation of a real-world object. In the literature this is distinguished from traditional simulation: the digital twin operates on real-time data, the user can interact with the model, and changes can be dynamically incorporated. The idealized picture is that the real object changes the model and the model changes the object in a continuous improvement cycle.

A network twin models not a single physical object, but a network of objects.

So, digital twins are simulations on steroids (to put it simplistically). Ever tried to simulate a retail warehouse with 10,000 SKUs of the most different form factors, transportation modes, and demand characteristics? Perhaps it is a central warehouse that delivers to the whole of Europe (EU and other countries), perhaps big retailers have specific packaging requirements, perhaps electronic devices have to be specifically labeled, perhaps you have to run launch campaigns that make it necessary to bundle certain products. The list is endless.

Even for greater minds than myself, this is an ambitious task. Basically, you either start a project to make a network twin of the entire warehouse, and you will fail, or you start small: take goods-out only, the fine picking area only, and have a controlled start. Let’s not forget that the idea stems from product development, maintenance, and improvement, where there is one product that lives in the IoT world. All the product characteristics have been carefully designed. Why would anything less be sufficient for our retail warehouse twin?

Are digital twins a specific AI or Machine Learning topic?

It is a topic of operations management, operations improvement, and operations research. AI/ML is just part of that.
As elaborated above, there might be AI/ML methods that can be used, but the modeling exercise is huge and cannot be covered by AI/ML methods alone. Often the traditional planning algorithm already does the job, and in a lot of cases it is not clear that an AI algorithm will get better results. The same holds for traditional forecasting: if you screen the literature, there is no clear evidence that machine learning performs better. It remains to be seen whether transformers with self-attention mechanisms will provide the breakthrough. The modeling exercise is not going away; it has been done over decades. The only criterion is the performance of the model. So, up to now, no revolution. But the next quantum leap in AI will change this.

Where is this going?

In my view, the combination of human and machine will be the most promising development. The transformer models already incorporate human choices in their learning (human-in-the-loop). In a supply chain decision process, the approval or adaptation of a machine-generated plan would be such a signal. In the same spirit, existing models that represent existing knowledge will be increasingly combined with AI models. The digital twin development, in the idealized sense, goes in both directions: the real world changes the model, and the model influences the real world. So, these will dovetail increasingly. AI/ML tackles every business process, and the development is highly dynamic. Exciting times. However, the machines are not taking over. Not yet.
This is all speculation, but sometimes it is fun just to speculate.

Experiments in RL

This sequence of blogs conducts a few experiments in reinforcement learning. We will use the common-practice tools that led to the success
of agents winning Atari games at a super-human level. As introductions to the methodology for playing those games can be found everywhere on the web,
we will use the tools below, but not explain them:

Parallel environments:

The Atari games showed that millions to billions of agent steps are necessary to reach a good learning level. In order to keep the computational time acceptable, agents should be trained in parallel on the GPU.

Actor-Critic algorithm:

We use one of the standard algorithms that led to the current improvements in reinforcement learning. As mentioned, tutorials on this algorithm can be found everywhere, e.g. …

The idea of this blog is open-ended. We will apply the methodology to a toy problem and see how far we get. Our toy problem will be sports bets with quotes, where the agents have the freedom to place an amount on a certain bet or not to bet at all. Obviously, the goal is to learn how to obtain a win on average. Whether this is possible is an open question. For now, the following sequence of blogs
is planned:

Part 1: Setup of the parallel environment and a test if parallelism on the GPU gives
the promised compute time benefits.
Part 2: Reward shaping and investigation of improvements.

Hopefully, the first parts lead to further improvement ideas that then will be the subject of subsequent parts.

Short description of agent environments

In the world of Atari games, pre-built environments are used, and rightfully so: in principle, the agent observes the pixel state of the game screen and takes an action (e.g. when the agent tries to land on the moon, whether it is accelerating or braking). Then the agent receives a reward from the environment, and so on. The gymnasium module and its successors do all the heavy lifting, and the work is to find an algorithm that lets the agent learn. For our toy model we can program an environment from scratch. We follow the canonical definition of a Python environment class and implement class methods as follows:

RL environment

step:

We have an action as input. The environment keeps track of the current state the agent is in.
With the implemented reward scheme, the environment returns a reward, a done = false status, and the next state. This goes on as long as the episode lasts: in Atari games this is normally the “game over” state; then done = true is returned. In our toy model we have the possibility to place n randomly chosen bets, and then the episode is over.

reset:

This resets the environment to the start state for a new episode. As we will randomly choose matches from our base data, we will just continue to do so, even after an episode has ended.

render:

For an Atari game the rendering is obviously the pixel state (screen) and the score after the action taken. In our model we can print out the reward received and the win or loss of the bet.
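
To make this structure concrete, here is a minimal sketch of such an environment class; the feature layout, the reward scheme, and the episode length are assumptions for illustration, and the real implementation accepts a vector of actions (see below):

```python
import torch

class BettingEnv:
    """Minimal sketch of the betting environment (all details are assumptions)."""

    def __init__(self, features: torch.Tensor, labels: torch.Tensor, n_bets: int = 10):
        self.features = features          # one row per match: quotes and probabilities
        self.labels = labels              # 1 = home win, 0 = draw or away win
        self.n_bets = n_bets              # episode length: number of bets
        self.bets_placed = 0
        self.idx = torch.randint(len(features), (1,)).item()

    def step(self, action: float):
        # assumed reward scheme: stake * (home quote - 1) on a win, -stake on a loss
        home_quote = self.features[self.idx, 0].item()
        won = self.labels[self.idx].item() == 1
        reward = action * (home_quote - 1.0) if won else -action
        self.bets_placed += 1
        done = self.bets_placed >= self.n_bets                      # episode over after n bets
        self.idx = torch.randint(len(self.features), (1,)).item()  # next random match
        return self.features[self.idx], reward, done

    def reset(self):
        self.bets_placed = 0
        self.idx = torch.randint(len(self.features), (1,)).item()
        return self.features[self.idx]

    def render(self, reward: float):
        print(f"last reward: {reward:.2f}")
```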

Implementation of the step function

The record for a sports bet contains the quotes for home win, away win, and draw, the probabilities derived from these quotes, and the label, which is 1 for a home win and 0 for a draw or an away win. Everything that is not the label we call the state. Here we see already that the definition of the model’s states leaves ample room for design. In the Atari game example this is self-evident: the pixel state of the screen defines the state sufficiently.

Home Quote Away Quote Draw Quote Home Prob Away Prob Draw Prob No Home Win Prob Label
1.57 3.99 4.44 0.57 0.23 0.20 0.43 1.0

We will use agents that place their actions in parallel. Thus, the step function accepts a vector of actions. All operations are vectorized where possible and computed on the GPU.

RL environment

This is the simplest environment in this domain: a match with quotes is randomly drawn, and the reward is just the amount won or lost. Actions can be 0 if no bet is placed; otherwise bets can be placed up to a maximum amount. Here we see that we have full freedom in shaping the reward function. We will not go into detail about the standard actor-critic algorithm that we are going to use. However, an interesting question is why learning should work at all in the described case: we choose the sequence of bets randomly, and they are definitely completely independent from each other. One of the reasons might be that the methods used are model-free: they do not learn transition probabilities between states, but rather figure out policies directly. Now we have a simple environment that lets agents learn in parallel. So, let’s start the agents and see which rewards and returns we are getting.

Key Takeaways from Part 1:

Environments:

We have sketched a simple environment from scratch and understood that reward shaping leaves a lot of freedom to engineer the environment.

States:

With tabular data we can feature-engineer our state space. This is basically a classic feature engineering task. If we calculate the accuracy of a non-RL machine learning model, the policy is basically: bet if the predicted probability is greater than 0.5, and do not bet if it is smaller. Thus, the feature engineering influences the probability and then the policy in this simple manner. So, can we find a state definition that facilitates agent learning?

Next steps:

Part 1: Setup of the parallel environment and a test if parallelism on the GPU gives
the promised compute time benefits.
Part 2: Reward shaping and investigation of improvements.



PyTorch and Supply Chain Management

This sequence of blogs conducts a few experiments on the parallel computation of optimized order quantities. The base question is how much complexity we can incorporate in the optimization process when exploiting parallelism. We start with the simple newsvendor problem. We will apply vectorization and parallel computing on the GPU and check how far this gets us.

The Newsvendor Problem

Here is the definition of the vanilla newsvendor problem: a newsvendor sells newspapers and faces uncertain daily demand. The newsvendor needs to decide how many newspapers to order each day from the supplier. The key parameters are as follows:

Demand distribution:

The demand is assumed to follow a given demand distribution.
We can imagine the daily demand being sampled from this distribution

Order Quantity Q:

The decision variable representing the number of newspapers the newsvendor orders from the supplier.

Inventory Level I:

The initial inventory level on any given day, considering the previous day’s leftover newspapers
and the new order quantity.

Shortage Cost Cs:

The cost incurred for each newspaper not sold due to underordering.

Excess Cost Ce:

The cost incurred for each excess newspaper that remains unsold due to overordering.

The total cost equation is: \( \text{Total Cost} = C_s \cdot \Pr(D > I) + C_e \cdot \Pr(D < I) \)

The objective is to minimize this cost. The cost is obviously zero if the inventory at hand matches the demand. The optimization must be executed for every SKU. If we assume in our initial simplistic approach that the demands for different articles are not correlated, this task can be parallelized. Before getting into this, we will try out parallelization in general.

Parallel Agents

In this section an abstract scenario is examined, where we use a parallelized RL agent environment to check whether the parallel and vectorized execution gives the desired computing time advantages. If this test is positive, we will set up an
algorithm that calculates the optimal order quantity for the SKUs in parallel. A standard agent-environment interaction is described
everywhere on the web, e.g.

Agent & Environment. In a nutshell, the agent is in a certain state and takes an action. From the environment the agent receives the new state and a reward for the action taken. We call this interaction of the agent with the environment a step. The agent’s goal is to maximize the reward received during its lifetime.
One of the most prominent examples at the moment lets the agent play an Atari game. Obviously, the goal of the agent is to reach a maximum score during its lifetime.

We now let agents run in parallel: we do about a million steps in total and vary the number of agents.

Number of Agents Total Number of Steps Time Elapsed [s]
2 1,048,576 2,385
4 1,048,576 973
8 1,048,576 522
16 1,048,576 261

The quick check gives the desired result: doubling the number of agents roughly halves the compute time.

Vectorization and Parallelization of the Newsvendor Problem

As we want to focus on the parallelization and vectorization of the problem, we assume a normal distribution for every SKU with different means and standard deviations. This is all simplistic modeling; the main purpose is to verify whether the parallel calculation per SKU brings the desired benefit. More complicated models will be treated in subsequent blogs. In the same spirit we assume that we do not carry over any inventory to the next period. Thus, we want to find the order quantity per SKU that matches the demand best.

pytorch_newsvendor_tensors

With this parallel approach we maximize speed but pay with memory. Basically, we build up tensors with the dimensions (number_skus, number_samples, number_possible_order_quantities). When testing on a laptop with
a standard GPU, the memory capacity of the GPU is quickly exhausted. So, we just check the time advantage of the parallel approach with a small model with 500 SKUs, 100 samples, and 1000 possible order quantities. Measuring the time, we get less than 1 second of computing time versus 80 seconds for the sequential setup.
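
A sketch of how such a tensor setup can look; the demand parameters and the cost values are made up for illustration:

```python
import torch

# Problem size as in the text; parameter values below are assumptions
n_skus, n_samples, n_q = 500, 100, 1000
mean = torch.rand(n_skus) * 50 + 10        # hypothetical per-SKU demand means
std = torch.rand(n_skus) * 5 + 1           # hypothetical per-SKU standard deviations
c_s = torch.full((n_skus,), 2.0)           # stockout cost per unit (assumed)
c_e = torch.full((n_skus,), 1.0)           # holding/excess cost per unit (assumed)

device = "cuda" if torch.cuda.is_available() else "cpu"
z = torch.randn(n_samples, device=device)                        # shared standard-normal draws
demand = mean.to(device)[:, None] + std.to(device)[:, None] * z  # (n_skus, n_samples)
q = torch.arange(1, n_q + 1, device=device, dtype=torch.float32) # candidate order quantities

# Broadcast to (n_skus, n_samples, n_q): shortage and excess cost per sample and quantity
diff = demand[:, :, None] - q[None, None, :]
cost = c_s.to(device)[:, None, None] * diff.clamp(min=0) \
     + c_e.to(device)[:, None, None] * (-diff).clamp(min=0)

avg_cost = cost.mean(dim=1)          # average over samples -> (n_skus, n_q)
best_q = q[avg_cost.argmin(dim=1)]   # cost-minimal order quantity per SKU
```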

We know that parallel computing starts to shine with massive data and that the advantage will be more accentuated when running bigger models. If we go to 2000 SKUs ceteris paribus, we reach the free limits of Google Colab GPU capacity. The computing time ratio is now a little more than one second versus 420 seconds for the sequential CPU setup.

Without any real proof, we checked by experiment that the time benefit scales with the number of SKUs n, that is, \( t_{\text{seq}} = c \cdot n \cdot t_{\text{parallel}} \) for some constant c. This will presumably hold as well for the other dimensions, sample size and number of order possibilities.

How is this in the real world?

If you look at the requirements and architecture of top-class advanced supply chain planning systems (see, for example, state-of-the-art SC planning software), you will recognize that the planning is dictated by the clock speed of the sales and operations planning process: if planning results on a certain level are adjusted, say, daily, you run your computations overnight and there is no need for a super-fast calculation. If you imagine a digital twin, where a scenario result should be rendered in seconds, this might be different. In the above-mentioned industry-grade, scalable systems, the parallel and vectorized computing is done on distributed systems if necessary. Technologies like MapReduce come into play, and the functions one uses have to be parallelized. In the above code we use the Kronecker product for tensors. This is a good example where you have to custom-design a function for computation on distributed GPUs.

Key Takeaways

Parallel computing brings the desired benefit, at the cost of GPU space:

The parallel computed PyTorch tensor setup gives great computing time, but needs GPU space.

All PyTorch methods are available for SCM:

as soon as we know that the PyTorch tensor approach is feasible,
we can try out the entire tool set of the framework. As mentioned, the purpose of all this is to try out these concepts in the
specific supply chain domain. We are ready to combine AI models with SC models.

Distributed computing might be necessary:

For the time critical computation of larger models the single GPU approach is not sufficient. Distributed computing will be necessary.

This concludes part 1 of the investigation of how to use PyTorch to run supply chain models. In the subsequent parts we will try more realistic models than the newsvendor model and tackle distributed systems.
I would like to thank Patrick Westkamp, CEO and founder of businessoutcome and his team for the fruitful discussions about this topic.



Managing Inventory Risk with Power BI / DAX

Petra was getting tired. She was running a parasol production company on high leverage. Practically all the capital to finance her enterprise came from the Rip Off Brothers Bank.

During the first years, there was no problem: she presented her forecast for the summer sales to the bank, and they checked the business plans from previous years. If the numbers were not totally off compared to previous years, they agreed to the necessary credit for the fiscal year.

Two years ago, everything changed:

there was this nerd guy Bob who asked questions like: what is the probability that your forecast is 20% too high? What is the money risk when this happens?

Petra tried to argue from her experience, but Bob was never satisfied with her answers. He wanted to see a mathematical framework for assessing the situation. The relationship got so bad that he wanted to have a corrected forecast after two months of the four-month sales period.

This ended in unpleasant discussions about whether Petra’s guesses about her own business were right. Furthermore, the work to compile the mid-season forecast was heavy lifting.

Now the business plan for the next fiscal year was due, and she was not looking forward to the meeting at the bank. She asked her friend Sarah to go out for a drink. Sarah was a business analyst and programmer. Naturally, they talked about Petra’s business situation:

The sales season went from May to August and accounted for 95% of the sales. The business idea was to offer a selected number of parasol styles and sizes that changed every year, the reason being that the standard parasol business with production in China was already covering the market. Petra was trying to play in a niche with stylish products that should meet next summer’s taste.

It goes without saying that this was all push production: the planning ran from autumn to spring. Then the main components were procured and the umbrellas were printed. Finally, the umbrellas were assembled, packed, and sent to selected retail stores across the country. The main components were procured in the Far East with a typical sea-route lead time.

A postponement strategy for the printing was no option, as the total sales figures were too low. So, she had one shot at the production figures per year. If she was too careful, she missed sales. If she was overoptimistic, she could only sell the overproduction at a salvage price through a B channel, at a loss.

So, Bob’s concerns were not completely unsubstantiated. However, Petra had no clue how to satisfy Bob. “Sarah, I will need this fancy AI stuff to get along!”

Sarah said, “I think Bob knows that a forecast is always wrong, and he wants a quantification of the probabilities of how far the forecast will be off. Furthermore, he wants to see a framework for how these figures are derived. As the creation of the forecast is time-consuming, you would need something automated that helps you create the forecast quickly.”

Petra and Sarah agreed that Sarah would meet Bob to understand what he really wanted. Here is what Bob told her: “To be perfectly honest, I don’t care about Petra’s forecast. More often than not it was completely off. I want to see a measurement of the downside risk of the business. So, I would need the company profit that would be realized with a probability of 80%. And I would have to understand how the figure was derived.”

Before we look at Sarah’s solution, we take a little side trip on how to generate a time series in Power BI.

Time Series Generation in Power BI

There are endless ways to make up time series within Power BI. The way described here uses some assumptions that seem reasonable in this scenario. Then some randomness is added, and we get sales figures with a certain trend and a certain degree of randomness.

In our scenario we have 6 products, each with a sales price, total costs allocated per product, and a salvage price below costs for when products are sold in the B channel after the season.

ID ProductName SalesPrice Costs SalvagePrice
1 Pink Rabbit 300 200 100
2 Yellow Submarine 200 120 70
3 Green Zoo 180 130 60
4 Purple Salamander 90 30 25
5 Black Widow 110 70 40
6 Blue Velvet 230 180 100

Every product gets a monthly baseline quantity and an assumption of YoY growth.

ProductID BaseLine YoYGrowth MonthlyGrowth
1 50000 0.05 1.004
2 20000 -0.02 0.998
3 16000 0.02 1.001
4 13000 -0.01 0.999
5 9000 0 1
6 11000 -0.03 0.997

The year-on-year growth is translated into a monthly growth assumption by means of a calculated column:

Inventory Growth DAX
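
The conversion appears to be the twelfth root of the yearly factor, which matches the table above when the results are truncated to three decimals (e.g. \( 1.05^{1/12} \approx 1.0041 \)):

\( \text{MonthlyGrowth} = (1 + \text{YoYGrowth})^{1/12} \)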

Assuming an already defined standard date table, we take the first day of every month of the relevant season months.

Relevant Season Dates DAX

We do a cross join of the relevant dates with the products to get all relevant combinations.

Cross Join Product Season Date DAX

We now have a record for every product in every season month and can again create a calculated column for the sales figure.

Synthetic Sales Figures DAX

Here, arbitrary randomness of plus/minus 25% is applied to the month-on-month growth. We get some random-looking sales figures per product. As mentioned, there are countless ways to create this kind of data.

Synthetic Sales Figures Power BI Visual

Sarah’s Solution

When Sarah looked at the data, she saw a certain trend, but it was heavily overlaid by fluctuations from month to month. It could easily be the case that one year had such strong fluctuations that the long-term trend was meaningless and the yearly sales per product had nothing to do with the trend.

Therefore, she decided not to do classical time series forecasting but rather to use linear regression by product to get the most likely number of sales units per product. Furthermore, she wanted a simple model that Bob would understand, rather than a black box producing a result that would be hard to explain.

Using linear regression is based on a lot of assumptions that might not completely hold. As mentioned, Sarah wanted an easy model that can be communicated, and the forecast is always wrong: even the best model is based on assumptions, and in all real-life scenarios these assumptions change over time. Forecasting is a science in its own right and not the subject of this text.

The idea was to generate a baseline forecast per product and then do scenario considerations for the likelihood of the profits.

Rolling Yearly Sales Table

The base table had the first day of the month as the date, a date index, and rolling four-month sales units. This was a better choice than the monthly sales figure, as it is an aggregated figure, and since the season covers four months, it is effectively a yearly sales figure.

The forecast was done by linear regression per product, with the next time index as the x coordinate of our forecast value.

Forecast with linear Regression Table

Product MaxTimeIndex Forecast StandardError
Pink Rabbit 48 60813 7802
Yellow Submarine 48 19257 2805
Green Zoo 48 17117 2548
Purple Salamander 48 12327 1917
Black Widow 48 8997 1303
Blue Velvet 48 94774 1499

Sarah used the DAX LINESTX command to create the forecast and the error.

Forecast with linear Regression DAX Forecast Error DAX

We just state the fact that, with the linear regression assumptions holding, the forecast value is the mean of a normal distribution, and the standard error is the standard deviation of this distribution.

Sarah was happy so far: the forecast values stemmed from a clear methodology and could easily be explained to Bob.

However, Bob wanted the probability of the profit. After some thought, Sarah went for the next big assumption in her model: the forecasts of the different products are independent of each other. This was presumably wrong, but again, the simple model was the preferred option.

Her idea was to randomly draw a value from the normal distribution per product and interpret this as the demand that would realize. With one draw per product, the total profit could be calculated. Doing this many times yields a probability distribution for the profit, from which it is easy to find, e.g., the profit that would be realized with a probability of 80%.

Monte Carlo DAX 1

In the example, the drawing from the normal distribution is done 10,000 times per product. Indices are created from 1 to 10,000 per product. For every line, the relevant product and forecast data are linked via the product name.

Monte Carlo Table

The random value that is calculated per line is the crucial point of the method. With the inverse of the normal distribution function, a random value for the actual demand is created.

Random Gauss Value DAX

And this leads to random profits, taking into account the salvage price for unsold products in the B channel.

Random Profit DAX

If we sum over all products per index, we get a total random profit.

Profit Distribution DAX
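
For readers who prefer code over DAX screenshots, the whole Monte Carlo idea fits into a small NumPy sketch; treating the forecast itself as the production quantity is an assumption of this illustration, and only two products are spelled out:

```python
import numpy as np

rng = np.random.default_rng(42)
n_draws = 10_000

# Per-product inputs from the tables above: (forecast, std error, price, cost, salvage).
# The production quantity is assumed to equal the forecast.
products = {
    "Pink Rabbit":      (60813, 7802, 300, 200, 100),
    "Yellow Submarine": (19257, 2805, 200, 120, 70),
    # ... remaining products analogous
}

total_profit = np.zeros(n_draws)
for forecast, std_err, price, cost, salvage in products.values():
    demand = rng.normal(forecast, std_err, n_draws)   # one demand draw per index
    sold = np.minimum(demand, forecast)               # cannot sell more than produced
    unsold = np.maximum(forecast - demand, 0.0)       # leftover goes to the B channel
    total_profit += sold * price + unsold * salvage - forecast * cost

# the profit realized with 80% probability is the 20th percentile
print(f"80% probable profit: {np.quantile(total_profit, 0.2):,.0f}")
```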

The distribution of profits can be visualized.

Profit Distribution Visual

Sarah wanted to answer the question of the 80% probable profit graphically. She calculated the probability per profit bin and added a column for the 80% threshold.

Profit Bin Table Probability of Profit DAX

Now the intersection with the threshold showed the 80% probable profit.

Probability of Profit Visual

There would be a profit of 200,000 with 80% probability.

When Bob saw the presentation of the results, he was satisfied: “I did not think I would ever see a Monte Carlo simulation in this line of work.” He saw immediately that the risk of huge losses in the business was not big. He agreed to the necessary credit line for the business.

Key Takeaways

Use simple models to start:
If a simple model, like the linear regression here, does not give you results that correspond to your business understanding, you have to dig into your business understanding or your model assumptions. A more sophisticated model can be more accurate, but it will not change the fundamental drivers of your business.
Add randomness to your analysis:
How confident are you of a figure? This question always arises. E.g., two Gaussian distributions with the same expectation value but drastically different standard deviations tell you a lot about the quality of your expectation value. The VertiPaq engine is very fast, and it is absolutely feasible to create a lot of random numbers.

Multi-parent Tree-Structures in Business with Power BI / DAX

Bob was not in the best mood. He felt this was the low point of his career. The last meeting with the CEO Bill was bad.

For four months now he had been head of controlling at Lost Bonus Agreements Corp., and this was the first time Bill was yelling at him: “We have simplified all our bonus agreements and I have no f… clue what our total bonus spend will be!”

The fiscal year end was one month away, and indeed the bonus situation was unclear. A year ago, the company had drastically simplified the bonus agreements: only the revenue of the fiscal year is the basis for the gratification. An individual percentage is applied to this revenue, and voilà, the amount is calculated. What seemed so easy had a pitfall: in the bonus discussions, completely individual revenue baselines had been agreed.

During the fiscal year two types of information were available:

  • The bonus relevant entities per employee stemming from the bonus sheets.
  • One big list containing the profit center, the child organization, and the turnover forecast for the fiscal year.

ID Profit Center Children Turnover
1 NA Sales NA Operations 6000
2 NA Corporate NA Operations 500
3 NA Operations Toys,Bikes 2000
4 Toys Puppets,Games 10000
5 Bikes 7000
6 Games 3000
7 Puppets 5000
8 APAC Sales Appliances,Singapore Operations
9 Appliances 5000
10 Singapore Operations 10000

Bob was lost with these sheets. He now had 5000 bonus agreements and one big sheet with all business divisions and their children, and no clue how to get the total bonus-relevant revenue sum. So, Bob called his favorite business analyst Ken (just returning from vacation with his wife Barbie) for help.

The Solution

Ken received all documents, and below we describe Ken’s thought process. Ken opened his Power BI Desktop and stared at the ceiling of his windowless office.

Ken made several observations:

  • The above table represents a multi-parent tree structure. E.g., if an employee had NA Sales and NA Corporate on the bonus sheet, both had the direct child NA Operations. So, this was a multi-parent structure in the graph.
  • It would have been easy if the relevant legal entity descendants had been extracted per employee. But there was only one big sheet with all parent-child relationships of the legal entities. It was out of the question to isolate the relevant part of the legal entity sheet per employee for 5000 employees.
  • The built-in PATH function does not support several parents for one child.
  • DAX is not recursive. This means that implementing graph exploration strategies starting from, e.g., only an adjacency list is not possible in DAX.

He identified three fundamental problems with the data input he had received:

  1. Revenue of the organization on the bonus sheet: the bonus sheet showed only the top organizations. The ERP system did not aggregate the forecast revenue per legal entity before the end of the fiscal year. So, the revenue of all descendants would have to be summed up manually.
  2. Double counting of profit centers: it was possible that the revenue of a profit center was assigned to both sales organization A and sales organization B. If an employee had a bonus agreement comprising sales organizations A and B, and both had a profit center as a descendant, it would be counted twice when regarding sales organizations A and B separately.
  3. Redundant information: as mentioned above, the parent-child relationships of all legal entities were given in one big sheet. So the relevant descendants with their company IDs could be found, but a lot of entries were not relevant for the bonus agreement in scope.

Before we show Ken’s solution and implementation, we should address the fundamental algorithmic question: why is this a graph, and why do we need recursion in our programming language?

If we look at the turnover sheet above, NA Sales and NA Corporate are not children of any other profit center and are therefore roots of our tree (APAC Sales is a root, too; we come to this later). If we then draw all parent-child relationships, we arrive at the directed graph below:

Directed graph for bonus agreement 1

Let’s assume that NA Sales and NA Corporate are explicitly mentioned on the bonus sheet. Then these two vertices are the rightful roots of the graph; no parents of these vertices are relevant for the turnover sum to be calculated. Furthermore, we remember that all relationships of all entities are in one big sheet, so the above table is only an extract of the entire sheet. The relevant roots are marked, and our program should find all relevant profit centers automatically. If we say that all the above entities are connected, then the rest of the big sheet only contains entities that are not connected to the two roots above. This is exactly the case for the tree with APAC Sales as root:

Directed graph for bonus agreement 2

Again, we assume that the rest of the big sheet is not connected to the above tree. In summary, our toy example contains two disjoint trees, and we know that the above tree represents a bonus sheet with only APAC Sales explicitly written on it. The recursiveness of the problem stems from the fact that in an automated solution we have no clue how many descendants our roots have and how many generations of descendants are connected to a root. In terms of our practical example this means that we do not isolate the connected entries of the big sheet for one bonus agreement, and we do not draw the hierarchy beforehand as above.

Implementation

At the core of the implementation is a standard breadth-first search algorithm (BFS):

Breadth-First-Search in Power Query

The algorithm takes an adjacency list, a queue, and a visited list as input:

  • The adjacency list is created from the big sheet with all entities. For our toy example it is {{1,{3}},{2,{3}},{3,{4,5}},{4,{6,7}},{5,{}},{6,{}},{7,{}},{8,{9,10}},{9,{}},{10,{}}}.
  • The queue consists of the roots of the regarded trees. In our first example these are 1 and 2, which represent the entries on the bonus sheet
  • The visited list is initially empty. The target is to visit all vertices of the graph that are connected to the roots.

As output, the visited list will be filled with the visited vertices during the recursion. It is worth mentioning again that although NA Operations has two parents (NA Sales and NA Corporate), NA Operations will be counted only once, as intended. In summary, with the right roots as input, the algorithm outputs all vertices relevant for the specific bonus sheet.

Power Query Tree Search Result

For bonus agreement 1 we get the right result:

Vertex
6
7
5
4
3
2
1
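
For illustration, here is the same breadth-first search translated into Python (the Power Query original is shown above as a screenshot):

```python
from collections import deque

# Adjacency list from the big sheet (IDs as in the turnover table)
adjacency = {1: [3], 2: [3], 3: [4, 5], 4: [6, 7], 5: [], 6: [], 7: [],
             8: [9, 10], 9: [], 10: []}

def bfs(adjacency, roots):
    visited, queue = [], deque(roots)
    while queue:
        vertex = queue.popleft()
        if vertex not in visited:      # a vertex with two parents is visited only once
            visited.append(vertex)
            queue.extend(adjacency.get(vertex, []))
    return visited

print(bfs(adjacency, [1, 2]))  # [1, 2, 3, 4, 5, 6, 7] -- vertices 8-10 stay untouched
```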

Now we can go over to DAX. All recursive work has been done; we only have to match the turnovers of the vertices, and we are done for this specific bonus agreement.

DAX Entity Column DAX Turnover Column

This is a view of the end result:

Vertex Entity Turnover
6 Games 3000
7 Puppets 5000
5 Bikes 7000
4 Toys 10000
3 NA Operations 2000
2 NA Corporate 500
1 NA Sales 6000
Total 33500

Key Takeaways

Power Query is recursive and DAX is not:
if possible, shift your recursive work to Power Query and pass the result to DAX for further processing.
The graph search algorithm used above is a basic form of AI:
you do not know the depth and descendants of your graph beforehand and let the computer figure it out.