As has been noted, Bubble used classical regression to develop its new pricing model. That methodology is invalid when applied to multiplicative processes like compute resource consumption, because the statistics of such processes violate regression's core assumptions of centrality, normality, and homoscedasticity. But what is a multiplicative process in the context of computing? To illustrate, I’ll describe a simple coin-flipping game that stands in for the growth and shrinkage of compute resource usage:
Imagine you are running a server and counting the operations per second it performs. Each process can randomly lead to more processes. The system follows these rules (a short simulation sketch follows the list):
- At the start your server computes one operation per second.
- For the next step you flip a coin.
- Heads, you double the operations per second.
- Tails, you halve the operations per second.
- You repeat this once per second for an hour, 3,600 seconds in total.
- Then you add up the total number of operations you ran by summing the operations per second over all of the seconds.
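Here is a minimal sketch of that game in Python. The function name, the number of simulated runs, and the summary statistics are mine; the rules themselves (start at 1 op/sec, flip once per second for 3,600 seconds, double on heads, halve on tails, sum the per-second rates) come straight from the list above.

```python
import random

def simulate_total_ops(seconds=3600, up=2.0, down=0.5, p_heads=0.5, seed=None):
    """Play one round of the coin-flip game and return the total operations.

    The rate starts at 1 op/sec; each second it is multiplied by `up`
    (heads) or `down` (tails), and the total is the sum of the
    per-second rates over the whole run.
    """
    rng = random.Random(seed)
    rate = 1.0
    total = 0.0
    for _ in range(seconds):
        rate *= up if rng.random() < p_heads else down
        total += rate
    return total

if __name__ == "__main__":
    runs = sorted(simulate_total_ops(seed=i) for i in range(10_000))
    print("median total ops:", runs[len(runs) // 2])
    print("mean total ops:  ", sum(runs) / len(runs))
    print("largest run:     ", runs[-1])
```

Even with a modest number of runs, the sample mean lands many orders of magnitude above the median, dragged up by a handful of monstrous runs; that gap is the whole story of the next two paragraphs.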
Now, it turns out this process leads to an obscure piece of mathematics that I happened to study deeply in graduate school: Integrated Geometric Brownian Motion (IGBM). This matters because the key feature of IGBM is that the average total resource consumption grows exponentially in time. Notice that the coin above is perfectly fair, with no bias toward growth at all, and yet the average multiplier per flip is (2 + 1/2)/2 = 1.25; the same thing happens for any weighting of the coin and any step sizes whose average multiplier exceeds one. As long as the upside of a flip outweighs its downside on average, the whole thing will, on average, get out of hand exponentially.
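To see where the exponential comes from, here is a back-of-the-envelope check for the fair double-or-halve coin above (a sketch, not a full IGBM derivation). Writing R_n for the rate in operations per second after n flips, with R_0 = 1:

$$
\mathbb{E}[R_{n+1} \mid R_n] = \tfrac{1}{2}(2R_n) + \tfrac{1}{2}\,\tfrac{R_n}{2} = \tfrac{5}{4}R_n
\quad\Longrightarrow\quad
\mathbb{E}[R_n] = \left(\tfrac{5}{4}\right)^{n},
$$

so the expected total over the hour, the sum of E[R_n] for n = 1 to 3,600, is a geometric series that is exponential in the length of the window. The typical path, meanwhile, sees roughly as many halvings as doublings and hovers around one operation per second; the average is carried almost entirely by rare runaway paths.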
What is worse, the distribution is fat-tailed, meaning some runs will get very, very large even though most of them stay within reasonable limits. This also means that regression models, which are intrinsically linear and additive, will badly underestimate the probability of these large events occurring. Basically, by relying on the normal distribution, the model they used assumes that large events are rare; if that assumption held, the cost it assigns to those events would be fair. Unfortunately, reality begs to differ, and those large events happen far more frequently than the regression model's means and standard deviations would indicate.
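To make that mismatch concrete, here is a rough sketch, reusing the same simulated game as above with run counts of my choosing, of what happens when you summarize these totals with just a mean and a standard deviation, which is, in spirit, what a normal-error regression does.

```python
import math
import random

def simulate_total_ops(seconds=3600, seed=None):
    # Same fair double-or-halve game as in the earlier sketch.
    rng = random.Random(seed)
    rate, total = 1.0, 0.0
    for _ in range(seconds):
        rate *= 2.0 if rng.random() < 0.5 else 0.5
        total += rate
    return total

totals = [simulate_total_ops(seed=i) for i in range(10_000)]

# The only two numbers a normal model is built from.
mu = sum(totals) / len(totals)
sigma = math.sqrt(sum((t - mu) ** 2 for t in totals) / (len(totals) - 1))

# How extreme does the single largest hour look through normal-distribution glasses?
biggest = max(totals)
z = (biggest - mu) / sigma
normal_tail = 0.5 * math.erfc(z / math.sqrt(2))  # P(X > biggest) under the fitted normal

print(f"largest hour sits {z:.1f} standard deviations above the mean")
print(f"probability the normal fit gives it: {normal_tail:.3e}")
print(f"how often it actually happened here: {1 / len(totals):.3e}")
```

The exact numbers vary from run to run, but the shape of the mismatch does not: the fitted normal treats its own largest observation as an essentially impossible, many-tens-of-sigma event, while the simulation keeps producing hours like it at a perfectly noticeable rate.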
There are other dead giveaways that serious power laws are at work, not least the extraordinarily large number of zombie applications (applications that are nearly empty) compared with the number of applications that have a functioning subscriber base.