January 28, 2023

Supply and Demand Model for Bitcoin Price

In this article, the author tries to simulate the price of bitcoin as a function of supply and demand. He explores the possibility of using a double logarithmic time model for supply and demand and rejects it because of serious heteroskedasticity. Then using the program auto.arima, he finds a fairly productive modelautoregressive integrated moving average. After that, he uses the supply and demand forecast built using ARIMA to model the future price of bitcoin, taking into account the fact that his supply volume is known.


  • Minimum required preliminary reading:
    • https://bitnovosti.com/2019/04/03/modelirovanie-tseny-bitkojna-ishodya-iz-ego-defitsitnosti/
    • https://bitnovosti.com/2019/09/27/falsifitsirovanie-koeffitsienta-stock-to-flow-kak-modeli-stoimosti-bitkojna/
    • https://ru.wikipedia.org/wiki/Demand Law
  • The analysis was performed using Stata 14 and R 3.4.4
  • Not a financial recommendation.
  • All models are incorrect, but some of them are useful.


Stock to Flow ratio, the designation of which is here we will reduce for convenience to St / f, as has been proved (1, 2), is a false predictor of the price of bitcoin.

Common criticism of price modeling with St / f lies in the fact that this does not take into account the influence of demand. After all, price is a function of supply volume. and demand. IN St / f supply is simulated, but how do models built on the basis of this metric respond to changes in demand?

In this article, we will take for absolutetruth two ideas, even though they may not be. We will call these truths axioms, and we will formulate these axioms so as to establish some basis on the basis of which it will be possible to develop a model of supply and demand.


Traditionally, the calculated value of the statistical parameter is indicated by a “header” above the symbol. Here we will use [] instead; estimated value β = [β]. The 2 × 2 matrix will be represented as [r1c1, r1c2 r2c1, r2c2], etc. To denote indexed elements we will use the @ symbol - for example, for the 10th position in the vector X, X is usually used with the subscript 10. Instead, we will write [email protected]

Axiom 1: Price is a function of supply and demand

Quoting the English-language Wiki,

In microeconomics, supply and demand areeconomic model of market pricing. It is postulated that, ceteris paribus, in the competitive market, the unit price of a certain product or other object of trade, such as labor or liquid financial assets, will change until it reaches the point at which demand (for the current price) is equal to supply ( at the current price) and the economic balance will not be established.

Assume that: price (P) = demand (D) / sentence (S) An increase in supply with a constant level of demand leads to a decrease in prices. Demand growth D at a constant level of supply leads to higher prices.

Here we determine the increase in the number of assets (flow in the coefficient St / f) as a monthly increase to avoid confusion withlong-term effects. Now, let's assume that the supply side of Bitcoin is modeled from the reverse with respect to scarcity (i.e., from abundance), i.e. S = 1 / St / F + ε = F / St + ε, Where ε - this is some arbitrary error. Our equation for price in this case will look like this: P= D/ (F / St + ε). Suppose also that demand D is also some derivative. And suppose that ε is an independent and randomly distributed value, with an arithmetic mean of about 0.1, and therefore it can (so far) be ignored in the model.

Then we get that P = D / (F / St), from which it follows that D = PF / St.

Axiom 2: Demand is a function of time t

We add the condition that demand is modeled by some function of time f (t) = D = βt.

Regular Least Squares Regression (OLS) -This is a method of finding a linear relationship between two or more variables. To begin, let's define a linear model as some function X, which is equal to Y with some error:

Y = βX + ε

where Y is a dependent variable, X is an independent variable, ε Is the magnitude of the error, and β - multiplier X. OLS task is to print the value β so as to minimize ε.

In order to derive a reliable calculated value [β], it is necessary to observe some basic conditions:

  1. The presence of a linear relationship between dependent and independent variables
  2. Homoskedasticity (i.e. constant dispersion) of errors
  3. The average value of the error distribution is usually zero
  4. Lack of autocorrelation of errors (that is, they do not correlate with the sequence of errors taken with a time shift)

Now we can calculate [D] using the least squares model [D] = [β] t + ε.


First, take a look at the unaltered scatterplot for the relationship of demand to time.

Figure 1. The relation of demand to time looks as potentially exponential.

In Figure 1 we see a familiar pattern of exponential growth. For such cases, as a rule, the double logarithmic model is well suited (Figure 2).

Figure 2. A straight line in a double logarithmic plot indicates a good match for the exponential ratio.

Figure 3. Bootstrap double logarithmic regression (using robust estimation for variance)

As shown in Figure 3, our calculated value is log ([D]) = 3.98log (t) -16, from which we can conclude that for each 10% increase over time, we expect an increase in demand by 46% (e.g. 1.10 ^ 3.98 = 1.46)

Figure 4. Estimated values ​​(green line) and actual logarithmic demand values ​​(red line).

Using this model, we can now determine the residuals [ε] and calculated values ​​[Y], and also check compliance with other conditions.


Subject to the condition for the constancy of variance inthe error value (i.e., homoskedasticity), the error for each value of the predicted value fluctuates arbitrarily around zero. Therefore, the graph of the ratio of residual value to estimated (Fig. 5) is a simple and effective way to graphically verify this condition. In Fig. 5, we observe a specific pattern, rather than random scattering, which indicates a significant variability in the variance in the error (i.e., heteroskedasticity).

Figure 5. Graph of the ratio of residual value to estimated. The presence of a pattern indicates a possible problem.

A consequence of such heteroskedasticity is a much larger dispersion and, consequently, lower accuracy of the calculated coefficient values ​​[β]. In addition, it leads to an exaggerated significance of p-values, since the OLS method does not reveal increased variance. Therefore, to calculate t- and F-values, we use an underestimated dispersion value, leading to a higher (unreliable) significance. It also affects the 95% confidence interval for [β], which is also a function of variance(through standard error). To try to improve this situation, we used a robust “sandwich” assessment (Huber's estimate) to determine the variance and bootstrap (this is a form of re-sampling) of the regression. However, these results indicate that even after all the adjustments made, we still cannot trust the results of this capture of the usual least squares. We can say that every OLS-model of time and price is subject to this problem (as, for example, given here). Therefore, instead of it, we will explore another, more appropriate time model - the ARIMA model.


More appropriate than simple time regressionor its variations, ARIMA is a method developed to model changes in time series over time. ARIMA stands for Auto Regressive Integrated Moving Averages, which translates to "autoregressive integrated moving averages." This method includes a whole class of models that explain time series based on their own past values ​​- such as lags or delayed forecast errors. Any time series showing a certain pattern and not being random white noise can be simulated using the ARIMA model (or its modified version).

The basic ARIMA models are defined by three members: p, d, q,


  • p is the autoregressive order (AR),
  • q is the order of the moving average (MA) and
  • d is the number of differentiations required to make the time series stationary (I)

Using R program auto.arima from the forecast package allows you to chooseARIMA-model that meets the Akaike information criterion - the program goes through various combinations of p, q and d and finds the best match. Here we can see that the program chose the order of autoregression 3, the order of the moving average 1 and the order of integration 2 (interestingly, auto.arima uses the KPSS test, which is already familiar to those who read the article on factor falsification, to determine the optimal integration order St / f)

Figure 6. R auto.arima program automatically selected the best parameters for ARIMA.

In Figure 6, we determined the coefficients for ARIMA.

Figure 7. The degree of statistics compliance for the model shown in Figure 6.

Now, watching the square root ofThe standard error of the model (RMSE) in Figure 7, we expect the formation of a small difference between projected and actual demand. The graph in Figure 8 clearly shows that this model estimates historical demand much more accurately than OLS.

Figure 8. The calculated ARIMA values ​​for the logarithmic demand values. The level of compliance is much higher than when using the OLS method.

The process of creating a dynamic forecast from ARIMAit is difficult to express here in the form of formulas, but if you are interested in studying this question in full detail, then take the time and familiarize yourself with these works: 1, 2 (English).

Figure 9. ARIMA forecast for the logarithmic demand values ​​for the next 2 years.

How does our demand forecast look on a linear scale?

Figure 10 - Forecast of supply and demand from ARIMA in linear space.

Connection models

Figure 11. Demand and supply depending on the price.

Now we can combine our forecast data and the expected values ​​of stocks and growth (amount of asset) to calculate the forecast price.

Earlier, we established that P = D / (F / St) see axiom 1.

We know what they will be (with a slightmargin of error) growth and stocks over time, so we can combine these numbers with the demand forecast obtained using ARIMA. The result is shown in Figure 12.

Figure 12. The actual price is marked with dots, the price of the model with a line. As you can see on the chart, according to this model, we can expect that the price of bitcoin will exceed $ 100,000 by the end of 2021.


We presented a simple and relatively concisea supply and demand model for the price of Bitcoin, in which supply is modeled on the basis of abundance (that is, from the opposite to the deficit, determined through the ratio of stocks to growth). This basic model has the potential to expand, in particular by studying demand models based on variables other than time.


  • This forecast relies heavily onARIMA. ARIMA calculations may turn out to be incorrect - forecast models often turn out to be erroneous. They are nothing more than a way to simplify reality in such a way as to help us better understand it. In this article, we tried to simulate the price of bitcoin as a derivative of supply and demand.
  • We did not perform any diagnostic tests.to verify ARIMA, and how much to trust the presented results, remains entirely at the discretion of the reader. Our goal was only to find a way to model price as a function of supply and demand, and not find the best supply and demand model. This task remains as an exercise for the inquisitive reader.
  • In addition, our second axiom may well be wrong. Time can be a good substitute for a true acceptance curve, but it is unlikely that demand alone could be explained to them.
  • The whole idea that price is justfunction of supply and demand (i.e., axiom 1), it is likely that it does not fully describe the real state of things. There may well be feedback loops and other structural relationships (as well as emotionality of consumption, etc.) that are not taken into account in this simple equation. Stay tuned for further development and research into potential structural relationships.