data:image/s3,"s3://crabby-images/98d49/98d4978a397d8739d1967b2b27fe4396f65fb8a0" alt=""
Projecting simple moving averages (SMAs) into future forecast periods can be challenging when using dynamic array framework in Microsoft Excel and Zoho Sheet. In this blog, we take you through two approaches. The first approach is effective when we need the entire vector from periods 1 to T, while the second approach is more effective when you need to directly obtain the Tth forecast.
SMA is a widely used technique for forecasting
The Simple Moving Average (SMA) is a commonly used method for forecasting independent variables. It is computed as the average of the most recent N observations, where N represents the window size.
For example, when an analyst aims to forecast margins for a year, they might use the average margins of the previous N periods, such as 5 years.
When extending forecasts beyond the first forecast period, SMA estimates are often calculated iteratively. For example, if the analyst selects a window size of N, here is how the second and third forecast values are calculated:
The second forecasted value is calculated as the simple average of the last four actual observations and the first forecasted value.
The third forecasted value is calculated as the simple average of the last three actual observations and the second forecasted value.
Once the forecast window surpasses the moving average window (in our example, starting from the 6th forecasted period), all estimates will equal the average of the estimated values for the previous N periods.
This calculation is very simple using the scalar iterative (i..e legacy) approach in MS Excel. For instance, assuming that the actual data is in cells D19:D23, then the user would type the following formula in cell D24.
=Average(D19:23)
It would then be copy pasted across the subsequent rows, as shown in the illustrative image, below. More proficient users may use Averageifs or Dynamic ranges to make the formula adapt to different values of N.
data:image/s3,"s3://crabby-images/99a75/99a751e70cc963a7f76de4aaaee757f4d1cb8600" alt=""
Modeling SMA in Dynamic Array is challenging
This straightforward SMA calculation is quite complex to model as a dynamic array. Unlike the traditional method, estimates from the previous periods are not stored, making it impossible to use a legacy-like approach. Functions such as Scan() and Reduce() can only retain the value from the previous one iteration, while this calculation requires us to retain the past N iterations.
One alternative is to stack all moving averages using VStack or HStack inside a reduce function. This approach is workable for small forecast horizon.
The SMA Lambda would be as follows:
SMAForecast
=LAMBDA(Data, Window, T,
LET(
Vals, IF(ROWS(Data) < Window,
EXPAND(Data, Window, 1, AVERAGE(Data)),
TAKE(Data, -Window)),
Array, REDUCE(Vals, SEQUENCE(T),
LAMBDA(a, b,
VSTACK(a, AVERAGE(TAKE(a, -Window)))
)
),
DROP(Array, Window)
)
)
In this approach, we stack all the observations, including the initial emprical observations together. Finally we drop the initial observation to get propagated moving averages.
This approach is fairly efficient. It, however, produces an entire array.
At times, we may need to use moving average inside a table. Unfortunately, by design, we cannot have an array output in a table. We would need to use Index or Chooserow function to pull out the specific value fromt he above array to have a scalar output. However, generating the full array in every row to pick up only one observation, is inefficient.
Thus, in this paper, we explore a closed form approach using Matrix exponentiation to arrive directly at the estimate for Tth variable. It significantly reduces the number of iterations required to achieve the results.
Mathematical foundation for a closed-form approach
Our approach relies on the fact that the SMA for all the forecasted periods are essentially a derivative of the same emprical data set. However, the influence of each of data changes for different forecast Horizon.
For instance, if we are computing a 5 period moving average, then all the five emprical data points will have a weightage of 1/5 or 0.2 for the first forecast.
For computing the second forecast, we would drop the first emprical observation. Thus, it loses the weigth while the first forecast would gain that weight as it is introduced in the calculation. Giving 0.2 weight to first forecast essentially means that we are increasing the weight of each emprical observation by 0.2*0.2.
Table 1 shows how the weight of emprical observation propogates across estimates.
Table 1: Weight propagation of emprical data across forecast horizon
Observation | 1st Est. | 2nd Est. | 3rd Est. | 4th Est. | 5th Est. |
1 | 0.20 | 0.04 | 0.05 | 0.06 | 0.07 |
2 | 0.20 | 0.24 | 0.09 | 0.11 | 0.13 |
3 | 0.20 | 0.24 | 0.29 | 0.15 | 0.17 |
4 | 0.20 | 0.24 | 0.29 | 0.35 | 0.21 |
5 | 0.20 | 0.24 | 0.29 | 0.35 | 0.41 |
The shift in weights can be modeled as a matrix with the following equation:
.
The above equation gives the weight for the first window and it can be used when the forecast horizon is less than or equal to the window used for the moving average.
If the forecast horzion exceeds the window size, then the weight for subsequent window needs to be computed using Matrix exponentiation as follows:
Once we have the weight matrix, we can extract the weight vector for a given period T by extracting the relevant column from the WeightMatrix
The relevant column C would be
c = Mod(T-1,N)+1
The Lambda and accompanying comments are provided below. You can copy and paste it into the Advanced Formula Environment. If you do not have access to it, a version without comments is available further down. You can copy and paste it in Name manager
//This Lambda can be used to compute simple moving average across multiple forecast periods.
//The Lambda requires following arguments:
//Data: Emprical data that will be used for calculating moving average
//Window: Represents the number of lagged periods that needs to be used for calculating moving average
//T represent the forecast period. If you need forecast for 10 years, T will be equal to 10.
//IsGetSeries is an optional parameter. If it is omitted or is zero, the function will return the Tth value. If it is set to TRUE, then the function will return values for all periods from 1 to T
SMAforecast
= LAMBDA(Data, Window, T, [IsGetSeries],
Let(
//Assign values to variables to simplify formulas
N, Window,
Wgt, 1/N,
Fctr, 1 + Wgt,
//Generate sequence number from 1 to Windows size
P, Sequence(N),
P_T, TRANSPOSE(P),
//Adjusting the size of data if it is not same as window size
//If data size is smaller we'll pad it with averge of existing data
//If data size is larger, we'll drop the earlier values
VVector, IF(N > ROWS(Data),
EXPAND(Data, N, , AVERAGE(Data)),
TAKE(Data, -N)
),
//Compute the initial weight matrix
UUMatrix, (P_T > P) * 1,
Distance, P_T - P,
WMtrx, Wgt * (Fctr ^ (P_T - 1)
- Fctr ^ (Distance - 1) * UUMatrix),
//Create function to get value for a specific T
Func, LAMBDA(X,
LET(
//Get the number of windows in T
Block, ROUNDUP(X / N, 0),
//Perform Matrix exponentiation
Mtrx, REDUCE(WMtrx, SEQUENCE(Block),
LAMBDA(a, b,
IF(b = 1,
a,
MMULT(WMtrx, a)
)
)
),
//Identify and extract the relevant column vector
Col, MOD(X - 1, N) + 1,
WVector, CHOOSECOLS(Mtrx, Col),
//Compute moving average for the given T
SUM(WVector * VVector)
)
),
//Get all time periods for which the function needs to work
SqT, IF(IsGetSeries, SEQUENCE(T), T),
//Apply the moving average function for the relvant periods
Return, MAP(SqT, Func),
Return
)
);
For this without AFE, go to Name Manger in the Formula tab. Click on New and give the name as SMAForecast and then in the Refers to section, copy and paste the following code.
= LAMBDA(Data, Window, T, [IsGetSeries],
Let(N, Window,
P, Sequence(N,1),
P_T, TRANSPOSE(P),
Wgt, 1/N,
Fctr, 1 + Wgt,
VVector, IF(N > ROWS(Data),
EXPAND(Data, N, , AVERAGE(Data)),
TAKE(Data, -N)
),
UUMatrix, (P_T > P) * 1,
Distance, P_T - P,
WMtrx, Wgt * (Fctr ^ (P_T - 1)
- Fctr ^ (Distance - 1) * UUMatrix),
SqT, IF(IsGetSeries, SEQUENCE(T), T),
Func, LAMBDA(X,
LET(Block, ROUNDUP(X / N, 0),
Col, MOD(X - 1, N) + 1,
Mtrx, REDUCE(WMtrx, SEQUENCE(Block),
LAMBDA(a, b,
IF(b = 1,
a,
MMULT(WMtrx, a)
)
)
),
WVector, CHOOSECOLS(Mtrx, Col),
SUM(WVector * VVector)
)
),
Return, MAP(SqT, Func),
Return
)
)
You can also download the .xlsx file containing the implementation of the solution.
The Lambda has the following limitations:
Sequence function has an upper limit of 2^20. This limits the number of forecast period to 2^20*Window size. However, this size mut be large enough for most purposes.
It expect emprical data to be given as a column vector and the output is also given as a column vector. Suitable modfiication would be required for working with row vectors.
There is reasonable lag for large T's when the IsGetSeries is set to 1.
コメント