Any metric that is measured over regular time intervals forms a time series.
Analysis of time series is commercially important because of industrial need and relevance, especially with respect to forecasting (demand, sales, supply, etc.).
This is the second part of the time series analysis series, in which we consider fundamental issues related to time series models.
Just to sum up, before we continue, please take a look at the most popular functions that can help you handle time series :-)
library(tidyverse)
library(forecast)
myts <- ts(df, start = c(1981,1), frequency = 12)
autoplot(): Useful function to plot data and forecasts
ggseasonplot(): Create a seasonal plot
ggsubseriesplot(): Create mini plots for each season and show seasonal means
gglagplot(): Plot the time series against lags of itself
ggAcf(): Plot the autocorrelation function (ACF)
ggPacf(): Plot the partial autocorrelation function (PACF)
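For a quick look at these in action, here is a short sketch using the built-in AirPassengers series (chosen only because it is a convenient monthly example):
autoplot(AirPassengers)          # plot the raw series
ggseasonplot(AirPassengers)      # one line per year
ggsubseriesplot(AirPassengers)   # mini plot per month with seasonal means
gglagplot(AirPassengers)         # series plotted against its lags
ggAcf(AirPassengers)             # autocorrelation function
ggPacf(AirPassengers)            # partial autocorrelation function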
White Noise is another name for a time series of iid data. Purely random. Ideally your model residuals should look like white noise.
You can use the Ljung-Box test to check if a time series is white noise, here is an example with 24 lags:
Box.test(data, lag = 24, type = "Ljung-Box")
A p-value > 0.05 suggests the data are not significantly different from white noise.
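For instance, running the test on simulated white noise (a quick sketch):
set.seed(42)
wn <- ts(rnorm(120))                          # simulated white noise
Box.test(wn, lag = 24, type = "Ljung-Box")    # expect a large p-value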
The forecast package includes a few common models out of the box. Fit the model to obtain a model object, then pass it to forecast() along with the number of periods h you want to predict.
Example of the workflow:
train <- window(data, start = 1980)   # slice the series for training
fit <- naive(train)                   # fit a benchmark model
checkresiduals(fit)                   # residual diagnostics
pred <- forecast(fit, h = 4)          # forecast 4 periods ahead
accuracy(pred, data)                  # accuracy measures
Useful to benchmark against naive and seasonal naive models.
naive(): naive forecast (all forecasts equal the last observed value)
snaive(): seasonal naive forecast (each forecast equals the last value from the same season)
Residuals are the differences between the actual data and the model's fitted values. They should look like white noise: uncorrelated and with zero mean. Ideally they should also have constant variance and be normally distributed.
checkresiduals(): helper function to plot the residuals, plot the ACF and histogram, and do a Ljung-Box test on the residuals.
Train/Test split with window function:
window(data, start, end): to slice the ts data
Use accuracy() on the forecast and the test set:
accuracy(forecast, testset): provides accuracy measures such as MAE, RMSE, MAPE, MASE, etc.
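A minimal sketch of a train/test split with accuracy(), assuming the ausbeer series from the fpp2 package:
beer_train <- window(ausbeer, end = c(2006, 4))    # training period
beer_test  <- window(ausbeer, start = c(2007, 1))  # hold-out period
fc <- snaive(beer_train, h = length(beer_test))    # seasonal naive benchmark
accuracy(fc, beer_test)                            # test-set accuracy measures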
Backtesting with one-step-ahead forecasts, also known as time series cross-validation, can be done with the helper function tsCV().
tsCV(): returns forecast errors, given a forecastfunction that returns a forecast object and the number of steps ahead h. For simple methods such as naive(), the errors at h = 1 coincide with the model residuals.
Here's an example using the naive() model, forecasting one period ahead:
tsCV(data, forecastfunction = naive, h = 1)
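And a slightly fuller sketch that turns the cross-validated errors into an RMSE (WWWusage is used only as a convenient example series; for naive() at h = 1 this matches the residual RMSE):
e <- tsCV(WWWusage, forecastfunction = naive, h = 1)
sqrt(mean(e^2, na.rm = TRUE))                            # cross-validated RMSE
sqrt(mean(residuals(naive(WWWusage))^2, na.rm = TRUE))   # residual RMSE (same here)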
Exponential Models
ses(), holt(), and hw() fit simple, trend, and seasonal exponential smoothing models; for hw(), set seasonal = "additive" for the additive version or seasonal = "multiplicative" for the multiplicative version.
ETS Models
The forecast package includes a function ets() for your exponential smoothing models. ets() estimates parameters using the likelihood of the data arising from the model, and selects the best model using corrected AIC (AICc), searching over:
Error = {A, M}
Trend = {N, A, Ad}
Seasonal = {N, A, M}
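A minimal sketch of ets() in use (again on AirPassengers, purely as an example):
fit_ets <- ets(AirPassengers)         # model form selected by AICc
summary(fit_ets)                      # shows the selected ETS(Error,Trend,Seasonal) form and smoothing parameters
autoplot(forecast(fit_ets, h = 24))   # two years of forecasts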
ARIMA
Arima(): implementation of the ARIMA model; set include.constant = TRUE to include the constant (which acts as a drift term when d = 1).
auto.arima(): automatic implementation of the ARIMA function in forecast. It estimates parameters using maximum likelihood and does a stepwise search over a subset of all possible models. It can take a lambda argument to fit the model to transformed data, in which case the forecasts are back-transformed onto the original scale. Set stepwise = FALSE to consider more models at the expense of more computation time.
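And a sketch of auto.arima() with a Box-Cox transformation (lambda = 0 corresponds to a log transform):
fit_aa <- auto.arima(AirPassengers, lambda = 0)   # fit on the log scale
fit_aa                                            # print the selected model
autoplot(forecast(fit_aa, h = 24))                # forecasts are back-transformed automatically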
TBATS
Automated model that combines exponential smoothing, Box-Cox transformations, and Fourier terms. Pro: Automated, allows for complex seasonality that changes over time. Cons: Slow.
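A one-line sketch, fitted to the same example series (this can take a while on long series):
fit_tb <- tbats(AirPassengers)
autoplot(forecast(fit_tb, h = 24))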
Using the naive method, all forecasts are simply set to the value of the last observation. That is, \[ \hat{y}_{T+h|T} = y_{T}. \] This method works remarkably well for many economic and financial time series.
Using the average method, the forecasts of all future values are equal to the mean of the historical data. If we let the historical data be denoted by \(y_{1},\dots,y_{T}\), then we can write the forecasts as \[ \hat{y}_{T+h|T} = \bar{y} = (y_{1}+\dots+y_{T})/T. \] The notation \(\hat{y}_{T+h|T}\) is a short-hand for the estimate of \(y_{T+h}\) based on the data \(y_1,\dots,y_T\).
A similar method is useful for highly seasonal data. In this case, we set each forecast to be equal to the last observed value from the same season of the year (e.g., the same month of the previous year). Formally, the forecast for time \(T+h\) is written as \[ \hat{y}_{T+h|T} = y_{T+h-km}, \] where \(m=\) the seasonal period, \(k=\lfloor (h-1)/m\rfloor+1\), and \(\lfloor u \rfloor\) denotes the integer part of \(u\). That looks more complicated than it really is.
For example, with monthly data, the forecast for all future February values is equal to the last observed February value. With quarterly data, the forecast of all future Q2 values is equal to the last observed Q2 value (where Q2 means the second quarter). Similar rules apply for other months and quarters, and for other seasonal periods.
A variation on the naive method is to allow the forecasts to increase or decrease over time, where the amount of change over time (called the “drift”) is set to be the average change seen in the historical data. Thus the forecast for time \(T+h\) is given by \[ \hat{y}_{T+h|T} = y_{T} + \frac{h}{T-1}\sum_{t=2}^T (y_{t}-y_{t-1}) = y_{T} + h \left( \frac{y_{T} -y_{1}}{T-1}\right). \] This is equivalent to drawing a line between the first and last observations, and extrapolating it into the future.
The figure below shows the first three of these benchmark methods applied to the quarterly beer production data.
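The figure itself is not reproduced here; a sketch along these lines regenerates it, assuming the ausbeer series from the fpp2 package is the beer production data:
beer2 <- window(ausbeer, start = 1992, end = c(2007, 4))
autoplot(beer2) +
  autolayer(meanf(beer2, h = 11), series = "Mean", PI = FALSE) +
  autolayer(naive(beer2, h = 11), series = "Naive", PI = FALSE) +
  autolayer(snaive(beer2, h = 11), series = "Seasonal naive", PI = FALSE) +
  ggtitle("Forecasts for quarterly beer production") +
  guides(colour = guide_legend(title = "Forecast"))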
Now it's your turn. Use the WWWusage and bricksq data. Check which of meanf(), naive(), or snaive() is more appropriate in each case.
?WWWusage
?bricksq
WWWusage %>% autoplot()
fc <- naive(WWWusage)
res <- residuals(fc)
autoplot(res)
checkresiduals(fc)
# Residuals are correlated, as shown by both the Ljung-Box test and the ACF.
# They seem to be normally distributed. There is considerable information
# remaining in the residuals which has not been captured with the naive method.
# Now use the above for the bricksq data and answer the same questions.
\[ y_t = \beta_0 + \beta_1 x_{1,t} + \beta_2 x_{2,t} + \cdots + \beta_kx_{k,t} + \varepsilon_t. \]
Linear trend \[ x_t = t \]
Seasonal dummies
Outliers
Public holidays
Regression model: \[ y_t = \beta_0 + \beta_1 t + \beta_2d_{2,t} + \beta_3 d_{3,t} + \beta_4 d_{4,t} + \varepsilon_t \]
fit.beer <- tslm(beer ~ trend + season)
summary(fit.beer)
##
## Call:
## tslm(formula = beer ~ trend + season)
##
## Residuals:
## Min 1Q Median 3Q Max
## -14.96 -6.25 -0.58 4.72 18.86
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 153.1958 4.4262 34.61 < 2e-16 ***
## trend -0.2158 0.0748 -2.89 0.0061 **
## season2 -8.7842 5.6747 -1.55 0.1290
## season3 8.8317 5.6761 1.56 0.1271
## season4 -6.9525 5.6786 -1.22 0.2275
## season5 -6.3367 5.6820 -1.12 0.2710
## season6 -18.7208 5.6865 -3.29 0.0020 **
## season7 -12.1050 5.6919 -2.13 0.0392 *
## season8 -2.4892 5.6983 -0.44 0.6644
## season9 -6.8683 6.0202 -1.14 0.2602
## season10 20.0975 6.0225 3.34 0.0018 **
## season11 36.8133 6.0258 6.11 2.5e-07 ***
## season12 39.7792 6.0300 6.60 4.9e-08 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 8.97 on 43 degrees of freedom
## Multiple R-squared: 0.836, Adjusted R-squared: 0.791
## F-statistic: 18.3 on 12 and 43 DF, p-value: 3.97e-13
From the output above, there is an average downward trend of about 0.22 megalitres per period. The seasonal coefficients give the average difference from the first season: for example, season 6 has production about 18.7 megalitres lower than season 1, while seasons 11 and 12 are about 36.8 and 39.8 megalitres higher, respectively.
Now we can run the usual OLS diagnostics.
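A minimal sketch of those checks for the fit.beer object above (checkresiduals() on a tslm fit also runs a Breusch-Godfrey test for serial correlation):
checkresiduals(fit.beer)                      # residual plot, ACF, histogram, Breusch-Godfrey test
qplot(fitted(fit.beer), residuals(fit.beer))  # residuals vs fitted values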
The data set fancy concerns the monthly sales figures of a shop which opened in January 1987 and sells gifts, souvenirs, and novelties. The sales volume varies with the seasonal population of tourists. There is a large influx of visitors to the town at Christmas and for the local surfing festival, held every March since 1988. Over time, the shop has expanded its premises, range of products, and staff.
?fancy
#The last feature above suggests taking logs to make the pattern (and variance) more stable
autoplot(log(fancy)) + ylab("log Sales")
#Create festival dummy (TRUE in March):
festival <- cycle(fancy) == 3
festival[3] <- FALSE   # March 1987 predates the festival, which started in 1988
# Fit linear model to logged data (by specifying lambda=0)
fit <- tslm(fancy ~ trend + season + festival, lambda = 0)
autoplot(residuals(fit))
qplot(fitted(fit), residuals(fit))
#The residuals are serially correlated. They reveal nonlinearity in the trend.
coefficients(fit)
checkresiduals(fit)
#The model can be improved by taking into account nonlinearity of the trend.
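# One possible refinement (a sketch only): add a quadratic trend term
fit2 <- tslm(fancy ~ trend + I(trend^2) + season + festival, lambda = 0)
checkresiduals(fit2)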
Simple exponential smoothing: each forecast is a weighted average of the most recent observation and the previous forecast, \[ \hat{y}_{t+1|t} = \alpha y_{t} + (1-\alpha)\hat{y}_{t|t-1}. \]
Holt's exponential smoothing approach can fit a time series that has both an overall level and a trend (slope).
\(b_t\): slope
State space form: \[ \begin{aligned} y_t&=\ell_{t-1}+b_{t-1}+\varepsilon_t\\ \ell_t&=\ell_{t-1}+b_{t-1}+\alpha \varepsilon_t\\ b_t&=b_{t-1}+\alpha\beta^* \varepsilon_t \end{aligned} \]
For simplicity, set \(\beta=\alpha \beta^*\).
An alpha smoothing parameter controls the exponential decay for the level, and a beta smoothing parameter controls the exponential decay for the slope
Each parameter ranges from 0 to 1, with larger values giving more weight to recent observations
\[ \begin{aligned} \text{Forecast }&& {y}_{t+h|t} &= \ell_{t} + hb_{t} \\ \text{Level }&& \ell_{t} &= \alpha y_{t} + (1 - \alpha)(\ell_{t-1} + b_{t-1})\\ \text{Trend }&& b_{t} &= \beta^*(\ell_{t} - \ell_{t-1}) + (1 -\beta^*)b_{t-1}, \end{aligned} \]
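In R, Holt's method is available through holt(); a small sketch, assuming the annual ausair series from the fpp2 package as a trended example:
fc_holt <- holt(ausair, h = 5)
summary(fc_holt)      # shows the estimated alpha (level) and beta (trend) parameters
autoplot(fc_holt)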
Holt-Winters additive method (state space form): \[ \begin{aligned} \text{Observation equation}&& y_t&=\ell_{t-1}+b_{t-1}+s_{t-m} + \varepsilon_t\\ \text{State equations}&& \ell_t&=\ell_{t-1}+b_{t-1}+\alpha \varepsilon_t\\ && b_t&=b_{t-1}+\beta \varepsilon_t \\ && s_t &= s_{t-m} + \gamma\varepsilon_t \end{aligned} \]
Component form (forecast and smoothing equations): \[ \begin{aligned} {y}_{t+h|t} &= \ell_{t} + hb _{t} + s_{t+h-m(k+1)} \\ \ell_{t} &= \alpha(y_{t} - s_{t-m}) + (1 - \alpha)(\ell_{t-1} + b_{t-1})\\ b_{t} &= \beta^*(\ell_{t} - \ell_{t-1}) + (1 - \beta^*)b_{t-1}\\ s_{t} &= \gamma (y_{t}-\ell_{t-1}-b_{t-1}) + (1-\gamma)s_{t-m} \end{aligned} \]
Holt-Winters multiplicative method (state space form): \[ \begin{aligned} \text{Observation equation}&& y_t&= (\ell_{t-1}+b_{t-1})s_{t-m}(1 + \varepsilon_t)\\ \text{State equations}&& \ell_t&=(\ell_{t-1}+b_{t-1})(1+\alpha \varepsilon_t)\\ && b_t&=b_{t-1}(1+\beta \varepsilon_t) \\ &&s_t &= s_{t-m}(1 + \gamma\varepsilon_t) \end{aligned} \]
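Both Holt-Winters versions are available through hw(); a sketch on a monthly example series:
fc_add  <- hw(AirPassengers, seasonal = "additive", h = 24)
fc_mult <- hw(AirPassengers, seasonal = "multiplicative", h = 24)
autoplot(AirPassengers) +
  autolayer(fc_add, series = "HW additive", PI = FALSE) +
  autolayer(fc_mult, series = "HW multiplicative", PI = FALSE)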
Data set books contains the daily sales of paperback and hardcover books at the same store. The task is to forecast the next four days' sales for paperback and hardcover books.
?books
fcast1 <- ses(books[,"Hardcover"], h=4)
fcast2 <- ses(books[,"Paperback"], h=4)
autoplot(books) +
autolayer(fcast1, series="Hardcover", PI=FALSE) +
autolayer(fcast2, series="Paperback", PI=FALSE)
accuracy(fcast1)
accuracy(fcast2)
# Use the "holt" and "hw" functions and check the accuracy once again.
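# A sketch for the holt() part; hw() needs a seasonal series, while books is
# non-seasonal daily data, so only Holt's method is shown here.
fcast3 <- holt(books[, "Hardcover"], h = 4)
fcast4 <- holt(books[, "Paperback"], h = 4)
accuracy(fcast3)
accuracy(fcast4)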
While exponential smoothing models are based on a description of the trend and seasonality in the data, ARIMA models aim to describe the autocorrelations in the data.
Autoregressive (AR) models:
\[ y_{t} = c + \phi_{1}y_{t - 1} + \phi_{2}y_{t - 2} + \cdots + \phi_{p}y_{t - p} + \varepsilon_{t} \] where \(\varepsilon_t\) is white noise. This is a multiple regression with lagged values of \(y_t\) as predictors. Autoregressive models are remarkably flexible at handling a wide range of different time series patterns.
Before we continue, please plot the ACF and PACF for both of those objects (named w1 and w2); a simulation sketch is given after the definitions below.
Any conclusions? Can you see why those are AR(1) and AR(2) processes?
\(y_{t} = 2 -0.8 y_{t - 1} + \varepsilon_{t}\)
\(\varepsilon_t\sim N(0,1)\), \(T=100\).
\(y_{t} = c + \phi_1 y_{t - 1} + \varepsilon_{t}\)
\(y_t = 8 + 1.3y_{t-1} - 0.7 y_{t-2} + \varepsilon_t\\\)
\(\varepsilon_t\sim N(0,1)\), \(T=100\).
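A sketch of how w1 and w2 could be simulated and inspected (stats::filter is used explicitly because dplyr masks filter()):
set.seed(1)
e <- rnorm(100)
w1 <- stats::filter(2 + e, filter = -0.8, method = "recursive")         # AR(1): y_t = 2 - 0.8 y_{t-1} + e_t
w2 <- stats::filter(8 + e, filter = c(1.3, -0.7), method = "recursive") # AR(2): y_t = 8 + 1.3 y_{t-1} - 0.7 y_{t-2} + e_t
ggAcf(w1)   # alternating, geometrically decaying ACF
ggPacf(w1)  # single significant spike at lag 1
ggAcf(w2)   # damped sine-wave pattern
ggPacf(w2)  # cuts off after lag 2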
We normally restrict autoregressive models to stationary data, in which case some constraints on the values of the parameters are required.
General condition for stationarity: Complex roots of \(1-\phi_1 z - \phi_2 z^2 - \dots - \phi_pz^p\) lie outside the unit circle on the complex plane.
Rather than using past values of the forecast variable in a regression, a moving average model uses past forecast errors in a regression-like model.
Moving Average (MA) models: \[ y_{t} = c + \varepsilon_t + \theta_{1}\varepsilon_{t - 1} + \theta_{2}\varepsilon_{t - 2} + \cdots + \theta_{q}\varepsilon_{t - q}, \]
where \(\varepsilon_t\) is white noise.
This is a multiple regression with past errors as predictors. Don’t confuse this with moving average smoothing!
Before we continue, please plot the ACF and PACF for both of those objects (named ma1 and ma2); a simulation sketch is given after the definitions below.
Any conclusions? Can you see why those are MA(1) and MA(2) processes?
\(y_t = 20 + \varepsilon_t + 0.8 \varepsilon_{t-1}\)
\(\varepsilon_t\sim N(0,1)\), \(T=100\).
\(y_t = \varepsilon_t -\varepsilon_{t-1} + 0.8 \varepsilon_{t-2}\)
\(\varepsilon_t\sim N(0,1)\), \(T=100\).
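Similarly, a sketch of how ma1 and ma2 could be simulated:
set.seed(1)
ma1 <- 20 + arima.sim(model = list(ma = 0.8), n = 100)     # MA(1): y_t = 20 + e_t + 0.8 e_{t-1}
ma2 <- arima.sim(model = list(ma = c(-1, 0.8)), n = 100)   # MA(2): y_t = e_t - e_{t-1} + 0.8 e_{t-2}
ggAcf(ma1)   # cuts off after lag 1
ggPacf(ma1)  # tails off
ggAcf(ma2)   # cuts off after lag 2
ggPacf(ma2)  # tails off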
AutoRegressive Moving Average models:
\[ y_{t} = c + \phi_{1}y_{t - 1} + \cdots + \phi_{p}y_{t - p} + \theta_{1}\varepsilon_{t - 1} + \cdots + \theta_{q}\varepsilon_{t - q} + \varepsilon_{t}. \]
ARIMA(\(p, d, q\)) model:
AR: \(p =\) order of the autoregressive part
I: \(d =\) degree of first differencing involved
MA: \(q =\) order of the moving average part
White noise model: ARIMA(0,0,0)
Random walk: ARIMA(0,1,0) with no constant
Random walk with drift: ARIMA(0,1,0) with a constant
AR(\(p\)): ARIMA(\(p\),0,0)
MA(\(q\)): ARIMA(0,0,\(q\))
Now let's focus on the retail trade (the quarterly euretail series).
autoplot(euretail) +
xlab("Year") + ylab("Retail index")
The data are clearly non-stationary, so we take a seasonal difference followed by a first difference to make the series stationary before fitting an ARIMA model.
euretail %>% diff(lag=4) %>% diff() %>%
ggtsdisplay()
fit <- Arima(euretail, order=c(0,1,1),
seasonal=c(0,1,1))
checkresiduals(fit)
##
## Ljung-Box test
##
## data: Residuals from ARIMA(0,1,1)(0,1,1)[4]
## Q* = 11, df = 6, p-value = 0.1
##
## Model df: 2. Total lags used: 8
fit <- Arima(euretail, order=c(0,1,3),
seasonal=c(0,1,1))
fit
## Series: euretail
## ARIMA(0,1,3)(0,1,1)[4]
##
## Coefficients:
## ma1 ma2 ma3 sma1
## 0.263 0.370 0.419 -0.661
## s.e. 0.124 0.126 0.130 0.156
##
## sigma^2 estimated as 0.156: log likelihood=-28.7
## AIC=67.4 AICc=68.53 BIC=77.78
checkresiduals(fit)
##
## Ljung-Box test
##
## data: Residuals from ARIMA(0,1,3)(0,1,1)[4]
## Q* = 0.51, df = 4, p-value = 1
##
## Model df: 4. Total lags used: 8
autoplot(forecast(fit, h=12))
auto.arima(euretail)
## Series: euretail
## ARIMA(0,1,3)(0,1,1)[4]
##
## Coefficients:
## ma1 ma2 ma3 sma1
## 0.263 0.370 0.419 -0.661
## s.e. 0.124 0.126 0.130 0.156
##
## sigma^2 estimated as 0.156: log likelihood=-28.7
## AIC=67.4 AICc=68.53 BIC=77.78
auto.arima(euretail,
stepwise=FALSE, approximation=FALSE)
## Series: euretail
## ARIMA(0,1,3)(0,1,1)[4]
##
## Coefficients:
## ma1 ma2 ma3 sma1
## 0.263 0.370 0.419 -0.661
## s.e. 0.124 0.126 0.130 0.156
##
## sigma^2 estimated as 0.156: log likelihood=-28.7
## AIC=67.4 AICc=68.53 BIC=77.78
A seasonal ARIMA model is formed by including additional seasonal terms in the ARIMA models we have seen so far. The seasonal part of the model consists of terms that are similar to the non-seasonal components of the model, but involve backshifts of the seasonal period.
The modelling procedure is almost the same as for non-seasonal data, except that we need to select seasonal AR and MA terms as well as the non-seasonal components of the model!
| ARIMA | \((p, d, q)\) | \((P, D, Q)_{m}\) |
|---|---|---|
| | Non-seasonal part of the model | Seasonal part of the model |
where \(m =\) number of observations per year.
E.g., ARIMA\((1, 1, 1)(1, 1, 1)_{4}\) model (without constant) \[(1 - \phi_{1}B)(1 - \Phi_{1}B^{4}) (1 - B) (1 - B^{4})y_{t} ~= ~ (1 + \theta_{1}B) (1 + \Theta_{1}B^{4})\varepsilon_{t}. \]
All the factors can be multiplied out and the general model written as follows: \begin{align*} y_{t} &= (1 + \phi_{1})y_{t - 1} - \phi_1y_{t-2} + (1 + \Phi_{1})y_{t - 4}\\ &\text{} - (1 + \phi_{1} + \Phi_{1} + \phi_{1}\Phi_{1})y_{t - 5} + (\phi_{1} + \phi_{1} \Phi_{1}) y_{t - 6} \\ & \text{} - \Phi_{1} y_{t - 8} + (\Phi_{1} + \phi_{1} \Phi_{1}) y_{t - 9} - \phi_{1} \Phi_{1} y_{t - 10}\\ &\text{} + \varepsilon_{t} + \theta_{1}\varepsilon_{t - 1} + \Theta_{1}\varepsilon_{t - 4} + \theta_{1}\Theta_{1}\varepsilon_{t - 5}. \end{align*}
Census bureaus (in many countries) most often use the following models:
ARIMA(0,1,1)(0,1,1)\(_m\) with log transformation
ARIMA(0,1,2)(0,1,1)\(_m\) with log transformation
ARIMA(2,1,0)(0,1,1)\(_m\) with log transformation
ARIMA(0,2,2)(0,1,1)\(_m\) with log transformation
ARIMA(2,1,2)(0,1,1)\(_m\) with no transformation
The seasonal part of an AR or MA model will be seen in the seasonal lags of the PACF and ACF.
ARIMA(0,0,0)(0,0,1)\(_{12}\) will show: a spike at lag 12 in the ACF but no other significant spikes, and exponential decay in the seasonal lags of the PACF (i.e., at lags 12, 24, 36, ...).
ARIMA(0,0,0)(1,0,0)\(_{12}\) will show: exponential decay in the seasonal lags of the ACF, and a single significant spike at lag 12 in the PACF.
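A quick simulation sketch to see the first of these patterns (a pure seasonal MA(1) at lag 12):
set.seed(1)
sma1 <- arima.sim(model = list(ma = c(rep(0, 11), 0.8)), n = 240)  # y_t = e_t + 0.8 e_{t-12}
ggAcf(sma1, lag.max = 36)    # single spike at lag 12
ggPacf(sma1, lag.max = 36)   # exponential decay at lags 12, 24, 36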
Let's continue the analysis of the quarterly retail trade, this time using the tsibble/fable workflow:
# The fable workflow needs the tsibble, fable, and feasts packages
library(tsibble)
library(fable)
library(feasts)
eu_retail <- as_tsibble(fpp2::euretail)
eu_retail %>% autoplot(value) +
xlab("Year") + ylab("Retail index")
eu_retail %>% gg_tsdisplay(
value %>% difference(4), plot_type='partial')
eu_retail %>% gg_tsdisplay(
value %>% difference(4) %>% difference(1),plot_type='partial')
fit <- eu_retail %>%
model(arima = ARIMA(value ~ pdq(0,1,1) + PDQ(0,1,1)))
augment(fit) %>% gg_tsdisplay(.resid, plot_type = "hist")
augment(fit) %>%
features(.resid, ljung_box, lag = 8, dof = 2)
fit <- eu_retail %>%
model(
arima013011 = ARIMA(value ~ pdq(0,1,3) + PDQ(0,1,1))
)
report(fit)
## Series: value
## Model: ARIMA(0,1,3)(0,1,1)[4]
##
## Coefficients:
## ma1 ma2 ma3 sma1
## 0.2630 0.3694 0.4200 -0.6636
## s.e. 0.1237 0.1255 0.1294 0.1545
##
## sigma^2 estimated as 0.156: log likelihood=-28.63
## AIC=67.26 AICc=68.39 BIC=77.65
augment(fit) %>%
gg_tsdisplay(.resid, plot_type = "hist")
augment(fit) %>%
features(.resid, ljung_box, lag = 8, dof = 4)
fit %>% forecast(h = "3 years") %>%
autoplot(eu_retail)
eu_retail %>% model(ARIMA(value)) %>% report()
## Series: value
## Model: ARIMA(0,1,3)(0,1,1)[4]
##
## Coefficients:
## ma1 ma2 ma3 sma1
## 0.2630 0.3694 0.4200 -0.6636
## s.e. 0.1237 0.1255 0.1294 0.1545
##
## sigma^2 estimated as 0.156: log likelihood=-28.63
## AIC=67.26 AICc=68.39 BIC=77.65
eu_retail %>% model(ARIMA(value, stepwise = FALSE,
approximation = FALSE)) %>% report()
## Series: value
## Model: ARIMA(0,1,3)(0,1,1)[4]
##
## Coefficients:
## ma1 ma2 ma3 sma1
## 0.2630 0.3694 0.4200 -0.6636
## s.e. 0.1237 0.1255 0.1294 0.1545
##
## sigma^2 estimated as 0.156: log likelihood=-28.63
## AIC=67.26 AICc=68.39 BIC=77.65