Univariate Time Series Modeling

Any metric that is measured over regular time intervals forms a time series.

Analysis of time series is commercially important because of industrial needs and its relevance to forecasting (demand, sales, supply, etc.).

This is the second part of our time series analysis, in which we consider fundamental time series models such as:

  1. Naive models
  2. Linear models
  3. ETS models
  4. Seasonal and non-seasonal ARIMA models

Time series CHEAT-SHEET

Just to sum up before we continue, please take a look at the most popular functions that can help you handle time series :-)

Data Prep

library(tidyverse)
library(forecast)

myts <- ts(df, start = c(1981,1), frequency = 12)

Exploring and Plotting ts Data

autoplot(): Useful function to plot data and forecasts

Seasonality

ggseasonplot(): Create a seasonal plot
ggsubseriesplot(): Create mini plots for each season and show seasonal means

Lags and ACF, PACF

gglagplot(): Plot the time series against lags of itself
ggAcf(): Plot the autocorrelation function (ACF)
ggPacf(): Plot the partial autocorrelation function (PACF)
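
As a quick sketch, these helpers can be applied directly to the myts object created in the Data Prep step above (the underlying df data frame is hypothetical):

autoplot(myts)         # plot the raw series
ggseasonplot(myts)     # one line per year, by month
ggsubseriesplot(myts)  # mini plot per month with seasonal means
gglagplot(myts)        # series plotted against its own lags
ggAcf(myts)            # autocorrelation function
ggPacf(myts)           # partial autocorrelation function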

White Noise and the Ljung-Box Test

White Noise is another name for a time series of iid data. Purely random. Ideally your model residuals should look like white noise.

You can use the Ljung-Box test to check whether a time series is white noise; here is an example with 24 lags:

Box.test(data, lag = 24, type="Lj")

A p-value > 0.05 suggests the data are not significantly different from white noise.

Model Selection

The forecast package includes a few common models out of the box. Fit the model, then call forecast() on the fitted object with the number of periods h you want to predict.

Example of the workflow:

train <- window(data, start = 1980)
fit <- naive(train)
checkresiduals(fit)
pred <- forecast(fit, h=4)
accuracy(pred, data)

Naive Models

Useful to benchmark against naive and seasonal naive models.

naive()
snaive()

Residuals

Residuals are the differences between the observed data and the model's fitted values. Residuals should look like white noise and should:

  • be uncorrelated
  • have mean zero

And ideally also have:

  • constant variance
  • a normal distribution

checkresiduals(): helper function to plot the residuals, plot the ACF and a histogram, and perform a Ljung-Box test on the residuals.

Evaluating Model Accuracy

Train/Test split with window function:

window(data, start, end): to slice the ts data

Use accuracy() on the forecast object and the test set:

accuracy(pred, testset): provides accuracy measures such as MAE, MSE, MAPE, RMSE, etc.

Backtesting with one-step-ahead forecasts, also known as time series cross-validation, can be done with the helper function tsCV().

tsCV(): returns forecast errors, given a forecastfunction that returns a forecast object and a number of steps ahead h. At h = 1 the forecast errors are just the model residuals.

Here's an example using the naive() model, forecasting one period ahead:

tsCV(data, forecastfunction = naive, h = 1)

Many Models

Exponential Models

  • ses(): simple exponential smoothing; applies a smoothing parameter alpha to past observations
  • holt(): Holt's linear trend, i.e. SES plus a trend parameter. Use damped = TRUE for a damped trend
  • hw(): Holt-Winters method, which incorporates a linear trend and seasonality. Set seasonal = "additive" for the additive version or seasonal = "multiplicative" for the multiplicative version
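
A minimal sketch of how these could be called on the myts object from the cheat sheet (the horizons and argument values are arbitrary examples):

fc1 <- ses(myts, h = 12)                              # level only
fc2 <- holt(myts, h = 12, damped = TRUE)              # level + damped trend
fc3 <- hw(myts, h = 12, seasonal = "multiplicative")  # level + trend + seasonality
autoplot(fc3)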

ETS Models

The forecast package includes the function ets() for exponential smoothing models. ets() estimates parameters using the likelihood of the data arising from the model, and selects the best model using the corrected AIC (AICc), searching over the components:

  • Error = {A, M}
  • Trend = {N, A, Ad}
  • Seasonal = {N, A, M}
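
A minimal usage sketch (again using the myts object from above):

fit <- ets(myts)                 # model selected automatically by AICc, e.g. ETS(M,Ad,M)
summary(fit)                     # estimated smoothing parameters, states and AICc
autoplot(forecast(fit, h = 12))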

ARIMA

  • Arima(): implementation of ARIMA models; set include.constant = TRUE to include the constant (which acts as a drift term when the series is differenced)

  • auto.arima(): automatic implementation of the ARIMA function in forecast. It estimates parameters using maximum likelihood and does a stepwise search over a subset of all possible models. It can take a lambda argument to fit the model to Box-Cox-transformed data, in which case the forecasts are back-transformed onto the original scale. Set stepwise = FALSE to consider more models at the expense of more computation time.

TBATS

Automated model that combines exponential smoothing, Box-Cox transformations, and Fourier terms. Pros: automated; allows for complex seasonality that changes over time. Cons: slow.

  • T: Trigonometric terms for seasonality
  • B: Box-Cox transformations for heterogeneity
  • A: ARMA errors for short term dynamics
  • T: Trend (possibly damped)
  • S: Seasonal (including multiple and non-integer periods)
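
A minimal sketch of fitting a TBATS model to the myts object from above:

fit <- tbats(myts)
fc <- forecast(fit, h = 12)
autoplot(fc)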

Naive approach

  • Using the naive method, all forecasts are simply set to the value of the last observation. That is, \[ \hat{y}_{T+h|T} = y_{T}. \] This method works remarkably well for many economic and financial time series.

  • Using the average method, the forecasts of all future values are equal to the mean of the historical data. If we let the historical data be denoted by \(y_{1},\dots,y_{T}\), then we can write the forecasts as \[ \hat{y}_{T+h|T} = \bar{y} = (y_{1}+\dots+y_{T})/T. \] The notation \(\hat{y}_{T+h|T}\) is shorthand for the estimate of \(y_{T+h}\) based on the data \(y_1,\dots,y_T\).

Seasonal naive method

A similar method is useful for highly seasonal data. In this case, we set each forecast to be equal to the last observed value from the same season of the year (e.g., the same month of the previous year). Formally, the forecast for time \(T+h\) is written as \[ \hat{y}_{T+h|T} = y_{T+h-km}, \] where \(m=\) the seasonal period, \(k=\lfloor (h-1)/m\rfloor+1\), and \(\lfloor u \rfloor\) denotes the integer part of \(u\). That looks more complicated than it really is.

For example, with monthly data, the forecast for all future February values is equal to the last observed February value. With quarterly data, the forecast of all future Q2 values is equal to the last observed Q2 value (where Q2 means the second quarter). Similar rules apply for other months and quarters, and for other seasonal periods.

Drift method

A variation on the naive method is to allow the forecasts to increase or decrease over time, where the amount of change over time (called the “drift”) is set to be the average change seen in the historical data. Thus the forecast for time \(T+h\) is given by \[ \hat{y}_{T+h|T} = y_{T} + \frac{h}{T-1}\sum_{t=2}^T (y_{t}-y_{t-1}) = y_{T} + h \left( \frac{y_{T} -y_{1}}{T-1}\right). \] This is equivalent to drawing a line between the first and last observations, and extrapolating it into the future.

Examples

The figure below shows the first three methods applied to the quarterly beer production data.
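
The figure itself is not reproduced here; a sketch of how it could be generated, assuming the quarterly beer production data is ausbeer from the fpp2 package, is:

beer2 <- window(ausbeer, start = 1992, end = c(2007, 4))
autoplot(beer2) +
  autolayer(meanf(beer2, h = 10), series = "Mean", PI = FALSE) +
  autolayer(naive(beer2, h = 10), series = "Naive", PI = FALSE) +
  autolayer(snaive(beer2, h = 10), series = "Seasonal naive", PI = FALSE) +
  ggtitle("Forecasts for quarterly beer production")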

Exercise 1.

Now it's your turn. Use the WWWusage and bricksq data. Decide which of meanf(), naive(), or snaive() is more appropriate in each case.

?WWWusage
?bricksq
WWWusage %>% autoplot()
fc <- naive(WWWusage)
res <- residuals(fc)
autoplot(res)
checkresiduals(fc)
# Residuals are correlated, as shown by both the Ljung-Box test and the ACF.
# They seem to be normally distributed. There is considerable information
# remaining in the residuals which has not been captured with the naive method.

# Now use the above for the bricksq data and answer the same questions.

Linear models

Multiple regression

\[ y_t = \beta_0 + \beta_1 x_{1,t} + \beta_2 x_{2,t} + \cdots + \beta_kx_{k,t} + \varepsilon_t. \]

  • \(y_t\) is the variable we want to predict: the response variable
  • Each \(x_{j,t}\) is numerical and is called a predictor. They are usually assumed to be known for all past and future times.
  • The coefficients \(\beta_1,\dots,\beta_k\) measure the effect of each predictor after taking account of the effect of all other predictors in the model. That is, the coefficients measure the marginal effects.
  • \(\varepsilon_t\) is a white noise error term

Some useful predictors for linear models

Trend

Linear trend \[ x_t = t \]

  • \(t=1,2,\dots,T\)
  • Strong assumption that trend will continue.

Seasonal dummies

  • For quarterly data: use 3 dummies
  • For monthly data: use 11 dummies
  • For daily data: use 6 dummies
  • What to do with weekly data?

Outliers

  • If there is an outlier, you can use a dummy variable (taking value 1 for that observation and 0 elsewhere) to remove its effect.

Public holidays

  • For daily data: if it is a public holiday, dummy=1, otherwise dummy=0.

Beer production - example

Regression model: \[ y_t = \beta_0 + \beta_1 t + \beta_2d_{2,t} + \beta_3 d_{3,t} + \beta_4 d_{4,t} + \varepsilon_t \]

  • \(d_{i,t} = 1\) if \(t\) is quarter \(i\) and 0 otherwise.
  • The first quarter variable has been omitted, so the coefficients associated with the other quarters are measures of the difference between those quarters and the first quarter.
fit.beer <- tslm(beer ~ trend + season)
summary(fit.beer)
## 
## Call:
## tslm(formula = beer ~ trend + season)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
## -14.96  -6.25  -0.58   4.72  18.86 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 153.1958     4.4262   34.61  < 2e-16 ***
## trend        -0.2158     0.0748   -2.89   0.0061 ** 
## season2      -8.7842     5.6747   -1.55   0.1290    
## season3       8.8317     5.6761    1.56   0.1271    
## season4      -6.9525     5.6786   -1.22   0.2275    
## season5      -6.3367     5.6820   -1.12   0.2710    
## season6     -18.7208     5.6865   -3.29   0.0020 ** 
## season7     -12.1050     5.6919   -2.13   0.0392 *  
## season8      -2.4892     5.6983   -0.44   0.6644    
## season9      -6.8683     6.0202   -1.14   0.2602    
## season10     20.0975     6.0225    3.34   0.0018 ** 
## season11     36.8133     6.0258    6.11  2.5e-07 ***
## season12     39.7792     6.0300    6.60  4.9e-08 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 8.97 on 43 degrees of freedom
## Multiple R-squared:  0.836,  Adjusted R-squared:  0.791 
## F-statistic: 18.3 on 12 and 43 DF,  p-value: 3.97e-13

There is an average downward trend of about 0.22 megalitres per period. The seasonal coefficients measure differences relative to the first season: for example, production in season 6 is on average about 18.7 megalitres lower than in season 1, while seasons 11 and 12 are about 36.8 and 39.8 megalitres higher, respectively.

Now we can run the diagnostics (OLS!):
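
For example (a sketch; fit.beer is the model fitted above):

checkresiduals(fit.beer)                      # residual time plot, ACF, histogram and serial-correlation test
qplot(fitted(fit.beer), residuals(fit.beer))  # residuals vs. fitted values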

Exercise 2.

The data set fancy concerns the monthly sales figures of a shop which opened in January 1987 and sells gifts, souvenirs, and novelties. The sales volume varies with the seasonal population of tourists. There is a large influx of visitors to the town at Christmas and for the local surfing festival, held every March since 1988. Over time, the shop has expanded its premises, range of products, and staff.

  • Produce a time plot of the data and describe the patterns in the graph.
  • Explain why it is necessary to take logarithms of these data before fitting a model.
  • Use R to fit a regression model to the logarithms of these sales data with a linear trend, seasonal dummies and a surfing festival dummy variable. What do the values of the coefficients tell you about each variable?
  • Plot the residuals against time and against the fitted values. Do these plots reveal any problems with the model?
?fancy
#The last feature above suggests taking logs to make the pattern (and variance) more stable
autoplot(log(fancy)) + ylab("log Sales")
#Create festival dummy:
festival <- cycle(fancy) == 3
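# the surfing festival has only been held since March 1988, so March 1987 (observation 3) is not a festival month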
festival[3] <- FALSE
# Fit linear model to logged data (by specifying lambda=0)
fit <- tslm(fancy ~ trend + season + festival, lambda = 0)
autoplot(residuals(fit))
qplot(fitted(fit), residuals(fit))
#The residuals are serially correlated. They reveal nonlinearity in the trend.
coefficients(fit)
checkresiduals(fit)
#The model can be improved by taking into account nonlinearity of the trend.

ETS models

Historical perspective

  • Developed in the 1950s and 1960s as methods (algorithms) to produce point forecasts.
  • Combine a level, trend (slope) and seasonal component to describe a time series.
  • The rate of change of the components are controlled by smoothing parameters: \(\alpha\), \(\beta\) and \(\gamma\) respectively.
  • Need to choose best values for the smoothing parameters (and initial states).
  • Equivalent ETS state space models developed in the 1990s and 2000s.

Simple method

  • Simple exponential smoothing is suitable for forecasting data with no clear trend or seasonal pattern.
  • Exponential smoothing was proposed in the late 1950s (Brown, 1959; Holt, 1957; Winters, 1960), and has motivated some of the most successful forecasting methods.
  • Forecasts produced using exponential smoothing methods are weighted averages of past observations, with the weights decaying exponentially as the observations get older.
  • In other words, the more recent the observation the higher the associated weight:

\[ \hat{y}_{t+1|t} = \alpha y_{t} + (1-\alpha)\hat{y}_{t|t-1} \]

  • We want something in between the naive method (which uses only the last observation) and the average method (which weights all observations equally), weighting the most recent data more highly.
  • Simple exponential smoothing uses a weighted moving average with weights that decrease exponentially!

Optimisation

  • The application of every exponential smoothing method requires the smoothing parameters and the initial values to be chosen.
  • A more reliable and objective way to obtain values for the unknown parameters is to estimate them from the observed data.
  • The unknown parameters and the initial values for any exponential smoothing method can be estimated by minimising the SSE.
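
For example, ses() estimates these values from the data when they are not supplied; a sketch using the myts object from the cheat sheet:

fit_fixed <- ses(myts, alpha = 0.2, h = 12)  # smoothing parameter fixed by the user
fit_est <- ses(myts, h = 12)                 # alpha and the initial level estimated from the data
summary(fit_est)                             # reports the estimated alpha and initial level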

ETS models with trend

  • Holt's exponential smoothing approach can fit a time series that has an overall level and a trend (slope).

  • \(b_t\): slope

  • State space form: \[ \begin{aligned} y_t&=\ell_{t-1}+b_{t-1}+\varepsilon_t\\ \ell_t&=\ell_{t-1}+b_{t-1}+\alpha \varepsilon_t\\ b_t&=b_{t-1}+\alpha\beta^* \varepsilon_t \end{aligned} \]

  • For simplicity, set \(\beta=\alpha \beta^*\).

  • An alpha smoothing parameter controls the exponential decay for the level, and a beta smoothing parameter controls the exponential decay for the slope

  • Each parameter ranges from 0 to 1, with larger values giving more weight to recent observations

\[ \begin{aligned} \text{Forecast }&& {y}_{t+h|t} &= \ell_{t} + hb_{t} \\ \text{Level }&& \ell_{t} &= \alpha y_{t} + (1 - \alpha)(\ell_{t-1} + b_{t-1})\\ \text{Trend }&& b_{t} &= \beta^*(\ell_{t} - \ell_{t-1}) + (1 -\beta^*)b_{t-1}, \end{aligned} \]

  • Two smoothing parameters \(\alpha\) and \(\beta^*\) (\(0\le\alpha,\beta^*\le1\)).
  • \(\ell_t\) level: weighted average between \(y_t\) and one-step ahead forecast for time \(t\), \((\ell_{t-1} + b_{t-1}={y}_{t|t-1})\).
  • \(b_t\) slope: weighted average of \((\ell_{t} - \ell_{t-1})\) and \(b_{t-1}\), current and previous estimate of slope.
  • Choose \(\alpha, \beta^*, \ell_0, b_0\) to minimise SSE.

Holt-Winters model

  • The Holt-Winters exponential smoothing approach can be used to fit a time series that has an overall level, a trend, and a seasonal component:

\[ \begin{aligned} \text{Observation equation}&& y_t&=\ell_{t-1}+b_{t-1}+s_{t-m} + \varepsilon_t\\ \text{State equations}&& \ell_t&=\ell_{t-1}+b_{t-1}+\alpha \varepsilon_t\\ && b_t&=b_{t-1}+\beta \varepsilon_t \\ && s_t &= s_{t-m} + \gamma\varepsilon_t \end{aligned} \]

  • \(s_t\): seasonal state.

Holt-Winters additive model

\[ \begin{aligned} {y}_{t+h|t} &= \ell_{t} + hb _{t} + s_{t+h-m(k+1)} \\ \ell_{t} &= \alpha(y_{t} - s_{t-m}) + (1 - \alpha)(\ell_{t-1} + b_{t-1})\\ b_{t} &= \beta^*(\ell_{t} - \ell_{t-1}) + (1 - \beta^*)b_{t-1}\\ s_{t} &= \gamma (y_{t}-\ell_{t-1}-b_{t-1}) + (1-\gamma)s_{t-m} \end{aligned} \]

  • \(k=\) integer part of \((h-1)/m\).
  • Parameters: \(0\le \alpha\le 1\), \(0\le \beta^*\le 1\), \(0\le \gamma\le 1-\alpha\) and \(m=\) period of seasonality (e.g.\(m=4\) for quarterly data).

Holt-Winters multiplicative method

  • The Holt-Winters multiplicative method (with multiplicative errors) is used when the seasonal variations change proportionally to the level of the series.

\[ \begin{aligned} \text{Observation equation}&& y_t&= (\ell_{t-1}+b_{t-1})s_{t-m}(1 + \varepsilon_t)\\ \text{State equations}&& \ell_t&=(\ell_{t-1}+b_{t-1})(1+\alpha \varepsilon_t)\\ && b_t&=b_{t-1}(1+\beta \varepsilon_t) \\ &&s_t &= s_{t-m}(1 + \gamma\varepsilon_t) \end{aligned} \]

  • \(k\) is integer part of \((h-1)/m\).
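
A minimal sketch comparing the additive and multiplicative variants on a series whose seasonal variation grows with the level (the built-in AirPassengers data is used purely as an illustration):

fit_add <- hw(AirPassengers, seasonal = "additive", h = 24)
fit_mult <- hw(AirPassengers, seasonal = "multiplicative", h = 24)
accuracy(fit_add)    # training-set accuracy
accuracy(fit_mult)
autoplot(AirPassengers) +
  autolayer(fit_mult, series = "HW multiplicative", PI = FALSE)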

Exercise 3.

The data set books contains the daily sales of paperback and hardcover books at the same store. The task is to forecast the next four days' sales for paperback and hardcover books.

  • Plot the series and discuss the main features of the data.
  • Apply several ETS methods and choose the most appropriate one based on the RMSE. Discuss your findings.
?books
fcast1 <- ses(books[,"Hardcover"], h=4)
fcast2 <- ses(books[,"Paperback"], h=4)
autoplot(books) +
  autolayer(fcast1, series="Hardcover", PI=FALSE) +
  autolayer(fcast2, series="Paperback", PI=FALSE)
accuracy(fcast1)
accuracy(fcast2)
# Use the "holt" and "hw" functions and check the accuracy once again.

Autoregressive models

While exponential smoothing models are based on a description of the trend and seasonality in the data, ARIMA models aim to describe the autocorrelations in the data.

Autoregressive (AR) models:

\[ y_{t} = c + \phi_{1}y_{t - 1} + \phi_{2}y_{t - 2} + \cdots + \phi_{p}y_{t - p} + \varepsilon_{t} \] where \(\varepsilon_t\) is white noise. This is a multiple regression with lagged values of \(y_t\) as predictors. Autoregressive models are remarkably flexible at handling a wide range of different time series patterns.

Before we continue, please plot the ACF and PACF for both of these objects (named w1 and w2); a simulation sketch for generating them is given after the AR(2) model below.

Any conclusions? Can you see why these are AR(1) and AR(2) processes?

AR(1) model

\(y_{t} = 2 -0.8 y_{t - 1} + \varepsilon_{t}\)

\(\varepsilon_t\sim N(0,1)\),\(T=100\).

\(y_{t} = c + \phi_1 y_{t - 1} + \varepsilon_{t}\)

  • When \(\phi_1=0\), \(y_t\) is equivalent to White Noise
  • When \(\phi_1=1\) and \(c=0\), \(y_t\) is equivalent to a Random Walk
  • When \(\phi_1=1\) and \(c\ne0\), \(y_t\) is equivalent to a Random Walk with drift
  • When \(\phi_1<0\), \(y_t\) tends to oscillate between positive and negative values.

AR(2) model

\(y_t = 8 + 1.3y_{t-1} - 0.7 y_{t-2} + \varepsilon_t\)

\(\varepsilon_t\sim N(0,1)\), \(T=100\).
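
The w1 and w2 objects are not created in this handout; a minimal simulation sketch matching the two equations above (the seed and object names are assumptions) is:

set.seed(123)
n <- 100
e <- rnorm(n)                  # white noise, N(0,1)
w1 <- numeric(n); w1[1] <- e[1]
for (t in 2:n) w1[t] <- 2 - 0.8 * w1[t - 1] + e[t]                    # AR(1)
w2 <- numeric(n); w2[1:2] <- e[1:2]
for (t in 3:n) w2[t] <- 8 + 1.3 * w2[t - 1] - 0.7 * w2[t - 2] + e[t]  # AR(2)
w1 <- ts(w1); w2 <- ts(w2)
ggAcf(w1); ggPacf(w1)          # ACF tails off, PACF cuts off after lag 1
ggAcf(w2); ggPacf(w2)          # PACF cuts off after lag 2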

Stationarity conditions

We normally restrict autoregressive models to stationary data, in which case some constraints on the parameter values are required.

General condition for stationarity: Complex roots of \(1-\phi_1 z - \phi_2 z^2 - \dots - \phi_pz^p\) lie outside the unit circle on the complex plane.

  • For \(p=1\): \(-1<\phi_1<1\).
  • For \(p=2\): \(-1<\phi_2<1\qquad \phi_2+\phi_1 < 1 \qquad \phi_2 -\phi_1 < 1\).
  • More complicated conditions hold for \(p\ge3\).
  • Estimation software takes care of this.

Moving Average (MA) models

Rather than using past values of the forecast variable in a regression, a moving average model uses past forecast errors in a regression-like model.

Moving Average (MA) models: \[ y_{t} = c + \varepsilon_t + \theta_{1}\varepsilon_{t - 1} + \theta_{2}\varepsilon_{t - 2} + \cdots + \theta_{q}\varepsilon_{t - q}, \]

where \(\varepsilon_t\) is white noise.

This is a multiple regression with past errors as predictors. Don’t confuse this with moving average smoothing!

Before we continue, please plot the ACF and PACF for both of these objects (named ma1 and ma2); a simulation sketch for generating them is given after the MA(2) model below.

Any conclusions? Can you see why these are MA(1) and MA(2) processes?

MA(1) model

\(y_t = 20 + \varepsilon_t + 0.8 \varepsilon_{t-1}\)

\(\varepsilon_t\sim N(0,1)\),\(T=100\).

MA(2) model

\(y_t = \varepsilon_t -\varepsilon_{t-1} + 0.8 \varepsilon_{t-2}\)

\(\varepsilon_t\sim N(0,1)\),\(T=100\).
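
Similarly, ma1 and ma2 are not created here; a minimal simulation sketch matching the two equations above (the seed and object names are assumptions) is:

set.seed(123)
ma1 <- 20 + arima.sim(model = list(ma = 0.8), n = 100)    # y_t = 20 + e_t + 0.8*e_{t-1}
ma2 <- arima.sim(model = list(ma = c(-1, 0.8)), n = 100)  # y_t = e_t - e_{t-1} + 0.8*e_{t-2}
ggAcf(ma1); ggPacf(ma1)      # ACF cuts off after lag 1
ggAcf(ma2); ggPacf(ma2)      # ACF cuts off after lag 2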

ARIMA models

AutoRegressive Moving Average models:

\[ y_{t} = c + \phi_{1}y_{t - 1} + \cdots + \phi_{p}y_{t - p} + \theta_{1}\varepsilon_{t - 1} + \cdots + \theta_{q}\varepsilon_{t - q} + \varepsilon_{t}. \]

  • Predictors include both lagged values of \(y_t\) and lagged errors.
  • Conditions on coefficients ensure stationarity.
  • Conditions on coefficients ensure invertibility.
  • Combined ARMA model with differencing.

ARIMA(\(p, d, q\)) model:

  • AR: \(p =\) order of the autoregressive part

  • I: \(d =\) degree of first differencing involved

  • MA: \(q =\) order of the moving average part

  • White noise model: ARIMA(0,0,0)

  • Random walk: ARIMA(0,1,0) with no constant

  • Random walk with drift: ARIMA(0,1,0) with a constant

  • AR(\(p\)): ARIMA(\(p\),0,0)

  • MA(\(q\)): ARIMA(0,0,\(q\))
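
As a quick sketch, these special cases map directly onto Arima() calls (using the myts object from the cheat sheet):

Arima(myts, order = c(0, 0, 0))                           # white noise
Arima(myts, order = c(0, 1, 0))                           # random walk
Arima(myts, order = c(0, 1, 0), include.constant = TRUE)  # random walk with drift
Arima(myts, order = c(2, 0, 0))                           # AR(2)
Arima(myts, order = c(0, 0, 1))                           # MA(1)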

Exercise 4.

Now let's focus on the retail trade.

autoplot(euretail) +
  xlab("Year") + ylab("Retail index")

The series is clearly non-stationary, so we take a seasonal difference followed by a first difference to make it stationary:

euretail %>% diff(lag=4) %>% diff() %>%
  ggtsdisplay()

  • \(d=1\) and \(D=1\) seems necessary.
  • Significant spike at lag 1 in ACF suggests non-seasonal MA(1) component.
  • Significant spike at lag 4 in ACF suggests seasonal MA(1) component.
  • Initial candidate model: ARIMA(0,1,1)(0,1,1)\(_4\).
  • We could also have started with ARIMA(1,1,0)(1,1,0)\(_4\).
fit <- Arima(euretail, order=c(0,1,1),
  seasonal=c(0,1,1))
checkresiduals(fit)

## 
##  Ljung-Box test
## 
## data:  Residuals from ARIMA(0,1,1)(0,1,1)[4]
## Q* = 11, df = 6, p-value = 0.1
## 
## Model df: 2.   Total lags used: 8
  • ACF and PACF of residuals show significant spikes at lag 2, and maybe lag 3.
  • AICc of ARIMA(0,1,2)(0,1,1)\(_4\) model is 74.36.
  • AICc of ARIMA(0,1,3)(0,1,1)\(_4\) model is 68.53.
fit <- Arima(euretail, order=c(0,1,3),
  seasonal=c(0,1,1))
fit
## Series: euretail 
## ARIMA(0,1,3)(0,1,1)[4] 
## 
## Coefficients:
##         ma1    ma2    ma3    sma1
##       0.263  0.370  0.419  -0.661
## s.e.  0.124  0.126  0.130   0.156
## 
## sigma^2 estimated as 0.156:  log likelihood=-28.7
## AIC=67.4   AICc=68.53   BIC=77.78
checkresiduals(fit)

## 
##  Ljung-Box test
## 
## data:  Residuals from ARIMA(0,1,3)(0,1,1)[4]
## Q* = 0.51, df = 4, p-value = 1
## 
## Model df: 4.   Total lags used: 8
autoplot(forecast(fit, h=12))

auto.arima(euretail)
## Series: euretail 
## ARIMA(0,1,3)(0,1,1)[4] 
## 
## Coefficients:
##         ma1    ma2    ma3    sma1
##       0.263  0.370  0.419  -0.661
## s.e.  0.124  0.126  0.130   0.156
## 
## sigma^2 estimated as 0.156:  log likelihood=-28.7
## AIC=67.4   AICc=68.53   BIC=77.78
auto.arima(euretail, 
  stepwise=FALSE, approximation=FALSE)
## Series: euretail 
## ARIMA(0,1,3)(0,1,1)[4] 
## 
## Coefficients:
##         ma1    ma2    ma3    sma1
##       0.263  0.370  0.419  -0.661
## s.e.  0.124  0.126  0.130   0.156
## 
## sigma^2 estimated as 0.156:  log likelihood=-28.7
## AIC=67.4   AICc=68.53   BIC=77.78

Seasonal ARIMA models

A seasonal ARIMA model is formed by including additional seasonal terms in the ARIMA models we have seen so far. The seasonal part of the model consists of terms that are similar to the non-seasonal components of the model, but involve backshifts of the seasonal period.

The modelling procedure is almost the same as for non-seasonal data, except that we need to select seasonal AR and MA terms as well as the non-seasonal components of the model!

ARIMA \(\underbrace{(p, d, q)}_{\text{Non-seasonal part of the model}}\;\underbrace{(P, D, Q)_{m}}_{\text{Seasonal part of the model}}\)

where \(m =\) number of observations per year.

E.g., ARIMA\((1, 1, 1)(1, 1, 1)_{4}\) model (without constant) \[(1 - \phi_{1}B)(1 - \Phi_{1}B^{4}) (1 - B) (1 - B^{4})y_{t} ~= ~ (1 + \theta_{1}B) (1 + \Theta_{1}B^{4})\varepsilon_{t}. \]

All the factors can be multiplied out and the general model written as follows: \begin{align*} y_{t} &= (1 + \phi_{1})y_{t - 1} - \phi_1y_{t-2} + (1 + \Phi_{1})y_{t - 4}\\ &\text{} - (1 + \phi_{1} + \Phi_{1} + \phi_{1}\Phi_{1})y_{t - 5} + (\phi_{1} + \phi_{1} \Phi_{1}) y_{t - 6} \\ & \text{} - \Phi_{1} y_{t - 8} + (\Phi_{1} + \phi_{1} \Phi_{1}) y_{t - 9} - \phi_{1} \Phi_{1} y_{t - 10}\\ &\text{} + \varepsilon_{t} + \theta_{1}\varepsilon_{t - 1} + \Theta_{1}\varepsilon_{t - 4} + \theta_{1}\Theta_{1}\varepsilon_{t - 5}. \end{align*}

Common ARIMA models

Census bureaus in many countries use the following models most often:

  • ARIMA(0,1,1)(0,1,1)\(_m\) with log transformation

  • ARIMA(0,1,2)(0,1,1)\(_m\) with log transformation

  • ARIMA(2,1,0)(0,1,1)\(_m\) with log transformation

  • ARIMA(0,2,2)(0,1,1)\(_m\) with log transformation

  • ARIMA(2,1,2)(0,1,1)\(_m\) with no transformation

The seasonal part of an AR or MA model will be seen in the seasonal lags of the PACF and ACF.

ARIMA(0,0,0)(0,0,1)\(_{12}\) will show:

  • a spike at lag 12 in the ACF but no other significant spikes.
  • The PACF will show exponential decay in the seasonal lags; that is, at lags 12, 24, 36, etc.

ARIMA(0,0,0)(1,0,0)\(_{12}\) will show:

  • exponential decay in the seasonal lags of the ACF
  • a single significant spike at lag 12 in the PACF.
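
A small simulation sketch illustrating the ARIMA(0,0,0)(0,0,1)\(_{12}\) pattern described above (the coefficient 0.8 and the seed are arbitrary choices):

set.seed(1)
# seasonal MA(1) with period 12: y_t = e_t + 0.8 * e_{t-12}
y <- ts(arima.sim(model = list(ma = c(rep(0, 11), 0.8)), n = 288), frequency = 12)
ggAcf(y, lag.max = 36)    # single significant spike at lag 12
ggPacf(y, lag.max = 36)   # exponential decay at lags 12, 24, 36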

Exercise 5.

Let's continue the analysis of the quarterly retail trade data, this time using the tsibble/fable workflow:

library(tsibble)   # as_tsibble(), difference()
library(fable)     # model(), ARIMA(), report()
library(feasts)    # gg_tsdisplay(), ljung_box()

eu_retail <- as_tsibble(fpp2::euretail)
eu_retail %>% autoplot(value) +
  xlab("Year") + ylab("Retail index")

eu_retail %>% gg_tsdisplay(
  value %>% difference(4), plot_type='partial')

eu_retail %>% gg_tsdisplay(
  value %>% difference(4) %>% difference(1),plot_type='partial')

  • \(d=1\) and \(D=1\) seems necessary.
  • Significant spike at lag 1 in ACF suggests non-seasonal MA(1) component.
  • Significant spike at lag 4 in ACF suggests seasonal MA(1) component.
  • Initial candidate model: ARIMA(0,1,1)(0,1,1)\(_4\).
  • We could also have started with ARIMA(1,1,0)(1,1,0)\(_4\).
fit <- eu_retail %>%
  model(arima = ARIMA(value ~ pdq(0,1,1) + PDQ(0,1,1)))
augment(fit) %>% gg_tsdisplay(.resid, plot_type = "hist")

augment(fit) %>%
  features(.resid, ljung_box, lag = 8, dof = 2)
  • ACF and PACF of residuals show significant spikes at lag 2, and maybe lag 3.
  • AICc of ARIMA(0,1,1)(0,1,1)\(_4\) model is 75.72
  • AICc of ARIMA(0,1,2)(0,1,1)\(_4\) model is 74.27.
  • AICc of ARIMA(0,1,3)(0,1,1)\(_4\) model is 68.39.
  • AICc of ARIMA(0,1,4)(0,1,1)\(_4\) model is 70.73.
fit <- eu_retail %>%
  model(
    arima013011 = ARIMA(value ~ pdq(0,1,3) + PDQ(0,1,1))
  )
report(fit)
## Series: value 
## Model: ARIMA(0,1,3)(0,1,1)[4] 
## 
## Coefficients:
##          ma1     ma2     ma3     sma1
##       0.2630  0.3694  0.4200  -0.6636
## s.e.  0.1237  0.1255  0.1294   0.1545
## 
## sigma^2 estimated as 0.156:  log likelihood=-28.63
## AIC=67.26   AICc=68.39   BIC=77.65
augment(fit) %>%
  gg_tsdisplay(.resid, plot_type = "hist")

augment(fit) %>%
  features(.resid, ljung_box, lag = 8, dof = 4)
fit %>% forecast(h = "3 years") %>%
  autoplot(eu_retail)

eu_retail %>% model(ARIMA(value)) %>% report()
## Series: value 
## Model: ARIMA(0,1,3)(0,1,1)[4] 
## 
## Coefficients:
##          ma1     ma2     ma3     sma1
##       0.2630  0.3694  0.4200  -0.6636
## s.e.  0.1237  0.1255  0.1294   0.1545
## 
## sigma^2 estimated as 0.156:  log likelihood=-28.63
## AIC=67.26   AICc=68.39   BIC=77.65
eu_retail %>% model(ARIMA(value, stepwise = FALSE,
  approximation = FALSE)) %>% report()
## Series: value 
## Model: ARIMA(0,1,3)(0,1,1)[4] 
## 
## Coefficients:
##          ma1     ma2     ma3     sma1
##       0.2630  0.3694  0.4200  -0.6636
## s.e.  0.1237  0.1255  0.1294   0.1545
## 
## sigma^2 estimated as 0.156:  log likelihood=-28.63
## AIC=67.26   AICc=68.39   BIC=77.65

Your turn! (continued)

  • Try to use the naive approach, linear trend models (with/without seasonality), ETS models for your series.
  • Next, plot your time series, and try to identify an appropriate ARIMA model.
  • Do residual diagnostic checking of your ARIMA model. Are the residuals white noise?
  • Use your chosen ARIMA model to forecast the next (12m/4q/30d).
  • Do residual diagnostic checking of your ETS model. Are the residuals white noise?
  • Use your chosen ETS model to forecast the next (12m/4q/30d).
  • Which of the two models do you prefer?

Karol Flisikowski