Forecasting with {modeltime} - Part V
Selecting the best model for each city
This is the final (planned) post in this series on the modeltime (Dancho 2023) package. In it, I will identify the best model for forecasting the Case-Shiller Home Price Index in each city using an iterative modeling process. The models to work with are the 11 models that I identified through the global modeling process in Forecasting with {modeltime} - Part IV. Additionally, I’ll be using parallel (R Core Team 2023) for parallel processing, tidymodels (Kuhn and Wickham 2020) for a modeling framework, tidyverse (Wickham et al. 2019) for data wrangling and visualization, and gt (Iannone et al. 2024) for table output.
Below is a look at the names of the model specifications from the last post.
| .model_desc |
|---|
| BASE_REC_ETS_DEFAULT |
| BASE_REC_ARIMA_DEFAULT |
| BASE_REC_ARIMA_SPEC2 |
| BASE_REC_ETS_SPEC14 |
| BASE_REC_ETS_SPEC6 |
| DEMOG_REC_ARIMA_SPEC1 |
| DEMOG_REC_ARIMA_SPEC3 |
| NNET_REC_NNET_SPEC22 |
| NNET_REC_NNET_SPEC9 |
| NNET_REC_NNET_SPEC12 |
| BASE_REC_STLM_SPEC3 |
And this is what the workflowset object looked like.
# A workflow set/tibble: 90 × 4
wflow_id info option result
<chr> <list> <list> <list>
1 base_rec_arima_default <tibble [1 × 4]> <opts[0]> <list [0]>
2 base_rec_arima_spec1 <tibble [1 × 4]> <opts[0]> <list [0]>
3 base_rec_arima_spec2 <tibble [1 × 4]> <opts[0]> <list [0]>
4 base_rec_arima_spec3 <tibble [1 × 4]> <opts[0]> <list [0]>
5 base_rec_arima_spec4 <tibble [1 × 4]> <opts[0]> <list [0]>
6 econ_rec_arima_default <tibble [1 × 4]> <opts[0]> <list [0]>
7 econ_rec_arima_spec1 <tibble [1 × 4]> <opts[0]> <list [0]>
8 econ_rec_arima_spec2 <tibble [1 × 4]> <opts[0]> <list [0]>
9 econ_rec_arima_spec3 <tibble [1 × 4]> <opts[0]> <list [0]>
10 econ_rec_arima_spec4 <tibble [1 × 4]> <opts[0]> <list [0]>
# ℹ 80 more rows
Filtering workflows
I’ll first filter the set of workflows to the 11 that I want to keep.
These are all of the necessary workflows, including their recipes and model specifications.
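A sketch of that step looks something like this, with the lower-case IDs mirroring the wflow_id naming shown in the workflow set above (hpi_global_wfset and mod_ids come up again in the fitting code below; filtering the set like a tibble is just one way to do the subsetting):

mod_ids <- c(
  "base_rec_ets_default", "base_rec_arima_default", "base_rec_arima_spec2",
  "base_rec_ets_spec14", "base_rec_ets_spec6", "demog_rec_arima_spec1",
  "demog_rec_arima_spec3", "nnet_rec_nnet_spec22", "nnet_rec_nnet_spec9",
  "nnet_rec_nnet_spec12", "base_rec_stlm_spec3"
)

# A workflow set is a tibble, so it can be filtered on wflow_id directly
hpi_global_wfset <- hpi_global_wfset |>
  filter(wflow_id %in% mod_ids)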
Splitting the data
Next up is creating the data split object. This uses a different set of functions from the ones I used to split the data for the global modeling process. In this workflow, the future dates get added for each city up front.
Code
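Something like the following sketch produces that split using modeltime’s nested-forecasting helpers; hpi_tbl is a placeholder name for the city-level data, and the lengths (12 future months, 24 test months) come from the printed object below.

# `hpi_tbl` is a placeholder for the city-level HPI data used in this series
econ_splits <- hpi_tbl |>
  # add 12 future months per city to forecast into
  extend_timeseries(
    .id_var = city,
    .date_var = date,
    .length_future = 12
  ) |>
  # nest the actual and future data by city
  nest_timeseries(
    .id_var = city,
    .length_future = 12
  ) |>
  # hold out the last 24 months of actual data for testing
  split_nested_timeseries(.length_test = 24)

econ_splits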
# A tibble: 4 × 4
city .actual_data .future_data .splits
<chr> <list> <list> <list>
1 Dallas <tibble [204 × 15]> <tibble [12 × 15]> <split [180|24]>
2 Detroit <tibble [204 × 15]> <tibble [12 × 15]> <split [180|24]>
3 New York <tibble [204 × 15]> <tibble [12 × 15]> <split [180|24]>
4 San Diego <tibble [204 × 15]> <tibble [12 × 15]> <split [180|24]>
With these two objects prepared I can now fit the models iteratively for each city.
Fitting the models
I’ll first prepare a parallel processing cluster and set a random seed on each node, then fit the models. modeltime_nested_fit() requires that models be specified individually rather than as a workflowset object.
Code
# Create a cluster with one worker per model
cl <- makePSOCKcluster(nrow(part4_mods))
doParallel::registerDoParallel(cl)

# Give each worker the same library paths as the main session
invisible(
  clusterCall(cl, function(x) .libPaths(x), .libPaths())
)

print(cl)

# Set the random seed on each worker
invisible(clusterEvalQ(cl, set.seed(500)))

# Fit all 11 workflows to each city's data
hpi_wflowset <- modeltime_nested_fit(
  econ_splits,
  hpi_global_wfset |> extract_workflow(mod_ids[1]),
  hpi_global_wfset |> extract_workflow(mod_ids[2]),
  hpi_global_wfset |> extract_workflow(mod_ids[3]),
  hpi_global_wfset |> extract_workflow(mod_ids[4]),
  hpi_global_wfset |> extract_workflow(mod_ids[5]),
  hpi_global_wfset |> extract_workflow(mod_ids[6]),
  hpi_global_wfset |> extract_workflow(mod_ids[7]),
  hpi_global_wfset |> extract_workflow(mod_ids[8]),
  hpi_global_wfset |> extract_workflow(mod_ids[9]),
  hpi_global_wfset |> extract_workflow(mod_ids[10]),
  hpi_global_wfset |> extract_workflow(mod_ids[11]),
  control = control_nested_fit(verbose = TRUE, allow_par = TRUE)
)

# Shut down the cluster
stopCluster(cl)
rm(cl)
# A tibble: 4 × 5
city .actual_data .future_data .splits .modeltime_tables
<chr> <list> <list> <list> <list>
1 Dallas <tibble [204 × 15]> <tibble> <split [180|24]> <mdl_tm_t>
2 Detroit <tibble [204 × 15]> <tibble> <split [180|24]> <mdl_tm_t>
3 New York <tibble [204 × 15]> <tibble> <split [180|24]> <mdl_tm_t>
4 San Diego <tibble [204 × 15]> <tibble> <split [180|24]> <mdl_tm_t>
Notice that this object is grouped by city, whereas the corresponding object for the global modeling process was listed by model. This happens because modeltime_nested_fit() operates on a split object that already has the data nested by ID. All of the functions in the modeltime Nested Forecasting workflow are geared toward doing things at the level of the ID variable, which is “city” in this case. Each row contains a mdl_time_tbl object in the “.modeltime_tables” column, with one row for each model that was fit to that city’s data. This mdl_time_tbl object also contains calibration data, which gets created during the iterative modeling process. Below is an example.
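One way to pull a single city’s table out is to index into the list column, along these lines (the row picked here is just the first one):

# pull the modeltime table for a single city, e.g. the first row (Dallas)
hpi_wflowset |>
  pluck(".modeltime_tables", 1)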
# Modeltime Table
# A tibble: 11 × 5
.model_id .model .model_desc .type .calibration_data
<int> <named list> <chr> <chr> <list>
1 1 <workflow> ETSAADN Test <tibble [24 × 4]>
2 2 <workflow> ARIMA Test <tibble [24 × 4]>
3 3 <workflow> ARIMA Test <tibble [24 × 4]>
4 4 <workflow> ETSAADA Test <tibble [24 × 4]>
5 5 <workflow> ETSAADA Test <tibble [24 × 4]>
6 6 <workflow> REGRESSION Test <tibble [24 × 4]>
7 7 <workflow> REGRESSION Test <tibble [24 × 4]>
8 8 <NULL> NULL <NA> <lgl [1]>
9 9 <NULL> NULL <NA> <lgl [1]>
10 10 <NULL> NULL <NA> <lgl [1]>
11 11 <workflow> SEASONAL DECOMP ARIMA Test <tibble [24 × 4]>
Selecting the best models
{modeltime} has the modeltime_nested_select_best() function for selecting the best model for each ID in a nested modeltime table. It requires specifying a metric for determining which model is “best”. As before, I’ll use the Symmetric Mean Absolute Percentage Error (SMAPE).
Code
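A sketch of that call looks something like this, with minimize = TRUE because lower SMAPE values are better (the filter_test_forecasts setting is one reasonable choice):

hpi_best <- hpi_wflowset |>
  modeltime_nested_select_best(
    metric = "smape",
    minimize = TRUE,
    filter_test_forecasts = TRUE
  )

hpi_best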
# Nested Modeltime Table
# A tibble: 4 × 5
city .actual_data .future_data .splits .modeltime_tables
<chr> <list> <list> <list> <list>
1 Dallas <tibble [204 × 15]> <tibble> <split [180|24]> <mdl_tm_t [1 × 5]>
2 Detroit <tibble [204 × 15]> <tibble> <split [180|24]> <mdl_tm_t [1 × 5]>
3 New York <tibble [204 × 15]> <tibble> <split [180|24]> <mdl_tm_t [1 × 5]>
4 San Diego <tibble [204 × 15]> <tibble> <split [180|24]> <mdl_tm_t [1 × 5]>
Notice that after selecting the best model, it’s only the “.modeltime_tables” column that changes. It now contains a table with a single row corresponding to the model that yielded the best value of the selected metric.
Viewing the accuracy of the best models
One gets the accuracy from a nested modeltime table by using extract_nested_test_accuracy(). Interestingly, this function shows the accuracy of all models even when it’s run on the table of “best” models. So below I’ll use group_by() to get the accuracy of the best model for each city.
Code
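Something like the following sketch produces the table below; keeping only the lowest-SMAPE row per city with slice_min() and formatting with table_modeltime_accuracy() are one way to do it, not necessarily the only one.

hpi_best |>
  extract_nested_test_accuracy() |>
  group_by(city) |>
  # keep only the best (lowest-SMAPE) model for each city
  slice_min(smape, n = 1) |>
  ungroup() |>
  table_modeltime_accuracy(.interactive = FALSE)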
Accuracy Table

| city | .model_id | .model_desc | .type | mae | mape | mase | smape | rmse | rsq |
|---|---|---|---|---|---|---|---|---|---|
| Dallas | 11 | SEASONAL DECOMP ARIMA | Test | 39.49 | 14.11 | 7.54 | 15.46 | 44.89 | 0.86 |
| Detroit | 11 | SEASONAL DECOMP ARIMA | Test | 4.16 | 2.56 | 2.37 | 2.59 | 4.80 | 0.86 |
| New York | 11 | SEASONAL DECOMP ARIMA | Test | 9.12 | 3.50 | 3.84 | 3.58 | 10.70 | 0.90 |
| San Diego | 6 | REGRESSION | Test | 30.77 | 8.19 | 4.36 | 7.75 | 35.89 | 0.29 |
Next are the forecasts against the test data.
Code
# Create a color palette for plotting the forecasts
fcst_pal <- c(
  "#2c3e50", "#FF0000", "#00FFFF", "#FFFF00", "#0000FF", "#00FF00",
  "#FF00FF", "#FF8000", "#0080FF", "#80FF00", "#8000FF", "#00FF80",
  "#FF0080"
)

hpi_best |>
  extract_nested_test_forecast() |>
  group_by(city) |>
  plot_modeltime_forecast(.interactive = FALSE) +
  scale_color_manual(values = fcst_pal)

For Dallas, Detroit, and New York the best model is the 11th model in the original set, which is a Seasonal Decomposition ARIMA model, and for San Diego it’s the 6th model, which is just called “REGRESSION”. I’ll show the table of model names here with a “model_num” column indicating the model number.
| .model_desc | model_num |
|---|---|
| BASE_REC_ETS_DEFAULT | 1 |
| BASE_REC_ARIMA_DEFAULT | 2 |
| BASE_REC_ARIMA_SPEC2 | 3 |
| BASE_REC_ETS_SPEC14 | 4 |
| BASE_REC_ETS_SPEC6 | 5 |
| DEMOG_REC_ARIMA_SPEC1 | 6 |
| DEMOG_REC_ARIMA_SPEC3 | 7 |
| NNET_REC_NNET_SPEC22 | 8 |
| NNET_REC_NNET_SPEC9 | 9 |
| NNET_REC_NNET_SPEC12 | 10 |
| BASE_REC_STLM_SPEC3 | 11 |
We can see that the 11th model is an STLM model and the 6th model is the first ARIMA specification with the “DEMOG” recipe (this is the one that included economic and demographic covariates).
The plot for San Diego does not include a margin of error, and the model is listed as having an error in the legend. This is something that can be traced to the calibration data. First I’ll extract the “.modeltime_tables”.
# Modeltime Table
# A tibble: 11 × 5
.model_id .model .model_desc .type .calibration_data
<int> <named list> <chr> <chr> <list>
1 1 <workflow> ETSAADN Test <tibble [24 × 4]>
2 2 <workflow> ARIMA Test <tibble [24 × 4]>
3 3 <workflow> ARIMA Test <tibble [24 × 4]>
4 4 <workflow> ETSAADA Test <tibble [24 × 4]>
5 5 <workflow> ETSAADA Test <tibble [24 × 4]>
6 6 <workflow> REGRESSION Test <tibble [24 × 4]>
7 7 <workflow> REGRESSION Test <tibble [24 × 4]>
8 8 <NULL> NULL <NA> <lgl [1]>
9 9 <NULL> NULL <NA> <lgl [1]>
10 10 <NULL> NULL <NA> <lgl [1]>
11 11 <workflow> SEASONAL DECOMP ARIMA Test <tibble [24 × 4]>
I can see that the 6th model does contain calibration data, so this isn’t the problem. Now I’ll extract the calibration data just for this model to see what’s going on.
Code
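One way to get at it is to drill further into the list columns, along these lines:

hpi_wflowset |>
  filter(city == "San Diego") |>
  # the full modeltime table for San Diego
  pluck(".modeltime_tables", 1) |>
  # model 6 is the "REGRESSION" model chosen for San Diego
  filter(.model_id == 6) |>
  pluck(".calibration_data", 1)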
# A tibble: 24 × 4
date .actual .prediction .residuals
<date> <dbl> <dbl> <dbl>
1 2021-01-01 302. NA NA
2 2021-02-01 311. NA NA
3 2021-03-01 321. NA NA
4 2021-04-01 332. NA NA
5 2021-05-01 341. NA NA
6 2021-06-01 350. NA NA
7 2021-07-01 355. 409. -53.6
8 2021-08-01 357. 410. -53.2
9 2021-09-01 360. 412. -51.6
10 2021-10-01 364. 413. -48.8
# ℹ 14 more rows
Gaaaahhhh! It’s the missing data created by the lags in the recipe for this model. It might be interesting to run this model again without those lags, but I’ll be happy just to know why there is no margin of error for this one. Interestingly, though, it looks like the STLM did a pretty good job with Detroit and New York.
Now it’s on to forecasting the future.
Forecasting future HPI
The first step in forecasting is re-fitting the models to the full data set, including the test portion. I’ll do this with modeltime_nested_refit() and then immediately use the refit object to plot the forecasts.
Code
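A sketch of those two steps looks something like this (the control settings here are just one reasonable choice):

hpi_refit <- hpi_best |>
  modeltime_nested_refit(
    control = control_nested_refit(verbose = TRUE)
  )

hpi_refit |>
  extract_nested_future_forecast() |>
  group_by(city) |>
  plot_modeltime_forecast(.interactive = FALSE) +
  scale_color_manual(values = fcst_pal)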
It looks like the missing values due to the lags prevented the ARIMA from generating forecasts, but the other forecasts look quite believable and much better than the forecasts made by the global process.
Comparing forecasts to what really happened
I used data that ended in December 2022 because that’s the most recent data I had for some demographic variables. This isn’t a problem for the STLM since that model doesn’t rely on demographic covariates, but there is HPI data available through November of 2023, and the forecasts go through December of 2023. So, here I’ll get the HPI data for 2023 for Dallas, Detroit, and New York.
Code
# Look up FRED series IDs for the monthly, non-seasonally-adjusted indexes
case_shiller_ids <- fredr::fredr_series_search_text("Case-Shiller") |>
filter(
str_detect(
title,
"TX-Dallas|NY-New York|MI-Detroit Home Price Index"
),
seasonal_adjustment_short %in% "NSA",
frequency %in% "Monthly"
) |>
mutate(city = str_extract(title, "New York|Dallas|San Diego|Detroit")) |>
select(city, id)
# Get Case Shiller data for 2023
hpi_data <- case_shiller_ids |>
mutate(
data = map(
id,
\(x) fredr::fredr(
series_id = x,
observation_start = ymd("2020-01-01"),
observation_end = ymd("2023-12-31"),
frequency = "m"
)
)
) |>
select(-id) |>
unnest(data) |>
select(city, date, hpi = value)
hpi_data |>
pivot_wider(names_from = city, values_from = hpi) |>
filter(year(date) == 2023) |>
gt() |>
gt_bold_head()
| date | Dallas | New York | Detroit |
|---|---|---|---|
| 2023-01-01 | 281.7092 | 272.8493 | 165.3712 |
| 2023-02-01 | 281.6864 | 271.9243 | 165.0817 |
| 2023-03-01 | 284.7053 | 274.9276 | 168.7419 |
| 2023-04-01 | 288.6548 | 278.5348 | 172.7052 |
| 2023-05-01 | 293.1761 | 283.2171 | 176.2015 |
| 2023-06-01 | 295.2387 | 286.5359 | 177.9996 |
| 2023-07-01 | 296.1252 | 289.0024 | 179.2959 |
| 2023-08-01 | 295.6140 | 290.5704 | 180.7211 |
| 2023-09-01 | 295.2280 | 292.3354 | 181.9622 |
| 2023-10-01 | 294.2782 | 293.4502 | 182.6364 |
| 2023-11-01 | 292.4071 | 294.2285 | 181.8668 |
Now I can make a table of accuracy statistics for these series.
Code
# Join the 2023 actuals to the future forecasts and compute accuracy by city
hpi_data |>
left_join(
hpi_refit |>
filter(!city %in% "San Diego") |>
extract_nested_future_forecast() |>
filter(.key %in% "prediction") |>
unite("model_name", .model_id, .model_desc) |>
select(!.key),
by = c("city", "date" = ".index")
) |>
group_nest(city) |>
mutate(
smape = map_dbl(
data,
\(x) smape(x, hpi, .value) |> pull(.estimate)
),
rmse = map_dbl(
data,
\(x) rmse(x, hpi, .value) |> pull(.estimate)
),
rsq = map_dbl(
data,
\(x) rsq(x, hpi, .value) |> pull(.estimate)
)
) |>
select(-data) |>
ungroup() |>
gt() |>
gt_bold_head()
| city | smape | rmse | rsq |
|---|---|---|---|
| Dallas | 3.522232 | 13.336181 | 0.0272001 |
| Detroit | 5.161485 | 10.526957 | 0.2394261 |
| New York | 1.658322 | 5.643017 | 0.9973690 |
These are some pretty low (good) values of SMAPE. Let’s see what these look like in plots.
Code
# Stack the 2023 actuals (labeled "ACTUAL") on top of the refit forecasts
hpi_data |>
mutate(model_name = "ACTUAL") |>
rename(.value = hpi) |>
bind_rows(
hpi_refit |>
extract_nested_future_forecast() |>
filter(.key %in% "prediction", !city %in% "San Diego") |>
unite("model_name", .model_id, .model_desc) |>
select(!.key) |>
rename(date = .index)
) |>
mutate(
model_name = fct_relevel(model_name, "ACTUAL"),
lwidth = if_else(model_name %in% "ACTUAL", 1, 2)
) |>
ggplot(aes(x = date, group = model_name, color = model_name)) +
geom_line(aes(y = .value, linewidth = lwidth)) +
geom_ribbon(aes(ymin = .conf_lo, ymax = .conf_hi), alpha = 0.2, color = NA) +
scale_color_manual(values = fcst_pal) +
scale_y_continuous(expand = c(0.5, 0.5)) +
scale_linewidth(range = c(0.5, 1)) +
facet_wrap(~ city, scales = "free_y", ncol = 1) +
labs(
color = NULL,
x = NULL,
y = NULL,
title = "Forecasts vs. Actual Data"
) +
guides(linewidth = "none") +
theme_timetk +
theme(plot.title = element_text(hjust = 0.5))
Not too bad! All three forecasts look like they were very close in the first few months of the forecast period, but started to underestimate later on.
This ends the series of planned posts on using modeltime with R. All in all, this is a great package with all sorts of goodies for forecasting time series.