# TimePartitioning
ViEWS uses a time-partitioning scheme that splits the available data into three partitions/periods: training, calibration, and testing/forecasting. The time periods for these partitions are defined by the time stamps of the observed outcomes. The approach is described in depth in Appendix A of Hegre et al. (2021).
τ refers to calendar time, but we add subscripts to identify when the partitions start and end. Because the partitions differ between evaluation and true forecasting, we also add the superscript e to all notation for the evaluation partitions. The periodization table below shows the partitioning of data for estimating model weights, hyper-parameter tuning, evaluation, and forecasting.
- The "forecast" periodization is for actual forecasting.
- The "evaluation" periodization is for testing models and ensembles.
- The training periods are used to train the models.
- The calibration periods are used for hyper-parameter tuning and to estimate model weights.
After EBMA calibration and hyper-parameter tuning, we retrain our models using both the training and calibration partitions.
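The retraining step can be sketched as follows; the month ranges here are purely illustrative, and `train`, `calib`, and `retrain` are hypothetical names, not part of any ViEWS API:

```python
# Illustrative month ranges only (not the actual ViEWS partitions).
train = range(121, 397)    # training partition: months 121-396
calib = range(397, 433)    # calibration partition: months 397-432

# After tuning on the calibration partition, the final models are
# refit on the union of the two partitions.
retrain = range(train.start, calib.stop)

assert list(retrain) == list(train) + list(calib)
```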
1. Define the partitioning scheme:

```python
# Assuming these modules come from the views_partitioning package.
from views_partitioning import data_partitioner, legacy

partitioner = data_partitioner.DataPartitioner.from_legacy_periods([
    legacy.Period("A",
        train_start=121, train_end=396,
        predict_start=397, predict_end=432)
])
```
2. Apply the partitioner:

```python
# hh_data is a (time, unit)-indexed DataFrame of features and outcomes.
training_a = partitioner("A", "train", hh_data)
print(training_a.index.get_level_values(0)[[0, -1]])
```
3. Train the model:

```python
from sklearn.ensemble import RandomForestRegressor
from stepshift import views

mdl = views.StepshiftedModels(
    RandomForestRegressor(),
    [*range(1, 4)],       # steps 1, 2, and 3
    "ln_ged_sb_dep")      # dependent variable
mdl.fit(training_a)
```
4. Generate the predictions:

```python
predictions = mdl.predict(partitioner("A", "predict", hh_data))
```
The resulting object contains rows starting at predict_start = 397 and ending at predict_end = 432:
time | unit | step_pred_1 | step_pred_2 | step_pred_3 | step_combined |
---|---|---|---|---|---|
397 | 530 | 3.248208 | 0.068794 | 0.071361 | 3.248208 |
398 | 530 | 0.060043 | 3.139737 | 0.071361 | 3.139737 |
399 | 530 | 0.060043 | 0.068794 | 3.100956 | 3.100956 |
400 | 530 | 0.060043 | 0.068794 | 0.071361 | NaN |
401 | 530 | 0.060043 | 0.068794 | 0.071361 | NaN |
402 | 530 | 0.060043 | 0.068794 | 0.071361 | NaN |
403 | 530 | 0.060043 | 0.068794 | 0.071361 | NaN |
404 | 530 | 1.534400 | 0.068794 | 0.071361 | NaN |
405 | 530 | 0.060043 | 1.540886 | 0.071361 | NaN |
406 | 530 | 0.060043 | 0.068794 | 1.649546 | NaN |
407 | 530 | 0.060043 | 0.068794 | 0.071361 | NaN |
408 | 530 | 3.075878 | 0.068794 | 0.071361 | NaN |
409 | 530 | 3.255441 | 2.440132 | 0.071361 | NaN |
410 | 530 | 0.060043 | 3.509400 | 2.962537 | NaN |
411 | 530 | 0.060043 | 0.068794 | 3.711153 | NaN |
412 | 530 | 0.060043 | 0.068794 | 0.071361 | NaN |
413 | 530 | 0.060043 | 0.068794 | 0.071361 | NaN |
414 | 530 | 0.060043 | 0.068794 | 0.071361 | NaN |
415 | 530 | 0.060043 | 0.068794 | 0.071361 | NaN |
416 | 530 | 1.748024 | 0.068794 | 0.071361 | NaN |
417 | 530 | 0.060043 | 1.963898 | 0.071361 | NaN |
418 | 530 | 0.060043 | 0.068794 | 1.900073 | NaN |
419 | 530 | 0.060043 | 0.068794 | 0.071361 | NaN |
420 | 530 | 0.060043 | 0.068794 | 0.071361 | NaN |
421 | 530 | 2.670264 | 0.068794 | 0.071361 | NaN |
422 | 530 | 0.060043 | 2.285300 | 0.071361 | NaN |
423 | 530 | 0.060043 | 0.068794 | 2.428119 | NaN |
424 | 530 | 0.060043 | 0.068794 | 0.071361 | NaN |
425 | 530 | 0.060043 | 0.068794 | 0.071361 | NaN |
426 | 530 | 0.969150 | 0.068794 | 0.071361 | NaN |
427 | 530 | 0.060043 | 1.005730 | 0.071361 | NaN |
428 | 530 | 0.060043 | 0.068794 | 1.022231 | NaN |
429 | 530 | 0.060043 | 0.068794 | 0.071361 | NaN |
430 | 530 | 0.060043 | 0.068794 | 0.071361 | NaN |
431 | 530 | 0.060043 | 0.068794 | 0.071361 | NaN |
432 | 530 | 0.060043 | 0.068794 | 0.071361 | NaN |
To stay within the naming conventions we have established, the columns currently called step_pred_1, step_pred_2, etc. should be called ss_1, ss_2, etc., or, if preferred, step_spec_1, step_spec_2. The column called step_combined is in line with convention (although it was called sc in views2). The advantage of the two-letter abbreviation is that when we generate ensembles there will be a large number of columns with different model-name prefixes followed by _ss_1 or _step_spec_1, and the shorter suffix keeps those names readable.
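A rename along these lines could be applied directly to the prediction frame; the mapping below is a sketch of the proposed convention, not an implemented feature:

```python
import pandas as pd

# Toy frame with the current column names.
preds = pd.DataFrame(
    {"step_pred_1": [0.1], "step_pred_2": [0.2], "step_combined": [0.1]})

def to_short_name(col: str) -> str:
    # Proposed convention: step_pred_<k> -> ss_<k>, step_combined -> sc.
    if col.startswith("step_pred_"):
        return col.replace("step_pred_", "ss_")
    return "sc" if col == "step_combined" else col

renamed = preds.rename(columns=to_short_name)
print(list(renamed.columns))  # ['ss_1', 'ss_2', 'sc']
```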
Throughout, the partition is defined in terms of the month of the actuals we are targeting, not in terms of the last month with data or the last month in the training set. The partial exception is step_combined, which is defined both in terms of the last month in the training set and the month of the actual. Accordingly, the step_combined series starts at predict_start and ends at predict_start plus the largest step passed to StepshiftedModels, minus one (397 through 399 in the example above).
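The diagonal structure of step_combined can be reproduced from the step-specific columns. This is a sketch of the selection rule with made-up prediction values, not the stepshift internals:

```python
import numpy as np
import pandas as pd

predict_start, steps = 397, [1, 2, 3]
times = range(predict_start, predict_start + 6)  # months 397-402
# Made-up step-specific predictions for a single unit.
preds = pd.DataFrame(
    {f"step_pred_{k}": [float(t * 10 + k) for t in times] for k in steps},
    index=times)

# step_combined at time t takes step_pred_k with k = t - predict_start + 1,
# so it is only defined for the first max(steps) months of the window.
combined = pd.Series(
    [preds.loc[t, f"step_pred_{t - predict_start + 1}"]
     if t - predict_start + 1 <= max(steps) else np.nan
     for t in times],
    index=times, name="step_combined")
```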
The figures below are illustrations of the process from Hegre et al. (2021).
The following diagram shows predictions from a step = 1 model, which explains why there are leading missing values for predictions and lagging missing values for the independent variables.
The following diagram shows how stepshifting is used to predict into the future.
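The step-shifting itself can be illustrated with a plain pandas shift; the frame and column names here are hypothetical:

```python
import pandas as pd

# To train a step-s model, features at time t are aligned with the
# outcome at time t + s, i.e. the outcome column is shifted s steps back.
s = 1
df = pd.DataFrame({"feature": [1.0, 2.0, 3.0, 4.0],
                   "outcome": [10.0, 20.0, 30.0, 40.0]},
                  index=[1, 2, 3, 4])
df[f"outcome_shifted_{s}"] = df["outcome"].shift(-s)

# The last s rows have no future actual to learn from (NaN after the
# shift), which is the source of the leading/lagging missing values
# shown in the diagrams.
print(df[f"outcome_shifted_{s}"].tolist())  # [20.0, 30.0, 40.0, nan]
```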
- Hegre H, Bell C, Colaresi M, et al. ViEWS2020: Revising and evaluating the ViEWS political Violence Early-Warning System. Journal of Peace Research. 2021;58(3):599-611. doi:10.1177/0022343320962157