Cross-validation for Longitudinal Datasets with Unstable Correlations
This demo is based on the KDD 2025 paper "Cross-Validation for Longitudinal Datasets with Unstable Correlations".
It simulates an outcome that is a linear combination of two features (see the sketch after this list):
one consistently predictive over time (the stable feature)
one only occasionally predictive (the unstable feature)
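A minimal sketch of this data-generating process, assuming Gaussian features, a small Gaussian noise term, and that the unstable feature contributes to the outcome only in a p_u(t) fraction of rows in each period (all of these are illustrative choices, not taken from the paper):

    import numpy as np

    def simulate_period(n_rows, a, b, p_u_t, rng):
        """Simulate one time period of data.

        The outcome is a * x_stable, plus b * x_unstable for the rows in which
        the unstable feature is associated with the outcome, plus noise.
        """
        x_stable = rng.normal(size=n_rows)
        x_unstable = rng.normal(size=n_rows)
        # The unstable feature contributes only in a p_u(t) fraction of rows,
        # so its effective coefficient over the period is roughly b * p_u(t).
        active = rng.random(n_rows) < p_u_t
        noise = rng.normal(scale=0.1, size=n_rows)
        y = a * x_stable + b * active * x_unstable + noise
        return x_stable, x_unstable, y

    rng = np.random.default_rng(0)
    x_s, x_u, y = simulate_period(n_rows=500, a=1.0, b=1.0, p_u_t=0.3, rng=rng)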
It then compares the expected MSE estimated by three CV strategies: random CV, block CV, and our proposed approach (|block CV output - random CV output|). The comparison covers two linear models (see the sketch after this list):
one that only uses the stable feature (the stable model) to predict the outcome
one that only uses the unstable feature (the unstable model) to predict the outcome
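A sketch of how the three estimates could be computed for one of these single-feature models with scikit-learn; GroupKFold over time-period labels stands in for block CV, the fold counts are arbitrary, and the proposed estimate is taken literally from the description above as the absolute difference between the block-CV and random-CV outputs:

    from sklearn.linear_model import LinearRegression
    from sklearn.model_selection import GroupKFold, KFold, cross_val_score

    def cv_mse_estimates(X, y, periods):
        """Return (random-CV MSE, block-CV MSE, |block - random|) for a
        linear model on a single feature column X of shape (n_rows, 1)."""
        model = LinearRegression()
        # Random CV: folds are drawn without regard to the time structure.
        random_mse = -cross_val_score(
            model, X, y, scoring="neg_mean_squared_error",
            cv=KFold(n_splits=5, shuffle=True, random_state=0)).mean()
        # Block CV: each fold holds out whole time periods (grouped by `periods`).
        block_mse = -cross_val_score(
            model, X, y, groups=periods, scoring="neg_mean_squared_error",
            cv=GroupKFold(n_splits=5)).mean()
        # The proposed estimate, read literally from the description above.
        return random_mse, block_mse, abs(block_mse - random_mse)

The stable model would be evaluated with X = x_stable.reshape(-1, 1) and the unstable model with X = x_unstable.reshape(-1, 1), passing a per-row array of time-period labels as periods.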
Random and block CV often estimate the unstable model as having a lower MSE than the stable model, leading to the selection of models that will fail over time. Our method, on the other hand, avoids this pitfall and provides more reliable model selection. To play with this demo, adjust the sliders for
a: the coefficient associated with the stable feature
b: the coefficient associated with the unstable feature
Au: the average proportion of training data where the unstable feature and outcome are associated
Vu: the variance in the proportion of training data where the unstable feature and outcome are associated over time
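The demo text does not specify how p_u(t) is generated from Au and Vu; one common way to draw proportions with a given mean and variance is a Beta distribution, sketched below as an assumption (it requires Vu < Au(1 - Au)):

    import numpy as np

    def sample_p_u(n_periods, Au, Vu, rng):
        """Draw p_u(t) for each period from a Beta distribution whose mean is
        Au and whose variance is Vu (requires Vu < Au * (1 - Au))."""
        if Vu == 0:
            return np.full(n_periods, float(Au))
        scale = Au * (1 - Au) / Vu - 1  # standard Beta mean/variance reparameterization
        return rng.beta(Au * scale, (1 - Au) * scale, size=n_periods)

    rng = np.random.default_rng(0)
    p_u = sample_p_u(n_periods=20, Au=0.3, Vu=0.02, rng=rng)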
[Plot: strength of correlation of the stable vs. unstable feature with the outcome over time. X-axis: time period (t). Series: Stable (a), Unstable (b·p_u(t)).]
Note: the "Stable - Random CV" and "Stable - Block CV" curves overlap entirely because the two strategies always produce the same output for the stable model.