Cross-validation for Longitudinal Datasets with Unstable Correlations
This demo is based on the KDD 2025 paper "Cross-Validation for Longitudinal Datasets with Unstable Correlations".
It simulates an outcome that is a linear combination of two features (see the sketch after this list):
one consistently predictive over time (the stable feature)
one only occasionally predictive (the unstable feature)
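A minimal sketch of this data-generating process, assuming Gaussian features, a small Gaussian noise term, and that the unstable feature contributes to the outcome only in a p_u(t) fraction of rows in each period (all of these are illustrative choices, not taken from the paper):

    import numpy as np

    def simulate_period(n_rows, a, b, p_u_t, rng):
        """Simulate one time period of data.

        The outcome is a * x_stable, plus b * x_unstable for the rows in which
        the unstable feature is associated with the outcome, plus noise.
        """
        x_stable = rng.normal(size=n_rows)
        x_unstable = rng.normal(size=n_rows)
        # The unstable feature contributes only in a p_u(t) fraction of rows,
        # so its effective coefficient over the period is roughly b * p_u(t).
        active = rng.random(n_rows) < p_u_t
        noise = rng.normal(scale=0.1, size=n_rows)
        y = a * x_stable + b * active * x_unstable + noise
        return x_stable, x_unstable, y

    rng = np.random.default_rng(0)
    x_s, x_u, y = simulate_period(n_rows=500, a=1.0, b=1.0, p_u_t=0.3, rng=rng)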
It then compares the expected MSE estimated by three CV strategies: random CV, block CV, and our proposed approach (|block CV output - random CV output|). The comparison covers two linear models (see the sketch after this list):
one that only uses the stable feature (the stable model) to predict the outcome
one that only uses the unstable feature (the unstable model) to predict the outcome
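A sketch of how the three estimates could be computed for one of these single-feature models with scikit-learn; GroupKFold over time-period labels stands in for block CV, the fold counts are arbitrary, and the proposed estimate is taken literally from the description above as the absolute difference between the block-CV and random-CV outputs:

    from sklearn.linear_model import LinearRegression
    from sklearn.model_selection import GroupKFold, KFold, cross_val_score

    def cv_mse_estimates(X, y, periods):
        """Return (random-CV MSE, block-CV MSE, |block - random|) for a
        linear model on a single feature column X of shape (n_rows, 1)."""
        model = LinearRegression()
        # Random CV: folds are drawn without regard to the time structure.
        random_mse = -cross_val_score(
            model, X, y, scoring="neg_mean_squared_error",
            cv=KFold(n_splits=5, shuffle=True, random_state=0)).mean()
        # Block CV: each fold holds out whole time periods (grouped by `periods`).
        block_mse = -cross_val_score(
            model, X, y, groups=periods, scoring="neg_mean_squared_error",
            cv=GroupKFold(n_splits=5)).mean()
        # The proposed estimate, read literally from the description above.
        return random_mse, block_mse, abs(block_mse - random_mse)

The stable model would be evaluated with X = x_stable.reshape(-1, 1) and the unstable model with X = x_unstable.reshape(-1, 1), passing a per-row array of time-period labels as periods.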
Random and block CV often estimate the unstable model as having a lower MSE than the stable model, leading to the selection of models that will fail over time. Our method, on the other hand, avoids this pitfall and provides more reliable model selection. To play with this demo, adjust the sliders for
a: the coefficient associated with the stable feature
b: the coefficient associated with the unstable feature
Au: the average proportion of training data where the unstable feature and outcome are associated
Vu: the variance in the proportion of training data where the unstable feature and outcome are associated over time
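The demo text does not specify how p_u(t) is generated from Au and Vu; one common way to draw proportions with a given mean and variance is a Beta distribution, sketched below as an assumption (it requires Vu < Au(1 - Au)):

    import numpy as np

    def sample_p_u(n_periods, Au, Vu, rng):
        """Draw p_u(t) for each period from a Beta distribution whose mean is
        Au and whose variance is Vu (requires Vu < Au * (1 - Au))."""
        if Vu == 0:
            return np.full(n_periods, float(Au))
        scale = Au * (1 - Au) / Vu - 1  # standard Beta mean/variance reparameterization
        return rng.beta(Au * scale, (1 - Au) * scale, size=n_periods)

    rng = np.random.default_rng(0)
    p_u = sample_p_u(n_periods=20, Au=0.3, Vu=0.02, rng=rng)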
[Plot: strength of correlation of the stable vs. unstable feature with the outcome over time. X-axis: time period (t). Series: Stable (a), Unstable (b·p_u(t)).]
Note: the "Stable - Random CV" and "Stable - Block CV" curves overlap entirely because the two strategies always produce the same output for the stable model.