In this thesis, we demonstrate that scale is not the only path to generalization, along three themes: developing multivariate architectures that leverage cross-channel dependencies efficiently while reducing forecast error; showing that architectures can generalize beyond their training distribution in both patterns and concepts; and verifying variance-aware architectural designs that extract richer training signals from existing data, provably reducing gradient variance while also lowering forecast error and improving calibration.
Within the first theme, we further propose pretraining strategies for multivariate TSFMs to investigate whether data balancing and curriculum learning can improve downstream generalization given the same pretraining corpora. Within the second theme, we propose an additional dimension of generalization, extending beyond pattern and concept generalization to horizon generalization, an important consideration for TSFMs applied across diverse tasks and domains. Overall, this work contributes new insights into improving generalization in time series forecasting through efficient architectural design.
John Dolan
Barnabás Póczos
Michael W. Mahoney (University of California, Berkeley)
