First, this thesis investigates the role of data quality and coverage. We demonstrate how incomplete representation of rare but critical behaviors, particularly those in the tails of the data distribution, can significantly impair forecasting reliability. We then propose strategies for improving dataset coverage through targeted data collection in safety-critical scenarios, and show how these interventions lead to more robust generalization on forecasting benchmarks.
Second, we examine benchmarking and evaluation practices, revealing that widely used metrics often obscure failure modes such as collisions or socially unlikely interactions. To address this, we introduce and advocate for evaluation metrics that align with safety objectives and better reflect the conditions necessary for deployment on real-world robotic systems. These metrics provide more faithful signals of model performance and enable more meaningful comparisons across forecasting approaches.
Finally, building upon improved data and evaluation foundations, this thesis presents a forecasting method that makes effective use of the wealth of information present in sensor data. The approach utilizes human body pose features as well as deep semantic environment features, resulting in predictions that are more socially consistent and better obey environmental constraints without sacrificing accuracy. Our method benefits from comprehensive data coverage and safety-oriented benchmarking, demonstrating that advances in forecasting methods are most meaningful when built upon solid data and evaluation foundations.
Collectively, this work provides a unified framework for understanding and improving trajectory forecasting reliability. By addressing data, evaluation, and modeling together, this thesis contributes insights and tools toward building forecasting systems that are better aligned with the requirements for real-world autonomous decision-making.
Deva Ramanan, Co-chair
