Multimodal Human Mesh Recovery for Stand-off Triage in Mass Casualty Scenarios
Abstract
Mass-casualty triage robots must judge a victim’s ability to move—even when light is scarce, geometry is ambiguous, or some sensors fail. We tackle this challenge with a multimodal mesh-recovery framework that fuses RGB, LiDAR, and infrared (IR) inputs to reconstruct full-body SMPL meshes from whichever subset of sensors is available. The framework builds on a frozen HMR 2.0 backbone: a lightweight transformer-based Modality Unifier is trained with random modality dropout so that a single set of weights handles any sensor combination at inference time.
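As a rough illustration of this fusion scheme, the PyTorch sketch below shows how per-modality features can be combined by a small transformer while modalities are randomly dropped during training; all module names, dimensions, and the dropout rate here are our own illustrative choices, not the thesis implementation.

    import torch
    import torch.nn as nn

    class ModalityUnifierSketch(nn.Module):
        """Illustrative fusion head: per-modality tokens -> transformer -> fused feature."""
        def __init__(self, dim=256, num_layers=2, num_heads=4, modalities=("rgb", "lidar", "ir")):
            super().__init__()
            self.modalities = modalities
            # Learned embedding marking which modality each token came from.
            self.mod_embed = nn.ParameterDict({m: nn.Parameter(torch.zeros(dim)) for m in modalities})
            layer = nn.TransformerEncoderLayer(d_model=dim, nhead=num_heads, batch_first=True)
            self.encoder = nn.TransformerEncoder(layer, num_layers=num_layers)

        def forward(self, feats, p_drop=0.3):
            # feats: dict modality -> (B, dim) feature from a frozen per-modality backbone.
            tokens = []
            for m in self.modalities:
                if m not in feats:
                    continue                      # sensor missing at test time
                if self.training and len(feats) > 1 and torch.rand(()) < p_drop:
                    continue                      # random modality dropout during training
                tokens.append(feats[m] + self.mod_embed[m])
            if not tokens:                        # never drop every available modality
                m = next(iter(feats))
                tokens = [feats[m] + self.mod_embed[m]]
            x = torch.stack(tokens, dim=1)        # (B, num_kept_modalities, dim)
            fused = self.encoder(x).mean(dim=1)   # (B, dim) fused feature for the mesh head
            return fused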
To supervise and benchmark such fusion, we curate two mesh-annotated datasets: STCrowd-Mesh (~10k RGB+LiDAR pedestrian frames) and LLVIP-Mesh (~15k aligned RGB-IR pairs). On STCrowd-Mesh, adding LiDAR to RGB lowers MPJPE from 86.9mm to 75.8mm (–11.1mm) and PA-MPJPE from 63.1mm to 57.8mm (–5.3mm), confirming LiDAR’s value for absolute spatial accuracy. In low-light LLVIP-Mesh scenes, fusing IR with RGB yields consistent but smaller gains, indicating complementary appearance cues.
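For reference, MPJPE is the mean Euclidean distance between predicted and ground-truth joints, and PA-MPJPE measures the same error after a rigid Procrustes alignment (rotation, translation, scale) of the prediction to the ground truth. A minimal NumPy version of both metrics, written from the standard definitions rather than the thesis evaluation code:

    import numpy as np

    def mpjpe(pred, gt):
        """Mean per-joint position error. pred, gt: (J, 3) joint arrays in the same units."""
        return np.mean(np.linalg.norm(pred - gt, axis=-1))

    def pa_mpjpe(pred, gt):
        """Procrustes-aligned MPJPE: rigidly align pred to gt (scale, rotation, translation) first."""
        mu_p, mu_g = pred.mean(0), gt.mean(0)
        p, g = pred - mu_p, gt - mu_g
        # Optimal rotation and scale from the SVD of the cross-covariance (classic Procrustes solution).
        U, S, Vt = np.linalg.svd(p.T @ g)
        R = Vt.T @ U.T
        if np.linalg.det(R) < 0:          # avoid reflections
            Vt[-1] *= -1
            S[-1] *= -1
            R = Vt.T @ U.T
        scale = S.sum() / (p ** 2).sum()
        aligned = scale * p @ R.T + mu_g
        return mpjpe(aligned, gt)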
We deploy the trained model offline on real-world DARPA Triage Challenge field logs recorded from a mobile Spot robot. The recovered SMPL meshes are used to infer motor alertness by analyzing joint movement patterns across time, classifying casualties into normal, abnormal, or unresponsive categories. These results show that dense, modality-flexible 3D pose estimation can enable remote physiological assessment, offering a promising step toward fully autonomous triage in complex and degraded environments.
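As a concrete (and deliberately simplified) example of how joint movement patterns can be turned into an alertness label, the sketch below thresholds mean joint speed over a short window; the thresholds and the decision rule are placeholders, not the classifier used in the thesis.

    import numpy as np

    def motor_alertness(joint_seq, fps=10.0, low_thresh=0.02, high_thresh=0.10):
        """Illustrative rule: classify a casualty from SMPL joint trajectories.

        joint_seq: (T, J, 3) joint positions in metres over a time window.
        Thresholds (mean joint speed, m/s) are placeholder values.
        """
        velocities = np.diff(joint_seq, axis=0) * fps       # (T-1, J, 3) joint velocities
        speed = np.linalg.norm(velocities, axis=-1)         # (T-1, J) per-joint speed
        mean_speed = speed.mean()
        if mean_speed < low_thresh:
            return "unresponsive"    # essentially no motion over the window
        elif mean_speed < high_thresh:
            return "abnormal"        # sluggish or restricted movement
        return "normal"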
BibTeX
@mastersthesis{Agarwal-2025-148159,
author = {Aniket Agarwal},
title = {Multimodal Human Mesh Recovery for Stand-off Triage in Mass Casualty Scenarios},
year = {2025},
month = {July},
school = {Carnegie Mellon University},
address = {Pittsburgh, PA},
number = {CMU-RI-TR-25-66},
keywords = {human mesh recovery, HMR, lidar, DARPA, multimodal fusion, pose detection},
}