Audio-Visual State-Aware Representation Learning from Interaction-Rich, Egocentric Videos - Robotics Institute Carnegie Mellon University