Creating Multi-Modal, User-Centric Records of Meetings with the Carnegie Mellon Meeting Recorder Architecture

Satanjeev Banerjee, Jason Cohen, Thomas Quisel, Arthur Chan, Yash Patodia, Ziad Al Bawab, Rong Zhang, Alan Black, Richard Stern, Roni Rosenfeld, Alexander Rudnicky, Paul Rybski, and Manuela Veloso
International Conference on Acoustics, Speech, and Signal Processing, Meeting Recognition Workshop, May 2004.


Copyright notice: This material is presented to ensure timely dissemination of scholarly and technical work. Copyright and all rights therein are retained by authors or by other copyright holders. All persons copying this information are expected to adhere to the terms and constraints invoked by each author's copyright. These works may not be reposted without the explicit permission of the copyright holder.

Abstract
Our goal is to build conversational agents that combine information from speech, gesture, handwriting, text and presentations to create an understanding of the ongoing conversation (e.g., by identifying the action items agreed upon), and that can make useful contributions to the meeting based on such an understanding (e.g., by confirming the details of those action items). To create a corpus of relevant data, we have implemented the Carnegie Mellon Meeting Recorder to capture detailed multi-modal recordings of meetings. This software differs somewhat from other meeting room architectures in that it focuses on instrumenting the individual rather than the room, and it assumes that the meeting space is not fixed in advance. Thus, most of the sensors are user-centric (close-talking microphones connected to laptop computers, instrumented notepads, instrumented presentation software, etc.), although some are indeed "room-centric" (an instrumented whiteboard, distant cameras, table-top microphones, etc.). This paper describes the details of our data collection environment. We report on the current status of our data collection, transcription, and higher-level discourse annotation efforts. We also describe some of our initial research on conversational turn-taking based on this corpus.
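As a rough illustration of the user-centric design described in the abstract, the sketch below shows one way time-stamped events from per-participant sensors (close-talking microphones, instrumented notepads, presentation software) could be merged into a single time-aligned meeting record. This is not the authors' implementation; the Event fields, the stream contents, and the merge_streams helper are illustrative assumptions.

import heapq
from dataclasses import dataclass, field

@dataclass(order=True)
class Event:
    # Only the timestamp participates in ordering, so heapq.merge can
    # interleave the streams chronologically.
    timestamp: float                          # seconds since meeting start
    modality: str = field(compare=False)      # e.g. "speech", "notepad", "slides"
    source: str = field(compare=False)        # participant or device id
    payload: str = field(compare=False)       # transcript text, stroke data, ...

def merge_streams(*streams):
    """Merge per-sensor event streams (each already time-sorted) into one record."""
    return heapq.merge(*streams)

# Hypothetical per-participant streams, one per sensor:
speech = [Event(1.2, "speech", "mic:alice", "let's confirm the action items"),
          Event(4.7, "speech", "mic:bob", "I'll draft the report")]
notes  = [Event(5.0, "notepad", "pad:carol", "action item: Bob drafts report")]
slides = [Event(0.0, "slides", "laptop:alice", "slide 1: agenda")]

for ev in merge_streams(speech, notes, slides):
    print(f"{ev.timestamp:6.1f}s  {ev.modality:8s} {ev.source:12s} {ev.payload}")

Keeping each stream local to its owner's device and aligning streams only by timestamp is one simple way to realize "user-centric" recording without assuming a fixed, instrumented room.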

Notes
Sponsor: DARPA
Associated Center(s) / Consortia: Vision and Autonomous Systems Center
Associated Lab(s) / Group(s): People Image Analysis Consortium and MultiRobot Lab
Associated Project(s): Camera Assisted Meeting Event Observer
Number of pages: 6

Text Reference
Satanjeev Banerjee, Jason Cohen, Thomas Quisel, Arthur Chan, Yash Patodia, Ziad Al Bawab, Rong Zhang, Alan Black, Richard Stern, Roni Rosenfeld, Alexander Rudnicky, Paul Rybski, and Manuela Veloso, "Creating Multi-Modal, User-Centric Records of Meetings with the Carnegie Mellon Meeting Recorder Architecture," International Conference on Acoustics, Speech, and Signal Processing, Meeting Recognition Workshop, May 2004.

BibTeX Reference
@inproceedings{Black_2004_6660,
   author = "Satanjeev Banerjee and Jason Cohen and Thomas Quisel and Arthur Chan and Yash Patodia and Ziad Al Bawab and Rong Zhang and Alan Black and Richard Stern and Roni Rosenfeld and Alexander Rudnicky and Paul Rybski and Manuela Veloso",
   title = "Creating Multi-Modal, User-Centric Records of Meetings with the Carnegie Mellon Meeting Recorder Architecture",
   booktitle = "International Conference on Acoustics, Speech, and Signal Processing, Meeting Recognition Workshop",
   month = "May",
   year = "2004",
}