Evaluation of an Integrated Multi-Task Machine Learning System with Humans in the Loop

Aaron Steinfeld, S. Rachael Bennett, Kyle Cunningham, Matt Lahut, Pablo-Alejandro Quinones, Django Wexler, Daniel Siewiorek, Jordan Hayes, Paul Cohen, Julie Fitzgerald, Othar Hansson, Mike Pool and Mark Drummond
Conference Paper, NIST Performance Metrics for Intelligent Systems Workshop (PerMIS), January 2007


Copyright notice: This material is presented to ensure timely dissemination of scholarly and technical work. Copyright and all rights therein are retained by authors or by other copyright holders. All persons copying this information are expected to adhere to the terms and constraints invoked by each author's copyright. These works may not be reposted without the explicit permission of the copyright holder.


Performance of a cognitive personal assistant, RADAR, consisting of multiple machine learning components, natural language processing, and optimization was examined with a test explicitly developed to measure the impact of integrated machine learning when used by a human user in a real-world setting. Three conditions (conventional tools, RADAR without learning, and RADAR with learning) were evaluated in a large-scale, between-subjects study. The study revealed that integrated machine learning produces a positive impact on overall performance. This paper also discusses how specific machine learning components contributed to human-system performance.

@inproceedings{Steinfeld2007,
  author    = {Aaron Steinfeld and S. Rachael Bennett and Kyle Cunningham and Matt Lahut and Pablo-Alejandro Quinones and Django Wexler and Daniel Siewiorek and Jordan Hayes and Paul Cohen and Julie Fitzgerald and Othar Hansson and Mike Pool and Mark Drummond},
  title     = {Evaluation of an Integrated Multi-Task Machine Learning System with Humans in the Loop},
  booktitle = {NIST Performance Metrics for Intelligent Systems Workshop (PerMIS)},
  year      = {2007},
  month     = {January},
}