
Answer-Aware Attention on Grounded Question Answering in Images

Junjie Hu, Desai Fan, Shuxin Yao and Jean Hyaejin Oh
Conference Paper, AAAI Fall Symposium, November, 2017


Copyright notice: This material is presented to ensure timely dissemination of scholarly and technical work. Copyright and all rights therein are retained by authors or by other copyright holders. All persons copying this information are expected to adhere to the terms and constraints invoked by each author’s copyright. These works may not be reposted without the explicit permission of the copyright holder.


Grounding natural language expressions to visual context in an image is essential to understanding the semantic meaning of an image. Recent attention approaches to the task of grounded question answering in images rely on either attention over arbitrary regions in an image or attention over words in a question, and do not exploit the information carried by candidate answers when encoding the question. To address this limitation, we propose two Answer-Aware Attention (AAA) models that apply attention over candidate answers, i.e., global and local attention over answers, each of which learns an answer-aware summarization vector of a question. Our proposed attention model leverages information from both textual and visual modalities, which boosts prediction accuracy on the grounded question answering task. Extensive experiments show that our proposed attention model performs comparably to state-of-the-art models with far fewer learnable parameters.
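The global attention over answers described above can be sketched as follows. This is a minimal illustrative example, not the authors' implementation: the function name, embedding dimensions, and dot-product scoring are all assumptions made for the sketch.

```python
import numpy as np

def answer_aware_attention(question_embs, answer_emb):
    """Sketch of global answer-aware attention (illustrative, not the paper's exact model).

    question_embs: (T, d) array of question word embeddings.
    answer_emb:    (d,) embedding of one candidate answer.
    Returns an answer-aware summarization vector of the question, shape (d,).
    """
    # Score each question word against the candidate answer (dot product).
    scores = question_embs @ answer_emb                 # shape (T,)
    # Normalize the scores into attention weights with a softmax.
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()                            # shape (T,), sums to 1
    # Weighted sum of word embeddings gives the answer-aware question summary.
    return weights @ question_embs                      # shape (d,)

# Toy example: a 4-word question with 3-dimensional embeddings.
rng = np.random.default_rng(0)
Q = rng.normal(size=(4, 3))   # hypothetical question word embeddings
a = rng.normal(size=3)        # hypothetical candidate-answer embedding
summary = answer_aware_attention(Q, a)
```

Conditioning the softmax on the answer embedding is what makes the summary "answer-aware": each candidate answer induces a different weighting of the question words.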

BibTeX Reference

@inproceedings{hu2017answeraware,
  author    = {Junjie Hu and Desai Fan and Shuxin Yao and Jean Hyaejin Oh},
  title     = {Answer-Aware Attention on Grounded Question Answering in Images},
  booktitle = {AAAI Fall Symposium},
  year      = {2017},
  month     = {November},
  keywords  = {grounded question answering, attention model},
}