Published June 2018
Book Section - Chapter

Embodied Question Answering

Abstract

We present a new AI task - Embodied Question Answering (EmbodiedQA) - where an agent is spawned at a random location in a 3D environment and asked a question ('What color is the car?'). To answer, the agent must first intelligently navigate to explore the environment, gather the necessary visual information through first-person (egocentric) vision, and then answer the question ('orange'). EmbodiedQA requires a range of AI skills - language understanding, visual recognition, active perception, goal-driven navigation, commonsense reasoning, long-term memory, and grounding language into actions. In this work, we develop a dataset of questions and answers in House3D environments [1], evaluation metrics, and a hierarchical model trained with imitation and reinforcement learning.
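
The episode protocol sketched in the abstract (spawn at a random location, navigate from egocentric frames, stop, then answer) can be made concrete with a short sketch. The Python below is purely illustrative: ToyEnv, ToyAgent, run_episode, and all of their methods are hypothetical stand-ins invented for this example, not the House3D API or the authors' model.

import random

# Navigation action space plus a STOP action ending exploration.
ACTIONS = ["forward", "turn-left", "turn-right", "STOP"]

class ToyEnv:
    """Stand-in 3D environment that serves fake egocentric frames."""
    def reset(self, spawn):
        self.pose = spawn          # (x, y, heading) the agent starts from
        return self._frame()
    def step(self, action):
        # A real environment would update the agent's pose and render a
        # new first-person RGB frame; here we return a placeholder.
        return self._frame()
    def _frame(self):
        return [[0] * 4 for _ in range(4)]  # placeholder "image"

class ToyAgent:
    """Stand-in agent: random navigation, constant answer."""
    def reset(self, question):
        self.question = question
    def act(self, frame):
        return random.choice(ACTIONS)
    def answer(self):
        return "orange"  # a learned model would decode this from its memory

def run_episode(env, agent, question, answer, spawn, max_steps=50):
    """One EmbodiedQA-style episode: explore, stop, answer, score."""
    frame = env.reset(spawn)
    agent.reset(question)
    for _ in range(max_steps):
        action = agent.act(frame)  # navigation policy on egocentric input
        if action == "STOP":       # the agent decides it has seen enough
            break
        frame = env.step(action)
    return agent.answer() == answer  # exact-match answer accuracy

if __name__ == "__main__":
    correct = run_episode(ToyEnv(), ToyAgent(),
                          question="What color is the car?",
                          answer="orange",
                          spawn=(0.0, 0.0, 0.0))
    print("correct:", correct)

The split into act() and answer() mirrors the hierarchy the abstract mentions: one module handles goal-driven navigation, another grounds the accumulated visual evidence into an answer.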

Additional Information

We thank the developers of PyTorch for building an excellent framework, and Yuxin Wu for help with House3D environments. This work was funded in part by NSF CAREER awards to DB and DP, ONR YIP awards to DP and DB, ONR Grant N00014-14-1-0679 to DB, ONR Grant N00014-16-1-2713 to DP, an Allen Distinguished Investigator award to DP from the Paul G. Allen Family Foundation, Google Faculty Research Awards to DP and DB, Amazon Academic Research Awards to DP and DB, a DARPA XAI grant to DB and DP, an AWS in Education Research grant to DB, and NVIDIA GPU donations to DB. The views and conclusions contained herein are those of the authors and should not be interpreted as necessarily representing the official policies or endorsements, either expressed or implied, of the U.S. Government or any sponsor.

Additional details

Created: August 19, 2023
Modified: October 20, 2023