Predicting human gaze beyond pixels
Abstract
Most previous models that predict where people look in natural scenes have focused on pixel-level image attributes. To bridge the semantic gap between the predictive power of computational saliency models and human behavior, we propose a new saliency architecture that incorporates information at three layers: pixel-level, object-level, and semantic-level attributes. Object- and semantic-level information is frequently ignored, or only a few sample object categories are considered, an approach that scales to neither a large number of object categories nor a neurally plausible model. To address this problem, this work constructs a principled vocabulary of basic attributes to describe object- and semantic-level information, rather than restricting the model to a limited set of object categories. We build a new dataset of 700 images with eye-tracking data from 15 viewers and annotations of 5,551 segmented objects with fine contours and 12 semantic attributes (publicly available with the paper). Experimental results demonstrate the importance of object- and semantic-level information in the prediction of visual attention.
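The abstract's core idea, combining pixel-, object-, and semantic-level information into a single saliency prediction, can be illustrated with a minimal sketch. The linear weighting, the normalization, and all function names below are illustrative assumptions, not the paper's actual combination method:

```python
import numpy as np

def normalize(sal_map):
    """Scale a saliency map to the [0, 1] range (assumed preprocessing)."""
    lo, hi = sal_map.min(), sal_map.max()
    if hi == lo:
        return np.zeros_like(sal_map)
    return (sal_map - lo) / (hi - lo)

def combine_saliency(pixel_map, object_map, semantic_map,
                     weights=(1.0, 1.0, 1.0)):
    """Weighted sum of the three per-layer maps (hypothetical weights).

    Each layer is normalized first so no single layer dominates
    purely because of its scale.
    """
    maps = (pixel_map, object_map, semantic_map)
    combined = sum(w * normalize(m) for w, m in zip(weights, maps))
    return normalize(combined)

# Toy 4x4 maps standing in for the three information layers.
rng = np.random.default_rng(0)
pixel = rng.random((4, 4))      # e.g., local contrast / color features
obj = rng.random((4, 4))        # e.g., per-object size or shape cues
sem = rng.random((4, 4))        # e.g., per-attribute semantic cues
final = combine_saliency(pixel, obj, sem)
```

In practice the per-layer weights would be learned from eye-tracking data rather than fixed, but the sketch shows how the three layers feed one fused prediction map.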
Additional Information
© 2014 ARVO. Received December 4, 2012; accepted August 27, 2013. The authors would like to thank Dr. Christof Koch and members of the Koch Lab at Caltech for valuable comments. This research was partially supported by the Singapore NRF under its IRC@SG Funding Initiative, administered by the IDMPO, and by the Singapore Ministry of Education Academic Research Fund Tier 1 (No. R-263-000-648-133).
Attached Files
Published - 28.full.pdf
Files
Name | Size
---|---
28.full.pdf (md5: b87222354203427f26d2777f96219835) | 2.1 MB
Additional details
- Eprint ID: 44447
- Resolver ID: CaltechAUTHORS:20140324-083827534
- Funders: National Research Foundation (Singapore); Interactive Digital Media R&D Programme Office (IDMPO); Ministry of Education (Singapore) (R-263-000-648-133)
- Created: 2014-03-26 (from EPrint's datestamp field)
- Updated: 2021-11-10 (from EPrint's last_modified field)