Predicting human gaze using low-level saliency combined with face detection
Abstract
Under natural viewing conditions, human observers shift their gaze to allocate processing resources to subsets of the visual input. Many computational models try to predict such voluntary eye and attentional shifts. Although the important role of high-level stimulus properties (e.g., semantic information) in search stands undisputed, most models are based on low-level image properties. Here we demonstrate that a combined model of face detection and low-level saliency significantly outperforms a low-level model in predicting locations humans fixate on, based on eye-movement recordings of humans observing photographs of natural scenes, most of which contained at least one person. Observers, even when not instructed to look for anything in particular, fixate on a face with a probability of over 80% within their first two fixations; furthermore, they exhibit more similar scanpaths when faces are present. Remarkably, our model's predictive performance on images that do not contain faces is not impaired, and is even improved in some cases by spurious face detector responses.
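The core idea of the abstract, a low-level saliency map augmented by a face-detection channel, can be illustrated with a minimal sketch. This is a hypothetical illustration, not the paper's implementation: the function name `combine_maps`, the normalization, the additive combination, and the weight `w_face` are all assumptions; the paper's actual integration scheme may differ.

```python
import numpy as np

def combine_maps(saliency, face_map, w_face=1.0):
    """Combine a low-level saliency map with a face-detection map.

    Hypothetical sketch: both maps are normalized to [0, 1], the face
    channel is weighted and added to the saliency channel, and the
    result is renormalized. The published model may combine channels
    differently.
    """
    def norm(m):
        m = m.astype(float)
        rng = m.max() - m.min()
        return (m - m.min()) / rng if rng > 0 else np.zeros_like(m)

    combined = norm(saliency) + w_face * norm(face_map)
    return norm(combined)

# Toy example: a strong detector response in a (hypothetical) face
# region dominates the combined map's peak.
rng = np.random.default_rng(0)
saliency = rng.random((32, 32))
face_map = np.zeros((32, 32))
face_map[10:14, 10:14] = 1.0  # assumed face detector response

combined = combine_maps(saliency, face_map, w_face=2.0)
peak = np.unravel_index(np.argmax(combined), combined.shape)
```

With a sufficiently large `w_face`, the peak of the combined map falls inside the face region regardless of the low-level saliency values, which mirrors the behavioral finding that faces attract fixations early and reliably.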
Attached Files
Published - nips2007.pdf
Supplemental Material - NIPS2007_1074.extra.zip
Supplemental Material - NIPS2007_1074.mp3
Supplemental Material - NIPS2007_1074_slide.pdf
Additional details
- Eprint ID
- 40642
- Resolver ID
- CaltechAUTHORS:20130816-103345252
- Created
- 2008-01-26 (from EPrint's datestamp field)
- Updated
- 2022-01-11 (from EPrint's last_modified field)
- Caltech groups
- Koch Laboratory (KLAB)