Describing Common Human Visual Actions in Images
- Creators
- Ronchi, Matteo Ruggero
- Perona, Pietro
Abstract
Which common human actions and interactions are recognizable in monocular still images? Which involve objects and/or other people? How many actions is a person performing at a time? We address these questions by exploring the actions and interactions that are detectable in the images of the MS COCO dataset. We make two main contributions. First, we provide a list of 140 common 'visual actions', obtained by analyzing the largest online verb lexicon currently available for English (VerbNet) and the human sentences used to describe images in MS COCO. Second, we provide a complete set of annotations for those 'visual actions', each composed of a subject-object pair and the associated verb, which we call COCO-a (a for 'actions'). COCO-a is larger than existing action datasets in terms of the number of action instances, and is unique because it is data-driven rather than experimenter-biased. Other unique features are that it is exhaustive and that all subjects and objects are localized. A statistical analysis of the accuracy of our annotations and of each action, interaction and subject-object combination is provided.
Additional Information
© 2015. The copyright of this document resides with its authors. It may be distributed unchanged freely in print or electronic forms.
Attached Files
Published - BMVC15_DescribingCommonVisualActions_PAPER.pdf
Submitted - 1506.02203.pdf
Supplemental Material - BMVC15_DescribingCommonVisualActions_SUPP.pdf
Supplemental Material - sup052.zip
Additional details
- Eprint ID
- 59927
- Resolver ID
- CaltechAUTHORS:20150827-113206063
- Created
- 2015-08-28 (created from EPrint's datestamp field)
- Updated
- 2022-10-24 (created from EPrint's last_modified field)