Published June 2010 | Accepted Version
Book Section - Chapter | Open Access

Online crowdsourcing: rating annotators and obtaining cost-effective labels

Abstract

Labeling large datasets has become faster, cheaper, and easier with the advent of crowdsourcing services like Amazon Mechanical Turk. How can one trust the labels obtained from such services? We propose a model of the labeling process which includes label uncertainty, as well as a multi-dimensional measure of the annotators' ability. From the model we derive an online algorithm that estimates the most likely value of the labels and the annotator abilities. It finds and prioritizes experts when requesting labels, and actively excludes unreliable annotators. Based on labels already obtained, it dynamically chooses which images will be labeled next, and how many labels to request in order to achieve a desired level of confidence. Our algorithm is general and can handle binary, multi-valued, and continuous annotations (e.g. bounding boxes). Experiments on a variety of datasets comprising more than 50,000 labels show that our algorithm reduces the number of labels required, and thus the total cost of labeling, by a large factor while keeping error rates low.
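
The abstract describes an online algorithm that jointly estimates the true labels and the annotators' abilities, and then uses those estimates to decide which labels to request next. The full model is in the attached PDF; the sketch below is only a rough illustration of the general idea, not the authors' method. It simulates binary labels with a single per-annotator accuracy parameter, runs a simplified EM-style loop to estimate labels and accuracies, and keeps requesting labels only for images whose posterior confidence is below a target threshold. All names, parameters, and the simulation setup (em, annotate, target_conf, the 0.9 confidence level) are illustrative assumptions.

    import numpy as np

    rng = np.random.default_rng(0)

    n_images, n_annotators = 200, 20
    true_labels = rng.integers(0, 2, n_images)        # hidden ground truth (simulated)
    true_acc = rng.uniform(0.55, 0.95, n_annotators)  # hidden annotator reliability (simulated)

    def annotate(i, j):
        """Simulate annotator j labeling image i with accuracy true_acc[j]."""
        correct = rng.random() < true_acc[j]
        return true_labels[i] if correct else 1 - true_labels[i]

    # labels[i][j] = label given by annotator j to image i, stored sparsely
    labels = {i: {} for i in range(n_images)}

    def em(labels, n_iter=20):
        """Jointly estimate P(label=1) per image and an accuracy per annotator."""
        p1 = np.full(n_images, 0.5)       # posterior that each image is class 1
        acc = np.full(n_annotators, 0.8)  # current accuracy estimates
        for _ in range(n_iter):
            # E-step: update the posterior over each image's label
            for i, obs in labels.items():
                if not obs:
                    continue
                log1 = log0 = 0.0
                for j, l in obs.items():
                    a = np.clip(acc[j], 1e-3, 1 - 1e-3)
                    log1 += np.log(a if l == 1 else 1 - a)
                    log0 += np.log(a if l == 0 else 1 - a)
                p1[i] = 1.0 / (1.0 + np.exp(log0 - log1))
            # M-step: update each annotator's accuracy from expected agreement
            hits = np.full(n_annotators, 1.0)   # smoothed counts
            total = np.full(n_annotators, 2.0)
            for i, obs in labels.items():
                for j, l in obs.items():
                    hits[j] += p1[i] if l == 1 else 1.0 - p1[i]
                    total[j] += 1.0
            acc = hits / total
        return p1, acc

    # Active labeling loop: request labels only for images that are still uncertain,
    # preferring annotators currently estimated as most reliable.
    target_conf, max_rounds = 0.9, 10
    for _ in range(max_rounds):
        p1, acc = em(labels)
        uncertain = [i for i in range(n_images)
                     if not labels[i] or max(p1[i], 1.0 - p1[i]) < target_conf]
        if not uncertain:
            break
        ranked = np.argsort(-acc)  # annotators ordered by estimated accuracy
        for i in uncertain:
            j = int(ranked[len(labels[i]) % n_annotators])  # next untried annotator in rank order
            labels[i][j] = annotate(i, j)

    p1, acc = em(labels)
    pred = (p1 > 0.5).astype(int)
    print("agreement with simulated ground truth:", (pred == true_labels).mean())
    print("mean labels requested per image:", np.mean([len(v) for v in labels.values()]))

In this toy setting, raising target_conf trades more label requests for higher confidence, which mirrors the cost/accuracy trade-off the abstract describes; the published model additionally handles multi-valued and continuous annotations and a richer, multi-dimensional notion of annotator ability.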

Additional Information

©2010 IEEE. We thank Catherine Wah, Florian Schroff, Steve Branson, and Serge Belongie for motivation, discussions and help with the data collection. We also thank Piotr Dollar, Merrielle Spain, Michael Maire, and Kristen Grauman for helpful discussions and feedback. This work was supported by ONR MURI Grant #N00014-06-1-0734 and ONR/Evolution Grant #N00173-09-C-4005.

Files

Accepted Version - WelinderPerona10.pdf (1.2 MB)
md5:4f978fd46d4c3b9f62e8c22b4d954fa1

Additional details

Created: August 19, 2023
Modified: October 26, 2023