Published December 20, 2022 | Submitted
Report | Open

Data Distillation: Towards Omni-Supervised Learning

Abstract

We investigate omni-supervised learning, a special regime of semi-supervised learning in which the learner exploits all available labeled data plus internet-scale sources of unlabeled data. Omni-supervised learning is lower-bounded by performance on existing labeled datasets, offering the potential to surpass state-of-the-art fully supervised methods. To exploit the omni-supervised setting, we propose data distillation, a method that ensembles predictions from multiple transformations of unlabeled data, using a single model, to automatically generate new training annotations. We argue that visual recognition models have recently become accurate enough that it is now possible to apply classic ideas about self-training to challenging real-world data. Our experimental results show that in the cases of human keypoint detection and general object detection, state-of-the-art models trained with data distillation surpass the performance of using labeled data from the COCO dataset alone.
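To make the procedure described in the abstract concrete, here is a minimal, self-contained Python sketch: a single trained model is applied to several transformed copies of an unlabeled image (in this toy case just the original and a horizontal flip), the predictions are mapped back to the original frame and averaged, and confident outputs are kept as automatically generated annotations. The toy model, the flip-only transform set, and the 0.5 confidence threshold are illustrative assumptions, not the authors' implementation.

import numpy as np

def toy_keypoint_model(image):
    """Stand-in for a trained keypoint detector: returns a 'heatmap'
    the same size as the input (here just the normalized image)."""
    return image / (image.max() + 1e-8)

def hflip(x):
    # Horizontal flip; applying it twice recovers the original frame.
    return x[:, ::-1]

def generate_pseudo_labels(model, unlabeled_images, threshold=0.5):
    """Ensemble one model's predictions over multiple transforms of each
    unlabeled image and keep confident peaks as pseudo-annotations."""
    pseudo_labels = []
    for img in unlabeled_images:
        # Predict on the original and on a flipped copy, then map the
        # flipped prediction back to the original image coordinates.
        pred_orig = model(img)
        pred_flip = hflip(model(hflip(img)))
        # Ensemble the single-model predictions (simple average).
        ensemble = (pred_orig + pred_flip) / 2.0
        # Keep only confident locations as new training annotations.
        ys, xs = np.where(ensemble > threshold)
        pseudo_labels.append(list(zip(ys.tolist(), xs.tolist())))
    return pseudo_labels

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    images = [rng.random((8, 8)) for _ in range(2)]
    labels = generate_pseudo_labels(toy_keypoint_model, images)
    print(labels[0][:3])  # a few (row, col) pseudo-keypoints for the first image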

Additional Information

We would like to thank Daniel Rueckert for his support and guidance during the initial stages of the project.

Attached Files

Submitted - 1712.04440.pdf (2.8 MB)
md5: b37083962f18b7340ce449c45fb2e613

Additional details

Created: August 19, 2023
Modified: October 24, 2023