DiscoBox: Weakly Supervised Instance Segmentation and Semantic Correspondence from Box Supervision
Abstract
We introduce DiscoBox, a novel framework that jointly learns instance segmentation and semantic correspondence using bounding box supervision. Specifically, we propose a self-ensembling framework where instance segmentation and semantic correspondence are jointly guided by a structured teacher in addition to the bounding box supervision. The teacher is a structured energy model incorporating a pairwise potential and a cross-image potential to model the pairwise pixel relationships both within and across the boxes. Minimizing the teacher energy simultaneously yields refined object masks and dense correspondences between intra-class objects, which are taken as pseudo-labels to supervise the task network and provide positive/negative correspondence pairs for dense contrastive learning. We show a symbiotic relationship where the two tasks mutually benefit from each other. Our best model achieves 37.9% AP on COCO instance segmentation, surpassing prior weakly supervised methods and remaining competitive with supervised methods. We also obtain state-of-the-art weakly supervised results on PASCAL VOC12 and PF-PASCAL with real-time inference.
Additional Information
© 2021 IEEE. Work done during an internship at NVIDIA Research.
Attached Files
Submitted - 2105.06464.pdf
Files
Name | Size |
---|---|
md5:5ad9d338cf766c574acda99d772e76db | 2.9 MB |
Additional details
- Eprint ID: 110644
- DOI: 10.1109/ICCV48922.2021.00339
- Resolver ID: CaltechAUTHORS:20210831-203854134
- NVIDIA Corporation
- Created: 2021-09-01 (created from EPrint's datestamp field)
- Updated: 2022-07-26 (created from EPrint's last_modified field)