Published February 25, 2022 | Submitted
Report | Open Access

Towards Weakly-Supervised Text Spotting using a Multi-Task Transformer

Abstract

End-to-end text spotting methods have recently gained attention in the literature owing to the benefits of jointly optimizing the text detection and recognition components. Existing methods usually keep the detection and recognition branches distinctly separated, requiring exact annotations for both tasks. We introduce TextTranSpotter (TTS), a transformer-based approach to text spotting and the first text spotting framework that can be trained in either a fully- or a weakly-supervised setting. By learning a single latent representation per word detection, and by using a novel loss function based on the Hungarian loss, our method alleviates the need for expensive localization annotations. Trained with only text transcription annotations on real data, our weakly-supervised method achieves performance competitive with previous state-of-the-art fully-supervised methods. When trained in a fully-supervised manner, TextTranSpotter achieves state-of-the-art results on multiple benchmarks.
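As a rough illustration of the matching idea sketched in the abstract, the following minimal Python sketch (not the authors' implementation) pairs transformer queries with ground-truth transcriptions via Hungarian matching on a text-only cost, then applies a character-level cross-entropy to the matched pairs. The vocabulary size, word-length cap, and cost definition are all illustrative assumptions; see the paper (2202.05508) for the actual formulation.

    import torch
    import torch.nn.functional as F
    from scipy.optimize import linear_sum_assignment

    VOCAB = 38       # assumed: 26 letters + 10 digits + padding + EOS
    MAX_CHARS = 25   # assumed maximum word length

    def transcription_cost(pred_logits, gt_chars):
        """Pairwise cost: negative mean log-likelihood of each ground-truth
        word's characters under each query's character distribution.

        pred_logits: (Q, MAX_CHARS, VOCAB) per-query character logits.
        gt_chars:    (W, MAX_CHARS) padded ground-truth character ids.
        Returns a (Q, W) cost matrix.
        """
        Q, T, V = pred_logits.shape
        W = gt_chars.size(0)
        log_probs = pred_logits.log_softmax(-1)
        expanded = log_probs.unsqueeze(1).expand(Q, W, T, V)
        idx = gt_chars.unsqueeze(0).unsqueeze(-1).expand(Q, W, T, 1)
        per_char = expanded.gather(-1, idx).squeeze(-1)      # (Q, W, T)
        return -per_char.mean(-1)                            # (Q, W)

    def weakly_supervised_loss(pred_logits, gt_chars):
        """Hungarian matching on transcription cost alone, then character-level
        cross-entropy on the matched (query, word) pairs. No box targets used."""
        with torch.no_grad():  # the matching itself is not differentiated
            cost = transcription_cost(pred_logits, gt_chars)
            rows, cols = linear_sum_assignment(cost.cpu().numpy())
        rows, cols = torch.as_tensor(rows), torch.as_tensor(cols)
        matched = pred_logits[rows]                           # (W, T, V)
        targets = gt_chars[cols]                              # (W, T)
        return F.cross_entropy(matched.flatten(0, 1), targets.flatten())

    # Toy usage: 10 object queries, 3 annotated transcriptions.
    queries = torch.randn(10, MAX_CHARS, VOCAB, requires_grad=True)
    words = torch.randint(0, VOCAB, (3, MAX_CHARS))
    loss = weakly_supervised_loss(queries, words)
    loss.backward()

Because the matching cost here depends only on transcriptions, no localization annotations enter the loss, which is the essence of the weakly-supervised setting the abstract describes.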

Files

Submitted version: 2202.05508.pdf (12.8 MB)
md5:d4c2d6518d871e3306d56e09781f6a1d

Additional details

Created: August 20, 2023
Modified: October 23, 2023