Sequence-to-Sequence Contrastive Learning for Text Recognition
Abstract
We propose a framework for sequence-to-sequence contrastive learning (SeqCLR) of visual representations, which we apply to text recognition. To account for the sequence-to-sequence structure, each feature map is divided into different instances over which the contrastive loss is computed. This operation enables us to contrast at a sub-word level, where from each image we extract several positive pairs and multiple negative examples. To yield effective visual representations for text recognition, we further suggest novel augmentation heuristics, different encoder architectures, and custom projection heads. Experiments on handwritten text and on scene text show that when a text decoder is trained on the learned representations, our method outperforms non-sequential contrastive methods. In addition, when the amount of supervision is reduced, SeqCLR significantly improves performance compared with supervised training, and when fine-tuned with 100% of the labels, our method achieves state-of-the-art results on standard handwritten text recognition benchmarks.
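To make the sub-word contrast concrete, here is a minimal sketch of the idea in PyTorch: two augmented views of each image are encoded into frame sequences, each sequence is pooled into a fixed number of instances, and an NT-Xent-style loss treats corresponding instances as positives and all other instances in the batch as negatives. The function names, the adaptive-average-pool instance mapping, and the hyper-parameters below are illustrative assumptions, not the paper's exact implementation.

```python
import torch
import torch.nn.functional as F

def frames_to_instances(features, num_instances):
    # features: (batch, frames, dim) -> (batch, num_instances, dim).
    # Adaptive average pooling is one plausible instance-mapping choice;
    # the paper discusses several, so treat this as an assumption.
    return F.adaptive_avg_pool1d(
        features.transpose(1, 2), num_instances).transpose(1, 2)

def seq_contrastive_loss(feats_a, feats_b, num_instances=5, temperature=0.1):
    # feats_a, feats_b: (batch, frames, dim) feature maps of two augmented
    # views of the same image batch. Corresponding instances are positives;
    # every other instance in the batch acts as a negative.
    dim = feats_a.size(-1)
    za = frames_to_instances(feats_a, num_instances).reshape(-1, dim)
    zb = frames_to_instances(feats_b, num_instances).reshape(-1, dim)
    z = F.normalize(torch.cat([za, zb]), dim=1)  # (2N, dim), N = batch * instances
    sim = z @ z.t() / temperature                # pairwise cosine similarities
    n = za.size(0)
    mask = torch.eye(2 * n, dtype=torch.bool, device=z.device)
    sim = sim.masked_fill(mask, float('-inf'))   # drop self-similarity
    targets = torch.cat([torch.arange(n, 2 * n), torch.arange(n)])
    return F.cross_entropy(sim, targets.to(z.device))
```

Because each image in the batch contributes `num_instances` instances per view, every image yields several positive pairs and many negatives, which is what distinguishes this sub-word contrast from image-level schemes that treat each image as a single instance.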
Additional Information
© 2021 IEEE.
Attached Files
Submitted - 2012.10873.pdf
Files
Name | Size
---|---
md5:04d2bd455818c6387d9aeb473b716604 | 2.0 MB
Additional details
- Eprint ID
- 107570
- DOI
- 10.1109/CVPR46437.2021.01505
- Resolver ID
- CaltechAUTHORS:20210119-161639508
- Created
- 2021-01-20 (from EPrint's datestamp field)
- Updated
- 2022-01-10 (from EPrint's last_modified field)