Published May 26, 2020 | Submitted
Report Open

Rethinking Zero-shot Video Classification: End-to-end Training for Realistic Applications

Abstract

Deep learning (DL) models trained on large datasets can accurately classify videos into hundreds of diverse classes. However, video data is expensive to annotate. Zero-shot learning (ZSL) proposes one solution to this problem: a model is trained once and then generalizes to new tasks whose classes are not present in the training dataset. We propose the first end-to-end algorithm for ZSL in video classification. Our training procedure builds on insights from recent video classification literature and uses a trainable 3D CNN to learn the visual features. This is in contrast to previous video ZSL methods, which use pretrained feature extractors. We also extend the current benchmarking paradigm: previous techniques aim to make the test task unknown at training time but fall short of this goal. We encourage domain shift across training and test data and disallow tailoring a ZSL model to a specific test dataset. Our method outperforms the state of the art by a wide margin. Our code, evaluation procedure and model weights are available at this http URL.
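To make the zero-shot setting concrete, the sketch below shows one common ZSL formulation: a visual model maps a video to a vector in a semantic embedding space, and the predicted label is the unseen class whose embedding is most similar. This is an illustration under stated assumptions, not the authors' implementation. The abstract does not specify the embedding space or the similarity measure, so cosine similarity, the function names, and the toy 3-dimensional vectors are all hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def zero_shot_predict(video_embedding, class_embeddings):
    """Return the class name whose embedding is closest to the video embedding.

    class_embeddings maps class names to vectors. The classes may be ones
    that never appeared in training, which is what makes this zero-shot.
    """
    return max(
        class_embeddings,
        key=lambda name: cosine_similarity(video_embedding, class_embeddings[name]),
    )


# Hypothetical embeddings for test classes unseen during training
# (in practice these might come from word vectors of the class names).
unseen_class_embeddings = {
    "archery": [0.9, 0.1, 0.0],
    "juggling": [0.0, 0.8, 0.6],
    "surfing": [0.1, 0.2, 0.95],
}

# Hypothetical stand-in for the output of a trained visual model on one video.
video_embedding = [0.05, 0.7, 0.7]

predicted = zero_shot_predict(video_embedding, unseen_class_embeddings)
print(predicted)
```

Because prediction only compares embeddings, new classes can be added at test time by supplying their vectors, with no retraining of the visual model.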

Additional Information

We thank Amazon for generously supporting the project, and Alina Roitberg for a productive discussion on the evaluation protocol.

Attached Files

Submitted - 2003.01455.pdf

Files

2003.01455.pdf (3.0 MB)
md5:c8bd901d84e85ab6e9c4185e5f06516c

Additional details

Created: August 19, 2023
Modified: October 20, 2023