ContrastivePose: A contrastive learning approach for self-supervised feature engineering for pose estimation and behavioral classification of interacting animals
Abstract
In recent years, supervised machine learning models trained on videos of animals with pose estimation data and behavior labels have been used for automated behavior classification. Applications include, for example, automated detection of neurological diseases in animal models. However, these supervised learning models have two problems. First, they require a large amount of labeled data, but labeling behaviors frame by frame is a laborious manual process that does not scale easily. Second, they rely on handcrafted features derived from pose estimation data, and these features are usually designed empirically. In this paper, we propose to overcome both problems by using contrastive learning for self-supervised feature engineering on pose estimation data. Our approach uses unlabeled videos to learn feature representations and reduces the need to handcraft higher-level features from pose positions. We show that this approach to feature representation can achieve better classification performance than handcrafted features alone, and that the performance improvement is due to contrastive learning on unlabeled data rather than to the neural network architecture.

Author Summary

Animal models are widely used in medicine to study diseases. For example, social interactions between animals such as mice are studied to investigate changes in social behavior in neurological diseases. Manually annotating animal behaviors from videos is slow and tedious. To address this problem, machine learning approaches that automate video annotation have become increasingly popular. Many recent approaches build on advances in pose estimation technology, which enables accurate localization of key points on the animals. However, manually labeling behaviors frame by frame for the training set remains a bottleneck that does not scale.
Also, existing methods rely on handcrafted feature engineering from pose estimation data. In this study, we propose ContrastivePose, an approach that uses contrastive learning to learn feature representations from unlabeled data. We demonstrate that features learned by our method improve supervised classification performance compared with handcrafted features. This approach can help efforts to build supervised behavior classification models when behavior-labeled videos are scarce.
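The summary above does not specify the loss, augmentations, or encoder used in this work. As a rough illustration of the general idea of contrastive learning on unlabeled pose data, the following is a minimal sketch, assuming a SimCLR-style NT-Xent loss: two randomly augmented views of the same pose window form a positive pair, and all other windows in the batch serve as negatives. The window shape (frames × keypoints × 2), the rotation-plus-jitter augmentation, the linear placeholder encoder, and the function names `augment_pose`, `encode`, and `nt_xent_loss` are hypothetical assumptions for illustration, not the authors' method.

```python
import numpy as np

def augment_pose(window, rng, noise_std=0.02):
    """Hypothetical augmentation: random 2D rotation plus Gaussian jitter.

    window: array of shape (T, K, 2) holding T frames of K (x, y) keypoints.
    """
    theta = rng.uniform(-np.pi, np.pi)
    rot = np.array([[np.cos(theta), -np.sin(theta)],
                    [np.sin(theta),  np.cos(theta)]])
    rotated = window @ rot.T  # rotate every keypoint about the origin
    return rotated + rng.normal(0.0, noise_std, size=window.shape)

def encode(windows, weights):
    """Placeholder encoder (assumption): flatten each window, apply a linear map."""
    flat = windows.reshape(len(windows), -1)
    return flat @ weights

def nt_xent_loss(z1, z2, temperature=0.5):
    """NT-Xent loss for N positive pairs (z1[i], z2[i]); other rows are negatives."""
    z = np.concatenate([z1, z2], axis=0)
    z = z / np.linalg.norm(z, axis=1, keepdims=True)  # dot product = cosine similarity
    sim = z @ z.T / temperature
    n = len(z1)
    np.fill_diagonal(sim, -np.inf)  # exclude each embedding's similarity to itself
    # index of the positive partner for each of the 2N embeddings
    pos = np.concatenate([np.arange(n, 2 * n), np.arange(0, n)])
    # row-wise log-softmax, stabilized by subtracting each row's maximum
    row_max = sim.max(axis=1, keepdims=True)
    log_prob = sim - row_max - np.log(np.exp(sim - row_max).sum(axis=1, keepdims=True))
    return -log_prob[np.arange(2 * n), pos].mean()

# Usage on random stand-in data: 8 windows of 30 frames, 14 keypoints each
# (e.g. 7 per mouse for two interacting mice; these sizes are hypothetical).
rng = np.random.default_rng(0)
batch = rng.normal(size=(8, 30, 14, 2))
weights = rng.normal(size=(30 * 14 * 2, 64)) / np.sqrt(30 * 14 * 2)
view1 = np.stack([augment_pose(w, rng) for w in batch])
view2 = np.stack([augment_pose(w, rng) for w in batch])
loss = nt_xent_loss(encode(view1, weights), encode(view2, weights))
print(f"NT-Xent loss on one batch: {loss:.3f}")
```

In a real setup the encoder would be a trainable neural network whose weights are updated to minimize this loss over many unlabeled windows, after which its embeddings could be fed to a supervised behavior classifier; the training loop is omitted here.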
Additional Information
The copyright holder for this preprint is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY 4.0 International license. The authors have declared no competing interest.

Attached Files
Submitted - 2022.11.09.515746v1.full.pdf
Files
Name | Size | md5
---|---|---
2022.11.09.515746v1.full.pdf | 633.5 kB | f737913f83eb57552f9ebb9340f1b64a
Additional details
- Eprint ID: 120303
- Resolver ID: CaltechAUTHORS:20230322-101383000.5
- Created: 2023-03-24 (from EPrint's datestamp field)
- Updated: 2023-03-24 (from EPrint's last_modified field)