Learn2Smile: Learning non-verbal interaction through observation
Abstract
Interactive agents are becoming increasingly common in many application domains, such as education, healthcare and personal assistance. The success of such embodied agents relies on their ability to sustain engagement with their human users. Such engagement requires agents to be socially intelligent, equipped with the ability to understand and reciprocate both verbal and non-verbal cues. While there has been tremendous progress in verbal communication, mostly driven by the success of speech recognition and question-answering, teaching agents to appropriately react to facial expressions has received less attention. In this paper, we focus on non-verbal facial cues for face-to-face communication between a user and an embodied agent. We propose a method that automatically learns to update the agent's facial expressions based on the user's expressions. We train a deep neural network on hundreds of videos containing pairs of people engaged in conversation, without external human supervision. Our experimental results show the efficacy of our model in sustained long-term prediction of the agent's facial landmarks. We present comparative results showing that our model significantly outperforms baseline approaches, and provide insightful human studies to better understand our model's qualitative performance. We release our dataset to further encourage research in this field.
Attached Files
Accepted Version - learn2smile-learning-verbal.pdf (2.9 MB, md5: a67dc46792936f7195d1e5a77dae1a30)
Additional details
- Eprint ID
- 118370
- Resolver ID
- CaltechAUTHORS:20221215-789739000.9
- Created
- 2022-12-20 (from EPrint's datestamp field)
- Updated
- 2022-12-20 (from EPrint's last_modified field)