Vision for Social Robots: Human Perception and Pose Estimation

Citation

Ronchi, Matteo Ruggero (2020) Vision for Social Robots: Human Perception and Pose Estimation. Dissertation (Ph.D.), California Institute of Technology. doi:10.7907/n2v1-1g79. https://resolver.caltech.edu/CaltechTHESIS:05212020-155425112

Abstract

In order to extract the underlying meaning from a scene captured from the surrounding world in a single still image, social robots will need to learn the human ability to detect different objects, understand their arrangement and relationships relative both to their own parts and to each other, and infer the dynamics under which they are evolving. Furthermore, they will need to develop and hold a notion of context to allow assigning different meanings (semantics) to the same visual configuration (syntax) of a scene.

The underlying thread of this Thesis is the investigation of new ways to enable interactions between social robots and humans by advancing the visual perception capabilities of robots when they process images and videos in which humans are the main focus of attention.

First, we analyze the general problem of scene understanding, as social robots moving through the world need to be able to interpret scenes without having been assigned a specific preset goal. Throughout this line of research, i) we observe that human actions and interactions which can be visually discriminated from an image follow a very heavy-tailed distribution; ii) we develop an algorithm that can obtain a spatial understanding of a scene by only using cues arising from the effect of perspective on a picture of a person’s face; and iii) we define a novel taxonomy of errors for the task of estimating the 2D body pose of people in images to better explain the behavior of algorithms and highlight their underlying causes of error.

Second, we focus on the specific task of 3D human pose and motion estimation from monocular 2D images using weakly supervised training data, as accurately predicting human pose will open up the possibility of richer interactions between humans and social robots. We show that when 3D ground-truth data is only available in small quantities, or not at all, it is possible to leverage knowledge about the physical properties of the human body, along with additional constraints related to alternative types of supervisory signals, to learn models that can regress the full 3D pose of the human body and predict its motions from monocular 2D images.

Taken in its entirety, the intent of this Thesis is to highlight the importance of, and provide novel methodologies for, social robots' ability to interpret their surrounding environment, learn in a way that is robust to low data availability, and generalize previously observed behaviors to unknown situations in a similar way to humans.

Item Type:

Thesis (Dissertation (Ph.D.))

Subject Keywords:

Computer vision, social robots, machine learning, weakly-supervised learning, self-supervised learning, scene understanding, 3D pose estimation.

Degree Grantor:

California Institute of Technology

Division:

Engineering and Applied Science

Major Option:

Computer Science

Thesis Availability:

Public (worldwide access)

Research Advisor(s):

Perona, Pietro

Thesis Committee:

Yue, Yisong (chair)
Perona, Pietro
Ames, Aaron D.
Bouman, Katherine L.
Papon, Jeremie

Defense Date:

20 December 2019

Record Number:

CaltechTHESIS:05212020-155425112

Persistent URL:

https://resolver.caltech.edu/CaltechTHESIS:05212020-155425112

DOI:

10.7907/n2v1-1g79

Related URLs:

URL	URL Type	Description
http://www.vision.caltech.edu/~mronchi/	Author	Personal web page containing all the materials included in this thesis.
https://doi.org/10.5244/C.29.52	DOI	Article adapted for Ch. 4.
https://doi.org/10.1007/978-3-319-10590-1_21	DOI	Article adapted for Ch. 5.
https://doi.org/10.1109/ICCV.2017.48	DOI	Article adapted for Ch. 6.
https://doi.org/10.1109/ICDM.2016.0156	DOI	Article adapted for Ch. 7.
https://arxiv.org/abs/1805.06880	arXiv	Article adapted for Ch. 8.
http://www.vision.caltech.edu/~mronchi/projects/MultiView3DPose/	Other	Article adapted for Ch. 9.

ORCID:

Author	ORCID
Ronchi, Matteo Ruggero	0000-0002-4277-3314

Default Usage Policy:

No commercial reproduction, distribution, display or performance rights in this work are provided.

ID Code:

13713

Collection:

CaltechTHESIS

Deposited By:

Matteo Ruggero Ronchi

Deposited On:

08 Jun 2020 16:10

Last Modified:

04 Jun 2024 23:14

Thesis Files

PDF - Final Version
See Usage Policy.
40MB
