Published March 16, 2023 | Submitted
Report Open

Multiview Compressive Coding for 3D Reconstruction

Abstract

A central goal of visual recognition is to understand objects and scenes from a single image. 2D recognition has witnessed tremendous progress thanks to large-scale learning and general-purpose representations. Comparatively, 3D poses new challenges stemming from occlusions not depicted in the image. Prior works try to overcome these by inferring from multiple views or rely on scarce CAD models and category-specific priors which hinder scaling to novel settings. In this work, we explore single-view 3D reconstruction by learning generalizable representations inspired by advances in self-supervised learning. We introduce a simple framework that operates on 3D points of single objects or whole scenes coupled with category-agnostic large-scale training from diverse RGB-D videos. Our model, Multiview Compressive Coding (MCC), learns to compress the input appearance and geometry to predict the 3D structure by querying a 3D-aware decoder. MCC's generality and efficiency allow it to learn from large-scale and diverse data sources with strong generalization to novel objects imagined by DALL·E 2 or captured in-the-wild with an iPhone.

Attached Files

Submitted - 2301.08247.pdf

Files

2301.08247.pdf (13.7 MB)
md5:4693be9776c1fa29e6f2c9649ec7a975

Additional details

Created: August 20, 2023
Modified: October 25, 2023