Published April 2020 | public
Journal Article

Safety Augmented Value Estimation From Demonstrations (SAVED): Safe Deep Model-Based RL for Sparse Cost Robotic Tasks

Abstract

Reinforcement learning (RL) for robotics is challenging due to the difficulty of hand-engineering a dense cost function, which can lead to unintended behavior, and dynamical uncertainty, which makes exploration and constraint satisfaction challenging. We address these issues with a new model-based reinforcement learning algorithm, Safety Augmented Value Estimation from Demonstrations (SAVED), which uses supervision that only identifies task completion and a modest set of suboptimal demonstrations to constrain exploration and learn efficiently while handling complex constraints. We then compare SAVED with 3 state-of-the-art model-based and model-free RL algorithms on 6 standard simulation benchmarks involving navigation and manipulation and a physical knot-tying task on the da Vinci surgical robot. Results suggest that SAVED outperforms prior methods in terms of success rate, constraint satisfaction, and sample efficiency, making it feasible to safely learn a control policy directly on a real robot in less than an hour. For tasks on the robot, baselines succeed less than 5% of the time while SAVED has a success rate of over 75% in the first 50 training iterations. Code and supplementary material are available at https://tinyurl.com/saved-rl.
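To make the "supervision that only identifies task completion" concrete, the sketch below illustrates a sparse task-completion cost and Monte Carlo cost-to-go estimates computed from a suboptimal demonstration. This is not the authors' implementation; the 1-D state, goal tolerance, and discount factor are illustrative assumptions.

```python
# Minimal sketch (assumed details, not the SAVED implementation):
# a sparse cost signals only task completion, and value estimates
# are bootstrapped from a suboptimal demonstration trajectory.

def sparse_cost(state, goal, tol=0.1):
    """Cost of 1 per step until the task is complete, 0 afterward."""
    done = abs(state - goal) <= tol
    return (0.0 if done else 1.0), done

def value_from_demo(states, goal, gamma=0.99):
    """Discounted cost-to-go at each state along one demonstration."""
    costs = [sparse_cost(s, goal)[0] for s in states]
    values = [0.0] * len(costs)
    running = 0.0
    for t in reversed(range(len(costs))):  # backward pass over the rollout
        running = costs[t] + gamma * running
        values[t] = running
    return values

# A hypothetical 1-D demonstration that reaches the goal (1.0) on its last step:
demo = [0.0, 0.4, 0.7, 0.95]
vals = value_from_demo(demo, goal=1.0)
```

Because the cost only indicates completion, the estimated cost-to-go decreases monotonically along any demonstration that reaches the goal, which is what lets even suboptimal demonstrations guide exploration toward task completion.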

Additional Information

© 2020 IEEE. Manuscript received September 10, 2019; accepted February 7, 2020. Date of publication February 26, 2020; date of current version March 24, 2020. This letter was recommended for publication by Associate Editor T. Inamura and Editor D. Lee upon evaluation of the reviewers' comments. This work was supported in part by the Scalable Collaborative Human-Robot Learning (SCHooL) Project, an NSF National Robotics Initiative under Grant 1734633, and in part by donations from Google and Toyota Research Institute. The work of A. Balakrishna was supported by an NSF GRFP. The work of U. Rosolia was supported by the Office of Naval Research under Grant N00014-311. (Brijen Thananjeyan and Ashwin Balakrishna contributed equally to this work.) (Corresponding author: Brijen Thananjeyan.) This research was performed at the AUTOLAB at UC Berkeley in affiliation with the Berkeley AI Research (BAIR) Lab, Berkeley Deep Drive (BDD), the Real-Time Intelligent Secure Execution (RISE) Lab, and the CITRIS "People and Robots" (CPAR) Initiative. This article solely reflects the opinions and conclusions of its authors and does not reflect the views of the sponsors or their associated entities. We thank our colleagues who provided helpful feedback and suggestions, in particular Suraj Nair, Jeffrey Ichnowski, Anshul Ramachandran, Daniel Seita, Marius Wiggert, and Ajay Tanwani.

Additional details

Created:
August 19, 2023
Modified:
October 20, 2023