Published February 14, 2020 | Accepted Version
Report | Open Access

On the distance between two neural networks and the stability of learning

Abstract

This paper relates parameter distance to gradient breakdown for a broad class of nonlinear compositional functions. The analysis leads to a new distance function called deep relative trust and a descent lemma for neural networks. Since the resulting learning rule seems to require little to no learning rate tuning, it may unlock a simpler workflow for training deeper and more complex neural networks. The Python code used in this paper is here: https://github.com/jxbz/fromage
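
The learning rule itself is not reproduced on this page. As a rough illustration only, the sketch below shows a layerwise relative update in the spirit of the Fromage rule described above, written in plain NumPy; the function name fromage_step and the eps constant are illustrative and not part of the authors' code, which lives in the linked repository.

    import numpy as np

    def fromage_step(weights, grads, lr=0.01, eps=1e-12):
        # Sketch of a layerwise relative update (assumption, not the
        # authors' exact implementation): scale each layer's gradient so
        # the step is a fixed fraction (lr) of that layer's weight norm,
        # then shrink by 1/sqrt(1 + lr^2) so the weight norm does not
        # drift upward over many steps.
        updated = []
        for w, g in zip(weights, grads):
            rel_step = lr * (np.linalg.norm(w) / (np.linalg.norm(g) + eps)) * g
            updated.append((w - rel_step) / np.sqrt(1.0 + lr ** 2))
        return updated

Because the step size is expressed relative to each layer's own scale, a single small value of lr can plausibly work across layers and depths, which is consistent with the abstract's claim that the rule needs little to no learning rate tuning.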

Additional Information

The authors would like to thank Dillon Huff, Jeffrey Pennington and Florian Schaefer for useful conversations. They made heavy use of a codebase built by Jiahui Yu. They are much obliged to Sivakumar Arayandi Thottakara, Jan Kautz, Sabu Nadarajan and Nithya Natesan for infrastructure support. JB is supported by an NVIDIA fellowship.

Files

Accepted Version: 2002.03432.pdf (577.7 kB)
md5:1bb690f095a02f9093fdfcbd34da2ee6

Additional details

Created: August 19, 2023
Modified: October 20, 2023