Inverse Abstraction of Neural Networks Using Symbolic Interpolation
Abstract
Neural networks in real-world applications must satisfy critical properties such as safety and reliability. Analyzing such properties typically requires computing pre-images of the network's transformations, but explicit pre-image computation is well known to be intractable. We introduce new methods for computing compact symbolic abstractions of pre-images by propagating their overapproximations and underapproximations through all layers. The abstraction of pre-images enables formal analysis and knowledge extraction without affecting standard learning algorithms. We use inverse abstractions to automatically extract simple control laws and compact representations for pre-images corresponding to unsafe outputs. We illustrate that the extracted abstractions are interpretable and can be used for analyzing complex properties.
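The abstract describes propagating over- and underapproximations of pre-images backward through the layers. As a minimal sketch of the flavor of such a backward computation (not the paper's algorithm, which uses symbolic interpolation), the following inverts a single layer y = relu(Wx + b) over an output box using interval arithmetic; the 2×2 setting, the restriction to strictly positive output bounds, and all names are illustrative assumptions.

```python
# Hedged sketch, NOT the paper's method: over-approximate the pre-image of an
# output box under one layer y = relu(W x + b) with an axis-aligned input box.
# W, b, and pre_image_box are illustrative names invented for this example.

def mat_inv_2x2(W):
    # Explicit inverse of an assumed-invertible 2x2 matrix.
    (a, b_), (c, d) = W
    det = a * d - b_ * c
    return [[d / det, -b_ / det], [-c / det, a / det]]

def interval_matvec(M, box):
    # Over-approximate {M z : z in box} with a coordinate-aligned box by
    # choosing, per entry sign, the interval endpoint that minimizes/maximizes.
    out = []
    for row in M:
        lo = sum(m * (l if m >= 0 else u) for m, (l, u) in zip(row, box))
        hi = sum(m * (u if m >= 0 else l) for m, (l, u) in zip(row, box))
        out.append((lo, hi))
    return out

def pre_image_box(W, b, y_box):
    # When every output lower bound is > 0, relu is invertible on the box, so
    # the pre-activation box equals y_box; subtract the bias and push the
    # result through W^{-1} with interval arithmetic.
    assert all(l > 0 for l, _ in y_box), "sketch assumes strictly positive outputs"
    z_box = [(l - bi, u - bi) for (l, u), bi in zip(y_box, b)]
    return interval_matvec(mat_inv_2x2(W), z_box)

W = [[2.0, 1.0], [0.0, 1.0]]
b = [0.5, -0.5]
x_box = pre_image_box(W, b, [(1.0, 2.0), (1.0, 3.0)])
# x_box is a sound over-approximation: every input mapping into the output box
# lies inside it (here the tightest bounding box of the true parallelotope).
```

Multi-layer and ReLU-branching cases require joining the pre-images of each activation pattern, which is where the paper's symbolic abstractions replace this naive interval box.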
Additional Information
© 2019 Association for the Advancement of Artificial Intelligence. This work was supported by DARPA Assured Autonomy, NSF grant CNS-1830399, and the VeHICaL project (NSF grant #1545126).

Attached Files
Accepted Version - dgm19-aiaa.pdf
Name | Size
---|---
dgm19-aiaa.pdf (md5:0939dab5dd0f3dc910150c87c69163d5) | 580.6 kB
Additional details
- Eprint ID
- 99059
- DOI
- 10.1609/aaai.v33i01.33013437
- Resolver ID
- CaltechAUTHORS:20191003-134611922
- Funders
- Defense Advanced Research Projects Agency (DARPA)
- NSF CNS-1830399
- NSF CNS-1545126
- Created
- 2019-10-03 (from EPrint's datestamp field)
- Updated
- 2021-11-16 (from EPrint's last_modified field)
- Caltech groups
- Center for Autonomous Systems and Technologies (CAST)