Uncertainty Quantification in Graph-Based Classification of High Dimensional Data
Abstract
Classification of high dimensional data finds wide-ranging applications. In many of these applications equipping the resulting classification with a measure of uncertainty may be as important as the classification itself. In this paper we introduce, develop algorithms for, and investigate the properties of a variety of Bayesian models for the task of binary classification; via the posterior distribution on the classification labels, these methods automatically give measures of uncertainty. The methods are all based on the graph formulation of semisupervised learning. We provide a unified framework which brings together a variety of methods that have been introduced in different communities within the mathematical sciences. We study probit classification [C. K. Williams and C. E. Rasmussen, "Gaussian Processes for Regression," in Advances in Neural Information Processing Systems 8, MIT Press, 1996, pp. 514--520] in the graph-based setting, generalize the level-set method for Bayesian inverse problems [M. A. Iglesias, Y. Lu, and A. M. Stuart, Interfaces Free Bound., 18 (2016), pp. 181--217] to the classification setting, and generalize the Ginzburg--Landau optimization-based classifier [A. L. Bertozzi and A. Flenner, Multiscale Model. Simul., 10 (2012), pp. 1090--1118], [Y. Van Gennip and A. L. Bertozzi, Adv. Differential Equations, 17 (2012), pp. 1115--1180] to a Bayesian setting. We also show that the probit and level-set approaches are natural relaxations of the harmonic function approach introduced in [X. Zhu et al., "Semi-supervised Learning Using Gaussian Fields and Harmonic Functions," in ICML, Vol. 3, 2003, pp. 912--919]. We introduce efficient numerical methods, suited to large datasets, for both MCMC-based sampling and gradient-based MAP estimation. Through numerical experiments we study classification accuracy and uncertainty quantification for our models; these experiments showcase a suite of datasets commonly used to evaluate graph-based semisupervised learning algorithms.
Additional Information
© 2018 SIAM and ASA. Submitted: 16 June 2017; Accepted: 05 February 2018; Published online: 26 April 2018. The work of the first author was funded by NSF grant DMS-1118971, NSF grant DMS-1417674, and ONR grant N00014-16-1-2119. The work of the third author was funded by DARPA (EQUIPS) and EPSRC (Programme Grant). KCZ was partially supported by a grant from the Simons Foundation and by the Alan Turing Institute under the EPSRC grant EP/N510129/1. Part of this work was done during the authors' stay at the Newton Institute for the program Stochastic Dynamical Systems in Biology: Numerical Methods and Applications. AMS is grateful to Omiros Papaspiliopoulos for illuminating discussions about the probit model.Attached Files
Published - 17m1134214.pdf
Submitted - 1703.08816.pdf
Supplemental Material - M113421_01.pdf
Files
Additional details
- Alternative title
- Uncertainty Quantification in the Classification of High Dimensional Data
- Eprint ID
- 79021
- Resolver ID
- CaltechAUTHORS:20170712-141757416
- NSF
- DMS-1118971
- NSF
- DMS-1417674
- Office of Naval Research (ONR)
- N00014-16-1-2119
- Defense Advanced Research Projects Agency (DARPA)
- Engineering and Physical Sciences Research Council (EPSRC)
- EP/N510129/1
- Simons Foundation
- Created
-
2017-07-12Created from EPrint's datestamp field
- Updated
-
2021-11-15Created from EPrint's last_modified field