Welcome to the new version of CaltechAUTHORS. Login is currently restricted to library staff. If you notice any issues, please email coda@library.caltech.edu
Published April 26, 2018 | Submitted + Supplemental Material + Published
Journal Article Open

Uncertainty Quantification in Graph-Based Classification of High Dimensional Data

Abstract

Classification of high dimensional data finds wide-ranging applications. In many of these applications equipping the resulting classification with a measure of uncertainty may be as important as the classification itself. In this paper we introduce, develop algorithms for, and investigate the properties of a variety of Bayesian models for the task of binary classification; via the posterior distribution on the classification labels, these methods automatically give measures of uncertainty. The methods are all based on the graph formulation of semisupervised learning. We provide a unified framework which brings together a variety of methods that have been introduced in different communities within the mathematical sciences. We study probit classification [C. K. Williams and C. E. Rasmussen, "Gaussian Processes for Regression," in Advances in Neural Information Processing Systems 8, MIT Press, 1996, pp. 514--520] in the graph-based setting, generalize the level-set method for Bayesian inverse problems [M. A. Iglesias, Y. Lu, and A. M. Stuart, Interfaces Free Bound., 18 (2016), pp. 181--217] to the classification setting, and generalize the Ginzburg--Landau optimization-based classifier [A. L. Bertozzi and A. Flenner, Multiscale Model. Simul., 10 (2012), pp. 1090--1118], [Y. Van Gennip and A. L. Bertozzi, Adv. Differential Equations, 17 (2012), pp. 1115--1180] to a Bayesian setting. We also show that the probit and level-set approaches are natural relaxations of the harmonic function approach introduced in [X. Zhu et al., "Semi-supervised Learning Using Gaussian Fields and Harmonic Functions," in ICML, Vol. 3, 2003, pp. 912--919]. We introduce efficient numerical methods, suited to large datasets, for both MCMC-based sampling and gradient-based MAP estimation. Through numerical experiments we study classification accuracy and uncertainty quantification for our models; these experiments showcase a suite of datasets commonly used to evaluate graph-based semisupervised learning algorithms.

Additional Information

© 2018 SIAM and ASA. Submitted: 16 June 2017; Accepted: 05 February 2018; Published online: 26 April 2018. The work of the first author was funded by NSF grant DMS-1118971, NSF grant DMS-1417674, and ONR grant N00014-16-1-2119. The work of the third author was funded by DARPA (EQUIPS) and EPSRC (Programme Grant). KCZ was partially supported by a grant from the Simons Foundation and by the Alan Turing Institute under the EPSRC grant EP/N510129/1. Part of this work was done during the authors' stay at the Newton Institute for the program Stochastic Dynamical Systems in Biology: Numerical Methods and Applications. AMS is grateful to Omiros Papaspiliopoulos for illuminating discussions about the probit model.

Attached Files

Published - 17m1134214.pdf

Submitted - 1703.08816.pdf

Supplemental Material - M113421_01.pdf

Files

1703.08816.pdf
Files (5.8 MB)
Name Size Download all
md5:115056f73dce55cd56bfd8ab24efb529
3.3 MB Preview Download
md5:2260afa01ce8845db08f5d565ce4a567
480.5 kB Preview Download
md5:4a42540013ef19e0744e55fa7dfdbdad
2.0 MB Preview Download

Additional details

Created:
August 19, 2023
Modified:
March 5, 2024