Latent Variable Graphical Model Selection via Convex Optimization

Creators: Chandrasekaran, Venkat; Parrilo, Pablo A.; Willsky, Alan S.

Abstract

Suppose we have samples of a subset of a collection of random variables. No additional information is provided about the number of latent variables, nor of the relationship between the latent and observed variables. Is it possible to discover the number of hidden components, and to learn a statistical model over the entire collection of variables? We address this question in the setting in which the latent and observed variables are jointly Gaussian, with the conditional statistics of the observed variables conditioned on the latent variables being specified by a graphical model. As a first step we give natural conditions under which such latent-variable Gaussian graphical models are identifiable given marginal statistics of only the observed variables. Essentially these conditions require that the conditional graphical model among the observed variables is sparse, while the effect of the latent variables is "spread out" over most of the observed variables. Next we propose a tractable convex program based on regularized maximum-likelihood for model selection in this latent-variable setting; the regularizer uses both the ℓ_1 norm and the nuclear norm. Our modeling framework can be viewed as a combination of dimensionality reduction (to identify latent variables) and graphical modeling (to capture remaining statistical structure not attributable to the latent variables), and it consistently estimates both the number of hidden components and the conditional graphical model structure among the observed variables. These results are applicable in the high-dimensional setting in which the number of latent/observed variables grows with the number of samples of the observed variables. The geometric properties of the algebraic varieties of sparse matrices and of low-rank matrices play an important role in our analysis.
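The decomposition and the convex program described in the abstract can be sketched as follows. The notation is an assumption introduced here for illustration and does not appear in the abstract: $K$ is the joint precision matrix over observed variables $O$ and latent variables $H$, $\Sigma^n_O$ is the sample covariance of the observed variables, and $\lambda_n$, $\gamma$ are regularization parameters.

```latex
% Marginal precision of the observed variables (Schur complement of K_H):
% a sparse term minus a low-rank term whose rank is at most the number
% of latent variables.
\[
  (\Sigma_O)^{-1}
  \;=\;
  \underbrace{K_O}_{\text{sparse: } S}
  \;-\;
  \underbrace{K_{O,H}\, K_H^{-1}\, K_{H,O}}_{\text{low rank: } L}
\]

% Regularized maximum likelihood over the pair (S, L).
% Gaussian log-likelihood, up to constants and scaling:
%   \ell(K; \Sigma) = \log\det K - \operatorname{tr}(K \Sigma).
\[
  (\hat S_n, \hat L_n)
  \;=\;
  \arg\min_{S,\,L}\;
  -\ell\bigl(S - L;\ \Sigma^n_O\bigr)
  \;+\;
  \lambda_n \bigl( \gamma\, \|S\|_1 + \operatorname{tr}(L) \bigr)
  \quad \text{s.t.} \quad S - L \succ 0,\ \ L \succeq 0 .
\]
```

Because $L$ is constrained to be positive semidefinite, $\operatorname{tr}(L)$ equals its nuclear norm, which is how the nuclear-norm regularizer mentioned in the abstract enters. In this sketch the rank of $\hat L_n$ plays the role of the estimated number of hidden components, and the support of $\hat S_n$ that of the conditional graphical model structure among the observed variables.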

Additional Information

© 2012 Institute of Mathematical Statistics. Received August 2010; revised November 2011. First available in Project Euclid: 30 October 2012. This work was supported in part by AFOSR grant FA9550-08-1-0180, in part under a MURI through AFOSR grant FA9550-06-1-0324, in part under a MURI through AFOSR grant FA9550-06-1-0303, and in part by NSF FRG 0757207. We would like to thank James Saunderson and Myung Jin Choi for helpful discussions, and Kim-Chuan Toh for kindly providing us specialized code to solve larger instances of our convex program.

Attached Files

Accepted Version - 1008.1290.pdf

Submitted - cpw_lgm_preprint10.pdf

Supplemental Material - euclid.aos.1351602527.pdf

Files (1.0 MB)

Name	Size	MD5
cpw_lgm_preprint10.pdf	433.3 kB	f0fd8a3c056beb3575acf1cc09bb1479
euclid.aos.1351602527.pdf	197.8 kB	725d4d3f4b3e98f06ad9e424fb474afb
1008.1290.pdf	412.4 kB	0d9834225cc3d3d64201c207441a8444
