Guaranteed clustering and biclustering via semidefinite programming

Creators: Ames, Brendan P. W.

Style

An error occurred while generating the citation.

Abstract

Identifying clusters of similar objects in data plays a significant role in a wide range of applications. As a model problem for clustering, we consider the densest k -disjoint-clique problem, whose goal is to identify the collection of k disjoint cliques of a given weighted complete graph maximizing the sum of the densities of the complete subgraphs induced by these cliques. In this paper, we establish conditions ensuring exact recovery of the densest k cliques of a given graph from the optimal solution of a particular semidefinite program. In particular, the semidefinite relaxation is exact for input graphs corresponding to data consisting of k large, distinct clusters and a smaller number of outliers. This approach also yields a semidefinite relaxation with similar recovery guarantees for the biclustering problem. Given a set of objects and a set of features exhibited by these objects, biclustering seeks to simultaneously group the objects and features according to their expression levels. This problem may be posed as that of partitioning the nodes of a weighted bipartite complete graph such that the sum of the densities of the resulting bipartite complete subgraphs is maximized. As in our analysis of the densest k -disjoint-clique problem, we show that the correct partition of the objects and features can be recovered from the optimal solution of a semidefinite program in the case that the given data consists of several disjoint sets of objects exhibiting similar features. Empirical evidence from numerical experiments supporting these theoretical guarantees is also provided.

Additional Information

© 2013 Springer-Verlag Berlin Heidelberg and Mathematical Optimization Society. Received: 16 February 2012; Accepted: 9 November 2013; Published online: 30 November 2013. This research was supported in part by the Institute for Mathematics and its Applications with funds provided by the National Science Foundation, and by a Postgraduate Scholarship from NSERC (Natural Science and Engineering Research Council of Canada). I am grateful to Stephen Vavasis, Henry Wolkowicz, Levent Tunçel, Shai Ben-David, Inderjit Dhillon, Ben Recht, and Ting Kei Pong for their helpful comments and suggestions. I am especially grateful to Ting Kei for his help implementing the ADMM algorithms used to perform the numerical trials. I would also like to thank Warren Schudy for suggesting some relevant references that were omitted in an earlier version. Finally, I thank the two anonymous reviewers, whose suggestions vastly improved the presentation and organization of this paper.

Attached Files

Submitted - 1202.3663.pdf

Files

1202.3663.pdf

Files (757.0 kB)

Name	Size	Download all
1202.3663.pdf md5:8d34f415bef94963753e7fd5a57597db	757.0 kB	Preview Download

Additional details

	All versions	This version
Views	0	0
Downloads	0	0
Data volume	0 Bytes	0 Bytes