On the Minimax Misclassification Ratio of Hypergraph Community Detection

Abstract

Community detection in hypergraphs is explored. Under a generative hypergraph model called "d-wise hypergraph stochastic block model" (d-hSBM), which naturally extends the stochastic block model (SBM) from graphs to d-uniform hypergraphs, the fundamental limit of the misclassification ratio (the loss function in the community detection problem) is studied. For the converse part, a lower bound of the minimax risk, that is, the minimax expected misclassification ratio, is derived. Asymptotically, it decays exponentially fast to zero as the number of nodes tends to infinity, and the rate function is a weighted combination of several divergence terms, each of which is the Rényi divergence of order 1/2 between two Bernoulli distributions. The Bernoulli distributions involved in the characterization of the rate function are those governing the random instantiation of hyperedges in d-hSBM. For the achievability part, we propose a two-step polynomial-time algorithm which, with high probability, has a misclassification ratio with a decaying exponent that is asymptotically greater than or equal to that of our proposed lower bound. The first step of the algorithm is a hypergraph spectral clustering method, which achieves partial recovery to a certain precision level. The second step is a local refinement method, which leverages the underlying probabilistic model along with parameter estimation from the outcome of the first step. To characterize the asymptotic performance of the proposed algorithm, we first derive a sufficient condition for attaining weak consistency in the hypergraph spectral clustering step. Then, under the guarantee of weak consistency in the first step, we upper bound the loss (with high probability) attained in the local refinement step by an exponentially decaying function of the size of the hypergraph and characterize the decaying rate. Compared to existing works in SBM, the main technical challenge lies in the complex structure of error events since community relations become much more complicated. The experimental results on both the synthetic data and real-world datasets validate our theoretical finding that the refinement step is critical in achieving the optimal statistical limit. For the special case d = 3, it is further shown by analyzing the performance of MLE that the proposed lower bound of the minimax risk is tight, and hence, the exponential decaying rate of the asymptotic minimax risk is characterized. We conjecture that this continues to hold for general d as well.

Additional Information

© 2019 IEEE. Manuscript received February 3, 2018; revised February 8, 2019; accepted June 8, 2019. Date of publication July 12, 2019; date of current version November 20, 2019. This work was supported in part by the Ministry of Science and Technology of Taiwan and MOST Joint Research Center for AI Technology and All Vista Healthcare under Grant MOST 107-2634-F-002-009 and Grant 108-2634-F-002-006. This paper was presented in part at the 2017 IEEE International Symposium on Information Theory [1] and in part at the 2018 International Conference on Artificial Intelligence and Statistics [2]. The authors would like to thank the anonymous reviewers for pointing out a major flaw of the argument in the original submission as well as their constructive suggestions toward the completeness of this work. I E. Chien and Chung-Yi Lin contributed equally to this work.

	All versions	This version
Views	0	0
Downloads	0	0
Data volume	0 Bytes	0 Bytes

On the Minimax Misclassification Ratio of Hypergraph Community Detection

Abstract

Additional Information

Additional details