Published July 1986
Journal Article | Open Access

The complexity of information extraction

Abstract

How difficult are decision problems based on natural data, such as pattern recognition? To answer this question, decision problems are characterized by introducing four measures defined on a Boolean function f of N variables: the implementation cost C(f), the randomness R(f), the deterministic entropy H(f), and the complexity K(f). The highlights and main results are roughly as follows. 1) C(f) ≈ R(f) ≈ H(f) ≈ K(f), all measured in bits. 2) Decision problems based on natural data are partially random (in the Kolmogorov sense) and have low entropy with respect to their dimensionality, and the relations between the four measures translate to lower and upper bounds on the cost of solving these problems. 3) Allowing small errors in the implementation of f saves a lot in the low-entropy case but saves nothing in the high-entropy case. If f is partially structured, the implementation cost is reduced substantially.
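As a loose illustration of the "low entropy with respect to dimensionality" idea, the sketch below computes the binary entropy of the fraction of 1-outputs in a Boolean function's truth table. This is only a hypothetical stand-in chosen for illustration; the paper's formal definitions of H(f) and the other three measures differ and should be consulted in the article itself.

```python
import math
from itertools import product

def truth_table_entropy_bits(f, n):
    """Binary entropy (in bits) of the output distribution of a Boolean
    function f over all 2^n inputs. An illustrative proxy only, not the
    paper's formal deterministic entropy H(f)."""
    ones = sum(f(x) for x in product((0, 1), repeat=n))
    p = ones / 2 ** n
    if p in (0.0, 1.0):
        return 0.0
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))

# A highly structured function (AND of all inputs) has a very skewed
# output distribution, hence low entropy relative to its dimensionality:
and8 = lambda x: int(all(x))
# Parity is perfectly balanced, giving the maximal 1 bit:
parity8 = lambda x: sum(x) % 2

print(truth_table_entropy_bits(and8, 8))    # close to 0
print(truth_table_entropy_bits(parity8, 8)) # exactly 1.0
```

The contrast makes the point: both functions live on the same N = 8 inputs, but their output statistics, and hence this entropy proxy, differ sharply.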

Additional Information

© 1986 IEEE. Reprinted with permission. Manuscript received August 8, 1984; revised October 23, 1985. This paper was presented in part at the IEEE International Symposium on Information Theory, Brighton, England, June 1985. I gratefully acknowledge Dr. D. Psaltis, who contributed significantly to this work. I also wish to thank Drs. R. McEliece, C. Mead, and E. Posner for their help. This paper is dedicated to the memory of Herbert J. Ryser.

Attached Files

Published - ABUieeetit86.pdf

Files

ABUieeetit86.pdf (2.4 MB)
md5:5b12682008034f17f79eff38eb6d97bd

Additional details

Created: August 22, 2023
Modified: October 16, 2023