Published June 2020 | Submitted
Book Section - Chapter Open

What is the Value of Data? On Mathematical Methods for Data Quality Estimation

Abstract

Data is one of the most important assets of the information age, and its societal impact is undisputed. Yet, rigorous methods of assessing the quality of data are lacking. In this paper, we propose a formal definition for the quality of a given dataset. We assess a dataset's quality by a quantity we call the expected diameter, which measures the expected disagreement between two randomly chosen hypotheses that explain it, and has recently found applications in active learning. We focus on Boolean hyperplanes, and utilize a collection of Fourier analytic, algebraic, and probabilistic methods to come up with theoretical guarantees and practical solutions for the computation of the expected diameter. We also study the behaviour of the expected diameter on algebraically structured datasets, conduct experiments that validate this notion of quality, and demonstrate the feasibility of our techniques.
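
As a rough illustration of the quantity described in the abstract, the sketch below estimates an expected diameter by Monte Carlo sampling. It is not taken from the paper and rests on several assumptions: hypotheses are modelled as homogeneous hyperplanes with Gaussian weights, "explaining" the dataset means labelling every sample correctly, and disagreement is measured as the fraction of points of the Boolean cube {-1, 1}^n on which two hypotheses differ. The names `predict`, `sample_consistent`, and `expected_diameter` are hypothetical, and the paper's own definition and algorithms may differ.

```python
import itertools
import random


def predict(w, x):
    """Label x by the side of the hyperplane with normal w (assumed homogeneous)."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) >= 0 else -1


def sample_consistent(data, n, rng, max_tries=100_000):
    """Rejection-sample a Gaussian weight vector that labels every sample correctly."""
    for _ in range(max_tries):
        w = [rng.gauss(0.0, 1.0) for _ in range(n)]
        if all(predict(w, x) == y for x, y in data):
            return w
    raise RuntimeError("no consistent hyperplane found; the data may not be separable")


def expected_diameter(data, n, pairs=200, seed=0):
    """Average disagreement, over the whole Boolean cube, of random consistent pairs."""
    rng = random.Random(seed)
    cube = list(itertools.product([-1, 1], repeat=n))
    total = 0.0
    for _ in range(pairs):
        w1 = sample_consistent(data, n, rng)
        w2 = sample_consistent(data, n, rng)
        # Fraction of cube points on which the two hypotheses disagree.
        total += sum(predict(w1, x) != predict(w2, x) for x in cube) / len(cube)
    return total / pairs


if __name__ == "__main__":
    # Two labelled points in {-1, 1}^3; a smaller value suggests a more informative dataset.
    samples = [((1, 1, 1), 1), ((-1, -1, 1), -1)]
    print(expected_diameter(samples, n=3))
```

Under these assumptions, a dataset that pins down the hypothesis on every point of the cube yields a value of zero, while an empty dataset yields the average disagreement between two unconstrained random hyperplanes. The brute-force enumeration of the cube limits this sketch to small n.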

Additional Information

© 2020 IEEE.

Attached Files

Submitted - 2001.03464.pdf

Files

2001.03464.pdf (509.9 kB)
md5:e6067fc8aa981bbfbffc3fdd760c4761

Additional details

Created: August 19, 2023
Modified: October 20, 2023