Mathematics, Statistics and Data Science
- Creators
- Bühlmann, Peter
- Stuart, A. M.
Abstract
The process of extracting information from data has a long history (see, for example, [1]) stretching back over centuries. Because of the proliferation of data over the last few decades, and projections for its continued proliferation over coming decades, the term Data Science has emerged to describe the substantial current intellectual effort around research with the same overall goal, namely that of extracting information. The type of data currently available in all sorts of application domains is often massive in size, very heterogeneous and far from being collected under designed or controlled experimental conditions. Nonetheless, it contains information, often substantial information, and data science requires new interdisciplinary approaches to make maximal use of this information. Data alone is typically not that informative and (machine) learning from data needs conceptual frameworks. Mathematics and statistics are crucial for providing such conceptual frameworks. The frameworks enhance the understanding of fundamental phenomena, highlight limitations and provide a formalism for properly founded data analysis, information extraction and quantification of uncertainty, as well as for the analysis and development of algorithms that carry out these key tasks. In this personal commentary on data science and its relations to mathematics and statistics, we highlight three important aspects of the emerging field: Models, High-Dimensionality and Heterogeneity, and then conclude with a brief discussion of where the field is now and implications for the mathematical sciences.
Additional Information
© 2016 European Mathematical Society. A. M. Stuart is grateful to DARPA, EPSRC, ERC and ONR for financial support that led to some of the research underpinning this article.
Attached Files
Accepted Version - stuart22c.pdf
Files
Name | Size |
---|---|
md5:40e47012fbb95e9abe4a3c3865adb482 | 46.3 kB |
Additional details
- Eprint ID
- 71937
- Resolver ID
- CaltechAUTHORS:20161111-103206810
- Defense Advanced Research Projects Agency (DARPA)
- Engineering and Physical Sciences Research Council (EPSRC)
- European Research Council (ERC)
- Office of Naval Research (ONR)
- Created
- 2016-11-15 (created from EPrint's datestamp field)
- Updated
- 2019-10-03 (created from EPrint's last_modified field)
- Other Numbering System Name
- Andrew Stuart
- Other Numbering System Identifier
- C22