The 21st century has been coined the age of data due to the massive amounts of data collected on a daily basis, not only in social networks, but foremost also in the sciences, such as the life sciences. The ever-increasing diversity of such data, ranging from images and data on manifolds to immensely complex high-dimensional data, has to be matched by sophisticated methodologies for its acquisition, analysis, storage, and transmission. This poses an intriguing challenge to mathematics as a whole: not only is it evidently of key importance for such methods to have a substantial mathematical foundation, but often only a mathematical approach allows the development of appropriate methods in the first place. In addition, data-based methods such as deep neural networks have lately shown tremendous success, even outperforming methods based on traditional mathematical modeling in areas such as inverse problems or the numerical analysis of partial differential equations.
The novel area of mathematics of data science draws from various areas of traditional mathematics such as applied harmonic analysis, functional analysis, numerical linear algebra, optimization, and statistics. It also intersects with the area of machine learning, customarily assigned to computer science. Another intriguing feature of this area is the interplay between the development of deep mathematical theories and a truly interdisciplinary and even transdisciplinary component.
Berlin Research Groups
Several branches of the area of mathematics of data science are prominently represented within Berlin mathematics, occupying internationally leading positions.
The state of the art in data acquisition is the methodology of compressed sensing (Kutyniok), introduced in 2006. It surprisingly predicts that high-dimensional signals which admit a sparse representation in a suitable basis or, more generally, a frame can be recovered from what was previously considered a highly incomplete set of linear measurements by efficient algorithms such as convex optimization approaches. The DFG Priority Program "Compressed Sensing in Information Processing" is also coordinated from Berlin (coordinator: Kutyniok).
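As a minimal sketch of this recovery principle (an illustration only, assuming NumPy and SciPy; the dimensions and the Gaussian measurement matrix are illustrative choices, not a description of any specific Berlin project), the basis pursuit problem min ||x||_1 subject to Ax = y can be recast as a linear program and solved with an off-the-shelf LP solver:

```python
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(0)

n, m, k = 50, 25, 3                     # ambient dimension, measurements, sparsity
x_true = np.zeros(n)
x_true[rng.choice(n, k, replace=False)] = rng.standard_normal(k)

A = rng.standard_normal((m, n)) / np.sqrt(m)   # Gaussian sensing matrix
y = A @ x_true                                  # incomplete linear measurements (m < n)

# Basis pursuit:  min ||x||_1  s.t.  A x = y,
# recast as an LP over z = [x; u] with the constraints |x_i| <= u_i.
c = np.concatenate([np.zeros(n), np.ones(n)])   # minimize sum of the u_i
A_ub = np.block([[np.eye(n), -np.eye(n)],       #  x - u <= 0
                 [-np.eye(n), -np.eye(n)]])     # -x - u <= 0
b_ub = np.zeros(2 * n)
A_eq = np.hstack([A, np.zeros((m, n))])         # A x = y, no constraint on u
res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=y,
              bounds=[(None, None)] * n + [(0, None)] * n)
x_hat = res.x[:n]
print(np.linalg.norm(x_hat - x_true))           # recovery error
```

With a random Gaussian matrix and sufficiently many measurements relative to the sparsity, the minimizer typically coincides with the original sparse signal, which is the surprising phenomenon the theory explains.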
On the data analysis side, deep neural networks (Friz, Kutyniok, Noe) have shown outstanding success in real-world applications. However, most of the related research is empirically driven, and a mathematical foundation is almost completely missing, posing an exciting challenge to mathematicians. Key questions in this realm concern the expressivity of a network architecture, the performance of the learning algorithm, the analysis of the generalization error, the interpretability of the neural network, and applications either to specific areas such as the life sciences or to problem settings such as inverse problems.
Of key importance for a variety of data analysis tasks, such as hypothesis testing, regression, or clustering, are statistical methods including Markov processes, statistical learning theory, and Bayesian statistics (Noe, Reiss, Spokoiny).
Another methodological approach to the analysis of data sets, for instance for dimension reduction or feature selection, builds on the novel paradigm in mathematical data science that data typically admits a sparse approximation in a suitable basis, often exploited through a variational formulation with a sparse prior (Conrad, Kutyniok, Schütte). An appropriate representation system can either be chosen as a prescribed system (wavelets, shearlets, ...) or be derived from the data itself by dictionary learning techniques (Kutyniok).
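The variational viewpoint with a sparse prior can be illustrated by the classical l1-regularized least-squares (lasso) problem, minimized here by iterative soft thresholding (ISTA). This is a hedged sketch with illustrative parameters, assuming only NumPy, and is not meant to represent the groups' specific methods:

```python
import numpy as np

rng = np.random.default_rng(1)
m, n, k = 40, 100, 4                        # measurements, dimension, sparsity
A = rng.standard_normal((m, n)) / np.sqrt(m)
x_true = np.zeros(n)
x_true[rng.choice(n, k, replace=False)] = 1.0
y = A @ x_true

lam = 0.01                                  # weight of the sparse (l1) prior
step = 1.0 / np.linalg.norm(A, 2) ** 2      # 1/L, L = Lipschitz constant of the gradient

def soft_threshold(v, t):
    """Proximal operator of t * ||.||_1: shrink each entry toward zero by t."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

# ISTA: alternate a gradient step on 0.5*||A x - y||^2 with soft thresholding,
# minimizing the variational functional 0.5*||A x - y||^2 + lam*||x||_1.
x = np.zeros(n)
for _ in range(2000):
    x = soft_threshold(x - step * A.T @ (A @ x - y), step * lam)
```

The soft-thresholding step is exactly where the sparse prior acts: small coefficients are set to zero, so the iterates converge to a sparse approximation of the data.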
One particularly intriguing and versatile application area for mathematical data science is the life sciences, with data ranging from EEG signals to MRI images to dynamic, multimodal, hierarchical patient data sets. The expertise of Berlin mathematics includes molecular dynamics (Noe, Schütte), the analysis of -omics data (Conrad, Schütte), personalized medicine (Schütte), and medical imaging sciences (Hintermüller, Kutyniok). Another application area of data science methods represented in Berlin is finance (Friz, Reiss, Spokoiny, Stannat), which typically requires the analysis of time series data. This area is also supported by the RTG 1845 "Stochastic Analysis with Applications in Biology, Finance and Physics" (coordinator: Imkeller).
Statistical Methods for Data Science
◦ Bayesian statistics
◦ Concentration inequalities, empirical risk minimization
◦ (Maximum likelihood) estimation
◦ Random matrices
◦ Regression and classification, regularization, and (un)supervised learning
Analysis of High-Dimensional Data
◦ Basics of compressed sensing and (sparse) approximation theory
◦ Complexity measures for data such as entropy
◦ Data representations such as structured representations and dictionary learning
◦ Methods for dimension reduction such as Johnson-Lindenstrauss Lemma or PCA
Topics for advanced courses range from “Deep Learning” and “Rough Paths and the Signature Method in Machine Learning” through “Geometric Functional Analysis” and “(High-Dimensional) Convex Geometry” to “Nonlinear Optimization” and “Image Processing”.