YES IX: “Scalable Statistics: on Accuracy and Computational Complexity”

Date: 7 March 2018 - 9 March 2018
Location: Eurandom, Eindhoven

Scalable inference methods are playing an increasingly central role in statistics and machine learning, driven by the explosive growth in the size and complexity of datasets. In such large-scale and complex scenarios, classical statistical methods are losing practical relevance: despite their sound statistical performance guarantees, they cannot cope with the computational and memory constraints imposed by existing hardware.

There are several avenues for addressing this problem. In some cases it is possible to relax the optimization problems arising from the statistical procedure so that they become computationally tractable while still retaining good (sometimes nearly optimal) statistical performance. Another approach, fueled by the advent of cloud computing, is to distribute the computation (and possibly the data) among different machines. In its simplest form, a dataset is split into smaller datasets that are processed in parallel on local servers; the results of those computations are then aggregated on a central machine. There are still no general principles to guide practitioners in splitting the data or aggregating the results, and this is currently an active research area at the interface of statistics and computer science. Understanding the tradeoff between statistical accuracy and computational feasibility (in both distributed and non-distributed settings) is of paramount importance for addressing these issues.
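As a minimal illustration of the split, process-in-parallel, aggregate pattern described above, the Python sketch below fits a no-intercept least-squares slope on each shard and compares two aggregation rules on the central machine. The model, the shard count, the simulated data and all function names (`split`, `local_stats`, `average_of_slopes`, `pooled_slope`, `full_data_slope`) are hypothetical choices made for this example; it is not a method presented at the workshop, and the shards are processed sequentially here for simplicity.

```python
import random
from typing import List, Sequence, Tuple

Pair = Tuple[float, float]          # one observation (x, y)
Stats = Tuple[float, float, int]    # (sum of x*y, sum of x*x, shard size)


def split(data: Sequence[Pair], k: int) -> List[List[Pair]]:
    """Split the dataset into k shards (round-robin); assumes 1 <= k <= len(data)."""
    if not 1 <= k <= len(data):
        raise ValueError("need 1 <= k <= len(data)")
    return [list(data[i::k]) for i in range(k)]


def local_stats(shard: Sequence[Pair]) -> Stats:
    """Work done on a 'local server': sufficient statistics for the
    no-intercept least-squares slope y ~ beta * x, plus the shard size."""
    sxy = sum(x * y for x, y in shard)
    sxx = sum(x * x for x, _ in shard)
    return sxy, sxx, len(shard)


def average_of_slopes(stats: Sequence[Stats]) -> float:
    """Aggregation rule 1: size-weighted average of the local slope estimates."""
    n = sum(m for _, _, m in stats)
    return sum((sxy / sxx) * m for sxy, sxx, m in stats) / n


def pooled_slope(stats: Sequence[Stats]) -> float:
    """Aggregation rule 2: add up the sufficient statistics, then solve once.
    For this particular estimator this reproduces the full-data estimate
    (up to floating-point rounding)."""
    return sum(s[0] for s in stats) / sum(s[1] for s in stats)


def full_data_slope(data: Sequence[Pair]) -> float:
    """Non-distributed baseline: the same estimator on the whole dataset."""
    sxy, sxx, _ = local_stats(data)
    return sxy / sxx


# Simulated data from y = 2x + Gaussian noise (an assumption for illustration).
random.seed(0)
xs = [random.uniform(-1.0, 1.0) for _ in range(10_000)]
data = [(x, 2.0 * x + random.gauss(0.0, 1.0)) for x in xs]

shards = split(data, 8)
# In a real deployment each local_stats call would run on a separate machine.
stats = [local_stats(shard) for shard in shards]

print(f"full: {full_data_slope(data):.4f}  "
      f"pooled: {pooled_slope(stats):.4f}  "
      f"averaged: {average_of_slopes(stats):.4f}")
```

Pooling sufficient statistics is exact here only because this estimator is a ratio of sums; for many estimators no such exact combination is available, and simple rules such as averaging local estimates trade some statistical accuracy for communication and computation savings. That gap is one concrete form of the accuracy versus computation tradeoff the workshop addresses.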

