This was part of “The Multifaceted Complexity of Machine Learning”.
New Problems and Perspectives on Sampling, Memory, and Learning
Gregory Valiant, Stanford University
Tuesday, April 13, 2021
Abstract: In the first portion of the talk, I will introduce the problem of data “amplification”. Given n independent draws from a distribution D, to what extent is it possible to output a set of m > n datapoints that are indistinguishable from m i.i.d. draws from D? Curiously, we show that nontrivial amplification is often possible in the regime where n is too small to learn D to any nontrivial accuracy. Next, I will discuss a new framework for modeling learning and estimation in the many settings where the data is non-i.i.d. (e.g., due to non-stationarity, correlated sampling, etc.). Surprisingly, this framework provides some new (and improved!) estimators for some basic classical settings. Finally, time permitting, I will discuss linear regression and convex optimization under memory constraints. The emphasis throughout will be on the many open directions for future research provided by these new perspectives and frameworks.
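To make the amplification setup concrete, here is a minimal Python sketch for perhaps the simplest setting one might consider: a unit-variance Gaussian with unknown mean. The “amplifier” simply returns the original n draws together with m − n synthetic draws centered at the empirical mean. This naive procedure only illustrates what an amplifier's input and output look like; it is not the scheme from the talk, and whether (and for how large an m) its output is genuinely indistinguishable from m i.i.d. draws is exactly the kind of question the talk addresses.

```python
import numpy as np

def amplify_gaussian(samples: np.ndarray, m: int) -> np.ndarray:
    """Toy amplifier for a 1-D unit-variance Gaussian: return the n
    original draws plus m - n fresh draws centered at the empirical
    mean, shuffled together. Illustrative only, not the talk's method."""
    n = len(samples)
    mu_hat = samples.mean()                       # crude estimate of the mean
    extra = np.random.normal(mu_hat, 1.0, m - n)  # synthesize m - n new points
    out = np.concatenate([samples, extra])
    np.random.shuffle(out)                        # hide which points are synthetic
    return out

# n draws from the true (unknown) distribution, amplified to m > n points.
true_mu, n, m = 3.0, 50, 55
original = np.random.normal(true_mu, 1.0, n)
amplified = amplify_gaussian(original, m)
print(amplified.shape)  # (55,)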
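For the final topic, the basic tension can be illustrated by contrasting exact least squares, which in the classical batch setting maintains the full d × d matrix XᵀX (O(d²) memory), with a single-pass stochastic gradient method that stores only the current iterate (O(d) memory). The sketch below is a generic illustration of a memory-constrained estimator, included for exposition only; it is not the algorithm analyzed in the talk, and `one_pass_sgd` and its learning rate are hypothetical choices.

```python
import numpy as np

def one_pass_sgd(stream, d, lr=0.01):
    """Single-pass SGD for least squares: stores only the current
    weight vector (O(d) memory), never the full dataset or X^T X.
    Illustrative sketch; not the talk's memory-constrained algorithm."""
    w = np.zeros(d)
    for x, y in stream:
        residual = x @ w - y     # prediction error on this example
        w -= lr * residual * x   # gradient step for 0.5 * (x . w - y)^2
    return w

# Simulated stream: y = x . w_true + noise, consumed one example at a time.
d, n = 5, 10_000
w_true = np.arange(1.0, d + 1)
stream = ((x, x @ w_true + 0.1 * np.random.randn())
          for x in np.random.randn(n, d))
w_hat = one_pass_sgd(stream, d)
print(np.round(w_hat, 2))  # close to w_true, using O(d) memory
```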