Statistical phenomena in data selection and data enrichment

title: Statistical phenomena in data selection and data enrichment
start_date: 2024/06/27
schedule: 11h30-12h30
online: no
location_info: Amphi Jaurès
summary: Building and training powerful machine learning models has become increasingly feasible thanks to new architectures, software infrastructure, and the prevalence of foundation models. Today, developing a high-quality dataset for a specific use case is often the key bottleneck to successful machine learning applications. I will discuss two approaches to alleviating this problem: selecting highly informative samples from a large dataset, and merging a small dataset with surrogate data from a different source. I will give an overview of some of the ideas in the literature on this problem and present some findings arising from the analysis of simple statistical models. [Based on joint work with Germain Kolossov, Ayush Jain, Eren Sasoglu, Pulkit Tandon]
responsibles: Loureiro, Lorenzi, Peyré, Biroli, Mallat