LADE Seminar

Sebastian Goldt (SISSA)

Title: The Gaussian world is not enough – how training data shapes neural representations


What do neural networks learn from their data? We discuss this question in two learning paradigms: supervised classification with feed-forward networks, and masked language modelling with transformers. First, we give analytical and experimental evidence for a “distributional simplicity bias”, whereby neural networks learn increasingly complex distributions of their inputs. We then show that neural networks learn from higher-order cumulants (HOCs) more efficiently than lazy methods, and describe how HOCs shape the learnt features. Finally, we characterise the distributions that are learnt by single- and multi-layer transformers, and show a similar distributional simplicity bias for masked language modelling.

Nov 29, 2023 11:30 AM – 12:30 PM
TBD, Centro Congressi, Area Science Park
Località Padriciano 99, Trieste, 34149
