Scientists have shown why large neural networks work better

In a paper presented in December at NeurIPS, the field's flagship conference, Sebastien Bubeck of Microsoft Research and Mark Sellke of Stanford University showed that neural networks must be much larger than conventionally expected before they can avoid certain fundamental problems in how they operate.

Standard expectations about the size of neural networks come from analyzing how they memorize data. One popular task for neural networks is identifying objects in images. To build such a network, researchers first feed it a set of images together with object labels, training it to learn the correlations between the two. Once the network has memorized enough training data, it also gains the ability to predict, with varying accuracy, the labels of images it has never seen before. This process is known as generalization.
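
A minimal Python sketch of this train-then-generalize loop (not from the paper; the data, the model, and all settings are invented for illustration): a toy linear classifier is fitted to labeled training points and then evaluated on points it has never seen.

```python
# Toy illustration of training and generalization (illustrative data only).
import numpy as np

rng = np.random.default_rng(0)

# Two synthetic classes of 2-D points with different means play the role of
# "labeled images".
X0 = rng.normal(loc=-1.0, scale=1.0, size=(200, 2))
X1 = rng.normal(loc=+1.0, scale=1.0, size=(200, 2))
X = np.vstack([X0, X1])
y = np.array([0] * 200 + [1] * 200)

# Shuffle, then split into training data (seen) and test data (never seen).
idx = rng.permutation(len(X))
X, y = X[idx], y[idx]
X_train, y_train = X[:300], y[:300]
X_test, y_test = X[300:], y[300:]

# Logistic regression trained by plain gradient descent.
w, b, lr = np.zeros(2), 0.0, 0.1
for _ in range(500):
    p = 1.0 / (1.0 + np.exp(-(X_train @ w + b)))      # predicted probabilities
    w -= lr * X_train.T @ (p - y_train) / len(y_train)
    b -= lr * np.mean(p - y_train)

def accuracy(X_part, y_part):
    return np.mean(((X_part @ w + b) > 0).astype(int) == y_part)

print("train accuracy:", accuracy(X_train, y_train))  # fit to the data it saw
print("test accuracy: ", accuracy(X_test, y_test))    # generalization to unseen data
```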

The size of a network determines how much information it can memorize. Images, for example, are described by hundreds or thousands of values, one for each pixel. This set of values is mathematically equivalent to the coordinates of a point in a high-dimensional space. The number of coordinates is called the dimension.
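
As a concrete illustration (an assumed 28 × 28 grayscale image standing in for real data, so d = 784), flattening the image yields exactly such a list of coordinates:

```python
# An image as a point in high-dimensional space (stand-in data, no real dataset).
import numpy as np

image = np.random.default_rng(1).random((28, 28))  # fake 28x28 grayscale image
point = image.reshape(-1)                          # one value per pixel

print(point.shape)  # (784,): the image is a point in a space of dimension d = 784
```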

Since the 1980s, the rule of thumb has been to give a neural network about n parameters to fit n data points, regardless of the dimensionality of the data. Modern neural networks, however, often have far more parameters than training samples.
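
The classical intuition behind that rule of thumb can be seen with a model much simpler than a neural network; the polynomial below is a generic analogy chosen for illustration, not a construction from the paper. A model with n free coefficients can pass exactly through n data points:

```python
# "n parameters fit n points": a degree-(n-1) polynomial has n coefficients
# and can interpolate n arbitrary data points exactly (toy example).
import numpy as np

rng = np.random.default_rng(2)
n = 8
x = np.linspace(0.0, 1.0, n)
y = rng.random(n)                      # n arbitrary target values

coeffs = np.polyfit(x, y, deg=n - 1)   # n coefficients = n free parameters
fitted = np.polyval(coeffs, x)

print(np.max(np.abs(fitted - y)))      # close to 0: the model memorizes all n points
```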

The researchers instead studied a different property, the robustness of a neural network, and how it relates to the network's size. In their work, they show that overparametrization is necessary for a network to be robust, that is, stable under small changes to its inputs.
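
One way to summarize that relationship quantitatively, paraphrasing the paper's main inequality in notation of my own choosing (constants, probability statements, and technical conditions omitted): the smoothness of any function f with p parameters that fits n generic data points in dimension d is limited, so keeping f robust forces p to grow like n times d.

```latex
% Informal paraphrase of the scaling (not the theorem's exact statement):
% the Lipschitz constant of any p-parameter model f that fits n generic
% data points in dimension d is bounded below, so a robust fit (Lipschitz
% constant of order one) needs roughly n*d parameters.
\[
  \operatorname{Lip}(f) \;\gtrsim\; \sqrt{\frac{n\,d}{p}}
  \qquad\Longrightarrow\qquad
  \operatorname{Lip}(f) = O(1) \ \text{ requires } \ p \gtrsim n\,d .
\]
```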

The scientists showed that robustly fitting high-dimensional data points requires not just n parameters but n × d parameters, where d is the dimension of the input data (for example, 784 for a 784-pixel image). The proof rests on a fact from high-dimensional geometry: randomly distributed points placed on the surface of a sphere are almost all far from one another; as vectors they are nearly orthogonal, so nearly every pair is separated by about the same large distance.
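
That geometric fact is easy to check numerically. The sketch below (an illustration with arbitrarily chosen n and d, not code from the paper) samples random points on the unit sphere in 784 dimensions and prints their pairwise distances, which all cluster around the same large value, roughly sqrt(2) times the radius.

```python
# Random points on a high-dimensional unit sphere are almost all far apart:
# their inner products are close to 0, so pairwise distances concentrate
# near sqrt(2).  (Illustrative check; n and d are arbitrary choices.)
import numpy as np

rng = np.random.default_rng(3)
d = 784        # input dimension, e.g. a 784-pixel image
n = 200        # number of random points

# Uniform points on the unit sphere: normalize Gaussian vectors.
points = rng.normal(size=(n, d))
points /= np.linalg.norm(points, axis=1, keepdims=True)

# On the unit sphere, |u - v|^2 = 2 - 2 <u, v>.
gram = points @ points.T
dist_sq = np.clip(2.0 - 2.0 * gram, 0.0, None)
dists = np.sqrt(dist_sq[np.triu_indices(n, k=1)])

print("min  pairwise distance:", dists.min())
print("mean pairwise distance:", dists.mean())
print("max  pairwise distance:", dists.max())
# All three come out close to sqrt(2) ~ 1.414 on a sphere of radius 1;
# in low dimensions the spread would be far wider.
```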

Other research has identified additional reasons why overparametrization is useful. For example, it can improve the efficiency of the learning process, as well as the ability of the neural network to generalize.

Earlier, Google published a study of the main machine learning trends in 2021. The company predicts the development of more powerful general-purpose ML models with billions or even trillions of parameters.
