Bbabo NET

Yandex has updated the YandexART diffusion neural network to version 1.3

Hello! My name is Evgeniy Lyapustin, and I am a senior developer on the computer vision team. Together with our colleagues from Yandex Research, we have updated the YandexART diffusion neural network to version 1.3.

The main change is that the neural network has switched to latent diffusion. In addition, the training dataset has grown about 2.5 times. As a result, the new version of YandexART understands text queries better and creates more realistic images.

YandexART 1.3 is already used in Masterpiece (Shedevrum), where users can now create images in different aspect ratios, such as 16:9, 4:3, or 3:4. The updated neural network will later be rolled out to other Yandex services.

With cascade diffusion, the image is generated at low resolution and then progressively refined as the resolution increases. Latent diffusion works differently. It first forms an intermediate latent representation: a compact code that holds the essential information about the image in compressed form. The neural network then expands this code into a full high-resolution image in one step.
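
To make the latent approach concrete, here is a minimal toy sketch in Python. It is not the YandexART implementation: the sizes, the step count, and the functions `toy_denoiser` and `toy_decoder` are hypothetical stand-ins for learned neural networks. It only illustrates the control flow described above, where all iterative denoising happens on a small latent code and the expansion to full size happens once at the end.

```python
import random

LATENT_SIZE = 8   # hypothetical: number of values in the compact latent code
UPSCALE = 8       # hypothetical: the decoder expands each latent value 8 times
STEPS = 10        # hypothetical number of denoising steps

def toy_denoiser(latent, target):
    """Stand-in for the diffusion network: move the latent halfway to a target.
    A real model predicts noise with a neural network conditioned on the prompt."""
    return [z + 0.5 * (t - z) for z, t in zip(latent, target)]

def toy_decoder(latent):
    """Stand-in for the decoder: expand the latent to full size in one step.
    Plain repetition is used here; a real decoder is a learned network."""
    return [z for z in latent for _ in range(UPSCALE)]

def generate(target, seed=0):
    rng = random.Random(seed)
    # Start from pure noise in the small latent space.
    latent = [rng.gauss(0.0, 1.0) for _ in range(LATENT_SIZE)]
    # All iterative work is done on LATENT_SIZE values, not on the full image.
    for _ in range(STEPS):
        latent = toy_denoiser(latent, target)
    # A single decoding step produces the full-size output.
    return toy_decoder(latent)

target = [float(i) for i in range(LATENT_SIZE)]
image = generate(target)
print(len(image))  # 64
```

In this sketch the denoising loop touches 8 values per step while the final output has 64, which is the intuition behind the lower compute cost of working in latent space.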

Latent diffusion consumes fewer computing resources and produces more realistic images. We have seen this in practice: we trained two versions of the model, cascade and latent, under conditions that were as similar as possible, and at every stage of training the latent version won on both quality and speed measurements.

The dataset has grown from 330 million picture-text pairs to more than 850 million. To help the model understand user requests better, synthetic texts were added to the training dataset: more detailed image descriptions generated by a neural network. The picture below shows an example of a synthetic text.

In addition, so that YandexART takes more details of the prompt into account, the new model uses two text encoders instead of one. The first is our encoder from the previous version, 1.2, which was trained on matched picture-text pairs.

The second encoder is new for us and is based on the open-source umt5_xxl. Unlike the first, it was trained only on texts. The two encoders therefore give the model signals of a different nature.
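
The article does not say how the outputs of the two encoders are combined. One common pattern, shown below purely as an assumption, is to bring both sets of token embeddings to a shared width and concatenate them along the token axis, so the diffusion model can attend to both. The functions `toy_image_text_encoder` and `toy_text_only_encoder` are hypothetical stand-ins that return made-up vectors, and zero-padding stands in for what would normally be a learned projection.

```python
def toy_image_text_encoder(prompt):
    """Hypothetical stand-in for the encoder trained on picture-text pairs.
    Returns one 4-dimensional vector per word."""
    return [[float(len(w)), float(i), 1.0, 0.0]
            for i, w in enumerate(prompt.split())]

def toy_text_only_encoder(prompt):
    """Hypothetical stand-in for the text-only encoder.
    Returns one 6-dimensional vector per word."""
    return [[float(ord(w[0]) % 7), float(i), 0.0, 1.0, 0.5, 0.5]
            for i, w in enumerate(prompt.split())]

def pad_to_width(tokens, width):
    """Zero-pad each token vector to a shared width.
    Assumption: a real model would use a learned projection instead."""
    return [vec + [0.0] * (width - len(vec)) for vec in tokens]

def encode_prompt(prompt, width=6):
    first = pad_to_width(toy_image_text_encoder(prompt), width)
    second = pad_to_width(toy_text_only_encoder(prompt), width)
    # Concatenate along the token axis: the conditioning sequence now
    # carries both kinds of signal for the same prompt.
    return first + second

cond = encode_prompt("a red fox in snow")
print(len(cond), len(cond[0]))  # 10 6
```

Here a five-word prompt yields ten conditioning vectors, five from each encoder, all of the same width.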

According to side-by-side (SBS) comparisons by Yandex assessors, YandexART 1.3 wins in 57 percent of cases against Midjourney V5.2 and in 63 percent of cases against the previous version, YandexART 1.2.
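
For readers unfamiliar with side-by-side evaluation, the sketch below shows how a win rate of this kind could be computed from individual assessor votes. The vote list is made up for illustration and is not Yandex's data. Excluding ties from the denominator is an assumption as well, since the article does not describe how ties, if any, were handled.

```python
from collections import Counter

def win_rate(votes, model="A"):
    """Share of decided comparisons won by `model`.
    Each vote is "A", "B", or "tie"; ties are excluded (an assumption)."""
    counts = Counter(votes)
    decided = counts["A"] + counts["B"]
    if decided == 0:
        raise ValueError("no decided comparisons")
    return counts[model] / decided

# Illustrative votes only: 57 wins for model A, 43 for model B.
votes = ["A"] * 57 + ["B"] * 43
print(win_rate(votes))  # 0.57
```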
