AI image generators such as Stable Diffusion memorize some of their training images and can generate nearly identical copies of them, Gizmodo reports.
According to the paper, the researchers extracted more than a thousand training examples from the models, including photographs of people, film stills, company logos, and other images. They found that the models can reproduce a training picture almost exactly, with only minor differences such as added noise.
As an example, they cited a photograph of the American preacher Anne Graham Lotz, taken from Wikipedia. When they entered the prompt “Anne Graham Lotz” into Stable Diffusion, the model returned the same image with added noise.
Models such as Stable Diffusion are trained on copyrighted, trademarked, private, and sensitive images.
Yet, our new paper shows that diffusion models memorize images from their training data and emit them at generation time.
— Eric Wallace (@Eric_Wallace_) January 31, 2023
To confirm the match, the researchers measured the pixel-by-pixel distance between the generated image and the original. The analysis showed that the two are almost identical.
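As a rough illustration of such a pixel-distance check, the sketch below computes a normalized root-mean-square distance between two images held as NumPy arrays. The function name, the 0 to 255 pixel range, and the synthetic test images are assumptions made for this example; the article does not specify the exact metric the researchers used.

```python
import numpy as np


def normalized_l2_distance(image_a, image_b):
    """Root-mean-square pixel difference, scaled so 0.0 means identical.

    Both inputs are assumed to be arrays of the same shape with values
    in the 0..255 range (an assumption for this sketch).
    """
    a = np.asarray(image_a, dtype=np.float64) / 255.0
    b = np.asarray(image_b, dtype=np.float64) / 255.0
    if a.shape != b.shape:
        raise ValueError("images must have the same shape")
    return float(np.sqrt(np.mean((a - b) ** 2)))


# Synthetic stand-ins: an "original", a noisy copy of it, and an unrelated image.
rng = np.random.default_rng(0)
original = rng.integers(0, 256, size=(64, 64, 3))
noisy_copy = np.clip(original + rng.normal(0, 5, size=original.shape), 0, 255)
unrelated = rng.integers(0, 256, size=(64, 64, 3))

d_same = normalized_l2_distance(original, original)
d_noisy = normalized_l2_distance(original, noisy_copy)
d_unrelated = normalized_l2_distance(original, unrelated)
print(d_same, d_noisy, d_unrelated)
```

A noisy copy of an image scores far closer to the original than an unrelated image does, which is the kind of gap that lets a near-duplicate be told apart from a merely similar picture.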
Finding duplicates turned out to be quite simple. The researchers entered the same prompt several times in a row. When the generator kept returning identical images, they manually searched the training set for that image.
The scientists noted that memorization is rare. In total, they checked about 300,000 prompts, and the memorization rate of the generators was only 0.03%.
Of the models tested, Stable Diffusion was the least likely to copy images. The scientists believe this is thanks to the deduplication of its training dataset.
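The repeated-prompt procedure described above can be sketched as follows. Everything here is hypothetical: `toy_generate` stands in for a real image generator, and the sample count and distance threshold are arbitrary values chosen for the illustration, not figures from the study.

```python
import itertools

import numpy as np


def rms_distance(a, b):
    """Root-mean-square pixel difference between two same-shaped arrays."""
    return float(np.sqrt(np.mean((a - b) ** 2)))


def looks_memorized(generate, prompt, n_samples=8, threshold=0.1, min_close_pairs=3):
    """Generate several images for one prompt and flag the prompt when
    many of the outputs are near-identical to each other.

    `generate(prompt, seed)` is a hypothetical callable returning an image
    as a float array in [0, 1]; the default limits are illustrative only.
    """
    samples = [generate(prompt, seed) for seed in range(n_samples)]
    close_pairs = sum(
        1
        for a, b in itertools.combinations(samples, 2)
        if rms_distance(a, b) < threshold
    )
    return close_pairs >= min_close_pairs


def toy_generate(prompt, seed):
    """Toy stand-in for a generator: one prompt always yields the same
    base image plus slight noise, any other prompt yields a fresh image."""
    rng = np.random.default_rng(seed)
    if prompt == "memorized caption":
        base = np.random.default_rng(1234).random((32, 32, 3))
        return np.clip(base + rng.normal(0, 0.02, base.shape), 0.0, 1.0)
    return rng.random((32, 32, 3))


flag_memorized = looks_memorized(toy_generate, "memorized caption")
flag_novel = looks_memorized(toy_generate, "some other caption")
print(flag_memorized, flag_novel)
```

A flagged prompt is only a candidate: as the article notes, the researchers still had to look up the matching image in the training set by hand.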
Google’s Imagen algorithm is more prone to copying.
“The caveat is that the model should generalize and generate new images, not produce a learned version,” said co-author Vikash Sehwag.
The study also showed that memorization will become more pronounced as AI generators grow larger.
“Whatever new model comes out, much bigger and more powerful, the potential risks of ‘remembering’ will be much higher than now,” said study co-author Eric Wallace.
The scientists believe that the ability of diffusion models to reproduce training content could lead to copyright disputes. According to Florian Tramer, professor of computer science at ETH Zurich, many companies grant licenses to share and monetize AI-generated images. However, if a generator recreates a copyrighted work, this may cause conflicts.
Most images we extract are copyrighted. Very few (eg. the picture in Eric’s tweet) allow for free re-distribution (with attribution).
Not a lawyer, so I don’t know what this implies.
But you likely can’t make the (common) argument that these models don’t copy training data! pic.twitter.com/vVEahLA13C
— Florian Tramer (@florian_tramer) January 31, 2023
The study was conducted by scientists from Google, DeepMind, ETH Zurich, Princeton University and UC Berkeley.
In January, a group of artists sued the developers of AI image generators over alleged copyright infringement.