The study “AI models collapse when trained on recursively generated data” compellingly shows what happens when an AI uses its own texts as training material. If the process is repeated, that is, the model is fed its own generated texts over and over, a clear pattern emerges by the ninth iteration at the latest: the output becomes increasingly nonsensical.
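To make that feedback loop concrete, here is a deliberately tiny, self-contained Python sketch. The “model” is only a unigram word sampler, not a language model, and the seed text and function names are invented for illustration; even so, it shows how the rare words (the tail of the distribution) vanish after a few rounds of training on the model's own output, which is the core of the collapse the study describes.

```python
# Toy illustration of recursive training: each round the "model" is retrained
# only on text the previous round generated. The model here is just a unigram
# word sampler, chosen to make the loss of rare words easy to observe; it is
# not the paper's experimental setup.

import random
from collections import Counter

def train(corpus: str) -> Counter:
    # "Training" = estimating unigram word frequencies from the corpus.
    return Counter(corpus.split())

def generate(model: Counter, n_words: int) -> str:
    # "Generation" = sampling words according to the learned frequencies.
    words, weights = zip(*model.items())
    return " ".join(random.choices(words, weights=weights, k=n_words))

# Human-written seed: a few frequent words plus many rare ones.
seed = "the cat sat on the mat " * 20 + " ".join(f"rareword{i}" for i in range(100))
corpus = seed

for i in range(9):
    model = train(corpus)
    corpus = generate(model, n_words=len(seed.split()))  # next round sees only model output
    print(f"iteration {i + 1}: {len(set(corpus.split()))} distinct words left")
```

Running it, the count of distinct words shrinks round after round: once a rare word fails to be sampled, it is gone from all later training data for good.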
Developers of AI models must therefore weigh carefully which sources are truly valuable and which are better avoided. In this context, applying digital fingerprints to generated texts could make sense, and may even become essential, to prevent AIs from consuming their own output¹. Technically, however, this will not provide complete protection, since such mechanisms could be circumvented by using different GPT models. Even if AI providers agree on common fingerprinting methods, genuinely free chatbots running on local hardware will always be able to bypass them.
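For a sense of how such a fingerprint could be detected at all, here is a minimal sketch of one published watermarking idea, the “green list” token bias proposed by Kirchenbauer et al. (2023). The hashing rule below is a strong simplification for illustration and is not any provider's actual scheme.

```python
# Simplified "green list" watermark detection: generation that favours tokens
# from a pseudo-randomly chosen green list (seeded by the previous token)
# leaves a statistical trace that a detector can measure later.

import hashlib

def green_fraction(tokens: list[str]) -> float:
    """Fraction of tokens falling in the green list seeded by the previous token.
    Ordinary text scores around 0.5; text generated with the same rule scores higher."""
    hits = 0
    for prev, tok in zip(tokens, tokens[1:]):
        seed = hashlib.sha256(prev.encode()).digest()[0]
        # A token counts as "green" if its hash parity matches the seeded rule.
        hits += (hashlib.sha256(tok.encode()).digest()[0] + seed) % 2 == 0
    return hits / max(len(tokens) - 1, 1)

def probably_watermarked(text: str, threshold: float = 0.7) -> bool:
    # Flag documents whose green fraction is far above the ~0.5 baseline.
    return green_fraction(text.split()) >= threshold
```

A locally run open-weight model simply never applies such a biased sampling rule, so a detector of this kind finds nothing, which is exactly the loophole described above.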
Perhaps this technology, too, is doomed to fail; time will tell.
This text was automatically translated from German into English; the German quotations were translated for meaning rather than word for word.
1. The proverb “eat your own dog food” rings truer here than ever. ↩︎