If AIs use data produced by other AIs, it degrades their ability to make meaningful observations or reach useful conclusions.
In other words, they go insane.
The problem is called “AI model collapse” and it occurs when AIs like LLMs (ChatGPT et al.) create a “recursive” loop of generating and consuming data. For those of us old enough to remember copiers, think copy of a copy of a copy.
The images get blurry and the text harder to read.
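For the curious, here is a toy sketch of that copy-of-a-copy loop in Python. It is not how an LLM is actually trained. It assumes the simplest possible “model” (a bell curve described by an average and a spread), and the function name `simulate_collapse` and its parameters are made up for illustration. Each generation learns only from a small sample produced by the generation before it, and the spread of what it can produce tends to shrink toward nothing.

```python
import random
import statistics


def simulate_collapse(generations=200, sample_size=10, seed=0):
    """Toy illustration: refit a Gaussian to its own samples, over and over.

    Returns the list of standard deviations, one per generation,
    starting with the original "real data" spread of 1.0.
    """
    rng = random.Random(seed)
    mean, stdev = 0.0, 1.0  # generation 0: the "real" distribution
    history = [stdev]
    for _ in range(generations):
        # The current model produces a small batch of synthetic data...
        samples = [rng.gauss(mean, stdev) for _ in range(sample_size)]
        # ...and the next model is fitted to nothing but that batch.
        mean = statistics.fmean(samples)
        stdev = statistics.pstdev(samples)
        history.append(stdev)
    return history


if __name__ == "__main__":
    history = simulate_collapse()
    print("spread at the start:", history[0])
    print("spread at the end:  ", history[-1])
```

The small sample size is a deliberate exaggeration so the effect shows up quickly; with more data per generation the blurring is slower, but in this toy setup it still only goes one way.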
It’s an inevitable outcome of the fact that AI developers have all but used up the data available on the Internet to train their models, and the richest sources of reliable stuff are now restricting access to it.
And depending on whose estimate you believe, anywhere from some to lots of that available (or formerly available) data is already tainted by AI, whether by creation or translation. It’s no surprise, since there are concerted efforts underway to generate AI content for the very purpose of training AIs: called “synthetic data,” it has been suggested that it could outstrip “real data” by the end of the decade.
Just think of the potential for AIs to get stuff wrong by default or, worse, AIs used to purposely generate data to convince other AIs of something wrong or sinister. It will supercharge gaming a system of information that is already corrupt.
There are three obvious ways to address this emergent problem:
First, and probably the most overt, will be the development of tools to try to stop or mitigate it. We’ll hear from the tech boffins about “guardrails” and “safeguards” that will make AI model collapse less likely or severe.
And, when something weird or scary happens, they’ll label it with something innocuous (I love the idea that AIs already making shit up is called “hallucinating”) and then come up with more AIs to police the new problem, which will create demands for more money even as it prompts more problems.
Second, and more insidiously, the boffins will continue to flood our lives with gibberish about how “free speech” means the unfettered sharing and amplifying of misunderstanding, falsehoods, and lies, which will further erode our ability to distinguish between sane and mad AI. After all, who’s to say people of color weren’t Nazis or other fictions weren’t historical fact (or vice versa)?
Slowly, we’re being conditioned to see bias and inaccuracies as artifacts of opinion or process, not facts. An insane AI may well be viewed as no worse than our friends and family members whose ideas and beliefs are demonstrably wrong or nutty (though utterly right to them).
Third, and least likely, is that regulators could step up and do something about it, like demand that the data used for training AIs be vetted and maybe even certified fresh, or good, or whatever.
We rely on government to help ensure that hamburgers aren’t filled with Styrofoam and prescription drugs aren’t made with strychnine. Cars must meet some bare minimum safety threshold, as do buildings under construction, etc.
How is it that there are no such regulatory criteria for training AI models?
Maybe they just don’t understand it. Maybe they’re too scared to impede “innovation” or some other buzzword that they’ve been sold. Maybe they’ve asked ChatGPT for its advice and were told that there’s nothing to worry about.
Whatever the cause, the fact that we’re just watching AIs slowly descend into insanity is simply mad.