When Generative AI Goes "Wrong"
A version of this first appeared in Private Company Director: https://www.privatecompanydirector.com/features/what-your-board-should-know-diving-ai
Most companies are rapidly working to develop an AI strategy, prompted in no small part by the success of “generative AI” tools like GPT. For all of their business potential, it’s important to remember that their responses are sometimes very plausible and very wrong. Over time, they may get better in some ways and intentionally worse in others.
Today’s wrong answers are largely unintentional, and they happen for two reasons. First, tools like Microsoft’s GPT-based Bing and Google’s Bard have been trained on huge amounts of data scraped from the Internet. From this, they synthesize likely responses to questions. Their breadth of coverage is stunning, but as any parent will tell you, it’s not a good idea to believe everything that’s posted online. The Internet is full of unfiltered, biased information, misinformation, and sometimes disinformation. Ask a question about medicine and you may get an answer sourced from a combination of peer-reviewed scientific journals and a blog from someone with an agenda to push. If your training data is junk, so are your answers. Microsoft’s first foray into AI chatbots was called Tay, and it was “trained” by interacting online. Unfortunately, an online community trained it to be racist and homophobic. Microsoft pulled the plug.
The second reason is that generative AI generates. The act of synthesizing an answer includes extrapolating. Simply put, that means inventing plausible but made-up data. In one recent example, an attorney in New York used ChatGPT to research legal precedents for a court filing. ChatGPT generated six bogus court cases and even created fake excerpts of judicial opinions. When challenged, ChatGPT insisted that the fake information was real. The court ordered both the attorney and his firm to explain why they shouldn’t be penalized for the citation of “…non-existent cases…[and] judicial opinions”.
This doesn’t mean we should avoid AI. Generative AI like ChatGPT is changing the way people search. Instead of returning thousands of relevant links to read, it has already “read” the material and synthesizes an answer. As we grow to depend on synthesized answers, it’s worth remembering what we lose in the trade: we don’t get to see the data behind the answers. Not having to slog through piles of results was the point, of course, but this lack of transparency means we don’t know the quality of the sources used or which references were invented.
Back when we got a pile of links, we could apply critical thinking to what we read. If we knew a source had an agenda, we could take that into account. Now we get a single answer, and we have to trust that the models behind it were trained on unbiased data. That trust is hard to extend when most systems have been trained by crawling the Internet. There is a massive amount of information to learn from online, but some of it is massively wrong.
Why not train our own AI systems to ensure the data is unbiased? There are two problems with that approach.
First, the value of these systems comes from the vast amount of content they have been trained on. They know more, and can do more, because they have consumed more. Checking and pre-screening everything isn’t practical if we want them to learn from everything possible. Early tools, such as Wolfram Alpha, were trained on carefully curated data. While accurate, their answers were limited in scope. That’s why most people who’ve heard of ChatGPT haven’t heard of Wolfram Alpha.
The second issue is cost. It has been estimated that training a system like ChatGPT costs millions of dollars in rented, specialized processors from cloud providers such as AWS.
If you have a narrow field of use, such as reading financial reports, you can curate specific data and train your own models. For a general-purpose search engine replacement that seems to “know” about everything, that approach isn’t practical (just as it isn’t practical for everyone to build their own search engine).
As with Internet search, there will probably be only a handful of successful “GPT search engines”. Small wonder that Microsoft and Google are rushing to be the dominant players. Nobody wants to be a future version of AltaVista.
Since this new type of general-purpose GPT search comes with the downside of not knowing whether its sources were wrong, it raises another issue: what if the sources are intentionally wrong or biased? With traditional search we have “sponsored” results; people pay to promote their answers by placing them near the top of the pile. How do search companies, which make money by selling ads and promoting specific content, charge for a single answer? When I ask “what’s the safest and most reliable car under $40,000?” in a GPT search engine, I have to understand that the answer might be biased or invented by accident. Do I also have to worry that a car company might have “sponsored” the answer by paying to bias the training data and promote its product?
Hackers are already testing ways to intentionally bias AI training data and influence the answers these systems give. Is it really a stretch to think that advertisers will want to do the same to make money?
What’s needed is either transparency into how answers are generated, eliminating the AI “black box”, or testing and certification that the data used to train AI models was unbiased. Without that, we all need to double-check the answers before buying that car.