Generative AI accuracy: How to prevent hallucinations

Tue, 5th Dec 2023

By Warren Schilpzand, Area Vice President of Australia and New Zealand, DataStax

Large language models (LLMs) have taken the world by storm since the launch of ChatGPT in November 2022 and Google Bard a few months later. Individuals use generative AI-based LLMs for everything from planning a trip to writing poetry, while business has put them to work, reducing mundane tasks for employees, improving customer service, and so much more.

But there's an elephant in the room. LLMs don't always give the correct answer. In the industry, these wrong responses are called 'hallucinations' and mean LLMs sometimes can't be relied upon to get things right.

Why do LLMs hallucinate? At the recent DataStax I Love AI event, which was held on August 23, Alan Ho, who leads our generative AI strategy, took a deep dive into this troubling behaviour, providing some insight into why this ground-breaking technology sometimes just makes answers up. He also gave some answers as to how to prevent hallucinations, the most important of which is to use retrieval augmented generation (RAG), which we'll get to shortly.

Why LLMs hallucinate

There are two ways to build and use an LLM. The first, characterised by ChatGPT and others, is to train the model using publicly available data – that is, information scraped off the internet. For businesses, however, there are real problems with public training data.

Along with public data, some public AI services may also use whatever information users enter into the prompt box as training data, and for this reason, lots of companies have banned the use of public AI. This prevents workers from entering sensitive data and then having this data become part of the public training set.

The second way, which businesses can use to overcome this, is to build their own LLM using corporate data, which won't be released to the general public. The typical workflow is to build and train the model (there are pre-built models for organisations to use, so they don't have to go to the expense of creating their own), fine-tune it with the corporate data, and then prompt it.

The thing is, an LLM is only as good as the data it's trained on, which is why LLMs make things up. An LLM hallucinates when a prompt asks for information that isn't part of its training set, or when the training data – such as information taken from the internet – is incorrect. Finally, LLMs simply can't remember all the facts, and when this happens, again, they hallucinate. Worse, fine-tuning the LLM, which is said to solve the problem of hallucinations, doesn't actually prevent the problem from occurring.

So, what can be done to prevent hallucinations?

Building better, more reliable LLMs

How does an LLM embody what it knows? There are two ways – explicitly and implicitly, and a good AI system needs to use both.

Explicit knowledge is when the facts and queries are stored in a database, while implicit knowledge means it's encoded in the weights and connections between the LLM's neurons. To reduce hallucinations, both systems must work together. Otherwise, the chance of getting a wrong answer increases.

One of the most powerful methods for reducing hallucinations is retrieval augmented generation, the aforementioned RAG. With RAG, the LLM relies on answers stored in a vector database: content relevant to the prompt is retrieved and supplied to the model alongside the question. This means the answers are checked, and correct answers are then stored within the database, reinforcing the responses made by the LLM to subsequent prompts.
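
To make the retrieval step concrete, here is a minimal sketch in Python. It is an illustration only, not the approach of any particular product: the ToyVectorStore class, the word-count embed function, the build_prompt helper and the sample facts are all hypothetical, and a real system would use a learned embedding model and a proper vector database. The sketch stops at building the prompt, since the call to the LLM itself depends on the model being used.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy embedding (assumption): lowercase word counts.
    A real system would use a learned embedding model instead."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[term] * b[term] for term in a if term in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


class ToyVectorStore:
    """Hypothetical in-memory stand-in for a vector database."""

    def __init__(self):
        self._items = []  # list of (vector, original text) pairs

    def add(self, text):
        self._items.append((embed(text), text))

    def search(self, query, k=2):
        """Return the k stored texts most similar to the query."""
        q = embed(query)
        ranked = sorted(
            self._items, key=lambda item: cosine(q, item[0]), reverse=True
        )
        return [text for _, text in ranked[:k]]


def build_prompt(question, store, k=2):
    """Retrieve checked facts and place them ahead of the question,
    so the model answers from explicit knowledge, not memory alone."""
    context = store.search(question, k)
    lines = [
        "Answer using only the context below. "
        "If the context does not contain the answer, say you don't know.",
        "",
        "Context:",
    ]
    lines += [f"- {fact}" for fact in context]
    lines += ["", f"Question: {question}"]
    return "\n".join(lines)


# Made-up facts standing in for checked corporate data.
store = ToyVectorStore()
store.add("The refund window for annual plans is 30 days.")
store.add("Support is available on weekdays from 9am to 5pm.")
store.add("The office cafeteria serves lunch at noon.")

question = "How many days is the refund window?"
prompt = build_prompt(question, store, k=1)
print(prompt)
```

The prompt that comes out carries the retrieved fact along with an instruction to stay within it, which is how the explicit knowledge in the database is combined with the implicit knowledge in the model.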

LLMs are powerful tools and can perform tasks including summarising information, planning, and even engaging in basic reasoning. Reasoning also allows the LLM to reduce hallucinations because it's able to make inferences based on very little concrete information, just like the human brain does.

Reasoning is then strengthened by feedback when the answer is correct. A good analogy is how children learn: they approach a problem based on past experience, inferring what the correct course of action is. When a parent tells them they got the answer right, it reinforces their knowledge and their ability to make reasoned responses in future.

By using RAG and reasoning, the chance of an LLM hallucinating is reduced but not eliminated. By lowering how often an LLM hallucinates, owners can build trust in the responses it gives. We're still at the beginning of the AI/LLM revolution, and it's going to be exciting to see how the technology develops and becomes more reliable and trustworthy during the next few years.

I LOVE AI is coming to Sydney on December 7. Discover how AI leaders are overcoming the biggest challenges to delivering AI at scale at the I Love AI World Tour.
