The Failure of RAG
RAG, or Retrieval-Augmented Generation, has been one of the big ideas since the launch of ChatGPT. It’s easy to get up and running as a demo, but it often falls short without a deep understanding of how to make it work.
Search problems
First, RAG relies on accurate and up-to-date information retrieval. If the data sources are flawed or outdated, it’s garbage in, garbage out. Data goes stale, and it can contain opinions and contradictions. When the information passed to the LLM is wrong, the end result is likely to be disappointing at best; an LLM doesn’t understand what is right or wrong. This is a search problem, and there are plenty of ways to overcome it that the simplistic RAG demos don’t deal with.
Second, RAG assumes that the retrieval process will be seamless and effective. But we know that’s not always the case. Sometimes the relevant information is buried under layers of data, making it hard to access. It’s like searching for a needle in a haystack, except you also need to be sure you have the right haystack. You spend more time digging through irrelevant content than finding what you need. This, too, is a search problem.
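To make that concrete, here is a minimal sketch (in Python) of the kind of filtering step the simplistic demos skip: re-ranking retrieved chunks and dropping stale or weakly matching ones before they ever reach the LLM. The Document fields, the thresholds, and the select_context helper are illustrative assumptions, not a prescription.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Document:
    text: str
    source: str
    last_updated: datetime
    score: float  # similarity score from the vector search, assumed precomputed


def select_context(candidates: list[Document],
                   max_age_days: int = 365,
                   min_score: float = 0.75,
                   top_k: int = 3) -> list[Document]:
    """Filter and re-rank retrieved chunks before they reach the LLM.

    A naive RAG demo passes the raw top-k hits straight to the model.
    Here we also drop stale or weakly matching chunks, so garbage in
    is less likely to become garbage out.
    """
    cutoff = datetime.now() - timedelta(days=max_age_days)
    fresh_and_relevant = [
        d for d in candidates
        if d.last_updated >= cutoff and d.score >= min_score
    ]
    # Prefer the strongest matches; ties could also be broken by recency.
    fresh_and_relevant.sort(key=lambda d: d.score, reverse=True)
    return fresh_and_relevant[:top_k]
```

The exact thresholds matter less than the principle: retrieval quality is something you engineer deliberately, not something you inherit for free from a vector database.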
Context is king
Third, there’s the problem of context. Context is king. This is tied to the relevance of the retrieved documents, which can be hit or miss. The model sometimes grabs information that seems right on the surface but doesn’t answer the user’s actual need. RAG doesn’t always grasp the full context of the query, which can lead to outputs that miss the mark entirely. Just because you pull in some data doesn’t mean it fits the generated content or the intended question. The output can feel disjointed, leading to confusion, or it can just be plain wrong.
We’ve seen many instances where LLMs combine answers from different search results into a plausible but entirely wrong answer. One example came when we worked with a client to answer policy questions. There was one policy for children and another for cancer. When asked to create a “child cancer care” policy, the LLM produced a horrifyingly credible description of a child cancer policy that didn’t exist. Output like that is totally unacceptable for a business; it was caught during development and never deployed to a production environment. Hopefully, it illustrates how easy it is for LLMs to add two and two and get five. It’s best not to combine multiple context sources unless you provide enough context for the LLM to understand the nuances of what might be happening.
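One mitigation we find useful is to label every retrieved chunk with its source and to tell the model explicitly not to blend sources. The sketch below is hypothetical; the chunk format and prompt wording are assumptions, and prompt instructions alone are no guarantee, but they give the LLM the context it needs to see that a children’s policy and a cancer policy are different documents.

```python
def build_prompt(question: str, chunks: list[dict]) -> str:
    """Label every chunk with its source document so the model cannot
    silently blend two unrelated policies into one plausible answer.

    Items in `chunks` are assumed to look like:
    {"source": "child-policy.pdf", "text": "..."}
    """
    context_blocks = [
        f"[Source: {c['source']}]\n{c['text']}" for c in chunks
    ]
    return (
        "Answer the question using ONLY one of the sources below.\n"
        "If no single source answers it, say that no matching policy exists.\n"
        "Do not combine clauses from different sources.\n\n"
        + "\n\n".join(context_blocks)
        + f"\n\nQuestion: {question}"
    )
```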
Corroboration
Lastly, the human touch is difficult to replicate. While RAG models can provide facts and figures, they often miss the nuance of human interaction. This is improving and will continue to improve.
But, whether they sound human or not, if they are wrong, there will be problems that need attention.
As we build more robust, production-ready LLM systems, we’ve developed ways to handle many of these issues. Our SAGE approach embeds evaluation into our process and is designed to generate corroboration from multiple data sources, building confidence in answers much as a human would. The more weight we can put behind an answer being right, the more confident we can be in it. It’s a trade-off between speed and accuracy; sometimes speed is more important, but for almost any enterprise chat implementation, high accuracy is required.
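As a rough illustration of what corroboration can look like (a simplified stand-in, not the SAGE implementation itself), the sketch below asks several independent pipelines the same question and only returns an answer that enough of them agree on. The callables, the voting scheme, and the min_agreement threshold are assumptions for the example.

```python
from collections import Counter


def corroborated_answer(question: str,
                        answer_fns: list,
                        min_agreement: int = 2) -> str | None:
    """Ask several independent retrieval pipelines the same question and
    only return an answer that enough of them agree on.

    `answer_fns` is a list of callables, each returning a short,
    normalised answer string (or None) from a different data source.
    """
    answers = [fn(question) for fn in answer_fns]
    counts = Counter(a for a in answers if a is not None)
    if not counts:
        return None
    best, votes = counts.most_common(1)[0]
    # Trade-off: demanding more agreement costs extra calls (speed)
    # but adds weight behind the answer being right (accuracy).
    return best if votes >= min_agreement else None
```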
A robust and repeatable way to deliver accuracy
These issues show that while RAG has potential, it doesn’t always deliver as we’d like unless you are clear about how answers should be handled. We know we must build many test cases and edge cases for the things that can go wrong. The complexity of human language and the nuances of communication are tough challenges to overcome. Staying aware of these limitations when working with RAG and language models is essential.
Chat and other open-ended generative tasks are some of the hardest things LLMs are asked to do. In our experience, RAG can deliver 60-80% accuracy straight out of the box. For simpler tasks, you can reach 95%+ accuracy quickly. But automated chat will always require higher levels of accuracy for enterprise interactions. You can’t deploy systems that may or may not give the right answer to the public or to unsuspecting users.
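Knowing where you sit in that 60-80% range means running a labelled test set through the pipeline on every change. The sketch below uses a simple exact-match check for brevity; in practice the comparison is usually semantic or LLM-assisted, and the test-case format shown here is an assumption.

```python
def measure_accuracy(test_cases: list[dict], answer_fn) -> float:
    """Run a labelled test set through the pipeline and report accuracy.

    Each test case is assumed to look like:
    {"question": "...", "expected": "..."}
    `answer_fn` is the RAG pipeline under test.
    """
    if not test_cases:
        return 0.0
    correct = sum(
        1 for case in test_cases
        if answer_fn(case["question"]).strip().lower()
        == case["expected"].strip().lower()
    )
    return correct / len(test_cases)
```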
So, what happens when RAG is not getting the results you need? Our SAGE model leads to highly accurate answers and will cater to most knowledge agent systems. However, there are use cases where the required accuracy level, coupled with the nuance expected in incoming questions, drives us towards RAG and/or fine-tuned models. More about that in another blog post soon!