SAGE: A framework for designing intelligent agents
LLMs, RAG and beyond
ChatGPT, Claude and Gemini are powerful applications, and they are becoming more capable every month.
Whilst they are powerful and general, they aren’t always immediately applicable to enterprise solutions without strategies for integrating data, aligning language and behaviours, and ensuring they’re reliable and performant for specific work-based tasks. This means LLMs such as GPT-4, Llama 3 and Mistral need to be extended.
Retrieval Augmented Generation (RAG) has become a standard pattern for connecting internal and proprietary data sources with powerful language models to create intelligent agents.
RAG is often discussed as a singular concept. In truth, many decisions must be made when considering the requirements of an intelligent agent. At Brightbeam, our SAGE framework helps us consider all of the key elements for ensuring that intelligent agents are fit for purpose for the task at hand. And RAG alone doesn’t do everything you need: sometimes you need multiple models and agents, fine-tuning for specific tasks, or complex pre- and post-processing.
This post is an initial overview of the framework to think through these decisions.
The right solution for the right problem
The first thing to consider is what the intelligent agent is supposed to do. We are already used to seeing very powerful, general-purpose chatbots like ChatGPT. You can ask them almost anything, and they will do a decent job. But in enterprise workflows, you don’t necessarily need general purpose; you likely need specific, specialised knowledge.
For instance, an agent that can scan paperwork, extract the right data, store it within enterprise systems, act on what is stored and kick off workflows with human colleagues has a different set of requirements from an email writer or a customer chatbot. It’s always essential to look at how you believe people will use an intelligent agent (if at all) and what the right interface (chat or otherwise) is.
Considerations for design
SAGE examines several aspects of the architecture and design of AI-based agents. We will explore each of these.
Store and search
Store and search are linked concepts. How we store something – the strategy we decide on – and what information we need to complete specific tasks will impact our search and retrieval strategies.
Some decisions we will need to make:
- What information will drive our agent? Do we need to ingest a lot of information before we deploy it so that it can do the job?
- Is there information we need to store and have access to that is more dynamic, e.g. from databases or up-to-date news?
- When we ingest new data, do we need to enrich it and store it in ways that add meaning later on?
We must consider how we connect unstructured knowledge from wikis, documents, email or other written sources with structured data such as CRM or operational data sources. The way we store data, using things like vector databases, will impact how easy it is to find and retrieve the right information when interacting with our agent. There are loads of technical choices, and the decisions will likely be driven by existing infrastructure (AWS, Azure, GCP and others) and databases. There are also many new vector databases built specifically to support LLMs and intelligent search, and they are very good.
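To make this concrete, here is a minimal sketch of the store-and-search idea: an in-memory store where each chunk is embedded at ingest time and enriched with metadata. Everything here is illustrative – the embedding function is a deterministic placeholder, not a real model – and a production system would use a proper embedding model and a real vector database.

```python
import hashlib
import numpy as np

def toy_embed(text: str, dim: int = 64) -> np.ndarray:
    # Placeholder embedding: a deterministic random vector per text.
    # A real system would call an embedding model or hosted API here.
    seed = int(hashlib.sha256(text.encode()).hexdigest()[:8], 16)
    v = np.random.default_rng(seed).standard_normal(dim)
    return v / np.linalg.norm(v)

class VectorStore:
    """Minimal in-memory store: each chunk is embedded at ingest time and
    enriched with metadata (source, date, access tags) for later retrieval."""

    def __init__(self) -> None:
        self.chunks: list[str] = []
        self.vectors: list[np.ndarray] = []
        self.metadata: list[dict] = []

    def ingest(self, text: str, meta: dict) -> None:
        self.chunks.append(text)
        self.vectors.append(toy_embed(text))
        self.metadata.append(meta)

    def search(self, query: str, k: int = 3) -> list[tuple[str, dict, float]]:
        # Rank stored chunks by cosine similarity to the query embedding.
        q = toy_embed(query)
        scores = [float(q @ v) for v in self.vectors]
        top = sorted(range(len(scores)), key=scores.__getitem__, reverse=True)[:k]
        return [(self.chunks[i], self.metadata[i], scores[i]) for i in top]

store = VectorStore()
store.ingest("Refunds: customers may return goods within 30 days of purchase.",
             {"source": "policy-wiki", "access": "public"})
hits = store.search("How long do customers have to return an item?")
```

Note that the metadata attached at ingest time (source, access tags) is what makes access control and corroboration possible later on.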
Search encompasses everything we need to identify, find and retrieve for our intelligent agent to do its job effectively. This could be from internal systems or external data sources, built into the intelligent agent or coming entirely from outside it. Search is all about balancing speed and correctness for the task at hand. It might also be part of a process that prompts the intelligent agent to ask more questions before it has all the information necessary to do its job effectively.
Assess, access and augment
Assess examines how we should treat search results. We need to decide if we think the information we have is trustworthy, accurate, and reliable enough to execute a task or generate an answer.
Often, search is about figuring out the right haystacks to dig in to find the perfect needle. Assess takes the needle-finding one step further and determines whether it’s the right needle or whether the agent needs more information.
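One way to make assess concrete is as a gate between retrieval and generation. The sketch below (reusing the toy store above; the thresholds are illustrative assumptions, not tuned values) decides whether the evidence is strong and corroborated enough to act on, or whether the agent should retrieve more or ask the user:

```python
from enum import Enum

class Verdict(Enum):
    ANSWER = "answer"                # evidence is strong enough to act on
    ASK_USER = "ask_user"            # agent should request clarification
    RETRIEVE_MORE = "retrieve_more"  # widen the search before answering

def assess(hits: list[tuple[str, dict, float]],
           min_score: float = 0.35,
           min_sources: int = 2) -> Verdict:
    """Gate between retrieval and generation. Thresholds are illustrative."""
    strong = [h for h in hits if h[2] >= min_score]
    if not strong:
        return Verdict.ASK_USER
    # Prefer corroboration: the same needle found in independent haystacks.
    sources = {meta.get("source") for _, meta, _ in strong}
    if len(sources) < min_sources:
        return Verdict.RETRIEVE_MORE
    return Verdict.ANSWER
```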
In many companies, certain pieces of information aren’t for public consumption or use. We want agents that are capable of using as much information as possible, but we must also ensure that the user of the agent has the appropriate credentials and access. Access can be determined at all points when interacting with an intelligent agent. Access controls can be built in at design time – i.e. don’t give an agent information it shouldn’t be using – or at runtime – i.e. does the person I’m currently interacting with have the appropriate permissions to know what I’ve found?
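As a sketch of the runtime side, access control can be as simple as filtering retrieved chunks against the caller’s entitlements before they ever reach the model. The access tags and roles here are hypothetical labels:

```python
def authorised(user_roles: set[str], meta: dict) -> bool:
    """Runtime check: only pass a chunk to the model if the current user
    is allowed to see it. Unlabelled chunks default to restricted."""
    required = meta.get("access", "restricted")
    return required == "public" or required in user_roles

def filter_for_user(hits: list[tuple[str, dict, float]],
                    user_roles: set[str]) -> list[tuple[str, dict, float]]:
    # Design-time control would stop restricted data being ingested at all;
    # this is the runtime counterpart, applied on every request.
    return [h for h in hits if authorised(user_roles, h[1])]

# 'hits' comes from the store.search call in the storage sketch above.
visible = filter_for_user(hits, user_roles={"finance"})
```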
Augmented generation is about providing the right context for the intelligent agent to do the job. In reality, you don’t want to give too much information, as this can result in hallucinations, confabulations and misinformation. Too little, and it can’t do the job correctly. There are also cost and speed implications of providing too much information. Now that we have context windows that can support 1.4M words (or 2M tokens), LLMs can draw on a lot of information, but that doesn’t come for free and takes time to process in every transaction.
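Here is a minimal sketch of that context-budget trade-off: pack the highest-scoring chunks into a prompt until a rough token budget is reached. The four-characters-per-token heuristic is a crude approximation, and llm_complete is a hypothetical stand-in for whatever model call you actually use:

```python
def build_prompt(question: str,
                 hits: list[tuple[str, dict, float]],
                 budget_tokens: int = 2000) -> str:
    """Pack the best chunks into the prompt until a rough token budget
    is hit. ~4 characters per token is a crude heuristic."""
    context, used = [], 0
    for text, meta, score in hits:
        cost = len(text) // 4
        if used + cost > budget_tokens:
            break  # too little context fails; too much costs time and money
        context.append(f"[{meta.get('source', 'unknown')}] {text}")
        used += cost
    return (
        "Answer using only the context below. If the context is "
        "insufficient, say so.\n\n"
        + "\n".join(context)
        + f"\n\nQuestion: {question}"
    )

# llm_complete is a hypothetical stand-in for your model endpoint:
# answer = llm_complete(build_prompt("What is the refund window?", visible))
```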
There are also many ways that intelligent agents can be augmented: through prompt engineering, machine-managed prompting such as DSPy, or frameworks such as LangChain or LlamaIndex. Each choice has pros and cons, which we will cover in future posts.
Generate
Generate is the point where we use foundation, fine-tuned or bespoke models to do the job at hand. In a chatbot, this is where we interact with the user; for other intelligent agents, it might be where we execute a task. The choice of generation, prediction, summarisation or any other discriminative or generative approach will be dictated by the task at hand. We must decide what the best model is for speed, accuracy, latency and cost.
We always have the choice of running multiple models chained together with processing between them. We can have one model check the output of another. We can have them collaborate with each other. We can take the power of a general model like GPT-4 and couple it with a fine-tuned bespoke model that really understands the enterprise context and language. All of these are valid choices depending on the use case.
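A minimal sketch of one such chaining pattern: a checker model reviews a generator’s draft and feeds critique back for revision. Both generate and check are plain callables standing in for real model endpoints, and the OK-prefix protocol is just an illustrative convention:

```python
from typing import Callable

def checked_answer(question: str, context: str,
                   generate: Callable[[str], str],
                   check: Callable[[str], str],
                   max_retries: int = 2) -> str:
    """Chain two models: 'generate' drafts an answer, 'check' reviews it
    against the context, and critique drives a revision loop."""
    draft = generate(f"{context}\n\nQuestion: {question}")
    for _ in range(max_retries):
        verdict = check(
            f"Context:\n{context}\n\nDraft answer:\n{draft}\n\n"
            "Reply OK if the draft is supported by the context, "
            "otherwise explain the problem."
        )
        if verdict.strip().startswith("OK"):
            return draft
        # Feed the critique back so the generator can revise its draft.
        draft = generate(
            f"{context}\n\nQuestion: {question}\n\n"
            f"Previous draft:\n{draft}\nReviewer feedback:\n{verdict}\n"
            "Produce a corrected answer."
        )
    return draft
```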
Evaluate
One of the toughest parts of any intelligent agent is determining whether the task you’ve just done is correct. Did you use the right inputs? Has the generation produced something that is coherent and appropriate for the end user?
If you’re about to give an answer back to a user, in cases where accuracy is demanded, you have to be sure that you are giving them the right information. For many knowledge tasks, this requires corroborating information or some other way to triangulate whether the task is being done successfully. Our evaluation choices need to take this into consideration.
We consider and design appropriate benchmarks during the training and design process. But we also consider how to corroborate answers against previous experiences and known good answers. For instance, if we have a customer service support AI, we might examine previous customer interactions handled by people to see how well the AI aligns. In some ways, this is how people work: we rely on previous experience and knowledge to know whether what we are about to say is good or not.
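A sketch of that corroboration idea: score a new answer against known good answers from previous human-handled interactions, and route low-scoring answers to a person for review. This reuses the toy embedding from the storage sketch; a production system would use a real embedding model and calibrated thresholds:

```python
def alignment_score(answer: str, reference_answers: list[str]) -> float:
    """Similarity of a new answer to the closest known good answer,
    reusing toy_embed from the storage sketch above."""
    a = toy_embed(answer)
    return max(float(a @ toy_embed(ref)) for ref in reference_answers)

# Hypothetical references drawn from past human-handled interactions.
references = [
    "Customers can return goods within 30 days of purchase for a refund.",
]
score = alignment_score("You have 30 days to return an item.", references)
# Below some agreed threshold, escalate the interaction to a human.
```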
Levels of accuracy and performance
We have levels for each element of SAGE that allow us to examine how much we need to build to ensure accuracy. But accuracy and precision come at a price – often speed and cost – so you don’t always want to use the most sophisticated models, the longest context lengths, or multiple calls to different models. We will explore the different levels in future posts.