
Not all RAGs are made equal

Benjamin D. Brodie · 2 min read

Hero image

Image credit: DALL-E 3 by OpenAI - prompt: "Digital render of a high-tech computer server room bathed in blue and green ambient lighting. Positioned prominently above the servers, digital data streams converge clearly and boldly to spell out the letters 'R', 'A', and 'G' in succession"

You know the feeling.

The seller said it was amazing, things would work perfectly out of the box.

But now that you have unboxed it and powered it up, your new gadget just doesn't live up to the hype.

That was the feeling I had after using our first implementation of a chatbot. It was trained on our documentation, with everything in a tidy FAISS vector database and the powerful GPT-4 ready to magically turn similarity-scored embeddings into useful, scenario-specific advice. Instead, it kept saying "The information provided does not specifically relate to ..."
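For the curious, that first pass looked roughly like this: embed the documentation chunks, drop them into FAISS, and retrieve by vector similarity alone. Here is a minimal sketch of that shape (the model name and chunks are illustrative placeholders, not our production setup; it assumes sentence-transformers and faiss-cpu are installed):

```python
# A simplified sketch of the naive "embedding search only" pipeline.
# Model and documentation chunks are illustrative placeholders.
import faiss
from sentence_transformers import SentenceTransformer

docs = [
    "How to configure webhooks for order events.",
    "Authentication uses OAuth 2.0 client credentials.",
    "Rate limits are enforced per API key.",
]

model = SentenceTransformer("all-MiniLM-L6-v2")
doc_vectors = model.encode(docs, normalize_embeddings=True)

# Inner product on normalized vectors equals cosine similarity.
index = faiss.IndexFlatIP(doc_vectors.shape[1])
index.add(doc_vectors)

query = "Why does my integration keep getting errors?"
query_vector = model.encode([query], normalize_embeddings=True)
scores, ids = index.search(query_vector, 2)

for score, i in zip(scores[0], ids[0]):
    print(f"{score:.3f}  {docs[i]}")
```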

What is going on? Where is the magic I was promised?

Confident that I had followed the instructions on the box to the letter, I figured someone else had to have run into this issue. But before I could get around to Googling it, this popped up in my X feed:

RAG is more than just embedding search

As I read the following section, a light went on.

Query-Document Mismatch: This model assumes that query embedding and the content embedding are similar in the embedding space, which is not always true based on the text you're trying to search over. Only using queries that are semantically similar to the content is a huge limitation!

In other words, we shouldn't assume that the embedding for a user input is semantically similar to the embedding of the documentation content we want to provide as context to the LLM.
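To make that concrete, here is a tiny experiment you can run yourself (the model and texts are placeholders): the same documentation chunk often scores very differently against a query phrased like the docs than against a query phrased the way a frustrated user actually types.

```python
# Illustrating query-document mismatch: compare one doc chunk against
# a doc-like query and a user-like query. Texts are placeholders.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")

chunk = "Rate limits are enforced per API key: 100 requests per minute."
doc_like_query = "rate limits per API key"           # phrased like the docs
user_like_query = "my app suddenly stopped working"  # how users actually ask

emb = model.encode([chunk, doc_like_query, user_like_query])
print("doc-like query: ", util.cos_sim(emb[1], emb[0]).item())
print("user-like query:", util.cos_sim(emb[2], emb[0]).item())
```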

Consequences

So what does this mean for our use case?

We need to evaluate the original user input in a more traditional way, using a free-text index capable of handling relative keyword relevance, typo correction, synonym substitution, and so on. Then, based on context-specific ranking parameters, we can produce a set of content chunks that are fed to the LLM for in-context learning.

Let's see what we can accomplish using open source tools.
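As a first taste, here is a sketch of the keyword-first retrieval step using the open source rank_bm25 package (pip install rank-bm25). The chunks and the whitespace tokenization are deliberately simplified; a real setup would add typo correction and synonym handling on top:

```python
# Keyword-based retrieval with BM25: rank documentation chunks by
# lexical relevance to the user's question. Chunks are placeholders.
from rank_bm25 import BM25Okapi

chunks = [
    "Rate limits are enforced per API key: 100 requests per minute.",
    "Authentication uses OAuth 2.0 client credentials.",
    "Webhooks retry failed deliveries with exponential backoff.",
]

tokenized = [chunk.lower().split() for chunk in chunks]
bm25 = BM25Okapi(tokenized)

query = "why am I getting too many requests errors".lower().split()

# The top-ranked chunks become the context we feed to the LLM.
for chunk in bm25.get_top_n(query, chunks, n=2):
    print(chunk)
```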