This article revisits the RAG pipeline, which has been covered in a number of previous notebooks. The pipeline combines a similarity instance (embeddings or similarity pipeline), which builds a question context, with a model that answers questions.

The RAG pipeline recently underwent a number of major upgrades to support the following.

Ability to run embeddings searches. When content storage is enabled, text can be retrieved directly from the embeddings instance.
In addition to extractive QA, support for text generation models, sequence-to-sequence models and custom pipelines.

These changes enable embeddings-guided and prompt-driven search with Large Language Models (LLMs) 🔥🔥🔥

Install dependencies

Install txtai and all dependencies.

# Install txtai
pip install txtai datasets

Create Embeddings and RAG instances

An Embeddings instance defines methods to represent text as vectors and build vector indexes for search.
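
To make the idea concrete, here is a minimal sketch of vector search that does not use txtai. The bag-of-words `vectorize` function and the brute-force `search` function are assumptions for illustration only; a real Embeddings instance uses a transformer model and a vector index instead.

```python
import math
from collections import Counter

def vectorize(text):
    # Toy stand-in for an embeddings model: lowercase bag-of-words counts
    return Counter(text.lower().split())

def cosine(a, b):
    # Cosine similarity between two sparse count vectors
    dot = sum(a[token] * b[token] for token in a if token in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

docs = ["Giants 5 Dodgers 4 final", "Phillies 5 Braves 0 final", "Flyers win 4-1"]

# "Index" the documents as (id, vector) pairs
index = [(uid, vectorize(text)) for uid, text in enumerate(docs)]

def search(query, limit=1):
    # Brute-force scan: score every document against the query, best first
    vector = vectorize(query)
    scores = [(uid, cosine(vector, other)) for uid, other in index]
    return sorted(scores, key=lambda x: x[1], reverse=True)[:limit]

uid, score = search("Dodgers Giants")[0]
print(docs[uid])  # prints: Giants 5 Dodgers 4 final
```

Word overlap is a crude signal. The point of a learned embeddings model is that related text scores highly even when it shares few exact words with the query.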

The RAG pipeline pairs a similarity instance (embeddings or similarity pipeline), which builds a question context, with a model that answers questions. The model can be a prompt-driven large language model (LLM), an extractive question-answering model or a custom pipeline.

Let's run a basic example.

from txtai import Embeddings, RAG

# Create embeddings model with content support
embeddings = Embeddings({"path": "sentence-transformers/all-MiniLM-L6-v2", "content": True})

# Create RAG instance
rag = RAG(embeddings, "google/flan-t5-base")

data = ["Giants hit 3 HRs to down Dodgers",
        "Giants 5 Dodgers 4 final",
        "Dodgers drop Game 2 against the Giants, 5-4",
        "Blue Jays beat Red Sox final score 2-1",
        "Red Sox lost to the Blue Jays, 2-1",
        "Blue Jays at Red Sox is over. Score: 2-1",
        "Phillies win over the Braves, 5-0",
        "Phillies 5 Braves 0 final",
        "Final: Braves lose to the Phillies in the series opener, 5-0",
        "Lightning goaltender pulled, lose to Flyers 4-1",
        "Flyers 4 Lightning 1 final",
        "Flyers win 4-1"]

def prompt(question):
  return f"""
    Answer the following question using the context below.
    Question: {question}
    Context:
  """

questions = ["What team won the game?", "What was score?"]

execute = lambda query: rag([(question, query, prompt(question), False) for question in questions], data)

for query in ["Red Sox - Blue Jays", "Phillies - Braves", "Dodgers - Giants", "Flyers - Lightning"]:
    print("----", query, "----")
    for answer in execute(query):
        print(answer)
    print()

---- Red Sox - Blue Jays ----
('What team won the game?', 'Blue Jays')
('What was score?', '2-1')

---- Phillies - Braves ----
('What team won the game?', 'Phillies')
('What was score?', '5-0')

---- Dodgers - Giants ----
('What team won the game?', 'Giants')
('What was score?', '5-4')

---- Flyers - Lightning ----
('What team won the game?', 'Flyers')
('What was score?', '4-1')

This code runs a series of questions. First, it runs an embeddings filtering query to find the most relevant text. For example, "Red Sox - Blue Jays" finds text related to those teams. Then each entry in the questions list (which team won and what the score was) is asked against that text.
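
The tuples passed to the pipeline can be inspected on their own. The sketch below only builds the queue and prints it; nothing is sent to a model. The field names used in the loop (name, query, question, snippet) are an assumption based on how the tuples are used in this example, so check the txtai documentation for the authoritative definition.

```python
def prompt(question):
    return f"""
    Answer the following question using the context below.
    Question: {question}
    Context:
    """

questions = ["What team won the game?", "What was score?"]
query = "Red Sox - Blue Jays"

# One tuple per question; every tuple shares the same embeddings query
queue = [(question, query, prompt(question), False) for question in questions]

# Assumed field names, for illustration only
for name, search_query, question, snippet in queue:
    print("name:    ", name)
    print("query:   ", search_query)
    print("question:", question.strip().splitlines()[1].strip())
    print("snippet: ", snippet)
    print()
```

The first element comes back as the label on each answer, which is why the printed results pair each question with its answer.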

This is the same logic found in Notebook 5 - Extractive QA with txtai, but it uses prompt-based QA instead of extractive QA.

Embeddings-guided and Prompt-driven Search

Now for the fun stuff. Let's build an embeddings index for the ag_news dataset (a set of news stories from the mid-2000s). Then we'll use prompts to ask questions with embeddings results as the context.

from datasets import load_dataset

dataset = load_dataset("ag_news", split="train")

# List of all text elements
texts = dataset["text"]

# Create an embeddings index over the dataset
embeddings = Embeddings({"path": "sentence-transformers/all-MiniLM-L6-v2", "content": True})
embeddings.index((x, text, None) for x, text in enumerate(texts))

# Create RAG instance
rag = RAG(embeddings, "google/flan-t5-large")

def prompt(question):
  return f"""Answer the following question using only the context below. Say 'no answer' when the question can't be answered.
Question: {question}
Context: """

def search(query, question=None):
  # Default question to query if empty
  if not question:
    question = query

  return rag([("answer", query, prompt(question), False)])[0][1]

question = "Who won the 2004 presidential election?"
answer = search(question)
print(question, answer)

nquestion = "Who did the candidate beat?"
print(nquestion, search(f"{question} {answer}. {nquestion}"))

Who won the 2004 presidential election? George W. Bush
Who did the candidate beat? John F. Kerry

And there are the answers. Let's unpack how this works.

The first thing the RAG pipeline does is run an embeddings search to find the most relevant text within the index. A context string is then built using those search results.

After that, a prompt is generated and run, and the answer is printed. Let's see what a full prompt looks like.

text = prompt(question)
text += "\n" + "\n".join(x["text"] for x in embeddings.search(question))
print(text)

Answer the following question using only the context below. Say 'no answer' when the question can't be answered.
Question: Who won the 2004 presidential election?
Context: 
Right- and left-click politics The 2004 presidential race ended last week in a stunning defeat for Massachusetts Senator John F. Kerry, as incumbent President George W. Bush cruised to an easy victory.
2004 Presidential Endorsements (AP) AP - Newspaper endorsements in the 2004 presidential campaign between President Bush, a Republican, and Sen. John Kerry, a Democrat.
Presidential Campaign to Nov. 2, 2004 (Reuters) Reuters - The following diary of events\leading up to the presidential election on Nov. 2.

The prompt has the information needed to determine the answers to the questions.

Additional examples

Before moving on, a couple more example questions.

question = "Who won the World Series in 2004?"
answer = search(question)
print(question, answer)

nquestion = "Who did they beat?"
print(nquestion, search(f"{question} {answer}. {nquestion}"))

Who won the World Series in 2004? Boston
Who did they beat? St Louis

search("Tell me something interesting?")

herrings communicate by farting

Whhaaaattt??? Is this a model hallucination?

Let's run an embeddings query and see if that text is in the results.

answer = "herrings communicate by farting"
for x in embeddings.search("Tell me something interesting?"):
  if answer in x["text"]:
    start = x["text"].find(answer)
    print(x["text"][start:start + len(answer)])

herrings communicate by farting

Sure enough it is. It appears that the FLAN-T5 model has a bit of an immature sense of humor 😃
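
The same check can be wrapped in a small helper. This is a sketch under the assumption that search results are dictionaries with a "text" key, as in the loop above. The `grounded` function name and the sample results are hypothetical and are not taken from the dataset.

```python
def grounded(answer, results):
    # True when the answer appears verbatim (ignoring case) in any retrieved text
    needle = answer.strip().lower()
    if not needle:
        return False
    return any(needle in x["text"].lower() for x in results)

# Made-up results standing in for embeddings.search(...) output
results = [
    {"text": "Scientists report that herrings communicate by farting"},
    {"text": "Stocks close higher on earnings"},
]

print(grounded("herrings communicate by farting", results))  # prints: True
print(grounded("herrings sing opera", results))              # prints: False
```

A verbatim substring match only catches answers copied from the context. Paraphrased answers would need a looser comparison, such as a similarity score.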

External API Integration

In addition to support for Hugging Face models, the RAG pipeline also supports custom question-answer models. This could be a call to the OpenAI API (GPT-3), the Cohere API or the Hugging Face API, or a library such as langchain that manages those calls. All that is needed is a callable object or a function!

Let's see an example that uses the Hugging Face API to answer questions. We'll use the original sports dataset to demonstrate.

import requests

data = ["Giants hit 3 HRs to down Dodgers",
        "Giants 5 Dodgers 4 final",
        "Dodgers drop Game 2 against the Giants, 5-4",
        "Blue Jays beat Red Sox final score 2-1",
        "Red Sox lost to the Blue Jays, 2-1",
        "Blue Jays at Red Sox is over. Score: 2-1",
        "Phillies win over the Braves, 5-0",
        "Phillies 5 Braves 0 final",
        "Final: Braves lose to the Phillies in the series opener, 5-0",
        "Lightning goaltender pulled, lose to Flyers 4-1",
        "Flyers 4 Lightning 1 final",
        "Flyers win 4-1"]

def prompt(question):
  return f"""
    Answer the following question using the context below.
    Question: {question}
    Context:
  """

# Submits a series of prompts to the Hugging Face API.
# This call can easily be switched to use the OpenAI API (GPT-3), Cohere API or a library like langchain.
def api(prompts):
  response = requests.post("https://api-inference.huggingface.co/models/google/flan-t5-base",
                           json={"inputs": prompts})

  return [x["generated_text"] for x in response.json()]

# Create embeddings model with content support
embeddings = Embeddings({"path": "sentence-transformers/all-MiniLM-L6-v2", "content": True})

# Create RAG instance, submit prompts to the Hugging Face inference API
rag = RAG(embeddings, api)

questions = ["What team won the game?", "What was score?"]

execute = lambda query: rag([(question, query, prompt(question), False) for question in questions], data)

for query in ["Red Sox - Blue Jays", "Phillies - Braves", "Dodgers - Giants", "Flyers - Lightning"]:
    print("----", query, "----")
    for answer in execute(query):
        print(answer)
    print()

---- Red Sox - Blue Jays ----
('What team won the game?', 'Blue Jays')
('What was score?', '2-1')

---- Phillies - Braves ----
('What team won the game?', 'Phillies')
('What was score?', '5-0')

---- Dodgers - Giants ----
('What team won the game?', 'Giants')
('What was score?', '5-4')

---- Flyers - Lightning ----
('What team won the game?', 'Flyers')
('What was score?', '4-1')

Everything matches the first example above in Create Embeddings and RAG instances, except that the prompts are run through an external API call.
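
Because only the call contract matters (a list of prompts in, a list of answer strings out, as in the api function above), a class works as well as a function. The sketch below is a hypothetical `CannedModel` that returns fixed answers without any network call, which can be handy for testing the rest of a pipeline. It is an illustration and not part of txtai.

```python
class CannedModel:
    """Hypothetical stand-in for an external LLM API."""

    def __init__(self, answers, default="no answer"):
        # answers maps a keyword to the response returned when a prompt contains it
        self.answers = answers
        self.default = default
        self.calls = []

    def __call__(self, prompts):
        # Same contract as the api function: list of prompts in, list of strings out
        self.calls.extend(prompts)
        return [self.lookup(prompt) for prompt in prompts]

    def lookup(self, prompt):
        # Return the answer for the first keyword found in the prompt
        for keyword, answer in self.answers.items():
            if keyword.lower() in prompt.lower():
                return answer
        return self.default

model = CannedModel({"won": "Blue Jays", "score": "2-1"})
prompts = ["Question: What team won the game?",
           "Question: What was score?",
           "Question: Who pitched?"]

print(model(prompts))  # prints: ['Blue Jays', '2-1', 'no answer']
```

An instance like this could be passed wherever the api function is passed, for example RAG(embeddings, model), assuming the pipeline accepts any callable as described above.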

The Embeddings instance can also swap out the vectorization, database and vector store components with external API services. Check out the txtai documentation for more information.

Wrapping up

This notebook covered how to run embeddings-guided and prompt-driven search with LLMs. This functionality is a major step forward towards Generative Semantic Search for txtai. More to come, stay tuned!
