In recent years, chatbots have become an integral part of various industries, from customer support to personal assistants. They enable automated interactions with users, providing timely responses and assistance. With the advent of Large Language Models (LLMs) like GPT-3 and GPT-4, building chatbots that can understand and generate human-like text has become more accessible and powerful. In this blog post, we will explore the steps to build a chatbot using an LLM with your data.
Large Language Models are deep learning models trained on massive amounts of text data, enabling them to understand and generate human-like text. They have the ability to complete sentences, generate paragraphs, and even hold coherent conversations with users.
But they are not trained on your data, your documents, your information.
A commonly used solution to this problem is RAG.
Retrieval Augmented Generation (RAG) is an exciting and innovative approach in the field of natural language processing (NLP) that combines the power of both retrieval-based and generative models.
RAG can be used for various tasks like question-answering, text generation, and more. Let's delve into the details, steps, and relevant concepts of RAG.
What is RAG?
RAG is a framework that combines two key components:
Retrieval Component: This part focuses on retrieving relevant information from a large corpus of text. It uses information retrieval techniques to find the most relevant documents or passages related to a given query.
Generation Component: Once the relevant documents or passages are retrieved, the generation component takes over. It uses generative models, like GPT-3 or GPT-4, to generate human-like text based on the retrieved information.
This simple approach can produce strong results, because the model answers from your own documents rather than relying only on its training data.
For the retrieval component to work, you need the documents you want the LLM to answer from.
If you have a very small document, you can send the complete document in the API call to the LLM. But if your document is large, or you have multiple documents, you have to select the most relevant documents to send to the LLM.
The basic steps needed to select the best documents are:
Load documents
Split documents
Create embeddings for documents (using a Text Embedding Model)
Store documents and embeddings in a vectorstore
Query the vectorstore and select the best matching documents.
Now, your documents can be present in PDF files, Word documents, CSV files, or databases, so you might need different types of document loaders; frameworks like LangChain can help here.
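As a minimal sketch, loading a PDF and a CSV with LangChain might look like this (the file names report.pdf and data.csv are placeholders for your own files, and older LangChain versions expose the same loaders under langchain.document_loaders instead):

```python
# A minimal sketch of loading documents with LangChain.
# "report.pdf" and "data.csv" are placeholder file names for your own data.
from langchain_community.document_loaders import CSVLoader, PyPDFLoader

pdf_docs = PyPDFLoader("report.pdf").load()        # one Document per PDF page
csv_docs = CSVLoader(file_path="data.csv").load()  # one Document per CSV row

documents = pdf_docs + csv_docs
print(f"loaded {len(documents)} documents")
```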
Even a single document file can be large, so we need a way to split the file into multiple parts. We can use approaches like the following (a small splitting sketch comes after this list):
splitting the documents based on page numbers
splitting on word count (each split containing 20 words)
creating a split of each line
if your data is in CSV, creating a split for each row
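Here is a minimal sketch in plain Python of the word-count approach from the list above; the chunk size of 20 words is just the example figure mentioned there:

```python
def split_by_word_count(text: str, words_per_chunk: int = 20) -> list[str]:
    """Split text into chunks containing a fixed number of words."""
    words = text.split()
    return [
        " ".join(words[i:i + words_per_chunk])
        for i in range(0, len(words), words_per_chunk)
    ]

# Each loaded document's text gets split into small, retrievable chunks.
chunks = split_by_word_count("your long document text goes here ...")
```

LangChain also ships text splitters (for example CharacterTextSplitter) that implement the same idea with chunk overlap and separator handling, if you prefer not to write your own.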
Now that you have loaded and split the documents, you need to compute their embeddings and store them somewhere, so that you can retrieve the most similar documents easily.
What are embeddings?
An embedding is a vector (list) of floating point numbers.
Text embeddings measure the relatedness of text strings. Embeddings are commonly used for:
Search (where results are ranked by relevance to a query string)
Clustering (where text strings are grouped by similarity)
Recommendations (where items with related text strings are recommended)
Anomaly detection (where outliers with little relatedness are identified)
Diversity measurement (where similarity distributions are analyzed)
Classification (where text strings are classified by their most similar label)
The distance between two vectors measures their relatedness. Small distances suggest high relatedness, and large distances suggest low relatedness.
One common way to measure this is the cosine of the angle between the two vectors (cosine similarity).
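For example, cosine similarity between two embedding vectors can be computed like this (a small sketch with made-up toy vectors):

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: values near 1.0 mean highly related."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Toy 3-dimensional vectors purely for illustration; real text embeddings
# typically have hundreds or thousands of dimensions.
print(cosine_similarity([0.1, 0.2, 0.7], [0.1, 0.25, 0.65]))
```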
To calculate the embeddings, you can use different models. A few options:
SBERT text embeddings
SBERT is an open-source model that you can deploy on your own and use without paying any cost per API call.
However, this model truncates the input sequence length to 256 tokens.
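A minimal sketch using the sentence-transformers library (assuming the all-MiniLM-L6-v2 model, which is one common choice, not the only option):

```python
from sentence_transformers import SentenceTransformer

# Load an open-source SBERT model locally (no per-call API cost).
model = SentenceTransformer("all-MiniLM-L6-v2")

# Encode each chunk into a fixed-size vector; text beyond the model's
# sequence limit (256 tokens for this model) is truncated.
embeddings = model.encode(["first document chunk", "second document chunk"])
print(embeddings.shape)  # (2, 384) for this model
```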
Now you have the embeddings of the input documents, and you have to store them somewhere. A few options include:
Keeping embedding in memory
Using a vector database
Vector database options include:
Chroma, an open-source embeddings store
Milvus, a vector database built for scalable similarity search
Pinecone, a fully managed vector database
Qdrant, a vector search engine
Redis as a vector database
Typesense, fast open source vector search
Weaviate, an open-source vector search engine
Zilliz, data infrastructure, powered by Milvus
If the number of embeddings is small, you can also keep them in memory.
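As one example, here is a minimal sketch using Chroma's in-memory client (the collection name and texts are placeholders, and model and embeddings are assumed to come from the SBERT sketch above; the other stores listed have similar add/query APIs):

```python
import chromadb

# In-memory Chroma client; use a persistent client to keep data across runs.
chroma_client = chromadb.Client()
collection = chroma_client.create_collection("my_documents")

# Store the chunks together with the embeddings computed earlier.
collection.add(
    ids=["doc-0", "doc-1"],
    documents=["first document chunk", "second document chunk"],
    embeddings=embeddings.tolist(),
)

# At query time: embed the user question and fetch the closest chunks.
query_embedding = model.encode("what does the report say?").tolist()
results = collection.query(query_embeddings=[query_embedding], n_results=2)
print(results["documents"])
```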
When you receive user input, you calculate its embedding and search the vector store for the best-matching documents. You then send these documents as context, along with the user query, to the LLM in a prompt.
A simple prompt can be:
You are an AI assistant. Follow the user's requirements carefully, and answer the question from the following context information only. If the user query cannot be answered from the context information, say "Sorry" and end the chat.
Context Information: <Best matching documents>
User Query: <user question>
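Putting it together, a minimal sketch of the final call using the OpenAI Python client (the model name, variable names, and placeholder strings are illustrative; in practice the context and question come from the retrieval step above):

```python
from openai import OpenAI

openai_client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Placeholders for illustration: in practice these come from the retrieval step.
user_question = "what does the report say about X?"
best_matching_documents = "first retrieved chunk ...\n\nsecond retrieved chunk ..."

system_prompt = (
    "You are an AI assistant. Follow the user's requirements carefully, and "
    "answer the question from the following context information only. If the "
    'user query cannot be answered from the context information, say "Sorry" '
    "and end the chat.\n\n"
    f"Context Information: {best_matching_documents}"
)

response = openai_client.chat.completions.create(
    model="gpt-4",
    messages=[
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": user_question},
    ],
)
print(response.choices[0].message.content)
```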
Complete Diagram of the flow:
If you liked this blog, you can follow me on Twitter and learn something new with me.