We recorded an instructional video for this demo: 📺 bilibili.
What is RAG?
Imagine you’re taking an open-book exam. You are the large language model: capable of understanding the questions and writing the answers.
However, you can’t possibly remember every piece of knowledge.
Now, you’re allowed to bring a reference book — that’s retrieval.
You look up relevant sections in the book, combine them with your own reasoning, and write an answer that is both accurate and well-grounded.
This process is RAG (Retrieval-Augmented Generation): a framework that lets a large language model (LLM) first retrieve relevant documents or knowledge before generating an answer. The model uses the retrieved information as context to improve the quality, factuality, and reliability of its output.
Workflow
- Retrieval Stage — Retrieve the most relevant content from a document collection (e.g., knowledge base, web pages) based on the user’s query.
- Generation Stage — Feed the retrieved content, together with the original query, into the LLM to generate the final answer grounded in that context.

Why RAG?
- Improves factual accuracy and reduces hallucinations
- Keeps responses up-to-date without retraining the model
- Increases interpretability and trustworthiness
Corpus Encoding and Indexing
Before using RAG, you must first encode your corpus (convert text into vector representations) and build an index. This enables the system to efficiently search through large-scale corpora and retrieve relevant content at query time.
- Embedding — Converts natural language text into numerical vectors so that semantic similarity can be computed mathematically.
- Indexing — Organizes the vectors (e.g., using FAISS) so that the system can quickly retrieve the most relevant documents among millions.

Example Corpus (Wiki Text)
id represents the document identifier and contents contains the text content. We will later encode the contents and build an index for retrieval.
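To make the layout concrete, here is a hypothetical two-document corpus written and read as JSON Lines (one object per line with id and contents fields; the texts are illustrative):

```python
import json

# Hypothetical wiki-style corpus entries
docs = [
    {"id": "wiki_0", "contents": "Paris is the capital and largest city of France."},
    {"id": "wiki_1", "contents": "The Eiffel Tower is a landmark in Paris."},
]

# One JSON object per line, as the encoder will later consume it
with open("corpus.jsonl", "w", encoding="utf-8") as f:
    for doc in docs:
        f.write(json.dumps(doc) + "\n")

# Reading it back line by line
with open("corpus.jsonl", encoding="utf-8") as f:
    for line in f:
        doc = json.loads(line)
        print(doc["id"], doc["contents"])
```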
Writing the Encoding & Indexing Pipeline
Build the Pipeline File
Modify the Parameter File
Run the Pipeline File
We recommend running them in the background using
screen or nohup, for example:
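A minimal sketch of the two options (the pipeline filename below is hypothetical):

```shell
# Option 1: nohup keeps the job running after you log out;
# stdout/stderr go to a log file and & sends it to the background
nohup python index_pipeline.py > index.log 2>&1 &

# Option 2: run inside a named screen session you can detach from
# (detach with Ctrl-A D, reattach with `screen -r rag_index`)
# screen -S rag_index
# python index_pipeline.py
```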
Building the RAG Pipeline
Once the corpus index is ready, the next step is to combine the retriever and LLM into a complete RAG workflow. This allows the system to retrieve relevant documents for a query and then generate the final answer using the model.
Retrieval Process

Generation Process
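A minimal sketch of the generation step: the retrieved passages are assembled into a prompt, which is then sent to the LLM. The template wording here is illustrative, not the pipeline's actual prompt.

```python
def build_prompt(question: str, passages: list[str]) -> str:
    """Assemble a RAG prompt: numbered retrieved passages, then the question."""
    context = "\n".join(f"[{i + 1}] {p}" for i, p in enumerate(passages))
    return (
        "Answer the question based on the given passages.\n\n"
        f"Passages:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )

prompt = build_prompt(
    "What is the capital of France?",
    ["Paris is the capital of France.", "The Eiffel Tower is in Paris."],
)
print(prompt)
# In the real pipeline this prompt is sent to the running LLM service,
# and the generated text becomes the final answer.
```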

Data Format (Example: NQ Dataset)
Each sample contains a question (question), reference answers (golden_answers), and metadata (meta_data), which serve as the model input and the evaluation reference.
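A hypothetical sample in this format (field names follow the description above; the content is illustrative):

```python
import json

# One NQ-style sample: question, reference answers, and metadata
sample = {
    "question": "what is the capital of france",
    "golden_answers": ["Paris"],
    "meta_data": {},
}
print(json.dumps(sample, indent=2))
```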
Writing the RAG Pipeline
- Load data
- Initialize retriever and perform search
- Start the LLM service
- Construct the prompt
- Generate the answer
- Extract the final result
- Evaluate the performance
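The steps above can be sketched end to end. The retriever and generator here are stand-in stubs so the skeleton is runnable; in the real pipeline they would be the FAISS-backed retriever and the served LLM, and the answer extraction would be dataset-specific.

```python
def retrieve(question: str, k: int = 2) -> list[str]:
    # Stub: a real retriever encodes the question and searches the index.
    corpus = ["Paris is the capital of France.", "The Eiffel Tower is in Paris."]
    return corpus[:k]

def generate(prompt: str) -> str:
    # Stub: a real pipeline sends the prompt to the running LLM service.
    return "The answer is Paris."

def extract_answer(output: str) -> str:
    # Naive extraction (last word); real pipelines use task-specific parsing.
    return output.rstrip(".").split()[-1]

def evaluate(pred: str, golden_answers: list[str]) -> bool:
    # Exact match against any reference answer, case-insensitive.
    return pred.lower() in (a.lower() for a in golden_answers)

# 1. Load data (a single hypothetical sample)
sample = {"question": "What is the capital of France?", "golden_answers": ["Paris"]}
# 2. Retrieve relevant passages
passages = retrieve(sample["question"])
# 3-4. (LLM service assumed running) construct the prompt
prompt = (
    "Answer using the passages.\n"
    + "\n".join(passages)
    + f"\nQuestion: {sample['question']}\nAnswer:"
)
# 5. Generate the answer
output = generate(prompt)
# 6. Extract the final result
pred = extract_answer(output)
# 7. Evaluate the performance
print(pred, evaluate(pred, sample["golden_answers"]))
```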