This section walks you step by step through implementing the most basic RAG workflow: Vanilla RAG. You will learn how to use UltraRAG to build an inference system covering the complete process from data loading and retrieval to generation and evaluation.

Step 1: Define Workflow Structure

The most basic process of Vanilla RAG is as follows:
Data Loading → Document Retrieval → Model Generation → Answer Evaluation
To ensure the retrieval step works properly, please first complete the encoding and index construction of the corpus. For details, see the tutorial: [Encoding and Indexing Large-scale Corpora with UltraRAG]

Step 2: Implement Necessary Tools

In this process, we want the model’s final answer to be wrapped in \boxed{} for subsequent automatic extraction. Therefore, we need to:
  • Customize a Prompt Tool: construct the question and retrieved content into a standardized generation input;
  • Customize an Extraction Tool: extract the answer text inside \boxed{};
  • Other modules (retrieval, generation, evaluation) can reuse existing UltraRAG components.

Step 2.1: Build Prompt

First, prepare the prompt template file prompt/qa_boxed.jinja:
prompt/qa_boxed.jinja
Please answer the following question based on the given documents.
Think step by step.
Provide your final answer in the format \boxed{YOUR_ANSWER}.

Documents:
{{documents}}

Question: {{question}}
Then, implement the following Tool in the Prompt Server:
servers/prompt/src/prompt.py
# prompt for QA RAG boxed
@app.prompt(output="q_ls,ret_psg,template->prompt_ls")
def qa_rag_boxed(
    q_ls: List[str], ret_psg: List[List[str]], template: str | Path
) -> list[PromptMessage]:
    template: Template = load_prompt_template(template)
    ret = []
    for q, psg in zip(q_ls, ret_psg):
        passage_text = "\n".join(psg)
        p = template.render(question=q, documents=passage_text)
        ret.append(p)
    return ret
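Outside the Prompt Server, the rendering step performed by this tool can be sketched with plain Jinja2. Note that load_prompt_template and PromptMessage are UltraRAG internals; this standalone sketch inlines the template text and assumes only the documents and question variables that the tool passes to template.render:

```python
from jinja2 import Template

# Template text mirroring prompt/qa_boxed.jinja; single braces are
# literal in Jinja2, so \boxed{YOUR_ANSWER} renders unchanged.
TEMPLATE_TEXT = """Please answer the following question based on the given documents.
Think step by step.
Provide your final answer in the format \\boxed{YOUR_ANSWER}.

Documents:
{{documents}}

Question: {{question}}"""


def render_qa_prompt(question: str, passages: list[str]) -> str:
    """Render one QA prompt; retrieved passages are joined with newlines."""
    template = Template(TEMPLATE_TEXT)
    return template.render(question=question, documents="\n".join(passages))


prompt = render_qa_prompt(
    "Who wrote Hamlet?",
    ["Hamlet is a tragedy written by William Shakespeare."],
)
print(prompt)
```

The tool above does the same thing per question, batching over q_ls and ret_psg in parallel with zip.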

Step 2.2: Extract Answer

To extract the answer wrapped in \boxed{} from the model output, implement the following Tool in the Custom Server:
servers/custom/src/custom.py
@app.tool(output="ans_ls->pred_ls")
def output_extract_from_boxed(ans_ls: List[str]) -> Dict[str, List[str]]:
    def extract(ans: str) -> str:
        start = ans.rfind(r"\boxed{")
        if start == -1:
            content = ans.strip()
        else:
            i = start + len(r"\boxed{")
            brace_level = 1
            end = i
            while end < len(ans) and brace_level > 0:
                if ans[end] == "{":
                    brace_level += 1
                elif ans[end] == "}":
                    brace_level -= 1
                end += 1
            content = ans[i : end - 1].strip()
            content = re.sub(r"^\$+|\$+$", "", content).strip()
            content = re.sub(r"^\\\(|\\\)$", "", content).strip()
            if content.startswith(r"\text{") and content.endswith("}"):
                content = content[len(r"\text{") : -1].strip()
            content = content.strip("()").strip()
        # Replace remaining backslashes with spaces and collapse double spaces
        content = content.replace("\\", " ")
        content = content.replace("  ", " ")
        return content

    return {"pred_ls": [extract(ans) for ans in ans_ls]}
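To sanity-check the extraction logic, the core parsing can be exercised on its own. This sketch duplicates the brace-balanced scan from the inner extract function above (minus some of the LaTeX cleanup) so it runs without the Custom Server:

```python
import re


def extract_boxed(ans: str) -> str:
    """Return the content of the last \\boxed{...} in ans, tracking brace depth."""
    start = ans.rfind(r"\boxed{")
    if start == -1:
        return ans.strip()  # no \boxed{} found: fall back to the whole output
    i = start + len(r"\boxed{")
    brace_level = 1
    end = i
    # Walk forward until the opening brace of \boxed{ is balanced.
    while end < len(ans) and brace_level > 0:
        if ans[end] == "{":
            brace_level += 1
        elif ans[end] == "}":
            brace_level -= 1
        end += 1
    content = ans[i : end - 1].strip()
    content = re.sub(r"^\$+|\$+$", "", content).strip()  # strip surrounding $
    if content.startswith(r"\text{") and content.endswith("}"):
        content = content[len(r"\text{") : -1].strip()  # unwrap \text{...}
    return content


print(extract_boxed(r"Step by step ... so the answer is \boxed{42}."))  # → 42
print(extract_boxed(r"\boxed{\text{Paris}}"))                           # → Paris
```

Tracking brace depth (rather than matching up to the first `}`) is what lets nested expressions such as `\boxed{\text{Paris}}` extract cleanly.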

Step 3: Write Pipeline Configuration File

After completing the above code development, create a new configuration file in the examples/ directory: vanilla_rag.yaml.
examples/vanilla_rag.yaml
# Vanilla RAG

# MCP Server
servers:
  benchmark: servers/benchmark
  retriever: servers/retriever
  prompt: servers/prompt
  generation: servers/generation
  evaluation: servers/evaluation
  custom: servers/custom

# MCP Client Pipeline
pipeline:
- benchmark.get_data
# If you don't have a deployed retriever service, replace the retriever_deploy_search step below with these two steps:
# - retriever.retriever_init      
# - retriever.retriever_search
- retriever.retriever_deploy_search
- prompt.qa_rag_boxed
- generation.generate
- custom.output_extract_from_boxed
- evaluation.evaluate

Step 4: Configure Pipeline Parameters

Run the following command:
ultrarag build examples/vanilla_rag.yaml
Open the generated examples/parameter/vanilla_rag_parameter.yaml and modify the configuration as follows:
examples/parameter/vanilla_rag_parameter.yaml
benchmark:
  benchmark:
    key_map:
      gt_ls: golden_answers
      q_ls: question
    limit: 2
    name: asqa
    path: data/sample_asqa_5.jsonl
custom: {}
evaluation:
  metrics:
  - acc
  - f1
  - em
  - coverem
  - stringem
  - rouge-1
  - rouge-2
  - rouge-l
  save_path: output/asqa.json
generation:
  base_url: http://localhost:8000/v1
  model_name: openbmb/MiniCPM4-8B
  sampling_params:
    extra_body:
      chat_template_kwargs:
        enable_thinking: false
      include_stop_str_in_output: true
      top_k: 20
    max_tokens: 2048
    temperature: 0.7
    top_p: 0.8
prompt:
  template: prompt/qa_boxed.jinja
retriever:
  query_instruction: 'Query: '
  retriever_url: http://localhost:8080
  top_k: 5
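As a quick sanity check that edits to the parameter file parse as intended, it can be loaded with PyYAML before running the pipeline. The snippet below inlines a fragment of the configuration above for illustration; for a real check you would pass the file path instead:

```python
import yaml  # PyYAML

# A fragment of the parameter file above, inlined for the sketch.
config_text = """
generation:
  base_url: http://localhost:8000/v1
  model_name: openbmb/MiniCPM4-8B
  sampling_params:
    max_tokens: 2048
    temperature: 0.7
retriever:
  retriever_url: http://localhost:8080
  top_k: 5
"""

config = yaml.safe_load(config_text)
print(config["generation"]["model_name"])  # openbmb/MiniCPM4-8B
print(config["retriever"]["top_k"])        # 5
```

Using yaml.safe_load avoids executing arbitrary tags, which is the appropriate choice for configuration files.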

Step 5: Run Your Inference Pipeline!

Once everything is ready, run the following command to start the inference pipeline:
ultrarag run examples/vanilla_rag.yaml

Evaluation Results

The model prediction results will be automatically evaluated and saved to the save_path configured in the Evaluation Server. For example:
output/asqa.json

Run Logs

Detailed logs during the inference process will be saved in the logs/ directory, with filenames generated based on the run time for easy retrieval and reproduction. For example:
logs/20250804_193900.log

Intermediate Result Files

All intermediate results produced during pipeline execution (including the input and output of each step) are recorded as memory files, saved by default in the output/ directory, for example:
output/memory_asqa_vanilla_rag_20250804_193900.json