This section guides you through implementing Search-o1, a RAG reasoning workflow with branching decisions.
Paper: https://arxiv.org/pdf/2501.05366
The core idea of Search-o1 is to let the LLM decide autonomously, during reasoning, when it lacks knowledge and proactively generate search queries that call an external retrieval module for supplementary documents. A Reason-in-Documents module then analyzes and refines the long retrieval results, extracts the useful information, and injects it back into subsequent reasoning to reduce noise.

Step 1: Define Workflow Structure

First, let’s review the Search-o1 algorithm flowchart. In UltraRAG, we structure this algorithm into the following modules: the tools in the prompt, router, and custom modules need to be implemented by you, while the other components (e.g., generation, retrieval, evaluation) can reuse UltraRAG’s standard Servers.

Step 2: Implement Necessary Tools

Step 2.1: Implement Prompt Server

In the Search-o1 workflow, the Prompt Server mainly undertakes two tasks:
  1. Build the initial prompt (with search capability instructions)
  2. Orchestrate the alternating process of search → reasoning, including inserting search results and performing document analysis
Below are the three tool functions to implement and their corresponding templates.

search_o1_init: Initialize the reasoning template

This tool constructs the initial model prompt, guiding the model to understand that it has a “search tool” and how to use it. First, define the template in prompt/search_o1_reasoning.jinja:
prompt/search_o1_reasoning.jinja
You are a reasoning assistant with the ability to perform web searches to help you answer the user’s question
accurately. You have special tools:
To perform a search: write <|begin_search_query|> your query here <|end_search_query|>.
Then, the system will search and analyze relevant web pages, then provide you with helpful information in the
format <|begin_search_result|> ...search results... <|end_search_result|>.
You can repeat the search process multiple times if necessary. The maximum number of search attempts is
limited to {{MAX_SEARCH_LIMIT}}.
Once you have all the information you need, continue your reasoning.
Example:
Question: “...”
Assistant thinking steps:
- I might need to look up details about ...
Assistant:
<|begin_search_query|>...<|end_search_query|>
(System returns processed information from relevant web pages)
Assistant continues reasoning with the new information...
Remember:
- Use <|begin_search_query|> to request a web search and end with <|end_search_query|>.
- When done searching, continue your reasoning.
Please answer the following question. You should think step by step to solve it.

Provide your final answer in the format \boxed{YOUR_ANSWER}.

Question:
{{question}}
Corresponding tool function:
servers/prompt/src/prompt.py
@app.prompt(output="q_ls, template -> prompt_ls")
def search_o1_init(
    q_ls: List[str],
    template: str | Path,
) -> List[PromptMessage]:
    template: Template = load_prompt_template(template)
    # This value is currently fixed
    MAX_SEARCH_LIMIT = 10
    ret = []
    for q in q_ls:
        p = template.render(question=q, MAX_SEARCH_LIMIT=MAX_SEARCH_LIMIT)
        ret.append(p)
    return ret
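For a quick sanity check outside UltraRAG, the per-question rendering loop can be sketched with plain string substitution standing in for the Jinja template (the shortened template text below is illustrative, not the full file above):

```python
# Minimal stand-in for search_o1_init: render a simplified reasoning
# template for each question. str.format replaces Jinja here just to
# illustrate the data flow; the real tool renders the full .jinja file.
TEMPLATE = (
    "You are a reasoning assistant with the ability to perform web searches.\n"
    "The maximum number of search attempts is limited to {MAX_SEARCH_LIMIT}.\n"
    "Question:\n{question}\n"
)

def search_o1_init_sketch(q_ls, max_search_limit=10):
    return [
        TEMPLATE.format(question=q, MAX_SEARCH_LIMIT=max_search_limit)
        for q in q_ls
    ]

prompts = search_o1_init_sketch(["Who wrote Dune?"])
print("limited to 10" in prompts[0])  # True
```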
searcho1_reasoning_indocument: Reason over retrieved documents

This tool injects search results into the model input so the model can analyze the retrieved results and continue reasoning. Create the template prompt/search_o1_refinement.jinja:
prompt/search_o1_refinement.jinja
**Task Instruction:**

You are tasked with reading and analyzing web pages based on the following inputs: **Previous Reasoning Steps**, **Current Search Query**, and **Searched Web Pages**. Your objective is to extract relevant and helpful information for **Current Search Query** from the **Searched Web Pages** and seamlessly integrate this information into the **Previous Reasoning Steps** to continue reasoning for the original question.

**Guidelines:**

1. **Analyze the Searched Web Pages:**
- Carefully review the content of each searched web page.
- Identify factual information that is relevant to the **Current Search Query** and can aid in the reasoning process for the original question.

2. **Extract Relevant Information:**
- Select the information from the Searched Web Pages that directly contributes to advancing the **Previous Reasoning Steps**.
- Ensure that the extracted information is accurate and relevant.

3. **Output Format:**
- **If the web pages provide helpful information for current search query:** Present the information beginning with `**Final Information**` as shown below.
**Final Information**

[Helpful information]

- **If the web pages do not provide any helpful information for current search query:** Output the following text.

**Final Information**

No helpful information found.

**Inputs:**
- **Previous Reasoning Steps:**  
{{prev_reasoning}}

- **Current Search Query:**  
{{search_query}}

- **Searched Web Pages:**  
{{document}}

Now you should analyze each web page and find helpful information based on the current search query "{{search_query}}" and previous reasoning steps.
The corresponding tool function implementation:
servers/prompt/src/prompt.py
@app.prompt(
    output="prompt_ls,extract_query_list,ret_psg,reasoning_indoc_template->prompt_ls"
)
def searcho1_reasoning_indocument(
    prompt_ls: List[PromptMessage],
    extract_query_list: List[str],
    ret_psg: List[str | Any],
    template: str | Path,
) -> List[PromptMessage]:
    template: Template = load_prompt_template(template)
    ret = []
    for prompt, squery, psg in zip(prompt_ls, extract_query_list, ret_psg):
        # passages = [psg[index]["segment"] for index in range(min(5, len(psg)))]
        passages = psg[:3]
        passage_text = "\n".join(passages)
        _pro = prompt.content.text
        p = template.render(
            prev_reasoning=_pro, search_query=squery, document=passage_text
        )
        ret.append(p)
    return ret
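Depending on the retriever backend, ret_psg entries may be plain strings or dicts carrying the passage text (the commented-out line hints at a "segment" field). A small normalizer, sketched here with an assumed key name, keeps the template code uniform:

```python
# Normalize retrieved passages before joining them into the refinement
# prompt: accept plain strings or dicts carrying the text under "segment"
# (the key name is an assumption, not a guaranteed retriever contract).
def top_passages(psg, k=3):
    texts = []
    for item in psg[:k]:
        texts.append(item["segment"] if isinstance(item, dict) else item)
    return "\n".join(texts)

print(top_passages(["alpha", {"segment": "beta"}, "gamma", "delta"]))
```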
To allow this tool to use an independent template parameter reasoning_indoc_template, remember to explicitly register it in servers/prompt/parameter.yaml:
servers/prompt/parameter.yaml
template: prompt/qa_boxed.jinja
reasoning_indoc_template: prompt/qa_boxed.jinja
This avoids one tool’s configuration being overwritten after build when multiple prompt tools share the same template parameter.

search_o1_insert: Insert documents into the reasoning context as search results

This tool does not depend on a template; it directly appends the search results, wrapped in <|begin_search_result|> and <|end_search_result|>, to the existing prompt:
servers/prompt/src/prompt.py
@app.prompt(output="prompt_ls,ans_ls->prompt_ls")
def search_o1_insert(
    prompt_ls: List[PromptMessage],
    ans_ls: List[str],
) -> List[PromptMessage]:
    ret = []
    for prompt, ans in zip(prompt_ls, ans_ls):
        _pro = prompt.content.text
        p = _pro + "<|begin_search_result|>" + ans + "<|end_search_result|>"
        ret.append(p)
    return ret

Step 2.2: Implement Router Server

Add in servers/router/src/router.py:
servers/router/src/router.py
@app.tool(output="ans_ls->ans_ls")
def search_o1_check(ans_ls: List[str]) -> Dict[str, List[Dict[str, str]]]:
    def get_eos(text):
        # True -> reasoning is finished; False -> a search query was issued
        if "<|im_end|>" in text:
            return True
        if "<|end_search_query|>" in text:
            return False
        # Neither token found: default to another retrieval round
        return False

    ans_ls = [
        {
            "data": answer,
            "state": "stop" if get_eos(answer) else "retrieve",
        }
        for answer in ans_ls
    ]
    return {"ans_ls": ans_ls}
Whether to continue retrieval is decided by which end token is present: <|im_end|> means the answer is complete (stop), while <|end_search_query|> means the model has issued a search query (retrieve).
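The routing rule can be exercised standalone; this mirrors the check above as a plain function:

```python
# Mirror of the search_o1_check routing rule: <|im_end|> wins (answer
# finished), otherwise <|end_search_query|> sends the sample down the
# retrieve branch; with neither token we keep retrieving by default.
def route(text):
    if "<|im_end|>" in text:
        return "stop"
    if "<|end_search_query|>" in text:
        return "retrieve"
    return "retrieve"

print(route("The answer is \\boxed{42}<|im_end|>"))                    # stop
print(route("<|begin_search_query|>dune author<|end_search_query|>"))  # retrieve
```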

Step 2.3: Implement Custom Server

Add in servers/custom/src/custom.py:
servers/custom/src/custom.py
@app.tool(output="ans_ls->extract_query_list")
def search_o1_query_extract(ans_ls: List[str]) -> Dict[str, List[str]]:

    def get_query(text):
        import re

        # Match the last <|begin_search_query|> ... <|end_search_query|> pair
        pattern = re.compile(
            r"<\|begin_search_query\|>(.*?)<\|end_search_query\|>", re.DOTALL
        )
        matches = pattern.findall(text)

        if matches:
            query = matches[-1].strip()
            if not query.endswith("?"):
                query += "?"
            return query
        else:
            return "There is no query."

    query = [get_query(answer) for answer in ans_ls]

    return {"extract_query_list": query}
Extracts the query string between <|begin_search_query|> and <|end_search_query|> from the generated text.
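Because the reasoning template marks queries with <|begin_search_query|> … <|end_search_query|>, the extraction can be tried standalone; note how only the last query is taken and a trailing "?" is appended:

```python
import re

# Pull the last search query between the Search-o1 query tokens;
# the pipe characters must be escaped in the regex.
PATTERN = re.compile(r"<\|begin_search_query\|>(.*?)<\|end_search_query\|>", re.DOTALL)

def extract_last_query(text):
    matches = PATTERN.findall(text)
    if not matches:
        return "There is no query."
    query = matches[-1].strip()
    return query if query.endswith("?") else query + "?"

text = (
    "I should check the author first.\n"
    "<|begin_search_query|>who wrote Dune<|end_search_query|>\n"
    "Later I also need the year.\n"
    "<|begin_search_query|>Dune publication year<|end_search_query|>"
)
print(extract_last_query(text))  # Dune publication year?
```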

Step 3: Write the Pipeline Configuration

Following the above, create examples/search_o1.yaml:
examples/search_o1.yaml
# Search-o1 demo

# MCP Servers
servers:
  benchmark: servers/benchmark
  generation: servers/generation
  retriever: servers/retriever
  prompt: servers/prompt
  evaluation: servers/evaluation
  router: servers/router
  custom: servers/custom

pipeline:
# 1. Load dataset
- benchmark.get_data
# 2. Initialize (construct instruction + question)
- prompt.search_o1_init
# 3. First-round generation (initial reasoning / possibly the first <|begin_search_query|>)
- generation.generate
# 4. Loop: multi-round search + reasoning
- loop:
    times: 3
    steps:
    - branch:
        router:
        - router.search_o1_check # Sets state to stop / retrieve per sample
        branches:
          retrieve:
          # 4.1 Extract the latest query (if none, search_o1_check should have returned stop)
          - custom.search_o1_query_extract
          # 4.2 Perform retrieval (pass the extracted query list)
          - retriever.retriever_deploy_search:
              input:
                query_list: extract_query_list
          # 4.3 Build the Reason-in-Documents refinement prompt
          - prompt.searcho1_reasoning_indocument
          # 4.4 Generate the refined information from the retrieved documents
          - generation.generate
          # 4.5 Append it as <|begin_search_result|> ... <|end_search_result|>
          - prompt.search_o1_insert
          # 4.6 Generate the next round of reasoning / possibly a new query
          - generation.generate
          stop: []
# 5. Evaluation (use final answer / reasoning)
- evaluation.evaluate:
    input:
      pred_ls: ans_ls

Step 4: Configure Pipeline Parameters

Run:
ultrarag build examples/search_o1.yaml
Open the generated examples/parameter/search_o1.yaml and modify benchmark/retriever/generation as needed (or set defaults in each Server’s parameter.yaml before build):
examples/parameter/search_o1.yaml
benchmark:
  benchmark:
    key_map:
      gt_ls: golden_answers
      q_ls: question
    limit: 2
    name: asqa
    path: data/sample_asqa_5.jsonl
custom: {}
evaluation:
  metrics:
  - acc
  - f1
  - em
  - coverem
  - stringem
  - rouge-1
  - rouge-2
  - rouge-l
  save_path: output/asqa.json
generation:
  base_url: http://localhost:8000/v1
  model_name: openbmb/MiniCPM4-8B
  sampling_params:
    extra_body:
      chat_template_kwargs:
        enable_thinking: false
      include_stop_str_in_output: true
      top_k: 20
    max_tokens: 2048
    temperature: 0.7
    top_p: 0.8
prompt:
  reasoning_indoc_template: prompt/search_o1_refinement.jinja
  template: prompt/search_o1_reasoning.jinja
retriever:
  query_instruction: 'Query: '
  retriever_url: http://localhost:8080
  top_k: 5

Step 5: Run Your Reasoning Workflow

Once everything is ready, execute:
ultrarag run examples/search_o1.yaml