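Branching RAG Workflow

This section walks you through implementing a RAG reasoning flow with branching decisions: Search-o1.

Paper: https://arxiv.org/pdf/2501.05366

The core idea of Search-o1 is to let the large model decide for itself, during reasoning, when it lacks knowledge, and to generate a search query that calls an external retrieval module for supplementary documents. A dedicated Reason-in-Documents module then analyzes and refines the lengthy retrieval results, extracts the useful information, and injects it into the subsequent reasoning, which reduces interference from noise.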

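Step 1: Define the Workflow Structure

We first review the Search-o1 algorithm flowchart:

[Figure: Search-o1 algorithm flowchart]

In UltraRAG, we structure the algorithm into the following modules:

[Figure: Search-o1 module structure in UltraRAG]

The Tools used by the prompt, router, and custom modules must be implemented by the user; the remaining components (generation, retrieval, evaluation) can reuse the standard Servers that UltraRAG provides.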
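The module structure above can be read as a simple loop. The following is a minimal sketch of that control flow in plain Python, not UltraRAG code: `search_o1_loop`, `fake_generate`, `fake_retrieve`, and `fake_refine` are hypothetical names, the generator and retriever are stubs, and the stop condition is simplified to "the latest output contains no search query".

```python
import re
from typing import Callable, List

BEGIN_Q, END_Q = "<|begin_search_query|>", "<|end_search_query|>"
BEGIN_R, END_R = "<|begin_search_result|>", "<|end_search_result|>"
QUERY_RE = re.compile(re.escape(BEGIN_Q) + r"(.*?)" + re.escape(END_Q), re.DOTALL)


def search_o1_loop(
    prompt: str,
    generate: Callable[[str], str],
    retrieve: Callable[[str], List[str]],
    refine: Callable[[str, str, List[str]], str],
    max_rounds: int = 3,
) -> str:
    """Alternate generation and search until no new query is emitted."""
    output = generate(prompt)
    context = prompt + output
    for _ in range(max_rounds):
        queries = QUERY_RE.findall(output)
        if not queries:
            break  # router: "stop"
        query = queries[-1].strip()          # custom: extract the latest query
        docs = retrieve(query)               # retriever: fetch passages
        info = refine(context, query, docs)  # Reason-in-Documents step
        context += BEGIN_R + info + END_R    # prompt: insert the refined result
        output = generate(context)           # generation: continue reasoning
        context += output
    return context


# Stubs standing in for the real generation and retrieval servers.
def fake_generate(text: str) -> str:
    if BEGIN_R not in text:
        return "I need a fact. " + BEGIN_Q + "capital of France" + END_Q
    return " The answer is \\boxed{Paris}."


def fake_retrieve(query: str) -> List[str]:
    return ["Paris is the capital of France.", "Unrelated text."]


def fake_refine(context: str, query: str, docs: List[str]) -> str:
    return docs[0]


result = search_o1_loop(
    "Question: What is the capital of France?\n",
    fake_generate,
    fake_retrieve,
    fake_refine,
)
print(result)
```

Each commented line in the loop corresponds to one of the modules implemented in the following steps.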

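Step 2: Implement the Required Tools

Step 2.1: Implement the Prompt Server

In the Search-o1 reasoning flow, the Prompt Server has two main tasks:

Build the initial prompt (including a description of the search capability)
Organize the alternating "search → reason" flow, including inserting search results and document analysis

The three tool functions to implement, along with their templates, are described below.

search_o1_init: initialize the reasoning template

This tool builds the initial model prompt, which tells the model that it has a "search tool" and explains the rules for using it. First, define the prompt template in prompt/search_o1_reasoning.jinja, for example: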

prompt/search_o1_reasoning.jinja

You are a reasoning assistant with the ability to perform web searches to help you answer the user’s question
accurately. You have special tools:
To perform a search: write <|begin_search_query|> your query here <|end_search_query|>.
Then, the system will search and analyze relevant web pages, then provide you with helpful information in the
format <|begin_search_result|> ...search results... <|end_search_result|>.
You can repeat the search process multiple times if necessary. The maximum number of search attempts is
limited to {{MAX_SEARCH_LIMIT}}.
Once you have all the information you need, continue your reasoning.
Example:
Question: “...”
Assistant thinking steps:
- I might need to look up details about ...
Assistant:
<|begin_search_query|>...<|end_search_query|>
(System returns processed information from relevant web pages)
Assistant continues reasoning with the new information...
Remember:
- Use <|begin_search_query|> to request a web search and end with <|end_search_query|>.
- When done searching, continue your reasoning.
Please answer the following question. You should think step by step to solve it.\n\n
Provide your final answer in the format \\boxed{YOUR_ANSWER}.\n\n
Question:\n{{question}}\n\n

The corresponding tool function:

servers/prompt/src/prompt.py

@app.prompt(output="q_ls, template -> prompt_ls")
def search_o1_init(
    q_ls: List[str],
    template: str | Path,
) -> List[PromptMessage]:
    template: Template = load_prompt_template(template)
    # This value is currently hard-coded
    MAX_SEARCH_LIMIT = 10
    ret = []
    for q in q_ls:
        p = template.render(question=q, MAX_SEARCH_LIMIT=MAX_SEARCH_LIMIT)
        ret.append(p)
    return ret

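searcho1_reasoning_indocument: reason after reading the documents

This tool injects the search results into the model input so that the model can analyze them and continue reasoning. Create the template prompt/search_o1_refinement.jinja, for example: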

prompt/search_o1_refinement.jinja

**Task Instruction:**

You are tasked with reading and analyzing web pages based on the following inputs: **Previous Reasoning Steps**, **Current Search Query**, and **Searched Web Pages**. Your objective is to extract relevant and helpful information for **Current Search Query** from the **Searched Web Pages** and seamlessly integrate this information into the **Previous Reasoning Steps** to continue reasoning for the original question.

**Guidelines:**

1. **Analyze the Searched Web Pages:**
- Carefully review the content of each searched web page.
- Identify factual information that is relevant to the **Current Search Query** and can aid in the reasoning process for the original question.

2. **Extract Relevant Information:**
- Select the information from the Searched Web Pages that directly contributes to advancing the **Previous Reasoning Steps**.
- Ensure that the extracted information is accurate and relevant.

3. **Output Format:**
- **If the web pages provide helpful information for current search query:** Present the information beginning with `**Final Information**` as shown below.
**Final Information**

[Helpful information]

- **If the web pages do not provide any helpful information for current search query:** Output the following text.

**Final Information**

No helpful information found.

**Inputs:**
- **Previous Reasoning Steps:**  
{{prev_reasoning}}

- **Current Search Query:**  
{{search_query}}

- **Searched Web Pages:**  
{{document}}

Now you should analyze each web page and find helpful information based on the current search query "{{search_query}}" and previous reasoning steps.

The corresponding tool function is implemented as follows:

servers/prompt/src/prompt.py

@app.prompt(
    output="prompt_ls,extract_query_list,ret_psg,reasoning_indoc_template->prompt_ls"
)
def searcho1_reasoning_indocument(
    prompt_ls: List[PromptMessage],
    extract_query_list: List[str],
    ret_psg: List[str | Any],
    template: str | Path,
) -> List[PromptMessage]:
    template: Template = load_prompt_template(template)
    ret = []
    for prompt, squery, psg in zip(prompt_ls, extract_query_list, ret_psg):
        # Keep only the top 3 retrieved passages to bound the prompt length
        passages = psg[:3]
        passage_text = "\n".join(passages)
        _pro = prompt.content.text
        p = template.render(
            prev_reasoning=_pro, search_query=squery, document=passage_text
        )
        ret.append(p)
    return ret
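The passage handling inside the loop can be tried in isolation. The sketch below is a hypothetical helper, `build_document`, that is not part of UltraRAG. It assumes each retrieved item is either a plain string or a dict with a "segment" key (a guess based on the retriever's output format) and shows how the top passages are joined into the `document` field of the template.

```python
from typing import List, Union


def build_document(passages: List[Union[str, dict]], top_n: int = 3) -> str:
    """Join the first top_n passages into one newline-separated string."""
    texts = []
    for p in passages[:top_n]:
        # Assumption: dict passages carry their text under the "segment" key.
        texts.append(p["segment"] if isinstance(p, dict) else str(p))
    return "\n".join(texts)


print(build_document(["a", "b", "c", "d"]))     # prints a, b, c on separate lines
print(build_document([{"segment": "x"}, "y"]))  # prints x, y on separate lines
```

Truncating before joining keeps the refinement prompt short even when the retriever's top_k is larger than 3.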

So that this tool uses its own template parameter, reasoning_indoc_template, remember to register it explicitly in servers/prompt/parameter.yaml:

servers/prompt/parameter.yaml

template: prompt/qa_boxed.jinja
reasoning_indoc_template: prompt/qa_boxed.jinja

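This prevents the configuration from being overwritten after build when several prompt tools share the same template parameter.

search_o1_insert: insert the documents into the reasoning context as search results

This tool needs no template. It appends the search results directly to the existing prompt, wrapped in <|begin_search_result|>…<|end_search_result|>: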

servers/prompt/src/prompt.py

@app.prompt(output="prompt_ls,ans_ls->prompt_ls")
def search_o1_insert(
    prompt_ls: List[PromptMessage],
    ans_ls: List[str],
) -> List[PromptMessage]:
    ret = []
    for prompt, ans in zip(prompt_ls, ans_ls):
        _pro = prompt.content.text
        p = _pro + "<|begin_search_result|>" + ans + "<|end_search_result|>"
        ret.append(p)
    return ret

Step 2.2: Implement the Router Server

Add the following to servers/router/src/router.py:

servers/router/src/router.py

@app.tool(output="ans_ls->ans_ls")
def search_o1_check(ans_ls: List[str]) -> Dict[str, List[Dict[str, str]]]:
    def get_eos(text):
        # Stop once the model emits the end-of-turn token
        if "<|im_end|>" in text:
            return True
        # Otherwise keep retrieving (e.g. the output ends with <|end_search_query|>)
        return False

    ans_ls = [
        {
            "data": answer,
            "state": "stop" if get_eos(answer) else "retrieve",
        }
        for answer in ans_ls
    ]
    return {"ans_ls": ans_ls}

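The router decides whether to keep retrieving based on which terminator the output hits: an output containing <|im_end|> is routed to stop, while any other output, typically one ending in <|end_search_query|>, is routed to retrieve.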
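To see the routing rule in action outside the server, here is a minimal standalone sketch. `route` is a hypothetical helper that mirrors the state logic of search_o1_check without the `@app.tool` decorator, and the sample outputs are invented for illustration.

```python
from typing import Dict, List


def route(ans_ls: List[str]) -> List[Dict[str, str]]:
    """Tag each model output with the branch it should follow."""
    return [
        {"data": a, "state": "stop" if "<|im_end|>" in a else "retrieve"}
        for a in ans_ls
    ]


outputs = [
    "Let me check. <|begin_search_query|>Who wrote Hamlet?<|end_search_query|>",
    "The answer is \\boxed{Shakespeare}.<|im_end|>",
    "Plain text with no marker at all",
]
states = [item["state"] for item in route(outputs)]
print(states)  # ['retrieve', 'stop', 'retrieve']
```

Note that an output containing neither marker also falls into the retrieve branch, so the query extraction step has to cope with text that contains no query.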

Step 2.3: Implement the Custom Server

Add the following to servers/custom/src/custom.py:

servers/custom/src/custom.py

@app.tool(output="ans_ls->extract_query_list")
def search_o1_query_extract(ans_ls: List[str]) -> Dict[str, List[str]]:

    def get_query(text):
        import re

        # Match the content of every <|begin_search_query|>...<|end_search_query|> pair
        pattern = re.compile(
            r"<\|begin_search_query\|>(.*?)<\|end_search_query\|>", re.DOTALL
        )
        matches = pattern.findall(text)

        if matches:
            # Use the most recent query
            query = matches[-1].strip()
            if not query.endswith("?"):
                query += "?"
            return query
        else:
            return "There is no query."

    query = [get_query(answer) for answer in ans_ls]

    return {"extract_query_list": query}

This extracts the query string from the last <|begin_search_query|>…<|end_search_query|> pair in the generated text.
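As a quick check of tag-based extraction, the following standalone sketch assumes queries are wrapped in the <|begin_search_query|> and <|end_search_query|> tags defined in the prompt template. `extract_last_query` and the sample strings are hypothetical and exist only for this illustration.

```python
import re

QUERY_PATTERN = re.compile(
    r"<\|begin_search_query\|>(.*?)<\|end_search_query\|>", re.DOTALL
)


def extract_last_query(text: str) -> str:
    """Return the most recent search query, or a fallback string."""
    matches = QUERY_PATTERN.findall(text)
    if not matches:
        return "There is no query."
    query = matches[-1].strip()
    return query if query.endswith("?") else query + "?"


sample = (
    "<|begin_search_query|>first query<|end_search_query|> some reasoning "
    "<|begin_search_query|> Who wrote Hamlet <|end_search_query|>"
)
print(extract_last_query(sample))     # Who wrote Hamlet?
print(extract_last_query("no tags"))  # There is no query.
```

Taking the last match matters because, after several rounds, the text may contain earlier queries that have already been answered.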

Step 3: Write the Pipeline Configuration File

Following the flow above, create search_o1.yaml under examples/:

examples/search_o1.yaml

# Search-o1 demo

# MCP Servers
servers:
  benchmark: servers/benchmark
  generation: servers/generation
  retriever: servers/retriever
  prompt: servers/prompt
  evaluation: servers/evaluation
  router: servers/router
  custom: servers/custom

pipeline:
# 1. Load the dataset
- benchmark.get_data
# 2. Initialize (build instruction + question)
- prompt.search_o1_init
# 3. First generation (initial reasoning, possibly with the first search query)
- generation.generate
# 4. Loop: multiple rounds of search + reasoning
- loop:
    times: 3
    steps:
    - branch:
        router:
        - router.search_o1_check # outputs state=retrieve/stop for each sample
        branches:
          retrieve:
          # 4.1 Extract the latest query to execute (if there is none, search_o1_check should already have returned stop)
          - custom.search_o1_query_extract
          # 4.2 Run retrieval (pass in the extracted query list)
          - retriever.retriever_deploy_search:
              input:
                query_list: extract_query_list
          # 4.3 Reason-in-Documents: refine the retrieved passages
          - prompt.searcho1_reasoning_indocument
          - generation.generate
          # 4.4 Append the refined result as <|begin_search_result|> … <|end_search_result|>
          - prompt.search_o1_insert
          # 4.5 Generate the next round of reasoning, possibly with a new query
          - generation.generate
          stop: []
# 5. Evaluate (using the final answer / reasoning)
- evaluation.evaluate:
    input:
      pred_ls: ans_ls

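Step 4: Configure the Pipeline Parameters

Run the following command:

ultrarag build examples/search_o1.yaml

Open the generated examples/parameter/search_o1.yaml. You can adjust the benchmark, retriever, generation, and other parameters as needed, or set defaults in each Server's parameter.yaml beforehand and then run build.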

examples/parameter/search_o1.yaml

benchmark:
  benchmark:
    key_map:
      gt_ls: golden_answers
      q_ls: question
    limit: 2
    name: asqa
    path: data/sample_asqa_5.jsonl
custom: {}
evaluation:
  metrics:
  - acc
  - f1
  - em
  - coverem
  - stringem
  - rouge-1
  - rouge-2
  - rouge-l
  save_path: output/asqa.json
generation:
  base_url: http://localhost:8000/v1
  model_name: openbmb/MiniCPM4-8B
  sampling_params:
    extra_body:
      chat_template_kwargs:
        enable_thinking: false
      include_stop_str_in_output: true
      top_k: 20
    max_tokens: 2048
    temperature: 0.7
    top_p: 0.8
prompt:
  reasoning_indoc_template: prompt/search_o1_refinement.jinja
  template: prompt/search_o1_reasoning.jinja
retriever:
  query_instruction: 'Query: '
  retriever_url: http://localhost:8080
  top_k: 5

Step 5: Run Your Reasoning Flow!

Once everything is ready, run:

ultrarag run examples/search_o1.yaml
