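This section walks you through implementing Search-o1, a RAG reasoning workflow with branching decisions.
Paper: https://arxiv.org/pdf/2501.05366
The core idea of Search-o1 is to let the LLM decide on its own, during reasoning, when it lacks knowledge, and to generate a search query that calls an external retrieval module for supplementary documents. A dedicated Reason-in-Documents module then analyzes and refines the lengthy retrieval results, extracting the useful information before injecting it into the subsequent reasoning, which reduces noise.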
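The alternation between reasoning and searching described above can be sketched in plain Python. This is a conceptual sketch only, not UltraRAG code: `generate`, `retrieve`, and `refine` are hypothetical callables standing in for the generation server, the retriever, and the Reason-in-Documents step, and `max_searches` is an assumed search budget.

```python
import re
from typing import Callable, List

BEGIN_Q, END_Q = "<|begin_search_query|>", "<|end_search_query|>"
BEGIN_R, END_R = "<|begin_search_result|>", "<|end_search_result|>"
QUERY_RE = re.compile(re.escape(BEGIN_Q) + r"(.*?)" + re.escape(END_Q), re.DOTALL)


def search_o1_loop(
    question: str,
    generate: Callable[[str], str],                 # hypothetical: prompt -> continuation
    retrieve: Callable[[str], List[str]],           # hypothetical: query -> passages
    refine: Callable[[str, str, List[str]], str],   # hypothetical Reason-in-Documents step
    max_searches: int = 3,                          # assumed search budget
) -> str:
    """Alternate between reasoning and searching until no new query is emitted."""
    context = f"Question:\n{question}\n\n"
    for _ in range(max_searches):
        step = generate(context)
        context += step
        # Only look for a query in the text produced during this step
        queries = QUERY_RE.findall(step)
        if not queries:
            return context  # the model finished without asking for more knowledge
        query = queries[-1].strip()
        passages = retrieve(query)
        # Condense the retrieved passages before injecting them into the context
        info = refine(context, query, passages)
        context += f"{BEGIN_R}{info}{END_R}"
    # Search budget exhausted: let the model finish with what it has
    return context + generate(context)
```

The remaining steps of this section map each of these hypothetical callables onto an UltraRAG Server or Tool.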

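Step 1: Define the workflow structure

We first review the Search-o1 algorithm flowchart:

(figure: Search-o1 algorithm flowchart)

In UltraRAG, we structure the algorithm into the following modules:

(figure: Search-o1 module structure in UltraRAG)

The Tools used by the prompt, router, and custom modules must be implemented by the user; the remaining components (generation, retrieval, evaluation) can reuse the standard Servers provided by UltraRAG.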

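Step 2: Implement the required Tools

Step 2.1: Implement the Prompt Server

In the Search-o1 reasoning workflow, the Prompt Server has two main jobs:
  1. Build the initial prompt (including the description of the search capability)
  2. Organize the alternating "search → reasoning" flow, including inserting search results and analyzing documents
You need to implement the following 3 tool functions and their templates.

search_o1_init: initialize the reasoning template

This tool builds the initial model prompt. It tells the model that it has a "search tool" and explains the rules for using it. First, define the prompt template in prompt/search_o1_reasoning.jinja, for example:
prompt/search_o1_reasoning.jinja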
You are a reasoning assistant with the ability to perform web searches to help you answer the user’s question
accurately. You have special tools:
To perform a search: write <|begin_search_query|> your query here <|end_search_query|>.
Then, the system will search and analyze relevant web pages, then provide you with helpful information in the
format <|begin_search_result|> ...search results... <|end_search_result|>.
You can repeat the search process multiple times if necessary. The maximum number of search attempts is
limited to {{MAX_SEARCH_LIMIT}}.
Once you have all the information you need, continue your reasoning.
Example:
Question: “...”
Assistant thinking steps:
- I might need to look up details about ...
Assistant:
<|begin_search_query|>...<|end_search_query|>
(System returns processed information from relevant web pages)
Assistant continues reasoning with the new information...
Remember:
- Use <|begin_search_query|> to request a web search and end with <|end_search_query|>.
- When done searching, continue your reasoning.
Please answer the following question. You should think step by step to solve it.\n\n
Provide your final answer in the format \\boxed{YOUR_ANSWER}.\n\n
Question:\n{{question}}\n\n
The corresponding tool function:
servers/prompt/src/prompt.py
@app.prompt(output="q_ls, template -> prompt_ls")
def search_o1_init(
    q_ls: List[str],
    template: str | Path,
) -> List[PromptMessage]:
    template: Template = load_prompt_template(template)
    # This value is currently hard-coded
    MAX_SEARCH_LIMIT = 10
    ret = []
    for q in q_ls:
        p = template.render(question=q, MAX_SEARCH_LIMIT=MAX_SEARCH_LIMIT)
        ret.append(p)
    return ret
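searcho1_reasoning_indocument: reason after reading the documents

This tool injects the search results into the model input so that the model can analyze the retrieved content and continue reasoning. Create a new template prompt/search_o1_refinement.jinja, for example:
prompt/search_o1_refinement.jinja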
**Task Instruction:**

You are tasked with reading and analyzing web pages based on the following inputs: **Previous Reasoning Steps**, **Current Search Query**, and **Searched Web Pages**. Your objective is to extract relevant and helpful information for **Current Search Query** from the **Searched Web Pages** and seamlessly integrate this information into the **Previous Reasoning Steps** to continue reasoning for the original question.

**Guidelines:**

1. **Analyze the Searched Web Pages:**
- Carefully review the content of each searched web page.
- Identify factual information that is relevant to the **Current Search Query** and can aid in the reasoning process for the original question.

2. **Extract Relevant Information:**
- Select the information from the Searched Web Pages that directly contributes to advancing the **Previous Reasoning Steps**.
- Ensure that the extracted information is accurate and relevant.

3. **Output Format:**
- **If the web pages provide helpful information for current search query:** Present the information beginning with `**Final Information**` as shown below.
**Final Information**

[Helpful information]

- **If the web pages do not provide any helpful information for current search query:** Output the following text.

**Final Information**

No helpful information found.

**Inputs:**
- **Previous Reasoning Steps:**  
{{prev_reasoning}}

- **Current Search Query:**  
{{search_query}}

- **Searched Web Pages:**  
{{document}}

Now you should analyze each web page and find helpful information based on the current search query "{{search_query}}" and previous reasoning steps.
The corresponding tool function is implemented as follows:
servers/prompt/src/prompt.py
@app.prompt(
    output="prompt_ls,extract_query_list,ret_psg,reasoning_indoc_template->prompt_ls"
)
def searcho1_reasoning_indocument(
    prompt_ls: List[PromptMessage],
    extract_query_list: List[str],
    ret_psg: List[str | Any],
    template: str | Path,
) -> List[PromptMessage]:
    template: Template = load_prompt_template(template)
    ret = []
    for prompt, squery, psg in zip(prompt_ls, extract_query_list, ret_psg):
        # passages = [psg[index]["segment"] for index in range(min(5, len(psg)))]
        passages = psg[:3]
        passage_text = "\n".join(passages)
        _pro = prompt.content.text
        p = template.render(
            prev_reasoning=_pro, search_query=squery, document=passage_text
        )
        ret.append(p)
    return ret
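The refinement template asks the model to place its result after a `**Final Information**` marker. If you want to keep only that part before it is inserted into the reasoning context, a small post-processing helper is one option. This is a sketch under assumptions: `extract_final_information` is a hypothetical helper, not part of UltraRAG, and the pipeline in this tutorial does not call it.

```python
FINAL_MARKER = "**Final Information**"
NOT_FOUND = "No helpful information found."


def extract_final_information(model_output: str) -> str:
    """Return the text after the last '**Final Information**' marker."""
    # rpartition returns ('', '', text) when the marker is absent
    _, found, tail = model_output.rpartition(FINAL_MARKER)
    if not found:
        # Marker missing: fall back to the fixed message used in the template
        return NOT_FOUND
    return tail.strip() or NOT_FOUND


raw = "The pages mention Paris.\n**Final Information**\n\nParis is the capital of France.\n"
refined = extract_final_information(raw)
```

Using the last marker rather than the first guards against a model that repeats the output-format instructions before giving its result.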
So that this tool uses its own template parameter, reasoning_indoc_template, remember to register it explicitly in servers/prompt/parameter.yaml:
servers/prompt/parameter.yaml
template: prompt/qa_boxed.jinja
reasoning_indoc_template: prompt/qa_boxed.jinja
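This prevents the configuration from being overwritten after build when several prompt tools share the same template parameter.

search_o1_insert: insert documents into the reasoning context as search results

This tool does not depend on a template. It appends the search result to the existing prompt, wrapped in <|begin_search_result|> and <|end_search_result|>: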
servers/prompt/src/prompt.py
@app.prompt(output="prompt_ls,ans_ls->prompt_ls")
def search_o1_insert(
    prompt_ls: List[PromptMessage],
    ans_ls: List[str],
) -> List[PromptMessage]:
    ret = []
    for prompt, ans in zip(prompt_ls, ans_ls):
        _pro = prompt.content.text
        p = _pro + "<|begin_search_result|>" + ans + "<|end_search_result|>"
        ret.append(p)
    return ret
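To see what the resulting prompt looks like, the same concatenation can be tried on plain strings. This is a standalone sketch: `insert_search_results` is a hypothetical stand-in that works on `str` instead of `PromptMessage`, and the sample prompt and result are made up for illustration.

```python
from typing import List


def insert_search_results(prompts: List[str], results: List[str]) -> List[str]:
    """Append each result to its prompt, wrapped in the search-result tags."""
    return [
        f"{p}<|begin_search_result|>{r}<|end_search_result|>"
        for p, r in zip(prompts, results)
    ]


prompts = ["...<|begin_search_query|>capital of France<|end_search_query|>"]
results = ["Paris is the capital of France."]
updated = insert_search_results(prompts, results)
print(updated[0])
```

Note that `zip` stops at the shorter list, so a prompt without a matching result is silently dropped; both lists are expected to have the same length.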

Step 2.2: Implement the Router Server

Add the following to servers/router/src/router.py:
servers/router/src/router.py
@app.tool(output="ans_ls->ans_ls")
def search_o1_check(ans_ls: List[str]) -> Dict[str, List[Dict[str, str]]]:
    def get_eos(text):
        # A special end token means the answer is complete: return True
        if "<|im_end|>" in text:
            return True
        # Otherwise (a pending search query, or no marker at all) keep retrieving
        return False

    ans_ls = [
        {
            "data": answer,
            "state": "stop" if get_eos(answer) else "retrieve",
        }
        for answer in ans_ls
    ]
    return {"ans_ls": ans_ls}
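Whether to keep retrieving depends on which marker appears in the output: <|im_end|> ends the loop (state "stop"), while <|end_search_query|> triggers another round of retrieval (state "retrieve").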
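The routing rule can be checked on a few sample outputs without starting a server. This is a standalone sketch: `tag_states` is a hypothetical copy of the rule without the UltraRAG decorator, and the sample strings are made up.

```python
from typing import Dict, List


def tag_states(ans_ls: List[str]) -> List[Dict[str, str]]:
    """Tag each answer with 'stop' if it contains <|im_end|>, else 'retrieve'."""
    return [
        {"data": a, "state": "stop" if "<|im_end|>" in a else "retrieve"}
        for a in ans_ls
    ]


samples = [
    "The answer is \\boxed{Paris}.<|im_end|>",
    "Let me check.<|begin_search_query|>capital of France<|end_search_query|>",
    "An output cut off by max_tokens",
]
states = [item["state"] for item in tag_states(samples)]
print(states)  # ['stop', 'retrieve', 'retrieve']
```

The third sample shows a consequence of this rule: an output with neither marker is also routed to the retrieve branch.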

Step 2.3: Implement the Custom Server

Add the following to servers/custom/src/custom.py:
servers/custom/src/custom.py
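@app.tool(output="ans_ls->extract_query_list")
def search_o1_query_extract(ans_ls: List[str]) -> Dict[str, List[str]]:

    def get_query(text):
        import re

        # Match every <|begin_search_query|> ... <|end_search_query|> span; the last one is used
        pattern = re.compile(
            r"<\|begin_search_query\|>(.*?)<\|end_search_query\|>", re.DOTALL
        )
        matches = pattern.findall(text)

        if matches:
            query = matches[-1].strip()
            # Normalize the query so that it ends with a question mark
            if not query.endswith("?"):
                query += "?"
            return query
        else:
            return "There is no query."

    query = [get_query(answer) for answer in ans_ls]

    return {"extract_query_list": query}
This extracts the query string between the <|begin_search_query|> and <|end_search_query|> tags in the generated text, matching the tag format defined in the prompt template and the tool name used in the pipeline.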
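Assuming the model emits queries in the <|begin_search_query|> ... <|end_search_query|> format defined by the prompt template, the extraction can be tried on its own. This is a minimal sketch: `last_search_query` is a hypothetical helper that returns `None` instead of a placeholder string when no query is present.

```python
import re
from typing import Optional

QUERY_RE = re.compile(
    r"<\|begin_search_query\|>(.*?)<\|end_search_query\|>", re.DOTALL
)


def last_search_query(text: str) -> Optional[str]:
    """Return the last complete search query in `text`, or None if there is none."""
    matches = QUERY_RE.findall(text)
    return matches[-1].strip() if matches else None


text = (
    "First <|begin_search_query|>old query<|end_search_query|> then "
    "<|begin_search_query|> capital of France <|end_search_query|>"
)
print(last_search_query(text))  # capital of France
```

The non-greedy `(.*?)` keeps each match inside one pair of tags, and `re.DOTALL` lets a query span several lines.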

Step 3: Write the Pipeline configuration file

Following the workflow above, create search_o1.yaml under examples/:
examples/search_o1.yaml
# Search-o1 demo

# MCP Servers
servers:
  benchmark: servers/benchmark
  generation: servers/generation
  retriever: servers/retriever
  prompt: servers/prompt
  evaluation: servers/evaluation
  router: servers/router
  custom: servers/custom

pipeline:
# 1. Load the dataset
- benchmark.get_data
# 2. Initialize (build instruction + question)
- prompt.search_o1_init
# 3. First-round generation (initial reasoning, possibly the first <|begin_search_query|>)
- generation.generate
# 4. Loop: multiple rounds of search + reasoning
- loop:
    times: 3
    steps:
    - branch:
        router:
        - router.search_o1_check # tags each item with state = stop / retrieve
        branches:
          retrieve:
          # 4.1 Extract the latest query to run (if there is none, search_o1_check should return stop)
          - custom.search_o1_query_extract
          # 4.2 Run retrieval (pass in the extracted query list)
          - retriever.retriever_deploy_search:
              input:
                query_list: extract_query_list
          # 4.3 Reason-in-Documents: build the refinement prompt, then generate the refined information
          - prompt.searcho1_reasoning_indocument
          - generation.generate
          # 4.4 Append <|begin_search_result|> ... <|end_search_result|>
          - prompt.search_o1_insert
          # 4.5 Generate the next round of reasoning / possibly a new query
          - generation.generate
          stop: []
# 5. Evaluate (using the final answer / reasoning)
- evaluation.evaluate:
    input:
      pred_ls: ans_ls
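One way to picture what the branch step does with the router output is that items are grouped by their state and each group goes down the branch with the same name. The sketch below is a conceptual model only: it makes no claim about how UltraRAG actually schedules branches, and `partition_by_state` is a hypothetical helper.

```python
from collections import defaultdict
from typing import Dict, List


def partition_by_state(routed: List[Dict[str, str]]) -> Dict[str, List[str]]:
    """Group routed items by their 'state' so each branch sees only its own items."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for item in routed:
        groups[item["state"]].append(item["data"])
    return dict(groups)


routed = [
    {"data": "answer A<|im_end|>", "state": "stop"},
    {"data": "...<|end_search_query|>", "state": "retrieve"},
    {"data": "answer C<|im_end|>", "state": "stop"},
]
groups = partition_by_state(routed)
print(sorted(groups))       # ['retrieve', 'stop']
print(len(groups["stop"]))  # 2
```

In this picture, the empty `stop: []` branch simply leaves finished items untouched while the `retrieve` branch runs another search round for the rest.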

Step 4: Configure Pipeline parameters

Run the following command:
ultrarag build examples/search_o1.yaml
Open the generated examples/parameter/search_o1.yaml and adjust the benchmark, retriever, generation, and other parameters as needed. Alternatively, set default values in each Server's parameter.yaml before running build.
examples/parameter/search_o1.yaml
benchmark:
  benchmark:
    key_map:
      gt_ls: golden_answers
      q_ls: question
    limit: 2
    name: asqa
    path: data/sample_asqa_5.jsonl
custom: {}
evaluation:
  metrics:
  - acc
  - f1
  - em
  - coverem
  - stringem
  - rouge-1
  - rouge-2
  - rouge-l
  save_path: output/asqa.json
generation:
  base_url: http://localhost:8000/v1
  model_name: openbmb/MiniCPM4-8B
  sampling_params:
    extra_body:
      chat_template_kwargs:
        enable_thinking: false
      include_stop_str_in_output: true
      top_k: 20
    max_tokens: 2048
    temperature: 0.7
    top_p: 0.8
prompt:
  reasoning_indoc_template: prompt/search_o1_refinement.jinja
  template: prompt/search_o1_reasoning.jinja
retriever:
  query_instruction: 'Query: '
  retriever_url: http://localhost:8080
  top_k: 5

Step 5: Run your reasoning workflow!

Once everything is ready, run:
ultrarag run examples/search_o1.yaml