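This section implements a more complex multi-round RAG reasoning pipeline: IRCoT.
Paper: https://arxiv.org/pdf/2212.10509
The core idea of IRCoT is that, in each round, the model generates a new reasoning step (CoT) from the documents retrieved so far, the reasoning chain so far, and the question, and that step then triggers the next retrieval. This interleaved loop deepens the reasoning round by round until a termination condition is met (for example, an explicit answer has been generated). It therefore needs to record and access intermediate results across rounds, which is exactly what the Memory mechanism in UltraRAG is for.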

Step 1: Define the workflow structure

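IRCoT inference consists of the following steps:
  1. Initial retrieval: use the original question as the query to fetch the first batch of documents;
  2. Alternating reasoning-retrieval loop (at most N rounds):
    • Generate the next CoT step from the currently retrieved documents plus the previous CoT sentences;
    • If the newly generated text contains "So the answer is:", terminate early;
    • Otherwise, take the first sentence of the current CoT as the query for the next retrieval round;
  3. Final answer generation: after the iterations end, extract the answer for evaluation.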
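The steps above can be sketched as a plain Python loop. This is a minimal illustration rather than UltraRAG code: `ircot`, `first_sentence`, `toy_retrieve`, and `toy_generate` are hypothetical names, and the retriever and generator are scripted stubs standing in for real services.

```python
import re
from typing import Callable, List

ANSWER_MARKER = "so the answer is"


def first_sentence(text: str) -> str:
    """Return the text up to the first sentence-ending mark, or the stripped text."""
    match = re.search(r"(.+?[.!?。])", text)
    return match.group(1) if match else text.strip()


def ircot(
    question: str,
    retrieve: Callable[[str], List[str]],
    generate: Callable[[str, List[str], List[str]], str],
    max_rounds: int = 3,
) -> str:
    docs: List[str] = list(retrieve(question))  # step 1: initial retrieval
    thoughts: List[str] = []                    # CoT sentences kept so far
    thought = ""
    for round_idx in range(max_rounds):         # step 2: alternate reasoning and retrieval
        thought = generate(question, docs, thoughts)
        is_last = round_idx == max_rounds - 1
        if ANSWER_MARKER in thought.lower() or is_last:
            break                               # early stop, or round budget exhausted
        step = first_sentence(thought)
        thoughts.append(step)
        docs.extend(retrieve(step))             # first sentence becomes the next query
    return thought                              # step 3: answer is extracted from this text


# Hypothetical stubs so the sketch runs without a retriever or an LLM.
def toy_retrieve(query: str) -> List[str]:
    corpus = {
        "kurram": "Kurram Garhi is a village in Pakistan.",
        "trojkrsti": "Trojkrsti is a village in the Republic of Macedonia.",
    }
    return [text for key, text in corpus.items() if key in query.lower()]


def toy_generate(question: str, docs: List[str], thoughts: List[str]) -> str:
    script = [
        "Kurram Garhi is located in Pakistan. I still need Trojkrsti.",
        "Trojkrsti is located in the Republic of Macedonia. So the answer is: no.",
    ]
    return script[min(len(thoughts), len(script) - 1)]


answer = ircot(
    "Are both Kurram Garhi and Trojkrsti located in the same country?",
    toy_retrieve,
    toy_generate,
)
print(answer)
```

With `max_rounds=3` the sketch makes at most three generation calls, and the last one is never truncated to its first sentence.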
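To implement this flow, we need to extend UltraRAG's built-in functionality with the following Tools:
  • Prompt construction: concatenates the historical CoT and the documents; the first round needs special handling because there is no CoT yet;
  • Early-termination check: detects whether the final answer statement has been generated (the original paper checks for the phrase So the answer is:);
  • Next-round query construction: builds the query for the next retrieval; each round retrieves with the first sentence of the CoT the model just generated;
  • Answer extraction: extracts the final answer from the generated text.
To access earlier CoT steps and the retrieval history, use the Memory intermediate-variable storage mechanism implemented in UltraRAG: prefixing a variable name with memory_ gives access to the intermediate results of every previous iteration.
For details, see the tutorial on the intermediate variable storage mechanism.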
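As a mental model only, the round-by-round layout of a memory_ variable can be pictured as a list of per-round snapshots. The `ToyMemory` class below is a hypothetical stand-in written for this illustration; it is not UltraRAG's implementation, and the only thing taken from the tutorial is the `memory_` prefix and the `[round][sample]` indexing used later by `ircot_next_prompt`.

```python
from collections import defaultdict
from typing import Any, Dict, List


class ToyMemory:
    """Hypothetical stand-in: stores one snapshot of each variable per round."""

    def __init__(self) -> None:
        self._store: Dict[str, List[Any]] = defaultdict(list)

    def record(self, name: str, value: Any) -> None:
        # Each call appends the value the variable held in that round.
        self._store[f"memory_{name}"].append(value)

    def get(self, key: str) -> List[Any]:
        return list(self._store[key])


memory = ToyMemory()
memory.record("q_ls", ["Who wrote X?", "Where is Y?"])        # round 0: original questions
memory.record("q_ls", ["X was written by A.", "Y is in B."])  # round 1: first CoT sentences

memory_q_ls = memory.get("memory_q_ls")
print(memory_q_ls[0][1])  # round 0, sample 1
print(memory_q_ls[1][0])  # round 1, sample 0
```

Under this picture, `memory_q_ls[0]` always holds the original questions and later rounds hold the CoT sentences that served as queries.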

Step 2: Implement the required tools

Step 2.1: Implement ircot_next_prompt

First, define the prompt template in prompt/IRCoT.jinja. An example follows:
prompt/IRCoT.jinja
You serve as an intelligent assistant, adept at facilitating users through complex, multi-hop reasoning across multiple documents. This task is illustrated through demonstrations, each consisting of a document set paired with a relevant question and its multi-hop reasoning thoughts. Your task is to generate one thought for current step, DON'T generate the whole thoughts at once! If you reach what you believe to be the final step, start with "So the answer is:".

Wikipedia Title: Kurram Garhi
Kurram Garhi is a small village located near the city of Bannu, which is the part of Khyber Pakhtunkhwa province of Pakistan. Its population is approximately 35000. Barren hills are near this village. This village is on the border of Kurram Agency. Other nearby villages are Peppal, Surwangi and Amandi Kala.

Wikipedia Title: 2001–02 UEFA Champions League second group stage
Eight winners and eight runners- up from the first group stage were drawn into four groups of four teams, each containing two group winners and two runners- up. Teams from the same country or from the same first round group could not be drawn together. The top two teams in each group advanced to the quarter- finals.

Wikipedia Title: Satellite tournament
A satellite tournament is either a minor tournament or event on a competitive sporting tour or one of a group of such tournaments that form a series played in the same country or region.

Wikipedia Title: Trojkrsti
Trojkrsti is a village in Municipality of Prilep, Republic of Macedonia.

Wikipedia Title: Telephone numbers in Ascension Island
Country Code:+ 247< br> International Call Prefix: 00 Ascension Island does not share the same country code( +290) with the rest of St Helena.

Question: Are both Kurram Garhi and Trojkrsti located in the same country?
Thought: Kurram Garhi is located in the country of Pakistan. Trojkrsti is located in the country of Republic of Macedonia. Thus, they are not in the same country. So the answer is: no.

{{documents}}

Question: {{question}}
Thought: {{cur_answer}}
Then add the following to the Prompt Server:
servers/prompt/src/prompt.py
# prompt for IRCOT
@app.prompt(output="memory_q_ls,memory_ret_psg,template->prompt_ls")
def ircot_next_prompt(
    memory_q_ls: List[List[str | None]],
    memory_ret_psg: List[List[List[str]] | None],
    template: str | Path,
) -> List[PromptMessage]:
    template: Template = load_prompt_template(template)
    ret: List[PromptMessage] = []
    # ---------- single round ----------
    if len(memory_q_ls) == 1:
        for q, psg in zip(memory_q_ls[0], memory_ret_psg[0]):
            if q is None:
                continue
            passage_text = "" if psg is None else "\n".join(psg)
            ret.append(
                template.render(documents=passage_text, question=q, cur_answer="")
            )
        return ret
    # ---------- multiple rounds ----------
    data_num = len(memory_q_ls[0])
    round_cnt = len(memory_q_ls)
    for i in range(data_num):
        if memory_q_ls[0][i] is None:  # sample already terminated
            continue
        all_passages, all_cots = [], []
        for r in range(round_cnt):
            psg = None
            if memory_ret_psg is not None and r < len(memory_ret_psg):
                round_psg = memory_ret_psg[r]
                if round_psg is not None and i < len(round_psg):
                    psg = round_psg[i]
            if psg:  
                all_passages.extend(psg)
            if r > 0:
                cot = memory_q_ls[r][i]
                if cot:
                    all_cots.append(cot)
        passage_text = "\n".join(all_passages)
        cur_answer = " ".join(all_cots).strip()
        q = memory_q_ls[0][i]
        ret.append(
            template.render(documents=passage_text, question=q, cur_answer=cur_answer)
        )
    return ret
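This function concatenates the retrieval results and CoT from all previous rounds and passes them to the template for rendering, producing the model input.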
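To make the aggregation concrete, here is a simplified, standalone sketch of what the multi-round branch of `ircot_next_prompt` gathers for one sample before rendering. `collect_history` is a hypothetical helper written for this illustration; it skips the Jinja template and the Prompt Server decorator and only returns the three values that would be passed to `template.render`.

```python
from typing import List, Optional, Tuple


def collect_history(
    memory_q_ls: List[List[Optional[str]]],
    memory_ret_psg: List[Optional[List[List[str]]]],
    i: int,
) -> Tuple[str, str, str]:
    """Return (documents, question, cur_answer) for sample i across all rounds."""
    question = memory_q_ls[0][i] or ""          # round 0 holds the original question
    passages: List[str] = []
    for round_psg in memory_ret_psg:            # passages from every round, in order
        if round_psg is not None and i < len(round_psg) and round_psg[i]:
            passages.extend(round_psg[i])
    # Rounds 1.. hold the CoT sentences that were used as retrieval queries.
    cots = [round_q[i] for round_q in memory_q_ls[1:] if round_q[i]]
    return "\n".join(passages), question, " ".join(cots).strip()


memory_q_ls = [
    ["Are A and B in the same country?"],  # round 0
    ["A is in Pakistan."],                 # round 1
]
memory_ret_psg = [
    [["Doc about A."]],                    # round 0
    [["Doc about B."]],                    # round 1
]

documents, question, cur_answer = collect_history(memory_q_ls, memory_ret_psg, 0)
print(documents)
print(cur_answer)
```

As in the original function, passages are appended round by round without deduplication, so a document retrieved in two rounds appears twice in the prompt.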

Step 2.2: Implement router.ircot_check_end

This tool checks whether the model output contains "So the answer is:" (the match is case-insensitive and does not require the colon):
servers/router/src/router.py
@app.tool(output="ans_ls->ans_ls")
def ircot_check_end(ans_ls: List[str]) -> Dict[str, List[Dict[str, str]]]:
    ans_ls = [
        {
            "data": ans,
            "state": "complete" if "so the answer is" in ans.lower() else "incomplete",
        }
        for ans in ans_ls
    ]
    return {"ans_ls": ans_ls}
It returns, for each sample, whether generation is complete (state: complete / incomplete).
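The check can be tried outside the Router Server with a plain function. This is a sketch under the assumption that only the string test matters: `check_end` is a hypothetical copy of the logic without the `@app.tool` decorator.

```python
from typing import Dict, List


def check_end(ans_ls: List[str]) -> Dict[str, List[Dict[str, str]]]:
    """Tag each answer as complete or incomplete, mirroring ircot_check_end."""
    tagged = [
        {
            "data": ans,
            "state": "complete" if "so the answer is" in ans.lower() else "incomplete",
        }
        for ans in ans_ls
    ]
    return {"ans_ls": tagged}


result = check_end([
    "Kurram Garhi is located in Pakistan.",
    "Thus, they are not in the same country. So the answer is: no.",
])
states = [item["state"] for item in result["ans_ls"]]
print(states)
```

The `state` values are the same keys that appear under `branches` in the pipeline file in Step 3.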

Step 2.3: Implement custom.ircot_get_first_sent

Because IRCoT uses the first sentence of the model's current output as the query for each retrieval, the Custom Server needs a separate tool that extracts that first sentence:
servers/custom/src/custom.py
@app.tool(output="ans_ls->q_ls")
def ircot_get_first_sent(
    ans_ls: List[str],
) -> Dict[str, List[str]]:
    ret = []
    for ans in ans_ls:
        match = re.search(r"(.+?[。!?.!?])", ans)
        if match:
            ret.append(match.group(1))
        else:
            ret.append(ans.strip())
    return {"q_ls": ret}
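A quick standalone run shows how the regular expression behaves on a few inputs. `get_first_sent` below is a hypothetical copy of the logic without the decorator, and the sample strings are made up for illustration.

```python
import re
from typing import List


def get_first_sent(ans_ls: List[str]) -> List[str]:
    """Return the text up to the first sentence-ending mark for each answer."""
    ret = []
    for ans in ans_ls:
        match = re.search(r"(.+?[。!?.])", ans)
        ret.append(match.group(1) if match else ans.strip())
    return ret


queries = get_first_sent([
    "Kurram Garhi is in Pakistan. Trojkrsti is elsewhere.",
    "no terminal punctuation here",
    "The U.S. capital is Washington.",
])
for q in queries:
    print(repr(q))
```

The third input is cut at the first period, yielding "The U.", because the pattern treats every period as a sentence boundary. If abbreviations are common in your data, a sentence splitter that handles them may produce better retrieval queries.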

Step 2.4: Implement custom.ircot_extract_ans

This tool extracts the final answer from the model output of the last round:
@app.tool(output="ans_ls->pred_ls")
def ircot_extract_ans(ans_ls: List[str]) -> Dict[str, List[str]]:
    ret = []
    pattern = re.compile(r"so the answer is[\s:]*([^\n]*)", re.IGNORECASE)
    for ans in ans_ls:
        match = pattern.search(ans)
        if match:
            ret.append(match.group(1).strip())
        else:
            ret.append(ans.strip())
    return {"pred_ls": ret}
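The extraction pattern can likewise be exercised on its own. `extract_ans` is a hypothetical copy of the logic without the decorator; the inputs are invented examples.

```python
import re
from typing import List

# Capture everything after the marker up to the end of that line.
ANSWER_PATTERN = re.compile(r"so the answer is[\s:]*([^\n]*)", re.IGNORECASE)


def extract_ans(ans_ls: List[str]) -> List[str]:
    """Return the text after the answer marker, or the whole answer if absent."""
    ret = []
    for ans in ans_ls:
        match = ANSWER_PATTERN.search(ans)
        ret.append(match.group(1).strip() if match else ans.strip())
    return ret


preds = extract_ans([
    "Thus, they are not in the same country. So the answer is: no.",
    "I am not sure.",
])
print(preds)
```

The trailing period is kept in the first prediction ("no."), and an output without the marker is passed through unchanged. Whether either case affects the scores depends on how the evaluation metrics normalize answers.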

Step 3: Write the Pipeline configuration file

With the Tools above in place, the following YAML file builds the IRCoT inference pipeline:
examples/IRCoT.yaml
# MCP Server
servers:
  benchmark: servers/benchmark
  generation: servers/generation
  retriever: servers/retriever
  prompt: servers/prompt
  evaluation: servers/evaluation
  router: servers/router
  custom: servers/custom

# MCP Client Pipeline
pipeline:
- benchmark.get_data
# First n-1 retrievals: the first sentence of the CoT must be extracted
- loop:
    times: 2
    steps:
    # Retrieval: Q -> D
    - retriever.retriever_deploy_search
    # T_i = Reason(Q + D + T_i-1)
    - prompt.ircot_next_prompt
    - generation.generate
    - branch:
        router:
        # Check whether the output contains so the answer is
        - router.ircot_check_end
        branches:
          incomplete:
          # Extract the first sentence as the CoT
          - custom.ircot_get_first_sent
          complete: []
# n-th retrieval: do not extract the first sentence
# T_3 = Reason(Q + D + T_2)
- retriever.retriever_deploy_search
- prompt.ircot_next_prompt
- generation.generate
- custom.ircot_extract_ans
- evaluation.evaluate

Step 4: Configure the Pipeline parameters

Run the following command to generate the parameter template:
ultrarag build examples/IRCoT.yaml
Then edit the generated examples/parameter/IRCoT_parameter.yaml and configure it as follows:
examples/parameter/IRCoT_parameter.yaml
benchmark:
  benchmark:
    key_map:
      gt_ls: golden_answers
      q_ls: question
    limit: 2
    name: asqa
    path: data/sample_asqa_5.jsonl
custom: {}
evaluation:
  metrics:
  - acc
  - f1
  - em
  - coverem
  - stringem
  - rouge-1
  - rouge-2
  - rouge-l
  save_path: output/asqa.json
generation:
  base_url: http://localhost:8000/v1
  model_name: openbmb/MiniCPM4-8B
  sampling_params:
    extra_body:
      chat_template_kwargs:
        enable_thinking: false
      include_stop_str_in_output: true
      top_k: 20
    max_tokens: 2048
    temperature: 0.7
    top_p: 0.8
prompt:
  template: prompt/IRCoT.jinja
retriever:
  query_instruction: 'Query: '
  retriever_url: http://localhost:8080
  top_k: 5

Step 5: Run your inference pipeline!

Once everything is ready, run the following command to start the inference pipeline:
ultrarag run examples/IRCoT.yaml