RAG Workflow with Server Alias Reuse

This section implements RankCoT (knowledge refinement + final answer), a RAG workflow that needs to reuse the Generation Server.

Paper: https://arxiv.org/abs/2502.17888

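RankCoT consists of two stages: a "knowledge refinement model" first distills the retrieved documents, and an "answer generation model" then answers based on the refined result. Both stages are essentially LLM inference, so in UR-2.0 they can reuse the same Generation Server code through Server aliases: give the server a different alias for each stage in the Pipeline and configure different parameters or models for each alias (see the Server alias reuse mechanism for details).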
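The aliasing idea can be pictured with a minimal sketch. It assumes a hypothetical `GenerationServer` class standing in for the code under servers/generation; the class, its fields, and the stubbed `generate` method are illustrative and are not the UR-2.0 API. The aliases cot and gen match the ones used in the Pipeline file later in this section.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class GenerationServer:
    """Hypothetical stand-in for the single implementation in servers/generation."""
    model_name: str
    temperature: float

    def generate(self, prompt_ls: List[str]) -> List[str]:
        # Stub: a real server would call an LLM endpoint here.
        return [f"<{self.model_name}> {p}" for p in prompt_ls]


# Both aliases point at the same server code path ...
SERVERS: Dict[str, str] = {"cot": "servers/generation", "gen": "servers/generation"}

# ... but each alias carries its own parameter block (illustrative values).
PARAMS: Dict[str, dict] = {
    "cot": {"model_name": "RankCoT model", "temperature": 0.7},
    "gen": {"model_name": "qwen3-8b", "temperature": 0.7},
}

# One instance per alias: same class, different configuration.
instances = {alias: GenerationServer(**PARAMS[alias]) for alias in SERVERS}

print(instances["cot"].generate(["refine these documents"]))
print(instances["gen"].generate(["answer the question"]))
```

The point of the sketch is that nothing in the server code needs to know which stage it serves; only the configuration attached to each alias differs.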

Step 1: Define the workflow structure

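Let us first review the original RankCoT workflow diagram proposed in the paper:

In UR-2.0, we structure the algorithm into the following modules:

The Prompt Server needs custom templates. Knowledge refinement and answer generation both reuse the Generation Server, but with two different sets of models or inference parameters.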

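Step 2: Implement the necessary Tools

The knowledge refinement part of RankCoT first generates a reasoning chain from the retrieved external knowledge and the question. To support this, add the following code to servers/prompt/src/prompt.py: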

servers/prompt/src/prompt.py

@app.prompt(output="q_ls,ret_psg,kr_template->prompt_ls")
def RankCoT_kr(
    q_ls: List[str],
    ret_psg: List[str | Any],
    template: str | Path,
) -> list[PromptMessage]:
    # Load the knowledge-refinement prompt template.
    template: Template = load_prompt_template(template)
    ret = []
    # Build one prompt per question from its retrieved passages.
    for q, psg in zip(q_ls, ret_psg):
        passage_text = "\n".join(psg)
        p = template.render(question=q, documents=passage_text)
        ret.append(p)
    return ret
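To see what the loop in RankCoT_kr produces, here is a self-contained sketch of the same logic. It assumes a made-up template string and uses Python's `str.format` as a stand-in for the Jinja template returned by load_prompt_template; the actual content of the template file is not shown in this section, and `build_kr_prompts` is a hypothetical name.

```python
from typing import List

# Hypothetical template text; the real one lives in a .jinja file.
KR_TEMPLATE = "Documents:\n{documents}\n\nQuestion: {question}\nReasoning:"


def build_kr_prompts(q_ls: List[str], ret_psg: List[List[str]]) -> List[str]:
    """Pair each question with its own retrieved passages and render one prompt."""
    prompts = []
    for q, psg in zip(q_ls, ret_psg):
        passage_text = "\n".join(psg)  # one passage per line
        prompts.append(KR_TEMPLATE.format(question=q, documents=passage_text))
    return prompts


questions = ["Who wrote Hamlet?"]
passages = [["Hamlet is a tragedy.", "It was written by William Shakespeare."]]
prompts = build_kr_prompts(questions, passages)
print(prompts[0])
```

Because `zip` stops at the shorter input, q_ls and ret_psg should have the same length; otherwise the extra questions or passage lists are silently dropped.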

The answer generation part produces the final answer from the generated reasoning chain and the question. Add the following code to servers/prompt/src/prompt.py:

servers/prompt/src/prompt.py

@app.prompt(output="q_ls,kr_ls,qa_template->prompt_ls")
def RankCoT_qa(
    q_ls: List[str],
    kr_ls: List[str],
    template: str | Path,
) -> list[PromptMessage]:
    # Load the question-answering prompt template.
    template: Template = load_prompt_template(template)
    ret = []
    # Build one prompt per question from its refined reasoning chain.
    for q, cot in zip(q_ls, kr_ls):
        p = template.render(question=q, CoT=cot)
        ret.append(p)
    return ret

Step 3: Write the Pipeline configuration file

Create a new YAML file under the examples/ directory, for example RankCoT.yaml:

examples/RankCoT.yaml

# RankCoT demo

# MCP Server
servers:
  benchmark: servers/benchmark
  retriever: servers/retriever
  prompt: servers/prompt
  # generation: servers/generation
  cot: servers/generation
  gen: servers/generation
  evaluation: servers/evaluation

# MCP Client Pipeline
pipeline:
- benchmark.get_data
- retriever.retriever_deploy_search
- prompt.RankCoT_kr
- cot.generate:
    output:
      ans_ls: kr_ls
- prompt.RankCoT_qa
- gen.generate
- evaluation.evaluate:
    input:
      pred_ls: ans_ls

Here the same servers/generation is aliased as both cot and gen, and each alias is called separately in the Pipeline.
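The output and input mappings in the Pipeline above can be read as renaming variables in a shared store. The sketch below mimics that reading with a plain dictionary and stubbed steps. `run_step` and the stub functions are hypothetical, and this is not a description of how UR-2.0 is implemented internally; it is only a way to follow the data flow.

```python
from typing import Callable, Dict, List, Optional

Store = Dict[str, List[str]]


def run_step(
    store: Store,
    fn: Callable[..., Dict[str, List[str]]],
    inputs: List[str],
    input_map: Optional[Dict[str, str]] = None,
    output_map: Optional[Dict[str, str]] = None,
) -> None:
    """Call fn with values from the store, then write its outputs back.

    input_map:  parameter name -> store key to read from
    output_map: output name    -> store key to write to
    """
    input_map = input_map or {}
    output_map = output_map or {}
    kwargs = {name: store[input_map.get(name, name)] for name in inputs}
    for name, value in fn(**kwargs).items():
        store[output_map.get(name, name)] = value


# Stubbed steps (hypothetical): each returns a dict of named outputs.
def kr_prompt(q_ls):
    return {"prompt_ls": [f"refine: {q}" for q in q_ls]}


def qa_prompt(q_ls, kr_ls):
    return {"prompt_ls": [f"answer {q} using {k}" for q, k in zip(q_ls, kr_ls)]}


def generate(prompt_ls):
    return {"ans_ls": [f"LLM({p})" for p in prompt_ls]}


def evaluate(pred_ls):
    return {"n_pred": [str(len(pred_ls))]}


store: Store = {"q_ls": ["q1"]}
run_step(store, kr_prompt, ["q_ls"])
# cot.generate: its ans_ls output is stored under the name kr_ls.
run_step(store, generate, ["prompt_ls"], output_map={"ans_ls": "kr_ls"})
run_step(store, qa_prompt, ["q_ls", "kr_ls"])
# gen.generate: no mapping, so the output keeps the name ans_ls.
run_step(store, generate, ["prompt_ls"])
# evaluation.evaluate: its pred_ls input is read from ans_ls.
run_step(store, evaluate, ["pred_ls"], input_map={"pred_ls": "ans_ls"})
print(store["kr_ls"], store["ans_ls"])
```

In this sketch, the rename to kr_ls is what keeps the reasoning chain under its own name, so the second generate call can write ans_ls without clobbering it and RankCoT_qa can find the input it expects.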

Step 4: Configure Pipeline parameters

Run the following command:

ultrarag build examples/RankCoT.yaml

Open examples/parameter/RankCoT_parameter.yaml and specify a different model or different inference parameters for each of the two aliases:

examples/parameter/RankCoT_parameter.yaml

benchmark:
  benchmark:
    key_map:
      gt_ls: golden_answers
      q_ls: question
    limit: 1
    name: nq
    path: data/sample_nq_10.jsonl
    seed: 42
    shuffle: false
cot:
  base_url: http://localhost:65501/v1
  model_name: RankCoT model
  sampling_params:
    extra_body:
      chat_template_kwargs:
        enable_thinking: false
      include_stop_str_in_output: true
      top_k: 20
    max_tokens: 2048
    temperature: 0.7
    top_p: 0.8
evaluation:
  metrics:
  - acc
  - f1
  - em
  - coverem
  - stringem
  - rouge-1
  - rouge-2
  - rouge-l
  save_path: output/nq.json
gen:
  base_url: http://localhost:65501/v1
  model_name: qwen3-8b
  sampling_params:
    extra_body:
      chat_template_kwargs:
        enable_thinking: false
      include_stop_str_in_output: true
      top_k: 20
    max_tokens: 2048
    temperature: 0.7
    top_p: 0.8
prompt:
  kr_template: prompt/RankCoT_knowledge_refinement.jinja
  qa_template: prompt/RankCoT_question_answering.jinja
retriever:
  query_instruction: 'Query: '
  retriever_url: http://localhost:5005/search
  top_k: 5

As you can see, cot and gen each have their own independent parameter block, and neither overrides the other.
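One quick way to confirm that the two alias blocks differ only where intended is to compare them key by key. The sketch below copies the cot and gen values from the file above into Python dictionaries; `flatten`, `diff_keys`, and `block` are hypothetical helpers, not part of UR-2.0.

```python
from typing import Any, Dict, List


def flatten(d: Dict[str, Any], prefix: str = "") -> Dict[str, Any]:
    """Flatten nested dicts into dotted keys, e.g. sampling_params.top_p."""
    flat: Dict[str, Any] = {}
    for key, value in d.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            flat.update(flatten(value, path))
        else:
            flat[path] = value
    return flat


def diff_keys(a: Dict[str, Any], b: Dict[str, Any]) -> List[str]:
    """Return the sorted dotted keys whose values differ between a and b."""
    fa, fb = flatten(a), flatten(b)
    return sorted(k for k in fa.keys() | fb.keys() if fa.get(k) != fb.get(k))


def block(model_name: str) -> Dict[str, Any]:
    # Values copied from the cot / gen blocks shown above.
    return {
        "base_url": "http://localhost:65501/v1",
        "model_name": model_name,
        "sampling_params": {
            "extra_body": {
                "chat_template_kwargs": {"enable_thinking": False},
                "include_stop_str_in_output": True,
                "top_k": 20,
            },
            "max_tokens": 2048,
            "temperature": 0.7,
            "top_p": 0.8,
        },
    }


cot, gen = block("RankCoT model"), block("qwen3-8b")
print(diff_keys(cot, gen))  # → ['model_name']
```

With the values shown in this section, the only difference between the two blocks is model_name; changing a sampling parameter under one alias would show up as an additional dotted key.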

Step 5: Run your inference workflow!

Once everything is ready, run the following command to start the inference workflow:

ultrarag run examples/RankCoT.yaml
