This section implements a RAG workflow that reuses the Generation Server: RankCoT (Knowledge Refinement + Final Answer).
Paper: https://arxiv.org/abs/2502.17888
RankCoT consists of two stages: first, a “knowledge refinement model” refines the retrieved documents, then an “answer generation model” generates answers based on the refined results. Both are essentially LLM inferences, so in UR-2.0, the same Generation Server code can be reused through Server aliases, only assigning different aliases and configuring different parameters/models in the Pipeline (see Server Alias Reuse Mechanism for details).

Step 1: Clarify Workflow Structure

Let’s first review the original RankCoT workflow structure proposed in the paper. In UR-2.0, we organize the algorithm into the following modules: the Prompt Server requires custom templates, while knowledge refinement and answer generation both reuse the Generation Server, each with its own model or inference parameters.

Step 2: Implement Necessary Tools

The knowledge refinement part of RankCoT first generates a reasoning chain based on the retrieved external knowledge and the question. For this, add the following code in servers/prompt/src/prompt.py:
servers/prompt/src/prompt.py
@app.prompt(output="q_ls,ret_psg,kr_template->prompt_ls")
def RankCoT_kr(
    q_ls: List[str],
    ret_psg: List[List[str]],
    kr_template: str | Path,
) -> list[PromptMessage]:
    # The parameter name must match the "kr_template" input declared in the decorator.
    template: Template = load_prompt_template(kr_template)
    ret = []
    for q, psg in zip(q_ls, ret_psg):
        passage_text = "\n".join(psg)
        p = template.render(question=q, documents=passage_text)
        ret.append(p)
    return ret
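To see what `RankCoT_kr` assembles, here is a minimal self-contained sketch. The template text is hypothetical (the real file is the `kr_template` Jinja file configured later in the parameter YAML), and `str.format` stands in for Jinja rendering:

```python
# Hypothetical template text; str.format stands in for Jinja rendering here.
KR_TEMPLATE = (
    "Based on the following documents, write a chain of thought that helps "
    "answer the question.\n"
    "Question: {question}\n"
    "Documents:\n{documents}"
)

def rankcot_kr_prompts(q_ls, ret_psg):
    prompts = []
    for q, psg in zip(q_ls, ret_psg):
        passage_text = "\n".join(psg)  # each item of ret_psg is a list of passages
        prompts.append(KR_TEMPLATE.format(question=q, documents=passage_text))
    return prompts

prompts = rankcot_kr_prompts(
    ["Who wrote Hamlet?"],
    [["Hamlet is a tragedy by William Shakespeare.", "It was written around 1600."]],
)
print(prompts[0])
```

Each question is paired with its retrieved passages, so `prompt_ls` stays aligned with `q_ls` for the downstream `generate` call.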
The answer generation part produces the final answer from the question and the reasoning chain generated above. Add the following code in servers/prompt/src/prompt.py:
servers/prompt/src/prompt.py
@app.prompt(output="q_ls,kr_ls,qa_template->prompt_ls")
def RankCoT_qa(
    q_ls: List[str],
    kr_ls: List[str],
    qa_template: str | Path,
) -> list[PromptMessage]:
    # As above, the parameter name must match the "qa_template" input in the decorator.
    template: Template = load_prompt_template(qa_template)
    ret = []
    for q, cot in zip(q_ls, kr_ls):
        p = template.render(question=q, CoT=cot)
        ret.append(p)
    return ret
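The two prompt builders slot into a two-stage flow: refine first, then answer. A minimal sketch of that data flow, with a `mock_generate` stand-in for the Generation Server and inline format strings standing in for the two Jinja templates:

```python
# Hypothetical stand-ins: format strings replace the Jinja templates, and
# mock_generate replaces the Generation Server's LLM call.
KR_TEMPLATE = "Question: {question}\nDocuments:\n{documents}\nWrite a reasoning chain."
QA_TEMPLATE = "Question: {question}\nReasoning chain:\n{CoT}\nAnswer:"

def mock_generate(prompt_ls):
    # Pretend the LLM returns one tagged response per prompt.
    return [f"[generated from {len(p)} chars]" for p in prompt_ls]

q_ls = ["Who wrote Hamlet?"]
ret_psg = [["Hamlet is a tragedy by William Shakespeare."]]

# Stage 1: knowledge refinement (RankCoT_kr -> cot.generate)
kr_prompts = [KR_TEMPLATE.format(question=q, documents="\n".join(p))
              for q, p in zip(q_ls, ret_psg)]
kr_ls = mock_generate(kr_prompts)

# Stage 2: answer generation (RankCoT_qa -> gen.generate)
qa_prompts = [QA_TEMPLATE.format(question=q, CoT=cot)
              for q, cot in zip(q_ls, kr_ls)]
ans_ls = mock_generate(qa_prompts)
```

This is exactly the chain the Pipeline in Step 3 wires together, with the refinement output renamed to `kr_ls` between the two stages.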

Step 3: Write Pipeline Configuration File

Create a new YAML file under the examples/ directory, such as RankCoT.yaml:
examples/RankCoT.yaml
# RankCoT demo

# MCP Server
servers:
  benchmark: servers/benchmark
  retriever: servers/retriever
  prompt: servers/prompt
  # generation: servers/generation
  cot: servers/generation
  gen: servers/generation
  evaluation: servers/evaluation

# MCP Client Pipeline
pipeline:
- benchmark.get_data
- retriever.retriever_deploy_search
- prompt.RankCoT_kr
- cot.generate:
    output:
      ans_ls: kr_ls
- prompt.RankCoT_qa
- gen.generate
- evaluation.evaluate:
    input:
      pred_ls: ans_ls
Here we register the same servers/generation under two aliases, cot and gen, and call each separately in the Pipeline. The output remap under cot.generate stores the refinement result as kr_ls instead of the default ans_ls, leaving ans_ls free for the final answer produced by gen.generate.
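Both aliases run the same generate tool, which writes its result under ans_ls by default, so the remap is what keeps the two calls from colliding. An illustrative model of the pipeline's shared state (not UltraRAG internals; mock_generate is a hypothetical stand-in):

```python
# Illustrative model of the pipeline's shared data state.
state = {"q_ls": ["Who wrote Hamlet?"]}

def mock_generate(prompt_ls):
    # Hypothetical stand-in for the Generation Server's generate tool.
    return [f"<output for: {p}>" for p in prompt_ls]

# cot.generate with `output: {ans_ls: kr_ls}` -> result stored under "kr_ls"
state["kr_ls"] = mock_generate(["<kr prompt>"])

# gen.generate with the default output key -> result stored under "ans_ls"
state["ans_ls"] = mock_generate(["<qa prompt>"])

# Without the remap, both calls would have targeted state["ans_ls"], and the
# final answer would have overwritten the refinement chain.
```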

Step 4: Configure Pipeline Parameters

Run the following command:
ultrarag build examples/RankCoT.yaml
Open examples/parameter/RankCoT_parameter.yaml, and assign different models or inference parameters for the two aliases:
examples/parameter/RankCoT_parameter.yaml
benchmark:
  benchmark:
    key_map:
      gt_ls: golden_answers
      q_ls: question
    limit: 1
    name: nq
    path: data/sample_nq_10.jsonl
    seed: 42
    shuffle: false
cot:
  base_url: http://localhost:65501/v1
  model_name: RankCoT model
  sampling_params:
    extra_body:
      chat_template_kwargs:
        enable_thinking: false
      include_stop_str_in_output: true
      top_k: 20
    max_tokens: 2048
    temperature: 0.7
    top_p: 0.8
evaluation:
  metrics:
  - acc
  - f1
  - em
  - coverem
  - stringem
  - rouge-1
  - rouge-2
  - rouge-l
  save_path: output/nq.json
gen:
  base_url: http://localhost:65501/v1
  model_name: qwen3-8b
  sampling_params:
    extra_body:
      chat_template_kwargs:
        enable_thinking: false
      include_stop_str_in_output: true
      top_k: 20
    max_tokens: 2048
    temperature: 0.7
    top_p: 0.8
prompt:
  kr_template: prompt/RankCoT_knowledge_refinement.jinja
  qa_template: prompt/RankCoT_question_answering.jinja
retriever:
  query_instruction: 'Query: '
  retriever_url: http://localhost:5005/search
  top_k: 5
You can see that cot and gen each get an independent parameter block, so their settings do not overwrite each other.

Step 5: Run Your Inference Workflow!

Once everything is ready, run the following command to start the inference workflow:
ultrarag run examples/RankCoT.yaml