This section implements a more complex multi-round RAG reasoning process: IRCoT.
Paper: https://arxiv.org/pdf/2212.10509
The core idea of IRCoT: in each round, the model generates a new reasoning step (CoT) from the currently retrieved documents, the historical reasoning chain, and the question, and that step triggers the next round of retrieval. This alternating loop deepens the reasoning until a termination condition is met (e.g., the final answer has been generated). The process therefore needs to record and access multi-round intermediate results effectively — which is exactly where the Memory mechanism in UltraRAG comes into play.

Step 1: Clarify the Workflow Structure

The IRCoT reasoning process includes the following steps:
  1. Initial retrieval: use the original question as the query to obtain the first batch of documents;
  2. Reasoning-retrieval alternating loop (up to N rounds):
  • Use the currently retrieved documents + historical CoT reasoning sentences to generate the next step CoT;
  • If the newly generated text contains “So the answer is:”, terminate early;
  • Otherwise, extract the first sentence of the current CoT as the query for the next retrieval;
  3. Final answer generation: extract the answer for evaluation after the iteration ends.
To implement the above process, we need to extend UltraRAG’s built-in functionality and supplement the following Tools:
  • Prompt construction: support concatenating historical CoT + documents, with special handling for the first round due to no CoT;
  • Early-termination check: detect whether the final answer sentence has been generated (the original paper checks whether the output contains "So the answer is:");
  • Next-round query construction: each round uses the first sentence of the model-generated CoT as the retrieval query for the next round;
  • Answer extraction: extract the final answer content from the generated text.
To access previous CoTs and retrieval history, we use the Memory intermediate-variable storage mechanism built into UltraRAG: simply prepend memory_ to a variable name to access that variable's value from every previous iteration.
For details, please refer to the tutorial Intermediate Variable Storage Mechanism.
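As a concrete, hypothetical illustration of what the memory_-prefixed variables look like (the shapes here are inferred from the tool signatures in Step 2, not actual UltraRAG output):

```python
# Illustrative only: prefixing a variable name with "memory_" yields a list
# with one entry per completed round.

# q_ls holds the current round's queries; memory_q_ls stacks every round's.
memory_q_ls = [
    ["Who founded ACME?", "Where is Foo located?"],  # round 0: original questions
    ["ACME was founded by Jane Doe.", None],         # round 1: first CoT sentences
]

# memory_ret_psg stacks each round's retrieved passages, per sample.
memory_ret_psg = [
    [["passage a1", "passage a2"], ["passage b1"]],  # round 0
    [["passage a3"], None],                          # round 1 (sample 2 terminated)
]

rounds = len(memory_q_ls)          # number of completed rounds
num_samples = len(memory_q_ls[0])  # batch size
```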

Step 2: Implement Necessary Tools

Step 2.1: Implement ircot_next_prompt

First, define the prompt template in prompt/IRCoT.jinja, sample as follows:
prompt/IRCoT.jinja
You serve as an intelligent assistant, adept at facilitating users through complex, multi-hop reasoning across multiple documents. This task is illustrated through demonstrations, each consisting of a document set paired with a relevant question and its multi-hop reasoning thoughts. Your task is to generate one thought for current step, DON'T generate the whole thoughts at once! If you reach what you believe to be the final step, start with "So the answer is:".

Wikipedia Title: Kurram Garhi
Kurram Garhi is a small village located near the city of Bannu, which is the part of Khyber Pakhtunkhwa province of Pakistan. Its population is approximately 35000. Barren hills are near this village. This village is on the border of Kurram Agency. Other nearby villages are Peppal, Surwangi and Amandi Kala.

Wikipedia Title: 2001–02 UEFA Champions League second group stage
Eight winners and eight runners- up from the first group stage were drawn into four groups of four teams, each containing two group winners and two runners- up. Teams from the same country or from the same first round group could not be drawn together. The top two teams in each group advanced to the quarter- finals.

Wikipedia Title: Satellite tournament
A satellite tournament is either a minor tournament or event on a competitive sporting tour or one of a group of such tournaments that form a series played in the same country or region.

Wikipedia Title: Trojkrsti
Trojkrsti is a village in Municipality of Prilep, Republic of Macedonia.

Wikipedia Title: Telephone numbers in Ascension Island
Country Code:+ 247< br> International Call Prefix: 00 Ascension Island does not share the same country code( +290) with the rest of St Helena.

Question: Are both Kurram Garhi and Trojkrsti located in the same country?
Thought: Kurram Garhi is located in the country of Pakistan. Trojkrsti is located in the country of Republic of Macedonia. Thus, they are not in the same country. So the answer is: no.

{{documents}}

Question: {{question}}
Thought: {{cur_answer}}
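As a quick sanity check on the three placeholders, the snippet below renders a trimmed-down version of the template's tail with jinja2 directly (illustrative only; it does not load the full file above):

```python
from jinja2 import Template

# Trimmed-down tail of prompt/IRCoT.jinja: just the three placeholders.
template = Template(
    "{{documents}}\n\nQuestion: {{question}}\nThought: {{cur_answer}}"
)

prompt = template.render(
    documents="Wikipedia Title: Foo\nFoo is a village in Bar.",
    question="Where is Foo located?",
    cur_answer="",  # first round: no CoT accumulated yet
)
# The model is prompted to continue writing after "Thought: ".
```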
Then, register the tool in the Prompt Server:
servers/prompt/src/prompt.py
# prompt for IRCOT
@app.prompt(output="memory_q_ls,memory_ret_psg,template->prompt_ls")
def ircot_next_prompt(
    memory_q_ls: List[List[str | None]],
    memory_ret_psg: List[List[List[str]] | None],
    template: str | Path,
) -> List[PromptMessage]:
    template: Template = load_prompt_template(template)
    ret: List[PromptMessage] = []
    # ---------- First round (no CoT yet) ----------
    if len(memory_q_ls) == 1:
        for q, psg in zip(memory_q_ls[0], memory_ret_psg[0]):
            if q is None:
                continue
            passage_text = "" if psg is None else "\n".join(psg)
            ret.append(
                template.render(documents=passage_text, question=q, cur_answer="")
            )
        return ret
    # ---------- Subsequent rounds ----------
    data_num = len(memory_q_ls[0])
    round_cnt = len(memory_q_ls)
    for i in range(data_num):
        if memory_q_ls[0][i] is None:  # Sample already terminated
            continue
        all_passages, all_cots = [], []
        for r in range(round_cnt):
            psg = None
            if memory_ret_psg is not None and r < len(memory_ret_psg):
                round_psg = memory_ret_psg[r]
                if round_psg is not None and i < len(round_psg):
                    psg = round_psg[i]
            if psg:  
                all_passages.extend(psg)
            if r > 0:
                cot = memory_q_ls[r][i]
                if cot:
                    all_cots.append(cot)
        passage_text = "\n".join(all_passages)
        cur_answer = " ".join(all_cots).strip()
        q = memory_q_ls[0][i]
        ret.append(
            template.render(documents=passage_text, question=q, cur_answer=cur_answer)
        )
    return ret
This function automatically concatenates the historical retrieval results and historical CoT of each round, and passes them into the current template rendering to construct the model input.
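To see the concatenation concretely, here is the multi-round branch re-run in isolation on hypothetical two-round data (template rendering omitted):

```python
# Hypothetical memory contents after two rounds, for a single sample.
memory_q_ls = [
    ["Who founded ACME?"],              # round 0: original question
    ["ACME was founded by Jane Doe."],  # round 1: first CoT sentence
]
memory_ret_psg = [
    [["passage a1", "passage a2"]],     # round 0 passages
    [["passage a3"]],                   # round 1 passages
]

i = 0
all_passages, all_cots = [], []
for r in range(len(memory_q_ls)):
    psg = memory_ret_psg[r][i]
    if psg:
        all_passages.extend(psg)            # passages accumulate across rounds
    if r > 0 and memory_q_ls[r][i]:
        all_cots.append(memory_q_ls[r][i])  # CoT only exists from round 1 on

documents = "\n".join(all_passages)
cur_answer = " ".join(all_cots).strip()
```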

Step 2.2: Implement router.ircot_check_end

Used to detect whether the model-generated content contains "So the answer is:":
servers/router/src/router.py
@app.tool(output="ans_ls->ans_ls")
def ircot_check_end(ans_ls: List[str]) -> Dict[str, List[Dict[str, str]]]:
    # Mark each sample complete once its output contains the answer marker
    checked = [
        {
            "data": ans,
            "state": "complete" if "so the answer is" in ans.lower() else "incomplete",
        }
        for ans in ans_ls
    ]
    return {"ans_ls": checked}
Returns whether each sample is completed (state: complete / incomplete).
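For example, the core check behaves like this on a mixed batch (the dict wrapping of the tool is omitted):

```python
ans_ls = [
    "Thus, they are not in the same country. So the answer is: no.",
    "Kurram Garhi is located in Pakistan.",
]
# Case-insensitive marker check, as in ircot_check_end
states = [
    "complete" if "so the answer is" in ans.lower() else "incomplete"
    for ans in ans_ls
]
```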

Step 2.3: Implement custom.ircot_get_first_sent

Since IRCoT uses the first sentence of the model's current output as the next retrieval query, a helper that extracts that first sentence needs to be implemented in the Custom Server:
servers/custom/src/custom.py
import re

@app.tool(output="ans_ls->q_ls")
def ircot_get_first_sent(
    ans_ls: List[str],
) -> Dict[str, List[str]]:
    ret = []
    for ans in ans_ls:
        match = re.search(r"(.+?[。!?.!?])", ans)
        if match:
            ret.append(match.group(1))
        else:
            ret.append(ans.strip())
    return {"q_ls": ret}
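The regex matches non-greedily up to the first sentence-ending punctuation mark (fullwidth or ASCII); standalone, the extraction behaves like this:

```python
import re

def first_sentence(ans: str) -> str:
    # Same pattern as ircot_get_first_sent: stop at the first sentence terminator
    match = re.search(r"(.+?[。!?.!?])", ans)
    return match.group(1) if match else ans.strip()

first_sentence("Foo is in Bar. Therefore we still need Baz.")
# -> "Foo is in Bar."
```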

Step 2.4: Implement custom.ircot_extract_ans

Used to extract the specific answer from the model-generated content of the last round:
@app.tool(output="ans_ls->pred_ls")
def ircot_extract_ans(ans_ls: List[str]) -> Dict[str, List[str]]:
    ret = []
    pattern = re.compile(r"so the answer is[\s:]*([^\n]*)", re.IGNORECASE)
    for ans in ans_ls:
        match = pattern.search(ans)
        if match:
            ret.append(match.group(1).strip())
        else:
            ret.append(ans.strip())
    return {"pred_ls": ret}
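Applied to a finished reasoning chain, the extraction looks like this:

```python
import re

# Same pattern as ircot_extract_ans: grab everything after the marker,
# up to the end of the line.
pattern = re.compile(r"so the answer is[\s:]*([^\n]*)", re.IGNORECASE)

text = "Thus, they are not in the same country. So the answer is: no."
match = pattern.search(text)
answer = match.group(1).strip() if match else text.strip()
```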

Step 3: Write Pipeline Configuration File

After completing the above Tools, the following YAML file can be used to build the IRCoT reasoning workflow:
examples/IRCoT.yaml
# MCP Server
servers:
  benchmark: servers/benchmark
  generation: servers/generation
  retriever: servers/retriever
  prompt: servers/prompt
  evaluation: servers/evaluation
  router: servers/router
  custom: servers/custom

# MCP Client Pipeline
pipeline:
- benchmark.get_data
# First n-1 rounds: retrieve, reason, and extract the first sentence of the new CoT
- loop:
    times: 2
    steps:
    # Retrieve Q->D
    - retriever.retriever_deploy_search
    # T_i = Reason(Q + D + T_i-1)
    - prompt.ircot_next_prompt
    - generation.generate
    - branch:
        router:
        # Check if contains so the answer is
        - router.ircot_check_end
        branches:
          incomplete:
          # Extract the first sentence as CoT
          - custom.ircot_get_first_sent
          complete: []
# nth retrieval, no extraction of first sentence
# T_3 = Reason(Q + D + T_2)
- retriever.retriever_deploy_search
- prompt.ircot_next_prompt
- generation.generate
- custom.ircot_extract_ans
- evaluation.evaluate
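In plain Python, the control flow this pipeline encodes is roughly the following. The stub functions are illustrative, not UltraRAG APIs, and the per-sample bookkeeping (completed samples masked with None) is omitted:

```python
def retrieve(query):
    # stands in for retriever.retriever_deploy_search
    return ["doc"]

def reason(query, docs):
    # stands in for prompt.ircot_next_prompt + generation.generate
    return "Some thought. And more."

def is_final(ans):
    # router.ircot_check_end's condition
    return "so the answer is" in ans.lower()

def first_sentence(ans):
    # custom.ircot_get_first_sent, simplified
    return ans.split(". ")[0] + "."

query = "original question"
for _ in range(2):                  # loop: times: 2
    docs = retrieve(query)
    ans = reason(query, docs)
    if is_final(ans):               # branch complete: stop refining the query
        break
    query = first_sentence(ans)     # branch incomplete: next retrieval query

# Final round: one more retrieve + reason, then extract the answer
docs = retrieve(query)
final_ans = reason(query, docs)
```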

Step 4: Configure Pipeline Parameters

Run the following command to generate the parameter template:
ultrarag build examples/IRCoT.yaml
Then edit the generated examples/parameter/IRCoT_parameter.yaml with the following content:
examples/parameter/IRCoT_parameter.yaml
benchmark:
  benchmark:
    key_map:
      gt_ls: golden_answers
      q_ls: question
    limit: 2
    name: asqa
    path: data/sample_asqa_5.jsonl
custom: {}
evaluation:
  metrics:
  - acc
  - f1
  - em
  - coverem
  - stringem
  - rouge-1
  - rouge-2
  - rouge-l
  save_path: output/asqa.json
generation:
  base_url: http://localhost:8000/v1
  model_name: openbmb/MiniCPM4-8B
  sampling_params:
    extra_body:
      chat_template_kwargs:
        enable_thinking: false
      include_stop_str_in_output: true
      top_k: 20
    max_tokens: 2048
    temperature: 0.7
    top_p: 0.8
prompt:
  template: prompt/IRCoT.jinja
retriever:
  query_instruction: 'Query: '
  retriever_url: http://localhost:8080
  top_k: 5

Step 5: Run Your Reasoning Workflow!

Once everything is ready, execute the following command to start the reasoning workflow:
ultrarag run examples/IRCoT.yaml