This section walks you through implementing a typical iterative-reasoning RAG workflow: IterRetGen. The process repeatedly updates the query based on the model's output, gradually converging on a better answer.
Paper: https://arxiv.org/pdf/2305.15294

Step 1: Clarify the Workflow Structure

Let's first review the workflow of IterRetGen as proposed in the paper. In UltraRAG, you can quickly implement this workflow from existing modules; its overall architecture can be abstracted as follows:
  • The answer generated by the model in each round is concatenated with the original question to form the query for the next round of retrieval.
  • This process repeats up to N times, where N is the configured maximum number of loop iterations.
  • Apart from the prompt and the custom module, which you implement yourself, everything else can be reused directly from UltraRAG's built-in tools.
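The loop above can be sketched in plain Python. This is a minimal sketch of the control flow only; `retrieve` and `generate` are hypothetical stand-ins for UltraRAG's retriever and generation servers, not real APIs:

```python
def iter_ret_gen(question, retrieve, generate, max_iters=3):
    # Round 0: retrieve with the original question and produce a first answer
    passages = retrieve(question)
    answer = generate(question, passages)
    for _ in range(max_iters):
        # Concatenate the original question with the latest answer
        # to form the query for the next retrieval round
        query = f"{question} {answer}"
        passages = retrieve(query)
        answer = generate(question, passages)
    return answer
```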

Step 2: Implement the Necessary Tool

IterRetGen concatenates the original query with the answer generated in the current round to form the query for the next round. To support this, add the following tool in servers/custom/src/custom.py:
servers/custom/src/custom.py
from typing import Dict, List

@app.tool(output="q_ls,ans_ls->nextq_ls")
def iterretgen_nextquery(
    q_ls: List[str],
    ans_ls: List[str],
) -> Dict[str, List[str]]:
    # Pair each question with its latest answer and concatenate them
    # to form the query for the next retrieval round
    ret = []
    for q, ans in zip(q_ls, ans_ls):
        next_query = f"{q} {ans}"
        ret.append(next_query)
    return {"nextq_ls": ret}
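To make the expected behavior concrete, here is a standalone sketch of the same concatenation logic (without the @app.tool decorator), showing the input and output shapes:

```python
from typing import Dict, List

def next_queries(q_ls: List[str], ans_ls: List[str]) -> Dict[str, List[str]]:
    # Join each question with its latest answer, separated by a space
    return {"nextq_ls": [f"{q} {ans}" for q, ans in zip(q_ls, ans_ls)]}

print(next_queries(["Who wrote Hamlet?"], ["Shakespeare"]))
# {'nextq_ls': ['Who wrote Hamlet? Shakespeare']}
```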

Step 3: Write the Pipeline Configuration File

Create a new YAML file under the examples/ directory, such as IterRetGen.yaml:
examples/IterRetGen.yaml
# IterRetGen demo

# MCP Server
servers:
  benchmark: servers/benchmark
  retriever: servers/retriever
  prompt: servers/prompt
  generation: servers/generation
  evaluation: servers/evaluation
  custom: servers/custom

# MCP Client Pipeline
pipeline:
- benchmark.get_data
- retriever.retriever_deploy_search
- prompt.qa_rag_boxed
- generation.generate
- custom.output_extract_from_boxed
- loop:
    times: 3
    steps:
    - custom.iterretgen_nextquery:
        input:
          ans_ls: pred_ls
    - retriever.retriever_deploy_search:
        input:
          query_list: nextq_ls
    - prompt.qa_rag_boxed
    - generation.generate
    - custom.output_extract_from_boxed
- evaluation.evaluate

Step 4: Configure Pipeline Parameters

Run the following command:
ultrarag build examples/IterRetGen.yaml
Open the generated examples/parameter/IterRetGen_parameter.yaml and modify the configuration as follows:
examples/parameter/IterRetGen_parameter.yaml
benchmark:
  benchmark:
    key_map:
      gt_ls: golden_answers
      q_ls: question
    limit: 2
    name: asqa
    path: data/sample_asqa_5.jsonl
custom: {}
evaluation:
  metrics:
  - acc
  - f1
  - em
  - coverem
  - stringem
  - rouge-1
  - rouge-2
  - rouge-l
  save_path: output/asqa.json
generation:
  base_url: http://localhost:8000/v1
  model_name: openbmb/MiniCPM4-8B
  sampling_params:
    extra_body:
      chat_template_kwargs:
        enable_thinking: false
      include_stop_str_in_output: true
      top_k: 20
    max_tokens: 2048
    temperature: 0.7
    top_p: 0.8
prompt:
  template: prompt/qa_boxed.jinja
retriever:
  query_instruction: 'Query: '
  retriever_url: http://localhost:8080
  top_k: 5

Step 5: Run Your Inference Pipeline!

Once everything is ready, execute the following command to start the inference pipeline:
ultrarag run examples/IterRetGen.yaml