DeepResearch-Like RAG Workflow

In real-world applications, user questions are often ambiguous or broad, or they need additional knowledge gathered over several turns. Traditional one-shot retrieve-then-generate (RAG) workflows struggle to answer such questions accurately. To address this, DeepResearch proposes a RAG workflow with more “research-style thinking”: before answering, the model first plans, actively identifies missing information, generates sub-questions in stages, and builds a complete information framework through multiple rounds of “search, reasoning, page updating,” ultimately producing a high-quality answer. In this section, we implement a DeepResearch-like system on the UltraRAG framework. The system first guides the model to generate an overall structural plan (Plan) and an initial page (Page) from the original question. It then iteratively generates sub-queries and retrieves documents for them, progressively filling and refining the page until it forms a knowledge page closely tied to the question.

Step 1: Define Workflow Structure

The algorithm includes the following key stages:

Step 1.1: Initialize Plan and Page Structure

The large model generates a complete Plan (structured page plan) based on the original question and constructs an initial page containing [to be filled] placeholders. Example input:

Question: Who did the singer of "Rock of the Night" play in "Melody of Desire"?

Example Plan:

{
  "mainTitle": "Identify the singer of 'Rock of the Night' and their role in 'Melody of Desire'",
  "sections": [
    {
      "title": "Understanding the song and its singer",
      "focus": "This section will identify the singer of 'Rock of the Night' and provide background information about the artist.",
      "subtopics": [
        "'Rock of the Night' song overview",
        "Confirming the identity of the singer related to the song"
      ]
    }...

Initial page:

# Identify the singer of 'Rock of the Night' and their role in 'Melody of Desire'
## Understanding the song and its singer
[to be filled]
## Overview of 'Melody of Desire'
[to be filled]
## The singer's role in 'Melody of Desire'
[to be filled]
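
In the workflow itself, the model writes the initial page from the webnote_init_page prompt. The relationship between a Plan and its initial page can still be illustrated deterministically. The following is a minimal sketch, not part of UltraRAG: `render_initial_page` and `PLACEHOLDER` are hypothetical names, and the plan below reuses the section titles from the example page, omitting "focus" and "subtopics".

```python
from typing import Any, Dict

PLACEHOLDER = "[to be filled]"


def render_initial_page(plan: Dict[str, Any]) -> str:
    """Render a Plan dict as a page with one placeholder per section."""
    lines = [f"# {plan['mainTitle']}"]
    for section in plan["sections"]:
        lines.append(f"## {section['title']}")
        lines.append(PLACEHOLDER)
    return "\n".join(lines)


# Hypothetical plan: titles copied from the example page, other fields omitted.
example_plan = {
    "mainTitle": "Identify the singer of 'Rock of the Night' and their role in 'Melody of Desire'",
    "sections": [
        {"title": "Understanding the song and its singer"},
        {"title": "Overview of 'Melody of Desire'"},
        {"title": "The singer's role in 'Melody of Desire'"},
    ],
}

initial_page = render_initial_page(example_plan)
print(initial_page)
```

Each section heading is followed by exactly one placeholder, which is what the later iterations look for and replace.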

Step 1.2: Enter Iterative Process

In each iteration, the model:

Generates sub-questions for the content still marked [to be filled]
Retrieves documents for the sub-questions
Updates the corresponding paragraphs on the page based on the current Plan, page, and document content

This process continues until:

There is no [to be filled] content left on the page
Or the maximum number of iterations is reached (default 10 rounds)
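
The iteration and its two stopping conditions can be sketched in plain Python. This is an illustrative control loop for a single question, not UltraRAG's pipeline engine: `research_loop` and the three callables are hypothetical stand-ins for the model and retriever calls, and the demo uses trivial fakes so that it runs on its own.

```python
from typing import Callable, List

PLACEHOLDER = "to be filled"  # matched case-insensitively


def research_loop(
    question: str,
    plan: str,
    page: str,
    gen_subq: Callable[[str, str, str], str],
    retrieve: Callable[[str], List[str]],
    fill_page: Callable[[str, str, str, str, List[str]], str],
    max_iters: int = 10,
) -> str:
    """Fill the page until no placeholder remains or max_iters is reached."""
    for _ in range(max_iters):
        if PLACEHOLDER not in page.lower():
            break  # stop condition 1: nothing left to fill
        subq = gen_subq(question, plan, page)
        passages = retrieve(subq)
        page = fill_page(question, plan, page, subq, passages)
    return page  # stop condition 2: the loop ran max_iters times


# Fakes standing in for the model and the retriever.
def fake_gen_subq(question: str, plan: str, page: str) -> str:
    return f"sub-question about {question}"


def fake_retrieve(subq: str) -> List[str]:
    return [f"passage for '{subq}'"]


def fake_fill(question: str, plan: str, page: str, subq: str, passages: List[str]) -> str:
    # Fill one placeholder per round, as a stand-in for the model's page update.
    return page.replace("[to be filled]", passages[0], 1)


demo_page = "# Title\n## A\n[to be filled]\n## B\n[to be filled]"
final_page = research_loop("q", "plan", demo_page, fake_gen_subq, fake_retrieve, fake_fill)
print(final_page)
```

With the fakes above, the loop stops after two rounds because both placeholders have been replaced. A fill step that never removes the placeholder would instead stop at `max_iters`.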

Step 2: Implement Necessary Tools

Step 2.1: Implement Functions Used by the Prompt Server

Prompt templates are located in the prompt/ directory, and the function implementations are in servers/prompt/src/prompt.py. Each function is bound to its own template file, and each template path must be registered under a separate key in parameter.yaml to avoid conflicts:

servers/prompt/src/prompt.py

@app.prompt(output="q_ls,plan_ls,webnote_init_page_template->prompt_ls")
def webnote_init_page(
    q_ls: List[str],
    plan_ls: List[str],
    template: str | Path,
) -> List[PromptMessage]:
    template: Template = load_prompt_template(template)
    all_prompts = []
    for q, plan in zip(q_ls, plan_ls):
        p = template.render(question=q, plan=plan)
        all_prompts.append(p)
    return all_prompts

@app.prompt(output="q_ls,webnote_gen_plan_template->prompt_ls")
def webnote_gen_plan(
    q_ls: List[str],
    template: str | Path,
) -> List[PromptMessage]:
    template: Template = load_prompt_template(template)
    all_prompts = []
    for q in q_ls:
        p = template.render(question=q)
        all_prompts.append(p)
    return all_prompts

@app.prompt(output="q_ls,plan_ls,page_ls,webnote_gen_subq_template->prompt_ls")
def webnote_gen_subq(
    q_ls: List[str],
    plan_ls: List[str],
    page_ls: List[str],
    template: str | Path,
) -> List[PromptMessage]:
    template: Template = load_prompt_template(template)
    all_prompts = []
    for q, plan, page in zip(q_ls, plan_ls, page_ls):
        p = template.render(question=q, plan=plan, page=page)
        all_prompts.append(p)
    return all_prompts

@app.prompt(output="q_ls,plan_ls,page_ls,subq_ls,psg_ls,webnote_fill_page_template->prompt_ls")
def webnote_fill_page(
    q_ls: List[str],
    plan_ls: List[str],
    page_ls: List[str],
    subq_ls: List[str],
    psg_ls: List[Any],
    template: str | Path,
) -> List[PromptMessage]:
    template: Template = load_prompt_template(template)
    all_prompts = []
    for q, plan, page, subq, psg in zip(q_ls, plan_ls, page_ls, subq_ls, psg_ls):
        p = template.render(question=q, plan=plan, page=page, subq=subq, psg=psg)
        all_prompts.append(p)
    return all_prompts

@app.prompt(output="q_ls,plan_ls,page_ls,webnote_gen_answer_template->prompt_ls")
def webnote_gen_answer(
    q_ls: List[str],
    plan_ls: List[str],
    page_ls: List[str],
    template: str | Path,
) -> List[PromptMessage]:
    template: Template = load_prompt_template(template)
    all_prompts = []
    for q, plan, page in zip(q_ls, plan_ls, page_ls):
        p = template.render(question=q, plan=plan, page=page)
        all_prompts.append(p)
    return all_prompts

The complete list of functions is as follows:

webnote_gen_plan: Generate a structured Plan based on the question
webnote_init_page: Construct the initial page based on the Plan
webnote_gen_subq: Generate sub-queries
webnote_fill_page: Fill page content using sub-query results
webnote_gen_answer: Integrate page information to generate the final answer

Each function corresponds to a .jinja template. Additionally, ensure the following parameter configurations are added in servers/prompt/parameter.yaml to explicitly specify the paths of each template:

servers/prompt/parameter.yaml

template: prompt/qa_boxed.jinja
webnote_gen_plan_template: prompt/webnote_gen_plan.jinja
webnote_init_page_template: prompt/webnote_init_page.jinja
webnote_gen_subq_template: prompt/webnote_gen_subq.jinja
webnote_fill_page_template: prompt/webnote_fill_page.jinja
webnote_gen_answer_template: prompt/webnote_gen_answer.jinja
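
To see what one of these functions produces, the sketch below reproduces the render loop of webnote_init_page so that it runs without UltraRAG. It makes two loud assumptions: Python's built-in `string.Template` stands in for the Jinja template, and the template text is invented for illustration (the real wording lives in prompt/webnote_init_page.jinja). `render_init_page_prompts` is a hypothetical name.

```python
from string import Template
from typing import List

# Invented template text; string.Template uses $name where Jinja uses {{ name }}.
INIT_PAGE_TEMPLATE = Template(
    "Question: $question\n"
    "Plan: $plan\n"
    "Write an initial page with a [to be filled] placeholder under each section."
)


def render_init_page_prompts(q_ls: List[str], plan_ls: List[str]) -> List[str]:
    """Render one prompt per (question, plan) pair."""
    if len(q_ls) != len(plan_ls):
        # zip() would silently drop the unmatched tail, so fail loudly instead.
        raise ValueError("q_ls and plan_ls must have the same length")
    return [
        INIT_PAGE_TEMPLATE.substitute(question=q, plan=plan)
        for q, plan in zip(q_ls, plan_ls)
    ]


prompts = render_init_page_prompts(["Who sang X?"], ['{"mainTitle": "X"}'])
print(prompts[0])
```

The explicit length check is a design choice of this sketch. The functions shown earlier rely on `zip`, which stops at the shortest list, so mismatched inputs would yield fewer prompts without raising an error.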

Step 2.2: Implement Router Server

The router determines whether each page has been fully filled. If a page still contains the [to be filled] placeholder, it is marked as incomplete and the loop continues; otherwise, it is marked as complete and the iteration ends.

servers/router/src/router.py

@app.tool(output="page_ls->page_ls")
def webnote_check_page(page_ls: List[str]) -> Dict[str, List[Dict[str, str]]]:
    """Check if the page is complete or incomplete.
    Args:
        page_ls (list): List of pages to check.
    Returns:
        dict: Dictionary containing the list of pages with their states.
    """
    page_ls = [
        {
            "data": page,
            "state": "incomplete" if "to be filled" in page.lower() else "complete",
        }
        for page in page_ls
    ]
    return {"page_ls": page_ls}
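
The check can be exercised on its own. The following sketch copies the same logic into a plain function without the @app.tool decorator, so it runs outside the server; `check_page_states` is a hypothetical name used only here.

```python
from typing import Dict, List


def check_page_states(page_ls: List[str]) -> Dict[str, List[Dict[str, str]]]:
    """Same logic as webnote_check_page, minus the server decorator."""
    return {
        "page_ls": [
            {
                "data": page,
                "state": "incomplete" if "to be filled" in page.lower() else "complete",
            }
            for page in page_ls
        ]
    }


result = check_page_states([
    "## Overview\n[to be filled]",
    "## Overview\nThe song was released as a single.",
    "## Overview\n[To Be Filled]",
])
states = [item["state"] for item in result["page_ls"]]
print(states)  # ['incomplete', 'complete', 'incomplete']
```

Because the match is a case-insensitive substring test, a page whose prose happens to contain the phrase "to be filled" is also reported as incomplete.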

Step 3: Write Pipeline Configuration File

Define the module structure and execution flow in examples/webnote.yaml as follows:

examples/webnote.yaml

# WebNote demo

# MCP Server
servers:
  benchmark: servers/benchmark
  generation: servers/generation
  retriever: servers/retriever
  prompt: servers/prompt
  evaluation: servers/evaluation
  custom: servers/custom
  router: servers/router

# MCP Client Pipeline
pipeline:
# Load dataset
- benchmark.get_data
# Initialize retrieval service
- retriever.retriever_deploy_search

# Generate plan
- prompt.webnote_gen_plan
- generation.generate:
    output:
      ans_ls: plan_ls
# Initialize page
- prompt.webnote_init_page
- generation.generate:
    output:
      ans_ls: page_ls
# Loop: generate sub-questions, retrieve, progressively fill page
- loop:
    times: 10
    steps:
    # Trigger check to determine if page is complete
    - branch:
        router:
        - router.webnote_check_page
        branches:
          # If page is incomplete, continue
          incomplete:
          # Generate sub-questions
          - prompt.webnote_gen_subq
          - generation.generate:
              output:
                ans_ls: subq_ls
          # Retrieve answers
          - retriever.retriever_deploy_search:
              input:
                query_list: subq_ls
              output:
                ret_psg: psg_ls
          # Fill page
          - prompt.webnote_fill_page
          - generation.generate:
              output:
                ans_ls: page_ls
          # If page is complete, end
          complete: []
# Generate final answer
- prompt.webnote_gen_answer
- generation.generate
# Evaluate results
- custom.output_extract_from_boxed
- evaluation.evaluate
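
The step name custom.output_extract_from_boxed suggests that the final answer is expected inside a \boxed{...} wrapper before evaluation. That is an assumption drawn from the name, not a description of UltraRAG's implementation. Under that assumption, a minimal extraction sketch looks as follows; `extract_boxed` is a hypothetical helper.

```python
from typing import Optional


def extract_boxed(text: str) -> Optional[str]:
    """Return the contents of the last \\boxed{...} in text, or None if absent."""
    marker = "\\boxed{"
    start = text.rfind(marker)
    if start == -1:
        return None
    depth = 1  # we are inside the brace opened by the marker
    chars = []
    for ch in text[start + len(marker):]:
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth == 0:
                return "".join(chars)
        chars.append(ch)
    return None  # braces never closed


print(extract_boxed("The answer is \\boxed{ANSWER}."))
print(extract_boxed("No box here."))
```

Tracking brace depth keeps nested braces inside the box intact, which a simple non-greedy regular expression would cut short.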

Step 4: Configure Pipeline Parameters

Run the following command to build the parameter template:

ultrarag build examples/webnote.yaml

The resulting webnote_parameter.yaml has the following format; adjust the values to match your environment:

examples/webnote_parameter.yaml

benchmark:
  benchmark:
    key_map:
      gt_ls: golden_answers
      q_ls: question
    limit: 2
    name: asqa
    path: data/sample_asqa_5.jsonl
custom: {}
evaluation:
  metrics:
  - acc
  - f1
  - em
  - coverem
  - stringem
  - rouge-1
  - rouge-2
  - rouge-l
  save_path: output/nq.json
generation:
  base_url: http://localhost:8000/v1
  model_name: openbmb/MiniCPM4-8B
  sampling_params:
    extra_body:
      chat_template_kwargs:
        enable_thinking: false
      include_stop_str_in_output: true
      top_k: 20
    max_tokens: 2048
    temperature: 0.7
    top_p: 0.8
prompt:
  webnote_fill_page_template: prompt/webnote_fill_page.jinja
  webnote_gen_answer_template: prompt/webnote_gen_answer.jinja
  webnote_gen_plan_template: prompt/webnote_gen_plan.jinja
  webnote_gen_subq_template: prompt/webnote_gen_subq.jinja
  webnote_init_page_template: prompt/webnote_init_page.jinja
retriever:
  query_instruction: 'Query: '
  retriever_url: http://localhost:8080
  top_k: 5

Step 5: Run Your Inference Pipeline!

Once everything is ready, execute the following command to start the inference process:

ultrarag run examples/webnote.yaml
