We recorded an explanatory video for this Demo: 📺 bilibili.

What is DeepResearch?

Deep Research (also known as Agentic Deep Research) refers to an intelligent research agent in which a Large Language Model (LLM) collaborates with tools (search, browser, code execution, memory storage, etc.) to complete complex research tasks in a closed loop of “multi-turn reasoning → retrieval → verification → fusion”. Unlike single-shot RAG (Retrieval-Augmented Generation), Deep Research works more like a human expert: it first makes a plan, then continually explores, adjusts direction, and verifies information, finally producing a well-structured, sourced report.
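To make the closed loop concrete, here is a minimal Python sketch of the plan → retrieve → verify → fuse cycle. All function names here (`make_plan`, `retrieve`, `fuse`, `deep_research`) are illustrative stubs, not UltraRAG APIs:

```python
# Minimal sketch of an agentic deep-research loop.
# All functions are illustrative stubs, not UltraRAG APIs.

def make_plan(question):
    # In practice: ask the LLM for a research plan.
    return [f"background of {question}", f"key facts about {question}"]

def retrieve(sub_question):
    # In practice: call a web-search tool such as Tavily.
    return [f"snippet about {sub_question}"]

def fuse(notes, passages):
    # In practice: ask the LLM to merge verified passages into the notes.
    return notes + passages

def deep_research(question, max_rounds=3):
    notes = []
    plan = make_plan(question)
    for sub_q in plan[:max_rounds]:            # multi-turn reasoning
        passages = retrieve(sub_q)             # retrieval
        passages = [p for p in passages if p]  # (trivial) verification
        notes = fuse(notes, passages)          # fusion
    return notes

report = deep_research("Teyvat Continent")
```

A real agent replaces each stub with an LLM or tool call, but the control flow is the same: the loop runs until the plan is exhausted or the notes are judged complete.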

Prerequisites

In this tutorial, we will build the example on the UltraRAG framework. Since most users may not have access to GPU servers, we implement the entire process on a MacBook Air (M2) to keep the environment lightweight and easy to reproduce.

API Preparation

  • Retrieval API: We use Tavily Web Search. You can get 1000 free calls upon initial registration.
  • LLM API: You can choose any large model service according to your habits. In this tutorial, we use gpt-5-nano as an example.

API Settings

We provide two ways to pass the API Key: environment variables and explicit parameters. Environment variables are recommended because they are safer and avoid leaking the API Key in logs. In the UltraRAG root directory, rename the template file .env.dev to .env and fill in your keys, for example:
LLM_API_KEY="your llm key"
TAVILY_API_KEY="your retriever key"
UltraRAG will automatically read this file and load relevant configurations at startup.
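UltraRAG's loader handles this for you, but for illustration, reading a .env file boils down to something like the following minimal sketch (a toy parser, not UltraRAG's actual implementation):

```python
import os

def load_env(path=".env"):
    """Minimal .env loader: puts KEY="value" lines into os.environ."""
    with open(path) as f:
        for line in f:
            line = line.strip()
            # Skip blanks, comments, and malformed lines.
            if not line or line.startswith("#") or "=" not in line:
                continue
            key, _, value = line.partition("=")
            os.environ[key.strip()] = value.strip().strip('"')

# After loading, the keys are visible to the whole process, e.g.
# os.environ["LLM_API_KEY"] and os.environ["TAVILY_API_KEY"].
```

In practice a library such as python-dotenv does this more robustly (quoting, interpolation, export prefixes), which is why frameworks read the file at startup instead of asking you to pass keys on the command line.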

Pipeline Introduction

In this example, we will implement a lightweight Deep Research Pipeline. It has the following basic functions:
  • Plan Formulation: The model first formulates a solution plan based on the user’s question;
  • Sub-question Generation and Retrieval: Decompose big questions into retrievable sub-questions and call Web search tools to obtain relevant information;
  • Report Organization and Filling: Gradually improve the content of the research report;
  • Reasoning and Final Generation: After the report is completed, the model gives the final answer.
The flow chart is shown below. The pipeline is divided into two stages:
  1. Initialization Phase: The model generates a plan based on the user’s question and constructs an initial report page accordingly.
  2. Iterative Filling Phase:
  • The system checks whether the current report page is fully filled; the criterion is whether the string “to be filled” still exists in the page.
  • If the report is not yet complete, the model generates a new sub-question from the user’s question, the plan, and the current page, and triggers Web retrieval.
  • The retrieved documents are used to update the page, and the pipeline enters the next round of checking.
  • This process iterates until the page is filled.
Finally, the model generates a complete answer based on the user’s question and the final report page. The code implementation of this example is very concise, relying mainly on custom extensions of the router and prompt tools. Interested users can view the source code directly. The complete pipeline definition follows:
examples/webnote_websearch.yaml
# MCP Server
servers:
  benchmark: servers/benchmark
  generation: servers/generation
  retriever: servers/retriever
  prompt: servers/prompt
  router: servers/router

# MCP Client Pipeline
pipeline:
- benchmark.get_data
- generation.generation_init
- prompt.webnote_gen_plan
- generation.generate:
    output:
      ans_ls: plan_ls
- prompt.webnote_init_page
- generation.generate:
    output:
      ans_ls: page_ls
- loop:
    times: 10
    steps:
    - branch:
        router:
        - router.webnote_check_page
        branches:
          incomplete:
          - prompt.webnote_gen_subq
          - generation.generate:
              output:
                ans_ls: subq_ls
          - retriever.retriever_tavily_search:
              input:
                query_list: subq_ls
              output:
                ret_psg: psg_ls
          - prompt.webnote_fill_page
          - generation.generate:
              output:
                ans_ls: page_ls
          complete: []
- prompt.webnote_gen_answer
- generation.generate
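The branch step above hinges on a completeness check. Here is a minimal Python sketch of what `router.webnote_check_page` decides, together with the loop it drives. The function name mirrors the YAML, but the body is an assumption based on the “to be filled” criterion described earlier, not UltraRAG's actual source:

```python
def webnote_check_page(page: str) -> str:
    """Route to 'incomplete' while placeholders remain, else 'complete'."""
    return "incomplete" if "to be filled" in page else "complete"

def fill_page(page, fill_one_slot, max_rounds=10):
    """Sketch of the loop: keep filling until complete or rounds run out."""
    for _ in range(max_rounds):
        if webnote_check_page(page) == "complete":
            break
        # In the real pipeline: generate a sub-question, run Tavily
        # search, and prompt the LLM to rewrite the page.
        page = fill_one_slot(page)
    return page
```

Note the `times: 10` bound in the YAML plays the same role as `max_rounds` here: it guarantees termination even if the model never fills every placeholder.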

Run

Construct Question Data

First, create a new file named sample_light_ds.jsonl under the data folder and write the questions you want to research. For example:
data/sample_light_ds.jsonl
{"id": 0, "question": "Introduce Teyvat Continent", "golden_answers": [], "meta_data": {}}
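You can also generate this file programmatically. A quick Python sketch, following the record schema of the example line above (one JSON object per line, JSONL style):

```python
import json
from pathlib import Path

# Questions you want the pipeline to research.
questions = ["Introduce Teyvat Continent"]

out = Path("data/sample_light_ds.jsonl")
out.parent.mkdir(parents=True, exist_ok=True)
with out.open("w", encoding="utf-8") as f:
    for i, q in enumerate(questions):
        record = {"id": i, "question": q, "golden_answers": [], "meta_data": {}}
        # ensure_ascii=False keeps non-ASCII questions readable in the file.
        f.write(json.dumps(record, ensure_ascii=False) + "\n")
```

This is handy when batching many research questions: just extend the `questions` list.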

Construct Parameter Configuration File

Execute the following command to generate the parameter file corresponding to the pipeline:
ultrarag build examples/webnote_websearch.yaml
Modify parameters according to actual conditions, for example:
examples/parameter/webnote_websearch_parameter.yaml
benchmark:
  benchmark:
    key_map:
      gt_ls: golden_answers
      q_ls: question
    limit: -1
    name: ds
    path: data/sample_light_ds.jsonl
    seed: 42
    shuffle: false
generation:
  backend: openai
  backend_configs:
    hf:
      batch_size: 8
      gpu_ids: 2,3
      model_name_or_path: openbmb/MiniCPM4-8B
      trust_remote_code: true
    openai:
      api_key: abc
      base_delay: 1.0
      base_url: https://api.openai.com/v1
      concurrency: 8
      model_name: gpt-5-nano
      retries: 3
    vllm:
      dtype: auto
      gpu_ids: 2,3
      gpu_memory_utilization: 0.9
      model_name_or_path: openbmb/MiniCPM4-8B
      trust_remote_code: true
  extra_params:
    chat_template_kwargs:
      enable_thinking: false
  sampling_params:
    max_tokens: 2048
    temperature: 0.7
    top_p: 0.8
  system_prompt: ''
prompt:
  webnote_fill_page_template: prompt/webnote_fill_page.jinja
  webnote_gen_answer_template: prompt/webnote_gen_answer.jinja
  webnote_gen_plan_template: prompt/webnote_gen_plan.jinja
  webnote_gen_subq_template: prompt/webnote_gen_subq.jinja
  webnote_init_page_template: prompt/webnote_init_page.jinja
retriever:
  retrieve_thread_num: 1
  top_k: 5

Start

Before running, make sure your API Key is set (see API Settings above), then start the pipeline:
ultrarag examples/webnote_websearch.yaml
After the run finishes, you can visually inspect the generated content through the Case Study Viewer:
python ./script/case_study.py \
  --data output/memory_ds_light_deepresearch_20250909_152727.json   \
  --host 127.0.0.1 \
  --port 8070 \
  --title "Case Study Viewer"
This opens the result page in your browser, letting you analyze the pipeline’s execution process and generated content step by step.