We have recorded a tutorial video for this demo: 📺 bilibili.

What is Deep Research

Deep Research (also known as Agentic Deep Research) refers to the process in which large language models (LLMs) collaborate with tools such as search engines, browsers, code executors, and memory storage systems to perform complex tasks through a closed loop of multi-round reasoning → retrieval → verification → synthesis. Unlike the single-round retrieval of traditional RAG (Retrieval-Augmented Generation), Deep Research operates more like an expert's workflow: first formulating a plan, then continuously exploring, adjusting direction, and verifying information, and finally producing a complete, well-cited report.
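
Conceptually, the closed loop can be sketched in a few lines of Python. This is an illustrative pseudostructure only, not UltraRAG code; the llm and search callables are placeholders for model and tool calls:
# A minimal sketch of the Deep Research closed loop (names are placeholders).
def deep_research(question, llm, search, max_rounds=10):
    # 1) Plan first, like an expert would.
    plan = llm(f"Draft a step-by-step research plan for: {question}")
    notes = []
    for _ in range(max_rounds):
        # 2) Decide what to look up next, 3) retrieve, 4) verify and synthesize.
        subq = llm(f"Plan: {plan}\nNotes: {notes}\nWhat should be searched next?")
        docs = search(subq)
        notes.append(llm(f"Verify and summarize {docs} with respect to {subq}"))
        if llm(f"Do {notes} suffice to answer {question}? Answer yes or no.") == "yes":
            break
    # 5) Produce a complete, well-cited report.
    return llm(f"Write a cited report answering {question} using {notes}")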

Preparation

In this tutorial, we will implement a lightweight Deep Research pipeline based on the UltraRAG framework.
Considering that most users may not have access to high-end servers, we demonstrate the entire process on a MacBook Air (M2), ensuring a lightweight and reproducible setup.

API Setup

  • Retrieval API: We use Tavily Web Search, which provides 1,000 free API calls upon registration.
  • LLM API: You can use any large model API of your choice. In this tutorial, we use gpt-5-nano as an example.

API Configuration

You can provide API keys in two ways: environment variables or explicit parameters.
We recommend using environment variables for better security and to avoid leaking keys in logs.
In the root directory of UR-2.0, rename the template file .env.dev to .env and fill in your API keys, for example:
LLM_API_KEY="your llm key"
TAVILY_API_KEY="your retriever key"
UR-2.0 will automatically load these configurations at startup.
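
If you want to double-check that the keys are actually visible before launching, a quick sanity check from Python works. This assumes the common python-dotenv package; UR-2.0 does its own loading, so this step is purely optional:
import os
from dotenv import load_dotenv  # pip install python-dotenv

load_dotenv()  # reads .env from the current working directory
for key in ("LLM_API_KEY", "TAVILY_API_KEY"):
    # Report only whether each key is set; never print the value itself.
    print(f"{key}: {'set' if os.getenv(key) else 'MISSING'}")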

Pipeline Overview

In this example, we will implement a lightweight Deep Research pipeline with the following core components:
  • Planning: The model first creates a step-by-step plan based on the user’s question.
  • Sub-question generation and retrieval: The main question is broken into smaller sub-questions, which are used for web retrieval via external tools.
  • Report organization and filling: The model iteratively refines and completes a structured research report.
  • Reasoning and final generation: Once the report is completed, the model generates the final answer.
Workflow diagram: The pipeline consists of two main stages:
  1. Initialization stage:
    The model generates a research plan based on the user question and initializes the report page.
  2. Iterative filling stage:
    • The system checks whether the report page is complete.
    • The completion criterion is whether the placeholder string "to be filled" still appears in the page (a minimal check is sketched after this list).
    • If the report is incomplete, the model generates a sub-question based on the user query, plan, and current page, then triggers web retrieval.
    • Retrieved documents are used to update the page, after which the process repeats.
    • The iteration continues until the report is fully filled.
Finally, the model uses the user query and the completed report page to produce the final answer. The implementation is concise, mainly relying on customized router and prompt tools.
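
The completion check reduces to a simple substring test; a minimal sketch, assuming the report page is held as a plain string:
def page_complete(page: str) -> bool:
    # The page template marks unfinished sections with "to be filled";
    # the report is done once no such marker remains.
    return "to be filled" not in page
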
Here is the full pipeline definition:
examples/light_deepresearch.yaml
# MCP Server
servers:
  benchmark: servers/benchmark
  generation: servers/generation
  retriever: servers/retriever
  prompt: servers/prompt
  router: servers/router

# MCP Client Pipeline
pipeline:
- benchmark.get_data
- generation.generation_init
- prompt.webnote_gen_plan
- generation.generate:
    output:
      ans_ls: plan_ls
- prompt.webnote_init_page
- generation.generate:
    output:
      ans_ls: page_ls
- loop:
    times: 10
    steps:
    - branch:
        router:
        - router.webnote_check_page
        branches:
          incomplete:
          - prompt.webnote_gen_subq
          - generation.generate:
              output:
                ans_ls: subq_ls
          - retriever.retriever_tavily_search:
              input:
                query_list: subq_ls
              output:
                ret_psg: psg_ls
          - prompt.webnote_fill_page
          - generation.generate:
              output:
                ans_ls: page_ls
          complete: []
- prompt.webnote_gen_answer
- generation.generate
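
Read imperatively, the loop/branch construct above corresponds roughly to the control flow below. This is an illustrative Python sketch, not UltraRAG's API; generate, tavily_search, and the webnote_* prompt builders stand in for the MCP tool calls named in the YAML:
# Rough imperative equivalent of the declarative pipeline above.
plan = generate(webnote_gen_plan(question))          # -> plan_ls
page = generate(webnote_init_page(question, plan))   # -> page_ls
for _ in range(10):                                  # loop: times: 10
    if page_complete(page):                          # router.webnote_check_page
        break                                        # the "complete" branch is empty: []
    subq = generate(webnote_gen_subq(question, plan, page))  # -> subq_ls
    psgs = tavily_search(subq)                       # retriever.retriever_tavily_search -> psg_ls
    page = generate(webnote_fill_page(page, psgs))   # prompt.webnote_fill_page -> page_ls
answer = generate(webnote_gen_answer(question, page))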

Running the Pipeline

Create Sample Question Data

Create a new file sample_light_ds.jsonl in the data folder and add your question, for example:
data/sample_light_ds.jsonl
{"id": 0, "question": "Introduce the continent of Teyvat", "golden_answers": [], "meta_data": {}}

Generate Parameter Configuration File

Run the following command to generate a parameter file corresponding to the pipeline:
ultrarag build examples/light_deepresearch.yaml
Modify the parameters as needed. For example (per the API Configuration section above, api_key can stay empty here, since UR-2.0 loads keys from the environment at startup):
benchmark:
  benchmark:
    key_map:
      gt_ls: golden_answers
      q_ls: question
    limit: -1
    name: ds
    path: data/sample_light_ds.jsonl
    seed: 42
    shuffle: false
generation:
  backend: openai
  backend_configs:
    openai:
      api_key: ''
      base_delay: 1.0
      base_url: https://api.openai.com/v1
      concurrency: 8
      model_name: gpt-5-nano
      retries: 3
  sampling_params:
    chat_template_kwargs:
      enable_thinking: false
    max_tokens: 2048
    temperature: 0.7
    top_p: 0.8
  system_prompt: ''
prompt:
  webnote_fill_page_template: prompt/webnote_fill_page.jinja
  webnote_gen_answer_template: prompt/webnote_gen_answer.jinja
  webnote_gen_plan_template: prompt/webnote_gen_plan.jinja
  webnote_gen_subq_template: prompt/webnote_gen_subq.jinja
  webnote_init_page_template: prompt/webnote_init_page.jinja
retriever:
  retrieve_thread_num: 1
  top_k: 5

Run the Pipeline

Before running, make sure your API keys are properly set, then launch the pipeline:
ultrarag run examples/light_deepresearch.yaml
After the run is complete, you can visualize the generated content using the Case Study Viewer (the output filename is timestamped, so substitute the file from your own run):
python ./script/case_study.py \
  --data output/memory_ds_light_deepresearch_20250909_152727.json   \
  --host 127.0.0.1 \
  --port 8070 \
  --title "Case Study Viewer"
You can then open http://127.0.0.1:8070 in your browser to visually analyze the pipeline execution process and the generated content.