We have recorded a tutorial video for this demo: 📺 bilibili.
What is Deep Research
Deep Research (also known as Agentic Deep Research) refers to the process in which large language models (LLMs) collaborate with tools such as search engines, browsers, code executors, and memory storage systems to perform complex tasks through a closed loop of multi-round reasoning → retrieval → verification → synthesis. Unlike the single-round retrieval of traditional RAG (Retrieval-Augmented Generation), Deep Research operates more like an expert's workflow: first formulating a plan, then continuously exploring, adjusting direction, verifying information, and finally producing a complete, well-cited report.
Preparation
In this tutorial, we will implement a lightweight Deep Research pipeline based on the UltraRAG framework. Considering that most users may not have access to high-end servers, we demonstrate the entire process on a MacBook Air (M2), ensuring a lightweight and reproducible setup.
API Setup
- Retrieval API: We use Tavily Web Search, which provides 1,000 free API calls upon registration.
- LLM API: You can use any large model API of your choice. In this tutorial, we use gpt-5-nano as an example.
API Configuration
You can provide API keys in two ways: environment variables or explicit parameters. We recommend using environment variables for better security and to avoid leaking keys in logs. In the root directory of UR-2.0, rename the template file .env.dev to .env and fill in your API keys, for example:
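A minimal sketch of what the filled-in .env might look like. The exact variable names are defined by the .env.dev template in your copy of UR-2.0, so the names below (LLM_API_KEY, LLM_BASE_URL, TAVILY_API_KEY) are illustrative assumptions; keep whatever names the template uses:

```
# .env — illustrative key names; follow the entries in your .env.dev template
LLM_API_KEY=sk-xxxxxxxxxxxxxxxx         # key for your LLM provider (e.g. the gpt-5-nano endpoint)
LLM_BASE_URL=https://api.openai.com/v1  # change if you use a different or compatible provider
TAVILY_API_KEY=tvly-xxxxxxxxxxxxxxxx    # Tavily Web Search key (1,000 free calls on registration)
```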
Pipeline Overview
In this example, we will implement a lightweight Deep Research pipeline with the following core components:
- Planning: The model first creates a step-by-step plan based on the user’s question.
- Sub-question generation and retrieval: The main question is broken into smaller sub-questions, which are used for web retrieval via external tools.
- Report organization and filling: The model iteratively refines and completes a structured research report.
- Reasoning and final generation: Once the report is completed, the model generates the final answer.

The pipeline proceeds in two stages:

- Initialization stage: the model generates a research plan based on the user question and initializes the report page.
- Iterative filling stage:
  - The system checks whether the report page is complete; the completion criterion is whether the string "to be filled" still exists in the page.
  - If the report is incomplete, the model generates a sub-question based on the user query, plan, and current page, then triggers web retrieval.
  - Retrieved documents are used to update the page, after which the process repeats.
  - The iteration continues until the report is fully filled. The control flow is sketched below.
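To make the loop concrete, here is a small self-contained Python sketch of that control flow. Every function in it is a simplified stand-in (not an UltraRAG API), and the report template is a toy example:

```python
# Toy, self-contained sketch of the Deep Research loop described above.
# Every function here is a simplified stand-in, not an UltraRAG API.

PLACEHOLDER = "to be filled"

def init_report_page(question: str) -> str:
    # Initialization stage: a report template whose sections still need content.
    return (f"# Report: {question}\n"
            f"## Background\n{PLACEHOLDER}\n"
            f"## Findings\n{PLACEHOLDER}\n")

def generate_subquestion(question: str, page: str) -> str:
    # Stand-in for the LLM: target the first section that is still unfilled.
    section = page.split(PLACEHOLDER)[0].rstrip().splitlines()[-1]
    return f"{question} ({section.strip('# ').lower()})"

def web_search(sub_question: str) -> list[str]:
    # Stand-in for Tavily retrieval: return fake snippets.
    return [f"snippet about: {sub_question}"]

def update_page(page: str, docs: list[str]) -> str:
    # Fill the first remaining placeholder with the retrieved evidence.
    return page.replace(PLACEHOLDER, "; ".join(docs), 1)

def light_deep_research(question: str, max_rounds: int = 10) -> str:
    page = init_report_page(question)        # initialization stage
    for _ in range(max_rounds):              # iterative filling stage
        if PLACEHOLDER not in page:          # completion check
            break
        sub_question = generate_subquestion(question, page)
        page = update_page(page, web_search(sub_question))
    return page                              # the final answer is then generated from this page

if __name__ == "__main__":
    print(light_deep_research("What is Deep Research and how does it differ from RAG?"))
```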
Here is the full pipeline definition:
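The exact YAML ships with the UltraRAG repository and is not reproduced here; the snippet below is only an illustrative sketch of the general shape of a UR-2.0 pipeline file. The server names, tool names, and loop syntax are assumptions and will differ from the real demo file:

```yaml
# Illustrative sketch only: server/tool names and loop syntax are placeholders,
# not the actual lightweight Deep Research demo shipped with UltraRAG.
servers:
  benchmark: servers/benchmark      # loads the question data
  websearcher: servers/websearcher  # Tavily-backed web retrieval
  generation: servers/generation    # LLM planning / writing steps

pipeline:
- benchmark.get_data                # read the user question(s)
- generation.make_plan              # initialization: research plan + report page
- loop:                             # iterative filling stage
    times: 10
    steps:
    - generation.gen_subquestion    # pick the next "to be filled" slot
    - websearcher.search            # retrieve web documents
    - generation.update_page        # fill the slot with retrieved evidence
- generation.final_answer           # reasoning and final generation
```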
Running the Pipeline
Create Sample Question Data
Create a new file sample_light_ds.jsonl in the data folder and add your question, for example:
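A one-line example; the exact field names depend on the data loader used by the pipeline, so the id and question fields below are assumptions:

```json
{"id": 0, "question": "What are the main differences between Deep Research and traditional RAG, and when is each preferable?"}
```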
Generate Parameter Configuration File
Run the following command to generate a parameter file corresponding to the pipeline:
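Assuming the pipeline YAML is saved under examples/ (the path and file name here are placeholders), the parameter file can be generated with UltraRAG's build command; if your installation exposes the CLI differently, follow the UR-2.0 README:

```bash
# Generates a parameter file corresponding to the pipeline definition
ultrarag build examples/light_deep_research.yaml
```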
Run the Pipeline
Before running, make sure your API keys are properly set:
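One way to do this, assuming the illustrative variable names from the .env example above, is to export the keys into the current shell before launching:

```bash
# Export the keys for the current shell session (names follow the illustrative .env above)
export LLM_API_KEY=sk-xxxxxxxxxxxxxxxx
export TAVILY_API_KEY=tvly-xxxxxxxxxxxxxxxx

# Or load everything from the .env file at once
set -a && source .env && set +a
```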