Introduction

Search-o1 is a framework that augments large reasoning models (LRMs) with an Agentic Retrieval-Augmented Generation (Agentic RAG) mechanism and a Reason-in-Documents module. When the model encounters a knowledge gap during reasoning, it actively retrieves external information, refines it into concise knowledge, and injects the result back into the reasoning chain, improving accuracy and robustness on complex tasks in science, mathematics, and programming.
Paper link: arXiv.

Workflow

The Search-o1 workflow proceeds as follows: the model begins by reasoning over the original question. When it detects missing or uncertain knowledge, it emits a search query to trigger retrieval. The retrieved documents are then refined to strip away irrelevant content, and the distilled knowledge is injected back into the reasoning chain. The model repeats this cycle until it decides it has enough information to answer the question.
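
In pseudocode, the loop looks roughly like the sketch below. Here generate, search, and refine are illustrative placeholders rather than UltraRAG or vLLM APIs; the special tokens match the stop strings configured in the parameter file further down.

# Conceptual sketch of the Search-o1 loop. generate(), search(), and
# refine() are hypothetical placeholders, not actual UltraRAG APIs.
BEGIN, END = "<|begin_search_query|>", "<|end_search_query|>"

def search_o1(question: str, max_turns: int = 10) -> str:
    context = question
    for _ in range(max_turns):
        # Generation halts either at a finished answer or right after
        # the model emits a search query (END is a stop string).
        output = generate(context, stop=[END])
        tail = output.rstrip()
        if not tail.endswith(END):
            return output                      # no knowledge gap left
        query = tail.split(BEGIN)[-1].removesuffix(END).strip()
        documents = search(query)              # Agentic RAG retrieval
        knowledge = refine(query, documents)   # Reason-in-Documents
        context = output + knowledge           # inject distilled knowledge
    return generate(context)                   # out of turns: force an answer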

Reproduction

Writing the Pipeline

Based on the logic above, the Pipeline can be defined as follows:
examples/search_o1.yaml
# MCP Servers
servers:
  benchmark: servers/benchmark
  generation: servers/generation
  retriever: servers/retriever
  prompt: servers/prompt
  evaluation: servers/evaluation
  router: servers/router
  custom: servers/custom

# MCP Client Pipeline
pipeline:
- benchmark.get_data
- retriever.retriever_init
- generation.generation_init
- prompt.search_o1_init
- generation.generate
- loop:
    times: 10
    steps:
    - branch:
        router:
        - router.search_o1_check
        branches:
          retrieve:
          - custom.search_o1_query_extract
          - retriever.retriever_search:
              input:
                query_list: extract_query_list
          - prompt.searcho1_reasoning_indocument
          - generation.generate
          - prompt.search_o1_insert
          - generation.generate
          stop: []
- custom.output_extract_from_boxed
- evaluation.evaluate
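
The loop runs for at most 10 iterations. On each pass, router.search_o1_check inspects the latest generation: if it ends with a search query, the retrieve branch extracts the query, searches the corpus, refines the retrieved documents with the reasoning-in-documents prompt, and inserts the distilled knowledge back before generating again; otherwise the empty stop branch runs no further steps, ending the retrieval phase. One plausible implementation of the check and extraction steps (not UltraRAG's actual source) looks like this:

import re

# Illustrative sketches of router.search_o1_check and
# custom.search_o1_query_extract; not UltraRAG's actual code.
QUERY_RE = re.compile(
    r"<\|begin_search_query\|>(.*?)<\|end_search_query\|>", re.DOTALL
)

def search_o1_check(answer: str) -> str:
    """Route to 'retrieve' while the model is still asking for searches."""
    return "retrieve" if answer.rstrip().endswith("<|end_search_query|>") else "stop"

def search_o1_query_extract(ans_ls: list[str]) -> list[str]:
    """Pull the most recent search query out of each generation."""
    queries = []
    for answer in ans_ls:
        matches = QUERY_RE.findall(answer)
        queries.append(matches[-1].strip() if matches else "")
    return queries

The extracted queries are then handed to retriever.retriever_search through the query_list: extract_query_list input mapping in the pipeline above.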

Build the Pipeline file

ultrarag build examples/search_o1.yaml
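
If the build succeeds, UltraRAG should also generate the corresponding parameter file (examples/parameters/search_o1_parameter.yaml), which the next step customizes.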

Modify the parameter file

examples/parameters/search_o1_parameter.yaml
benchmark:
  benchmark:
    key_map:
      gt_ls: golden_answers
      q_ls: question
    limit: -1
    name: nq
    path: data/sample_nq_10.jsonl
    seed: 42
    shuffle: false
custom: {}
evaluation:
  metrics:
  - acc
  - f1
  - em
  - coverem
  - stringem
  - rouge-1
  - rouge-2
  - rouge-l
  save_path: output/evaluate_results.json
generation:
  backend: vllm
  backend_configs:
    hf:
      batch_size: 8
      gpu_ids: 2,3
      model_name_or_path: openbmb/MiniCPM4-8B
      trust_remote_code: true
    openai:
      api_key: ''
      base_delay: 1.0
      base_url: http://localhost:8000/v1
      concurrency: 8
      model_name: MiniCPM4-8B
      retries: 3
    vllm:
      dtype: auto
      gpu_ids: 4,5,6,7
      gpu_memory_utilization: 0.9
      model_name_or_path: Qwen/QwQ-32B  # Search-o1 uses QwQ-32B; the default is openbmb/MiniCPM4-8B
      trust_remote_code: true
  sampling_params:
    chat_template_kwargs:
      enable_thinking: false
    max_tokens: 32768  # raised from the default 2048 to leave room for long reasoning chains
    temperature: 0.7
    top_p: 0.8
    top_k: 20
    repetition_penalty: 1.05
    include_stop_str_in_output: true
    stop: [ "<|im_end|>", "<|end_search_query|>" ]
  system_prompt: ''
prompt:
  searcho1_reasoning_template: prompt/search_o1_reasoning.jinja
  searcho1_refine_template: prompt/search_o1_refinement.jinja
retriever:
  backend: sentence_transformers
  backend_configs:
    bm25:
      lang: en
      save_path: index/bm25
    infinity:
      bettertransformer: false
      device: cuda
      model_warmup: false
      pooling_method: auto
      trust_remote_code: true
    openai:
      api_key: ''
      base_url: https://api.openai.com/v1
      model_name: text-embedding-3-small
    sentence_transformers:
      device: cuda
      sentence_transformers_encode:
        encode_chunk_size: 10000
        normalize_embeddings: false
        psg_prompt_name: document
        psg_task: null
        q_prompt_name: query
        q_task: null
      trust_remote_code: true
  batch_size: 16
  corpus_path: data/corpus_example.jsonl
  gpu_ids: 0,1
  index_backend: faiss
  index_backend_configs:
    faiss:
      index_chunk_size: 50000
      index_path: index/index.index
      index_use_gpu: true
    milvus:
      collection_name: ultrarag_embeddings
      id_field_name: id
      index_chunk_size: 50000
      index_params:
        index_type: AUTOINDEX
        metric_type: IP
      metric_type: IP
      search_params:
        metric_type: IP
        params: {}
      token: null
      uri: index/milvus_demo.db
      vector_field_name: vector
  is_multimodal: false
  model_name_or_path: Qwen/Qwen3-Embedding-0.6B  # the default is openbmb/MiniCPM-Embedding-Light
  query_instruction: ''
  top_k: 5
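
Two details in this file matter for the pipeline above. First, the stop list includes <|end_search_query|> with include_stop_str_in_output: true, so generation halts immediately after the model emits a search query and the marker survives in the output for the router and query extractor to see. Second, the final custom.output_extract_from_boxed step pulls the answer out of a LaTeX \boxed{...} wrapper before evaluation. A minimal sketch of that extraction (illustrative, not UltraRAG's implementation):

# Hypothetical sketch of the final answer extraction performed by
# custom.output_extract_from_boxed; not UltraRAG's actual code.
def output_extract_from_boxed(text: str) -> str:
    """Return the content of the last \\boxed{...} in the model output,
    respecting nested braces; fall back to the raw text if none exists."""
    start = text.rfind(r"\boxed{")
    if start == -1:
        return text.strip()
    i = start + len(r"\boxed{")
    begin, depth = i, 0
    while i < len(text):
        if text[i] == "{":
            depth += 1
        elif text[i] == "}":
            if depth == 0:
                return text[begin:i].strip()
            depth -= 1
        i += 1
    return text[begin:].strip()  # unbalanced braces: best effort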

Run the Pipeline

ultrarag run examples/search_o1.yaml