Introduction

Search-o1 is a framework that augments large reasoning models (LRMs) with an Agentic Retrieval-Augmented Generation (Agentic RAG) mechanism and a Reason-in-Documents module. When the model encounters a knowledge gap during reasoning, it actively retrieves external information, refines it into concise knowledge, and injects the result back into the reasoning chain, improving accuracy and robustness on complex tasks in science, mathematics, and programming.
Paper link: arXiv.

Workflow

The Search-o1 workflow proceeds as follows: the model begins by reasoning over the original question. When it detects missing or uncertain knowledge, it emits a search query to trigger retrieval. The retrieved documents are then refined to strip away irrelevant content, and the distilled knowledge is injected back into the reasoning chain. The model repeats this cycle until it decides it has enough information to answer the question.
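
In pseudocode, the loop looks roughly like the sketch below. Here generate, search, and refine are illustrative placeholders rather than UltraRAG or vLLM APIs; the special tokens match the stop strings configured in the parameter file further down.

# Conceptual sketch of the Search-o1 loop. generate(), search(), and
# refine() are hypothetical placeholders, not actual UltraRAG APIs.
BEGIN, END = "<|begin_search_query|>", "<|end_search_query|>"

def search_o1(question: str, max_turns: int = 10) -> str:
    context = question
    for _ in range(max_turns):
        # Generation halts either at a finished answer or right after
        # the model emits a search query (END is a stop string).
        output = generate(context, stop=[END])
        tail = output.rstrip()
        if not tail.endswith(END):
            return output                      # no knowledge gap left
        query = tail.split(BEGIN)[-1].removesuffix(END).strip()
        documents = search(query)              # Agentic RAG retrieval
        knowledge = refine(query, documents)   # Reason-in-Documents
        context = output + knowledge           # inject distilled knowledge
    return generate(context)                   # out of turns: force an answer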

Reproduction

Writing the Pipeline

Based on the logic above, the Pipeline can be defined as follows:
examples/search_o1.yaml
# MCP Servers
servers:
  benchmark: servers/benchmark
  generation: servers/generation
  retriever: servers/retriever
  prompt: servers/prompt
  evaluation: servers/evaluation
  router: servers/router
  custom: servers/custom

# MCP Client Pipeline
pipeline:
- benchmark.get_data
- retriever.retriever_init
- generation.generation_init
- prompt.search_o1_init
- generation.generate
- loop:
    times: 10
    steps:
    - branch:
        router:
        - router.search_o1_check
        branches:
          retrieve:
          - custom.search_o1_query_extract
          - retriever.retriever_search:
              input:
                query_list: extract_query_list
          - prompt.searcho1_reasoning_indocument
          - generation.generate
          - prompt.search_o1_insert
          - generation.generate
          stop: []
- custom.output_extract_from_boxed
- evaluation.evaluate
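
The loop runs for at most 10 iterations. On each pass, router.search_o1_check inspects the latest generation: if it ends with a search query, the retrieve branch extracts the query, searches the corpus, refines the retrieved documents with the reasoning-in-documents prompt, and inserts the distilled knowledge back before generating again; otherwise the empty stop branch runs no further steps, ending the retrieval phase. One plausible implementation of the check and extraction steps (not UltraRAG's actual source) looks like this:

import re

# Illustrative sketches of router.search_o1_check and
# custom.search_o1_query_extract; not UltraRAG's actual code.
QUERY_RE = re.compile(
    r"<\|begin_search_query\|>(.*?)<\|end_search_query\|>", re.DOTALL
)

def search_o1_check(answer: str) -> str:
    """Route to 'retrieve' while the model is still asking for searches."""
    return "retrieve" if answer.rstrip().endswith("<|end_search_query|>") else "stop"

def search_o1_query_extract(ans_ls: list[str]) -> list[str]:
    """Pull the most recent search query out of each generation."""
    queries = []
    for answer in ans_ls:
        matches = QUERY_RE.findall(answer)
        queries.append(matches[-1].strip() if matches else "")
    return queries

The extracted queries are then handed to retriever.retriever_search through the query_list: extract_query_list input mapping in the pipeline above.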

Build the Pipeline file

ultrarag build examples/search_o1.yaml
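
If the build succeeds, UltraRAG should also generate the corresponding parameter file (examples/parameters/search_o1_parameter.yaml), which the next step customizes.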

Modify the parameter file

examples/parameters/search_o1_parameter.yaml
benchmark:
  benchmark:
    key_map:
      gt_ls: golden_answers
      q_ls: question
    limit: -1
    name: nq
    path: data/sample_nq_10.jsonl
    seed: 42
    shuffle: false
custom: {}
evaluation:
  metrics:
  - acc
  - f1
  - em
  - coverem
  - stringem
  - rouge-1
  - rouge-2
  - rouge-l
  save_path: output/evaluate_results.json
generation:
  backend: vllm
  backend_configs:
    hf:
      batch_size: 8
      gpu_ids: 2,3
      model_name_or_path: openbmb/MiniCPM4-8B
      trust_remote_code: true
    openai:
      api_key: ''
      base_delay: 1.0
      base_url: http://localhost:8000/v1
      concurrency: 8
      model_name: MiniCPM4-8B
      retries: 3
    vllm:
      dtype: auto
      gpu_ids: 4,5,6,7
      gpu_memory_utilization: 0.9
      model_name_or_path: Qwen/QwQ-32B  # Search-o1 uses QwQ-32B; the default is openbmb/MiniCPM4-8B
      trust_remote_code: true
  sampling_params:
    chat_template_kwargs:
      enable_thinking: false
    max_tokens: 32768  # raised from the default 2048 to leave room for long reasoning chains
    temperature: 0.7
    top_p: 0.8
    top_k: 20
    repetition_penalty: 1.05
    include_stop_str_in_output: true
    stop: [ "<|im_end|>", "<|end_search_query|>" ]
  system_prompt: ''
prompt:
  searcho1_reasoning_template: prompt/search_o1_reasoning.jinja
  searcho1_refine_template: prompt/search_o1_refinement.jinja
retriever:
  backend: sentence_transformers
  backend_configs:
    bm25:
      lang: en
      save_path: index/bm25
    infinity:
      bettertransformer: false
      device: cuda
      model_warmup: false
      pooling_method: auto
      trust_remote_code: true
    openai:
      api_key: ''
      base_url: https://api.openai.com/v1
      model_name: text-embedding-3-small
    sentence_transformers:
      device: cuda
      sentence_transformers_encode:
        encode_chunk_size: 10000
        normalize_embeddings: false
        psg_prompt_name: document
        psg_task: null
        q_prompt_name: query
        q_task: null
      trust_remote_code: true
  batch_size: 16
  corpus_path: data/corpus_example.jsonl
  gpu_ids: 0,1
  index_backend: faiss
  index_backend_configs:
    faiss:
      index_chunk_size: 50000
      index_path: index/index.index
      index_use_gpu: true
    milvus:
      collection_name: ultrarag_embeddings
      id_field_name: id
      index_chunk_size: 50000
      index_params:
        index_type: AUTOINDEX
        metric_type: IP
      metric_type: IP
      search_params:
        metric_type: IP
        params: {}
      token: null
      uri: index/milvus_demo.db
      vector_field_name: vector
  is_multimodal: false
  model_name_or_path: Qwen/Qwen3-Embedding-0.6B  # the default is openbmb/MiniCPM-Embedding-Light
  query_instruction: ''
  top_k: 5
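
Two details in this file matter for the pipeline above. First, the stop list includes <|end_search_query|> with include_stop_str_in_output: true, so generation halts immediately after the model emits a search query and the marker survives in the output for the router and query extractor to see. Second, the final custom.output_extract_from_boxed step pulls the answer out of a LaTeX \boxed{...} wrapper before evaluation. A minimal sketch of that extraction (illustrative, not UltraRAG's implementation):

# Hypothetical sketch of the final answer extraction performed by
# custom.output_extract_from_boxed; not UltraRAG's actual code.
def output_extract_from_boxed(text: str) -> str:
    """Return the content of the last \\boxed{...} in the model output,
    respecting nested braces; fall back to the raw text if none exists."""
    start = text.rfind(r"\boxed{")
    if start == -1:
        return text.strip()
    i = start + len(r"\boxed{")
    begin, depth = i, 0
    while i < len(text):
        if text[i] == "{":
            depth += 1
        elif text[i] == "}":
            if depth == 0:
                return text[begin:i].strip()
            depth -= 1
        i += 1
    return text[begin:].strip()  # unbalanced braces: best effort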

Run the Pipeline

ultrarag run examples/search_o1.yaml