Function

The Retriever Server is the core retrieval module in UltraRAG, integrating model loading, text encoding, index construction, and retrieval queries. It natively supports multiple backend interfaces, including Sentence-Transformers, Infinity, and OpenAI, and adapts flexibly to corpora of different scales and types to support large-scale vectorization and efficient document recall.

Usage Examples

Corpus Encoding and Indexing

The following example shows how to use the Retriever Server to perform encoding and index construction on a corpus.
examples/corpus_index.yaml
# MCP Server
servers:
  retriever: servers/retriever

# MCP Client Pipeline
pipeline:
- retriever.retriever_init
- retriever.retriever_embed
- retriever.retriever_index
Run the following command to compile the Pipeline:
ultrarag build examples/corpus_index.yaml
Modify the generated parameter file to match your setup. Two typical scenarios are shown below: Text Corpus Encoding and Image Corpus Encoding.
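Whichever scenario you choose, corpus_path must point to a JSONL file with one document per line. A minimal sketch of a single line, assuming the default id and contents field names used by the index settings below (check these against your own corpus):
{"id": "0", "contents": "Paris is the capital and largest city of France."}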
  1. Text Corpus Encoding
Example: Using Qwen3-Embedding-0.6B to vectorize text corpus.
examples/parameters/corpus_index_parameter.yaml
retriever:
  backend: sentence_transformers # using sentence_transformers as an example
  backend_configs:
    bm25:
      lang: en
      save_path: index/bm25
    infinity:
      bettertransformer: false
      model_warmup: false
      pooling_method: auto
      trust_remote_code: true
    openai:
      api_key: ''
      base_url: https://api.openai.com/v1
      model_name: text-embedding-3-small
    sentence_transformers:
      sentence_transformers_encode:
        encode_chunk_size: 256
        normalize_embeddings: false
        psg_prompt_name: document
        psg_task: null
        q_prompt_name: query
        q_task: null
      trust_remote_code: true
  batch_size: 16
  collection_name: wiki
  corpus_path: data/corpus_example.jsonl
  embedding_path: embedding/embedding.npy
  gpu_ids: '1'
  index_backend: faiss
  index_backend_configs:
    faiss:
      index_chunk_size: 10000
      index_path: index/index.index
      index_use_gpu: true
    milvus:
      id_field_name: id
      id_max_length: 64
      index_chunk_size: 1000
      index_params:
        index_type: AUTOINDEX
        metric_type: IP
      metric_type: IP
      search_params:
        metric_type: IP
        params: {}
      text_field_name: contents
      text_max_length: 60000
      token: null
      uri: index/milvus_demo.db
      vector_field_name: vector
  is_demo: false
  is_multimodal: false
  model_name_or_path: Qwen/Qwen3-Embedding-0.6B
  overwrite: false
  2. Image Corpus Encoding
Example: Using jinaai/jina-embeddings-v4 to vectorize image corpus.
examples/parameters/corpus_index_parameter.yaml
retriever:
  backend: sentence_transformers
  backend_configs:
    bm25:
      lang: en
      save_path: index/bm25
    infinity:
      bettertransformer: false
      model_warmup: false
      pooling_method: auto
      trust_remote_code: true
    openai:
      api_key: ''
      base_url: https://api.openai.com/v1
      model_name: text-embedding-3-small
    sentence_transformers:
      sentence_transformers_encode:
        encode_chunk_size: 256
        normalize_embeddings: false
        psg_prompt_name: null
        psg_task: retrieval
        q_prompt_name: query
        q_task: retrieval
      trust_remote_code: true
  batch_size: 16
  collection_name: wiki
  corpus_path: corpora/image.jsonl
  embedding_path: embedding/embedding.npy
  gpu_ids: 1
  index_backend: faiss
  index_backend_configs:
    faiss:
      index_chunk_size: 10000
      index_path: index/index.index
      index_use_gpu: true
    milvus:
      id_field_name: id
      id_max_length: 64
      index_chunk_size: 1000
      index_params:
        index_type: AUTOINDEX
        metric_type: IP
      metric_type: IP
      search_params:
        metric_type: IP
        params: {}
      text_field_name: contents
      text_max_length: 60000
      token: null
      uri: index/milvus_demo.db
      vector_field_name: vector
  is_demo: false
  is_multimodal: true
  model_name_or_path: jinaai/jina-embeddings-v4
  overwrite: false
Run the following command to execute this Pipeline:
ultrarag run examples/corpus_index.yaml
The encoding and indexing phase usually involves large-scale corpus processing and can take a long time. It is recommended to run the task in the background with screen or nohup, for example:
nohup ultrarag run examples/corpus_index.yaml > log.txt 2>&1 &
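Once the run finishes, you can sanity-check the output before indexing. A minimal sketch in Python, assuming the run wrote vectors to the embedding_path configured above:
import numpy as np

# Load the embeddings written during the encoding phase.
emb = np.load("embedding/embedding.npy")
print(emb.shape)  # (number of passages, embedding dimension)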

Vector Retrieval

The following example shows how to use the Retriever Server to perform vector retrieval tasks on the constructed index.
examples/corpus_search.yaml
# MCP Server
servers:
  benchmark: servers/benchmark
  retriever: servers/retriever

# MCP Client Pipeline
pipeline:
- benchmark.get_data
- retriever.retriever_init
- retriever.retriever_search

Run the following command to compile the Pipeline:
ultrarag build examples/corpus_search.yaml
Modify parameters:
examples/parameters/corpus_search_parameter.yaml
benchmark:
  benchmark:
    key_map:
      gt_ls: golden_answers
      q_ls: question
    limit: -1
    name: nq
    path: data/sample_nq_10.jsonl
    seed: 42
    shuffle: false
retriever:
  backend: sentence_transformers
  backend_configs:
    bm25:
      lang: en
      save_path: index/bm25
    infinity:
      bettertransformer: false
      model_warmup: false
      pooling_method: auto
      trust_remote_code: true
    openai:
      api_key: ''
      base_url: https://api.openai.com/v1
      model_name: text-embedding-3-small
    sentence_transformers:
      sentence_transformers_encode:
        encode_chunk_size: 256
        normalize_embeddings: false
        psg_prompt_name: document
        psg_task: null
        q_prompt_name: query
        q_task: null
      trust_remote_code: true
  batch_size: 16
  collection_name: wiki
  corpus_path: data/corpus_example.jsonl
  gpu_ids: '1'
  index_backend: faiss
  index_backend_configs:
    faiss:
      index_chunk_size: 10000
      index_path: index/index.index
      index_use_gpu: true
    milvus:
      id_field_name: id
      id_max_length: 64
      index_chunk_size: 1000
      index_params:
        index_type: AUTOINDEX
        metric_type: IP
      metric_type: IP
      search_params:
        metric_type: IP
        params: {}
      text_field_name: contents
      text_max_length: 60000
      token: null
      uri: index/milvus_demo.db
      vector_field_name: vector
  is_demo: false
  is_multimodal: false
  model_name_or_path: Qwen/Qwen3-Embedding-0.6B
  query_instruction: ''
  top_k: 5
Run Pipeline:
ultrarag run examples/corpus_search.yaml

BM25 Retrieval

In addition to vector retrieval, UltraRAG also ships with the classic BM25 text retrieval algorithm. BM25 is a sparse retrieval method that builds on Term Frequency-Inverse Document Frequency (TF-IDF) weighting and is often used for fast, lightweight lexical matching. In practice, BM25 complements dense retrieval, improving retrieval coverage and recall diversity.
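For reference, the standard BM25 scoring function for a query Q = (q_1, ..., q_n) against a document D is (a textbook form; the backend's parameter defaults may differ):

\mathrm{score}(D, Q) = \sum_{i=1}^{n} \mathrm{IDF}(q_i) \cdot \frac{f(q_i, D)\,(k_1 + 1)}{f(q_i, D) + k_1 \left(1 - b + b \cdot \frac{|D|}{\mathrm{avgdl}}\right)}

where f(q_i, D) is the frequency of q_i in D, |D| is the document length, avgdl is the average document length in the corpus, and k_1 and b are free parameters (commonly k_1 between 1.2 and 2.0, b = 0.75).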
Step 1: Build BM25 Index
Before using BM25 for retrieval, you need to tokenize the documents and build a sparse index.
examples/bm25_index.yaml
# MCP Server
servers:
  retriever: servers/retriever

# MCP Client Pipeline
pipeline:
- retriever.retriever_init
- retriever.bm25_index
Run the following command to compile the Pipeline:
ultrarag build examples/bm25_index.yaml
Modify parameters:
examples/parameters/bm25_index_parameter.yaml
retriever:
  backend: bm25
  backend_configs:
    bm25:
      lang: en
      save_path: index/bm25
    infinity:
      bettertransformer: false
      model_warmup: false
      pooling_method: auto
      trust_remote_code: true
    openai:
      api_key: ''
      base_url: https://api.openai.com/v1
      model_name: text-embedding-3-small
    sentence_transformers:
      sentence_transformers_encode:
        encode_chunk_size: 256
        normalize_embeddings: false
        psg_prompt_name: document
        psg_task: null
        q_prompt_name: query
        q_task: null
      trust_remote_code: true
  batch_size: 16
  collection_name: wiki
  corpus_path: data/corpus_example.jsonl
  gpu_ids: '1'
  index_backend: faiss
  index_backend_configs:
    faiss:
      index_chunk_size: 10000
      index_path: index/index.index
      index_use_gpu: true
    milvus:
      id_field_name: id
      id_max_length: 64
      index_chunk_size: 1000
      index_params:
        index_type: AUTOINDEX
        metric_type: IP
      metric_type: IP
      search_params:
        metric_type: IP
        params: {}
      text_field_name: contents
      text_max_length: 60000
      token: null
      uri: index/milvus_demo.db
      vector_field_name: vector
  is_demo: false
  is_multimodal: false
  model_name_or_path: openbmb/MiniCPM-Embedding-Light
  overwrite: false
Run:
ultrarag run examples/bm25_index.yaml
Step 2: Execute BM25 Retrieval
After the index has been built, you can perform BM25-based document retrieval.
examples/bm25_search.yaml
# MCP Server
servers:
  benchmark: servers/benchmark
  retriever: servers/retriever

# MCP Client Pipeline
pipeline:
- benchmark.get_data
- retriever.retriever_init
- retriever.bm25_search
Compile Pipeline:
ultrarag build examples/bm25_search.yaml
Modify parameters:
examples/parameters/bm25_search_parameter.yaml
benchmark:
  benchmark:
    key_map:
      gt_ls: golden_answers
      q_ls: question
    limit: -1
    name: nq
    path: data/sample_nq_10.jsonl
    seed: 42
    shuffle: false
retriever:
  backend: bm25
  backend_configs:
    bm25:
      lang: en
      save_path: index/bm25
    infinity:
      bettertransformer: false
      model_warmup: false
      pooling_method: auto
      trust_remote_code: true
    openai:
      api_key: ''
      base_url: https://api.openai.com/v1
      model_name: text-embedding-3-small
    sentence_transformers:
      sentence_transformers_encode:
        encode_chunk_size: 256
        normalize_embeddings: false
        psg_prompt_name: document
        psg_task: null
        q_prompt_name: query
        q_task: null
      trust_remote_code: true
  batch_size: 16
  collection_name: wiki
  corpus_path: data/corpus_example.jsonl
  gpu_ids: '1'
  index_backend: faiss
  index_backend_configs:
    faiss:
      index_chunk_size: 10000
      index_path: index/index.index
      index_use_gpu: true
    milvus:
      id_field_name: id
      id_max_length: 64
      index_chunk_size: 1000
      index_params:
        index_type: AUTOINDEX
        metric_type: IP
      metric_type: IP
      search_params:
        metric_type: IP
        params: {}
      text_field_name: contents
      text_max_length: 60000
      token: null
      uri: index/milvus_demo.db
      vector_field_name: vector
  is_demo: false
  is_multimodal: false
  model_name_or_path: openbmb/MiniCPM-Embedding-Light
  top_k: 5
Run the retrieval process:
ultrarag run examples/bm25_search.yaml

Hybrid Retrieval

In practice, a single retrieval method often struggles to balance recall and precision: BM25 excels at keyword matching, while vector retrieval has the advantage in semantic understanding. UltraRAG therefore supports fusing sparse retrieval (BM25) with dense retrieval, combining the strengths of both through a hybrid strategy (Hybrid Retrieval) to further improve retrieval diversity and robustness. The following example demonstrates how to run BM25 and vector retrieval in the same Pipeline and merge the results through a custom module.
Following this example, you can flexibly combine retrieval methods, such as pairing a local knowledge base with online Web retrieval, or fusing multimodal (text and image) retrieval results, to build a more powerful hybrid retrieval Pipeline.
examples/hybrid_search.yaml
# MCP Server
servers:
  benchmark: servers/benchmark
  dense: servers/retriever
  bm25: servers/retriever
  custom: servers/custom

# MCP Client Pipeline
pipeline:
- benchmark.get_data
- dense.retriever_init
- bm25.retriever_init
- dense.retriever_search:
    output:
      ret_psg: dense_psg
- bm25.bm25_search:
    output:
      ret_psg: sparse_psg
- custom.merge_passages:
    input:
      ret_psg: dense_psg
      temp_psg: sparse_psg
This Pipeline relies on the Parameter Renaming and Module Reuse mechanisms; see their documentation for detailed instructions.
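The merge logic lives in your own custom server, which UltraRAG does not prescribe. A minimal Python sketch of one possible merge_passages, assuming ret_psg and temp_psg are per-query lists of passage strings and that duplicates should be dropped (the names and return shape are illustrative, not UltraRAG's actual API):
def merge_passages(ret_psg, temp_psg):
    # Concatenate dense and sparse results per query, keeping first occurrences.
    merged = []
    for dense, sparse in zip(ret_psg, temp_psg):
        seen, fused = set(), []
        for psg in list(dense) + list(sparse):
            if psg not in seen:
                seen.add(psg)
                fused.append(psg)
        merged.append(fused)
    return {"ret_psg": merged}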
Run the following command to compile the Pipeline:
ultrarag build examples/hybrid_search.yaml
Modify parameters:
examples/parameters/hybrid_search_parameter.yaml
benchmark:
  benchmark:
    key_map:
      gt_ls: golden_answers
      q_ls: question
    limit: -1
    name: nq
    path: data/sample_nq_10.jsonl
    seed: 42
    shuffle: false
bm25:
  backend: bm25
  backend_configs:
    bm25:
      lang: en
      save_path: index/bm25
    infinity:
      bettertransformer: false
      model_warmup: false
      pooling_method: auto
      trust_remote_code: true
    openai:
      api_key: ''
      base_url: https://api.openai.com/v1
      model_name: text-embedding-3-small
    sentence_transformers:
      sentence_transformers_encode:
        encode_chunk_size: 256
        normalize_embeddings: false
        psg_prompt_name: document
        psg_task: null
        q_prompt_name: query
        q_task: null
      trust_remote_code: true
  batch_size: 16
  collection_name: wiki
  corpus_path: data/corpus_example.jsonl
  gpu_ids: '1'
  index_backend: faiss
  index_backend_configs:
    faiss:
      index_chunk_size: 10000
      index_path: index/index.index
      index_use_gpu: true
    milvus:
      id_field_name: id
      id_max_length: 64
      index_chunk_size: 1000
      index_params:
        index_type: AUTOINDEX
        metric_type: IP
      metric_type: IP
      search_params:
        metric_type: IP
        params: {}
      text_field_name: contents
      text_max_length: 60000
      token: null
      uri: index/milvus_demo.db
      vector_field_name: vector
  is_demo: false
  is_multimodal: false
  model_name_or_path: openbmb/MiniCPM-Embedding-Light
  top_k: 5
custom: {}
dense:
  backend: sentence_transformers
  backend_configs:
    bm25:
      lang: en
      save_path: index/bm25
    infinity:
      bettertransformer: false
      model_warmup: false
      pooling_method: auto
      trust_remote_code: true
    openai:
      api_key: ''
      base_url: https://api.openai.com/v1
      model_name: text-embedding-3-small
    sentence_transformers:
      sentence_transformers_encode:
        encode_chunk_size: 256
        normalize_embeddings: false
        psg_prompt_name: document
        psg_task: null
        q_prompt_name: query
        q_task: null
      trust_remote_code: true
  batch_size: 16
  collection_name: wiki
  corpus_path: data/corpus_example.jsonl
  gpu_ids: '1'
  index_backend: faiss
  index_backend_configs:
    faiss:
      index_chunk_size: 10000
      index_path: index/index.index
      index_use_gpu: true
    milvus:
      id_field_name: id
      id_max_length: 64
      index_chunk_size: 1000
      index_params:
        index_type: AUTOINDEX
        metric_type: IP
      metric_type: IP
      search_params:
        metric_type: IP
        params: {}
      text_field_name: contents
      text_max_length: 60000
      token: null
      uri: index/milvus_demo.db
      vector_field_name: vector
  is_demo: false
  is_multimodal: false
  model_name_or_path: Qwen/Qwen3-Embedding-0.6B
  query_instruction: ''
  top_k: 5
Run Hybrid Search Pipeline:
ultrarag run examples/hybrid_search.yaml

Deploy Retrieval Model

UltraRAG is fully compatible with the OpenAI API interface specification, so any Embedding model served through an OpenAI-compatible endpoint can be used directly, without extra adaptation or code changes. The following example shows how to deploy a local retrieval model using vLLM.
Step 1: Background Model Deployment
It is recommended to run the deployment inside a screen session so that logs and status can be checked at any time. Enter a new screen session:
screen -S retriever
Execute the following command to deploy the model (taking Qwen3-Embedding-0.6B as an example):
script/vllm_serve_emb.sh
CUDA_VISIBLE_DEVICES=2 python -m vllm.entrypoints.openai.api_server \
    --served-model-name qwen-embedding \
    --model Qwen/Qwen3-Embedding-0.6B \
    --trust-remote-code \
    --host 0.0.0.0 \
    --port 65504 \
    --task embed \
    --gpu-memory-utilization 0.2
Seeing output similar to the following indicates that the model service has started successfully:
(APIServer pid=2270761) INFO:     Started server process [2270761]
(APIServer pid=2270761) INFO:     Waiting for application startup.
(APIServer pid=2270761) INFO:     Application startup complete.
Press Ctrl+A, then D, to detach from the session while keeping the service running in the background. If you need to re-enter the session, run:
screen -r retriever
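Before wiring the service into a Pipeline, you can check that it responds. A minimal sketch using the openai Python client; for a local vLLM server any non-empty api_key works (here it matches the value configured below):
from openai import OpenAI

# Point the client at the local vLLM service started above.
client = OpenAI(base_url="http://127.0.0.1:65504/v1", api_key="abc")
resp = client.embeddings.create(model="qwen-embedding", input="hello world")
print(len(resp.data[0].embedding))  # embedding dimension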
Step 2: Modify Pipeline Parameters
Taking the corpus_search Pipeline as an example, simply switch the retrieval backend to openai and point base_url to the local vLLM service:
examples/parameters/corpus_search_parameter.yaml
benchmark:
  benchmark:
    key_map:
      gt_ls: golden_answers
      q_ls: question
    limit: -1
    name: nq
    path: data/sample_nq_10.jsonl
    seed: 42
    shuffle: false
retriever:
  backend: openai
  backend_configs:
    bm25:
      lang: en
      save_path: index/bm25
    infinity:
      bettertransformer: false
      model_warmup: false
      pooling_method: auto
      trust_remote_code: true
    openai:
      api_key: 'abc'
      base_url: http://127.0.0.1:65504/v1
      model_name: qwen-embedding
    sentence_transformers:
      sentence_transformers_encode:
        encode_chunk_size: 256
        normalize_embeddings: false
        psg_prompt_name: document
        psg_task: null
        q_prompt_name: query
        q_task: null
      trust_remote_code: true
  batch_size: 16
  collection_name: wiki
  corpus_path: data/corpus_example.jsonl
  gpu_ids: '1'
  index_backend: faiss
  index_backend_configs:
    faiss:
      index_chunk_size: 10000
      index_path: index/index.index
      index_use_gpu: true
    milvus:
      id_field_name: id
      id_max_length: 64
      index_chunk_size: 1000
      index_params:
        index_type: AUTOINDEX
        metric_type: IP
      metric_type: IP
      search_params:
        metric_type: IP
        params: {}
      text_field_name: contents
      text_max_length: 60000
      token: null
      uri: index/milvus_demo.db
      vector_field_name: vector
  is_demo: false
  is_multimodal: false
  model_name_or_path: openbmb/MiniCPM-Embedding-Light
  query_instruction: ''
  top_k: 5

After completing the configuration, run the Pipeline exactly as you would for ordinary vector retrieval.

Web Search API

UltraRAG natively integrates three mainstream Web retrieval APIs: Tavily, Exa, and GLM. These APIs can be used directly as the retrieval backend of the Retriever Server for online information retrieval and real-time knowledge enhancement.
Step 1: Configure API Key
Set the API key of the corresponding service before use. You can export the environment variable manually before running the Pipeline:
export TAVILY_API_KEY="your retriever key"
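To confirm the key works outside UltraRAG, a minimal sketch with the tavily-python SDK (an optional extra; the Pipeline itself goes through UltraRAG's own wrapper):
import os
from tavily import TavilyClient

# Reuse the key exported above.
client = TavilyClient(api_key=os.environ["TAVILY_API_KEY"])
print(client.search("What is UltraRAG?"))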
Alternatively, it is recommended to manage keys in a .env configuration file: in the UltraRAG root directory, rename the template file .env.dev to .env and fill in your keys, for example:
LLM_API_KEY=
RETRIEVER_API_KEY=
TAVILY_API_KEY=tvly-dev-yourapikeyhere
EXA_API_KEY=
ZHIPUAI_API_KEY=
UltraRAG automatically reads this file and loads the relevant configuration at startup.
Step 2: Web Search
The following example demonstrates how to use the Tavily API for Web retrieval:
examples/web_search.yaml
# MCP Server
servers:
  benchmark: servers/benchmark
  retriever: servers/retriever

# MCP Client Pipeline
pipeline:
- benchmark.get_data
- retriever.retriever_tavily_search
Compile Pipeline:
ultrarag build examples/web_search.yaml
Fill in the data path and retrieval parameters in the automatically generated parameter file:
examples/parameters/web_search_parameter.yaml
benchmark:
  benchmark:
    key_map:
      gt_ls: golden_answers
      q_ls: question
    limit: -1
    name: nq
    path: data/sample_nq_10.jsonl
    seed: 42
    shuffle: false
retriever:
  retrieve_thread_num: 1
  top_k: 5
Execute the following command to start the Web retrieval process:
ultrarag run examples/web_search.yaml
You can replace retriever_tavily_search with retriever_exa_search or retriever_zhipuai_search as the Web retrieval source.

Deploy Retriever Server

When testing multiple benchmarks or models against the same corpus, re-initializing the retriever server for every run reloads the large corpus and index each time, which is slow and inefficient. UltraRAG therefore provides a resident Retriever Server deployment script, which keeps the retriever running on CPU or GPU over long periods, avoiding repeated loading and speeding up experiments.
Step 1: Parameter Settings
As with the ordinary retriever server, first prepare the configuration file:
script/deploy_retriever_config.json
{
  "model_name_or_path": "openbmb/MiniCPM-Embedding-Light",
  "corpus_path": "data/corpus_example.jsonl",
  "collection_name": "ultrarag_embeddings",

  "backend": "sentence_transformers",
  "backend_configs": {
    "infinity": {
      "bettertransformer": false,
      "pooling_method": "auto",
      "model_warmup": false,
      "trust_remote_code": true
    },
    "sentence_transformers": {
      "trust_remote_code": true,
      "sentence_transformers_encode": {
        "normalize_embeddings": false,
        "encode_chunk_size": 10000,
        "q_prompt_name": "query",
        "psg_prompt_name": "document",
        "psg_task": null,
        "q_task": null
      }
    },
    "openai": {
      "model_name": "text-embedding-3-small",
      "base_url": "https://api.openai.com/v1",
      "api_key": ""
    },
    "bm25": {
      "lang": "en",
      "save_path": "index/bm25"
    }
  },

  "index_backend": "faiss",
  "index_backend_configs": {
    "faiss": {
      "index_use_gpu": true,
      "index_chunk_size": 50000,
      "index_path": "index/index.index"
    },
    "milvus": {
      "uri": "index/milvus_demo.db",
      "token": null,
      "id_field_name": "id",
      "vector_field_name": "vector",
      "text_field_name": "contents",
      "id_max_length": 64,
      "text_max_length": 60000,
      "metric_type": "IP",
      "index_params": {
        "index_type": "AUTOINDEX",
        "metric_type": "IP"
      },
      "search_params": {
        "metric_type": "IP",
        "params": {}
      },
      "index_chunk_size": 50000
    }
  },

  "batch_size": 16,
  "gpu_ids": "0,1",
  "is_multimodal": false,
  "is_demo": false
}
Step 2: Background Deployment
It is recommended to use screen so the retriever can run in the background over long periods and logs can be checked at any time. Create a screen session:
screen -S retriever
Start retriever server:
script/deploy_retriever_server.py
python ./script/deploy_retriever_server.py \
    --config_path script/deploy_retriever_config.json \
    --host 0.0.0.0 \
    --port 64501
After the Server starts, it resides in memory, so the corpus and index are not reloaded between runs.
Step 3: Online Retrieval
During online retrieval there is no need to re-initialize the retriever; simply specify the deployed address in the Pipeline:
examples/deploy_corpus_search.yaml
# Deploy Corpus Search Demo

# MCP Server
servers:
  benchmark: servers/benchmark
  retriever: servers/retriever

# MCP Client Pipeline
pipeline:
- benchmark.get_data
- retriever.retriever_deploy_search
Run the following command to compile the Pipeline:
ultrarag build examples/deploy_corpus_search.yaml
Modify parameters:
examples/parameters/deploy_corpus_search_parameter.yaml
benchmark:
  benchmark:
    key_map:
      gt_ls: golden_answers
      q_ls: question
    limit: -1
    name: nq
    path: data/sample_nq_10.jsonl
    seed: 42
    shuffle: false
retriever:
  query_instruction: ''
  retriever_url: http://127.0.0.1:64501
  top_k: 5
Run Pipeline:
ultrarag run examples/deploy_corpus_search.yaml