Overview
The Retriever Server is the core retrieval module in UR-2.0. It integrates model loading, text/image encoding, index construction, and search in a single component. It natively supports multiple backends (Sentence-Transformers, Infinity, OpenAI, and BM25), so it can flexibly adapt to corpora of different sizes and types and meet the needs of large-scale embedding and efficient document recall.
Usage Example
Corpus Encoding & Indexing
The following example shows how to use the Retriever Server to encode a corpus and build an index.

examples/corpus_index.yaml
# MCP Server
servers:
  retriever: servers/retriever
# MCP Client Pipeline
pipeline:
- retriever.retriever_init
- retriever.retriever_embed
- retriever.retriever_index
Build the Pipeline:
ultrarag build examples/corpus_index.yaml
Modify the parameter file as needed. Below are two common scenarios: text corpus encoding and image corpus encoding.
- Text corpus encoding
Example: use Qwen3-Embedding-0.6B to vectorize a text corpus.

examples/parameters/corpus_index_parameter.yaml
retriever:
  backend: sentence_transformers # Using ST as the example backend
  backend_configs:
    bm25:
      lang: en
    infinity:
      bettertransformer: false
      device: cuda
      model_warmup: false
      pooling_method: auto
      trust_remote_code: true
    openai:
      api_key: ''
      base_url: https://api.openai.com/v1
      model_name: text-embedding-3-small
    sentence_transformers:
      device: cuda
      sentence_transformers_encode:
        encode_chunk_size: 10000
        normalize_embeddings: false
        psg_prompt_name: document
        psg_task: null
        q_prompt_name: query
        q_task: null
      trust_remote_code: true
  batch_size: 16
  corpus_path: data/corpus_example.jsonl
  embedding_path: embedding/embedding.npy
  faiss_use_gpu: true
  gpu_ids: 0,1
  index_chunk_size: 50000
  index_path: index/index.index
  is_multimodal: false
  model_name_or_path: Qwen/Qwen3-Embedding-0.6B # changed from openbmb/MiniCPM-Embedding-Light
  overwrite: false
- Image corpus encoding
Example: use jinaai/jina-embeddings-v4 to vectorize an image corpus.

examples/parameters/corpus_index_parameter.yaml
retriever:
  backend: sentence_transformers
  backend_configs:
    bm25:
      lang: en
    infinity:
      bettertransformer: false
      device: cuda
      model_warmup: false
      pooling_method: auto
      trust_remote_code: true
    openai:
      api_key: ''
      base_url: https://api.openai.com/v1
      model_name: text-embedding-3-small
    sentence_transformers:
      device: cuda
      sentence_transformers_encode:
        encode_chunk_size: 10000
        normalize_embeddings: false
        psg_prompt_name: null # changed from document
        psg_task: retrieval # changed from null
        q_prompt_name: query
        q_task: retrieval # changed from null
      trust_remote_code: true
  batch_size: 16
  corpus_path: corpora/image.jsonl # changed from data/corpus_example.jsonl
  embedding_path: embedding/embedding.npy
  faiss_use_gpu: true
  gpu_ids: 1 # changed from 0,1
  index_chunk_size: 50000
  index_path: index/index.index
  is_multimodal: true # changed from false
  model_name_or_path: jinaai/jina-embeddings-v4 # changed from openbmb/MiniCPM-Embedding-Light
  overwrite: false
Run the Pipeline:
ultrarag run examples/corpus_index.yaml
Encoding and indexing often involve large corpora and can be time-consuming. It’s recommended to run the task in the background using screen or nohup, for example:
nohup ultrarag run examples/corpus_index.yaml > log.txt 2>&1 &
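After the run finishes, you can quickly sanity-check the generated artifacts. The short Python sketch below simply loads the embedding matrix and FAISS index using the embedding_path and index_path values from the parameter file above; it assumes numpy and faiss are installed, and the paths should be adjusted if you changed them:

import numpy as np
import faiss

# Paths taken from embedding_path / index_path in the parameter file above
embeddings = np.load("embedding/embedding.npy")
index = faiss.read_index("index/index.index")

print("embedding matrix shape:", embeddings.shape)  # (num_passages, dim)
print("vectors in index:", index.ntotal)            # should equal num_passages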
Vector Search
The following example shows how to use the Retriever Server to perform vector search on an existing index.

examples/corpus_search.yaml
# MCP Server
servers:
  benchmark: servers/benchmark
  retriever: servers/retriever
# MCP Client Pipeline
pipeline:
- benchmark.get_data
- retriever.retriever_init
- retriever.retriever_search
Build the Pipeline:
ultrarag build examples/corpus_search.yaml
Modify parameters:

examples/parameters/corpus_search_parameter.yaml
benchmark:
  benchmark:
    key_map:
      gt_ls: golden_answers
      q_ls: question
    limit: -1
    name: nq
    path: data/sample_nq_10.jsonl
    seed: 42
    shuffle: false
retriever:
  backend: sentence_transformers
  backend_configs:
    bm25:
      lang: en
    infinity:
      bettertransformer: false
      device: cuda
      model_warmup: false
      pooling_method: auto
      trust_remote_code: true
    openai:
      api_key: ''
      base_url: https://api.openai.com/v1
      model_name: text-embedding-3-small
    sentence_transformers:
      device: cuda
      sentence_transformers_encode:
        encode_chunk_size: 10000
        normalize_embeddings: false
        psg_prompt_name: document
        psg_task: null
        q_prompt_name: query
        q_task: null
      trust_remote_code: true
  batch_size: 16
  corpus_path: data/corpus_example.jsonl
  faiss_use_gpu: true
  gpu_ids: 0,1
  index_path: index/index.index
  is_multimodal: false
  model_name_or_path: Qwen/Qwen3-Embedding-0.6B # changed from openbmb/MiniCPM-Embedding-Light
  query_instruction: ''
  top_k: 5
Run the Pipeline:
ultrarag run examples/corpus_search.yaml
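For readers curious about what retriever_search does conceptually, the sketch below is a standalone illustration of dense retrieval over the index built earlier, not the Retriever Server's actual implementation: encode the query with the same model and look up nearest neighbors in FAISS. The model name, index path, and corpus path follow the parameter file above; the "contents" field name is an assumption about your corpus schema:

import json
import faiss
from sentence_transformers import SentenceTransformer

# Same embedding model and index/corpus paths as in the parameter file above
model = SentenceTransformer("Qwen/Qwen3-Embedding-0.6B", trust_remote_code=True)
index = faiss.read_index("index/index.index")
corpus = [json.loads(line) for line in open("data/corpus_example.jsonl", encoding="utf-8")]

# prompt_name="query" mirrors q_prompt_name in the config (the model must define such a prompt)
query_vec = model.encode(["who wrote the declaration of independence"], prompt_name="query")
scores, ids = index.search(query_vec, 5)  # top_k = 5
for rank, (doc_id, score) in enumerate(zip(ids[0], scores[0]), start=1):
    print(rank, round(float(score), 4), corpus[doc_id].get("contents"))  # "contents" field is assumed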
BM25 Retrieval
In addition to dense retrieval, UR-2.0 includes the classic BM25 sparse text retrieval algorithm. BM25 is an improvement over TF–IDF and is commonly used for fast, lightweight keyword-based matching. In practice, BM25 and dense retrieval are complementary: combining them often improves both coverage and recall diversity.
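To make the contrast concrete, here is a toy illustration of BM25 scoring using the rank_bm25 package (pip install rank-bm25). It only shows what sparse keyword matching looks like; it is not how the UR-2.0 bm25 backend is implemented internally:

from rank_bm25 import BM25Okapi

docs = [
    "Paris is the capital of France",
    "The Eiffel Tower is located in Paris",
    "BM25 is a bag-of-words ranking function",
]
# Whitespace tokenization is enough for this English toy example
bm25 = BM25Okapi([d.lower().split() for d in docs])
scores = bm25.get_scores("capital of france".split())  # one BM25 score per document
print(scores)  # the first document should score highest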
Step 1: Build the BM25 index
Before searching with BM25, you need to tokenize documents and build a sparse index.

examples/bm25_index.yaml
# MCP Server
servers:
  retriever: servers/retriever
# MCP Client Pipeline
pipeline:
- retriever.retriever_init
- retriever.bm25_index
Build the Pipeline:
ultrarag build examples/bm25_index.yaml
Modify parameters:

examples/parameters/bm25_index_parameter.yaml
retriever:
  backend: bm25 # changed from sentence_transformers
  backend_configs:
    bm25:
      lang: en
    infinity:
      bettertransformer: false
      device: cuda
      model_warmup: false
      pooling_method: auto
      trust_remote_code: true
    openai:
      api_key: ''
      base_url: https://api.openai.com/v1
      model_name: text-embedding-3-small
    sentence_transformers:
      device: cuda
      sentence_transformers_encode:
        encode_chunk_size: 10000
        normalize_embeddings: false
        psg_prompt_name: document
        psg_task: null
        q_prompt_name: query
        q_task: null
      trust_remote_code: true
  batch_size: 16
  corpus_path: data/corpus_example.jsonl
  faiss_use_gpu: true
  gpu_ids: 0,1
  index_path: index/index.index
  is_multimodal: false
  model_name_or_path: openbmb/MiniCPM-Embedding-Light
  overwrite: false
Run:
ultrarag run examples/bm25_index.yaml
Step 2: Execute BM25 search
Once the index is built, you can perform BM25-based document retrieval.

examples/bm25_search.yaml
# MCP Server
servers:
  benchmark: servers/benchmark
  retriever: servers/retriever
# MCP Client Pipeline
pipeline:
- benchmark.get_data
- retriever.retriever_init
- retriever.bm25_search
Build the Pipeline:
ultrarag build examples/bm25_search.yaml
Modify parameters:

examples/parameters/bm25_search_parameter.yaml
benchmark:
  benchmark:
    key_map:
      gt_ls: golden_answers
      q_ls: question
    limit: -1
    name: nq
    path: data/sample_nq_10.jsonl
    seed: 42
    shuffle: false
retriever:
  backend: bm25 # changed from sentence_transformers
  backend_configs:
    bm25:
      lang: en
    infinity:
      bettertransformer: false
      device: cuda
      model_warmup: false
      pooling_method: auto
      trust_remote_code: true
    openai:
      api_key: ''
      base_url: https://api.openai.com/v1
      model_name: text-embedding-3-small
    sentence_transformers:
      device: cuda
      sentence_transformers_encode:
        encode_chunk_size: 10000
        normalize_embeddings: false
        psg_prompt_name: document
        psg_task: null
        q_prompt_name: query
        q_task: null
      trust_remote_code: true
  batch_size: 16
  corpus_path: data/corpus_example.jsonl
  faiss_use_gpu: true
  gpu_ids: 0,1
  index_path: index/index.index
  is_multimodal: false
  model_name_or_path: openbmb/MiniCPM-Embedding-Light
  top_k: 5
Run the search flow:
ultrarag run examples/bm25_search.yaml
Hybrid Retrieval
In real applications, a single retrieval method often struggles to balance recall and precision. For instance, BM25 excels at keyword matching, while dense retrieval is stronger in semantic understanding. UR-2.0 supports Hybrid Retrieval that combines sparse (BM25) and dense methods to leverage both, improving diversity and robustness.
The example below shows how to run BM25 and dense retrieval in the same Pipeline and merge results via a custom module.
You can extend this example to arbitrarily combine retrieval methods—for instance, mixing local knowledge bases with online web search, or fusing text and image multimodal retrieval—to build a more powerful hybrid retrieval Pipeline.

examples/hybrid_search.yaml
# MCP Server
servers:
  benchmark: servers/benchmark
  dense: servers/retriever
  bm25: servers/retriever
  custom: servers/custom
# MCP Client Pipeline
pipeline:
- benchmark.get_data
- dense.retriever_init
- bm25.retriever_init
- dense.retriever_search:
    output:
      ret_psg: dense_psg
- bm25.bm25_search:
    output:
      ret_psg: sparse_psg
- custom.merge_passages:
    input:
      ret_psg: dense_psg
      temp_psg: sparse_psg
Build the Pipeline:
ultrarag build examples/hybrid_search.yaml
Modify parameters:

examples/parameters/hybrid_search_parameter.yaml
benchmark:
  benchmark:
    key_map:
      gt_ls: golden_answers
      q_ls: question
    limit: -1
    name: nq
    path: data/sample_nq_10.jsonl
    seed: 42
    shuffle: false
bm25:
  backend: bm25 # changed from sentence_transformers
  backend_configs:
    bm25:
      lang: en
    infinity:
      bettertransformer: false
      device: cuda
      model_warmup: false
      pooling_method: auto
      trust_remote_code: true
    openai:
      api_key: ''
      base_url: https://api.openai.com/v1
      model_name: text-embedding-3-small
    sentence_transformers:
      device: cuda
      sentence_transformers_encode:
        encode_chunk_size: 10000
        normalize_embeddings: false
        psg_prompt_name: document
        psg_task: null
        q_prompt_name: query
        q_task: null
      trust_remote_code: true
  batch_size: 16
  corpus_path: data/corpus_example.jsonl
  faiss_use_gpu: true
  gpu_ids: 0,1
  index_path: index/index.index
  is_multimodal: false
  model_name_or_path: openbmb/MiniCPM-Embedding-Light
  top_k: 5
custom: {}
dense:
  backend: sentence_transformers
  backend_configs:
    bm25:
      lang: en
    infinity:
      bettertransformer: false
      device: cuda
      model_warmup: false
      pooling_method: auto
      trust_remote_code: true
    openai:
      api_key: ''
      base_url: https://api.openai.com/v1
      model_name: text-embedding-3-small
    sentence_transformers:
      device: cuda
      sentence_transformers_encode:
        encode_chunk_size: 10000
        normalize_embeddings: false
        psg_prompt_name: document
        psg_task: null
        q_prompt_name: query
        q_task: null
      trust_remote_code: true
  batch_size: 16
  corpus_path: data/corpus_example.jsonl
  faiss_use_gpu: true
  gpu_ids: 0,1
  index_path: index1/index.index # changed from index/index.index
  is_multimodal: false
  model_name_or_path: Qwen/Qwen3-Embedding-0.6B # changed from openbmb/MiniCPM-Embedding-Light
  query_instruction: ''
  top_k: 5
Run the hybrid retrieval Pipeline:
ultrarag run examples/hybrid_search.yaml
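The custom.merge_passages step above is user code living in the custom server, so its exact tool signature is up to you. As a rough sketch of one possible merge strategy for a single query, the function below interleaves the dense and sparse result lists and removes duplicates; reciprocal rank fusion or score-based re-weighting are common alternatives:

from itertools import chain, zip_longest

def merge_passages(dense_psg: list[str], sparse_psg: list[str], top_k: int = 5) -> list[str]:
    # Round-robin over both ranked lists so neither source dominates the top ranks,
    # keeping the first occurrence of each passage.
    merged, seen = [], set()
    for psg in chain.from_iterable(zip_longest(dense_psg, sparse_psg)):
        if psg is not None and psg not in seen:
            seen.add(psg)
            merged.append(psg)
    return merged[:top_k]

print(merge_passages(["p1", "p2", "p3"], ["p2", "p4"], top_k=4))  # ['p1', 'p2', 'p4', 'p3']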
Deploy Retrieval Models
UR-2.0 is fully compatible with the OpenAI API interface specification, so any embedding model that conforms to this API can be integrated without extra adapters or code changes. The example below shows how to deploy a local embedding model with vLLM.
Step 1: Serve the model in the background
We recommend using screen to run in the background, so you can easily view logs and status.
Start a new screen session, for example:
screen -S vllm_embedding
Inside the session, launch the model (using Qwen3-Embedding-0.6B as an example):
CUDA_VISIBLE_DEVICES=1 python -m vllm.entrypoints.openai.api_server \
--served-model-name qwen-embedding \
--model Qwen/Qwen3-Embedding-0.6B \
--trust-remote-code \
--host 127.0.0.1 \
--port 65502 \
--task embed \
--gpu-memory-utilization 0.9
If you see output similar to the following, the service started successfully:
(APIServer pid=2270761) INFO: Started server process [2270761]
(APIServer pid=2270761) INFO: Waiting for application startup.
(APIServer pid=2270761) INFO: Application startup complete.
Press Ctrl+A, then D to detach from the session and keep the service running in the background.
To re-enter the session later, run:
screen -r vllm_embedding
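Before pointing the Retriever Server at the service, you can optionally verify that the endpoint answers OpenAI-style embedding requests. A minimal sketch using the official openai client; the api_key value is arbitrary for a local vLLM server, and the model name and port come from the launch command above:

from openai import OpenAI

# Point the client at the local vLLM endpoint started above
client = OpenAI(base_url="http://127.0.0.1:65502/v1", api_key="EMPTY")
resp = client.embeddings.create(model="qwen-embedding", input=["hello world"])
print(len(resp.data[0].embedding))  # prints the embedding dimension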
Step 2: Modify Pipeline Parameters
Taking the corpus_search Pipeline as an example, simply switch the retriever backend to openai and set base_url to your local vLLM service:

examples/parameters/corpus_search_parameter.yaml
benchmark:
  benchmark:
    key_map:
      gt_ls: golden_answers
      q_ls: question
    limit: -1
    name: nq
    path: data/sample_nq_10.jsonl
    seed: 42
    shuffle: false
retriever:
  backend: openai # changed from sentence_transformers
  backend_configs:
    bm25:
      lang: en
    infinity:
      bettertransformer: false
      device: cuda
      model_warmup: false
      pooling_method: auto
      trust_remote_code: true
    openai:
      api_key: ''
      base_url: http://127.0.0.1:65502/v1 # changed from https://api.openai.com/v1
      model_name: qwen-embedding # changed from text-embedding-3-small
    sentence_transformers:
      device: cuda
      sentence_transformers_encode:
        encode_chunk_size: 10000
        normalize_embeddings: false
        psg_prompt_name: document
        psg_task: null
        q_prompt_name: query
        q_task: null
      trust_remote_code: true
  batch_size: 16
  corpus_path: data/corpus_example.jsonl
  faiss_use_gpu: true
  gpu_ids: 0,1
  index_path: index/index.index
  is_multimodal: false
  model_name_or_path: openbmb/MiniCPM-Embedding-Light
  query_instruction: ''
  top_k: 5
Once configured, you can run it just like regular vector retrieval.
Web Search API
UR-2.0 natively integrates three mainstream Web search APIs: Tavily, Exa, and GLM. These APIs can be directly used as Retriever Server backends to enable online information retrieval and real-time knowledge augmentation.
Step 1: Configure API Keys
Before use, set the corresponding service’s API key. You can manually export environment variables before running the Pipeline:
export TAVILY_API_KEY="your-tavily-api-key"
It is recommended to manage keys centrally using a .env configuration file. In the UR-2.0 root directory, rename the template file .env.dev to .env and fill in your key information, for example:
LLM_API_KEY=
RETRIEVER_API_KEY=
TAVILY_API_KEY=tvly-dev-yourapikeyhere
EXA_API_KEY=
ZHIPUAI_API_KEY=
UR-2.0 will automatically read this file and load the configuration at startup.
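UR-2.0 handles this loading itself, but if you want to confirm from your own scripts that a key is visible, a minimal sketch using python-dotenv (assumed installed) looks like this:

import os
from dotenv import load_dotenv

load_dotenv()  # reads .env from the current working directory
print("TAVILY_API_KEY set:", bool(os.getenv("TAVILY_API_KEY")))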
Step 2: Web Search
The following example demonstrates how to use the Tavily API for Web search:

examples/web_search.yaml
# MCP Server
servers:
  benchmark: servers/benchmark
  retriever: servers/retriever
# MCP Client Pipeline
pipeline:
- benchmark.get_data
- retriever.retriever_tavily_search
Build the Pipeline:
ultrarag build examples/web_search.yaml
In the generated parameter file, fill in the data path and retrieval parameters:

examples/parameters/web_search_parameter.yaml
benchmark:
  benchmark:
    key_map:
      gt_ls: golden_answers
      q_ls: question
    limit: -1
    name: nq
    path: data/sample_nq_10.jsonl
    seed: 42
    shuffle: false
retriever:
  retrieve_thread_num: 1
  top_k: 5
Run the following command to start the Web search flow:
ultrarag run examples/web_search.yaml
You can replace retriever_tavily_search with retriever_exa_search or retriever_zhipuai_search as the Web search source.
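If you want to sanity-check your Tavily key outside the Pipeline, the sketch below uses the tavily-python client (assumed installed) and reads the key from the environment configured in Step 1:

import os
from tavily import TavilyClient

client = TavilyClient(api_key=os.environ["TAVILY_API_KEY"])
results = client.search("What is retrieval-augmented generation?", max_results=5)
for item in results["results"]:
    print(item["title"], "->", item["url"])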