For scenarios that require dynamic retrieval, such as multi-turn reasoning, UltraRAG can deploy the Retriever as an online service that other workflows call asynchronously over HTTP.

Feature Overview

The online Retriever service is built on Flask. It accepts asynchronous requests and supports remote access, so it can serve pipelines running on the same machine or across a distributed environment.
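To make the request/response flow concrete, here is a minimal sketch of such a service. It is not UltraRAG's implementation. It swaps Flask for Python's standard-library http.server so it runs without extra dependencies, and it replaces the embedding model and FAISS index with a toy word-overlap ranker over a hypothetical in-memory corpus. The /search route, the query_list and top_k request fields, and the results response key are all assumptions chosen for illustration.

```python
import json
import re
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer
from typing import List

# Hypothetical in-memory corpus standing in for the indexed corpus_path data.
CORPUS = [
    "Paris is the capital of France.",
    "The Eiffel Tower is in Paris.",
    "Berlin is the capital of Germany.",
]


def tokenize(text: str) -> set:
    """Lowercase word tokens, ignoring punctuation."""
    return set(re.findall(r"\w+", text.lower()))


def retrieve(query: str, top_k: int) -> List[str]:
    """Toy retriever: rank passages by word overlap.

    A stand-in for embedding the query and searching a FAISS index.
    """
    q = tokenize(query)
    ranked = sorted(CORPUS, key=lambda p: len(q & tokenize(p)), reverse=True)
    return ranked[:top_k]


class SearchHandler(BaseHTTPRequestHandler):
    def do_POST(self) -> None:
        if self.path != "/search":  # hypothetical route name
            self.send_error(404)
            return
        length = int(self.headers.get("Content-Length", 0))
        body = json.loads(self.rfile.read(length) or b"{}")
        top_k = int(body.get("top_k", 5))
        # One ranked passage list per query in the batch (hypothetical schema).
        results = [retrieve(q, top_k) for q in body.get("query_list", [])]
        out = json.dumps({"results": results}).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(out)))
        self.end_headers()
        self.wfile.write(out)


def serve(host: str = "127.0.0.1", port: int = 5112) -> ThreadingHTTPServer:
    """Create the server; call .serve_forever() on the result to start it."""
    # ThreadingHTTPServer handles each request in its own thread, so
    # concurrent callers do not block one another.
    return ThreadingHTTPServer((host, port), SearchHandler)
```

For example, serve("0.0.0.0", 5112).serve_forever() would accept connections from other hosts, which mirrors the remote-deployment setting described in the configuration notes.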

Deploy Online Retriever Service

Deployment works like running any other pipeline. Execute the following YAML configuration, which is dedicated to service deployment:
examples/deploy_retriever.yaml
# MCP Server
servers:
  retriever: servers/retriever

# MCP Client Pipeline
pipeline:
- retriever.retriever_init
- retriever.retriever_deploy_service
Run the following command to generate the corresponding configuration file:
ultrarag build examples/deploy_retriever.yaml
Then edit the generated parameter configuration:
examples/parameter/deploy_retriever_parameter.yaml
retriever:
  corpus_path: data/sample_hotpotqa_corpus_5.jsonl
  cuda_devices: 0,1
  faiss_use_gpu: true
  index_path: index/index.index
  infinity_kwargs:
    batch_size: 1024
    bettertransformer: false
    device: cuda
    pooling_method: auto
  retriever_path: openbmb/MiniCPM-Embedding-Light
  retriever_url: http://localhost:8080
📌 Configuration notes:
  • retriever_url: the address the service listens on, in the form http://<host>:<port>.
    • Recommended for local deployment: http://127.0.0.1:5112
    • Recommended for remote deployment: http://0.0.0.0:<port> (binds to all network interfaces so that other hosts can reach the service)
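The rule above can be checked before the service is launched. The helper below is a hypothetical sketch, not part of UltraRAG: it assumes retriever_url must use the http scheme with an explicit port, and it reports whether the chosen host is a loopback address, which only accepts connections from the same machine.

```python
from typing import Tuple
from urllib.parse import urlsplit

# Hypothetical helper, not part of UltraRAG: validates a retriever_url value.
# Loopback hosts only accept connections from the local machine.
LOOPBACK_HOSTS = {"127.0.0.1", "localhost", "::1"}


def parse_retriever_url(url: str) -> Tuple[str, int, bool]:
    """Split http://<host>:<port> into (host, port, loopback_only).

    loopback_only is True when other hosts cannot reach the service.
    """
    parts = urlsplit(url)
    if parts.scheme != "http":
        raise ValueError(f"expected an http:// URL, got {url!r}")
    port = parts.port  # raises ValueError if the port is not a valid number
    if not parts.hostname or port is None:
        raise ValueError(f"expected http://<host>:<port>, got {url!r}")
    return parts.hostname, port, parts.hostname in LOOPBACK_HOSTS
```

For instance, the sample value http://localhost:8080 is reported as loopback-only, so it suits local deployment, whereas http://0.0.0.0:5112 is not.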
Running the service in the background is recommended, for example inside a screen session:
# Use screen to start and detach the service
screen -S retriever
ultrarag run examples/deploy_retriever.yaml
# Press Ctrl+A, then D, to detach
# To resume later: screen -r retriever
Alternatively, run it with nohup and write the logs to a file:
nohup ultrarag run examples/deploy_retriever.yaml > retriever.log 2>&1 &
Once running, the service listens on the port specified in retriever_url and waits for retrieval requests from other workflows.
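Because the deployment pipeline runs retriever.retriever_init before retriever.retriever_deploy_service, the port may not open until the model and index have loaded, and a downstream job started too early will fail to connect. A small standard-library helper such as the following (hypothetical, not part of UltraRAG) polls until a TCP connection succeeds. It only confirms that something is listening, not that retrieval works; when the service binds 0.0.0.0, probe 127.0.0.1 from the same machine.

```python
import socket
import time


def wait_for_service(host: str, port: int, timeout: float = 60.0,
                     interval: float = 1.0) -> bool:
    """Poll host:port until a TCP connection succeeds or `timeout` seconds pass.

    Returns True once the port accepts connections, False on timeout.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            # A successful connect means a server is listening on the port.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            # Connection refused or timed out: retry until the deadline.
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)
```

A launcher script could call wait_for_service("127.0.0.1", 5112) before starting a pipeline that uses retriever.retriever_deploy_search.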

Call the Online Retriever Service

To call the online service, add the retriever.retriever_deploy_search tool to your pipeline. You do not need to run retriever.retriever_init in this pipeline:
examples/retriever_deploy_search.yaml
servers:
  retriever: servers/retriever
  benchmark: servers/benchmark

# MCP Client Pipeline
pipeline:
- benchmark.get_data
- retriever.retriever_deploy_search
Queries from the local pipeline are then sent directly to the remote retrieval service, which returns the corresponding documents.
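Outside an UltraRAG pipeline, for example from a custom script, the service could be queried directly over HTTP. The sketch below is hedged: it assumes the service exposes POST /search accepting a JSON body with query_list and top_k fields, and these names are hypothetical rather than taken from UltraRAG, so check the retriever server's source for the actual route and schema. search_many illustrates asynchronous use by running several blocking requests concurrently in worker threads.

```python
import asyncio
import json
import urllib.request
from typing import Any, Dict, List


def search(base_url: str, queries: List[str], top_k: int = 5,
           timeout: float = 30.0) -> Dict[str, Any]:
    """POST a batch of queries to the (assumed) /search route; return the JSON reply."""
    payload = json.dumps({"query_list": queries, "top_k": top_k}).encode("utf-8")
    req = urllib.request.Request(
        base_url.rstrip("/") + "/search",  # hypothetical route
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


async def search_many(base_url: str, batches: List[List[str]],
                      top_k: int = 5) -> List[Dict[str, Any]]:
    """Send several query batches concurrently; results keep the input order."""
    # asyncio.to_thread runs each blocking urllib call in a worker thread.
    tasks = [asyncio.to_thread(search, base_url, batch, top_k) for batch in batches]
    return await asyncio.gather(*tasks)
```

For example, asyncio.run(search_many("http://127.0.0.1:5112", [["q1"], ["q2"]])) would issue both requests at once instead of one after the other.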