Function

Retriever Server is the vector retrieval module in UR-2.0. It integrates text embedding, index construction, retrieval, and online model deployment into an end-to-end knowledge retrieval pipeline, suitable for fast vectorization and efficient recall over large-scale corpora.

Parameter Description

servers/retriever/parameter.yaml
retriever_path: openbmb/MiniCPM-Embedding-Light
corpus_path: data/sample_hotpotqa_corpus_5.jsonl
embedding_path: embedding/embedding.npy
index_path: index/index.index

# infinity_emb config
infinity_kwargs:
  bettertransformer: false
  pooling_method: auto
  device: cuda
  batch_size: 1024

cuda_devices: "0,1"
query_instruction: "Query: "
faiss_use_gpu: true
top_k: 5
overwrite: false
retriever_url: http://localhost:8080
index_chunk_size: 50000

# OpenAI API configuration
use_openai: false
openai_model: "text-embedding-3-small"
api_base: ""
api_key: ""

# LanceDB configuration
lancedb_path: "lancedb/"
table_name: "vector_index"
filter_expr: null
  • retriever_path: Name/local path of the retrieval model.
  • corpus_path: Path to the corpus file (.jsonl), where each line should contain a contents field representing a document or paragraph.
  • embedding_path: Path to store embedding vectors (.npy), used for index construction or loading. If not available, embeddings can be generated by calling the retriever_embed tool and saved to this path.
  • index_path: Path to save the index (.index), used to load existing indexes or save new ones.
  • infinity_kwargs: Keyword arguments passed to the infinity_emb library, such as pooling_method (supports cls, mean, auto) and batch_size.
  • cuda_devices: Specifies the GPU devices to use.
  • query_instruction: Prompt prefix concatenated before the query.
  • faiss_use_gpu: Whether to enable GPU-accelerated FAISS index. If set to False, FAISS runs on CPU.
  • top_k: Number of documents returned per query.
  • overwrite: Whether to allow overwriting existing embedding/index files. Set to False to avoid overwriting already generated files.
  • retriever_url: Address and port for deploying the retriever.
  • index_chunk_size: Number of vectors added to the index per batch, preventing an out-of-memory (OOM) error from loading all embeddings at once.
  • use_openai: Whether to use the OpenAI embedding API instead of a local retrieval model.
  • openai_model: Name of the OpenAI embedding model to use.
  • api_base: Base URL of the OpenAI-compatible API endpoint.
  • api_key: API key used to authenticate with the endpoint.
  • lancedb_path: Directory where the LanceDB database is stored.
  • table_name: Name of the LanceDB table holding the vector index.
  • filter_expr: Optional filter expression applied when searching the LanceDB table.
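As noted for corpus_path, each line of the corpus file is a JSON object carrying a contents field. A minimal sketch of that format (the id field here is illustrative, not required by the docs above):

```python
import json

# Hypothetical two-line corpus in the .jsonl format expected by corpus_path:
# one JSON object per line, each with a "contents" field.
corpus_lines = [
    '{"id": 0, "contents": "Paris is the capital of France."}',
    '{"id": 1, "contents": "The Seine flows through Paris."}',
]

# Reading it back mirrors what retriever_init does when loading corpus data.
docs = [json.loads(line)["contents"] for line in corpus_lines]
print(docs[0])  # Paris is the capital of France.
```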

Tool Functions

  • retriever_init: Initializes and loads the retriever model, loads corpus data, and optionally loads existing indexes.
  • retriever_embed: Vectorizes the previously loaded corpus content and saves the embedding results as a .npy file for subsequent FAISS index construction.
  • retriever_index: Constructs a FAISS index based on pre-generated embedding files (.npy format) and saves it as a .index file for vector retrieval use.
  • retriever_search: Receives a set of queries, encodes them into vectors, searches the FAISS index, and returns the top-k most similar documents for each query.
  • retriever_deploy_service: Starts a Flask-based vector retrieval server, deploying a /search endpoint that supports semantic retrieval via HTTP POST requests.
  • retriever_deploy_search: Acts as a client to access a remote Flask retrieval service, sending a list of queries to the specified service address, calling the /search endpoint via HTTP POST, and returning retrieval results.
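The embed → index → search flow implemented by retriever_embed, retriever_index, and retriever_search can be sketched in plain NumPy. This is a toy stand-in, not UR-2.0's implementation: random vectors replace real model embeddings, and exact cosine search replaces the FAISS index.

```python
import numpy as np

# Random unit vectors stand in for corpus embeddings produced by the
# retrieval model (retriever_embed would save such an array to embedding_path).
rng = np.random.default_rng(0)
corpus_emb = rng.normal(size=(100, 64)).astype("float32")
corpus_emb /= np.linalg.norm(corpus_emb, axis=1, keepdims=True)

def search(query_emb: np.ndarray, top_k: int = 5) -> np.ndarray:
    """Return indices of the top_k corpus vectors most similar to the query
    (exact cosine similarity; a FAISS index would do this approximately/faster)."""
    q = query_emb / np.linalg.norm(query_emb)
    scores = corpus_emb @ q          # cosine similarity against every document
    return np.argsort(-scores)[:top_k]

hits = search(rng.normal(size=64).astype("float32"))
print(hits.shape)  # (5,)
```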
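The deploy service/client pair above speaks JSON over HTTP POST to a /search endpoint. The following self-contained sketch shows that request/response shape, using Python's standard-library HTTP server in place of the real Flask service; the field names query_list and results are illustrative assumptions, not UR-2.0's exact schema.

```python
import json
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer
from urllib.request import Request, urlopen

class SearchHandler(BaseHTTPRequestHandler):
    """Toy stand-in for the Flask /search endpoint."""

    def do_POST(self):
        if self.path != "/search":
            self.send_error(404)
            return
        length = int(self.headers["Content-Length"])
        payload = json.loads(self.rfile.read(length))
        # Return one fixed "retrieved document" list per query.
        body = json.dumps(
            {"results": [["doc for " + q] for q in payload["query_list"]]}
        ).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):  # silence per-request logging
        pass

# Bind an ephemeral port and serve in the background.
server = ThreadingHTTPServer(("localhost", 0), SearchHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()

# Client side, analogous to retriever_deploy_search: POST a query list.
url = f"http://localhost:{server.server_address[1]}/search"
req = Request(
    url,
    data=json.dumps({"query_list": ["who wrote Hamlet?"]}).encode(),
    headers={"Content-Type": "application/json"},
)
with urlopen(req) as resp:
    results = json.loads(resp.read())["results"]
print(results[0])  # ['doc for who wrote Hamlet?']

server.shutdown()
```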