> ## Documentation Index
> Fetch the complete documentation index at: https://ultrarag.openbmb.cn/llms.txt
> Use this file to discover all available pages before exploring further.

# 部署指南

本指南将指导您完成 UltraRAG UI 的全栈部署，包括生成模型（LLM）、检索模型（Embedding）以及 Milvus 向量数据库。

## 模型推理服务部署

UltraRAG UI 统一采用 OpenAI API 协议进行调用。您可以选择直接在宿主机使用 `Screen` 运行，或使用 `Docker` 容器化部署。

### 生成模型部署

以 Qwen3-32B 为例，建议使用多卡并行以保证推理速度。

**Screen (宿主机直接运行)**

1. 新建会话会话：

```shell theme={null}
screen -S llm
```

2. 启动命令：

```shell script/vllm_serve.sh theme={null}
CUDA_VISIBLE_DEVICES=0,1 python -m vllm.entrypoints.openai.api_server \
    --served-model-name qwen3-32b \
    --model Qwen/Qwen3-32B \
    --trust-remote-code \
    --host 0.0.0.0 \
    --port 65503 \
    --max-model-len 32768 \
    --gpu-memory-utilization 0.9 \
    --tensor-parallel-size 2 \
    --enforce-eager
```

出现类似以下输出，表示模型服务启动成功：

```
(APIServer pid=2811812) INFO:     Started server process [2811812]
(APIServer pid=2811812) INFO:     Waiting for application startup.
(APIServer pid=2811812) INFO:     Application startup complete.
```

3. 退出会话：按下 `Ctrl + A + D` 可退出并保持服务在后台运行。
   如需重新进入该会话，可执行：

```shell theme={null}
screen -r llm
```

**Docker (容器化部署)**

```shell theme={null}
docker run -d --gpus all \
  -e CUDA_VISIBLE_DEVICES=0,1 \
  -v /parent_dir_of_models:/workspace \
  -p 29001:65503 \
  --ipc=host \
  --name vllm_qwen \
  vllm/vllm-openai:latest \
  --served-model-name qwen3-32b \
  --model Qwen/Qwen3-32B \
  --trust-remote-code \
  --host 0.0.0.0 \
  --port 65503 \
  --max-model-len 32768 \
  --gpu-memory-utilization 0.9 \
  --tensor-parallel-size 2 \
  --enforce-eager
```

### 检索模型部署

以 Qwen3-Embedding-0.6B 为例，通常占用显存较小。

**Screen (宿主机直接运行)**

1. 新建会话：

```shell theme={null}
screen -S retriever
```

2. 启动命令：

```shell script/vllm_serve_emb.sh theme={null}
CUDA_VISIBLE_DEVICES=2 python -m vllm.entrypoints.openai.api_server \
    --served-model-name qwen-embedding \
    --model Qwen/Qwen3-Embedding-0.6B \
    --trust-remote-code \
    --host 0.0.0.0 \
    --port 65504 \
    --task embed \
    --gpu-memory-utilization 0.2
```

**Docker (容器化部署)**

```shell theme={null}
docker run -d --gpus all \
  -e CUDA_VISIBLE_DEVICES=2 \
  -v /parent_dir_of_models:/workspace \
  -p 29002:65504 \
  --ipc=host \
  --name vllm_qwen_emb \
  vllm/vllm-openai:latest \
  --served-model-name qwen-embedding \
  --model Qwen/Qwen3-Embedding-0.6B \
  --trust-remote-code \
  --host 0.0.0.0 \
  --port 65504 \
  --task embed \
  --gpu-memory-utilization 0.2
```

## 向量数据库部署 (Milvus)

Milvus 用于高效存储和检索向量数据。

**官方部署**

```shell theme={null}
# milvus单机版（docker）：https://milvus.io/docs/zh/install-overview.md#Milvus-Standalone
curl -sfL https://raw.githubusercontent.com/milvus-io/milvus/master/scripts/standalone_embed.sh -o standalone_embed.sh
bash standalone_embed.sh start
```

**自定义部署**

若需自定义端口（如防止端口冲突）或数据路径，可使用以下脚本：

```shell start_milvus.sh highlight="7,8,10" theme={null}
#!/usr/bin/env bash
set -e

CONTAINER_NAME=milvus-ultrarag
MILVUS_IMAGE=milvusdb/milvus:latest

GRPC_PORT=29901
HTTP_PORT=29902

DATA_DIR=/root/ultrarag-demo/milvus/

echo "==> Starting Milvus (standalone)"
echo "==> gRPC: ${GRPC_PORT}, HTTP: ${HTTP_PORT}"
echo "==> Data dir: ${DATA_DIR}"

mkdir -p ${DATA_DIR}
chown -R 1000:1000 ${DATA_DIR} 2>/dev/null || true

docker run -d \
  --name ${CONTAINER_NAME} \
  --restart unless-stopped \
  --security-opt seccomp:unconfined \
  -e DEPLOY_MODE=STANDALONE \
  -e ETCD_USE_EMBED=true \
  -e COMMON_STORAGETYPE=local \
  -v ${DATA_DIR}:/var/lib/milvus \
  -p ${GRPC_PORT}:19530 \
  -p ${HTTP_PORT}:9091 \
  --health-cmd="curl -f http://localhost:9091/healthz" \
  --health-interval=30s \
  --health-start-period=60s \
  --health-timeout=10s \
  --health-retries=3 \
  ${MILVUS_IMAGE} \
  milvus run standalone

echo "==> Waiting for Milvus to become healthy..."
sleep 5
docker ps | grep ${CONTAINER_NAME} || true
```

修改GRPC\_PORT、HTTP\_PORT以及DATA\_DIR，并运行以下命令进行部署：

```shell theme={null}
bash start_milvus.sh
```

部署成功后，您可以通过以下命令检查Milvus的状态：

```shell theme={null}
docker ps | grep milvus-ultrarag
```

如果一切正常，您应该能够看到Milvus容器正在运行。

<Tip>UI 配置提示：启动成功后，在 UltraRAG UI 的 `Knowledge Base` -> `Configure DB` 中填写 `GRPC_PORT` 地址（如 `tcp://127.0.0.1:29901`）。点击 Connect 显示 Connected 即代表成功。</Tip>