Overview
The Generation Server is the core module in UR-2.0 responsible for invoking and deploying Large Language Models (LLMs). It receives input prompts constructed by the Prompt Server and generates the corresponding outputs. The module supports both text generation and image-text multimodal generation, making it adaptable to diverse task scenarios such as question answering, reasoning, summarization, and visual question answering. The Generation Server natively supports the following popular backends: vLLM, HuggingFace, and OpenAI.
Usage Examples
Text Generation
The following example demonstrates how to use the Generation Server to perform a basic text generation task. The workflow first constructs the prompt with the Prompt Server, then calls the LLM to generate responses, and finally extracts and evaluates the results.
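Since the exact pipeline wiring depends on your configuration, the sketch below illustrates just the generation step against an OpenAI-compatible backend (one of the natively supported options). The endpoint URL, API key, model name, and prompt contents are illustrative placeholders, not UR-2.0's actual interface.

```python
from openai import OpenAI

# Connect to an OpenAI-compatible Generation Server backend
# (base_url, api_key, and model below are placeholders).
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

# A chat-style prompt, as the Prompt Server might construct it.
messages = [
    {"role": "system", "content": "Answer the question concisely."},
    {"role": "user", "content": "What is the capital of France?"},
]

# Call the LLM to generate a response.
response = client.chat.completions.create(
    model="Qwen/Qwen2.5-7B-Instruct",  # placeholder model name
    messages=messages,
    temperature=0.0,
)

# Extract the generated text for downstream evaluation.
answer = response.choices[0].message.content
print(answer)
```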
Multimodal Reasoning
In multimodal scenarios, the Generation Server can process not only textual input but also visual information such as images, enabling more complex reasoning tasks. The following example illustrates how to achieve this. First, prepare a sample dataset (including image paths), and use the multimodal_path field in the get_data function of the Benchmark Server to specify the image input path.
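For illustration, here is a hypothetical sample record; apart from multimodal_path, the field names and file paths below are assumptions for this sketch rather than a prescribed schema.

```python
import json

# Hypothetical VQA-style records; multimodal_path points to the image input
# that the Benchmark Server's get_data function should expose.
samples = [
    {
        "id": 0,
        "question": "What animal is shown in the picture?",
        "golden_answers": ["cat"],                   # assumed field name
        "multimodal_path": "data/images/0001.jpg",   # assumed path
    },
]

# Write the sample dataset as JSONL.
with open("data/vqa_sample.jsonl", "w", encoding="utf-8") as f:
    for record in samples:
        f.write(json.dumps(record, ensure_ascii=False) + "\n")
```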
For instructions on adding new fields, refer to Adding Additional Dataset Fields.
Model Deployment
UR-2.0 is fully compatible with the OpenAI API specification, so any model conforming to this standard can be integrated directly, with no extra adaptation or code modification required. The following example shows how to deploy a local model using vLLM.
Step 1: Run the Model in the Background
We recommend running the model inside a screen session so you can monitor logs and status in real time. Start a new screen session and launch the serving script script/vllm_serve_emb.sh.
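A minimal sketch of this step is shown below; the session name vllm_serve is illustrative.

```bash
# Start a named screen session for the server (the name is illustrative).
screen -S vllm_serve

# Inside the session, launch the vLLM serving script from the repo.
bash script/vllm_serve_emb.sh

# Detach with Ctrl-A then D; reattach later with:
#   screen -r vllm_serve
```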