generation_init

Signature
def generation_init(
    backend_configs: Dict[str, Any],
    sampling_params: Dict[str, Any],
    backend: str = "vllm",
) -> None
Function
  • Initializes the inference backend and sampling parameters.
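
The two dictionaries mirror the YAML under Parameter Configuration below. A minimal initialization sketch, assuming the tool is importable as a plain Python function (the import path here is hypothetical):

```python
# Hypothetical import path; adjust to where the generation server
# exposes its tools in your UltraRAG installation.
from servers.generation import generation_init

# Mirrors the vllm section of servers/generation/parameter.yaml.
backend_configs = {
    "vllm": {
        "model_name_or_path": "openbmb/MiniCPM4-8B",
        "gpu_ids": "2,3",
        "gpu_memory_utilization": 0.9,
        "dtype": "auto",
        "trust_remote_code": True,
    }
}
sampling_params = {"temperature": 0.7, "top_p": 0.8, "max_tokens": 2048}

generation_init(backend_configs, sampling_params, backend="vllm")
```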

generate

Signature
async def generate(
    prompt_ls: List[Union[str, Dict[str, Any]]],
    system_prompt: str = "",
) -> Dict[str, List[str]]
Function
  • Performs text-only dialogue generation.
Output Format (JSON)
{"ans_ls": ["answer for prompt_0", "answer for prompt_1", "..."]}

multimodal_generate

Signature
async def multimodal_generate(
    multimodal_path: List[List[str]],
    prompt_ls: List[Union[str, Dict[str, Any]]],
    system_prompt: str = "",
) -> Dict[str, List[str]]
Function
  • Performs multimodal (text-image) dialogue generation.
Output Format (JSON)
{"ans_ls": ["answer with images for prompt_0", "..."]}

Parameter Configuration

servers/generation/parameter.yaml
backend: vllm # options: vllm, openai, hf
backend_configs:
  vllm:
    model_name_or_path: openbmb/MiniCPM4-8B
    gpu_ids: "2,3"
    gpu_memory_utilization: 0.9
    dtype: auto
    trust_remote_code: true
  openai:
    model_name: MiniCPM4-8B
    base_url: http://localhost:8000/v1
    api_key: ""
    concurrency: 8
    retries: 3
    base_delay: 1.0
  hf:
    model_name_or_path: openbmb/MiniCPM4-8B
    gpu_ids: "2,3"
    trust_remote_code: true
    batch_size: 8

sampling_params:
  temperature: 0.7
  top_p: 0.8
  max_tokens: 2048
  chat_template_kwargs:
    enable_thinking: false

system_prompt: ""
Parameter Description:
| Parameter | Type | Description |
| --- | --- | --- |
| backend | str | Specifies the generation backend; options: vllm, openai, or hf (Transformers). |
| backend_configs | dict | Configuration for model and runtime environments of each backend. |
| sampling_params | dict | Sampling parameters controlling generation diversity and length. |
| system_prompt | str | Global system prompt added as a system message in context. |
Detailed Description of backend_configs:
| Backend | Parameter | Description |
| --- | --- | --- |
| vllm | model_name_or_path | Model name or local path. |
|  | gpu_ids | GPU IDs in use (e.g., "0,1"). |
|  | gpu_memory_utilization | GPU memory usage ratio (0–1). |
|  | dtype | Data type (e.g., auto, bfloat16). |
|  | trust_remote_code | Whether to trust remote custom code. |
| openai | model_name | Model name on the OpenAI API or a self-hosted OpenAI-compatible service. |
|  | base_url | API endpoint URL. |
|  | api_key | API key for authentication. |
|  | concurrency | Maximum number of concurrent requests. |
|  | retries | Maximum retry count for API requests. |
|  | base_delay | Base delay time (in seconds) between retries. |
| hf | model_name_or_path | Model path for the Transformers backend. |
|  | gpu_ids | GPU IDs (same as above). |
|  | trust_remote_code | Whether to trust remote custom code. |
|  | batch_size | Batch size per inference. |
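
The openai backend targets any OpenAI-compatible endpoint, not only api.openai.com. One way to sanity-check that base_url and model_name point at a live service is to query it directly with the official openai Python client (illustrative prompt; many self-hosted servers ignore the key, but the client needs a non-empty string):

```python
from openai import OpenAI

# Values mirror the openai section of the YAML above.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

resp = client.chat.completions.create(
    model="MiniCPM4-8B",
    messages=[{"role": "user", "content": "Hello"}],
    max_tokens=64,
)
print(resp.choices[0].message.content)
```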
Detailed Description of sampling_params:
| Parameter | Type | Description |
| --- | --- | --- |
| temperature | float | Controls randomness; higher values increase diversity. |
| top_p | float | Nucleus sampling threshold. |
| max_tokens | int | Maximum number of generated tokens. |
| chat_template_kwargs | dict | Additional arguments for chat templates. |
| enable_thinking | bool | Enables chain-of-thought style reasoning output (if supported by the model); set inside chat_template_kwargs. |
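
chat_template_kwargs is passed through to the model's chat template when the prompt is rendered, so flags like enable_thinking only take effect on models whose templates define them. A sketch of the equivalent raw Hugging Face call, assuming a model whose template accepts this flag:

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained(
    "openbmb/MiniCPM4-8B", trust_remote_code=True
)

messages = [{"role": "user", "content": "Explain top-p sampling briefly."}]

# Extra keyword arguments are forwarded to the Jinja chat template;
# enable_thinking is a no-op if the template does not reference it.
prompt = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
    enable_thinking=False,
)
print(prompt)
```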