generation_init
Signature
- Initializes the inference backend and sampling parameters.
- Supports the `vllm`, `openai`, and `hf` backends. `extra_params` can be used to pass `chat_template_kwargs` or other backend-specific parameters.
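As an illustration of `extra_params`, a minimal sketch: the `chat_template_kwargs` key comes from this page, but its contents below are an assumption, using `add_generation_prompt`, a standard Transformers chat-template argument, purely as an example value.

```python
# Illustrative extra_params payload. `chat_template_kwargs` is forwarded to the
# backend's chat template; the inner key shown is only an example value.
extra_params = {
    "chat_template_kwargs": {"add_generation_prompt": True},
}
```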
generate
Signature
- Plain-text conversation generation.
- Automatically wraps a single prompt in a list; each prompt can be a string or an OpenAI-format message dictionary.
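The prompt handling described above can be sketched as follows. `normalize_prompts` is a hypothetical helper written for illustration, not part of the documented API:

```python
from typing import Dict, List, Union

Message = Dict[str, str]  # OpenAI-format message: {"role": ..., "content": ...}
Prompt = Union[str, List[Message]]


def normalize_prompts(prompts: Union[Prompt, List[Prompt]]) -> List[List[Message]]:
    """Wrap a single prompt in a list, then convert bare strings to messages."""
    # A bare string, or a single list of message dicts, counts as one prompt.
    if isinstance(prompts, str) or (prompts and isinstance(prompts[0], dict)):
        prompts = [prompts]
    normalized = []
    for p in prompts:
        if isinstance(p, str):
            # Promote a plain string to a single user message.
            normalized.append([{"role": "user", "content": p}])
        else:
            normalized.append(p)
    return normalized
```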
multimodal_generate
Signature
- Text-image multimodal conversation generation.
- `multimodal_path`: list of image paths corresponding to each prompt (local paths and URLs are supported). `image_tag`: if specified (e.g., `<img>`), the image is inserted at that tag's position in the prompt; otherwise it is appended to the end of the prompt.
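The `image_tag` placement rule can be sketched as a small helper. This is a hypothetical illustration of the described behavior, not the library's actual code, and it stands in for wherever the backend splices the image into the prompt:

```python
from typing import Optional


def place_image(prompt: str, image_placeholder: str,
                image_tag: Optional[str] = None) -> str:
    """Insert an image placeholder at image_tag's position, else append it."""
    if image_tag and image_tag in prompt:
        # Replace the first occurrence of the tag with the image placeholder.
        return prompt.replace(image_tag, image_placeholder, 1)
    # No tag specified (or not found): default to appending at the end.
    return prompt + image_placeholder
```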
multiturn_generate
Signature
- Multi-turn conversation generation.
- Handles a single conversation per call; batch prompts are not supported.
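A conversation passed to `multiturn_generate` would look like the following OpenAI-format message list; the contents are illustrative, and only one such conversation (not a batch) is handled per call:

```python
# One multi-turn conversation in OpenAI message format.
conversation = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "What is vLLM?"},
    {"role": "assistant", "content": "An inference engine for LLMs."},
    {"role": "user", "content": "How do I release its VRAM afterwards?"},
]
```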
vllm_shutdown
Signature
- Explicitly shuts down the vLLM engine and releases VRAM.
- Valid only when using the `vllm` backend.
Configuration
| Parameter | Type | Description |
|---|---|---|
| `backend` | str | Generation backend: `vllm`, `openai`, or `hf` (Transformers) |
| `backend_configs` | dict | Model and runtime environment configuration for each backend |
| `sampling_params` | dict | Sampling parameters controlling generation diversity and length |
| `extra_params` | dict | Extra parameters, e.g., `chat_template_kwargs` |
| `system_prompt` | str | Global system prompt, added to the context as a system message |
| `image_tag` | str | Image placeholder tag (if needed) |
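Putting the parameters above together, a minimal sketch of a complete configuration dictionary. The model name and every value shown are illustrative choices, not documented defaults:

```python
# Illustrative configuration combining the parameters in the table above.
config = {
    "backend": "vllm",
    "backend_configs": {
        "vllm": {
            "model_name_or_path": "Qwen/Qwen2.5-7B-Instruct",  # example model
            "gpu_ids": "0,1",
            "gpu_memory_utilization": 0.9,
            "dtype": "bfloat16",
            "trust_remote_code": True,
        },
    },
    "sampling_params": {"temperature": 0.7, "top_p": 0.9, "max_tokens": 1024},
    "extra_params": {},
    "system_prompt": "You are a helpful assistant.",
}
```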
backend_configs Detailed Description:
| Backend | Parameter | Description |
|---|---|---|
| vllm | `model_name_or_path` | Model name or path |
| | `gpu_ids` | GPU IDs to use (e.g., `"0,1"`) |
| | `gpu_memory_utilization` | GPU memory utilization ratio (0–1) |
| | `dtype` | Data type (e.g., `auto`, `bfloat16`) |
| | `trust_remote_code` | Whether to trust remote code |
| openai | `model_name` | OpenAI model name, or a self-hosted OpenAI-compatible model |
| | `base_url` | API base URL |
| | `api_key` | API key |
| | `concurrency` | Maximum number of concurrent requests |
| | `retries` | Number of API retries |
| | `base_delay` | Base wait time between retries (seconds) |
| hf | `model_name_or_path` | Transformers model path |
| | `gpu_ids` | GPU IDs (same as above) |
| | `trust_remote_code` | Whether to trust remote code |
| | `batch_size` | Batch size per inference call |
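To make `retries` and `base_delay` concrete: assuming the `openai` backend uses exponential backoff (a common choice; the actual backoff policy is not documented on this page), the per-attempt waits would be:

```python
def retry_delays(retries, base_delay):
    """Wait time before each retry, assuming exponential backoff
    (base_delay * 2**attempt). The real policy may differ."""
    return [base_delay * (2 ** attempt) for attempt in range(retries)]
```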
sampling_params Detailed Description:
| Parameter | Type | Description |
|---|---|---|
| `temperature` | float | Controls randomness; higher values give more diverse generations |
| `top_p` | float | Nucleus sampling threshold |
| `max_tokens` | int | Maximum number of generated tokens |
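To make `top_p` concrete, a minimal sketch of nucleus-sampling filtering over a toy distribution. This illustrates the standard technique the parameter controls, not the backends' internal implementation:

```python
def top_p_filter(probs, top_p):
    """Keep the smallest set of highest-probability tokens whose cumulative
    probability reaches top_p, then renormalize over the kept tokens."""
    kept, total = {}, 0.0
    for tok, p in sorted(probs.items(), key=lambda kv: kv[1], reverse=True):
        kept[tok] = p
        total += p
        if total >= top_p:  # nucleus covered; drop the remaining tail
            break
    return {tok: p / total for tok, p in kept.items()}
```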