**Parameters**

- `str`: Name of the model loaded by vLLM, used for `--served-model-name`.
- `str`: Path to the model for vLLM to load.
- `str`: Address of the LLM service deployed via vLLM.
- `int`: Port on which the vLLM service listens.
- `str | int`: Visible GPUs, e.g., `"0,1"`.
- `str`: If provided, enables vLLM API authentication via `--api-key`.
- `Dict[str, Any]`: Sampling parameters for LLMs; see the vLLM `SamplingParams` documentation.

**`initialize_local_vllm`**

Starts a local vLLM service and exposes its `base_url` (e.g., `http://localhost:{port}/v1`).

**Parameters**

- `str`: Path to the model for vLLM to load.
- `str`: Name of the model loaded by vLLM, used for `--served-model-name`.
- `int`: Port on which the vLLM service listens.
- `str | int`: Visible GPUs, e.g., `"0,1"`.
- `str`: If provided, enables vLLM API authentication via `--api-key`.

**Returns**

- `Dict[str, str]`: The address of the vLLM service, e.g., `"http://localhost:<port>/v1"`.
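The parameters above map directly onto the `vllm serve` CLI (`--served-model-name`, `--port`, `--api-key`, with GPU visibility controlled through `CUDA_VISIBLE_DEVICES`). A minimal sketch of how such a launcher could be assembled; the function names and defaults here are illustrative, not the library's actual implementation:

```python
import os
import subprocess
from typing import List, Optional, Tuple, Union


def build_vllm_command(
    model_path: str,
    model_name: str,
    port: int = 8000,
    api_key: Optional[str] = None,
) -> List[str]:
    """Assemble a `vllm serve` command line from the documented parameters."""
    cmd = [
        "vllm", "serve", model_path,
        "--served-model-name", model_name,
        "--port", str(port),
    ]
    if api_key:
        # Enables API authentication on the served endpoint.
        cmd += ["--api-key", api_key]
    return cmd


def launch_vllm(
    model_path: str,
    model_name: str,
    port: int = 8000,
    gpus: Union[str, int] = "0",
    api_key: Optional[str] = None,
) -> Tuple[subprocess.Popen, str]:
    """Start the server as a subprocess and return (process, base_url).

    GPU visibility is restricted via CUDA_VISIBLE_DEVICES, matching the
    `str | int` "visible GPUs" parameter (e.g., "0,1").
    """
    env = dict(os.environ, CUDA_VISIBLE_DEVICES=str(gpus))
    proc = subprocess.Popen(
        build_vllm_command(model_path, model_name, port, api_key), env=env
    )
    return proc, f"http://localhost:{port}/v1"
```

The returned `base_url` is what an OpenAI-compatible client would then point at.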
**`generate`**

**Parameters**

- `List[Union[str, Dict[str, Any]]]`: Inputs for the large language model.
- `str`: The `model` field for `chat.completions.create`; must match vLLM's `--served-model-name`.
- `str`: Base URL of the OpenAI-compatible service (e.g., `http://localhost:8000/v1`).
- `Dict[str, Any]`: Sampling parameters passed to `chat.completions.create`, such as `temperature`, `max_tokens`, `top_p`, `n`, `stop`, etc.

**Returns**

- `Dict[str, List[str]]`: Model-generated results.
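A sketch of what such a `generate` call can look like against an OpenAI-compatible vLLM endpoint. The treatment of dict inputs (as pre-built message lists) and the `"responses"` key of the return value are assumptions for illustration, not the documented implementation:

```python
from typing import Any, Dict, List, Optional, Union


def to_messages(item: Union[str, Dict[str, Any]]) -> List[Dict[str, str]]:
    """Normalize one input: a bare string becomes a single user message.

    Dict inputs are assumed to already carry a chat-format message list
    under a "messages" key (an illustrative convention, not the source's).
    """
    if isinstance(item, str):
        return [{"role": "user", "content": item}]
    return item["messages"]


def generate(
    inputs: List[Union[str, Dict[str, Any]]],
    model: str,
    base_url: str,
    sampling_params: Optional[Dict[str, Any]] = None,
) -> Dict[str, List[str]]:
    """Query the OpenAI-compatible service once per input."""
    from openai import OpenAI  # requires the `openai` package

    # vLLM accepts any placeholder key unless --api-key was set at launch.
    client = OpenAI(base_url=base_url, api_key="EMPTY")
    outputs: List[str] = []
    for item in inputs:
        resp = client.chat.completions.create(
            model=model,  # must match --served-model-name
            messages=to_messages(item),
            # e.g. temperature, max_tokens, top_p, n, stop
            **(sampling_params or {}),
        )
        outputs.append(resp.choices[0].message.content)
    return {"responses": outputs}
```

Passing `model` that differs from `--served-model-name` makes the server reject the request, which is why the two values are documented as having to match.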