Nov 20, 2025
Docker Model Runner Integrates vLLM for High-Throughput Inference
New: vLLM in Docker Model Runner. Docker Model Runner now integrates vLLM, bringing high-throughput inference for safetensors models with automatic engine routing on NVIDIA GPUs.
Dorin Geman, Eric Curtin, and Ignasi Lopez Luna