
0.2.0

@alpayariyak alpayariyak released this 26 Jan 04:26

Worker vLLM 0.2.0 - What's New

  • You no longer need a Linux-based machine or NVIDIA GPUs to build the worker.
  • Over 3x smaller Docker image.
  • Optional OpenAI Chat Completion output format (see the parsing sketch after this list).
  • Fast image build time.
  • Docker Secrets support for your Hugging Face token, so you can build the image with a model baked in without exposing the token (see the build sketch after this list).
  • Support for the n and best_of sampling parameters, which let you generate multiple responses from a single prompt (see the request sketch after this list).
  • New environment variables for additional configuration options.
  • vLLM Version: 0.2.7
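
When the optional OpenAI Chat Completion output format is enabled, the worker's output follows the standard `chat.completion` schema. The sketch below is a minimal example of parsing such a response on the client side; the surrounding job envelope (the `output` key) and the example values are assumptions for illustration, and how the format is toggled depends on your configuration.

```python
# Minimal sketch: parsing a worker response that uses the OpenAI
# Chat Completion output format. The outer "output" envelope and the
# example values are assumptions; the inner structure follows the
# standard OpenAI chat.completion schema.
job_result = {
    "output": {
        "id": "chatcmpl-123",  # hypothetical example values
        "object": "chat.completion",
        "choices": [
            {
                "index": 0,
                "message": {"role": "assistant", "content": "Hello!"},
                "finish_reason": "stop",
            }
        ],
        "usage": {"prompt_tokens": 9, "completion_tokens": 3, "total_tokens": 12},
    }
}

completion = job_result["output"]
for choice in completion["choices"]:
    print(choice["index"], choice["message"]["content"])
```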
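
To bake a gated model into the image, the Hugging Face token is passed as a Docker BuildKit secret so it never ends up in an image layer. The sketch below simply drives the standard `docker build --secret` CLI from Python; the secret id (`HF_TOKEN`), the `MODEL_NAME` build arg, the token file path, and the image tag are assumptions for illustration, so check the repository's Dockerfile and README for the exact names it expects.

```python
import os
import subprocess

# Minimal sketch: build the worker image with a model baked in while
# passing the Hugging Face token as a BuildKit secret rather than a
# plain build arg, so the token is not persisted in any image layer.
# The secret id "HF_TOKEN", the build arg, and the tag are illustrative
# assumptions; consult the repo's Dockerfile/README for the exact names.
env = {**os.environ, "DOCKER_BUILDKIT": "1"}
subprocess.run(
    [
        "docker", "build",
        "--secret", "id=HF_TOKEN,src=hf_token.txt",  # token kept in a local file
        "--build-arg", "MODEL_NAME=meta-llama/Llama-2-7b-chat-hf",
        "-t", "yourname/worker-vllm:0.2.0",
        ".",
    ],
    check=True,
    env=env,
)
```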
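
With `n` and `best_of` exposed, one request can return several completions: vLLM samples `best_of` candidate sequences and returns the `n` best. The sketch below posts a job to a RunPod serverless endpoint running this worker over the standard `/runsync` route; the `prompt` and `sampling_params` input fields are assumptions about the worker's request schema, so verify them against the README.

```python
import os
import requests

# Minimal sketch: request multiple completions for one prompt via the
# n / best_of sampling parameters. The /runsync route is RunPod's
# standard synchronous endpoint; the "prompt" and "sampling_params"
# field names are assumptions about the worker's input schema.
ENDPOINT_ID = "your-endpoint-id"          # placeholder
API_KEY = os.environ["RUNPOD_API_KEY"]

payload = {
    "input": {
        "prompt": "List three uses for a Raspberry Pi.",
        "sampling_params": {
            "n": 3,          # return 3 completions
            "best_of": 5,    # sample 5 candidates, keep the 3 best
            "max_tokens": 128,
            "temperature": 0.8,
        },
    }
}

resp = requests.post(
    f"https://api.runpod.ai/v2/{ENDPOINT_ID}/runsync",
    json=payload,
    headers={"Authorization": f"Bearer {API_KEY}"},
    timeout=300,
)
resp.raise_for_status()
print(resp.json())
```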