
0.2.0

@alpayariyak alpayariyak released this 26 Jan 04:26

Worker vLLM 0.2.0 - What's New

  • You no longer need a Linux-based machine or NVIDIA GPUs to build the worker.
  • Over 3x smaller Docker image.
  • Optional OpenAI Chat Completion output format (see the parsing sketch after this list).
  • Fast image build time.
  • Docker Secrets support for your Hugging Face token, so you can build the image with a model baked in without exposing the token (see the build sketch after this list).
  • Support for the n and best_of sampling parameters, which let you generate multiple responses from a single prompt (see the request sketch after this list).
  • New environment variables for additional configuration options.
  • vLLM Version: 0.2.7
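
When the optional OpenAI Chat Completion output format is enabled, the worker's output follows the standard `chat.completion` schema. The sketch below is a minimal example of parsing such a response on the client side; the surrounding job envelope (the `output` key) and the example values are assumptions for illustration, and how the format is toggled depends on your configuration.

```python
# Minimal sketch: parsing a worker response that uses the OpenAI
# Chat Completion output format. The outer "output" envelope and the
# example values are assumptions; the inner structure follows the
# standard OpenAI chat.completion schema.
job_result = {
    "output": {
        "id": "chatcmpl-123",  # hypothetical example values
        "object": "chat.completion",
        "choices": [
            {
                "index": 0,
                "message": {"role": "assistant", "content": "Hello!"},
                "finish_reason": "stop",
            }
        ],
        "usage": {"prompt_tokens": 9, "completion_tokens": 3, "total_tokens": 12},
    }
}

completion = job_result["output"]
for choice in completion["choices"]:
    print(choice["index"], choice["message"]["content"])
```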
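
To bake a gated model into the image, the Hugging Face token is passed as a Docker BuildKit secret so it never ends up in an image layer. The sketch below simply drives the standard `docker build --secret` CLI from Python; the secret id (`HF_TOKEN`), the `MODEL_NAME` build arg, the token file path, and the image tag are assumptions for illustration, so check the repository's Dockerfile and README for the exact names it expects.

```python
import os
import subprocess

# Minimal sketch: build the worker image with a model baked in while
# passing the Hugging Face token as a BuildKit secret rather than a
# plain build arg, so the token is not persisted in any image layer.
# The secret id "HF_TOKEN", the build arg, and the tag are illustrative
# assumptions; consult the repo's Dockerfile/README for the exact names.
env = {**os.environ, "DOCKER_BUILDKIT": "1"}
subprocess.run(
    [
        "docker", "build",
        "--secret", "id=HF_TOKEN,src=hf_token.txt",  # token kept in a local file
        "--build-arg", "MODEL_NAME=meta-llama/Llama-2-7b-chat-hf",
        "-t", "yourname/worker-vllm:0.2.0",
        ".",
    ],
    check=True,
    env=env,
)
```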
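
With `n` and `best_of` exposed, one request can return several completions: vLLM samples `best_of` candidate sequences and returns the `n` best. The sketch below posts a job to a RunPod serverless endpoint running this worker over the standard `/runsync` route; the `prompt` and `sampling_params` input fields are assumptions about the worker's request schema, so verify them against the README.

```python
import os
import requests

# Minimal sketch: request multiple completions for one prompt via the
# n / best_of sampling parameters. The /runsync route is RunPod's
# standard synchronous endpoint; the "prompt" and "sampling_params"
# field names are assumptions about the worker's input schema.
ENDPOINT_ID = "your-endpoint-id"          # placeholder
API_KEY = os.environ["RUNPOD_API_KEY"]

payload = {
    "input": {
        "prompt": "List three uses for a Raspberry Pi.",
        "sampling_params": {
            "n": 3,          # return 3 completions
            "best_of": 5,    # sample 5 candidates, keep the 3 best
            "max_tokens": 128,
            "temperature": 0.8,
        },
    }
}

resp = requests.post(
    f"https://api.runpod.ai/v2/{ENDPOINT_ID}/runsync",
    json=payload,
    headers={"Authorization": f"Bearer {API_KEY}"},
    timeout=300,
)
resp.raise_for_status()
print(resp.json())
```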