Merge pull request #106 from runpod-workers/up-0.5.5
update vllm version 0.5.5
pandyamarut authored Aug 29, 2024
2 parents 286d6ba + 3293245 commit ab40d9c
Showing 2 changed files with 3 additions and 3 deletions.
Dockerfile: 2 changes (1 addition, 1 deletion)
@@ -12,7 +12,7 @@ RUN --mount=type=cache,target=/root/.cache/pip \
python3 -m pip install --upgrade -r /requirements.txt

# Install vLLM (switching back to pip installs since issues that required building fork are fixed and space optimization is not as important since caching) and FlashInfer
-RUN python3 -m pip install vllm==0.5.4 && \
+RUN python3 -m pip install vllm==0.5.5 && \
python3 -m pip install flashinfer -i https://flashinfer.ai/whl/cu121/torch2.3

# Setup for Option 2: Building the Image with the Model included
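
If you rebuild the worker image after this bump, a quick sanity check is to print the vLLM version baked into the image. A minimal sketch, assuming an illustrative local tag `worker-vllm:dev` (not an official image name):

```bash
# Build the image from this Dockerfile and confirm the pinned vLLM version
docker build -t worker-vllm:dev .
docker run --rm --entrypoint python3 worker-vllm:dev \
  -c "import vllm; print(vllm.__version__)"   # should print 0.5.5
```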
README.md: 4 changes (2 additions, 2 deletions)
@@ -18,9 +18,9 @@ Deploy OpenAI-Compatible Blazing-Fast LLM Endpoints powered by the [vLLM](https:
### 1. UI for Deploying vLLM Worker on RunPod console:
![Demo of Deploying vLLM Worker on RunPod console with new UI](media/ui_demo.gif)

-### 2. Worker vLLM `v1.2.0` with vLLM `0.5.4` now available under `stable` tags
+### 2. Worker vLLM `v1.3.0` with vLLM `0.5.4` now available under `stable` tags

-Update v1.2.0 is now available, use the image tag `runpod/worker-v1-vllm:v1.2.0stable-cuda12.1.0`.
+Update v1.3.0 is now available, use the image tag `runpod/worker-v1-vllm:v1.3.0stable-cuda12.1.0`.

### 3. OpenAI-Compatible [Embedding Worker](https://github.com/runpod-workers/worker-infinity-embedding) Released
Deploy your own OpenAI-compatible Serverless Endpoint on RunPod with multiple embedding models and fast inference for RAG and more!
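
The stable tag referenced in the README change can also be pulled directly to inspect or run the worker image locally; a minimal sketch (deployment on RunPod itself still goes through the console or API):

```bash
# Pull the updated stable worker image referenced in the README
docker pull runpod/worker-v1-vllm:v1.3.0stable-cuda12.1.0
```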
