
Commit f801e68

Merge pull request #609 from basetenlabs/bump-version-0.6.3
Release 0.6.3
2 parents 7f5ba4a + 99ba36e · commit f801e68


45 files changed: +1664 −70

.github/workflows/commit_new_release_to_main.yml
+2 −2

@@ -11,7 +11,7 @@ jobs:
 
     steps:
       - name: Check out code
-        uses: actions/checkout@v2
+        uses: actions/checkout@v3
 
      - name: Configure Git user as basetenbot
        run: |
@@ -25,7 +25,7 @@ jobs:
      - name: Merge release into main with priority on main changes
        run: |
          git checkout main
-         git merge --strategy-option=ours release -m "Merge release into main prioritizing main changes"
+         git merge --strategy-option=ours origin/release -m "Merge release into main prioritizing main changes"
          git push origin main
        env:
          GH_TOKEN: ${{ secrets.BASETENBOT_GITHUB_TOKEN }}
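Why `origin/release` rather than `release`: in a fresh `actions/checkout`, only the branch being built exists as a local ref, so merging the remote-tracking ref is the robust choice. A minimal sketch of the same sequence, assuming a clone where `origin/release` has been fetched (illustration only; the `git` helper below is not part of the commit):

```python
# Sketch of the workflow's merge step; assumes git is on PATH and the
# current directory is a clone with access to origin.
import subprocess

def git(*args: str) -> None:
    # check=True mirrors the workflow: any failing git command fails the job.
    subprocess.run(["git", *args], check=True)

git("fetch", "origin", "release")
git("checkout", "main")
# --strategy-option=ours resolves conflicting hunks in favor of main's side.
git("merge", "--strategy-option=ours", "origin/release",
    "-m", "Merge release into main prioritizing main changes")
git("push", "origin", "main")
```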
+71

@@ -0,0 +1,71 @@
+name: integration-tests
+
+on:
+  workflow_dispatch: # Allows running from actions tab
+
+concurrency:
+  group: main-${{ github.ref_name }}
+  cancel-in-progress: false
+
+jobs:
+  detect-version-changed:
+    runs-on: ubuntu-20.04
+    outputs:
+      version_changed: ${{ steps.versions.outputs.version_changed }}
+      new_version: ${{ steps.versions.outputs.new_version }}
+      new_base_image_version: ${{ steps.versions.outputs.new_base_image_version }}
+      build_base_images: ${{ steps.versions.outputs.build_base_images }}
+      release_version: ${{ steps.versions.outputs.release_version }}
+      is_prerelease_version: ${{ steps.versions.outputs.is_prerelease_version }}
+    steps:
+      - uses: actions/checkout@v3
+        with:
+          # We need to use a different github token because GITHUB_TOKEN cannot trigger a workflow from another workflow
+          token: ${{ secrets.BASETENBOT_GITHUB_TOKEN }}
+          fetch-depth: 2
+      - uses: ./.github/actions/detect-versions/
+        id: versions
+
+  build-and-push-truss-base-images-if-needed:
+    needs: [detect-version-changed]
+    if: needs.detect-version-changed.outputs.build_base_images == 'true'
+    runs-on: ubuntu-20.04
+    strategy:
+      matrix:
+        python_version: ["3.8", "3.9", "3.10", "3.11"]
+        use_gpu: ["y", "n"]
+        job_type: ["server", "training"]
+    steps:
+      - name: Set up Docker Buildx
+        uses: docker/setup-buildx-action@v1
+
+      - name: Login to Docker Hub
+        uses: docker/login-action@v1
+        with:
+          username: ${{ secrets.DOCKERHUB_USERNAME }}
+          password: ${{ secrets.DOCKERHUB_TOKEN }}
+
+      - uses: actions/checkout@v3
+      - uses: ./.github/actions/setup-python/
+      - run: poetry install
+      - shell: bash
+        run: |
+          poetry run bin/generate_base_images.py \
+            --use-gpu ${{ matrix.use_gpu }} \
+            --python-version ${{ matrix.python_version }} \
+            --job-type ${{ matrix.job_type }} \
+            --version-tag ${{ needs.detect-version-changed.outputs.new_base_image_version }} \
+            --skip-login --push
+
+  integration-tests:
+    needs: [detect-version-changed, build-and-push-truss-base-images-if-needed]
+    if: ${{ !failure() && !cancelled() && (needs.build-and-push-truss-base-images-if-needed.result == 'success' || needs.build-and-push-truss-base-images-if-needed.result == 'skipped') }}
+    runs-on: ubuntu-20.04
+    strategy:
+      fail-fast: false
+      matrix:
+        split_group: ["1", "2", "3", "4", "5"]
+    steps:
+      - uses: actions/checkout@v3
+      - uses: ./.github/actions/setup-python/
+      - run: poetry install
+      - run: poetry run pytest truss/tests -m 'integration' --splits 5 --group ${{ matrix.split_group }}
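The `--splits 5 --group N` flags in the last step match the interface of the pytest-split plugin (an assumption; the plugin would have to be declared in the project's dev dependencies). Each of the five matrix jobs runs one fifth of the integration suite. A sketch of what the `split_group` matrix expands to, run sequentially here where the workflow fans out in parallel:

```python
# Hypothetical local reproduction of the five matrix jobs, assuming
# pytest-split is installed via `poetry install`.
import subprocess

for group in ("1", "2", "3", "4", "5"):
    subprocess.run(
        ["poetry", "run", "pytest", "truss/tests", "-m", "integration",
         "--splits", "5", "--group", group],
        check=True,  # stop on the first failing group
    )
```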
+52

@@ -0,0 +1,52 @@
+name: Release CI
+
+on:
+  workflow_dispatch:
+    inputs:
+      version:
+        description: 'Version to bump to'
+        required: true
+
+concurrency:
+  group: release-utils-${{ github.head_ref || github.run_id }}
+  cancel-in-progress: false
+
+jobs:
+  publish-to-pypi:
+    runs-on: ubuntu-20.04
+    steps:
+      - name: "Git tag release"
+        uses: actions/checkout@v3
+        with:
+          token: ${{ secrets.BASETENBOT_GITHUB_TOKEN }}
+
+      - uses: ./.github/actions/setup-python/
+
+      - name: Tag release
+        env:
+          INPUT_VERSION: ${{ github.event.inputs.version }}
+        run: |
+          cd truss-utils
+          poetry version $INPUT_VERSION
+          NEW_VERSION=v$INPUT_VERSION
+          TAG=truss-utils-$NEW_VERSION
+          git config --global user.name "Github action"
+          git config --global user.email "github.action@baseten.co"
+
+          git tag -a $TAG -m "Release truss-utils $NEW_VERSION"
+          git push origin $TAG
+
+      - name: Install poetry packages
+        working-directory: truss-utils
+        run: poetry install --no-dev
+
+      - name: Build
+        working-directory: truss-utils
+        run: poetry build
+
+      - name: Publish to PyPI
+        if: ${{ github.event_name != 'pull_request' }}
+        working-directory: truss-utils
+        run: poetry publish -u "${{ secrets.PYPI_USERNAME }}" -p "${{ secrets.PYPI_PASSWORD }}"
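For clarity, the "Tag release" step derives the git tag from the `workflow_dispatch` input. A worked example of the naming, using a made-up input of `0.1.0`:

```python
# Tag naming from the step above; "0.1.0" is a hypothetical input value.
input_version = "0.1.0"
new_version = f"v{input_version}"    # NEW_VERSION=v$INPUT_VERSION
tag = f"truss-utils-{new_version}"   # TAG=truss-utils-$NEW_VERSION
assert tag == "truss-utils-v0.1.0"
```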

docs/examples/performance/tgi-server.mdx
+1 −1

@@ -65,7 +65,7 @@ truss push
 You can invoke the model with:
 
 ```sh
-truss predict -d '{"inputs": "What is a large language model?", "parameters": {"max_new_tokens": 128}}'
+truss predict -d '{"inputs": "What is a large language model?", "parameters": {"max_new_tokens": 128, "sample": true}}' --published
 ```
 
 <RequestExample>

docs/examples/pre-process.mdx
+123

@@ -0,0 +1,123 @@
+---
+title: Pre/post-process methods
+description: "Deploy a model that makes use of pre-process"
+---
+
+Out of the box, Truss limits the number of concurrent predictions that happen on a
+single container. This ensures that the CPU, and for many models the GPU, does not get
+overloaded, and that the model can continue to respond to requests in periods of high load.
+
+However, many models, in addition to having compute components, also have
+IO requirements. For example, a model that classifies images may need to download
+the image from a URL before it can classify it.
+
+Truss provides a way to separate the IO component from the compute component, to
+ensure that any IO does not prevent utilization of the compute on your pod.
+
+To do this, you can use the pre/post-process methods on a Truss. These methods
+can be defined like this:
+
+```python
+class Model:
+    def __init__(self): ...
+    def load(self, **kwargs) -> None: ...
+
+    def preprocess(self, request):
+        # Include any IO logic that happens _before_ predict here
+        ...
+
+    def predict(self, request):
+        # Include the actual predict here
+        ...
+
+    def postprocess(self, response):
+        # Include any IO logic that happens _after_ predict here
+        ...
+```
+
+When the model is invoked, any logic defined in the pre- or post-process methods
+runs on a separate thread and is not subject to the same concurrency limits as
+predict. So, let's say you have a model that can handle 5 concurrent requests:
+
+```yaml config.yaml
+...
+runtime:
+  predict_concurrency: 5
+...
+```
+
+If you hit it with 10 requests, they will _all_ begin pre-processing, but when the
+6th request is ready to begin the predict method, it will have to wait for one of the
+first 5 requests to finish. This ensures that the GPU is not overloaded, while also ensuring
+that the compute logic does not get blocked by IO, thereby ensuring that you can achieve
+maximum throughput.
+
+<RequestExample>
+
+```python model/model.py
+import requests
+from typing import Dict
+from PIL import Image
+from transformers import CLIPProcessor, CLIPModel
+
+CHECKPOINT = "openai/clip-vit-base-patch32"
+
+
+class Model:
+    """
+    This is a simple example of using CLIP to classify images.
+    It outputs the probability of the image being a cat or a dog.
+    """
+    def __init__(self, **kwargs) -> None:
+        self._processor = None
+        self._model = None
+
+    def load(self):
+        """
+        Loads the CLIP model and processor checkpoints.
+        """
+        self._model = CLIPModel.from_pretrained(CHECKPOINT)
+        self._processor = CLIPProcessor.from_pretrained(CHECKPOINT)
+
+    def preprocess(self, request: Dict) -> Dict:
+        """
+        This method downloads the image from the url and preprocesses it.
+        The preprocess method is used for any logic that involves IO, in this
+        case downloading the image. It is called before the predict method
+        in a separate thread and is not subject to the same concurrency
+        limits as the predict method, so it can be called many times in parallel.
+        """
+        image = Image.open(requests.get(request.pop("url"), stream=True).raw)
+        request["inputs"] = self._processor(
+            text=["a photo of a cat", "a photo of a dog"],
+            images=image,
+            return_tensors="pt",
+            padding=True
+        )
+        return request
+
+    def predict(self, request: Dict) -> Dict:
+        """
+        This performs the actual classification. The predict method is subject to
+        the predict concurrency constraints.
+        """
+        outputs = self._model(**request["inputs"])
+        logits_per_image = outputs.logits_per_image
+        return logits_per_image.softmax(dim=1).tolist()
+```
+
+```yaml config.yaml
+model_name: clip-example
+requirements:
+  - transformers==4.32.0
+  - pillow==10.0.0
+  - torch==2.0.1
+resources:
+  cpu: "3"
+  memory: 14Gi
+  use_gpu: true
+  accelerator: A10G
+```
+
+</RequestExample>
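To make the request shape of this new doc concrete: `preprocess` pops a `url` key from the request body, so a client would send JSON like the sketch below. The endpoint URL, API key, and image address are placeholders, not values from this commit:

```python
# Hypothetical client call for the CLIP example above; the URL, key, and
# image address are placeholders. The response is cat-vs-dog probabilities.
import requests

resp = requests.post(
    "https://model-<MODEL_ID>.api.baseten.co/production/predict",  # placeholder
    headers={"Authorization": "Api-Key <YOUR_API_KEY>"},           # placeholder
    json={"url": "https://example.com/cat.jpg"},                   # placeholder
)
print(resp.json())  # e.g. [[0.98, 0.02]]
```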

docs/images/notebook-to-model.png
(new image, 12.5 KB)

docs/learn/model-serving/model-load.mdx
+7 −1

@@ -5,7 +5,13 @@ description: "Load an ML model into your Truss"
 
 The other essential file in a Truss is `model/model.py`. In this file, you write a `Model` class: an interface between the ML model that you're packaging and the model server that you're running it on.
 
-Open `model/model.py` in your text editor.
+The code to load and invoke a model in a Jupyter notebook or Python script maps directly to the code used in `model/model.py`.
+
+<Frame>
+<img src="/images/notebook-to-model.png" />
+</Frame>
+
+We'll go line-by-line through the code. Open `model/model.py` in your text editor.
 
 ### Import transformers

docs/mint.json
+1

@@ -58,6 +58,7 @@
       "examples/private-model",
       "examples/system-packages",
       "examples/streaming",
+      "examples/pre-process",
       "examples/performance/cached-weights",
       "examples/performance/tgi-server",
       "examples/performance/vllm-server"

docs/quickstart.mdx
+6

@@ -37,6 +37,12 @@ cd text-classification
 
 One of the two essential files in a Truss is `model/model.py`. In this file, you write a `Model` class: an interface between the ML model that you're packaging and the model server that you're running it on.
 
+The code to load and invoke a model in a Jupyter notebook or Python script maps directly to the code used in `model/model.py`.
+
+<Frame>
+<img src="/images/notebook-to-model.png" />
+</Frame>
+
 There are two member functions that you must implement in the `Model` class:
 
 * `load()` loads the model onto the model server. It runs exactly once when the model server is spun up or patched.
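The hunk above cuts off after `load()`; the other required member function is `predict()`. A minimal sketch of a `Model` class implementing both, assuming a transformers pipeline for the text-classification example (the actual quickstart code may differ):

```python
# Minimal sketch of the two required member functions; the pipeline task
# and input shape are assumptions, not taken from this commit.
from transformers import pipeline


class Model:
    def __init__(self, **kwargs) -> None:
        self._model = None

    def load(self):
        # Runs exactly once when the model server is spun up or patched.
        self._model = pipeline("text-classification")

    def predict(self, model_input):
        # Runs on every request, after any preprocessing.
        return self._model(model_input["text"])
```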

examples/vllm-gcs/config.yaml
+19

@@ -0,0 +1,19 @@
+build:
+  arguments:
+    endpoint: Completions
+    model: gs://llama-2-7b
+    tokenizer: hf-internal-testing/llama-tokenizer
+  model_server: VLLM
+environment_variables: {}
+external_package_dirs: []
+model_metadata: {}
+model_name: vllm llama gcs
+python_version: py39
+requirements: []
+resources:
+  accelerator: A10G
+  cpu: 500m
+  memory: 30Gi
+  use_gpu: true
+secrets: {}
+system_packages: []
+1

@@ -0,0 +1 @@
+YOUR SERVICE ACCOUNT KEY
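The one-line placeholder file above is where the GCS service-account key goes so the vLLM server can pull the private `gs://llama-2-7b` weights referenced in the config. A quick way to check that such a key can read the bucket, assuming the `google-cloud-storage` package and a local `service_account.json` path (both assumptions, not part of the commit):

```python
# Sanity-check that a service-account key can list the weights bucket.
# The key path and bucket name mirror the example config; adjust as needed.
from google.cloud import storage

client = storage.Client.from_service_account_json("service_account.json")
blobs = client.list_blobs("llama-2-7b", max_results=5)
print([blob.name for blob in blobs])
```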
