docs/examples/performance/cached-weights.mdx (+4 -4)
@@ -3,7 +3,7 @@ title: Deploy Llama 2 with Caching
description: "Enable fast cold starts for a model with private Hugging Face weights"
---
- In this example, we will cover how you can use the `hf_cache` key in your Truss's `config.yml` to automatically bundle model weights from a private Hugging Face repo.
+ In this example, we will cover how you can use the `model_cache` key in your Truss's `config.yml` to automatically bundle model weights from a private Hugging Face repo.
<Tip>
Bundling model weights can significantly reduce cold start times because your instance won't waste time downloading the model weights from Hugging Face's servers.
@@ -116,10 +116,10 @@ Always pin exact versions for your Python dependencies. The ML/AI space moves fa
### Step 3: Configure Hugging Face caching
- Finally, we can configure Hugging Face caching in `config.yaml` by adding the `hf_cache` key. When building the image for your Llama 2 deployment, the Llama 2 model weights will be downloaded and cached for future use.
+ Finally, we can configure Hugging Face caching in `config.yaml` by adding the `model_cache` key. When building the image for your Llama 2 deployment, the Llama 2 model weights will be downloaded and cached for future use.
docs/guides/model-cache.mdx (+84 -8)
@@ -18,17 +18,20 @@ In practice, this reduces the cold start for large models to just a few seconds.
## Enabling Caching for a Model
- To enable caching, simply add `hf_cache` to your `config.yml` with a valid `repo_id`. The `hf_cache` has a few key configurations:
+ To enable caching, simply add `model_cache` to your `config.yml` with a valid `repo_id`. The `model_cache` has a few key configurations:
- `repo_id` (required): The endpoint for your cloud bucket. Currently, we support Hugging Face and Google Cloud Storage.
- `revision`: The revision to pull, such as a branch name or commit hash. By default, it refers to `main`.
- `allow_patterns`: Only cache files that match specified patterns. Utilize Unix shell-style wildcards to denote these patterns.
- `ignore_patterns`: Conversely, you can also denote file patterns to ignore, streamlining the caching process.
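These patterns behave like Unix shell globs. A quick illustration of which files a hypothetical `allow_patterns` list would keep, using Python's standard-library `fnmatch` (the file names here are made up for illustration):

```python
from fnmatch import fnmatch

# Hypothetical repo contents for illustration
files = ["model.safetensors", "model.bin", "config.json", "tokenizer.json"]

# Only cache safetensors weights and JSON config files
allow = ["*.safetensors", "*.json"]

# A file is cached if it matches at least one allow pattern
cached = [f for f in files if any(fnmatch(f, p) for p in allow)]
print(cached)  # ['model.safetensors', 'config.json', 'tokenizer.json']
```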
- Here is an example of a well-written `hf_cache` for Stable Diffusion XL. Note how it only pulls the model weights that it needs using `allow_patterns`.
+ <Info>We recently renamed `hf_cache` to `model_cache`, but don't worry! If you're using `hf_cache` in any of your projects, it will automatically be aliased to `model_cache`.</Info>
+ Here is an example of a well-written `model_cache` for Stable Diffusion XL. Note how it only pulls the model weights that it needs using `allow_patterns`.
```yaml config.yml
...
-hf_cache:
+model_cache:
- repo_id: madebyollin/sdxl-vae-fp16-fix
  allow_patterns:
    - config.json
```

@@ -51,7 +54,7 @@ Many Hugging Face repos have model weights in different formats (`.bin`, `.safet
There are also some additional steps depending on the cloud bucket you want to query.
### Hugging Face 🤗
- For any public Hugging Face repo, you don't need to do anything else. Adding the `hf_cache` key with an appropriate `repo_id` should be enough.
+ For any public Hugging Face repo, you don't need to do anything else. Adding the `model_cache` key with an appropriate `repo_id` should be enough.
However, if you want to deploy a model from a gated repo like [Llama 2](https://huggingface.co/meta-llama/Llama-2-70b-chat-hf) to Baseten, there are a few steps you need to take:
<Steps>
@@ -86,15 +89,17 @@ Weights will be cached in the default Hugging Face cache directory, `~/.cache/hu
### Google Cloud Storage
Google Cloud Storage is a great alternative to Hugging Face when you have a custom model or fine-tune you want to gate, especially if you are already using GCP and care about security and compliance.
- Your `hf_cache` should look something like this:
+ Your `model_cache` should look something like this:
```yaml config.yml
...
-hf_cache:
+model_cache:
  repo_id: gcs://path-to-my-bucket
...
```
+ If you are accessing a public GCS bucket, you can ignore the following steps, but make sure you set appropriate permissions on your bucket. Users should be able to list and view all files. Otherwise, the model build will fail.
For a private GCS bucket, first export your service account key. Rename it to `service_account.json` and add it to the `data` directory of your Truss.
Your file structure should look something like this:
@@ -111,9 +116,80 @@ your-truss
If you are using version control, like git, for your Truss, make sure to add `service_account.json` to your `.gitignore` file. You don't want to accidentally expose your service account key.
</Warning>
- Weights will be cached at `/app/hf_cache/{your_bucket_name}`.
+ Weights will be cached at `/app/model_cache/{your_bucket_name}`.
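Once cached, your model code can read the weights straight from that directory rather than downloading them at runtime. A minimal sketch of resolving the cache path, using a hypothetical bucket name:

```python
from pathlib import Path

# Hypothetical bucket name for illustration; substitute your own bucket.
bucket_name = "my-model-bucket"
weights_dir = Path("/app/model_cache") / bucket_name

# In model.py you would then point your framework's loader at this local
# path instead of a remote repo id, e.g. (with transformers, as an assumption):
#   model = AutoModel.from_pretrained(str(weights_dir))
print(weights_dir)  # /app/model_cache/my-model-bucket
```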
+ ### Amazon Web Services S3
+ Another popular cloud storage option for hosting model weights is AWS S3, especially if you're already using AWS services.
+ Your `model_cache` should look something like this:
+```yaml config.yml
+...
+model_cache:
+  repo_id: s3://path-to-my-bucket
+...
+```
+ If you are accessing a public S3 bucket, you can ignore the subsequent steps, but make sure you set an appropriate policy on your bucket. Users should be able to list and view all files. Otherwise, the model build will fail.
+ However, for a private S3 bucket, you first need to find your `aws_access_key_id`, `aws_secret_access_key`, and `aws_region` in your AWS dashboard. Create a file named `s3_credentials.json`. Inside this file, add the credentials that you identified earlier as shown below. Place this file into the `data` directory of your Truss.
+ The key `aws_session_token` can be included, but is optional.
+ Here is an example of how your `s3_credentials.json` file should look:
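A sketch of such a file, using the keys listed above with placeholder values (substitute your own credentials; the values shown are not real):

```json
{
  "aws_access_key_id": "YOUR_ACCESS_KEY_ID",
  "aws_secret_access_key": "YOUR_SECRET_ACCESS_KEY",
  "aws_region": "us-east-1"
}
```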
+<Warning>
+ If you are using version control, like git, for your Truss, make sure to add `s3_credentials.json` to your `.gitignore` file. You don't want to accidentally expose your AWS credentials.
+</Warning>
+ Weights will be cached at `/app/model_cache/{your_bucket_name}`.
### Other Buckets
- We're currently working on adding support for more bucket types, including AWS S3. If you have any suggestions, please [leave an issue](https://github.com/basetenlabs/truss/issues) on our GitHub repo.
+ We can work with you to support additional bucket types if needed. If you have any suggestions, please [leave an issue](https://github.com/basetenlabs/truss/issues) on our GitHub repo.