### Simple MaaS on K8s
Using the provided [helm chart](../charts/inference), your model can scale to multiple nodes in Kubernetes (K8s). Once you have set your `KUBECONFIG` environment variable and can access your cluster, use the instructions below to deploy your model as a service.
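For example, a quick sanity check that your kubeconfig resolves to the right cluster might look like this (the kubeconfig path below is an assumption):

```bash
export KUBECONFIG=$HOME/.kube/config   # assumption: default kubeconfig location
kubectl config current-context         # confirm the expected cluster context
kubectl get nodes                      # confirm the nodes are reachable and Ready
```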
2. (Optional) Push TorchServe Image to a Private Registry
    If you added layers to an existing torchserve container image in a [previous step](#test-model), use `docker push` to publish that image to a private registry that your cluster can access.
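    A minimal sketch; the image tag and registry host are assumptions:

    ```bash
    # Tag the locally built image for your registry, then push it
    docker tag torchserve:custom registry.example.com/torchserve:custom
    docker push registry.example.com/torchserve:custom
    ```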
3. Set up Model Storage
    Your model archive file will no longer be accessible from your local environment, so it needs to be added to a [PVC](https://kubernetes.io/docs/tasks/configure-pod-container/configure-persistent-volume-storage/) using a network storage solution like [NFS](https://kubernetes.io/docs/concepts/storage/volumes/#nfs).
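    A minimal sketch of a statically bound, NFS-backed volume; the server address, export path, and size are assumptions, not values from the chart:

    ```bash
    kubectl apply -f - <<'EOF'
    apiVersion: v1
    kind: PersistentVolume
    metadata:
      name: model-store-pv
    spec:
      capacity:
        storage: 1Gi
      accessModes: ["ReadOnlyMany"]
      nfs:
        server: nfs.example.com        # assumption: your NFS server
        path: /exports/model-store     # assumption: export containing your .mar file
    ---
    apiVersion: v1
    kind: PersistentVolumeClaim
    metadata:
      name: model-store-pvc
    spec:
      accessModes: ["ReadOnlyMany"]
      storageClassName: ""             # bind statically to the PV above
      volumeName: model-store-pv
      resources:
        requests:
          storage: 1Gi
    EOF
    ```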
4. Install TorchServe Chart
    Using the provided [Chart README](../charts/inference/README.md), set the variables in its table to match the expected model storage, cluster type, and model configuration for your service. The example below assumes that a PVC has been created with the squeezenet model in the root directory of the volume.
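    A minimal sketch, assuming the chart's values have been copied into a local `values.yaml` and the release is named `torchserve`:

    ```bash
    # Install the chart with your customized values, then watch the pod come up
    helm install torchserve ../charts/inference -f values.yaml
    kubectl get pods -w
    ```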
By default the service is a `NodePort` service, accessible from the IP address of any node in your cluster. Find a node IP with `kubectl get node -o wide` and attempt to communicate with the service using the commands below:
```bash
curl -X GET http://<your-node-ip>:30000/ping
curl -X GET http://<your-node-ip>:30001/models
```
> [!NOTE]
> If you are behind a network proxy, you may need to unset your `http_proxy` and `no_proxy` environment variables to communicate with the nodes in your cluster with `curl`.
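Alternatively, `env -u` can drop the proxy variables for a single request (a sketch; substitute a real node IP):

```bash
env -u http_proxy -u https_proxy -u no_proxy curl -X GET http://<your-node-ip>:30000/ping
```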
#### Next Steps
There are some additional steps that can be taken to prepare your service for your users:
- Enable [Autoscaling](https://github.com/pytorch/serve/blob/master/kubernetes/autoscale.md#autoscaler) via Prometheus
- Export an [INT8 Model for IPEX](https://github.com/pytorch/serve/blob/f7ae6f8281ac6e26404a6ae4d210535c9dc96d9a/examples/intel_extension_for_pytorch/README.md#creating-and-exporting-int8-model-for-intel-extension-for-pytorch)
- Integrate an [Ingress](https://kubernetes.io/docs/concepts/services-networking/ingress/) to your service to serve from a hostname rather than an IP address (a sketch follows below)
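As a starting point for the Ingress item, here is a hedged sketch; the hostname, Service name, and port are assumptions, so check what the chart actually creates in your cluster:

```bash
kubectl apply -f - <<'EOF'
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: torchserve-ingress
spec:
  rules:
  - host: torchserve.example.com        # assumption: your chosen hostname
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: torchserve            # assumption: the Service created by the chart
            port:
              number: 8080              # TorchServe's default inference port
EOF
```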