### Simple MaaS on K8s
Using the provided [helm chart](../charts/inference), your model can scale to multiple nodes in Kubernetes (K8s). Once you have set your `KUBECONFIG` environment variable and can access your cluster, use the instructions below to deploy your model as a service.
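For example, a quick sanity check that your kubeconfig resolves to the right cluster might look like this (the kubeconfig path below is an assumption):

```bash
export KUBECONFIG=$HOME/.kube/config   # assumption: default kubeconfig location
kubectl config current-context         # confirm the expected cluster context
kubectl get nodes                      # confirm the nodes are reachable and Ready
```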
2. (Optional) Push TorchServe Image to a Private Registry
    If you added layers to an existing torchserve container image in a [previous step](#test-model), use `docker push` to publish that image to a private registry that your cluster can access.
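    A minimal sketch; the image tag and registry host are assumptions:

    ```bash
    # Tag the locally built image for your registry, then push it
    docker tag torchserve:custom registry.example.com/torchserve:custom
    docker push registry.example.com/torchserve:custom
    ```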
3. Set up Model Storage
    Your model archive file will no longer be accessible from your local environment, so it needs to be added to a [PVC](https://kubernetes.io/docs/tasks/configure-pod-container/configure-persistent-volume-storage/) using a network storage solution like [NFS](https://kubernetes.io/docs/concepts/storage/volumes/#nfs).
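    A minimal sketch of a statically bound, NFS-backed volume; the server address, export path, and size are assumptions, not values from the chart:

    ```bash
    kubectl apply -f - <<'EOF'
    apiVersion: v1
    kind: PersistentVolume
    metadata:
      name: model-store-pv
    spec:
      capacity:
        storage: 1Gi
      accessModes: ["ReadOnlyMany"]
      nfs:
        server: nfs.example.com        # assumption: your NFS server
        path: /exports/model-store     # assumption: export containing your .mar file
    ---
    apiVersion: v1
    kind: PersistentVolumeClaim
    metadata:
      name: model-store-pvc
    spec:
      accessModes: ["ReadOnlyMany"]
      storageClassName: ""             # bind statically to the PV above
      volumeName: model-store-pv
      resources:
        requests:
          storage: 1Gi
    EOF
    ```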
4. Install TorchServe Chart
    Using the provided [Chart README](../charts/inference/README.md), set the variables in its table to match the expected model storage, cluster type, and model configuration for your service. The example below assumes that a PVC has been created with the squeezenet model in the root directory of the volume.
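    A minimal sketch, assuming the chart's values have been copied into a local `values.yaml` and the release is named `torchserve`:

    ```bash
    # Install the chart with your customized values, then watch the pod come up
    helm install torchserve ../charts/inference -f values.yaml
    kubectl get pods -w
    ```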
By default the service is a `NodePort` service, accessible from the IP address of any node in your cluster. Find a node IP with `kubectl get node -o wide` and attempt to communicate with the service using the commands below:
```bash
curl -X GET http://<your-node-ip>:30000/ping
curl -X GET http://<your-node-ip>:30001/models
```
> [!NOTE]
> If you are behind a network proxy, you may need to unset your `http_proxy` and `no_proxy` environment variables to communicate with the nodes in your cluster with `curl`.
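Alternatively, `env -u` can drop the proxy variables for a single request (a sketch; substitute a real node IP):

```bash
env -u http_proxy -u https_proxy -u no_proxy curl -X GET http://<your-node-ip>:30000/ping
```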
#### Next Steps
There are some additional steps that can be taken to prepare your service for your users:
- Enable [Autoscaling](https://github.com/pytorch/serve/blob/master/kubernetes/autoscale.md#autoscaler) via Prometheus
- Export an [INT8 Model for IPEX](https://github.com/pytorch/serve/blob/f7ae6f8281ac6e26404a6ae4d210535c9dc96d9a/examples/intel_extension_for_pytorch/README.md#creating-and-exporting-int8-model-for-intel-extension-for-pytorch)
- Integrate an [Ingress](https://kubernetes.io/docs/concepts/services-networking/ingress/) to your service to serve from a hostname rather than an IP address (a sketch follows below)
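As a starting point for the Ingress item, here is a hedged sketch; the hostname, Service name, and port are assumptions, so check what the chart actually creates in your cluster:

```bash
kubectl apply -f - <<'EOF'
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: torchserve-ingress
spec:
  rules:
  - host: torchserve.example.com        # assumption: your chosen hostname
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: torchserve            # assumption: the Service created by the chart
            port:
              number: 8080              # TorchServe's default inference port
EOF
```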