docs/mxnet-neuron/tutorial-compile-infer.md (+38, -33)
@@ -5,34 +5,27 @@ Neuron supports both Python module and Symbol APIs and the C predict API. The fo
## Steps Overview:

1. Launch an EC2 instance for compilation and/or inference
- 2. Install Neuron for Compiler and Runtime execution
- 3. Run Example
-    1. Compile
-    2. Execute inference on Inf1
+ 2. Install Neuron for compilation and runtime execution
+ 3. Compile on compilation server
+ 4. Execute inference on Inf1
## Step 1: Launch EC2 Instances
- A typical workflow with the Neuron SDK will be for a trained ML model to be compiled on a compilation server and then the artifacts distributed to the (fleet of) Inf1 instances for execution. Neuron enables MXNet to be used for all of these steps.
+ A typical workflow with the Neuron SDK will be to compile trained ML models on a compilation server and then distribute the artifacts to a fleet of Inf1 instances for execution. Neuron enables MXNet to be used for all of these steps.
+ 1. Select an AMI of your choice, which may be Ubuntu 16.x, Ubuntu 18.x, or Amazon Linux 2 based. To use a pre-built Deep Learning AMI, which includes all of the needed packages, see [Launching and Configuring a DLAMI](https://docs.aws.amazon.com/dlami/latest/devguide/launch-config.html).
+ 2. Select and launch an EC2 instance of your choice to compile. Launch an instance by following the [EC2 instructions](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/EC2_GetStarted.html#ec2-launch-instance).
+    * It is recommended to use c5.4xlarge or larger. For this example we will use a c5.4xlarge.
+    * If you would like to compile and infer on the same machine, please select inf1.6xlarge.
+ 3. Select and launch an Inf1 instance of your choice if not compiling and inferencing on the same instance. Launch an instance by following the [EC2 instructions](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/EC2_GetStarted.html#ec2-launch-instance).
- 1. Select an AMI of your choice, which may be Ubuntu 16.x, Ubuntu 18.x, Amazon Linux 2 based. To use a pre-built Deep Learning AMI, which includes all of the needed packages, see these instructions: https://docs.aws.amazon.com/dlami/latest/devguide/launch-config.html
- 2. Select and start an EC2 instance of your choice to compile
-    1. It is recommended to use C5.4xlarge or larger. For this example we will use a C5.4xlarge
-    2. If you would like to compile and infer on the same machine, please select Inf1.6xlarge
- 3. Select and start an Inf1 instance of your choice to run the compiled model you created in step 2.2.

+ ## Step 2: Install Neuron Compiler and MXNet-Neuron On Compilation Instance
- ## Step 2: Install Neuron
+ If using DLAMI, activate the aws_neuron_mxnet_p36 environment and skip this step.
- If using DLAMI and aws_neuron_mxnet_p36 environment, you can skip to Step 3.
+ On the instance you are going to use for compilation, install both the Neuron Compiler and MXNet-Neuron.
- ### Compiler Instance: Install Neuron Compiler and MXnet-Neuron
- On the instance you are going to use for compilation, you must have both the Neuron Compiler and the MXNet-Neuron installed. (The inference instance must have the MXNet-Neuron and the Neuron Runtime installed.)
- Steps Overview:
- #### Using Virtualenv:
- 1. Install virtualenv if needed:
+ 2.1. Install virtualenv if needed:
```bash
# Ubuntu
sudo apt-get update
@@ -44,32 +37,32 @@ sudo yum update
sudo yum install -y python3
pip3 install --user virtualenv
```
- 2. Setup a new Python 3.6 environment:
+ 2.2. Setup a new Python 3.6 environment:
```bash
virtualenv --python=python3.6 test_env_p36
source test_env_p36/bin/activate
```
- 3. Modify Pip repository configurations to point to the Neuron repository.
+ 2.3. Modify Pip repository configurations to point to the Neuron repository.
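The Pip configuration itself is not shown in this hunk; a minimal sketch of what it might look like follows. The repository URL and package names are the standard Neuron ones used elsewhere in these docs, but treat the exact config-file location as an assumption.

```bash
# Point pip inside the virtualenv at the Neuron package repository.
tee $VIRTUAL_ENV/pip.conf > /dev/null <<EOF
[global]
extra-index-url = https://pip.repos.neuron.amazonaws.com
EOF

# Then install the Neuron compiler and the Neuron-optimized MXNet.
pip install neuron-cc mxnet-neuron
```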
- ### Inference Instance: Install MXNet-Neuron and Neuron-Runtime
- 1. Same as above to install MXNet-Neuron
- 2. To install Runtime, see [Getting started: Installing and Configuring Neuron-RTD](./../neuron-runtime/nrt_start.md).
+ ## Step 3: Compile on Compilation Server
- ## Step 3: Run Example
+ Model must be compiled to Inferentia target before it can run on Inferentia.
- 1. Create a file `compile_resnet50.py` with the content below and run it using `python compile_resnet50.py`. Compilation will take a few minutes on c5.4xlarge. At the end of compilation, the files `resnet-50_compiled-0000.params` and `resnet-50_compiled-symbol.json` will be created in local directory.
+ 3.1. Create a file `compile_resnet50.py` with the content below and run it using `python compile_resnet50.py`. Compilation will take a few minutes on c5.4xlarge. At the end of compilation, the files `resnet-50_compiled-0000.params` and `resnet-50_compiled-symbol.json` will be created in the local directory.
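The file content is elided from this hunk. A minimal sketch of what `compile_resnet50.py` might look like, assuming the `mx.contrib.neuron.compile` API provided by mxnet-neuron and a pretrained ResNet-50 checkpoint from the MXNet model zoo (the download URL is an assumption):

```python
import mxnet as mx

# Fetch a pretrained ResNet-50 checkpoint (assumed source: MXNet model zoo).
path = 'http://data.mxnet.io/models/imagenet/'
mx.test_utils.download(path + 'resnet/50-layers/resnet-50-0000.params')
mx.test_utils.download(path + 'resnet/50-layers/resnet-50-symbol.json')
sym, args, aux = mx.model.load_checkpoint('resnet-50', 0)

# Compile for Inferentia; mx.contrib.neuron.compile is assumed to be provided
# by the mxnet-neuron package installed in Step 2.
inputs = {'data': mx.nd.ones([1, 3, 224, 224], name='data', dtype='float32')}
sym, args, aux = mx.contrib.neuron.compile(sym, args, aux, inputs)

# Writes resnet-50_compiled-0000.params and resnet-50_compiled-symbol.json
# to the local directory.
mx.model.save_checkpoint('resnet-50_compiled', 0, sym, args, aux)
```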
docs/mxnet-neuron/tutorial-model-serving.md (+6, -4)
@@ -2,9 +2,9 @@
This Neuron MXNet Model Serving (MMS) example is adapted from the MXNet vision service example which uses pretrained squeezenet to perform image classification: https://github.com/awslabs/mxnet-model-server/tree/master/examples/mxnet_vision.
- Before starting this example, please ensure that Neuron-optimized MXNet version mxnet-neuron is installed (see [MXNet Tutorial](./tutorial-compile-infer.md)) and Neuron RTD is running with default settings (see [Neuron Runtime getting started](./../neuron-runtime/nrt_start.md)).
+ Before starting this example, please ensure that Neuron-optimized MXNet version mxnet-neuron is installed along with Neuron Compiler (see [MXNet Tutorial](./tutorial-compile-infer.md)) and Neuron RTD is running with default settings (see [Neuron Runtime getting started](./../neuron-runtime/nrt_start.md)).
- If using DLAMI and aws_neuron_mxnet_p36 environment, you can skip the installation part in the first step below.
+ If using DLAMI, you can activate the environment aws_neuron_mxnet_p36 and skip the installation part in the first step below.

1. First, install Java runtime and mxnet-model-server:
@@ -91,14 +91,14 @@ Also, comment out unnecessary data copy for model_input in `mxnet_model_service.
#model_input = [item.as_in_context(self.mxnet_ctx) for item in model_input]
- 7. Start MXNet Model Server (MMS) and load model using RESTful API. The number of workers should be less than or equal number of NeuronCores divided by the number of NeuronCores required by model (<linktoAPI>). Please ensure that Neuron RTD is running with default settings (see Getting Started guide):
+ 7. Start MXNet Model Server (MMS) and load model using RESTful API. Please ensure that Neuron RTD is running with default settings (see [Neuron Runtime getting started](./../neuron-runtime/nrt_start.md)):
```bash
cd ~/mxnet-model-server/
@@ -108,6 +108,8 @@ curl -v -X POST "http://localhost:8081/models?initial_workers=1&max_workers=1&sy
sleep 10 # allow sufficient time to load model
```
+ Each worker requires a NeuronCore Group that can accommodate the compiled model. Additional workers can be added by increasing the max_workers configuration as long as there are enough NeuronCores available. Use `neuron-cli list-ncg` to see the NeuronCore Groups being created.
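As a rough illustration of scaling workers, a hedged sketch follows; the model-archive name is a placeholder and the registration parameters are carried over from the call shown earlier, not something this diff specifies.

```bash
# Register the model with two workers instead of one; this only succeeds if
# two NeuronCore Groups of the required size are available on the Inferentia.
# (Archive name is a placeholder for the one built earlier in this example.)
curl -v -X POST "http://localhost:8081/models?initial_workers=2&max_workers=2&synchronous=true&url=squeezenet_v1.1_compiled.mar"

# Inspect the NeuronCore Groups that Neuron-RTD created for the workers.
neuron-cli list-ncg
```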
docs/mxnet-neuron/tutorial-neuroncore-groups.md (+14, -2)
@@ -1,6 +1,8 @@
# Tutorial: MXNet Configurations for NeuronCore Groups
- To further subdivide the pool of NeuronCores controlled by a Neuron-RTD, specify the NeuronCore Groups within that pool using the environment variable `NEURONCORE_GROUP_SIZES` set to a list of group sizes. The consecutive NeuronCore groups will be created by Neuron-RTD and be available for use to map the models.
+ A NeuronCore Group is a set of NeuronCores that are used to load and run compiled models. At any time, one model will be running in a NeuronCore Group. By changing to a different sized NeuronCore Group and then creating several of these NeuronCore Groups, a user may create independent and parallel models running in the Inferentia. Additionally, within a NeuronCore Group, loaded models can be dynamically started and stopped, allowing for dynamic context switching from one model to another.
+ To explicitly specify the NeuronCore Groups, set the environment variable `NEURONCORE_GROUP_SIZES` to a list of group sizes. The consecutive NeuronCore Groups will be created by Neuron-RTD and be available for the user to map models onto.
Note that to map a model to a group, the model must be compiled to fit within the group size. To limit the number of NeuronCores during compilation, use the compiler_args dictionary with the field `--num-neuroncores` set to the group size:
+ Before starting this example, please ensure that Neuron-optimized MXNet version mxnet-neuron is installed along with Neuron Compiler (see [MXNet Tutorial](./tutorial-compile-infer.md)) and Neuron RTD is running with default settings (see [Neuron Runtime getting started](./../neuron-runtime/nrt_start.md)).
+ ## Compile Model
+ Model must be compiled to Inferentia target before it can run on Inferentia.

Create compile_resnet50.py with `--num-neuroncores` set to 2 and run it. The files `resnet-50_compiled-0000.params` and `resnet-50_compiled-symbol.json` will be created in the local directory:
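The listing itself is elided from this hunk. A hedged sketch of the relevant compile call, reusing the compile_resnet50.py sketch from the compile tutorial above; how the flags are handed to `mx.contrib.neuron.compile` is an assumption.

```python
import mxnet as mx

# Load the checkpoint downloaded in the earlier compile sketch and define the input.
sym, args, aux = mx.model.load_checkpoint('resnet-50', 0)
inputs = {'data': mx.nd.ones([1, 3, 224, 224], name='data', dtype='float32')}

# Restrict the compiled model to a group of 2 NeuronCores. Passing the flags
# as **compile_args is an assumption; check the mxnet-neuron docs for the exact form.
compile_args = {'--num-neuroncores': 2}
sym, args, aux = mx.contrib.neuron.compile(sym, args, aux, inputs, **compile_args)
mx.model.save_checkpoint('resnet-50_compiled', 0, sym, args, aux)
```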
During inference, to subdivide the pool of one Inferentia into groups of 1, 2, and 1 NeuronCores, specify `NEURONCORE_GROUP_SIZES` as follows:

```bash
NEURONCORE_GROUP_SIZES='[1,2,1]' <launch process>
```

- Within the framework, the model can be mapped to group using `ctx=mx.neuron(N)` context where N is the group index within the `NEURONCORE_GROUP_SIZES` list. Create infer_resnet50.py with the following content:
+ Within the framework, the model can be mapped to a group using the `ctx=mx.neuron(N)` context, where N is the group index within the `NEURONCORE_GROUP_SIZES` list.
+ Create infer_resnet50.py with the following content:
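The listing is elided from this hunk. A minimal sketch of what infer_resnet50.py might contain, using the MXNet Module API and a placeholder input; the tutorial's actual file may differ.

```python
import mxnet as mx

# Load the model compiled with --num-neuroncores set to 2.
sym, args, aux = mx.model.load_checkpoint('resnet-50_compiled', 0)

# Map the model onto NeuronCore Group index 1, i.e. the 2-core group in
# NEURONCORE_GROUP_SIZES='[1,2,1]'.
mod = mx.mod.Module(symbol=sym, context=mx.neuron(1), label_names=None)
mod.bind(for_training=False, data_shapes=[('data', (1, 3, 224, 224))])
mod.set_params(args, aux, allow_missing=True)

# Placeholder input; a real image would be resized and normalized to this shape.
img = mx.nd.ones([1, 3, 224, 224], dtype='float32')
mod.forward(mx.io.DataBatch([img]), is_train=False)
prob = mod.get_outputs()[0].asnumpy()
print(prob.argmax())
```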
docs/tensorflow-neuron/tutorial-NeuronCore-Group.md (+4, -9)
@@ -1,10 +1,10 @@
# Tutorial: Configuring NeuronCore Groups
- A NeuronCore Group is a set of NeuronCores that are used to load and run compiled models. At any time, one model will be running in a NeuronCoreGroup. By changing to a different sized NeuonCoreGroup and then creating several of these NeuronCoreGroups, a user may create independent and parallel models running in the Inferentia. Additonally: within a NeuronCoreGroup, loaded models can be dynamically started and stopped, allowing for dynamic context switching from one model to another. By default, a single NeuronCoreGroup is created by Neuron Runtime that contains all 4 NeuronCores in an Inferentia. In this default case, when models are loaded to that default NeuronCoreGroup, only 1 will be running at any time. By configuring multiple NeuronCoreGroups as shown in this tutorial, multiple models may be made to run simultaenously.
+ A NeuronCore Group is a set of NeuronCores that are used to load and run compiled models. At any time, one model will be running in a NeuronCore Group. By changing to a different sized NeuronCore Group and then creating several of these NeuronCore Groups, a user may create independent and parallel models running in the Inferentia. Additionally, within a NeuronCore Group, loaded models can be dynamically started and stopped, allowing for dynamic context switching from one model to another. By default, a single NeuronCore Group is created by Neuron Runtime that contains all four NeuronCores in an Inferentia. In this default case, when models are loaded to that default NeuronCore Group, only one will be running at any time. By configuring multiple NeuronCore Groups as shown in this tutorial, multiple models may be made to run simultaneously.
- The NEURONCORE_GROUP_SIZES environment variable provides user control over this in Neuron-integrated TensorFlow. By default, TensorFlow-Neuron will choose the optimal utilization mode based on model metadata, but in some cases manually setting NEURONCORE_GROUP_SIZES can provide additional performance benefits.
+ The NEURONCORE_GROUP_SIZES environment variable provides user control over the grouping of NeuronCores in Neuron-integrated TensorFlow. By default, TensorFlow-Neuron will choose the optimal utilization mode based on model metadata, but in some cases manually setting NEURONCORE_GROUP_SIZES can provide additional performance benefits.
- In this tutorial you will learn how to enable a NeuronCore group running TensorFlow Resnet-50 model
+ In this tutorial you will learn how to enable a NeuronCore Group running the TensorFlow Resnet-50 model.

## Steps Overview:
@@ -61,7 +61,7 @@ python infer_resnet50.py
Scenario 1: allow tensorflow-neuron to utilize more than one Inferentia on inf1.6xlarge and inf1.24xlarge instance sizes.

- By default, one Python process with tensorflow-neuron or one tensorflow_model_server_neuron process tries to allocate all NeuronCores in an Inferentia from the Neuron Runtime Daemon. To utilize multiple Inferentias, the recommended parallelization mode is process-level parallelization, as it bypasses the overhead of Python and tensorflow_model_server_neuron resource handling as well as Python’s global interpreter lock (GIL). Note that TensorFlow’s session.run function actually does not hold the GIL.
+ By default, one Python process with tensorflow-neuron or one tensorflow_model_server_neuron process tries to allocate all NeuronCores in an Inferentia from the Neuron Runtime Daemon. To utilize multiple Inferentias, the recommended parallelization mode is process-level parallelization, as it bypasses the overhead of Python and tensorflow_model_server_neuron resource handling as well as Python’s global interpreter lock (GIL). Note that TensorFlow’s session.run function actually does not hold the GIL.

When there is a need to allocate more Inferentia compute into a single process, the following example shows the usage:
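The example itself is elided here (only its last line, `result_list = [predictor(feed) for feed in model_feed_dict_list]`, appears in the hunk header below). A hedged sketch of the idea; the NEURONCORE_GROUP_SIZES value format, the SavedModel path, and the input tensor name are all assumptions.

```python
import os

# Claim several NeuronCore Groups (and hence more than one Inferentia) from a
# single process; assumed to take effect only if set before tensorflow-neuron
# initializes the Neuron Runtime.
os.environ['NEURONCORE_GROUP_SIZES'] = '[4,4,4,4]'

import numpy as np
import tensorflow as tf

# Path to a compiled ResNet-50 SavedModel (placeholder name).
predictor = tf.contrib.predictor.from_saved_model('./resnet50_neuron')

# Placeholder feeds; a real application would supply preprocessed images.
model_feed_dict_list = [{'input': np.zeros([1, 224, 224, 3], dtype=np.float32)}
                        for _ in range(8)]
result_list = [predictor(feed) for feed in model_feed_dict_list]
```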
@@ -133,8 +133,3 @@ result_list = [predictor(feed) for feed in model_feed_dict_list]