
Commit bec33c1

Restructure the compile-infer tutorials based on feedback; expanded NCG intro
1 parent 473655a commit bec33c1

5 files changed (+97, -76 lines)

docs/mxnet-neuron/tutorial-compile-infer.md

+38, -33
@@ -5,34 +5,27 @@ Neuron supports both Python module and Symbol APIs and the C predict API. The fo
## Steps Overview:

1. Launch an EC2 instance for compilation and/or inference
-2. Install Neuron for Compiler and Runtime execution
-3. Run Example
-   1. Compile
-   2. Execute inference on Inf1
+2. Install Neuron for compilation and runtime execution
+3. Compile on compilation server
+4. Execute inference on Inf1

## Step 1: Launch EC2 Instances

-A typical workflow with the Neuron SDK will be for a trained ML model to be compiled on a compilation server and then the artifacts distributed to the (fleet of) Inf1 instances for execution. Neuron enables MXNet to be used for all of these steps.
+A typical workflow with the Neuron SDK will be to compile trained ML models on a compilation server and then distribute the artifacts to a fleet of Inf1 instances for execution. Neuron enables MXNet to be used for all of these steps.

+1. Select an AMI of your choice, which may be Ubuntu 16.x, Ubuntu 18.x, or Amazon Linux 2 based. To use a pre-built Deep Learning AMI, which includes all of the needed packages, see [Launching and Configuring a DLAMI](https://docs.aws.amazon.com/dlami/latest/devguide/launch-config.html).
+2. Select and launch an EC2 instance of your choice to compile. Launch an instance by following [EC2 instructions](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/EC2_GetStarted.html#ec2-launch-instance).
+   * It is recommended to use c5.4xlarge or larger. For this example we will use a c5.4xlarge.
+   * If you would like to compile and infer on the same machine, please select inf1.6xlarge.
+3. Select and launch an Inf1 instance of your choice if not compiling and inferencing on the same instance. Launch an instance by following [EC2 instructions](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/EC2_GetStarted.html#ec2-launch-instance).

-1. Select an AMI of your choice, which may be Ubuntu 16.x, Ubuntu 18.x, Amazon Linux 2 based. To use a pre-built Deep Learning AMI, which includes all of the needed packages, see these instructions: https://docs.aws.amazon.com/dlami/latest/devguide/launch-config.html
-2. Select and start an EC2 instance of your choice to compile
-   1. It is recommended to use C5.4xlarge or larger. For this example we will use a C5.4xlarge
-   2. If you would like to compile and infer on the same machine, please select Inf1.6xlarge
-3. Select and start an Inf1 instance of your choice to run the compiled model you sdtaretd in step 2.2 to run the compiled model.
+## Step 2: Install Neuron Compiler and MXNet-Neuron on Compilation Instance

-## Step 2: Install Neuron
+If using DLAMI, activate the aws_neuron_mxnet_p36 environment and skip this step.

-If using DLAMI and aws_neuron_mxnet_p36 environment, you can skip to Step 3.
+On the instance you are going to use for compilation, install both Neuron Compiler and MXNet-Neuron.

-### Compiler Instance: Install Neuron Compiler and MXnet-Neuron
-
-On the instance you are going to use for compilation, you must have both the Neuron Compiler and the MXNet-Neuron installed. (The inference instance must have the MXNet-Neuron and the Neuron Runtime installed.)
-Steps Overview:
-
-#### Using Virtualenv:
-
-1. Install virtualenv if needed:
+2.1. Install virtualenv if needed:
```bash
# Ubuntu
sudo apt-get update
@@ -44,32 +37,32 @@ sudo yum update
sudo yum install -y python3
pip3 install --user virtualenv
```
-2. Setup a new Python 3.6 environment:
+2.2. Set up a new Python 3.6 environment:
```bash
virtualenv --python=python3.6 test_env_p36
source test_env_p36/bin/activate
```
-3. Modify Pip repository configurations to point to the Neuron repository.
+2.3. Modify Pip repository configurations to point to the Neuron repository.
```bash
tee $VIRTUAL_ENV/pip.conf > /dev/null <<EOF
[global]
extra-index-url = https://pip.repos.neuron.amazonaws.com
EOF
```
-4. Install MxNet-Neuron and Neuron Compiler
+2.4. Install MXNet-Neuron and Neuron Compiler
```bash
-pip install neuron-cc[mxnet]
pip install mxnet-neuron
```
+```bash
+# can be skipped on inference-only instance
+pip install neuron-cc[mxnet]
+```

-### Inference Instance: Install MXNet-Neuron and Neuron-Runtime
-
-1. Same as above to install MXNet-Neuron
-2. To install Runtime, see [Getting started: Installing and Configuring Neuron-RTD](./../neuron-runtime/nrt_start.md).
+## Step 3: Compile on Compilation Server

-## Step 3: Run Example
+The model must be compiled for the Inferentia target before it can run on Inferentia.

-1. Create a file `compile_resnet50.py` with the content below and run it using `python compile_resnet50.py`. Compilation will take a few minutes on c5.4xlarge. At the end of compilation, the files `resnet-50_compiled-0000.params` and `resnet-50_compiled-symbol.json` will be created in local directory.
+3.1. Create a file `compile_resnet50.py` with the content below and run it using `python compile_resnet50.py`. Compilation will take a few minutes on c5.4xlarge. At the end of compilation, the files `resnet-50_compiled-0000.params` and `resnet-50_compiled-symbol.json` will be created in the local directory.

```python
import mxnet as mx
@@ -88,13 +81,25 @@ sym, args, aux = mx.contrib.neuron.compile(sym, args, aux, inputs)
mx.model.save_checkpoint("resnet-50_compiled", 0, sym, args, aux)
```

-2. If not compiling and inferring on the same instance, copy the artifact to the inference server (use ec2-user as user for AML2):
+3.2. If not compiling and inferring on the same instance, copy the artifacts to the inference server (use ec2-user as user for AML2):
```bash
scp -i <PEM key file> resnet-50_compiled-0000.params ubuntu@<instance DNS>:~/ # Ubuntu
scp -i <PEM key file> resnet-50_compiled-symbol.json ubuntu@<instance DNS>:~/ # Ubuntu
```

-3. On the Inf1, create a inference Python script named `infer_resnet50.py` with the following content:
+## Step 4: Install MXNet-Neuron and Neuron-Runtime on Inference Instance
+
+If using DLAMI, activate the aws_neuron_mxnet_p36 environment and skip this step.
+
+4.1. Follow Step 2 above to install MXNet-Neuron.
+* Install neuron-cc if compilation on the inference instance is desired (see notes above on recommended Inf1 sizes for compilation)
+* Skip neuron-cc if compilation is not done on the inference instance
+
+4.2. To install Runtime, see [Getting started: Installing and Configuring Neuron-RTD](./../neuron-runtime/nrt_start.md).
+
+## Step 5: Execute Inference on Inf1
+
+5.1. On the Inf1, create an inference Python script named `infer_resnet50.py` with the following content:
```python
import mxnet as mx
import numpy as np
@@ -130,7 +135,7 @@ for i in a[0:5]:
    print('probability=%f, class=%s' %(prob[i], labels[i]))
```

-4. Run the script to see inference results:
+5.2. Run the script to see inference results:
```bash
python infer_resnet50.py
```

docs/mxnet-neuron/tutorial-model-serving.md

+6, -4
@@ -2,9 +2,9 @@
This Neuron MXNet Model Serving (MMS) example is adapted from the MXNet vision service example which uses pretrained squeezenet to perform image classification: https://github.com/awslabs/mxnet-model-server/tree/master/examples/mxnet_vision.

-Before starting this example, please ensure that Neuron-optimized MXNet version mxnet-neuron is installed (see [MXNet Tutorial](./tutorial-compile-infer.md)) and Neuron RTD is running with default settings (see [Neuron Runtime getting started](./../neuron-runtime/nrt_start.md) ).
+Before starting this example, please ensure that the Neuron-optimized MXNet version mxnet-neuron is installed along with Neuron Compiler (see [MXNet Tutorial](./tutorial-compile-infer.md)) and Neuron RTD is running with default settings (see [Neuron Runtime getting started](./../neuron-runtime/nrt_start.md)).

-If using DLAMI and aws_neuron_mxnet_p36 environment, you can skip the installation part in the first step below.
+If using DLAMI, you can activate the aws_neuron_mxnet_p36 environment and skip the installation part in the first step below.

1. First, install Java runtime and mxnet-model-server:

@@ -91,14 +91,14 @@ Also, comment out unnecessary data copy for model_input in `mxnet_model_service.
#model_input = [item.as_in_context(self.mxnet_ctx) for item in model_input]
```

-6. Package the model with model-archiver
+6. Package the model with model-archiver:

```bash
cd ~/mxnet-model-server/examples
model-archiver --force --model-name resnet-50_compiled --model-path mxnet_vision --handler mxnet_vision_service:handle
```

-7. Start MXNet Model Server (MMS) and load model using RESTful API. The number of workers should be less than or equal number of NeuronCores divided by the number of NeuronCores required by model (<link to API>). Please ensure that Neuron RTD is running with default settings (see Getting Started guide):
+7. Start MXNet Model Server (MMS) and load the model using the RESTful API. Please ensure that Neuron RTD is running with default settings (see [Neuron Runtime getting started](./../neuron-runtime/nrt_start.md)):

```bash
cd ~/mxnet-model-server/
@@ -108,6 +108,8 @@ curl -v -X POST "http://localhost:8081/models?initial_workers=1&max_workers=1&sy
sleep 10 # allow sufficient time to load model
```

+Each worker requires a NeuronCore Group that can accommodate the compiled model. Additional workers can be added by increasing the max_workers configuration as long as there are enough NeuronCores available. Use `neuron-cli list-ncg` to see the NeuronCore Groups being created.

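For reference, a minimal sketch of growing the worker pool after the model is registered; the scale-workers call and the worker count below are assumptions about the MMS management API rather than part of this diff, while `neuron-cli list-ncg` is the command mentioned above:

```bash
# Assumed MMS management API call: ask for two workers for the compiled model.
# Each worker claims its own NeuronCore Group from the Neuron Runtime.
curl -v -X PUT "http://localhost:8081/models/resnet-50_compiled?min_worker=2&synchronous=true"

# List the NeuronCore Groups the runtime has created for these workers.
neuron-cli list-ncg
```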
8. Test inference using an example image:

```bash

docs/mxnet-neuron/tutorial-neuroncore-groups.md

+14, -2
@@ -1,6 +1,8 @@
# Tutorial: MXNet Configurations for NeuronCore Groups

-To further subdivide the pool of NeuronCores controled by a Neuron-RTD, specify the NeuronCore Groups within that pool using the environment variable `NEURONCORE_GROUP_SIZES` set to list of group sizes. The consecutive NeuronCore groups will be created by Neuron-RTD and be available for use to map the models.
+A NeuronCore Group is a set of NeuronCores that are used to load and run compiled models. At any time, one model will be running in a NeuronCore Group. By creating several NeuronCore Groups of different sizes, a user may run multiple models independently and in parallel on the Inferentia. Additionally, within a NeuronCore Group, loaded models can be dynamically started and stopped, allowing for dynamic context switching from one model to another.
+
+To explicitly specify the NeuronCore Groups, set the environment variable `NEURONCORE_GROUP_SIZES` to a list of group sizes. The consecutive NeuronCore Groups will be created by Neuron-RTD and be available for the user to map the models.

Note that to map a model to a group, the model must be compiled to fit within the group size. To limit the number of NeuronCores during compilation, use compiler_args dictionary with field “--num-neuroncores“ set to the group size:

@@ -9,6 +11,12 @@ compile_args = {'--num-neuroncores' : 2}
sym, args, auxs = neuron.compile(sym, args, auxs, inputs, **compile_args)
```

+Before starting this example, please ensure that the Neuron-optimized MXNet version mxnet-neuron is installed along with Neuron Compiler (see [MXNet Tutorial](./tutorial-compile-infer.md)) and Neuron RTD is running with default settings (see [Neuron Runtime getting started](./../neuron-runtime/nrt_start.md)).
+
+## Compile Model
+
+The model must be compiled for the Inferentia target before it can run on Inferentia.
+
Create compile_resnet50.py with `--num-neuroncores` set to 2 and run it. The files `resnet-50_compiled-0000.params` and `resnet-50_compiled-symbol.json` will be created in local directory:

```python
@@ -30,13 +38,17 @@ mx.model.save_checkpoint("resnet-50_compiled", 0, sym, args, aux)
```

+## Run Inference
+
During inference, to subdivide the pool of one Inferentia into groups of 1, 2, and 1 NeuronCores, specify `NEURONCORE_GROUP_SIZES` as follows:

```bash
NEURONCORE_GROUP_SIZES='[1,2,1]' <launch process>
```

-Within the framework, the model can be mapped to group using `ctx=mx.neuron(N)` context where N is the group index within the `NEURONCORE_GROUP_SIZES` list. Create infer_resnet50.py with the following content:
+Within the framework, the model can be mapped to a group using the `ctx=mx.neuron(N)` context, where N is the group index within the `NEURONCORE_GROUP_SIZES` list.
+
+Create infer_resnet50.py with the following content:

```python
import mxnet as mx

docs/tensorflow-neuron/tutorial-NeuronCore-Group.md

+4, -9
@@ -1,10 +1,10 @@
# Tutorial: Configuring NeuronCore Groups

-A NeuronCore Group is a set of NeuronCores that are used to load and run compiled models. At any time, one model will be running in a NeuronCoreGroup. By changing to a different sized NeuonCoreGroup and then creating several of these NeuronCoreGroups, a user may create independent and parallel models running in the Inferentia. Additonally: within a NeuronCoreGroup, loaded models can be dynamically started and stopped, allowing for dynamic context switching from one model to another. By default, a single NeuronCoreGroup is created by Neuron Runtime that contains all 4 NeuronCores in an Inferentia. In this default case, when models are loaded to that default NeuronCoreGroup, only 1 will be running at any time. By configuring multiple NeuronCoreGroups as shown in this tutorial, multiple models may be made to run simultaenously.
+A NeuronCore Group is a set of NeuronCores that are used to load and run compiled models. At any time, one model will be running in a NeuronCore Group. By creating several NeuronCore Groups of different sizes, a user may run multiple models independently and in parallel on the Inferentia. Additionally, within a NeuronCore Group, loaded models can be dynamically started and stopped, allowing for dynamic context switching from one model to another. By default, a single NeuronCore Group is created by Neuron Runtime that contains all four NeuronCores in an Inferentia. In this default case, when models are loaded to that default NeuronCore Group, only one will be running at any time. By configuring multiple NeuronCore Groups as shown in this tutorial, multiple models may be made to run simultaneously.

-The NEURONCORE_GROUP_SIZES environment variable provides user control over this in Neuron-integrated TensorFlow. By default, TensorFlow-Neuron will choose the optimal utilization mode based on model metadata, but in some cases manually setting NEURONCORE_GROUP_SIZES can provide additional performance benefits.
+The NEURONCORE_GROUP_SIZES environment variable provides user control over the grouping of NeuronCores in Neuron-integrated TensorFlow. By default, TensorFlow-Neuron will choose the optimal utilization mode based on model metadata, but in some cases manually setting NEURONCORE_GROUP_SIZES can provide additional performance benefits.

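As a quick illustration of setting this variable at launch (the group sizes are arbitrary and the script name reuses the tutorial's example; this sketch is not part of the diff above):

```bash
# Illustrative only: split one Inferentia's four NeuronCores into two groups of two
# for this inference process.
NEURONCORE_GROUP_SIZES='[2,2]' python infer_resnet50.py
```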

-In this tutorial you will learn how to enable a NeuronCore group running TensorFlow Resnet-50 model
+In this tutorial you will learn how to enable a NeuronCore Group running a TensorFlow Resnet-50 model.

## Steps Overview:

@@ -61,7 +61,7 @@ python infer_resnet50.py
Scenario 1: allow tensorflow-neuron to utilize more than one Inferentia on inf1.6xlarge and inf1.24xlarge instance sizes.

-By default, one Python process with tensorflow-neuron or one tensorflow_model_server_neuron process tries to allocate all NeuronCores in an Inferentia from the Neuron Runtime Daemon. To utilize multiple Inferentias, the recommended parallelization mode is process-level parallelization, as it bypasses the overhead of Python and tensorflow_model_server_neuron resource handling as well as Python’s global interpreter lock (GIL). Note that TensorFlow’s session.run function actually does not hold the GIL.
+By default, one Python process with tensorflow-neuron or one tensorflow_model_server_neuron process tries to allocate all NeuronCores in an Inferentia from the Neuron Runtime Daemon. To utilize multiple Inferentias, the recommended parallelization mode is process-level parallelization, as it bypasses the overhead of Python and tensorflow_model_server_neuron resource handling as well as Python’s global interpreter lock (GIL). Note that TensorFlow’s session.run function actually does not hold the GIL.

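A minimal sketch of the process-level parallelization recommended above, assuming an inf1.6xlarge with four Inferentias and reusing the tutorial's infer_resnet50.py; the loop below is illustrative and not part of the diff:

```bash
# Illustrative only: one inference process per Inferentia on an inf1.6xlarge.
# By default each tensorflow-neuron process allocates one Inferentia's NeuronCores.
for i in 1 2 3 4; do
    python infer_resnet50.py &
done
wait
```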

When there is a need to allocate more Inferentia compute into a single process, the following example shows the usage:

@@ -133,8 +133,3 @@ result_list = [predictor(feed) for feed in model_feed_dict_list]
# inference results can be found in result_list
```
-
-
-
-
-