In this getting started guide you will learn how to install the Neuron runtime and configure it for inference. If you'd like to install the Neuron runtime into an AMI of your own choice, start with Step 1. If you plan to use a pre-built Deep Learning AMI (recommended), follow these instructions: https://docs.aws.amazon.com/dlami/latest/devguide/launch-config.html. The DLAMI is recommended because it comes pre-installed with all of the needed Neuron packages; when using it, you can start with Step 3 below.
Steps Overview:
- Select an AMI of your choice, which may be Ubuntu 16.x, Ubuntu 18.x, or Amazon Linux 2 based.
- Select an Inf1 instance size of your choice (see https://aws.amazon.com/ec2/instance-types/); a CLI launch sketch follows this list.
- Modify the yum/apt repository configurations to point to the Neuron repository.
- Install Neuron-RTD.
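As an illustration, an Inf1 instance can be launched from the AWS CLI; the AMI ID, key pair name, and instance size below are placeholders, not values from this guide:

# Launch an inf1.xlarge from your chosen AMI (replace ami-XXXXXXXX and MyKeyPair)
aws ec2 run-instances \
    --image-id ami-XXXXXXXX \
    --instance-type inf1.xlarge \
    --key-name MyKeyPair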
To find your Ubuntu version, run the grep command below; it should report 18.* or 16.*:
grep -iw version /etc/os-release
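On Ubuntu 18.04, for example, this prints something like the following (exact text varies by release):

VERSION="18.04.5 LTS (Bionic Beaver)"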
UBUNTU 16

sudo tee /etc/apt/sources.list.d/neuron.list > /dev/null <<EOF
deb https://apt.repos.neuron.amazonaws.com xenial main
EOF
wget -qO - https://apt.repos.neuron.amazonaws.com/GPG-PUB-KEY-AMAZON-AWS-NEURON.PUB | sudo apt-key add -
sudo apt-get update
sudo apt-get install aws-neuron-runtime
sudo apt-get install aws-neuron-tools
UBUNTU 18
sudo tee /etc/apt/sources.list.d/neuron.list > /dev/null <<EOF
deb https://apt.repos.neuron.amazonaws.com bionic main
EOF
wget -qO - https://apt.repos.neuron.amazonaws.com/GPG-PUB-KEY-AMAZON-AWS-NEURON.PUB | sudo apt-key add -
sudo apt-get update
sudo apt-get install aws-neuron-runtime
sudo apt-get install aws-neuron-tools
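As a quick sanity check (not part of the original steps), confirm the packages installed correctly:

dpkg -l | grep aws-neuron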
AMAZON LINUX 2

sudo tee /etc/yum.repos.d/neuron.repo > /dev/null <<EOF
[neuron]
name=Neuron YUM Repository
baseurl=https://yum.repos.neuron.amazonaws.com
enabled=1
EOF
sudo rpm --import https://yum.repos.neuron.amazonaws.com/GPG-PUB-KEY-AMAZON-AWS-NEURON.PUB
sudo yum install aws-neuron-runtime
sudo yum install aws-neuron-tools
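The equivalent check on Amazon Linux 2 (again, not part of the original steps):

rpm -qa | grep aws-neuron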
Neuron Runtime uses 2 MB hugepages for the input and output feature map buffers of all loaded models. By default, Neuron Runtime uses 128 2 MB hugepages per Inferentia. Hugepages are a system-wide resource, so the allocation of 2 MB hugepages should be done at boot time or as soon as possible after boot. To allocate at boot time, pass the hugepages option to the kernel; for example, to allocate 128 2 MB hugepages, use this Linux boot parameter:
hugepages=128
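On a GRUB-based image, for example, the option can be appended to the kernel command line; file locations and the regeneration command vary by distribution, so treat this as a sketch:

# In /etc/default/grub, append hugepages=128 to the existing kernel command line:
GRUB_CMDLINE_LINUX="... hugepages=128"

# Regenerate the GRUB configuration and reboot
sudo update-grub    # Ubuntu; on Amazon Linux 2 use: sudo grub2-mkconfig -o /boot/grub2/grub.cfg
sudo reboot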
Alternatively, 2 MB hugepages can be allocated after boot by invoking the following command:
sudo sysctl -w vm.nr_hugepages=128
To make the change persist across reboots, add the following to /etc/sysctl.conf:
vm.nr_hugepages=128
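For instance, the entry can be appended from the shell (equivalent to editing the file by hand):

echo "vm.nr_hugepages=128" | sudo tee -a /etc/sysctl.conf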
Run the following command to see the number of 2 MB hugepages configured for your instance:
grep HugePages_Total /proc/meminfo | awk '{print $2}'
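To also see how many of those pages are currently free, widen the match (an extra check, not in the original steps):

grep -E 'HugePages_(Total|Free)' /proc/meminfo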
To adjust the number of hugepages used by the Neuron Runtime, update the num_hugepages_per_device parameter in /opt/aws/neuron/config/neuron-rtd.config; the default is 128 2 MB pages per Inferentia. Increase it to the desired number, then restart the neuron-rtd service. Make sure the OS has at least that many hugepages available before restarting Neuron Runtime.
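As a sketch, raising the allocation to 256 pages per Inferentia on a single-chip instance might look like this (256 is an illustrative value, not a recommendation):

# Grow the OS-level pool first so the runtime can claim the pages
sudo sysctl -w vm.nr_hugepages=256

# Set num_hugepages_per_device to 256 in the config, then restart the service
sudo vi /opt/aws/neuron/config/neuron-rtd.config
sudo systemctl restart neuron-rtd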
You can choose your Neuron-RTD mode: either run a single instance of the Neuron runtime, or run multiple instances, which may be desirable to give your application capabilities like isolation or load balancing.
The default configuration sets up a single Neuron-RTD daemon for all Neuron devices present in the instance. With the default configuration:
- The runtime API server listens on a single UDS endpoint: unix:/run/neuron.sock
- A single (multi-threaded) runtime daemon handles all inference requests.
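Once the default service is running, you can confirm the endpoint exists (a quick check, not part of the original steps):

sudo systemctl status neuron-rtd
ls -l /run/neuron.sock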
Multiple runtime daemons might be preferred in some cases for isolation or load balancing. The following steps show how to configure four Neuron-RTD daemons on an Inf1.6xl instance, with each daemon managing one Neuron device. When configuring multiple Neuron-RTD daemons, a configuration file needs to be created for each one, specifying the API server endpoint (UDS or TCP port) and the logical device IDs it should manage.
Use neuron-ls to enumerate the set of Inferentia chips in the system:

/opt/aws/neuron/bin/neuron-ls
+--------------+---------+--------+-----------+-----------+------+------+
| PCI BDF | LOGICAL | NEURON | MEMORY | MEMORY | EAST | WEST |
| | ID | CORES | CHANNEL 0 | CHANNEL 1 | | |
+--------------+---------+--------+-----------+-----------+------+------+
| 0000:00:1f.0 | 0 | 4 | 4096 MB | 4096 MB | 0 | 1 |
+--------------+---------+--------+-----------+-----------+------+------+
| 0000:00:1e.0 | 1 | 4 | 4096 MB | 4096 MB | 1 | 1 |
+--------------+---------+--------+-----------+-----------+------+------+
| 0000:00:1d.0 | 2 | 4 | 4096 MB | 4096 MB | 1 | 1 |
+--------------+---------+--------+-----------+-----------+------+------+
| 0000:00:1c.0 | 3 | 4 | 4096 MB | 4096 MB | 1 | 0 |
+--------------+---------+--------+-----------+-----------+------+------+
neuron-rtd can manage one or more devices. Select contiguous Inferentia devices to be managed by a single neuron-rtd.
Create a configuration file for each Neuron-RTD instance you wish to launch, specifying the Inferentia chip(s) to be mapped to that instance and the endpoint it should listen on.
sudo tee /opt/aws/neuron/bin/nrtd0.json > /dev/null << EOF
{
"name": "nrtd0",
"server_port": "unix:/run/neuron.sock0",
"devices": [0]
}
EOF
sudo tee /opt/aws/neuron/bin/nrtd1.json > /dev/null << EOF
{
"name": "nrtd1",
"server_port": "unix:/run/neuron.sock1",
"devices": [1]
}
EOF
sudo tee /opt/aws/neuron/bin/nrtd2.json > /dev/null << EOF
{
"name": "nrtd2",
"server_port": "unix:/run/neuron.sock2",
"devices": [2]
}
EOF
sudo tee /opt/aws/neuron/bin/nrtd3.json > /dev/null << EOF
{
"name": "nrtd3",
"server_port": "unix:/run/neuron.sock3",
"devices": [3]
}
EOF
sudo chmod 755 /opt/aws/neuron/bin/nrtd0.json
sudo chmod 755 /opt/aws/neuron/bin/nrtd1.json
sudo chmod 755 /opt/aws/neuron/bin/nrtd2.json
sudo chmod 755 /opt/aws/neuron/bin/nrtd3.json
Stop the default single daemon, then start one daemon per configuration file:

sudo systemctl stop neuron-rtd
sudo systemctl start neuron-rtd@nrtd0
sudo systemctl start neuron-rtd@nrtd1
sudo systemctl start neuron-rtd@nrtd2
sudo systemctl start neuron-rtd@nrtd3
Verify the services are up and running. This example shows the status of one of the Neuron-RTD daemons (nrtd0):
sudo systemctl status neuron-rtd@nrtd0
● neuron-rtd@nrtd0.service - Neuron Runtime Daemon nrtd0
Loaded: loaded (/lib/systemd/system/neuron-rtd@.service; disabled; vendor preset: enabled)
Active: active (running) since Wed 2019-11-13 00:24:25 UTC; 8s ago
Main PID: 32446 (neuron-rtd)
Tasks: 14 (limit: 4915)
CGroup: /system.slice/system-neuron\x2drtd.slice/neuron-rtd@nrtd0.service
└─32446 /opt/aws/neuron/bin/neuron-rtd -i nrtd0 -c /opt/aws/neuron/config/neuron-rtd.config
Nov 13 00:23:39 ip-10-1-255-226 neuron-rtd[32446]: nrtd[32446]: [TDRV:reset_mla] Resetting 0000:00:1f.0
Nov 13 00:23:39 ip-10-1-255-226 nrtd[32446]: [TDRV:reset_mla] Resetting 0000:00:1f.0
Nov 13 00:24:00 ip-10-1-255-226 neuron-rtd[32446]: nrtd[32446]: [hal] request seq: 3, cmd: 1 timed out
Nov 13 00:24:00 ip-10-1-255-226 nrtd[32446]: [hal] request seq: 3, cmd: 1 timed out
Nov 13 00:24:25 ip-10-1-255-226 neuron-rtd[32446]: nrtd[32446]: [TDRV:tdrv_init_one_mla_phase2] Initialized Inferentia: 0000:00:1f.0
Nov 13 00:24:25 ip-10-1-255-226 nrtd[32446]: [TDRV:tdrv_init_one_mla_phase2] Initialized Inferentia: 0000:00:1f.0
Nov 13 00:24:25 ip-10-1-255-226 neuron-rtd[32446]: E1113 00:24:25.605502817 32446 socket_utils_common_posix.cc:197] check for SO_REUSEPORT: {"created":"@1573604665.605493059","description":"SO_REUSEPORT unavailab
Nov 13 00:24:25 ip-10-1-255-226 systemd[1]: Started Neuron Runtime Daemon nrtd0.
Nov 13 00:24:25 ip-10-1-255-226 neuron-rtd[32446]: nrtd[32446]: [NRTD:RunServer] Server listening on unix:/run/neuron.sock0
Nov 13 00:24:25 ip-10-1-255-226 nrtd[32446]: [NRTD:RunServer] Server listening on unix:/run/neuron.sock0
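To check all four daemons at once, a short loop works (an extra check, not part of the original steps):

for i in 0 1 2 3; do systemctl is-active neuron-rtd@nrtd$i; done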