This document lists the release notes for the AWS Neuron compiler. The Neuron compiler is an ahead-of-time compiler that compiles trained models so that they make optimal use of Inferentia devices.
Operator support for each input format is provided directly by the compiler:
neuron-cc --list-operators --framework {TENSORFLOW | MXNET | ONNX}
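For example, to list the TensorFlow operators supported by the installed compiler version (MXNET and ONNX work the same way):

# Print the operators neuron-cc can compile for TensorFlow graphs.
neuron-cc --list-operators --framework TENSORFLOW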
The supported operators are also listed in the Neuron supported operators documentation.
Date: 12/1/2019
- Added warning for unsupported operators and convolution sizes
- Added warning for unsupported layout / upsampling
- Added support for Relu6, AddV2, BatchMatmulV2 operators
- Added support for default MXNet outputs in --io-config (see the sketch after this list)
- Improved performance of batched inference for convolutional networks
- Fixed MatMult column size 1
- Fixed bf16 constant loading
- Fixed Conv2D tile accumulation
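A minimal sketch of how --io-config is passed at compile time; the model file, tensor names, and shapes below are hypothetical. The JSON maps each input tensor name to a [shape, dtype] pair and lists the output tensor names:

# Compile a TensorFlow frozen graph, declaring its I/O tensors explicitly.
neuron-cc compile resnet50.pb --framework TENSORFLOW --output resnet50.neff \
    --io-config '{"inputs": {"input_1:0": [[1, 224, 224, 3], "float16"]}, "outputs": ["probs/Softmax:0"]}'

With the change above, MXNet models may fall back to their default outputs rather than naming each output explicitly in --io-config.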
For known issues and limitations, see the previous release; issues resolved in this release are listed above.
- dmlc_nnvm-1.0.1328.0
- dmlc_topi-1.0.1328.0
- dmlc_tvm-1.0.1328.0
- inferentia_hwm-1.0.674.0
- islpy-2018.2
Date: 11/25/2019
Major new features: N/A, this is the first release.
Resolved issues: N/A, this is the first release.
- Control flow: Inferentia has limited support for control flow. In general, Neuron supports only control-flow operators that are static at compile time, e.g. static-length RNN, top-k, sort, ...
- Size of neural network: The size of a neural network is influenced by a) the type of neural network (CNN, LSTM, MLP), b) the number of layers, and c) the sizes of the inputs (tensor dimensions, batch size, ...). The current Neuron compiler release is limited in the size of neural network it can effectively optimize. As a result, CNN models (e.g. ResNet) are limited to input sizes of up to 480x480 in fp16/bf16 with batch size 4; LSTM models (e.g. GNMT) are limited to up to 900 time steps; MLP models (like BERT) are limited to sequence length 128 with batch size 8.
- Data layout: The Neuron compiler supports multiple data layout formats (NCHW, NHWC, ...). Non-CNHW input/output data layouts require Neuron to insert additional transpose operations, causing a degradation in performance.
- Object detection models: Computer-vision object detection and segmentation models are not supported by the current release.
- Reduced data type: The INT8 data type is not currently supported by the Neuron compiler.
- Tensor residency: When a sub-graph executing on the host communicates with a sub-graph executing on NeuronCores, tensors are copied via the communication queues between host and Inferentia memory for each inference, which may result in end-to-end performance degradation.
- Primary inputs in Neuron Pipeline mode: When a neural network is executed in Neuron Pipeline mode, only the first operator in the network can receive primary inputs from the host (a compile sketch follows this list).
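Neuron Pipeline mode is selected at compile time. A minimal sketch, assuming a hypothetical TensorFlow model and tensor names, and assuming the --num-neuroncores flag from this release line:

# Compile for a 4-NeuronCore pipeline; primary inputs feed only the first operator.
neuron-cc compile model.pb --framework TENSORFLOW --num-neuroncores 4 --output model.neff \
    --io-config '{"inputs": {"input:0": [[4, 224, 224, 3], "float16"]}, "outputs": ["output:0"]}'

The batch-4, 224x224 input stays within the CNN limits described above.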
- nnvm: dmlc_nnvm-1.0.1219.0
- topi: dmlc_topi-1.0.1219.0
- tvm: dmlc_tvm-1.0.1219.0
- hwm: inferentia_hwm-1.0.602.0
- islpy: islpy-2018.2+aws2018.x.73.0