This document lists the release notes for the AWS Neuron compiler. The Neuron compiler is an ahead-of-time compiler that compiles trained models so that they make optimal use of Inferentia devices.
Operator support for each input format is provided directly by the compiler:
neuron-cc --list-operators --framework {TENSORFLOW | MXNET | ONNX}
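For example, to list the TensorFlow operators supported by the installed compiler version (MXNET and ONNX work the same way):

# Print the operators neuron-cc can compile for TensorFlow graphs.
neuron-cc --list-operators --framework TENSORFLOW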
The supported operators are also listed in the Neuron supported operators documentation.
Date: 12/1/2019
- Added warning for unsupported operators and convolution sizes
- Added warning for unsupported layout / upsampling
- Added support for Relu6, AddV2, BatchMatmulV2 operators
- Added support for default MXNet outputs in --io-config (see the sketch after this list)
- Improved performance of batched inference for convolutional networks
- Fixed MatMult column size 1
- Fixed bf16 constant loading
- Fixed Conv2D tile accumulation
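A minimal sketch of how --io-config is passed at compile time; the model file, tensor names, and shapes below are hypothetical. The JSON maps each input tensor name to a [shape, dtype] pair and lists the output tensor names:

# Compile a TensorFlow frozen graph, declaring its I/O tensors explicitly.
neuron-cc compile resnet50.pb --framework TENSORFLOW --output resnet50.neff \
    --io-config '{"inputs": {"input_1:0": [[1, 224, 224, 3], "float16"]}, "outputs": ["probs/Softmax:0"]}'

With the change above, MXNet models may fall back to their default outputs rather than naming each output explicitly in --io-config.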
For known issues and limitations, see the previous release; issues resolved in this release are listed above.
- dmlc_nnvm-1.0.1328.0
- dmlc_topi-1.0.1328.0
- dmlc_tvm-1.0.1328.0
- inferentia_hwm-1.0.674.0
- islpy-2018.2
Date: 11/25/2019
Major new features: N/A, this is the first release.
Resolved issues: N/A, this is the first release.
- Control flow: Inferentia has limited support for control flow. In general, Neuron supports only control-flow operators that are static at compile time, e.g. static-length RNN, top-k, sort, ...
- Size of neural network: The size of a neural network is influenced by a) the type of neural network (CNN, LSTM, MLP), b) the number of layers, and c) the sizes of the inputs (tensor dimensions, batch size, ...). The current Neuron compiler release is limited in the size of neural network it can effectively optimize. As a result, CNN models (e.g. ResNet) are limited to input sizes of up to 480x480 in fp16/bf16 with batch size 4; LSTM models (e.g. GNMT) are limited to up to 900 time steps; MLP models (like BERT) are limited to sequence length 128 with batch size 8.
- Data layout: The Neuron compiler supports multiple data layout formats (NCHW, NHWC, ...). Non-CNHW input/output data layouts require Neuron to insert additional transpose operations, causing a degradation in performance.
- Object detection models: Computer-vision object detection and segmentation models are not supported by the current release.
- Reduced data type: The INT8 data type is not currently supported by the Neuron compiler.
- Tensor residency: When a sub-graph executing on the host communicates with a sub-graph executing on NeuronCores, tensors are copied via the communication queues between host and Inferentia memory for each inference, which may result in end-to-end performance degradation.
- Primary inputs in Neuron Pipeline mode: When a neural network is executed in Neuron Pipeline mode, only the first operator in the network can receive primary inputs from the host (a compile sketch follows this list).
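Neuron Pipeline mode is selected at compile time. A minimal sketch, assuming a hypothetical TensorFlow model and tensor names, and assuming the --num-neuroncores flag from this release line:

# Compile for a 4-NeuronCore pipeline; primary inputs feed only the first operator.
neuron-cc compile model.pb --framework TENSORFLOW --num-neuroncores 4 --output model.neff \
    --io-config '{"inputs": {"input:0": [[4, 224, 224, 3], "float16"]}, "outputs": ["output:0"]}'

The batch-4, 224x224 input stays within the CNN limits described above.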
- nnvm: dmlc_nnvm-1.0.1219.0
- topi: dmlc_topi-1.0.1219.0
- tvm: dmlc_tvm-1.0.1219.0
- hwm: inferentia_hwm-1.0.602.0
- islpy: islpy-2018.2+aws2018.x.73.0