Commit 00a5fc7

refine mixed precision doc (intel#1119)
1 parent 8c565a4 commit 00a5fc7

2 files changed: +44 -61 lines changed

docs/mixed_precision.md

+6-61
@@ -7,9 +7,7 @@ The recent growth of Deep Learning has driven the development of more complex mo
 
 The recently launched 3rd Gen Intel® Xeon® Scalable processor (codenamed Cooper Lake), featuring Intel® Deep Learning Boost, is the first general-purpose x86 CPU to support the bfloat16 format. Specifically, three new bfloat16 instructions are added as a part of the AVX512_BF16 extension within Intel Deep Learning Boost: VCVTNE2PS2BF16, VCVTNEPS2BF16, and VDPBF16PS. The first two instructions allow converting to and from bfloat16 data type, while the last one performs a dot product of bfloat16 pairs. Further details can be found in the [hardware numerics document](https://software.intel.com/content/www/us/en/develop/download/bfloat16-hardware-numerics-definition.html) published by Intel.
 
-Intel® Neural Compressor (INC) supports two use cases for mixed precision:
-1. Direct mixed precision conversion: Mix `BF16 + FP32` and be executed by MixedPrecision API
-2. Mixed precision during quantization: Mix `BF16 + FP32 + INT8` and occur during quantization
+Intel® Neural Compressor (INC) supports `BF16 + FP32` mixed precision conversion through the MixedPrecision API.
 
 Its support status:
 
@@ -29,21 +27,7 @@ It needs the CPU supports `avx512_bf16` instruction set.
 Intel has worked with the PyTorch & TensorFlow development teams to enhance PyTorch & TensorFlow to include bfloat16 data support for CPUs.
 - For PyTorch, the version higher than [1.11.0](https://download.pytorch.org/whl/torch_stable.html) is necessary.
 
-
-- For Tensorflow, BF16 support has been enabled in intel-tensorflow [2.3.0](https://pypi.org/project/intel-tensorflow/2.3.0/)/[2.4.0](https://pypi.org/project/intel-tensorflow/2.4.0/)/[1.15.0up1](https://github.com/Intel-tensorflow/tensorflow/tree/v1.15.0up1)/[1.15.0up2](https://github.com/Intel-tensorflow/tensorflow/tree/v1.15.0up2) and intel-tensorflow-avx512[2.3.0](https://pypi.org/project/intel-tensorflow-avx512/2.3.0/)/[2.4.0](https://pypi.org/project/intel-tensorflow-avx512/2.4.0/).
-
-> For more information about BF16 in TensorFlow, please read [Accelerating AI performance on 3rd Gen Intel® Xeon® Scalable processors with TensorFlow and Bfloat16](https://blog.tensorflow.org/2020/06/accelerating-ai-performance-on-3rd-gen-processors-with-tensorflow-bfloat16.html).
-
-> To get better performance with BF16 datatype, the intel-tensorflow-avx512 is recommended, or build intel tensorflow (take [tag v1.15.0up2](https://github.com/Intel-tensorflow/tensorflow/tree/v1.15.0up2) as example) from source code by using below command:
-
-```shell
-bazel build --cxxopt=-D_GLIBCXX_USE_CXX11_ABI=0 --copt=-O3 --copt=-Wformat --copt=-Wformat-security \
---copt=-fstack-protector --copt=-fPIC --copt=-fpic --linkopt=-znoexecstack --linkopt=-zrelro \
---linkopt=-znow --linkopt=-fstack-protector --config=mkl --define build_with_mkl_dnn_v1_only=true \
---copt=-DENABLE_INTEL_MKL_BFLOAT16 --copt=-march=native //tensorflow/tools/pip_package:build_pip_package
-
-./bazel-bin/tensorflow/tools/pip_package/build_pip_package /tmp/
-```
+- For TensorFlow, a version higher than [2.3.0](https://pypi.org/project/intel-tensorflow/2.3.0/) is necessary.
 
 ## Methods to enable & disable BF16 support
 By default, BF16 has been added into activation and weight supported datatype if **the TensorFlow/PyTorch version and CPU meet the requirements at the same time**. We can disable it in the yaml config file by specifying the datatype for activation and weight.
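Before relying on the default enablement above, it can help to verify the CPU half of the requirement. Below is a minimal sketch (not part of INC; the helper name is hypothetical) that looks for the `avx512_bf16` flag in `/proc/cpuinfo`, assuming a Linux host:

```python
# Illustrative helper: report whether the CPU advertises the avx512_bf16 flag.
# Linux only; on other platforms the flag must be checked differently.
def cpu_supports_bf16(cpuinfo_path="/proc/cpuinfo"):
    try:
        with open(cpuinfo_path) as f:
            return "avx512_bf16" in f.read()
    except OSError:
        return False

print("avx512_bf16 supported:", cpu_supports_bf16())
```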
@@ -54,11 +38,11 @@ FORCE_BF16=1 /path/to/executable_nc_wrapper
 ```
 > ⚠️Without hardware or software support, the poor performance or other problems may expect for force enabling.
 
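If the workload is launched from Python rather than through a shell wrapper, the same `FORCE_BF16` switch can be exported from the launching process. This is only a sketch under the assumption that INC reads the variable from the process environment at run time; the documented usage is the command-line prefix shown above:

```python
import os

# Assumption: INC picks up FORCE_BF16 from the environment of the current process.
# The documented form is `FORCE_BF16=1 /path/to/executable_nc_wrapper`.
os.environ["FORCE_BF16"] = "1"
# ...then invoke the INC quantization / mixed precision flow as usual.
```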
-## Direct mixed precision conversion
+> During quantization, BF16 conversion is also executed automatically if the prerequisites are met or it is force-enabled. Please refer to this [document](./quantization_mixed_precision.md) for its workflow.
 
-INC queries framework capability and user-defined precision to generate an op-wise config based on the pre-optimized fp32 model. Direct mixed precision conversion will be implemented under the direction of config. Further, if users add necessary evaluation components, INC will tune accuracy during conversion.
+## How to use
 
-### How to use it
+INC queries framework capability and user-defined precision to generate an op-wise config based on the pre-optimized FP32 model. Direct mixed precision conversion is then performed under the direction of this config. Further, if users add the necessary evaluation components, INC will tune accuracy during conversion.
 
 - Convert as many nodes as possible to target dtype
 
@@ -130,43 +114,4 @@ INC queries framework capability and user-defined precision to generate an op-wi
 converter.eval_dataloader = common.DataLoader(dataset)
 converter.model = './model.pb'
 output_model = converter()
-```
-
-## Mixed precision during quantization
-
-This use case is only executed during quantization. Currently, only the Basic strategy with BF16 support has been validated.
-
-### Tensorflow
-
-- BF16 conversion during quantization in TensorFlow
-
-![Mixed Precision](imgs/bf16_convert_tf.png "Mixed Precision Graph")
-
-- Three steps
-
-1. Convert to a `FP32 + INT8` mixed precision Graph
-
-In this steps, TF adaptor will regard all fallback datatype as `FP32`. According to the per op datatype in tuning config passed by strategy, TF adaptor will generate a `FP32 + INT8` mixed precision graph.
-
-2. Convert to a `BF16 + FP32 + INT8` mixed precision Graph
-
-In this phase, adaptor will convert some `FP32` ops to `BF16` according to `bf16_ops` list in tuning config.
-
-3. Optimize the `BF16 + FP32 + INT8` mixed precision Graph
-
-After the mixed precision graph generated, there are still some optimization need to be applied to improved the performance, for example `Cast + Cast` and so on. The `BF16Convert` transformer also apply a depth-first method to make it possible to take the ops use `BF16` which can support `BF16` datatype to reduce the insertion of `Cast` op.
-
-### PyTorch
-
-- BF16 conversion during quantization in PyTorch
-
-![Mixed Precision](imgs/bf16_convert_pt.png "Mixed Precision Graph")
-
-- Two steps
-1. Convert to a `FP32 + INT8` mixed precision Graph or Module
-
-In this steps, PT adaptor will combine the `INT8` ops and all fallback ops to `FP32 + INT8` mixed precision Graph or Module no matter in Eager mode or Fx Graph mode.
-
-2. Convert to a `BF16 + FP32 + INT8` mixed precision Graph or Module
-
-In this phase, adaptor will according to `BF16` op list from strategy tune config to wrapper the `FP32` module with `BF16Wrapper` to realize the `BF16 + FP32 + INT8` mixed precision Graph or Module. adaptor will do retrace the `GraphModule` again if using Fx Graph mode.
+```
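Only the tail of the converter example is visible in the hunk above. For reference, here is a minimal end-to-end sketch of the same flow; the import path and the `precisions` attribute are assumptions not shown in the hunk, and the model path is a placeholder:

```python
from neural_compressor.experimental import MixedPrecision

converter = MixedPrecision()          # BF16 + FP32 conversion driver
converter.precisions = 'bf16'         # target low precision
converter.model = './model.pb'        # pre-optimized FP32 model
# Optional: add evaluation components so INC can tune accuracy during conversion,
# e.g. converter.eval_dataloader = common.DataLoader(dataset) as in the hunk above.
output_model = converter()            # returns the mixed precision model
```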

docs/quantization_mixed_precision.md

+38
@@ -0,0 +1,38 @@
+### TensorFlow
+
+Intel has worked with the TensorFlow development team to enhance TensorFlow to include bfloat16 data support for CPUs. For more information about BF16 in TensorFlow, please read [Accelerating AI performance on 3rd Gen Intel® Xeon® Scalable processors with TensorFlow and Bfloat16](https://blog.tensorflow.org/2020/06/accelerating-ai-performance-on-3rd-gen-processors-with-tensorflow-bfloat16.html).
+
+- BF16 conversion during quantization in TensorFlow
+
+![Mixed Precision](imgs/bf16_convert_tf.png "Mixed Precision Graph")
+
+- Three steps
+
+1. Convert to a `FP32 + INT8` mixed precision Graph
+
+In this step, the TF adaptor regards all fallback datatypes as `FP32`. According to the per-op datatype in the tuning config passed by the strategy, the TF adaptor generates a `FP32 + INT8` mixed precision graph.
+
+2. Convert to a `BF16 + FP32 + INT8` mixed precision Graph
+
+In this phase, the adaptor converts some `FP32` ops to `BF16` according to the `bf16_ops` list in the tuning config.
+
+3. Optimize the `BF16 + FP32 + INT8` mixed precision Graph
+
+After the mixed precision graph is generated, some optimizations still need to be applied to improve performance, for example removing redundant `Cast + Cast` pairs. The `BF16Convert` transformer also applies a depth-first method so that ops which support the `BF16` datatype keep their data in `BF16`, reducing the number of inserted `Cast` ops.
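The `Cast + Cast` cleanup in step 3 can be pictured with a small standalone sketch: two adjacent casts that undo each other are dropped from a linear op sequence. The function and the `(op, dtypes)` tuple encoding below are purely illustrative and are not INC's `BF16Convert` transformer:

```python
# Toy illustration of the "Cast + Cast" optimization on a linear op sequence.
def drop_redundant_casts(ops):
    """ops: list of ("Cast", (src_dtype, dst_dtype)) or (op_name, None) tuples."""
    result = []
    for op in ops:
        prev = result[-1] if result else None
        if (prev and op[0] == "Cast" and prev[0] == "Cast"
                and prev[1][1] == op[1][0]      # casts are chained...
                and prev[1][0] == op[1][1]):    # ...and cancel each other out
            result.pop()                        # drop both casts
        else:
            result.append(op)
    return result

graph = [("MatMul", None),
         ("Cast", ("bf16", "fp32")),
         ("Cast", ("fp32", "bf16")),
         ("Relu", None)]
print(drop_redundant_casts(graph))  # [('MatMul', None), ('Relu', None)]
```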
+
+### PyTorch
+
+Intel has also worked with the PyTorch development team to enhance PyTorch to include bfloat16 data support for CPUs.
+
+- BF16 conversion during quantization in PyTorch
+
+![Mixed Precision](imgs/bf16_convert_pt.png "Mixed Precision Graph")
+
+- Two steps
+1. Convert to a `FP32 + INT8` mixed precision Graph or Module
+
+In this step, the PT adaptor combines the `INT8` ops and all fallback ops into a `FP32 + INT8` mixed precision Graph or Module, in both Eager mode and FX Graph mode.
+
+2. Convert to a `BF16 + FP32 + INT8` mixed precision Graph or Module
+
+In this phase, the adaptor wraps the `FP32` modules on the `BF16` op list from the strategy's tuning config with `BF16Wrapper` to realize the `BF16 + FP32 + INT8` mixed precision Graph or Module. The adaptor also retraces the `GraphModule` when FX Graph mode is used.
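The wrapping idea in step 2 can be sketched as a small PyTorch module that casts its input to `BF16`, runs the wrapped submodule in `BF16`, and hands `FP32` back to the surrounding graph. `BF16ModuleWrapper` below is an illustrative stand-in, not INC's actual `BF16Wrapper` class:

```python
import torch

class BF16ModuleWrapper(torch.nn.Module):
    """Illustrative stand-in for the BF16 wrapping idea (not INC's BF16Wrapper)."""
    def __init__(self, module: torch.nn.Module):
        super().__init__()
        self.module = module.bfloat16()          # cast parameters/buffers to BF16

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        y = self.module(x.to(torch.bfloat16))    # compute in BF16
        return y.float()                         # return FP32 to the surrounding graph

# Wrap an FP32 submodule that the bf16 op list selects.
fp32_linear = torch.nn.Linear(16, 4)
bf16_linear = BF16ModuleWrapper(fp32_linear)
out = bf16_linear(torch.randn(2, 16))            # FP32 in, FP32 out, BF16 inside
```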
