The recently launched 3rd Gen Intel® Xeon® Scalable processor (codenamed Cooper Lake), featuring Intel® Deep Learning Boost, is the first general-purpose x86 CPU to support the bfloat16 format. Specifically, three new bfloat16 instructions are added as a part of the AVX512_BF16 extension within Intel Deep Learning Boost: VCVTNE2PS2BF16, VCVTNEPS2BF16, and VDPBF16PS. The first two instructions allow converting to and from bfloat16 data type, while the last one performs a dot product of bfloat16 pairs. Further details can be found in the [hardware numerics document](https://software.intel.com/content/www/us/en/develop/download/bfloat16-hardware-numerics-definition.html) published by Intel.
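
For intuition about the format itself, here is an illustrative sketch in plain Python (not an INC API): bfloat16 keeps the sign bit, all 8 exponent bits, and the top 7 mantissa bits of an IEEE-754 float32, so it preserves the FP32 dynamic range at reduced precision. Hardware conversion (e.g. VCVTNEPS2BF16) uses round-to-nearest-even; simple truncation is shown only for clarity.

```python
import struct

def float32_to_bfloat16_bits(x: float) -> int:
    # bfloat16 keeps the upper 16 bits of a float32 (1 sign, 8 exponent, 7 mantissa bits).
    # Truncation only; real hardware rounds to nearest even.
    (bits,) = struct.unpack("<I", struct.pack("<f", x))
    return bits >> 16

def bfloat16_bits_to_float32(b: int) -> float:
    # Zero-fill the dropped mantissa bits to expand back to float32.
    (x,) = struct.unpack("<f", struct.pack("<I", b << 16))
    return x

print(bfloat16_bits_to_float32(float32_to_bfloat16_bits(3.1415926)))  # ~3.140625
```
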
Intel® Neural Compressor (INC) supports two use cases for mixed precision:

1. Direct mixed precision conversion: mix `BF16 + FP32`, executed by the MixedPrecision API.
2. Mixed precision during quantization: mix `BF16 + FP32 + INT8`, which occurs during quantization.

BF16 conversion requires a CPU that supports the `avx512_bf16` instruction set.
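
A quick way to verify this on Linux is to look for the `avx512_bf16` flag in `/proc/cpuinfo`. The minimal sketch below is Linux-only and the helper name is ours, not an INC API; other platforms would need `lscpu` or a CPUID library.

```python
def cpu_supports_avx512_bf16() -> bool:
    # Linux only: look for the avx512_bf16 flag among the CPU feature flags.
    try:
        with open("/proc/cpuinfo") as f:
            return any("avx512_bf16" in line for line in f if line.startswith("flags"))
    except OSError:
        return False

print(cpu_supports_avx512_bf16())
```
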
Intel has worked with the PyTorch & TensorFlow development teams to enhance PyTorch & TensorFlow to include bfloat16 data support for CPUs.

- For PyTorch, version [1.11.0](https://download.pytorch.org/whl/torch_stable.html) or higher is required.
- For TensorFlow, version [2.3.0](https://pypi.org/project/intel-tensorflow/2.3.0/) or higher of intel-tensorflow is required; BF16 support has been enabled in intel-tensorflow [2.3.0](https://pypi.org/project/intel-tensorflow/2.3.0/)/[2.4.0](https://pypi.org/project/intel-tensorflow/2.4.0/)/[1.15.0up1](https://github.com/Intel-tensorflow/tensorflow/tree/v1.15.0up1)/[1.15.0up2](https://github.com/Intel-tensorflow/tensorflow/tree/v1.15.0up2) and in intel-tensorflow-avx512 [2.3.0](https://pypi.org/project/intel-tensorflow-avx512/2.3.0/)/[2.4.0](https://pypi.org/project/intel-tensorflow-avx512/2.4.0/).

> To get better performance with the BF16 datatype, intel-tensorflow-avx512 is recommended, or build Intel TensorFlow from source (take [tag v1.15.0up2](https://github.com/Intel-tensorflow/tensorflow/tree/v1.15.0up2) as an example).
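
A hedged sanity check that the installed frameworks can create BF16 tensors on CPU (this assumes both packages are installed and is only a probe, not an INC API):

```python
import torch
import tensorflow as tf

print("torch", torch.__version__)             # expect >= 1.11.0
print(torch.randn(4, dtype=torch.bfloat16))   # succeeds when the CPU build supports bfloat16

print("tensorflow", tf.__version__)           # expect intel-tensorflow >= 2.3.0
print(tf.cast(tf.random.normal([4]), tf.bfloat16))
```
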
## Methods to enable & disable BF16 support

By default, BF16 is added to the supported datatypes for activation and weight if **both the TensorFlow/PyTorch version and the CPU meet the requirements**. We can disable it in the yaml config file by specifying the datatype for activation and weight, as in the sketch below.

> ⚠️ Without hardware or software support, force-enabling BF16 may result in poor performance or other problems.
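
A hedged sketch of such an override (the `model_wise` keys follow our understanding of the INC 1.x yaml schema and may differ across versions; omitting `bf16` from the `dtype` lists keeps it out of the supported datatypes):

```yaml
quantization:
  model_wise:
    weight:
      dtype: ['int8', 'fp32']     # no 'bf16' here -> BF16 disabled for weights
    activation:
      dtype: ['uint8', 'fp32']    # no 'bf16' here -> BF16 disabled for activations
```
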
## Direct mixed precision conversion

> During quantization, BF16 conversion will also be executed automatically if the prerequisites are met or it is force-enabled. Please refer to this [document](./quantization_mixed_precision.md) for its workflow.

INC queries the framework capability and the user-defined precision to generate an op-wise config based on the pre-optimized FP32 model. Direct mixed precision conversion is then carried out under the direction of this config. Furthermore, if users add the necessary evaluation components, INC will tune for accuracy during conversion.

### How to use it

- Convert as many nodes as possible to the target dtype
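
A minimal usage sketch of direct conversion through the experimental `MixedPrecision` API (the import path and attribute names reflect INC 1.x as we understand it and may differ across versions; `./model.pb` stands in for a saved FP32 TensorFlow model):

```python
from neural_compressor.experimental import MixedPrecision

converter = MixedPrecision()               # optionally pass a yaml config path
converter.precisions = 'bf16'              # target low precision for the conversion
converter.model = './model.pb'             # hypothetical path to a saved FP32 model
converted_model = converter()              # run the op-wise BF16 + FP32 conversion
converted_model.save('./converted_model')  # persist the converted model
```
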
### TensorFlow

Intel has worked with the TensorFlow development team to enhance TensorFlow to include bfloat16 data support for CPUs. For more information about BF16 in TensorFlow, please read [Accelerating AI performance on 3rd Gen Intel® Xeon® Scalable processors with TensorFlow and Bfloat16](https://blog.tensorflow.org/2020/06/accelerating-ai-performance-on-3rd-gen-processors-with-tensorflow-bfloat16.html).
- BF16 conversion during quantization in TensorFlow
1. Convert to a `FP32 + INT8` mixed precision Graph

   In this step, the TF adaptor treats all fallback datatypes as `FP32`. According to the per-op datatype in the tuning config passed by the strategy, the TF adaptor generates a `FP32 + INT8` mixed precision graph.
2. Convert to a `BF16 + FP32 + INT8` mixed precision Graph

   In this phase, the adaptor converts some `FP32` ops to `BF16` according to the `bf16_ops` list in the tuning config. After the mixed precision graph is generated, some optimizations still need to be applied to improve performance, for example eliminating back-to-back `Cast + Cast` pairs. The `BF16Convert` transformer also applies a depth-first method so that ops which support the `BF16` datatype stay in `BF16`, reducing the number of inserted `Cast` ops; a toy sketch of the idea follows this list.
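
The following is a toy illustration of that depth-first idea (not INC's actual `BF16Convert` code): starting from ops already assigned to `BF16`, keep absorbing downstream ops that are BF16-capable, so `Cast` ops are only needed on edges that leave the region.

```python
def expand_bf16_region(graph, seed_bf16_ops, bf16_capable):
    """graph: dict mapping an op name to its downstream ops (a hypothetical toy graph);
    seed_bf16_ops: ops already chosen as BF16; bf16_capable: ops that could run in BF16."""
    region, stack = set(seed_bf16_ops), list(seed_bf16_ops)
    while stack:                              # depth-first traversal
        for nxt in graph.get(stack.pop(), []):
            if nxt in bf16_capable and nxt not in region:
                region.add(nxt)
                stack.append(nxt)
    return region                             # Cast ops are inserted only at the region boundary

toy_graph = {"conv": ["relu"], "relu": ["matmul"], "matmul": ["softmax"], "softmax": []}
print(expand_bf16_region(toy_graph, {"conv"}, {"conv", "relu", "matmul"}))
# -> {'conv', 'relu', 'matmul'}; a single Cast is needed before 'softmax'
```
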
### PyTorch
Intel has also worked with the PyTorch development team to enhance PyTorch to include bfloat16 data support for CPUs.
1. Convert to a `FP32 + INT8` mixed precision Graph or Module

   In this step, the PT adaptor combines the `INT8` ops and all fallback ops into a `FP32 + INT8` mixed precision Graph or Module, in both Eager mode and FX Graph mode.
2. Convert to a `BF16 + FP32 + INT8` mixed precision Graph or Module

   In this phase, the adaptor wraps each `FP32` module named in the strategy's `BF16` op list with `BF16Wrapper` to realize the `BF16 + FP32 + INT8` mixed precision Graph or Module; a toy sketch of the wrapping idea follows this list. When FX Graph mode is used, the adaptor retraces the `GraphModule` again.
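
Below is a toy illustration of the wrapping idea (not INC's actual `BF16Wrapper`): the wrapped module's parameters run in bfloat16, inputs are cast on entry, and outputs are cast back to `FP32` so that neighbouring ops are unaffected.

```python
import torch

class ToyBF16Wrapper(torch.nn.Module):
    """Hypothetical illustration only, not INC's BF16Wrapper."""
    def __init__(self, module: torch.nn.Module):
        super().__init__()
        self.module = module.bfloat16()        # cast the wrapped module's parameters to BF16

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Cast the input down, compute in BF16, cast the result back to FP32.
        return self.module(x.to(torch.bfloat16)).float()

wrapped = ToyBF16Wrapper(torch.nn.Linear(8, 4))
print(wrapped(torch.randn(2, 8)).dtype)        # torch.float32
```
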