
Commit e106dea

Update Example for Pytorch 3x Mixed Precision (#1882)
Signed-off-by: zehao-intel <zehao.huang@intel.com>
1 parent 1ebf698 commit e106dea

File tree

19 files changed: +1129 −62 lines changed

docs/3x/PT_MixedPrecision.md

+19-11
@@ -8,13 +8,17 @@ PyTorch Mixed Precision
 
 ## Introduction
 
-The recent growth of Deep Learning has driven the development of more complex models that require significantly more compute and memory capabilities. Several low precision numeric formats have been proposed to address the problem. Google's [bfloat16](https://cloud.google.com/tpu/docs/bfloat16) and the [FP16: IEEE](https://en.wikipedia.org/wiki/Half-precision_floating-point_format) half-precision format are two of the most widely used sixteen bit formats. [Mixed precision](https://arxiv.org/abs/1710.03740) training and inference using low precision formats have been developed to reduce compute and bandwidth requirements.
+The recent growth of Deep Learning has driven the development of more complex models that require significantly more compute and memory capabilities. Several low precision numeric formats have been proposed to address the problem.
+Google's [bfloat16](https://cloud.google.com/tpu/docs/bfloat16) and the [FP16: IEEE](https://en.wikipedia.org/wiki/Half-precision_floating-point_format) half-precision format are two of the most widely used sixteen bit formats. [Mixed precision](https://arxiv.org/abs/1710.03740) training and inference using low precision formats have been developed to reduce compute and bandwidth requirements.
 
-The 3rd Gen Intel® Xeon® Scalable processor (codenamed Cooper Lake), featuring Intel® Deep Learning Boost, is the first general-purpose x86 CPU to support the bfloat16 format. Specifically, three new bfloat16 instructions are added as a part of the AVX512_BF16 extension within Intel Deep Learning Boost: VCVTNE2PS2BF16, VCVTNEPS2BF16, and VDPBF16PS. The first two instructions allow converting to and from bfloat16 data type, while the last one performs a dot product of bfloat16 pairs. Further details can be found in the [hardware numerics document](https://www.intel.com/content/www/us/en/developer/articles/technical/intel-deep-learning-boost-new-instruction-bfloat16.html) published by Intel.
+The 3rd Gen Intel® Xeon® Scalable processor (codenamed Cooper Lake), featuring Intel® Deep Learning Boost, is the first general-purpose x86 CPU to support the bfloat16 format. Specifically, three new bfloat16 instructions are added as a part of the AVX512_BF16 extension within Intel Deep Learning Boost: VCVTNE2PS2BF16, VCVTNEPS2BF16, and VDPBF16PS. The first two instructions allow converting to and from bfloat16 data type, while the last one performs a dot product of bfloat16 pairs.
+Further details can be found in the [Hardware Numerics Document](https://www.intel.com/content/www/us/en/developer/articles/technical/intel-deep-learning-boost-new-instruction-bfloat16.html) published by Intel.
 
-The 4th Gen Intel® Xeon® Scalable processor supports FP16 instruction set architecture (ISA) for Intel®
-Advanced Vector Extensions 512 (Intel® AVX-512). The new ISA supports a wide range of general-purpose numeric
-operations for 16-bit half-precision IEEE-754 floating-point and complements the existing 32-bit and 64-bit floating-point instructions already available in the Intel Xeon processor based products. Further details can be found in the [hardware numerics document](https://www.intel.com/content/www/us/en/content-details/669773/intel-avx-512-fp16-instruction-set-for-intel-xeon-processor-based-products-technology-guide.html) published by Intel.
+The 4th Gen Intel® Xeon® Scalable processor supports the FP16 instruction set architecture (ISA) for Intel® Advanced Vector Extensions 512 (Intel® AVX-512). The new ISA supports a wide range of general-purpose numeric operations for 16-bit half-precision IEEE-754 floating-point and complements the existing 32-bit and 64-bit floating-point instructions already available in Intel Xeon processor-based products.
+Further details can be found in the [Intel AVX512 FP16 Guide](https://www.intel.com/content/www/us/en/content-details/669773/intel-avx-512-fp16-instruction-set-for-intel-xeon-processor-based-products-technology-guide.html) published by Intel.
+
+The latest Intel Xeon processors deliver the flexibility of Intel Advanced Matrix Extensions (Intel AMX), an accelerator that improves the performance of deep learning (DL) training and inference, making it ideal for workloads like NLP, recommender systems, and image recognition. Developers can code AI functionality to take advantage of the Intel AMX instruction set, and they can code non-AI functionality to use the processor instruction set architecture (ISA). Intel has integrated the Intel® oneAPI Deep Neural Network Library (oneDNN), its oneAPI DL engine, into PyTorch.
+Further details can be found in the [Intel AMX Document](https://www.intel.com/content/www/us/en/content-details/785250/accelerate-artificial-intelligence-ai-workloads-with-intel-advanced-matrix-extensions-intel-amx.html) published by Intel.
 
 <p align="center" width="100%">
     <img src="./imgs/data_format.png" alt="Architecture" height=230>
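
The bfloat16 and FP16 hardware paths described in the paragraphs added above are exposed in stock PyTorch through CPU autocast. As a minimal illustrative sketch (not part of this commit; it assumes a recent PyTorch and torchvision install), the snippet below runs a torchvision model under bfloat16 autocast on CPU:

```python
import torch
from torchvision.models import resnet18

# Minimal sketch: CPU autocast selects bf16 kernels (AVX512_BF16 / AMX-BF16
# where available) for autocast-eligible ops; dtype=torch.float16 targets the
# AVX512-FP16 path on newer PyTorch/Xeon combinations.
model = resnet18().eval()
x = torch.randn(1, 3, 224, 224)

with torch.no_grad(), torch.autocast(device_type="cpu", dtype=torch.bfloat16):
    y = model(x)

print(y.dtype)  # typically torch.bfloat16, since the final layer runs under autocast
```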
@@ -58,6 +62,9 @@ operations for 16-bit half-precision IEEE-754 floating-point and complements the
 - PyTorch
 1. Hardware: CPU supports `avx512_fp16` instruction set.
 2. Software: torch >= [1.11.0](https://download.pytorch.org/whl/torch_stable.html).
+> Note: To run FP16 on Intel-AMX, please set the environment variable `ONEDNN_MAX_CPU_ISA`:
+> ```export ONEDNN_MAX_CPU_ISA=AVX512_CORE_AMX_FP16```
+
 
 
 ### Accuracy-driven mixed precision
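
The note added in the hunk above raises oneDNN's ISA cap so that FP16 can run on Intel AMX. Before setting it, it can help to confirm what the host CPU actually advertises; below is a small Linux-only sketch (not part of this commit) that checks the relevant CPU flags reported in /proc/cpuinfo:

```python
# Linux-only sketch: list the mixed-precision-related ISA flags the kernel
# reports for this CPU. Flag names follow /proc/cpuinfo conventions.
flags = set()
with open("/proc/cpuinfo") as f:
    for line in f:
        if line.startswith("flags"):
            flags = set(line.split(":", 1)[1].split())
            break

for isa in ("avx512_bf16", "avx512_fp16", "amx_tile", "amx_bf16"):
    print(f"{isa}: {'yes' if isa in flags else 'no'}")

# If the FP16-on-AMX path is desired, additionally export the variable noted above:
#   export ONEDNN_MAX_CPU_ISA=AVX512_CORE_AMX_FP16
```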
@@ -68,36 +75,37 @@ To be noticed, IPEX backend doesn't support accuracy-driven mixed precision.
 
 ## Get Started with autotune API
 
-To get a bf16/fp16 model, users can use the `autotune` interface with `MixPrecisionConfig` as follows.
+To get a bf16/fp16 model, users can use the `autotune` interface with `MixedPrecisionConfig` as follows.
 
 - BF16:
 
 ```python
-from neural_compressor.torch.quantization import MixPrecisionConfig, TuningConfig, autotune
+from neural_compressor.torch.quantization import MixedPrecisionConfig, TuningConfig, autotune
 
 def eval_acc_fn(model):
     ......
     return acc
 
 # modules might fall back to fp32 to get better accuracy
-custom_tune_config = TuningConfig(config_set=[MixPrecisionConfig(dtype=["bf16", "fp32"])], max_trials=3)
+custom_tune_config = TuningConfig(config_set=[MixedPrecisionConfig(dtype=["bf16", "fp32"])], max_trials=3)
 best_model = autotune(model=build_torch_model(), tune_config=custom_tune_config, eval_fn=eval_acc_fn)
 ```
 
 - FP16:
 
 ```python
-from neural_compressor.torch.quantization import MixPrecisionConfig, TuningConfig, autotune
+from neural_compressor.torch.quantization import MixedPrecisionConfig, TuningConfig, autotune
 
 def eval_acc_fn(model):
     ......
     return acc
 
 # modules might fall back to fp32 to get better accuracy
-custom_tune_config = TuningConfig(config_set=[MixPrecisionConfig(dtype=["fp16", "fp32"])], max_trials=3)
+custom_tune_config = TuningConfig(config_set=[MixedPrecisionConfig(dtype=["fp16", "fp32"])], max_trials=3)
 best_model = autotune(model=build_torch_model(), tune_config=custom_tune_config, eval_fn=eval_acc_fn)
 ```
 
 ## Examples
 
-Example will be added later.
+Users can also refer to [examples](https://github.com/intel/neural-compressor/blob/master/examples/3.x_api/pytorch/cv/mixed_precision) on how to quantize a model with Mixed Precision.
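
As a complement to the example linked above, the following self-contained sketch (not part of this commit) fills in the elided `eval_acc_fn` from the snippets in this hunk with a top-1 accuracy loop; the synthetic validation set and model are placeholders for a real DataLoader and workload:

```python
import torch
from torch.utils.data import DataLoader, TensorDataset
from torchvision.models import resnet18
from neural_compressor.torch.quantization import MixedPrecisionConfig, TuningConfig, autotune

# Synthetic validation set stands in for a real one; swap in your own DataLoader.
val_loader = DataLoader(
    TensorDataset(torch.randn(8, 3, 224, 224), torch.randint(0, 1000, (8,))),
    batch_size=4,
)

def eval_acc_fn(model):
    """Top-1 accuracy over val_loader; autotune uses this score to pick the best config."""
    model.eval()
    correct, total = 0, 0
    with torch.no_grad():
        for images, labels in val_loader:
            preds = model(images).argmax(dim=1)
            correct += (preds == labels).sum().item()
            total += labels.numel()
    return correct / total

custom_tune_config = TuningConfig(config_set=[MixedPrecisionConfig(dtype=["bf16", "fp32"])], max_trials=3)
best_model = autotune(model=resnet18(), tune_config=custom_tune_config, eval_fn=eval_acc_fn)
```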

examples/.config/model_params_pytorch_3x.json

+7
@@ -146,6 +146,13 @@
       "input_model": "",
       "main_script": "run_clm_no_trainer.py",
       "batch_size": 1
+    },
+    "resnet18_mixed_precision": {
+      "model_src_dir": "cv/mixed_precision",
+      "dataset_location": "/tf_dataset/pytorch/ImageNet/raw",
+      "input_model": "resnet18",
+      "main_script": "main.py",
+      "batch_size": 100
     }
   }
 }
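
The new JSON entry registers the example's run parameters. For reference, it can be read back with the standard library; the lookup below is illustrative only and assumes nothing beyond the two-level nesting visible in this diff:

```python
import json

# Illustrative sketch: locate the newly added entry without assuming the
# exact top-level key, since only the nesting depth is visible above.
with open("examples/.config/model_params_pytorch_3x.json") as f:
    params = json.load(f)

for framework, models in params.items():
    entry = models.get("resnet18_mixed_precision")
    if entry:
        print(framework, entry["model_src_dir"], entry["main_script"], entry["batch_size"])
```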
@@ -0,0 +1,47 @@
+Step-by-Step
+============
+
+This document provides step-by-step instructions for reproducing PyTorch ResNet18 mixed precision results with Intel® Neural Compressor.
+
+# Prerequisite
+
+### 1. Environment
+
+PyTorch 1.8 or a higher version is needed with the pytorch_fx backend.
+
+```Shell
+cd examples/3.x_api/pytorch/image_recognition/torchvision_models/mixed_precision/resnet18
+pip install -r requirements.txt
+```
+> Note: Validated PyTorch [Version](/docs/source/installation_guide.md#validated-software-environment).
+
+### 2. Prepare Dataset
+
+Download the [ImageNet](http://www.image-net.org/) raw images to a directory such as /path/to/imagenet. The directory should contain the folders below:
+
+```bash
+ls /path/to/imagenet
+train  val
+```
+
+# Run
+
+> Note: Any torchvision model name can be passed as long as it is included in `torchvision.models`; below are some examples.
+
+## MixedPrecision
+```Shell
+bash run_autotune.sh --input_model=resnet18 --dataset_location=/path/to/imagenet
+```
+
+## Benchmark
+```Shell
+# run optimized performance
+bash run_benchmark.sh --input_model=resnet18 --dataset_location=/path/to/imagenet --mode=performance --batch_size=100 --optimized=true --iters=500
+# run optimized accuracy
+bash run_benchmark.sh --input_model=resnet18 --dataset_location=/path/to/imagenet --mode=accuracy --batch_size=1 --optimized=true
+```
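
The Benchmark commands above report performance and accuracy for the optimized model. For a rough sense of what the performance mode measures, here is an illustrative latency loop (not the internals of the project's run_benchmark.sh) that can be pointed at either the FP32 baseline or the model returned by autotune:

```python
import time
import torch
from torchvision.models import resnet18

def measure_latency(model, batch_size=100, iters=50, warmup=5):
    """Average per-batch CPU latency over random data (illustrative only)."""
    model.eval()
    x = torch.randn(batch_size, 3, 224, 224)
    with torch.no_grad():
        for _ in range(warmup):          # warm-up iterations are not timed
            model(x)
        start = time.perf_counter()
        for _ in range(iters):
            model(x)
        elapsed = time.perf_counter() - start
    return elapsed / iters

fp32_model = resnet18().eval()
print(f"fp32 resnet18: {measure_latency(fp32_model) * 1000:.1f} ms/batch")
# A model produced by the autotune flow (see the documentation diff above) can be
# passed to measure_latency() the same way to compare mixed-precision throughput.
```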
