Commit c8e4f40 (merge of 2 parents: 2408f7a + 437c8e7)

105 files changed: +1574 -7711 lines


.azure-pipelines/scripts/codeScan/pylint/pylint.sh (+1)

@@ -20,6 +20,7 @@ apt-get install -y --no-install-recommends --fix-missing \
     build-essential

 pip install -r /neural-compressor/requirements.txt
+pip install -r /neural-compressor/requirements_pt.txt
 pip install cmake

 pip install torch \

.azure-pipelines/scripts/install_nc.sh (+1 -5)

@@ -10,13 +10,9 @@ elif [[ $1 = *"3x_tf"* ]]; then
     python -m pip install --no-cache-dir -r requirements_tf.txt
     python setup.py tf bdist_wheel
     pip install dist/neural_compressor*.whl --force-reinstall
-elif [[ $1 = *"3x_ort" ]]; then
-    python -m pip install --no-cache-dir -r requirements_ort.txt
-    python setup.py ort bdist_wheel
-    pip install dist/neural_compressor*.whl --force-reinstall
 else
     python -m pip install --no-cache-dir -r requirements.txt
-    python setup.py 2x bdist_wheel
+    python setup.py bdist_wheel
     pip install dist/neural_compressor*.whl --force-reinstall
 fi

.azure-pipelines/scripts/ut/3x/coverage.3x_ort (-15)

This file was deleted.

.azure-pipelines/scripts/ut/3x/run_3x_ort.sh (-35)

This file was deleted.

.azure-pipelines/scripts/ut/env_setup.sh (+1 -1)

@@ -92,7 +92,7 @@ elif [[ $(echo "${test_case}" | grep -c "tf pruning") != 0 ]]; then
 fi

 if [[ $(echo "${test_case}" | grep -c "api") != 0 ]] || [[ $(echo "${test_case}" | grep -c "adaptor") != 0 ]]; then
-    pip install auto-round
+    pip install git+https://github.com/intel/auto-round.git@24b2e74070f2b4e6f26ff069ec75af74cf5b177c
 fi

 # test deps

.azure-pipelines/scripts/ut/run_itrex.sh (+3)

@@ -4,6 +4,9 @@ source /neural-compressor/.azure-pipelines/scripts/change_color.sh
 python -c "import neural_compressor as nc;print(nc.version.__version__)"
 echo "run itrex ut..."

+# install inc 3x deps
+pip install -r /neural-compressor/requirements_pt.txt
+
 # prepare itrex
 git clone https://github.com/intel/intel-extension-for-transformers.git /intel-extension-for-transformers
 cd /intel-extension-for-transformers && git rev-parse --short HEAD

.azure-pipelines/ut-3x-ort.yml (-109)

This file was deleted.

.github/checkgroup.yml (-13)

@@ -140,16 +140,3 @@ subprojects:
       - "UT-3x-Torch (Coverage Compare CollectDatafiles)"
       - "UT-3x-Torch (Unit Test 3x Torch Unit Test 3x Torch)"
       - "UT-3x-Torch (Unit Test 3x Torch baseline Unit Test 3x Torch baseline)"
-
-  - id: "Unit Tests 3x-ONNXRT workflow"
-    paths:
-      - "neural_compressor/common/**"
-      - "neural_compressor/onnxrt/**"
-      - "test/3x/onnxrt/**"
-      - "setup.py"
-      - "requirements_ort.txt"
-    checks:
-      - "UT-3x-ONNXRT"
-      - "UT-3x-ONNXRT (Coverage Compare CollectDatafiles)"
-      - "UT-3x-ONNXRT (Unit Test 3x ONNXRT Unit Test 3x ONNXRT)"
-      - "UT-3x-ONNXRT (Unit Test 3x ONNXRT baseline Unit Test 3x ONNXRT baseline)"

README.md (+36 -27)

@@ -19,20 +19,25 @@ Intel® Neural Compressor aims to provide popular model compression techniques s
 as well as Intel extensions such as [Intel Extension for TensorFlow](https://github.com/intel/intel-extension-for-tensorflow) and [Intel Extension for PyTorch](https://github.com/intel/intel-extension-for-pytorch).
 In particular, the tool provides the key features, typical examples, and open collaborations as below:

-* Support a wide range of Intel hardware such as [Intel Xeon Scalable Processors](https://www.intel.com/content/www/us/en/products/details/processors/xeon/scalable.html), [Intel Xeon CPU Max Series](https://www.intel.com/content/www/us/en/products/details/processors/xeon/max-series.html), [Intel Data Center GPU Flex Series](https://www.intel.com/content/www/us/en/products/details/discrete-gpus/data-center-gpu/flex-series.html), and [Intel Data Center GPU Max Series](https://www.intel.com/content/www/us/en/products/details/discrete-gpus/data-center-gpu/max-series.html) with extensive testing; support AMD CPU, ARM CPU, and NVidia GPU through ONNX Runtime with limited testing
+* Support a wide range of Intel hardware such as [Intel Gaudi AI Accelerators](https://www.intel.com/content/www/us/en/products/details/processors/ai-accelerators/gaudi-overview.html), [Intel Core Ultra Processors](https://www.intel.com/content/www/us/en/products/details/processors/core-ultra.html), [Intel Xeon Scalable Processors](https://www.intel.com/content/www/us/en/products/details/processors/xeon/scalable.html), [Intel Xeon CPU Max Series](https://www.intel.com/content/www/us/en/products/details/processors/xeon/max-series.html), [Intel Data Center GPU Flex Series](https://www.intel.com/content/www/us/en/products/details/discrete-gpus/data-center-gpu/flex-series.html), and [Intel Data Center GPU Max Series](https://www.intel.com/content/www/us/en/products/details/discrete-gpus/data-center-gpu/max-series.html) with extensive testing;
+support AMD CPU, ARM CPU, and NVidia GPU through ONNX Runtime with limited testing; support NVidia GPU for some WOQ algorithms like AutoRound and HQQ.

 * Validate popular LLMs such as [LLama2](/examples/pytorch/nlp/huggingface_models/language-modeling/quantization/llm), [Falcon](/examples/pytorch/nlp/huggingface_models/language-modeling/quantization/llm), [GPT-J](/examples/pytorch/nlp/huggingface_models/language-modeling/quantization/llm), [Bloom](/examples/pytorch/nlp/huggingface_models/language-modeling/quantization/llm), [OPT](/examples/pytorch/nlp/huggingface_models/language-modeling/quantization/llm), and more than 10,000 broad models such as [Stable Diffusion](/examples/pytorch/nlp/huggingface_models/text-to-image/quantization), [BERT-Large](/examples/pytorch/nlp/huggingface_models/text-classification/quantization/ptq_static/fx), and [ResNet50](/examples/pytorch/image_recognition/torchvision_models/quantization/ptq/cpu/fx) from popular model hubs such as [Hugging Face](https://huggingface.co/), [Torch Vision](https://pytorch.org/vision/stable/index.html), and [ONNX Model Zoo](https://github.com/onnx/models#models), with automatic [accuracy-driven](/docs/source/design.md#workflow) quantization strategies

 * Collaborate with cloud marketplaces such as [Google Cloud Platform](https://console.cloud.google.com/marketplace/product/bitnami-launchpad/inc-tensorflow-intel?project=verdant-sensor-286207), [Amazon Web Services](https://aws.amazon.com/marketplace/pp/prodview-yjyh2xmggbmga#pdp-support), and [Azure](https://azuremarketplace.microsoft.com/en-us/marketplace/apps/bitnami.inc-tensorflow-intel), software platforms such as [Alibaba Cloud](https://www.intel.com/content/www/us/en/developer/articles/technical/quantize-ai-by-oneapi-analytics-on-alibaba-cloud.html), [Tencent TACO](https://new.qq.com/rain/a/20221202A00B9S00) and [Microsoft Olive](https://github.com/microsoft/Olive), and open AI ecosystem such as [Hugging Face](https://huggingface.co/blog/intel), [PyTorch](https://pytorch.org/tutorials/recipes/intel_neural_compressor_for_pytorch.html), [ONNX](https://github.com/onnx/models#models), [ONNX Runtime](https://github.com/microsoft/onnxruntime), and [Lightning AI](https://github.com/Lightning-AI/lightning/blob/master/docs/source-pytorch/advanced/post_training_quantization.rst)

 ## What's New
-* [2024/03] A new SOTA approach [AutoRound](https://github.com/intel/auto-round) Weight-Only Quantization on [Intel Gaudi2 AI accelerator](https://habana.ai/products/gaudi2/) is available for LLMs.
+* [2024/07] From the 3.0 release, the framework extension API is recommended for quantization.
+* [2024/07] Performance optimizations and usability improvements on the [client side](https://github.com/intel/neural-compressor/blob/master/docs/3x/client_quant.md).

 ## Installation

 ### Install from pypi
 ```Shell
-pip install neural-compressor
+# Install 2.X API + Framework extension API + PyTorch dependency
+pip install neural-compressor[pt]
+# Install 2.X API + Framework extension API + TensorFlow dependency
+pip install neural-compressor[tf]
 ```
 > **Note**:
 > Further installation methods can be found in the [Installation Guide](https://github.com/intel/neural-compressor/blob/master/docs/source/installation_guide.md). Check out our [FAQ](https://github.com/intel/neural-compressor/blob/master/docs/source/faq.md) for more details.
@@ -116,54 +121,58 @@ quantized_model = fit(model=float_model, conf=static_quant_conf, calib_dataloade
 </thead>
 <tbody>
   <tr>
-    <td colspan="2" align="center"><a href="./docs/source/design.md#architecture">Architecture</a></td>
-    <td colspan="2" align="center"><a href="./docs/source/design.md#workflow">Workflow</a></td>
-    <td colspan="1" align="center"><a href="https://intel.github.io/neural-compressor/latest/docs/source/api-doc/apis.html">APIs</a></td>
-    <td colspan="1" align="center"><a href="./docs/source/llm_recipes.md">LLMs Recipes</a></td>
-    <td colspan="2" align="center"><a href="examples/README.md">Examples</a></td>
+    <td colspan="2" align="center"><a href="./docs/3x/design.md#architecture">Architecture</a></td>
+    <td colspan="2" align="center"><a href="./docs/3x/design.md#workflow">Workflow</a></td>
+    <td colspan="2" align="center"><a href="https://intel.github.io/neural-compressor/latest/docs/source/api-doc/apis.html">APIs</a></td>
+    <td colspan="1" align="center"><a href="./docs/3x/llm_recipes.md">LLMs Recipes</a></td>
+    <td colspan="1" align="center">Examples</td>
   </tr>
 </tbody>
 <thead>
   <tr>
-    <th colspan="8">Python-based APIs</th>
+    <th colspan="8">PyTorch Extension APIs</th>
   </tr>
 </thead>
 <tbody>
   <tr>
-    <td colspan="2" align="center"><a href="./docs/source/quantization.md">Quantization</a></td>
-    <td colspan="2" align="center"><a href="./docs/source/mixed_precision.md">Advanced Mixed Precision</a></td>
-    <td colspan="2" align="center"><a href="./docs/source/pruning.md">Pruning (Sparsity)</a></td>
-    <td colspan="2" align="center"><a href="./docs/source/distillation.md">Distillation</a></td>
+    <td colspan="2" align="center"><a href="./docs/3x/PyTorch.md">Overview</a></td>
+    <td colspan="2" align="center"><a href="./docs/3x/PT_StaticQuant.md">Static Quantization</a></td>
+    <td colspan="2" align="center"><a href="./docs/3x/PT_DynamicQuant.md">Dynamic Quantization</a></td>
+    <td colspan="2" align="center"><a href="./docs/3x/PT_SmoothQuant.md">Smooth Quantization</a></td>
   </tr>
   <tr>
-    <td colspan="2" align="center"><a href="./docs/source/orchestration.md">Orchestration</a></td>
-    <td colspan="2" align="center"><a href="./docs/source/benchmark.md">Benchmarking</a></td>
-    <td colspan="2" align="center"><a href="./docs/source/distributed.md">Distributed Compression</a></td>
-    <td colspan="2" align="center"><a href="./docs/source/export.md">Model Export</a></td>
+    <td colspan="4" align="center"><a href="./docs/3x/PT_WeightOnlyQuant.md">Weight-Only Quantization</a></td>
+    <td colspan="2" align="center"><a href="./docs/3x/PT_MXQuant.md">MX Quantization</a></td>
+    <td colspan="2" align="center"><a href="./docs/3x/PT_MixedPrecision.md">Mixed Precision</a></td>
   </tr>
 </tbody>
 <thead>
   <tr>
-    <th colspan="8">Advanced Topics</th>
+    <th colspan="8">Tensorflow Extension APIs</th>
   </tr>
 </thead>
 <tbody>
   <tr>
-    <td colspan="2" align="center"><a href="./docs/source/adaptor.md">Adaptor</a></td>
-    <td colspan="2" align="center"><a href="./docs/source/tuning_strategies.md">Strategy</a></td>
-    <td colspan="2" align="center"><a href="./docs/source/distillation_quantization.md">Distillation for Quantization</a></td>
-    <td colspan="2" align="center"><a href="./docs/source/smooth_quant.md">SmoothQuant</a></td>
+    <td colspan="3" align="center"><a href="./docs/3x/TensorFlow.md">Overview</a></td>
+    <td colspan="3" align="center"><a href="./docs/3x/TF_Quant.md">Static Quantization</a></td>
+    <td colspan="2" align="center"><a href="./docs/3x/TF_SQ.md">Smooth Quantization</a></td>
   </tr>
+</tbody>
+<thead>
   <tr>
-    <td colspan="4" align="center"><a href="./docs/source/quantization_weight_only.md">Weight-Only Quantization (INT8/INT4/FP4/NF4)</a></td>
-    <td colspan="2" align="center"><a href="https://github.com/intel/neural-compressor/blob/fp8_adaptor/docs/source/fp8.md">FP8 Quantization</a></td>
-    <td colspan="2" align="center"><a href="./docs/source/quantization_layer_wise.md">Layer-Wise Quantization</a></td>
+    <th colspan="8">Other Modules</th>
+  </tr>
+</thead>
+<tbody>
+  <tr>
+    <td colspan="4" align="center"><a href="./docs/3x/autotune.md">Auto Tune</a></td>
+    <td colspan="4" align="center"><a href="./docs/3x/benchmark.md">Benchmark</a></td>
   </tr>
 </tbody>
 </table>

-> **Note**:
-> Further documentation can be found in the [User Guide](https://github.com/intel/neural-compressor/blob/master/docs/source/user_guide.md).
+> **Note**:
+> From the 3.0 release, we recommend using the 3.X API. Training-time compression techniques such as QAT, Pruning, and Distillation are currently only available in the [2.X API](https://github.com/intel/neural-compressor/blob/master/docs/source/2x_user_guide.md).

 ## Selected Publications/Events
 * Blog by Intel: [Neural Compressor: Boosting AI Model Efficiency](https://community.intel.com/t5/Blogs/Tech-Innovation/Artificial-Intelligence-AI/Neural-Compressor-Boosting-AI-Model-Efficiency/post/1604740) (June 2024)
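
The framework extension (3.X) API recommended in the What's New and Note entries above looks roughly like the following for PyTorch. This is a minimal sketch assuming the `prepare`/`convert`/`RTNConfig` entry points documented in docs/3x/PT_WeightOnlyQuant.md; the toy model and the 4-bit settings are illustrative placeholders, not part of this commit.

```python
# Minimal sketch of the 3.X framework extension API flow (PyTorch).
# Entry points assumed from docs/3x/PT_WeightOnlyQuant.md; the toy model
# and the bits/group_size values are illustrative placeholders.
import torch
from neural_compressor.torch.quantization import RTNConfig, convert, prepare

# Toy float model; any module containing nn.Linear layers works for RTN.
float_model = torch.nn.Sequential(
    torch.nn.Linear(64, 64),
    torch.nn.ReLU(),
    torch.nn.Linear(64, 8),
)

quant_config = RTNConfig(bits=4, group_size=32)      # 4-bit round-to-nearest WOQ
prepared_model = prepare(float_model, quant_config)  # attach the quant config
q_model = convert(prepared_model)                    # produce the quantized model
```

The `neural-compressor[pt]` extra installed above pulls in the PyTorch dependencies this flow needs.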
File renamed without changes.

docs/3x/PT_WeightOnlyQuant.md (+6)

@@ -15,6 +15,7 @@ PyTorch Weight Only Quantization
 - [HQQ](#hqq)
 - [Specify Quantization Rules](#specify-quantization-rules)
 - [Saving and Loading](#saving-and-loading)
+- [Efficient Usage on Client-Side](#efficient-usage-on-client-side)
 - [Examples](#examples)

 ## Introduction

@@ -276,6 +277,11 @@ loaded_model = load(
 ) # Please note that the original_model parameter passes the original model.
 ```

+## Efficient Usage on Client-Side
+
+For client machines with limited RAM and cores, we offer optimizations to reduce computational overhead and minimize memory usage. For detailed information, please refer to [Quantization on Client](https://github.com/intel/neural-compressor/blob/master/docs/3x/client_quant.md).
+
+
 ## Examples

 Users can also refer to [examples](https://github.com/intel/neural-compressor/blob/master/examples/3.x_api/pytorch/nlp/huggingface_models/language-modeling/quantization/weight_only) on how to quantize a model with WeightOnlyQuant.
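
The Saving and Loading section that this hunk extends passes a checkpoint directory plus the original float model to `load()`. A minimal save/load sketch, continuing from the `float_model` and `q_model` of the README example earlier; the `q_model.save(...)` call, the `original_model` keyword (named in the hunk context above), and the directory name are assumptions to verify against that section (PyTorch.md's `load()` signature shows `model=None`, so the keyword may differ by version).

```python
# Hedged save/load sketch; "./saved_results" is an illustrative directory.
import copy

from neural_compressor.torch.quantization import load

q_model.save("./saved_results")  # persist quantized weights and config

# original_model passes the original (float) model, per the diff context above.
loaded_model = load("./saved_results", original_model=copy.deepcopy(float_model))
```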

docs/3x/PyTorch.md (+15)

@@ -194,6 +194,21 @@ def load(output_dir="./saved_results", model=None):
     <td class="tg-9wq8">&#10004</td>
     <td class="tg-9wq8"><a href="PT_DynamicQuant.md">link</a></td>
   </tr>
+  <tr>
+    <td class="tg-9wq8">MX Quantization</td>
+    <td class="tg-9wq8"><a href=https://arxiv.org/pdf/2310.10537>Microscaling Data Formats for
+Deep Learning</a></td>
+    <td class="tg-9wq8">PyTorch eager mode</td>
+    <td class="tg-9wq8">&#10004</td>
+    <td class="tg-9wq8"><a href="PT_MXQuant.md">link</a></td>
+  </tr>
+  <tr>
+    <td class="tg-9wq8">Mixed Precision</td>
+    <td class="tg-9wq8"><a href=https://arxiv.org/abs/1710.03740>Mixed precision</a></td>
+    <td class="tg-9wq8">PyTorch eager mode</td>
+    <td class="tg-9wq8">&#10004</td>
+    <td class="tg-9wq8"><a href="PT_MixPrecision.md">link</a></td>
+  </tr>
   <tr>
     <td class="tg-9wq8">Quantization Aware Training</td>
     <td class="tg-9wq8"><a href=https://pytorch.org/docs/master/quantization.html#quantization-aware-training-for-static-quantization>Quantization Aware Training</a></td>
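
The new table rows only name the techniques. As a hedged sketch of how MX quantization fits the same eager-mode `prepare`/`convert` flow: `MXQuantConfig` and its defaults are assumptions based on PT_MXQuant.md and should be checked against that page.

```python
# Hedged sketch of MX (microscaling) quantization in PyTorch eager mode.
# MXQuantConfig is assumed from PT_MXQuant.md; check that page for the
# supported w_dtype/act_dtype options before relying on this.
import torch
from neural_compressor.torch.quantization import MXQuantConfig, convert, prepare

model = torch.nn.Sequential(torch.nn.Linear(32, 32), torch.nn.ReLU())

quant_config = MXQuantConfig()        # default MX data formats (assumption)
model = prepare(model, quant_config)  # eager-mode preparation
model = convert(model)                # emulate MX weight/activation formats
```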
