Enhance 3.x torch WOQ load #1877
Conversation
⛈️ Required checks status: Has failure 🔴

Groups summary:
- 🟢 Code Scan Tests workflow (these checks are required after the changes)
- 🟢 Model Tests 3x workflow (these checks are required after the changes)
- 🔴 Unit Tests 3x-PyTorch workflow (these checks are required after the changes)

Thank you for your contribution! 💜
Adjustments after discussion:
- Use `skipif` for HPU logic and avoid exposing `WOQModelLoader`.
- Avoid using `AutoRoundWeightOnlyLinear` so that we can unpack and pack to `HPUWeightOnlyLinear`.
- Abstract `WeightOnlyLinear` class, with inherited classes `INCWeightOnlyLinear` and `HPUWeightOnlyLinear` (see the sketch after this list).
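A minimal sketch of that hierarchy, assuming the class names from this thread; the `pack`/`unpack` signatures are assumptions, not the PR's actual interface:

```python
from abc import ABC, abstractmethod

import torch


class WeightOnlyLinear(ABC, torch.nn.Module):
    """Abstract base for weight-only-quantized linear layers (sketch)."""

    @abstractmethod
    def pack(self, int_weight, scale, zp, bias=None):
        """Pack plain integer weights into this backend's storage layout."""

    @abstractmethod
    def unpack(self):
        """Recover plain integer weights from the packed storage."""


class INCWeightOnlyLinear(WeightOnlyLinear):
    """Default (CPU) packing format."""

    def pack(self, int_weight, scale, zp, bias=None): ...

    def unpack(self): ...


class HPUWeightOnlyLinear(WeightOnlyLinear):
    """Habana/HPU packing format."""

    def pack(self, int_weight, scale, zp, bias=None): ...

    def unpack(self): ...
```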
```python
qmodel_weight_file_path = os.path.join(
    os.path.abspath(os.path.expanduser(self.model_name_or_path)), WEIGHT_NAME
)
# if hpu format tensor can be used directly, then update qmodel_weight_file_path
# to the hpu format tensor file
if self._with_hpu_format_tensor():
```
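A hypothetical sketch of what that check might do; the hpu-format file name is invented for illustration and is not the PR's actual constant:

```python
import os


def _with_hpu_format_tensor(self) -> bool:
    # hypothetical: probe for a checkpoint that was saved already packed in
    # hpu format, next to the default weight file (file name is an assumption)
    hpu_weight_file = os.path.join(
        os.path.abspath(os.path.expanduser(self.model_name_or_path)),
        "quantized_hpu_weight.pt",  # invented name, for illustration only
    )
    return os.path.exists(hpu_weight_file)
```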
Suggest changing the format to hpu here, and later loading the correct layer according to the hpu format.
Good idea, we should use the format to indicate that the loaded model file is already in hpu format.
However, we think it is better to use the `habana` format, because `hpu` is a device name.
Looking forward to your feedback.
Sorry, I made a mistake. Thanks for correcting me, @yuwenzho.
The `format` arg indicates the load API argument format, not the packing format. We are using the `device` arg to decide which packing format should be used. If the packing format is already hpu format, we should have some flag in config.json or elsewhere before uploading the model to the huggingface hub.
```python
# load API format 1
load(model_name_or_path="saved_results", original_model=fp32_model, format="default")

# load API format 2
load(model_name_or_path="TheBloke/TinyLlama-1.1B-python-v0.1-GPTQ", format="huggingface")
```
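If such a flag were added, a minimal sketch might look like this; the `weight_format` key is invented for illustration, not an agreed field:

```python
import json

# hypothetical: mark the checkpoint as already packed in habana/hpu format
# before uploading it to the huggingface hub
with open("saved_results/config.json") as f:
    cfg = json.load(f)
cfg["weight_format"] = "habana"  # invented key name, illustration only
with open("saved_results/config.json", "w") as f:
    json.dump(cfg, f, indent=2)
```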
```python
device_dict = {"cpu": INCWeightOnlyLinear, "hpu": HPUWeightOnlyLinear}

# if hpu format tensor can be used directly, then update mapping module to
# HPUWeightOnlyLinear
if self._with_hpu_format_tensor():
```
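A minimal sketch of the selection logic this diff implies; the method name and the fallback choice are assumptions:

```python
def _select_linear_class(self, device: str):
    # map the target device to its packing implementation; fall back to the
    # CPU layout for unknown devices (fallback behavior is an assumption)
    linear_cls = device_dict.get(device, INCWeightOnlyLinear)
    # if the checkpoint already holds hpu-format tensors, load them straight
    # into HPUWeightOnlyLinear without repacking
    if self._with_hpu_format_tensor():
        linear_cls = HPUWeightOnlyLinear
    return linear_cls
```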
Suggest adding a new format for HPU instead, it will be clearer. Then you can move `format_dict` and `device_dict` to a global place.
`format_dict` and `device_dict` are moved to a global place in 8001a0a.
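A rough sketch of the module-level layout after that commit, reusing the classes from the sketch above; the `format_dict` handler names are placeholders, not the PR's real symbols:

```python
def _load_inc_woq_model(*args, **kwargs): ...  # placeholder handler
def _load_hf_woq_model(*args, **kwargs): ...   # placeholder handler

# module-scope lookup tables shared by the load paths
format_dict = {"default": _load_inc_woq_model, "huggingface": _load_hf_woq_model}
device_dict = {"cpu": INCWeightOnlyLinear, "hpu": HPUWeightOnlyLinear}
```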
LGTM
Yes, algorithm should use
Type of Change
feature
API changed or not: no
Description
Use a different `WeightOnlyLinear` module according to the device.
Load huggingface WOQ model example:
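A sketch reconstructed from the "load API format 2" example quoted earlier in this thread; the 3.x import path is an assumption:

```python
from neural_compressor.torch.quantization import load  # assumed 3.x import path

# load a WOQ checkpoint published on the huggingface hub
qmodel = load(
    model_name_or_path="TheBloke/TinyLlama-1.1B-python-v0.1-GPTQ",
    format="huggingface",
)
```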
Load INC WOQ model example:
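A sketch reconstructed from the "load API format 1" example quoted earlier; `fp32_model` stands for the original float model the checkpoint was quantized from:

```python
from neural_compressor.torch.quantization import load  # assumed 3.x import path

# load an INC-saved WOQ checkpoint from a local directory
qmodel = load(
    model_name_or_path="saved_results",
    original_model=fp32_model,
    format="default",
)
```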
How has this PR been tested?
CI
Dependency Change?
No