Enhance 3.x torch WOQ load #1877
Conversation
⛈️ Required checks status: Has failure 🔴

Groups summary:
- 🟢 Code Scan Tests workflow (these checks are required after the changes)
- 🟢 Model Tests 3x workflow (these checks are required after the changes)
- 🔴 Unit Tests 3x-PyTorch workflow (these checks are required after the changes)

Thank you for your contribution! 💜
Adjustments after discussion:
- Use `skipif` for HPU logic and avoid exposing `WOQModelLoader`.
- Avoid using `AutoRoundWeightOnlyLinear` so that we can unpack and pack to `HPUWeightOnlyLinear`.
- Abstract `WeightOnlyLinear` class, with inherited classes `INCWeightOnlyLinear` and `HPUWeightOnlyLinear` (see the sketch after this list).
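A minimal sketch of that hierarchy, assuming the class names from this thread; the `pack`/`unpack` signatures are assumptions, not the PR's actual interface:

```python
from abc import ABC, abstractmethod

import torch


class WeightOnlyLinear(ABC, torch.nn.Module):
    """Abstract base for weight-only-quantized linear layers (sketch)."""

    @abstractmethod
    def pack(self, int_weight, scale, zp, bias=None):
        """Pack plain integer weights into this backend's storage layout."""

    @abstractmethod
    def unpack(self):
        """Recover plain integer weights from the packed storage."""


class INCWeightOnlyLinear(WeightOnlyLinear):
    """Default (CPU) packing format."""

    def pack(self, int_weight, scale, zp, bias=None): ...

    def unpack(self): ...


class HPUWeightOnlyLinear(WeightOnlyLinear):
    """Habana/HPU packing format."""

    def pack(self, int_weight, scale, zp, bias=None): ...

    def unpack(self): ...
```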
```python
qmodel_weight_file_path = os.path.join(
    os.path.abspath(os.path.expanduser(self.model_name_or_path)), WEIGHT_NAME
)
# if hpu format tensor can be used directly, then update qmodel_weight_file_path
# to the hpu format tensor file
if self._with_hpu_format_tensor():
```
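A hypothetical sketch of what that check might do; the hpu-format file name is invented for illustration and is not the PR's actual constant:

```python
import os


def _with_hpu_format_tensor(self) -> bool:
    # hypothetical: probe for a checkpoint that was saved already packed in
    # hpu format, next to the default weight file (file name is an assumption)
    hpu_weight_file = os.path.join(
        os.path.abspath(os.path.expanduser(self.model_name_or_path)),
        "quantized_hpu_weight.pt",  # invented name, for illustration only
    )
    return os.path.exists(hpu_weight_file)
```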
Suggest changing the format to hpu here, and later loading the correct layer according to the hpu format.
Good idea, we should use the format to indicate that the loaded model file is already in hpu format.
However, we think it is better to use the `habana` format, because `hpu` is a device name.
Looking forward to your feedback.
Sorry, I made a mistake. Thanks for correcting me, @yuwenzho.
The `format` arg indicates the load API argument format, not the packing format. We are using the `device` arg to decide which packing format should be used. If the packing format is already hpu format, we should have some flag in config.json or elsewhere before uploading the model to the huggingface hub.
```python
# load API format 1
load(model_name_or_path="saved_results", original_model=fp32_model, format="default")

# load API format 2
load(model_name_or_path="TheBloke/TinyLlama-1.1B-python-v0.1-GPTQ", format="huggingface")
```
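If such a flag were added, a minimal sketch might look like this; the `weight_format` key is invented for illustration, not an agreed field:

```python
import json

# hypothetical: mark the checkpoint as already packed in habana/hpu format
# before uploading it to the huggingface hub
with open("saved_results/config.json") as f:
    cfg = json.load(f)
cfg["weight_format"] = "habana"  # invented key name, illustration only
with open("saved_results/config.json", "w") as f:
    json.dump(cfg, f, indent=2)
```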
```python
device_dict = {"cpu": INCWeightOnlyLinear, "hpu": HPUWeightOnlyLinear}

# if hpu format tensor can be used directly, then update mapping module to
# HPUWeightOnlyLinear
if self._with_hpu_format_tensor():
```
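A minimal sketch of the selection logic this diff implies; the method name and the fallback choice are assumptions:

```python
def _select_linear_class(self, device: str):
    # map the target device to its packing implementation; fall back to the
    # CPU layout for unknown devices (fallback behavior is an assumption)
    linear_cls = device_dict.get(device, INCWeightOnlyLinear)
    # if the checkpoint already holds hpu-format tensors, load them straight
    # into HPUWeightOnlyLinear without repacking
    if self._with_hpu_format_tensor():
        linear_cls = HPUWeightOnlyLinear
    return linear_cls
```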
Suggest adding a new format for HPU instead, it will be clearer. Then you can move `format_dict` and `device_dict` to a global place.
`format_dict` and `device_dict` are moved to a global place in 8001a0a.
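A rough sketch of the module-level layout after that commit, reusing the classes from the sketch above; the `format_dict` handler names are placeholders, not the PR's real symbols:

```python
def _load_inc_woq_model(*args, **kwargs): ...  # placeholder handler
def _load_hf_woq_model(*args, **kwargs): ...   # placeholder handler

# module-scope lookup tables shared by the load paths
format_dict = {"default": _load_inc_woq_model, "huggingface": _load_hf_woq_model}
device_dict = {"cpu": INCWeightOnlyLinear, "hpu": HPUWeightOnlyLinear}
```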
LGTM
Yes, algorithm should use
Type of Change
feature
API changed or not: no
Description
Use a different `WeightOnlyLinear` module according to the device.
Load huggingface WOQ model example:
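A sketch reconstructed from the "load API format 2" example quoted earlier in this thread; the 3.x import path is an assumption:

```python
from neural_compressor.torch.quantization import load  # assumed 3.x import path

# load a WOQ checkpoint published on the huggingface hub
qmodel = load(
    model_name_or_path="TheBloke/TinyLlama-1.1B-python-v0.1-GPTQ",
    format="huggingface",
)
```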
Load INC WOQ model example:
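A sketch reconstructed from the "load API format 1" example quoted earlier; `fp32_model` stands for the original float model the checkpoint was quantized from:

```python
from neural_compressor.torch.quantization import load  # assumed 3.x import path

# load an INC-saved WOQ checkpoint from a local directory
qmodel = load(
    model_name_or_path="saved_results",
    original_model=fp32_model,
    format="default",
)
```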
How has this PR been tested?
CI
Dependency Change?
No