* align gptq check with transformers to support cpu
* fix comment
* gptqmodel
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
* compatible with auto-gptq
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
* fix compatibility with auto-gptq
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
* fix compatibility with auto-gptq linear
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
* revert unrelated changes
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
* gptqmodel needs to use checkpoint_format (#1)
* need checkpoint_format
* default value of checkpoint_format is gptq
* fix quantize
* fix quantize
* fix quantize
* Update quantizer.py
* need to convert to v1 before gptqmodel save
* set checkpoint_format back to gptq after conversion
* cleanup code
* sym=False is not supported with auto-gptq
* add comments
* cleanup code
* Update quantizer.py
* always convert v2 to v1 if checkpoint_format = "gptq"
* Update quantizer.py
---------
Co-authored-by: ZX-ModelCloud <zx@modelcloud.ai>
Co-authored-by: Qubitium-ModelCloud <qubitium@modelcloud.ai>
* Mod backend code (#2)
* keep gptq_v2 if sym is false
* use hf_convert_gptq_v1_to_v2_format, hf_convert_gptq_v2_to_v1_format, and hf_gptqmodel_post_init
* no need to check backend
* use device_map
* cleanup
* Update quantizer.py
* move import
---------
Co-authored-by: Qubitium-ModelCloud <qubitium@modelcloud.ai>
* fix format and log
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
* fix version check
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
* enable gptqmodel tests
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
* update check quant type
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
* Fix optimum compat (#3)
* add meta info
* cleanup
* cleanup
* The value of quantizer should be an array
* Update quantizer.py
* If is_auto_gptq_available(), also write "auto_gptq:version" to "quantizer"
* If is_auto_gptq_available(), also write "auto_gptq:version" to "quantizer"
* Update quantizer.py
* cleanup
* comment on meta
* hf_select_quant_linear pass checkpoint_format
* add todo fix
* move convert code to quantizer.save()
* Update quantizer.py
* Optimize hf_convert_gptq_v2_to_v1_format()
* Optimize hf_convert_gptq_v1_to_v2_format()
* fix GPTQTestCUDA
* hf_select_quant_linear() always sets pack=True
* gptqmodel.hf_select_quant_linear() now does not select ExllamaV2
* gptqmodel.hf_select_quant_linear() now does not select ExllamaV2
* GPTQQuantizer add backend
* lower checkpoint_format and backend
* cleanup
* move backend to bottom
* no need to check gptqmodel version for ipex support
* Update import_utils.py
* Update quantizer.py
* fix UnboundLocalError: cannot access local variable 'version' where it is not associated with a value
* make version var short
* Update import_utils.py
* fix unittest
* use assertLessEqual
---------
Co-authored-by: Qubitium-ModelCloud <qubitium@modelcloud.ai>
Co-authored-by: LRL <lrl@lbx.dev>
* fix format and convert v2 to v1
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
* [Fix] all tensors not same device (#5)
* fix device error
* update gptqmodel version
* fix test
* fix format
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
* add gptqmodel tests which include cpu
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
* fix all auto-gptq tests
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
* revert tests
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
* rm gptqmodel yaml
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
* fix comment
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
* enable real CPU tests with fp32
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
* fix test model name
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
* keep the original device setting when using auto-gptq
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
* Update optimum/gptq/quantizer.py
Co-authored-by: Ilyas Moutawwakil <57442720+IlyasMoutawwakil@users.noreply.github.com>
* Update optimum/gptq/quantizer.py
Co-authored-by: Ilyas Moutawwakil <57442720+IlyasMoutawwakil@users.noreply.github.com>
---------
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
Co-authored-by: LRL-ModelCloud <165116337+LRL-ModelCloud@users.noreply.github.com>
Co-authored-by: ZX-ModelCloud <zx@modelcloud.ai>
Co-authored-by: Qubitium-ModelCloud <qubitium@modelcloud.ai>
Co-authored-by: ZX-ModelCloud <165115237+ZX-ModelCloud@users.noreply.github.com>
Co-authored-by: LRL <lrl@lbx.dev>
Co-authored-by: Ilyas Moutawwakil <57442720+IlyasMoutawwakil@users.noreply.github.com>
        List of list of module names to quantize in the specified block. This argument is useful to exclude certain linear modules from being quantized.
        The block to quantize can be specified by setting `block_name_to_quantize`. We will quantize each list sequentially.
        If not set, we will quantize all linear layers. Example: `inside_layer_modules=[["self_attention.query_key_value"], ["mlp.dense_h_to_4h"]]`
    checkpoint_format (`str`, *optional*, defaults to `gptq`):
        GPTQ weight format. `gptq` (v1) is supported by both gptqmodel and auto-gptq. `gptq_v2` is supported by gptqmodel only.
    meta (`Dict[str, any]`, *optional*):
        Properties, such as tooling:version, that do not directly contribute to quantization or quantized inference are stored in meta.
        e.g. `meta.quantizer`: ["optimum:_version_", "gptqmodel:_version_"]
    backend (`str`, *optional*):
        Controls which gptq kernel is used. Valid values for gptqmodel are `auto`, `auto_trainable` and more. For auto-gptq, the only valid values are None and `auto_trainable`. See the gptqmodel backends: https://github.com/ModelCloud/GPTQModel/blob/main/gptqmodel/utils/backend.py
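Taken together, these options map onto the quantizer configuration. A minimal sketch, assuming the new `checkpoint_format`, `meta`, and `backend` arguments are accepted by `optimum.gptq.GPTQQuantizer` exactly as documented above; the version strings in `meta` are illustrative placeholders:

```python
# Sketch only: checkpoint_format, backend, and meta are the new arguments
# described in the docstring above; other arguments are standard GPTQQuantizer options.
from optimum.gptq import GPTQQuantizer

quantizer = GPTQQuantizer(
    bits=4,
    dataset="c4",
    checkpoint_format="gptq",  # v1 format, readable by both gptqmodel and auto-gptq
    backend="auto",            # gptqmodel kernel selection; use None (or "auto_trainable") with auto-gptq
    meta={"quantizer": ["optimum:<version>", "gptqmodel:<version>"]},  # placeholder versions
)
```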
"gptqmodel or auto-gptq is required in order to perform gptq quantzation: `pip install gptqmodel` or `pip install auto-gptq`. Please notice that auto-gptq will be deprecated in the future."
"gptqmodel (`pip install gptqmodel`) or auto-gptq (`pip install auto-gptq`) is required in order to load quantized weights. Please notice that auto-gptq will be deprecated in the future."
        )
    if not is_accelerate_available():
        raise RuntimeError(
            "You need to install accelerate in order to load and dispatch weights to"