
Commit be9adc2

nirda7, violetch24, xin3he, zehao-intel, and Kaihui-intel authored
Merge public Intel Neural Compressor to fork (#93)
* modify 3.x ipex example structure (#1858)
* modify 3.x ipex example structure Signed-off-by: Cheng, Zixuan <zixuan.cheng@intel.com>
* add json path Signed-off-by: Cheng, Zixuan <zixuan.cheng@intel.com>
* fix for sq Signed-off-by: Cheng, Zixuan <zixuan.cheng@intel.com>
* minor fix Signed-off-by: Cheng, Zixuan <zixuan.cheng@intel.com>
* Update run_clm_no_trainer.py
* Update run_clm_no_trainer.py
* Update run_clm_no_trainer.py
* minor fix Signed-off-by: Cheng, Zixuan <zixuan.cheng@intel.com>
* remove old files Signed-off-by: Cheng, Zixuan <zixuan.cheng@intel.com>
* fix act_algo Signed-off-by: Cheng, Zixuan <zixuan.cheng@intel.com>
--------- Signed-off-by: Cheng, Zixuan <zixuan.cheng@intel.com> Co-authored-by: xinhe <xin3.he@intel.com>
* Improve UT Branch Coverage for TF 3x (#1867) Signed-off-by: zehao-intel <zehao.huang@intel.com>
* [3x] add recommendation examples (#1844) Signed-off-by: xin3he <xin3.he@intel.com>
* Add PT2E cv&llm example (#1853) Signed-off-by: Kaihui-intel <kaihui.tang@intel.com>
* Update SQ/WOQ status (#1869) Signed-off-by: Sun, Xuehao <xuehao.sun@intel.com> Co-authored-by: chen, suyue <suyue.chen@intel.com>
* Modify WOQ examples structure (#1866) Signed-off-by: Kaihui-intel <kaihui.tang@intel.com> Signed-off-by: chensuyue <suyue.chen@intel.com>
* update v2.6 release readme (#1871) Signed-off-by: chensuyue <suyue.chen@intel.com>
* Limit numpy versions (#1874) Signed-off-by: Sun, Xuehao <xuehao.sun@intel.com>
* fix layer match (#1873) Signed-off-by: Kaihui-intel <kaihui.tang@intel.com> Co-authored-by: Sun, Xuehao <xuehao.sun@intel.com>
* Enhance autotune to return the best `q_model` directly (#1875) Signed-off-by: yiliu30 <yi4.liu@intel.com>
* Add op statistics dump for woq (#1876) Signed-off-by: Kaihui-intel <kaihui.tang@intel.com>
* rm cov (#1878) Signed-off-by: yiliu30 <yi4.liu@intel.com>
* Add `set_local` support for static quant with pt2e (#1870) Signed-off-by: yiliu30 <yi4.liu@intel.com>
* Update the Gaudi container example in the README (#1885)
* support quant_lm_head arg in all WOQ configs (#1881) Signed-off-by: xin3he <xin3.he@intel.com>
* Fix sql injection for Neural Solution gRPC (#1879) Signed-off-by: Kaihui-intel <kaihui.tang@intel.com>
* Remove Gelu Fusion for TF Newapi (#1886) Signed-off-by: zehao-intel <zehao.huang@intel.com>
* Refine HQQ UTs (#1888) Signed-off-by: yiliu30 <yi4.liu@intel.com>
* tmp fix nas deps issue (#1896) Signed-off-by: chensuyue <suyue.chen@intel.com>
* support auto_host2device on RTN and GPTQ (#1894) Signed-off-by: He, Xin3 <xin3.he@intel.com>
* remove import pdb (#1897) Signed-off-by: changwangss <chang1.wang@intel.com>
* Port auto-detect absorb layers for TEQ (#1895) Signed-off-by: yiliu30 <yi4.liu@intel.com>
* Remove 1x API (#1865) Signed-off-by: yiliu30 <yi4.liu@intel.com> Co-authored-by: chen, suyue <suyue.chen@intel.com>
* remove neural insight CI (#1903) Signed-off-by: Sun, Xuehao <xuehao.sun@intel.com>
* fix bf16 symbolic_trace bug (#1892) Description: the bf16 symbolic_trace bug caused abnormal recursive calling and missing necessary attributes; by moving the BF16 fallback ahead of quantization and removing bf16_symbolic_trace, we fix it.
--------- Signed-off-by: xin3he <xin3.he@intel.com> Co-authored-by: Sun, Xuehao <xuehao.sun@intel.com>
* update fp4_e2m1 mapping list (#1906)
* update fp4_e2m1 mapping list
* Update utility.py
* [pre-commit.ci] auto fixes from pre-commit.com hooks; for more information, see https://pre-commit.ci
--------- Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
* Add docstring for `common` module (#1905) Signed-off-by: yiliu30 <yi4.liu@intel.com>
* support habana fp8 UT test in CI (#1909) Signed-off-by: chensuyue <suyue.chen@intel.com>
* bump version into 3.0 (#1908) Signed-off-by: chensuyue <suyue.chen@intel.com>
* implement `incbench` command for ease-of-use benchmark (#1884)
  # Description
  Implement the incbench command as an entry point for ease-of-use benchmarking: automatically check NUMA/socket info and dump it as a table for easy understanding; support both Linux and Windows platforms; add benchmark documents; dump a benchmark summary; add benchmark UTs.
  # General Use Cases
  incbench main.py: run 1 instance on NUMA:0.
  incbench --num_i 2 main.py: run 2 instances on NUMA:0.
  incbench --num_c 2 main.py: run multi-instances with 2 cores per instance on NUMA:0.
  incbench -C 24-47 main.py: run 1 instance on COREs:24-47.
  incbench -C 24-47 --num_c 4 main.py: run multi-instances with 4 COREs per instance on COREs:24-47.
--------- Signed-off-by: xin3he <xin3.he@intel.com> Co-authored-by: chen, suyue <suyue.chen@intel.com>
* Get default config based on the auto-detect CPU type (#1904) Signed-off-by: yiliu30 <yi4.liu@intel.com>
* Add export support for TEQ (#1910) Signed-off-by: yiliu30 <yi4.liu@intel.com>
* update Gaudi CI baseline artifacts name (#1912) Signed-off-by: chensuyue <suyue.chen@intel.com>
* Remove deprecated modules (#1872) Signed-off-by: chensuyue <suyue.chen@intel.com>
* fix CI docker container clean up issue (#1917) Signed-off-by: chensuyue <suyue.chen@intel.com>
* remove 1x docs (#1900) Signed-off-by: yiliu30 <yi4.liu@intel.com>
* Add `save`/`load` support for HQQ (#1913) Signed-off-by: yiliu30 <yi4.liu@intel.com> Co-authored-by: chen, suyue <suyue.chen@intel.com>
* Support PT2E save and load (#1918) Signed-off-by: Kaihui-intel <kaihui.tang@intel.com>
* implement TorchBaseConfig (#1911) Signed-off-by: xin3he <xin3.he@intel.com>
* update documentation for 3x API (#1923) Signed-off-by: chensuyue <suyue.chen@intel.com> Signed-off-by: xin3he <xin3.he@intel.com> Signed-off-by: yiliu30 <yi4.liu@intel.com>
* fix typo in architecture diagram (#1924) Signed-off-by: Huang, Tai <tai.huang@intel.com>
* Support woq Autotune (#1921) Signed-off-by: Kaihui-intel <kaihui.tang@intel.com>
* Support absorb dict for awq (#1920) Signed-off-by: Kaihui-intel <kaihui.tang@intel.com>
* Support LayerWise for RTN/GPTQ (#1883) Signed-off-by: Kaihui-intel <kaihui.tang@intel.com> Co-authored-by: chensuyue <suyue.chen@intel.com>
* update itrex ut test (#1929) Signed-off-by: chensuyue <suyue.chen@intel.com>
* add docstring for torch.quantization and torch.utils (#1928) Signed-off-by: xin3he <xin3.he@intel.com>
* Integrate AutoRound v0.3 (#1925) Signed-off-by: Kaihui-intel <kaihui.tang@intel.com>
* Integrate AutoRound v0.3 to 2x (#1926) Signed-off-by: Kaihui-intel <kaihui.tang@intel.com>
* Enhance load_empty_model import (#1930) Signed-off-by: Kaihui-intel <kaihui.tang@intel.com>
* Add doc for client usage (#1914) Signed-off-by: yiliu30 <yi4.liu@intel.com>
* remove peft version limit (#1933) Signed-off-by: chensuyue <suyue.chen@intel.com>
* Support xpu for ipex static quant (#1916) Signed-off-by: violetch24 <zixuan@aia-sdp-spr-117706.jf.intel.com>
* Support calib_func on TF 3x API (#1934) Signed-off-by: zehao-intel <zehao.huang@intel.com>
* 3.X API installation update (#1935) Signed-off-by: chensuyue <suyue.chen@intel.com>
* Fix unused pkgs import (#1931) Signed-off-by: Kaihui-intel <kaihui.tang@intel.com>
* Add docstring for PT2E and HQQ (#1937) Signed-off-by: yiliu30 <yi4.liu@intel.com>
* add docstring for static quant and smooth quant (#1936)
* add docstring for static quant and smooth quant Signed-off-by: violetch24 <zixuan@aia-sdp-spr-117706.jf.intel.com>
* format fix Signed-off-by: violetch24 <zixuan@aia-sdp-spr-117706.jf.intel.com>
* update scan path Signed-off-by: violetch24 <zixuan@aia-sdp-spr-117706.jf.intel.com>
* Update utility.py
--------- Signed-off-by: violetch24 <zixuan@aia-sdp-spr-117706.jf.intel.com> Co-authored-by: violetch24 <zixuan@aia-sdp-spr-117706.jf.intel.com>
* Update Example for Pytorch 3x Mixed Precision (#1882) Signed-off-by: zehao-intel <zehao.huang@intel.com>
* add read permission token (#1942) Signed-off-by: Huang, Tai <tai.huang@intel.com>
* Add docstring for WOQ&LayerWise (#1938) Signed-off-by: Kaihui-intel <kaihui.tang@intel.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: xinhe <xin3.he@intel.com>
* add docstring for mx quant (#1932) Signed-off-by: Mengni Wang <mengni.wang@intel.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: xinhe <xin3.he@intel.com>
* Update for API 3.0 online doc (#1940) Co-authored-by: ZhangJianyu <zhang.jianyu@outlook.com>
* Refine Pytorch 3x Mixed Precision Example (#1946) Signed-off-by: zehao-intel <zehao.huang@intel.com>
* Update AutoRound commit version (#1941) Signed-off-by: Kaihui-intel <kaihui.tang@intel.com>
* Update publish.yml (#1949)
* Update publish.yml
* Update publish.yml
* Update publish.yml (#1950)
* Update doc for client-usage and LWQ (#1947) Signed-off-by: yiliu30 <yi4.liu@intel.com>
* Add Docstring for TF 3x API and Torch 3x Mixed Precision (#1944) Signed-off-by: zehao-intel <zehao.huang@intel.com>
* Update Examples for TF 3x API (#1901) Signed-off-by: zehao-intel <zehao.huang@intel.com>
* Complement UT of calibration function for TF 3x API (#1945) Signed-off-by: zehao-intel <zehao.huang@intel.com>
* Enable yolov5 Example for TF 3x API (#1943) Signed-off-by: zehao-intel <zehao.huang@intel.com>
* add ipex xpu example to 3x API (#1948) Signed-off-by: violetch24 <zixuan@aia-sdp-spr-117706.jf.intel.com>
* update 3x torch installation (#1957) Signed-off-by: chensuyue <suyue.chen@intel.com>
* Add save/load for pt2e example (#1927) Signed-off-by: Kaihui-intel <kaihui.tang@intel.com>
* Fix itrex qbits nf4/int8 training core dumped issue (#1954) Signed-off-by: Kaihui-intel <kaihui.tang@intel.com> Signed-off-by: chensuyue <suyue.chen@intel.com>
* new previous results could not find all raise issues in CI model test (#1958) Signed-off-by: chensuyue <suyue.chen@intel.com>
* Set low_gpu_mem_usage=False for AutoRound Signed-off-by: Kaihui-intel <kaihui.tang@intel.com>
* Bump tensorflow version (#1961) Signed-off-by: dependabot[bot] <support@github.com>
* fix docs link (#1959) Signed-off-by: chensuyue <suyue.chen@intel.com>
* fix welcome.html link issue (#1962) Co-authored-by: ZhangJianyu <zhang.jianyu@outlook.com>
* replenish docstring (#1955)
* replenish docstring Signed-off-by: xin3he <xin3.he@intel.com>
* update Quantizer API docstring Signed-off-by: xin3he <xin3.he@intel.com>
* Add docstring for auto accelerator (#1956) Signed-off-by: yiliu30 <yi4.liu@intel.com>
* temporarily remove torch/quantization and add it back after fp8 code is updated
* Update config.py
--------- Signed-off-by: xin3he <xin3.he@intel.com> Signed-off-by: yiliu30 <yi4.liu@intel.com> Co-authored-by: Yi Liu <106061964+yiliu30@users.noreply.github.com>
* add SDXL model example to INC 3.x (#1887)
* add SDXL model example to INC 3.x Signed-off-by: Cheng, Zixuan <zixuan.cheng@intel.com>
* add evaluation script Signed-off-by: violetch24 <zixuan@aia-sdp-spr-117706.jf.intel.com>
* add test script Signed-off-by: violetch24 <zixuan@aia-sdp-spr-117706.jf.intel.com>
* minor fix Signed-off-by: violetch24 <zixuan@aia-sdp-spr-117706.jf.intel.com>
* Update run_quant.sh
* add iter limit Signed-off-by: violetch24 <zixuan@aia-sdp-spr-117706.jf.intel.com>
* modify test script Signed-off-by: violetch24 <zixuan@aia-sdp-spr-117706.jf.intel.com>
* update json Signed-off-by: chensuyue <suyue.chen@intel.com>
* add requirements Signed-off-by: violetch24 <zixuan@aia-sdp-spr-117706.jf.intel.com>
* Update run_benchmark.sh
* Update sdxl_smooth_quant.py
* minor fix Signed-off-by: violetch24 <zixuan@aia-sdp-spr-117706.jf.intel.com>
--------- Signed-off-by: Cheng, Zixuan <zixuan.cheng@intel.com> Signed-off-by: violetch24 <zixuan@aia-sdp-spr-117706.jf.intel.com> Signed-off-by: chensuyue <suyue.chen@intel.com> Co-authored-by: violetch24 <zixuan@aia-sdp-spr-117706.jf.intel.com> Co-authored-by: chensuyue <suyue.chen@intel.com>
* example update for 3.x ipex sq (#1902) Signed-off-by: violetch24 <zixuan@aia-sdp-spr-117706.jf.intel.com>
* Fix `opt_125m_woq_gptq_int4_dq_ggml` issue (#1965) Signed-off-by: Kaihui-intel <kaihui.tang@intel.com>
* remove unnecessary CI (#1966) Signed-off-by: Sun, Xuehao <xuehao.sun@intel.com>
* Add version mapping between INC and Gaudi SW Stack (#1967) Signed-off-by: Huang, Tai <tai.huang@intel.com>
* Add 3.x readme (#1971) Signed-off-by: Sun, Xuehao <xuehao.sun@intel.com>
* Fix broken link in docs (#1969) Signed-off-by: Huang, Tai <tai.huang@intel.com>
* Cherry pick v1.17.0 (#1964)
* [SW-184941] INC CI, CD and Promotion Change-Id: I60c420f9776e1bdab7bb9e02e5bcbdb6891bfe52
* [SW-183320] updated setup.py Change-Id: I592af89486cb1d9e0b5197521c428920197a9103
* [SW-177474] add HQT FP8 porting code Change-Id: I4676f13a5ed43c444f2ec68675cc41335e7234dd Signed-off-by: Zhou Yuwen <zyuwen@habana.ai>
* [SW-189361] Fix white list extend Change-Id: Ic2021c248798fce37710d28014a6d59259c868a3
* [SW-191317] Raise exception according to hqt config object Change-Id: I06ba8fa912c811c88912987c11e5c12ef328348a
* [SW-184714] Port HQT code into INC; HQT lib content was copied as-is under fp8_quant, and tests were copied to the 3.x torch location. Change-Id: Iec6e1fa7ac4bf1df1c95b429524c40e32bc13ac9
* [SW-184714] Add internal folder to fp8 quant; this is a folder used for experiments, not to be used by users. Change-Id: I9e221ae582794e304e95392c0f37638f7bce69bc
* [SW-177468] Removed unused code + cleanup Change-Id: I4d27c067e87c1a30eb1da9df16a16c46d092c638
* Fix errors in regression_detection Change-Id: Iee5318bd5593ba349812516eb5641958ece3c438
* [SW-187731] Save orig module as member of patched module; this allows direct usage of the original module methods, which solves a torch compile issue. Change-Id: I464d8bd1bacdfc3cd1f128a67114e1e43f092632
* [SW-190899] Install packages according to configuration Change-Id: I570b490658f5d2c5399ba1db93f8f52f56449525
* [SW-184689] use finalize_calibration internally for one-step flow Change-Id: Ie0b8b426c951cf57ed7e6e678c86813fb2d05c89
* [SW-191945] align requirement_pt.txt in gerrit INC with Github INC Change-Id: If5c0dbf21bf989af37a8e29246e4f8760cd215ef Signed-off-by: xinhe3 <xinhe3@hababa.ai>
* [SW-192358] Remove HQT reference in INC Change-Id: Ic25f9323486596fa2dc6d909cd568a37ab84dd5e
* [SW-191415] update fp8 maxAbs observer using torch.copy_ Change-Id: I3923c832f9a8a2b14e392f3f4719d233a457702f
* [SW-184943] Enhance INC WOQ model loading:
  - Support loading huggingface WOQ models
  - Abstract a WeightOnlyLinear base class; add INCWeightOnlyLinear and HPUWeighOnlyLinear subclasses
  - Load woq linear weights module by module
  - Save hpu-format tensors to reuse them once loaded again
  Change-Id: I679a42759b49e1f45f52bbb0bdae8580a23d0bcf
* [SW-190303] Implement HPUWeightOnlyLinear class in INC Change-Id: Ie05c8787e708e2c3559dce24ef0758d6c498ac41
* [SW-192809] fix json_file bug when instantiating FP8Config class Change-Id: I4a715d0a706efe20ccdb49033755cabbc729ccdc Signed-off-by: Zhou Yuwen <zyuwen@habana.ai>
* [SW-192931] align setup.py with github INC and remove fp8_convert Change-Id: Ibbc157646cfcfad64b323ecfd96b9bbda5ba9e2f Signed-off-by: xinhe3 <xinhe3@hababa.ai>
* [SW-192917] Update all HQT logic files with pre-commit check Change-Id: I119dc8578cb10932fd1a8a674a8bdbf61f978e42 Signed-off-by: xinhe3 <xinhe3@hababa.ai>
* update docstring Signed-off-by: yuwenzho <yuwen.zhou@intel.com>
* add fp8 example and document (#1639) Signed-off-by: xinhe3 <xinhe3@hababa.ai>
* Update settings to be compatible with gerrit
* enhance ut Signed-off-by: yuwenzho <yuwen.zhou@intel.com>
* move fp8 sample to helloworld folder Signed-off-by: yuwenzho <yuwen.zhou@intel.com>
* update torch version of habana docker Signed-off-by: xinhe3 <xinhe3@hababa.ai>
* [pre-commit.ci] auto fixes from pre-commit.com hooks; for more information, see https://pre-commit.ci
* update readme demo Signed-off-by: xinhe3 <xinhe3@hababa.ai>
* update WeightOnlyLinear to INCWeightOnlyLinear Signed-off-by: xinhe3 <xinhe3@hababa.ai>
* [pre-commit.ci] auto fixes from pre-commit.com hooks; for more information, see https://pre-commit.ci
* add docstring for FP8Config Signed-off-by: xinhe3 <xinhe3@hababa.ai>
* fix pylint Signed-off-by: xinhe3 <xinhe3@hababa.ai>
* update fp8 test scripts Signed-off-by: chensuyue <suyue.chen@intel.com>
* delete deps Signed-off-by: chensuyue <suyue.chen@intel.com>
* update container into v1.17.0 Signed-off-by: chensuyue <suyue.chen@intel.com>
* update docker version Signed-off-by: xinhe3 <xinhe3@hababa.ai>
* update pt ut Signed-off-by: chensuyue <suyue.chen@intel.com>
* add lib path Signed-off-by: chensuyue <suyue.chen@intel.com>
* fix dir issue Signed-off-by: xinhe3 <xinhe3@hababa.ai>
* [pre-commit.ci] auto fixes from pre-commit.com hooks; for more information, see https://pre-commit.ci
* update fp8 test scope Signed-off-by: chensuyue <suyue.chen@intel.com>
* fix typo Signed-off-by: xinhe3 <xinhe3@hababa.ai>
* update fp8 test scope Signed-off-by: chensuyue <suyue.chen@intel.com>
* update pre-commit-ci Signed-off-by: chensuyue <suyue.chen@intel.com>
* work around for hpu Signed-off-by: xinhe3 <xinhe3@hababa.ai>
* fix UT Signed-off-by: xinhe3 <xinhe3@hababa.ai>
* fix parameter Signed-off-by: chensuyue <suyue.chen@intel.com>
* omit some test Signed-off-by: chensuyue <suyue.chen@intel.com>
* update main page example to llm loading Signed-off-by: xinhe3 <xinhe3@hababa.ai>
* [pre-commit.ci] auto fixes from pre-commit.com hooks; for more information, see https://pre-commit.ci
* fix autotune Signed-off-by: xinhe3 <xinhe3@hababa.ai>
--------- Signed-off-by: Zhou Yuwen <zyuwen@habana.ai> Signed-off-by: xinhe3 <xinhe3@hababa.ai> Signed-off-by: yuwenzho <yuwen.zhou@intel.com> Signed-off-by: chensuyue <suyue.chen@intel.com> Co-authored-by: yan tomsinsky <ytomsinsky@habana.ai> Co-authored-by: Ron Ben Moshe <rbenmoshe@habana.ai> Co-authored-by: Uri Livne <ulivne@habana.ai> Co-authored-by: Danny Semiat <dsemiat@habana.ai> Co-authored-by: smarkovichgolan <smarkovich@habana.ai> Co-authored-by: Dudi Lester <dlester@habana.ai>
* update main page (#1973) Signed-off-by: chensuyue <suyue.chen@intel.com>
* fix online doc search issue (#1975) Co-authored-by: ZhangJianyu <zhang.jianyu@outlook.com>
* bump main version into v3.1 (#1974) Signed-off-by: chensuyue <suyue.chen@intel.com>
* update readme for fp8 (#1979) Signed-off-by: xinhe3 <xinhe3@habana.ai>
* Skip some tests for torch 2.4 (#1981) Signed-off-by: yiliu30 <yi4.liu@intel.com>
* Fix UT env and upgrade torch to 2.4.0 (#1978) Signed-off-by: Sun, Xuehao <xuehao.sun@intel.com>
* support gptq `true_sequential` and `quant_lm_head` (#1977) Signed-off-by: Kaihui-intel <kaihui.tang@intel.com>
* update installation and ci test for 3x api (#1991) Signed-off-by: chensuyue <suyue.chen@intel.com>
* add hasattr check for torch fp8 dtype (#1985) Signed-off-by: xin3he <xin3.he@intel.com>
* add quantize, save, load function for transformers-like api (#1986) Signed-off-by: changwangss <chang1.wang@intel.com>
* Update installation_guide.md (#1989) Correct typo in installation doc
* update 3x pt binary build (#1992) Signed-off-by: chensuyue <suyue.chen@intel.com>
* add per_channel_minmax (#1990) Signed-off-by: yiliu30 <yi4.liu@intel.com>
* Remove the save of gptq config (#1993) Signed-off-by: Kaihui-intel <kaihui.tang@intel.com>
* Add recent publications (#1995)
* add recent publications Signed-off-by: Huang, Tai <tai.huang@intel.com>
* update total count Signed-off-by: Huang, Tai <tai.huang@intel.com>
--------- Signed-off-by: Huang, Tai <tai.huang@intel.com>
* update docker image prune rules (#2003) Signed-off-by: chensuyue <suyue.chen@intel.com>
* Support transformers-like api for woq quantization (#1987) Signed-off-by: Kaihui-intel <kaihui.tang@intel.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Wang, Chang <chang1.wang@intel.com>
* add INC_FORCE_DEVICE introduction (#1988)
* add INC_FORCE_DEVICE introduction Signed-off-by: xin3he <xin3.he@intel.com>
* Update PyTorch.md
* Update PyTorch.md
* Update docs/source/3x/PyTorch.md Co-authored-by: Yi Liu <yi4.liu@intel.com>
* rename to INC_TARGET_DEVICE Signed-off-by: xin3he <xin3.he@intel.com>
--------- Signed-off-by: xin3he <xin3.he@intel.com> Co-authored-by: Yi Liu <yi4.liu@intel.com>
* Replace FORCE_DEVICE with INC_TARGET_DEVICE [transformers] (#2005) Signed-off-by: Kaihui-intel <kaihui.tang@intel.com>
* enable auto_round format export (#2002) Signed-off-by: Zhang, Weiwei1 <weiwei1.zhang@intel.com>
* remove accelerate version in unit test (#2007) Signed-off-by: Sun, Xuehao <xuehao.sun@intel.com>
* add repack_awq_to_optimum_format function (#1998) Signed-off-by: changwangss <chang1.wang@intel.com>
* Update auto_round requirements for transformers example (#2013) Signed-off-by: Kaihui-intel <kaihui.tang@intel.com>
* add pad_to_buckets in evaluation for hpu performance (#2011)
* add pad_to_buckets in evaluation for hpu performance
--------- Signed-off-by: xin3he <xin3.he@intel.com>
* Update model accuracy (#2006) Signed-off-by: Sun, Xuehao <xuehao.sun@intel.com>
* fix xpu device set weight and bias (#2010) Signed-off-by: changwangss <chang1.wang@intel.com> Co-authored-by: Sun, Xuehao <xuehao.sun@intel.com>
* Add transformers-like api doc (#2018) Signed-off-by: Kaihui-intel <kaihui.tang@intel.com>
* Adapt transformers 4.45.1 (#2019) Signed-off-by: Kaihui-intel <kaihui.tang@intel.com> Co-authored-by: changwangss <chang1.wang@intel.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
* add autoround EMNLP24 to pub list (#2014) Signed-off-by: Huang, Tai <tai.huang@intel.com>
* Fix transformers rtn layer-wise quant (#2008) Signed-off-by: Kaihui-intel <kaihui.tang@intel.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
* Remove itrex dependency for 3x example (#2016) Signed-off-by: Kaihui-intel <kaihui.tang@intel.com> Co-authored-by: Sun, Xuehao <xuehao.sun@intel.com>
* add transformers-like api link in readme (#2022) Signed-off-by: Huang, Tai <tai.huang@intel.com>
* Add woq examples (#1982) Signed-off-by: Kaihui-intel <kaihui.tang@intel.com> Signed-off-by: Sun, Xuehao <xuehao.sun@intel.com> Co-authored-by: Sun, Xuehao <xuehao.sun@intel.com>
* remove ITREX unit test CI (#2021) Signed-off-by: Sun, Xuehao <xuehao.sun@intel.com>
* Support quant procedure on XPU (#2026) Signed-off-by: Kaihui-intel <kaihui.tang@intel.com>
* Support generation search for transformers examples (#2029) Signed-off-by: Kaihui-intel <kaihui.tang@intel.com>
* Remove itrex dependency for 2x example (#2024) Signed-off-by: Kaihui-intel <kaihui.tang@intel.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
* Update the PT2E CV example (#2032) Signed-off-by: yiliu30 <yi4.liu@intel.com>
* Cherry pick Habana software 1.18.0 update (#2025) Signed-off-by: xinhe3 <xinhe3@habana.ai> Signed-off-by: Yi Liu <yiliu4@habana.ai> Signed-off-by: Sun, Xuehao <xuehao.sun@intel.com> Signed-off-by: chensuyue <suyue.chen@intel.com> Co-authored-by: yan tomsinsky <ytomsinsky@habana.ai> Co-authored-by: Uri Livne <ulivne@habana.ai> Co-authored-by: Dudi Lester <dlester@habana.ai> Co-authored-by: Danny <dsemiat@habana.ai> Co-authored-by: Tomer Gafni <tgafni@habana.ai> Co-authored-by: Eran Geva <egeva@habana.ai> Co-authored-by: Daniel Ohayon <danielohayon444@gmail.com> Co-authored-by: Roi Tiefenbrunn <rtiefenbrunn@habana.ai> Co-authored-by: Kamil Felskowski <kfelskowskix@habana.ai> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
* update gaudi version mapping table for v3.1 (#2030) Signed-off-by: Huang, Tai <tai.huang@intel.com> Co-authored-by: chen, suyue <suyue.chen@intel.com>
* fix broken link to FP8 example (#2034) Signed-off-by: Huang, Tai <tai.huang@intel.com>
* add back missing image (#2035) Signed-off-by: xin3he <xin3.he@intel.com>
* Add vlm examples, bugfix (#2012)
* add VLM examples Signed-off-by: Zhang, Weiwei1 <weiwei1.zhang@intel.com>
* bugfix, add utils Signed-off-by: Zhang, Weiwei1 <weiwei1.zhang@intel.com>
* [pre-commit.ci] auto fixes from pre-commit.com hooks; for more information, see https://pre-commit.ci
* fix docstring issues Signed-off-by: Zhang, Weiwei1 <weiwei1.zhang@intel.com>
* [pre-commit.ci] auto fixes from pre-commit.com hooks; for more information, see https://pre-commit.ci
* bugfix Signed-off-by: Zhang, Weiwei1 <weiwei1.zhang@intel.com>
* [pre-commit.ci] auto fixes from pre-commit.com hooks; for more information, see https://pre-commit.ci
* refine examples Signed-off-by: Zhang, Weiwei1 <weiwei1.zhang@intel.com>
* fix scan issue Signed-off-by: Zhang, Weiwei1 <weiwei1.zhang@intel.com>
* [pre-commit.ci] auto fixes from pre-commit.com hooks; for more information, see https://pre-commit.ci
* refine shell Signed-off-by: Zhang, Weiwei1 <weiwei1.zhang@intel.com>
* refine scripts & requirements Signed-off-by: Zhang, Weiwei1 <weiwei1.zhang@intel.com>
* typofix Signed-off-by: Zhang, Weiwei1 <weiwei1.zhang@intel.com>
* refine docs Signed-off-by: Zhang, Weiwei1 <weiwei1.zhang@intel.com>
* set attn_implementation for Phi3-vision Signed-off-by: Zhang, Weiwei1 <weiwei1.zhang@intel.com>
* refine phi3 example Signed-off-by: Zhang, Weiwei1 <weiwei1.zhang@intel.com>
* [pre-commit.ci] auto fixes from pre-commit.com hooks; for more information, see https://pre-commit.ci
* fix code coverage Signed-off-by: Zhang, Weiwei1 <weiwei1.zhang@intel.com>
* [pre-commit.ci] auto fixes from pre-commit.com hooks; for more information, see https://pre-commit.ci
* update config Signed-off-by: Sun, Xuehao <xuehao.sun@intel.com>
* refine shells, docs and example; enable qwen2-vl quantization Signed-off-by: Zhang, Weiwei1 <weiwei1.zhang@intel.com>
* [pre-commit.ci] auto fixes from pre-commit.com hooks; for more information, see https://pre-commit.ci
* fix ci Signed-off-by: Zhang, Weiwei1 <weiwei1.zhang@intel.com>
* fix EOF error Signed-off-by: Sun, Xuehao <xuehao.sun@intel.com>
* update qwen dir Signed-off-by: Sun, Xuehao <xuehao.sun@intel.com>
* refine shell, add llama3.2 inference to doc Signed-off-by: Zhang, Weiwei1 <weiwei1.zhang@intel.com>
* bugfix Signed-off-by: Zhang, Weiwei1 <weiwei1.zhang@intel.com>
* [pre-commit.ci] auto fixes from pre-commit.com hooks; for more information, see https://pre-commit.ci
* bugfix Signed-off-by: Zhang, Weiwei1 <weiwei1.zhang@intel.com>
* bugfix Signed-off-by: Zhang, Weiwei1 <weiwei1.zhang@intel.com>
* refine eval shell Signed-off-by: Zhang, Weiwei1 <weiwei1.zhang@intel.com>
* fix eval device issue Signed-off-by: Zhang, Weiwei1 <weiwei1.zhang@intel.com>
* refine eval dtype Signed-off-by: Zhang, Weiwei1 <weiwei1.zhang@intel.com>
--------- Signed-off-by: Zhang, Weiwei1 <weiwei1.zhang@intel.com> Signed-off-by: Sun, Xuehao <xuehao.sun@intel.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Sun, Xuehao <xuehao.sun@intel.com>
* remove autoround limit (#2036) Signed-off-by: Sun, Xuehao <xuehao.sun@intel.com>
* Adapt autoround format (#2038) Signed-off-by: Kaihui-intel <kaihui.tang@intel.com>
* remove transformers import from utility (#2045)
* remove transformers import from utility Signed-off-by: Zhang, Weiwei1 <weiwei1.zhang@intel.com>
* bugfix Signed-off-by: Zhang, Weiwei1 <weiwei1.zhang@intel.com>
* [pre-commit.ci] auto fixes from pre-commit.com hooks; for more information, see https://pre-commit.ci
* fix typos Signed-off-by: Zhang, Weiwei1 <weiwei1.zhang@intel.com>
--------- Signed-off-by: Zhang, Weiwei1 <weiwei1.zhang@intel.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
* add buckets setting for lm_eval (#2044)
* add buckets setting for lm_eval Signed-off-by: xinhe3 <xinhe3@habana.ai>
* clear graph cache to avoid OOM Signed-off-by: xinhe3 <xinhe3@habana.ai>
--------- Signed-off-by: xinhe3 <xinhe3@habana.ai> Co-authored-by: xinhe3 <xinhe3@habana.ai>
* Enhance example for HPU performance (#2043)
* Enhance example for HPU performance Signed-off-by: xinhe3 <xinhe3@habana.ai>
* Update run_clm_no_trainer.py
* remove wikitext to avoid oom for llama2-7b bs=8
* remove wikitext Signed-off-by: xinhe3 <xinhe3@habana.ai>
--------- Signed-off-by: xinhe3 <xinhe3@habana.ai> Co-authored-by: xinhe3 <xinhe3@habana.ai>
* remove useless code in setup.py (#2046)
* Update the default PT2E config (#2041) Signed-off-by: yiliu30 <yi4.liu@intel.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
* Support non-contiguous weight saving (#2049) Signed-off-by: Kaihui-intel <kaihui.tang@intel.com>
* fix GPTQ oom issue on HPU (#2042)
* fix GPTQ oom issue on HPU Signed-off-by: xinhe3 <xinhe3@habana.ai>
--------- Signed-off-by: xinhe3 <xinhe3@habana.ai> Co-authored-by: xinhe3 <xinhe3@habana.ai>
* fix bug and update readme (#2051)
* fix bug and update readme
--------- Signed-off-by: xinhe3 <xinhe3@habana.ai> Co-authored-by: xinhe3 <xinhe3@habana.ai>
* Support safetensors loading for layerwise (#2047) Signed-off-by: Kaihui-intel <kaihui.tang@intel.com>
* Enhance WOQ example Readme and help (#2053) Signed-off-by: Kaihui-intel <kaihui.tang@intel.com> Co-authored-by: xinhe <xin3.he@intel.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
* improve optimum-habana available check (#2054) Signed-off-by: changwang <changwang@habana.ai> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
* Fixed CI IPEX version (#2061) Signed-off-by: Sun, Xuehao <xuehao.sun@intel.com>
* Update torch config kwargs (#2055) Signed-off-by: Kaihui-intel <kaihui.tang@intel.com>
* Support client `use_layer_wise` setting (#2048) Signed-off-by: Kaihui-intel <kaihui.tang@intel.com>
* Check autoround before import it (#2062) Signed-off-by: yiliu30 <yi4.liu@intel.com>
* Delete fp8_quant/scripts/regression_detection directory (#2059) A missed change when cherry-picking Habana software 1.18.0
* Make PatchedVLLMKVCache resilient to forward API changes (#2067) Change-Id: I33fad5c3e80e017099f300782809f24669765d42 Co-authored-by: Konrad Zawora <kzawora@habana.ai>
* Fix glm-4-9b oom issue on BMG Signed-off-by: Kaihui-intel <kaihui.tang@intel.com>
* Update recipes & Bump version to 3.2 (#2037) Signed-off-by: Sun, Xuehao <xuehao.sun@intel.com>
* Docs: Add customer defined calibration and update docker run (#2057) Signed-off-by: fengding <feng1.ding@intel.com>
* Adapt torch and ipex 2.5 (#2066) Signed-off-by: Kaihui-intel <kaihui.tang@intel.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Sun, Xuehao <xuehao.sun@intel.com>
* Enhance `TBB` check (#2068) Signed-off-by: yiliu30 <yi4.liu@intel.com>
* Fix the PT2E UT (#2071) Signed-off-by: yiliu30 <yi4.liu@intel.com>
* Support gptq layerwise on client (#2069) Signed-off-by: Kaihui-intel <kaihui.tang@intel.com>
* Adapt autoround v0.4 (#2073) Signed-off-by: Kaihui-intel <kaihui.tang@intel.com>
* Ensure that mul operators with shared initializer will not be absorbed in SmoothQuant (#2063) Signed-off-by: duansheng.liu <44742794+duanshengliu@users.noreply.github.com>
* Integrate AutoRound v0.4 [3x] (#2072) Signed-off-by: Kaihui-intel <kaihui.tang@intel.com> Signed-off-by: Zhang, Weiwei1 <weiwei1.zhang@intel.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
* Update CI framework versions and README badge for release 3.1.1 (#2058) Signed-off-by: Sun, Xuehao <xuehao.sun@intel.com>
* Remove the examples that force-required torch 1.13.1 (#2074)
* remove alexnet_fashion_mnist notebook Signed-off-by: chensuyue <suyue.chen@intel.com>
* remove rnnt in pytorch examples Signed-off-by: chensuyue <suyue.chen@intel.com>
--------- Signed-off-by: chensuyue <suyue.chen@intel.com>
* Fix truthfulqa task evaluation issue Signed-off-by: Kaihui-intel <kaihui.tang@intel.com>
* Add required library for ONNX example (#2078)
* Add required library for ONNX example
* Update requirements.txt
* support autoround new API for VLM (#2075) Signed-off-by: Zhang, Weiwei1 <weiwei1.zhang@intel.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
* add import check (#2076) Signed-off-by: Sun, Xuehao <xuehao.sun@intel.com>
* Update utility.py (#2079)
* Add gptq known issue (#2080) Signed-off-by: Kaihui-intel <kaihui.tang@intel.com>
* Fix sdxl `q_unet` config (#2081) Signed-off-by: Kaihui-intel <kaihui.tang@intel.com>
* Fixed the PT2E LLM example (#2082) Signed-off-by: yiliu30 <yi4.liu@intel.com>
* fix dlrm when using incbench (#2084) Signed-off-by: Xin He <xinhe3@habana.ai>
* add mapping for v3.2 (#2085) Signed-off-by: Huang, Tai <tai.huang@intel.com>
* [SW-192753] unify StaticQuantConfig and FP8Config Change-Id: I2fe09ba4c575810a5b130268d63b9eee926bdf08 Signed-off-by: xinhe3 <xinhe3@habana.ai> Signed-off-by: Xin He <xinhe3@habana.ai>
* [SW-200124] Set Scalar as default scale format + compatibility check:
  - Set ScaleFormat.SCALAR as the default value of 'scale_format'
  - Reduce 'scale_format' to 'CONST' if using a PCQ scale_format or fake_quant
  - Add a test to show Scalar models aren't giving wrong outputs
  - Fix the fakequant test, as it is a problematic use of 'hpu_initialize' and should be fixed in SW-202697
  Change-Id: I43ff4900e9e02ce7f50edcdbb19a28f4f615ef9c Signed-off-by: Xin He <xinhe3@habana.ai>
* [SW-201679] support unit_scales for FuseMoE Change-Id: I02a63332bc09f1f6cdc3f133dd5f58829fcbad5a Signed-off-by: Xin He <xinhe3@habana.ai>
* [SW-203698] Add log for converting prepared model Change-Id: I1464f11bbab27d9041c9ba6f448e5ae6fa43bc2d Signed-off-by: Mengni Wang <mewang@habana.ai>
* [SW-199737] Measurement dump improvements; add _validate_dump_path to make sure the dump dir is writable, and back up measurements. Change-Id: Ib64abe772b4c309bbf04de89477cde92ea47ade4
* [SW-203452] Fixing and temporarily skipping G3 unittests Change-Id: Iafa4a6a8577724bd8a86581bfe38d3269dab2ea2 Signed-off-by: Xin He <xinhe3@habana.ai>
* [SW-195965] [GPTQ] INC load model loads model in fp32 only Change-Id: I597d19273786c0c169ad952ebe5a357274e358dc Signed-off-by: xinhe3 <xinhe3@habana.ai>
* [SW-204016] Enable scale calculation with disk offload in INC; move calculating scales and quantization config info into the module patching loop, as the weights there are guaranteed to be on CPU. Change-Id: Ifb2de4e67c1b36c611dcc50b4cd14731b0336c50
* [SW-202614] Llama70b int4 gptq with INC load flow getting host OOM Change-Id: Id1797371bb136502d89c4e8d17abcac1eaac4534 Signed-off-by: xinhe3 <xinhe3@habana.ai>
* [SW-199823] [HQT] fix INC one-step quantization API workflow: 1. fix test_fp8_static_quant.py::TestFP8StaticQuant::test_one_step_quant_cv failure by deep-copying the forward function in common.py; 2. fix config.py "Object of type dict_keys is not JSON serializable" by converting it to a list; 3. fix a download issue in UT by using a local tiny_gptj.json. Change-Id: I2ad3eac411e8fca9d88a021f6a5b9594e6c75ae9 Signed-off-by: xinhe3 <xinhe3@hababa.ai>
* [SW-202617] vllm mixtral MoE quant and measure using forward call Change-Id: I919f1e3597b6c95c3fc60db78ac9c0c06242b416 Signed-off-by: Xin He <xinhe3@habana.ai>
* [SW-200092] Allow fsdpa and softmax to use scalar scales in INC Change-Id: Ieba4c74c18624fb0c5fce6321671d6f4eb2b8c93 Signed-off-by: Xin He <xinhe3@habana.ai>
* [SW-205363] Update _load_state_dict_into_meta_model to be compatible with the Transformers 4.45 release Change-Id: Ib5d8ca777d38c7ae225b7174a886b333b6246ab1 Signed-off-by: Xin He <xinhe3@habana.ai>
* [SW-184948] INC Q/DQ optimization, covering conv2d, kv_cache, fsdpa, softmax and other operators Change-Id: I920f8ad85b3493f1bd4bbe770533343e214fc2d1 Signed-off-by: changwang <changwang@habana.ai> Signed-off-by: Xin He <xinhe3@habana.ai>
* [SW-198585] Fix typo causing PatchedVLLMKVCache error Change-Id: Iafdcc935f702bc4756e2ba89935becb3bc47a728
* [SW-199208] QDQ refactor for registering patched modules, scaling methods, and observers:
  1. Extension APIs: `PatchedModuleBase`/`register_patched_module`; `ScalingMethodBase`/`register_scaling_methods`; `ObserverBase`/`register_observer`/`register_module_config_for_observer`. Related files: fp8_quant/patched_module_base.py, fp8_quant/observer_base.py, fp8_quant/_core/measure.py, test_register_apis.py.
  2. Device-agnostic patching: replaced `hpu` with `cur_accelerator.name()`; replaced `htcore.mark_step()` with `cur_accelerator.synchronize()`; removed `torch.device("hpu")` under observers and scaling methods; updated `hpu_accelerator.synchronize()` to `htcore.mark_step()` + `torch.hpu.synchronize()`.
  Change-Id: I83c6de928a991ed2c1b3b434d372f49e095c38d3 Signed-off-by: Yi Liu <yiliu4@habana.ai> Co-authored-by: Mengni Wang <mewang@habana.ai> Signed-off-by: Xin He <xinhe3@habana.ai>
* [SW-203389] scalar scales don't provide a dtype attribute Change-Id: I4e40dc9b2d9cb65bc9e49571cd57a9ab030f5d7b Signed-off-by: xinhe3 <xinhe3@habana.ai> Signed-off-by: Xin He <xinhe3@habana.ai>
* [SW-199208] fix ModuleInfo conversion issue Change-Id: Ib6c35e1623dda3e470e569defccd607a18b43312
* [SW-200168] Enable working with G2 HW scales on G3 Change-Id: I17f71540eb78e828f01f1a11c8b233d60951293e Signed-off-by: Xin He <xinhe3@habana.ai>
* [SW-203389] fix get_scale_dtype to support PCQ scales Change-Id: I923ace405a0f751a2e5a0a3aadb7abbb401a6c44
* [SW-199719] reduce PCQ scales memory usage: removed persistent full weight scales during PCQ quantization; instead we keep only the input- and output-channel scales, creating a temporary full scale tensor on each input quant op call. Since the full scale tensor is the same size as the original bf16 weight, keeping all full scales persistently alongside the quantized weights would produce a quantized model that uses more memory than the unquantized one. Change-Id: Idc91c5ac8b9cfea2e2a3ad053cb4dc5464cff776
* [SW-206112] INC Q/DQ improvement - use Q/DQ ops Change-Id: Ib03ea8744aa2cca8b606754c45944840da1c3898 Signed-off-by: changwang <changwang@habana.ai> Signed-off-by: Xin He <xinhe3@habana.ai>
* [SW-206693] Convert conv2d_fp8 params to list if necessary; it's needed for the new approach to dynamic shapes in PT 2.5. Change-Id: I8d5e620153970b210675459e3d6aecad8ca7cbde
* [SW-207411] Add catch for OSError in _validate_dump_path Change-Id: I82bae184257f3da982877b3797f2ee8b40a573c8
* [SW-207328] remove accuracy check due to random issue Change-Id: Ifbd985c31c3755b6ab353ef8fa45e911dd75d688 Signed-off-by: xinhe3 <xinhe3@habana.ai>
* [SW-207559] Folder layout refactoring and cleanup (phase 1) Change-Id: Ic9bffd2b7477d4530b4e2a5e411760a731efb84b Signed-off-by: Yi Liu <yiliu4@habana.ai> Signed-off-by: Xin He <xinhe3@habana.ai>
* [SW-193262] INC multi device save/load CP design in fp8 (#5) Signed-off-by: Xin <xin3.he@intel.com> Signed-off-by: Xin He <xinhe3@habana.ai>
* [SW-208521] one-step quantization got double memory usage (#3)
* [SW-208521] one-step quantization got double memory usage Signed-off-by: Xin <xin3.he@intel.com>
* [SW-208789] Support quantizing FP16 model to FP8 (#15) Since layer-wise quantization uses memory mapping from disk, the model could be fp16 as it is saved on disk, for example llama2-7b; add logic to support this case so layer-wise works well. Signed-off-by: Xin He <xinhe3@habana.ai>
* [SW-205959] Update _load_state_dict_into_meta_model for model with bias (#7) Signed-off-by: Xin <xin3.he@intel.com>
* [SW-208700] release bf16 model memory on HPU in one-step quantization (#14) Signed-off-by: Xin <xin3.he@intel.com> Signed-off-by: Xin He <xinhe3@habana.ai>
* [SW-197077] refactoring maxabs scales and adding arbitrary scales (#12)
* [SW-197077] refactoring maxabs scales and adding arbitrary scales Change-Id: I2c35cf925b6b21983f1770db7d35e14f3d7d3e47
* [SW-197077] refactoring scale: fix atol Change-Id: I1c99ddd9ade679286988e7d8a96338b32c0ddc07
* [SW-197077] adding arbitrary scales
* Skip autoround test for HPU (#19) Change-Id: I6dc9724389c16a05252370b9e09a1db80bc8d696 Signed-off-by: Yi Liu <yiliu4@habana.ai> Co-authored-by: Yi Liu <yiliu4@habana.ai>
* [SW-199728] [DeepSpeed] Buffers initialized by model are not correct … (#16)
* [SW-199728] [DeepSpeed] Buffers initialized by model are not correct after tensor parallel
--------- Signed-off-by: Xin <xin3.he@intel.com> Co-authored-by: Danny Semiat <dsemiat@habana.ai> Signed-off-by: Xin He <xinhe3@habana.ai>
* [SW-208151] CD 1.19.0 - PT Docker - test_quantization No module named… (#33)
* [SW-209256] fix GPTQ oom issue on HPU (#2042) (#20)
* fix GPTQ oom issue on HPU (#2042)
--------- Signed-off-by: Xin <xin3.he@intel.com> Co-authored-by: xinhe3 <xinhe3@habana.ai>
* [SW-208151] CD 1.19.0 - PT Docker - test_quantization No module named 'safetensors' Signed-off-by: Xin <xin3.he@intel.com>
--------- Signed-off-by: Xin <xin3.he@intel.com> Co-authored-by: xinhe3 <xinhe3@habana.ai> Co-authored-by: Danny Semiat <dsemiat@habana.ai>
* [SW-207748] Support Auto-round on HPU (#25) Signed-off-by: Yi Liu <yiliu4@habana.ai> Co-authored-by: Yi Liu <yiliu4@habana.ai>
* [SW-209878] Increase threshold to avoid random error in test_layer_wise.py (#36) Signed-off-by: Xin He <xinhe3@habana.ai> Co-authored-by: Xin He <xinhe3@habana.ai>
* [SW-207579] support load vLLM compatible FP8 model (#18) Support loading vLLM-compatible FP8 models, on both G2 and G3, single-card and multi-card.
--------- Signed-off-by: changwang <changwang@habana.ai>
* [SW-207451] Implement block-wise calibration for LLM (#41)
* [SW-207451] Implement block-wise calibration for LLM
--------- Signed-off-by: Xin <xin3.he@intel.com> Co-authored-by: Xin He <xinhe3@habana.ai> Signed-off-by: Xin He <xinhe3@habana.ai>
* [SW-208986] fix save&load bug (#40)
* [SW-208986] fix save&load bug
--------- Signed-off-by: Xin He <xinhe3@habana.ai> Co-authored-by: Xin He <xinhe3@habana.ai>
* [SW-207748] Add Auto-round Example (#42)
* add autoround hpu example Change-Id: Ibd537f4667c7c077160427722a5eca2c721aa5cd Signed-off-by: Yi Liu <yiliu4@habana.ai>
* add requirements Change-Id: I77a95ec05e41247db9903e8622c31f05259ca365 Signed-off-by: Yi Liu <yiliu4@habana.ai>
--------- Signed-off-by: Yi Liu <yiliu4@habana.ai> Co-authored-by: Yi Liu <yiliu4@habana.ai> Co-authored-by: Uri Livne <ulivne@habana.ai> Signed-off-by: Xin He <xinhe3@habana.ai>
* [SW-197077] fix bug (#47)
* [SW-210541] loading for fused_sdpa requires additional amax scale (#51) Signed-off-by: Xin He <xinhe3@habana.ai> Co-authored-by: Xin He <xinhe3@habana.ai>
* fix PatchedLoRACompatibleLinear init (#65) Signed-off-by: changwangss <changwang@habana.ai>
* align files with v1.19.0 in fp8_quant folder Signed-off-by: Xin He <xinhe3@habana.ai>
* fix missing SaveLoadFormat Signed-off-by: Xin He <xinhe3@habana.ai>
* align and fix config after cherry-pick Signed-off-by: Xin He <xinhe3@habana.ai>
* Implicit relative imports are abandoned Signed-off-by: Xin He <xinhe3@habana.ai>
* fix config issue blocking CI Signed-off-by: Xin He <xinhe3@habana.ai>
* remove synchronize for `pack_unpack_tensor_with_numpy` (#2070)
* remove pack&unpack synchronize
--------- Signed-off-by: Kaihui-intel <kaihui.tang@intel.com>
* stop auto-fix of pre-commit Signed-off-by: Xin He <xinhe3@habana.ai>
* update autoround example for release test Signed-off-by: xin3he <xin3.he@intel.com>
* fix AWQ&TEQ loading due to input scale Signed-off-by: xin3he <xin3.he@intel.com>
* fix HQQ state_dict loading caused by [SW-195965] Signed-off-by: xin3he <xin3.he@intel.com>
* use per_channel as default config (#2091) Signed-off-by: yiliu30 <yi4.liu@intel.com>
* workaround transformers issue in version 4.47.0 (#2092)
* workaround transformers issue in version 4.47.0 Signed-off-by: xin3he <xin3.he@intel.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
* Refactor FP8 pytest script (#2089)
* Refactor FP8 pytest script
--------- Signed-off-by: Sun, Xuehao <xuehao.sun@intel.com>
* update ci scan scope Signed-off-by: chensuyue <suyue.chen@intel.com>
* [SW-210500] [Optimum-Habana] [Regression] [fp8] [INC] No generated text for llava models [llava-1.5-7b-hf] [llava-1.5-13b-hf] (#54) Signed-off-by: Xin He <xinhe3@habana.ai> Co-authored-by: Xin He <xinhe3@habana.ai>
* [SW-213236] resolve CPU mem issue in CI (#76) Signed-off-by: Xin He <xinhe3@habana.ai> Co-authored-by: Xin He <xinhe3@habana.ai>
* recover pre-commit Signed-off-by: Xin He <xinhe3@habana.ai>
* [pre-commit.ci] auto fixes from pre-commit.com hooks; for more information, see https://pre-commit.ci
* Fix `is_sharded` setting for loading quant model (#2094) Signed-off-by: Kaihui-intel <kaihui.tang@intel.com>
* fix error message for different python version (#2099) Signed-off-by: changwangss <changwang@habana.ai>
* fix UT of RTN on HPU (#2098) Signed-off-by: xin3he <xin3.he@intel.com> Signed-off-by: Sun, Xuehao <xuehao.sun@intel.com>
* fix device issue during calibration (#2100) Signed-off-by: Xin He <xinhe3@habana.ai>
* fix woq example and update document for v1.19.0 (#2097) Signed-off-by: xin3he <xin3.he@intel.com>
* Refactor version import paths to common module (#2095) Signed-off-by: Sun, Xuehao <xuehao.sun@intel.com>
* update CI gaudi-docker to 1.19.0 (#2096) Signed-off-by: Sun, Xuehao <xuehao.sun@intel.com>
* fix device mapping issue of llama gptq (#2101) Signed-off-by: Xin He <xinhe3@habana.ai> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
* remove fix_measurements.py; it exists with a different name - postprocessing_vllm_measurements.py
* fix merge
* remove unused imported functions with wrong path
* change envar requested value from 1 to true
---------
Signed-off-by: Cheng, Zixuan <zixuan.cheng@intel.com>
Signed-off-by: zehao-intel <zehao.huang@intel.com>
Signed-off-by: xin3he <xin3.he@intel.com>
Signed-off-by: Kaihui-intel <kaihui.tang@intel.com>
Signed-off-by: Sun, Xuehao <xuehao.sun@intel.com>
Signed-off-by: chensuyue <suyue.chen@intel.com>
Signed-off-by: yiliu30 <yi4.liu@intel.com>
Signed-off-by: He, Xin3 <xin3.he@intel.com>
Signed-off-by: changwangss <chang1.wang@intel.com>
Signed-off-by: Huang, Tai <tai.huang@intel.com>
Signed-off-by: violetch24 <zixuan@aia-sdp-spr-117706.jf.intel.com>
Signed-off-by: Mengni Wang <mengni.wang@intel.com>
Signed-off-by: dependabot[bot] <support@github.com>
Signed-off-by: Zhou Yuwen <zyuwen@habana.ai>
Signed-off-by: xinhe3 <xinhe3@hababa.ai>
Signed-off-by: yuwenzho <yuwen.zhou@intel.com>
Signed-off-by: xinhe3 <xinhe3@habana.ai>
Signed-off-by: Zhang, Weiwei1 <weiwei1.zhang@intel.com>
Signed-off-by: Yi Liu <yiliu4@habana.ai>
Signed-off-by: changwang <changwang@habana.ai>
Signed-off-by: fengding <feng1.ding@intel.com>
Signed-off-by: duansheng.liu <44742794+duanshengliu@users.noreply.github.com>
Signed-off-by: Xin He <xinhe3@habana.ai>
Signed-off-by: Mengni Wang <mewang@habana.ai>
Signed-off-by: Xin <xin3.he@intel.com>
Signed-off-by: changwangss <changwang@habana.ai>
Co-authored-by: Zixuan Cheng <110808245+violetch24@users.noreply.github.com>
Co-authored-by: xinhe <xin3.he@intel.com>
Co-authored-by: zehao-intel <zehao.huang@intel.com>
Co-authored-by: Kaihui-intel <kaihui.tang@intel.com>
Co-authored-by: Sun, Xuehao <xuehao.sun@intel.com>
Co-authored-by: chen, suyue <suyue.chen@intel.com>
Co-authored-by: Yi Liu <106061964+yiliu30@users.noreply.github.com>
Co-authored-by: Dina Suehiro Jones <dina.s.jones@intel.com>
Co-authored-by: Wang, Chang <chang1.wang@intel.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Huang, Tai <tai.huang@intel.com>
Co-authored-by: violetch24 <zixuan@aia-sdp-spr-117706.jf.intel.com>
Co-authored-by: Wang, Mengni <mengni.wang@intel.com>
Co-authored-by: Neo Zhang Jianyu <jianyu.zhang@intel.com>
Co-authored-by: ZhangJianyu <zhang.jianyu@outlook.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: yan tomsinsky <ytomsinsky@habana.ai>
Co-authored-by: Ron Ben Moshe <rbenmoshe@habana.ai>
Co-authored-by: Uri Livne <ulivne@habana.ai>
Co-authored-by: Danny Semiat <dsemiat@habana.ai>
Co-authored-by: smarkovichgolan <smarkovich@habana.ai>
Co-authored-by: Dudi Lester <dlester@habana.ai>
Co-authored-by: Yi Liu <yi4.liu@intel.com>
Co-authored-by: WeiweiZhang1 <weiwei1.zhang@intel.com>
Co-authored-by: Tomer Gafni <tgafni@habana.ai>
Co-authored-by: Eran Geva <egeva@habana.ai>
Co-authored-by: Daniel Ohayon <danielohayon444@gmail.com>
Co-authored-by: Roi Tiefenbrunn <rtiefenbrunn@habana.ai>
Co-authored-by: Kamil Felskowski <kfelskowskix@habana.ai>
Co-authored-by: xinhe3 <xinhe3@habana.ai>
Co-authored-by: Konrad Zawora <kzawora@habana.ai>
Co-authored-by: feng-intel <110514170+feng-intel@users.noreply.github.com>
Co-authored-by: duanshengliu <44742794+duanshengliu@users.noreply.github.com>
Co-authored-by: Mengni Wang <mewang@habana.ai>
Co-authored-by: Jimin Ha <jha@habana.ai>
Co-authored-by: changwang <changwang@habana.ai>
Co-authored-by: Yi Liu <yiliu4@habana.ai>
Co-authored-by: Amadeusz Skrzypczak <askrzypczak@habana.ai>
Co-authored-by: Linoy Buchnik <linoybu@gmail.com>
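Of the entries above, the `incbench` benchmark entry point (#1884) is the one that introduces a new user-facing command; its documented use cases condense to the following sketch (`main.py` stands in for any benchmark script):

incbench main.py                        # 1 instance on NUMA:0
incbench --num_i 2 main.py              # 2 instances on NUMA:0
incbench --num_c 2 main.py              # multiple instances, 2 cores per instance, on NUMA:0
incbench -C 24-47 main.py               # 1 instance pinned to cores 24-47
incbench -C 24-47 --num_c 4 main.py     # multiple instances, 4 cores each, within cores 24-47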
1 parent 370cc68 commit be9adc2

File tree
154 files changed: +1896 −5853 lines changed

.azure-pipelines/code-scan.yml (+1)

@@ -13,6 +13,7 @@ pr:
   - requirements.txt
   - .azure-pipelines/code-scan.yml
   - .azure-pipelines/scripts/codeScan
+  - .azure-pipelines/template/docker-template.yml

 pool:
   vmImage: "ubuntu-latest"

.azure-pipelines/docker/DockerfileWithNC.devel (-53)

This file was deleted.

.azure-pipelines/model-test-3x.yml (+2/-1)

@@ -15,6 +15,7 @@ pr:
   - requirements_pt.txt
   - .azure-pipelines/scripts/models
   - .azure-pipelines/model-test-3x.yml
+  - .azure-pipelines/template/docker-template.yml

 variables:
   OUT_SCRIPT_PATH: $(Build.SourcesDirectory)/.azure-pipelines/scripts/models
@@ -30,7 +31,7 @@ parameters:
   type: object
   default:
   - opt_125m_woq_gptq_int4
-  - opt_125m_woq_gptq_int4_dq_bnb
+  - opt_125m_woq_gptq_nf4_dq_bnb
   - opt_125m_woq_gptq_int4_dq_ggml

 stages:

.azure-pipelines/model-test.yml (+4)

@@ -11,9 +11,13 @@ pr:
   - neural_compressor
   - setup.py
   - requirements.txt
+  - .azure-pipelines/model-test.yml
+  - .azure-pipelines/template/docker-template.yml
   - .azure-pipelines/scripts/models
   - examples/tensorflow/oob_models/quantization/ptq
   - .azure-pipelines/model-test.yml
+  - .azure-pipelines/scripts/fwk_version.sh
+  - .azure-pipelines/scripts/install_nc.sh
 exclude:
   - test
   - neural_compressor/common

.azure-pipelines/scripts/fwk_version.sh (+5/-5)

@@ -2,9 +2,9 @@

 echo "export FWs version..."
 export tensorflow_version='2.15.0-official'
-export pytorch_version='2.4.0+cpu'
-export torchvision_version='0.19.0'
-export ipex_version='2.4.0+cpu'
-export onnx_version='1.16.0'
-export onnxruntime_version='1.18.0'
+export pytorch_version='2.5.1+cpu'
+export torchvision_version='0.20.1'
+export ipex_version='2.5.0+cpu'
+export onnx_version='1.17.0'
+export onnxruntime_version='1.20.0'
 export mxnet_version='1.9.1'
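These version pins are consumed by the other CI scripts rather than hard-coded per job: run_basic_adaptor.sh sources this file (see its hunk at the end of this diff) and env_setup.sh installs from the exported variables. A minimal sketch of that pattern, mirroring the tensorflow line visible in the env_setup.sh hunk further down:

source /neural-compressor/.azure-pipelines/scripts/fwk_version.sh
# the '-official' suffix marks a stock PyPI build; strip it before installing
pip install tensorflow==${tensorflow_version%-official}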

.azure-pipelines/scripts/install_nc.sh (+4/-2)

@@ -1,6 +1,6 @@
 #!/bin/bash

-echo -e "\n Install Neural Compressor ... "
+echo -e "##[group]Install Neural Compressor ... "
 cd /neural-compressor
 if [[ $1 = *"3x_pt"* ]]; then
     python -m pip install --no-cache-dir -r requirements_pt.txt
@@ -9,7 +9,8 @@ if [[ $1 = *"3x_pt"* ]]; then
     python setup.py pt bdist_wheel
 else
     echo -e "\n Install torch CPU ... "
-    pip install torch==2.4.0 --index-url https://download.pytorch.org/whl/cpu
+    pip install torch==2.5.1 --index-url https://download.pytorch.org/whl/cpu
+    python -m pip install intel-extension-for-pytorch==2.5.0 oneccl_bind_pt==2.5.0 --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/cpu/us/
     python -m pip install --no-cache-dir -r requirements.txt
     python setup.py bdist_wheel
 fi
@@ -26,4 +27,5 @@ else
 fi

 echo -e "\n pip list after install Neural Compressor ... "
+echo "##[endgroup]"
 pip list
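The `##[group]`, `##[endgroup]`, and `##[section]` strings introduced here and in the scripts below are Azure Pipelines log formatting commands: echoing them folds the enclosed output into a collapsible group in the build log, or highlights a line as a section marker. A minimal sketch (the group title is arbitrary):

echo "##[group]Install dependencies"    # opens a collapsible block in the pipeline log
pip list                                # noisy output, hidden until the group is expanded
echo "##[endgroup]"                     # closes the block
echo "##[section]import check pass"     # renders as a highlighted section line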

.azure-pipelines/scripts/models/run_pytorch_models_trigger.sh (+2/-2)

@@ -56,10 +56,10 @@ elif [ "${model}" == "opt_125m_woq_gptq_int4" ]; then
     model_src_dir="nlp/huggingface_models/language-modeling/quantization/weight_only"
     inc_new_api=3x_pt
     tuning_cmd="bash run_quant.sh --topology=opt_125m_woq_gptq_int4"
-elif [ "${model}" == "opt_125m_woq_gptq_int4_dq_bnb" ]; then
+elif [ "${model}" == "opt_125m_woq_gptq_nf4_dq_bnb" ]; then
     model_src_dir="nlp/huggingface_models/language-modeling/quantization/weight_only"
     inc_new_api=3x_pt
-    tuning_cmd="bash run_quant.sh --topology=opt_125m_woq_gptq_int4_dq_bnb"
+    tuning_cmd="bash run_quant.sh --topology=opt_125m_woq_gptq_nf4_dq_bnb"
 elif [ "${model}" == "opt_125m_woq_gptq_int4_dq_ggml" ]; then
     model_src_dir="nlp/huggingface_models/language-modeling/quantization/weight_only"
     inc_new_api=3x_pt

.azure-pipelines/scripts/ut/3x/run_3x_pt.sh (+8/-1)

@@ -3,12 +3,19 @@ python -c "import neural_compressor as nc"
 test_case="run 3x Torch"
 echo "${test_case}"

+echo "##[section]Run import check"
+set -e
+python -c "import neural_compressor.torch"
+python -c "import neural_compressor.common"
+echo "##[section]import check pass"
+
 # install requirements
-echo "set up UT env..."
+echo "##[group]set up UT env..."
 export LD_LIBRARY_PATH=/usr/local/lib/:$LD_LIBRARY_PATH
 pip install -r /neural-compressor/test/3x/torch/requirements.txt
 pip install pytest-cov
 pip install pytest-html
+echo "##[endgroup]"
 pip list

 export COVERAGE_RCFILE=/neural-compressor/.azure-pipelines/scripts/ut/3x/coverage.3x_pt

.azure-pipelines/scripts/ut/3x/run_3x_pt_fp8.sh (+22/-1)

@@ -3,8 +3,14 @@ python -c "import neural_compressor as nc"
 test_case="run 3x Torch Habana FP8"
 echo "${test_case}"

+echo "##[section]Run import check"
+set -e
+python -c "import neural_compressor.torch"
+python -c "import neural_compressor.common"
+echo "##[section]import check pass"
+
 # install requirements
-echo "set up UT env..."
+echo "##[group]set up UT env..."
 export LD_LIBRARY_PATH=/usr/local/lib/:$LD_LIBRARY_PATH
 sed -i '/^intel_extension_for_pytorch/d' /neural-compressor/test/3x/torch/requirements.txt
 sed -i '/^auto_round/d' /neural-compressor/test/3x/torch/requirements.txt
@@ -13,6 +19,7 @@ pip install -r /neural-compressor/test/3x/torch/requirements.txt
 pip install pytest-cov
 pip install pytest-html
 pip install pytest-html-merger
+echo "##[endgroup]"
 pip list

 export COVERAGE_RCFILE=/neural-compressor/.azure-pipelines/scripts/ut/3x/coverage.3x_pt_fp8
@@ -28,6 +35,18 @@ pytest --cov="${inc_path}" -vs --disable-warnings --html=report_2.html --self-co
 pytest --cov="${inc_path}" -vs --disable-warnings --html=report_4.html --self-contained-html torch/quantization/fp8_quant 2>&1 | tee -a ${ut_log_name}
 pytest --cov="${inc_path}" -vs --disable-warnings --html=report_5.html --self-contained-html torch/algorithms/fp8_quant 2>&1 | tee -a ${ut_log_name}

+# Below folder contains some special configuration for pytest so we need to enter the path and run it separately
+cd /neural-compressor/test/3x/torch/algorithms/fp8_quant
+pytest --cov="${inc_path}" -vs --disable-warnings --html=report_4.html --self-contained-html . 2>&1 | tee -a ${ut_log_name}
+cp .coverage ${LOG_DIR}/.coverage.algo_fp8
+cd - && mv /neural-compressor/test/3x/torch/algorithms/fp8_quant/*.html .
+
+# Below folder contains some special configuration for pytest so we need to enter the path and run it separately
+cd /neural-compressor/test/3x/torch/quantization/fp8_quant
+pytest --cov="${inc_path}" -vs --disable-warnings --html=report_5.html --self-contained-html . 2>&1 | tee -a ${ut_log_name}
+cp .coverage ${LOG_DIR}/.coverage.quant_fp8
+cd - && mv /neural-compressor/test/3x/torch/quantization/fp8_quant/*.html .
+
 mkdir -p report && mv *.html report
 pytest_html_merger -i ./report -o ./report.html
 cp report.html ${LOG_DIR}/
@@ -40,5 +59,7 @@ fi

 # if ut pass, collect the coverage file into artifacts
 cp .coverage ${LOG_DIR}/.coverage
+cd ${LOG_DIR}
+coverage combine .coverage.*

 echo "UT finished successfully! "

.azure-pipelines/scripts/ut/3x/run_3x_tf.sh (+8 -1)

@@ -3,12 +3,19 @@ python -c "import neural_compressor as nc"
 test_case="run 3x TensorFlow"
 echo "${test_case}"
 
+echo "##[section]Run import check"
+set -e
+python -c "import neural_compressor.tensorflow"
+python -c "import neural_compressor.common"
+echo "##[section]import check pass"
+
 # install requirements
-echo "set up UT env..."
+echo "##[group]set up UT env..."
 pip install -r /neural-compressor/test/3x/tensorflow/requirements.txt
 pip install pytest-cov
 pip install pytest-html
 pip install pytest-html-merger
+echo "##[endgroup]"
 pip list
 
 export COVERAGE_RCFILE=/neural-compressor/.azure-pipelines/scripts/ut/3x/coverage.3x_tf
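
With this change the same fail-fast import check guards all three 3x UT scripts: `set -e` aborts the job on the first failing import instead of letting a broken install surface later as confusing test errors. A standalone local equivalent (a sketch; the module list is the TensorFlow variant above):

    #!/usr/bin/env bash
    set -e                                   # abort on the first failing import
    for mod in neural_compressor.tensorflow neural_compressor.common; do
        echo "checking import: ${mod}"
        python -c "import ${mod}"
    done
    echo "import check pass"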

.azure-pipelines/scripts/ut/collect_log.sh (+4 -2)

@@ -7,7 +7,7 @@ coverage_log_base="/neural-compressor/log_dir/coverage_log_base"
 coverage_compare="/neural-compressor/log_dir/coverage_compare.html"
 cd /neural-compressor/log_dir
 
-$BOLD_YELLOW && echo "collect coverage for PR branch" && $RESET
+$BOLD_YELLOW && echo "##[group]collect coverage for PR branch" && $RESET
 mkdir -p coverage_PR
 cp ut_*_coverage/.coverage.* ./coverage_PR/
 
@@ -28,8 +28,9 @@ git checkout master
 rm -rf build dist *egg-info
 echo y | pip uninstall neural-compressor
 cd /neural-compressor/.azure-pipelines-pr/scripts && bash install_nc.sh
+echo "##[endgroup]"
 
-$BOLD_YELLOW && echo "collect coverage for baseline" && $RESET
+$BOLD_YELLOW && echo "##[group]collect coverage for baseline" && $RESET
 coverage erase
 cd /neural-compressor/log_dir
 mkdir -p coverage_base
@@ -43,6 +44,7 @@ coverage report -m --rcfile=${COVERAGE_RCFILE} | tee ${coverage_log_base}
 coverage html -d log_dir/coverage_base/htmlcov --rcfile=${COVERAGE_RCFILE}
 coverage xml -o log_dir/coverage_base/coverage.xml --rcfile=${COVERAGE_RCFILE}
 ls -l log_dir/coverage_base/htmlcov
+echo "##[endgroup]"
 
 get_coverage_data() {
     # Input argument
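
The unchanged context above shows the baseline report generation this script wraps in the new log group: coverage.py's standard report commands, driven by the rcfile exported earlier. The same commands in isolation (a sketch, assuming `$COVERAGE_RCFILE` is set and data files were already combined):

    coverage report -m --rcfile=${COVERAGE_RCFILE}            # text report with missing lines
    coverage html -d htmlcov --rcfile=${COVERAGE_RCFILE}      # browsable HTML report
    coverage xml -o coverage.xml --rcfile=${COVERAGE_RCFILE}  # XML summary for downstream tasks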

.azure-pipelines/scripts/ut/env_setup.sh (+3 -2)

@@ -19,7 +19,7 @@ echo "onnxruntime version is $onnxruntime_version"
 echo "mxnet version is $mxnet_version"
 
 test_case=$1
-echo "========= test case is ${test_case}"
+echo -e "##[group]test case is ${test_case}"
 
 if [[ "${tensorflow_version}" == *"-official" ]]; then
     pip install tensorflow==${tensorflow_version%-official}
@@ -100,6 +100,8 @@ pip install coverage
 pip install pytest
 pip install pytest-html
 
+echo "##[endgroup]"
+
 pip list
 echo "[DEBUG] list pipdeptree..."
 pip install pipdeptree
@@ -112,4 +114,3 @@ if [[ $(echo "${test_case}" | grep -c "run basic api") != 0 ]] || [[ $(echo "${t
     find . -name "test*.py" | xargs sed -i 's/import tensorflow.compat.v1 as tf/import torch; import tensorflow.compat.v1 as tf/g'
     find . -name "test*.py" | xargs sed -i 's/from tensorflow import keras/import torch; from tensorflow import keras/g'
 fi
-
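
For context, the script's unchanged tail (visible above) rewrites every test file in place so that torch is imported before TensorFlow, presumably to avoid a shared-library loading-order conflict between the two frameworks. The same find/xargs/sed in-place edit pattern in isolation:

    # Prepend an import wherever a test file pulls in keras first
    # (identical pattern to the script's tail; run from the test directory).
    find . -name "test*.py" | xargs sed -i \
        's/from tensorflow import keras/import torch; from tensorflow import keras/g'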

.azure-pipelines/scripts/ut/run_basic_adaptor.sh (+1)

@@ -8,6 +8,7 @@ source /neural-compressor/.azure-pipelines/scripts/fwk_version.sh $1
 
 echo "set up UT env..."
 bash /neural-compressor/.azure-pipelines/scripts/ut/env_setup.sh "${test_case}"
+export LD_LIBRARY_PATH=/usr/local/lib/:$LD_LIBRARY_PATH
 export COVERAGE_RCFILE=/neural-compressor/.azure-pipelines/scripts/ut/coverage.file
 lpot_path=$(python -c 'import neural_compressor; import os; print(os.path.dirname(neural_compressor.__file__))')
 cd /neural-compressor/test || exit 1
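
The added export matters because shared libraries installed under /usr/local/lib are not always on the loader's default search path inside the CI container; without it, importing an extension that links against them can fail with a missing-.so error. The guard in isolation:

    export LD_LIBRARY_PATH=/usr/local/lib/:$LD_LIBRARY_PATH  # make /usr/local/lib .so files resolvable
    python -c "import neural_compressor"                     # import should now find its native deps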

.azure-pipelines/template/docker-template.yml (+3 -2)

@@ -74,7 +74,7 @@ steps:
 
 - ${{ if eq(parameters.imageSource, 'pull') }}:
     - script: |
-        docker pull vault.habana.ai/gaudi-docker/1.18.0/ubuntu22.04/habanalabs/pytorch-installer-2.4.0:latest
+        docker pull vault.habana.ai/gaudi-docker/1.19.0/ubuntu22.04/habanalabs/pytorch-installer-2.5.1:latest
      displayName: "Pull habana docker image"
 
 - script: |
@@ -95,7 +95,8 @@ steps:
       else
         docker run -dit --disable-content-trust --privileged --name=${{ parameters.containerName }} --shm-size="2g" \
             --runtime=habana -e HABANA_VISIBLE_DEVICES=all -e OMPI_MCA_btl_vader_single_copy_mechanism=none --cap-add=sys_nice --net=host --ipc=host \
-            -v ${BUILD_SOURCESDIRECTORY}:/neural-compressor vault.habana.ai/gaudi-docker/1.18.0/ubuntu22.04/habanalabs/pytorch-installer-2.4.0:latest
+            -v ${BUILD_SOURCESDIRECTORY}:/neural-compressor vault.habana.ai/gaudi-docker/1.19.0/ubuntu22.04/habanalabs/pytorch-installer-2.5.1:latest
+        docker exec ${{ parameters.containerName }} bash -c "ln -sf \$(which python3) /usr/bin/python"
       fi
       echo "Show the container list after docker run ... "
       docker ps -a
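
Besides bumping the Gaudi image (1.18.0 / PyTorch 2.4.0 to 1.19.0 / PyTorch 2.5.1), the template now symlinks `python` to `python3` inside the container, so scripts that invoke bare `python` keep working on images that ship only `python3`. The same pattern in isolation, with `my_container` as a placeholder name:

    # Make bare `python` resolve inside a python3-only container
    docker exec my_container bash -c 'ln -sf "$(which python3)" /usr/bin/python'
    docker exec my_container python --version   # verify the symlink took effect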

.azure-pipelines/ut-3x-pt-fp8.yml (+9 -1)

@@ -9,11 +9,15 @@ pr:
   paths:
     include:
       - .azure-pipelines/scripts/ut/3x/run_3x_pt_fp8.sh
+      - .azure-pipelines/scripts/install_nc.sh
       - .azure-pipelines/ut-3x-pt-fp8.yml
+      - .azure-pipelines/template/docker-template.yml
       - neural_compressor/common
       - neural_compressor/torch
       - test/3x/torch/algorithms/fp8_quant
       - test/3x/torch/quantization/fp8_quant
+      - test/3x/torch/quantization/weight_only/test_rtn.py
+      - test/3x/torch/quantization/weight_only/test_load.py
       - setup.py
       - requirements_pt.txt
@@ -85,7 +89,7 @@ stages:
 
     - script: |
         echo "--- create container ---"
-        docker run -d -it --name="collectLogs" -v ${BUILD_SOURCESDIRECTORY}:/neural-compressor ${IMAGE_NAME}:${IMAGE_TAG} /bin/bash
+        docker run -d -it --name="collectLogs" -v ${BUILD_SOURCESDIRECTORY}:/neural-compressor ${IMAGE_NAME}:${IMAGE_TAG} /bin/bash
         echo "--- docker ps ---"
         docker ps
         echo "--- collect logs ---"
@@ -94,6 +98,10 @@ stages:
           && bash ut/3x/collect_log_3x.sh 3x_pt_fp8"
       displayName: "Collect UT Coverage"
 
+    - task: PublishCodeCoverageResults@2
+      inputs:
+        summaryFileLocation: $(Build.SourcesDirectory)/log_dir/coverage_PR/coverage.xml
+
     - task: PublishPipelineArtifact@1
       condition: succeededOrFailed()
       inputs:

.azure-pipelines/ut-3x-pt.yml (+8 -1)

@@ -14,6 +14,9 @@ pr:
       - test/3x/common
       - setup.py
       - requirements_pt.txt
+      - .azure-pipelines/ut-3x-pt.yml
+      - .azure-pipelines/template/docker-template.yml
+      - .azure-pipelines/scripts/install_nc.sh
       - .azure-pipelines/scripts/ut/3x/run_3x_pt.sh
 
 pool: ICX-16C
@@ -84,7 +87,7 @@ stages:
 
     - script: |
         echo "--- create container ---"
-        docker run -d -it --name="collectLogs" -v ${BUILD_SOURCESDIRECTORY}:/neural-compressor ${IMAGE_NAME}:${IMAGE_TAG} /bin/bash
+        docker run -d -it --name="collectLogs" -v ${BUILD_SOURCESDIRECTORY}:/neural-compressor ${IMAGE_NAME}:${IMAGE_TAG} /bin/bash
         echo "--- docker ps ---"
         docker ps
         echo "--- collect logs ---"
@@ -93,6 +96,10 @@ stages:
          && bash ut/3x/collect_log_3x.sh 3x_pt"
      displayName: "Collect UT Coverage"
 
+    - task: PublishCodeCoverageResults@2
+      inputs:
+        summaryFileLocation: $(Build.SourcesDirectory)/log_dir/coverage_PR/coverage.xml
+
     - task: PublishPipelineArtifact@1
       condition: succeededOrFailed()
       inputs:
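
Both UT pipelines now publish the merged PR coverage through the `PublishCodeCoverageResults@2` task, which reads the XML summary at the configured `summaryFileLocation`. A quick local sanity check that the file the task expects exists and parses (a hypothetical pre-flight step; the path mirrors the pipeline setting):

    # Verify the coverage summary the pipeline will publish
    test -f log_dir/coverage_PR/coverage.xml \
        && python -c "import xml.etree.ElementTree as ET; ET.parse('log_dir/coverage_PR/coverage.xml'); print('coverage.xml OK')"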
