@bhardwaj-nakul, last week we added a load_in_4bit feature that enables data-aware weight quantization. Please take a look at this PR in Optimum-Intel.
You can use the Optimum API directly in this case. Please consider installing it from GitHub.
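For reference, here is a minimal sketch of what that looks like through the Optimum-Intel Python API, assuming the load_in_4bit keyword added in the PR above (install from source first, e.g. pip install git+https://github.com/huggingface/optimum-intel.git):

```python
# Sketch: 4-bit weight compression via Optimum-Intel.
# Assumes the load_in_4bit kwarg from the referenced PR is available.
from optimum.intel import OVModelForCausalLM
from transformers import AutoTokenizer

model_id = "meta-llama/Llama-2-7b-chat-hf"

# export=True converts the PyTorch checkpoint to OpenVINO IR on the fly;
# load_in_4bit=True applies INT4 weight compression during the export.
model = OVModelForCausalLM.from_pretrained(model_id, export=True, load_in_4bit=True)
tokenizer = AutoTokenizer.from_pretrained(model_id)

inputs = tokenizer("What is OpenVINO?", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```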
I have converted the llama-7b-chat model to INT4 using the following commands:
```sh
python convert.py --model_id meta-llama/Llama-2-7b-chat-hf --output_dir models/llama-2-7b-chat --precision FP16 --compress_weights INT4_SYM INT4_ASYM 4BIT_DEFAULT
python convert.py --model_id meta-llama/Llama-2-7b-chat-hf --output_dir models/llama-2-7b-chat --precision FP32 --compress_weights 4BIT_DEFAULT
```
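As a sanity check outside the benchmark harness, one of the converted IRs can be loaded directly for generation. A sketch, assuming Optimum-Intel is installed; the ir_path below is a placeholder and should point at whichever subdirectory convert.py actually wrote the compressed IR to:

```python
# Quick generation check on a converted model, independent of the benchmark script.
from optimum.intel import OVModelForCausalLM
from transformers import AutoTokenizer

ir_path = "models/llama-2-7b-chat"  # hypothetical; adjust to the real IR folder

# Loads the OpenVINO IR from disk (no re-export needed).
model = OVModelForCausalLM.from_pretrained(ir_path)
tokenizer = AutoTokenizer.from_pretrained(ir_path)

inputs = tokenizer("Tell me about quantization.", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```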
I'm running benchmarks with the INT4-converted models. I tried the following variations, and as you can see, all the responses contain German words.
With a different prompt, the answer is partially in German.
Using the following prompt generates the complete response in German.
Am I missing something here? Please provide some guidance.