
Int4(llama-7b-chat) converted model generates response with German words #207

Closed
bhardwaj-nakul opened this issue Feb 2, 2024 · 2 comments

Comments


bhardwaj-nakul commented Feb 2, 2024

I have converted the llama-7b-chat model to INT4 using the following commands:

python convert.py --model_id meta-llama/Llama-2-7b-chat-hf --output_dir models/llama-2-7b-chat --precision FP16 --compress_weights INT4_SYM INT4_ASYM 4BIT_DEFAULT

python convert.py --model_id meta-llama/Llama-2-7b-chat-hf --output_dir models/llama-2-7b-chat --precision FP32 --compress_weights 4BIT_DEFAULT

I'm running benchmarks with the INT4-converted models. I tried the following variations and, as you can see, all of the responses contain German words.

  • OV_FP16-INT4_ASYM

[screenshot: response containing German words]

  • OV_FP16-INT4_SYM

[screenshot: response containing German words]

I also tried a different prompt; it gives a partially German answer.

[screenshot: partially German response]

  • OV_FP16-4BIT_DEFAULT

[screenshot: response containing German words]

  • OV_FP32-4BIT_DEFAULT

[screenshot: response containing German words]

With the following prompt, the entire response is generated in German.

[screenshot: fully German response]

Am I missing something here? Please provide some guidance.

@AlexKoff88
Collaborator

@bhardwaj-nakul, last week we added a load_in_4bit feature that enables data-aware weight quantization. Please take a look at this PR in Optimum-Intel.
You can use the Optimum API directly in this case; please consider installing it from GitHub.
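For reference, a minimal sketch of what using the Optimum-Intel API with 4-bit weight compression could look like. This assumes a recent optimum-intel installed from GitHub; the prompt and generation parameters are illustrative, not from the original report, and downloading the gated meta-llama checkpoint requires an accepted license and an authenticated Hugging Face token:

```python
# Assumption: the feature is only in the development version, so install from GitHub:
#   pip install git+https://github.com/huggingface/optimum-intel.git

from optimum.intel import OVModelForCausalLM
from transformers import AutoTokenizer

model_id = "meta-llama/Llama-2-7b-chat-hf"

# export=True converts the HF checkpoint to OpenVINO IR on the fly;
# load_in_4bit=True applies 4-bit weight quantization during the export.
model = OVModelForCausalLM.from_pretrained(model_id, export=True, load_in_4bit=True)
tokenizer = AutoTokenizer.from_pretrained(model_id)

inputs = tokenizer("What is OpenVINO?", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

If you need finer control (group size, ratio, symmetric vs. asymmetric), Optimum-Intel also accepts a `quantization_config` argument instead of the plain `load_in_4bit` flag.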

@peterchen-intel
Collaborator

Closing as an inactive issue.
