Support hallucination score #218
base: master
Conversation
WenjingKangIntel commented Mar 7, 2025 • edited
- Use deepeval to calculate the hallucination score
- Use SelfCheckGPT to calculate the hallucination score
- Use both deepeval and SelfCheckGPT to calculate the hallucination score (time-consuming; see the usage sketch below)
Force-pushed from d016944 to 9dd0884
Resolves #199
Force-pushed from 589959e to f01ab94
* Support hallucination score, deepeval part
* Support hallucination score, selfcheckgpt part
* Add workflow
---------
Signed-off-by: Kang Wenjing <wenjing.kang@intel.com>
Signed-off-by: yzheng124 <yi.zheng@intel.com>
Co-authored-by: yzheng124 <yi.zheng@intel.com>
Force-pushed from d152a45 to 9b2bf55
You also need to resolve the conflict in `.github/reusable-steps/categorize-projects/action.yml`.
@@ -0,0 +1,20 @@
Can you suggest some popular fruit-based drinks that are healthy and refreshing?
I wouldn't call this file a "personality"; "questions" fits better. The same goes for the other similar files.
We have modified them.
3. Run Ollama, taking `deepseek-r1` as an example:

```
ollama run deepseek-r1
```
Using deepseek-r1 will take forever. Is it possible to use any of the distilled models? https://huggingface.co/collections/deepseek-ai/deepseek-r1-678e1e131c0169c0bc89728d
Actually, the command `ollama run deepseek-r1` runs the `deepseek-r1-distill-qwen-7B` model, not the full 671B model. Within the deepseek-r1 series, the only smaller option is `deepseek-r1-distill-qwen-1.5B`, but its performance is not as good.
OK then.
    return dataset[dataset_info["col"]], ov_chat_engine


def load_chat_model(model_name: str, token: str = None) -> OpenVINOLLM:
Wouldn't it be better to use the function from main.py instead of copying it here? What if the function changes?
Good point. We have updated the code to reuse the function from main.py instead of duplicating it.