
Support hallucination score #218

Open · wants to merge 5 commits into master

Conversation


@WenjingKangIntel commented Mar 7, 2025

  1. Use deepeval to calculate hallucination score
  2. Use SelfCheckGPT to calculate hallucination score
  3. Use both deepeval and SelfCheckGPT to calculate hallucination score (time consuming)
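To illustrate the SelfCheckGPT idea behind option 2, here is a deliberately simplified, stdlib-only sketch: if an answer is hallucinated, independently re-sampled answers for the same prompt tend to disagree with it, so low consistency across samples implies a high hallucination score. This is not the PR's implementation; the real SelfCheckGPT scores each sentence against sampled responses with NLI, BERTScore, or QA variants, and token overlap here is only a stand-in consistency measure.

```python
# Toy sketch of SelfCheckGPT-style consistency scoring (illustrative only).
# Real SelfCheckGPT uses NLI/BERTScore/QA scoring, not token overlap.

def token_overlap(a: str, b: str) -> float:
    """Fraction of tokens in `a` that also appear in `b` (0..1)."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    if not ta:
        return 0.0
    return len(ta & tb) / len(ta)

def selfcheck_score(answer: str, samples: list[str]) -> float:
    """Hallucination score in [0, 1]: higher = less support from samples."""
    if not samples:
        return 1.0
    support = sum(token_overlap(answer, s) for s in samples) / len(samples)
    return 1.0 - support

# A claim echoed by the re-sampled answers scores low (well supported) ...
consistent = selfcheck_score(
    "Orange juice is rich in vitamin C",
    ["Orange juice is rich in vitamin C", "Orange juice has lots of vitamin C"],
)
# ... while a claim the samples do not repeat scores high.
inconsistent = selfcheck_score(
    "Orange juice is rich in vitamin C",
    ["Bananas are high in potassium", "Apples contain fiber"],
)
print(consistent < inconsistent)  # True
```

The same per-answer score can then be averaged over a dataset, which is why combining it with deepeval's LLM-judged metric (option 3) is the time-consuming path.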

@WenjingKangIntel WenjingKangIntel force-pushed the issue-199-dev-hallucination branch from d016944 to 9dd0884 Compare March 7, 2025 05:42
@jnzw commented Mar 7, 2025

Resolves #199

@yzheng124 yzheng124 force-pushed the issue-199-dev-hallucination branch 3 times, most recently from 589959e to f01ab94 Compare March 7, 2025 09:34
* Support hallucination score, deepeval part
* Support hallucination score, selfcheckgpt part
* Add workflow

---------

Signed-off-by: Kang Wenjing <wenjing.kang@intel.com>
Signed-off-by: yzheng124 <yi.zheng@intel.com>
Co-authored-by: yzheng124 <yi.zheng@intel.com>
@WenjingKangIntel WenjingKangIntel force-pushed the issue-199-dev-hallucination branch 3 times, most recently from d152a45 to 9b2bf55 Compare March 7, 2025 15:09
@adrianboguszewski (Contributor) left a comment


You also need to solve the conflict in .github/reusable-steps/categorize-projects/action.yml

```diff
@@ -0,0 +1,20 @@
+Can you suggest some popular fruit-based drinks that are healthy and refreshing?
```
Contributor: I wouldn't call this file a "personality", rather "questions". The same applies to the other similar files.


We have modified them.


3. Run Ollama, taking `deepseek-r1` as an example:
```shell
ollama run deepseek-r1
```
Contributor: Using deepseek-r1 will take forever. Is it possible to use any of the distilled models? https://huggingface.co/collections/deepseek-ai/deepseek-r1-678e1e131c0169c0bc89728d

Actually, the command `ollama run deepseek-r1` runs the DeepSeek-R1-Distill-Qwen-7B model, not the full 671B model. Within the deepseek-r1 series, the only smaller option is DeepSeek-R1-Distill-Qwen-1.5B, but its output quality is noticeably worse.
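For readers who still want a faster run despite the quality trade-off, Ollama exposes the distilled sizes as explicit tags. The tag below is taken from the Ollama model library as of this discussion; verify it is still listed before relying on it:

```shell
# Run the smallest distilled variant explicitly
# (tag assumed from the Ollama model library; quality is lower than 7B):
ollama run deepseek-r1:1.5b
```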

Contributor: then ok

```python
    return dataset[dataset_info["col"]], ov_chat_engine


def load_chat_model(model_name: str, token: str = None) -> OpenVINOLLM:
```
Contributor: Wouldn't it be better to use the function from main.py instead of copying it here? What if the function changes?

We have implemented your suggestion and now reuse the function from main.py instead of copying it.
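The reviewer's point, keeping one shared implementation and importing it so that changes propagate automatically, can be sketched as below. The file names and the function body are hypothetical stand-ins (only `load_chat_model` and `main.py` come from the discussion above); the real function returns an `OpenVINOLLM`:

```python
# --- main.py (single source of truth; body is a hypothetical stand-in) ---
def load_chat_model(model_name: str, token: str = None):
    # The real implementation builds and returns an OpenVINOLLM.
    return {"model": model_name, "token": token}

# --- evaluation script (hypothetical file name) ---
# from main import load_chat_model   # reuse instead of copy-pasting,
#                                    # so fixes in main.py apply here too
engine = load_chat_model("deepseek-r1")
```

Copying the function instead would silently diverge the evaluation path from the demo path the first time `main.py` changes.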

4 participants