
Support hallucination score #218

Open · wants to merge 5 commits into master

Conversation


@WenjingKangIntel commented Mar 7, 2025

  1. Use deepeval to calculate hallucination score
  2. Use SelfCheckGPT to calculate hallucination score
  3. Use both deepeval and SelfCheckGPT to calculate hallucination score (time consuming)
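To illustrate the SelfCheckGPT idea behind option 2, here is a deliberately simplified, stdlib-only sketch: if an answer is hallucinated, independently re-sampled answers for the same prompt tend to disagree with it, so low consistency across samples implies a high hallucination score. This is not the PR's implementation; the real SelfCheckGPT scores each sentence against sampled responses with NLI, BERTScore, or QA variants, and token overlap here is only a stand-in consistency measure.

```python
# Toy sketch of SelfCheckGPT-style consistency scoring (illustrative only).
# Real SelfCheckGPT uses NLI/BERTScore/QA scoring, not token overlap.

def token_overlap(a: str, b: str) -> float:
    """Fraction of tokens in `a` that also appear in `b` (0..1)."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    if not ta:
        return 0.0
    return len(ta & tb) / len(ta)

def selfcheck_score(answer: str, samples: list[str]) -> float:
    """Hallucination score in [0, 1]: higher = less support from samples."""
    if not samples:
        return 1.0
    support = sum(token_overlap(answer, s) for s in samples) / len(samples)
    return 1.0 - support

# A claim echoed by the re-sampled answers scores low (well supported) ...
consistent = selfcheck_score(
    "Orange juice is rich in vitamin C",
    ["Orange juice is rich in vitamin C", "Orange juice has lots of vitamin C"],
)
# ... while a claim the samples do not repeat scores high.
inconsistent = selfcheck_score(
    "Orange juice is rich in vitamin C",
    ["Bananas are high in potassium", "Apples contain fiber"],
)
print(consistent < inconsistent)  # True
```

The same per-answer score can then be averaged over a dataset, which is why combining it with deepeval's LLM-judged metric (option 3) is the time-consuming path.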

@WenjingKangIntel WenjingKangIntel force-pushed the issue-199-dev-hallucination branch from d016944 to 9dd0884 Compare March 7, 2025 05:42
@jnzw commented Mar 7, 2025

Resolves #199

@yzheng124 yzheng124 force-pushed the issue-199-dev-hallucination branch 3 times, most recently from 589959e to f01ab94 Compare March 7, 2025 09:34
* Support hallucination score, deepeval part
* Support hallucination score, selfcheckgpt part
* Add workflow

---------

Signed-off-by: Kang Wenjing <wenjing.kang@intel.com>
Signed-off-by: yzheng124 <yi.zheng@intel.com>
Co-authored-by: yzheng124 <yi.zheng@intel.com>
@WenjingKangIntel WenjingKangIntel force-pushed the issue-199-dev-hallucination branch 3 times, most recently from d152a45 to 9b2bf55 Compare March 7, 2025 15:09
@adrianboguszewski (Contributor) left a comment


You also need to solve the conflict in .github/reusable-steps/categorize-projects/action.yml

```diff
@@ -0,0 +1,20 @@
+Can you suggest some popular fruit-based drinks that are healthy and refreshing?
```
Contributor: I wouldn't call this file a "personality", rather "questions". The same applies to the other similar files.


We have modified them.


3. Run Ollama, taking `deepseek-r1` as an example:
```shell
ollama run deepseek-r1
```
Contributor: Using deepseek-r1 will take forever. Is it possible to use any of the distilled models? https://huggingface.co/collections/deepseek-ai/deepseek-r1-678e1e131c0169c0bc89728d

Actually, the command `ollama run deepseek-r1` runs the DeepSeek-R1-Distill-Qwen-7B model, not the full 671B model. Within the deepseek-r1 series, the only smaller option is DeepSeek-R1-Distill-Qwen-1.5B, but its output quality is noticeably worse.
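For readers who still want a faster run despite the quality trade-off, Ollama exposes the distilled sizes as explicit tags. The tag below is taken from the Ollama model library as of this discussion; verify it is still listed before relying on it:

```shell
# Run the smallest distilled variant explicitly
# (tag assumed from the Ollama model library; quality is lower than 7B):
ollama run deepseek-r1:1.5b
```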

Contributor: then ok

```python
    return dataset[dataset_info["col"]], ov_chat_engine


def load_chat_model(model_name: str, token: str = None) -> OpenVINOLLM:
```
Contributor: Wouldn't it be better to use the function from main.py instead of copying it here? What if the function changes?

We have implemented your suggestion and now reuse the function from main.py instead of copying it.
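The reviewer's point, keeping one shared implementation and importing it so that changes propagate automatically, can be sketched as below. The file names and the function body are hypothetical stand-ins (only `load_chat_model` and `main.py` come from the discussion above); the real function returns an `OpenVINOLLM`:

```python
# --- main.py (single source of truth; body is a hypothetical stand-in) ---
def load_chat_model(model_name: str, token: str = None):
    # The real implementation builds and returns an OpenVINOLLM.
    return {"model": model_name, "token": token}

# --- evaluation script (hypothetical file name) ---
# from main import load_chat_model   # reuse instead of copy-pasting,
#                                    # so fixes in main.py apply here too
engine = load_chat_model("deepseek-r1")
```

Copying the function instead would silently diverge the evaluation path from the demo path the first time `main.py` changes.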

4 participants