[JS] - The judge for GenkitMetrics doesn't always output in the required JSON schema when using gpt4o #2181

Open
barthje opened this issue Feb 27, 2025 · 0 comments
Labels: bug (Something isn't working), js


Describe the bug
When running evaluations with gpt4oMini as the judge, the judge's response for GenkitMetric.ANSWER_RELEVANCY does not follow the required JSON schema: the flags come back as numbers instead of strings. When I switched the judge to gemini20Flash, everything worked fine.
Error for the Answer Relevancy metric:

- answered: must be string

Provided data:

{
  "noncommittal": 0,
  "answered": 1,
  "question": "What is the wifi code?"
}

Required JSON schema:

{
  "type": "object",
  "properties": {
    "question": {
      "type": "string"
    },
    "answered": {
      "type": "string",
      "enum": [
        "0",
        "1"
      ]
    },
    "noncommittal": {
      "type": "string",
      "enum": [
        "0",
        "1"
      ]
    }
  },
  "required": [
    "question",
    "answered",
    "noncommittal"
  ],
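
The mismatch above is between numeric flags (0, 1) in the judge output and the string enum ("0", "1") in the schema. A minimal sketch of a normalization step that would tolerate both forms follows. The names toFlag and normalizeRelevancyResponse are hypothetical and not part of Genkit, and where such a step could hook into the evaluator is not shown here.

```typescript
// Hypothetical helper, not part of Genkit: coerces a judge response into the
// string-enum shape that the schema in this report requires.
type Flag = "0" | "1";

interface RelevancyResponse {
  question: string;
  answered: Flag;
  noncommittal: Flag;
}

// Accepts 0/1 as numbers or "0"/"1" as strings; rejects everything else.
function toFlag(field: string, value: unknown): Flag {
  if (value === 0 || value === "0") return "0";
  if (value === 1 || value === "1") return "1";
  throw new Error(`${field}: must be "0" or "1", got ${JSON.stringify(value)}`);
}

function normalizeRelevancyResponse(raw: unknown): RelevancyResponse {
  if (typeof raw !== "object" || raw === null) {
    throw new Error("response: must be object");
  }
  const data = raw as Record<string, unknown>;
  if (typeof data.question !== "string") {
    throw new Error("question: must be string");
  }
  return {
    question: data.question,
    answered: toFlag("answered", data.answered),
    noncommittal: toFlag("noncommittal", data.noncommittal),
  };
}

// The payload from this report, with numeric flags as returned by the judge.
const provided = { noncommittal: 0, answered: 1, question: "What is the wifi code?" };
const normalized = normalizeRelevancyResponse(provided);
console.log(JSON.stringify(normalized));
```

With the payload from this report, both numeric flags become strings and the question is passed through unchanged. Values outside 0 and 1 are still rejected, so the enum constraint is preserved.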

To Reproduce
Setup of Evaluations and Metrics:

ai = genkit({
  plugins: [
    openAI({ apiKey: process.env.OPENAI_API_KEY }),
    googleAI({ apiKey: process.env.GOOGLE_API_KEY }),
    genkitEval({
      judge: gpt4oMini,
      metrics: [
        GenkitMetric.MALICIOUSNESS,
        GenkitMetric.FAITHFULNESS,
        GenkitMetric.ANSWER_RELEVANCY,
      ],
      embedder: textEmbeddingAda002,
    }),
  ],
  model: gpt4oMini,
});

  • Create a flow that takes an input and returns an output.
  • Make sure the flow uses a retriever.
  • Create a dataset with at least one input.
  • Run the dataset against the flow.

Expected behavior
The evaluation completes and all three metrics are scored correctly, with no schema validation errors.

Runtime

  • OS: macOS
  • Genkit: 1.0.5
  • Node version: v20.18.0


@barthje barthje added bug Something isn't working js labels Feb 27, 2025