Merge pull request #244 from symflower/report-gemma-2
Update 0.5.0 report with results for Gemma 2 model family
Showing 19 changed files with 290,323 additions and 0 deletions.
@@ -0,0 +1,30 @@
# Evaluation from 2024-07-05 06:55:06

![Bar chart that categorizes all evaluated models.](./categories.svg)

This report was generated by [DevQualityEval benchmark](https://github.com/symflower/eval-dev-quality) in `version 0.5.0`.

**REMARK: `gemma-2-9b-it` and `gemma-2-27b-it` were originally evaluated together with the results then being split into separate folders. Therefore some logs might contain entries from "the other" gemma model.**

## Results

> Keep in mind that LLMs are nondeterministic. The following results just reflect a current snapshot.

The results of all models have been divided into the following categories:

- category unknown: Models in this category could not be categorized.
- response error: Models in this category encountered an error.
- no code: Models in this category produced no code.
- invalid code: Models in this category produced invalid code.
- executable code: Models in this category produced executable code.
- statement coverage reached: Models in this category produced code that reached full statement coverage.
- no excess response: Models in this category did not respond with more content than requested.

The following sections list all models with their categories. The complete log of the evaluation with all outputs can be found [here](./evaluation.log). Detailed scoring can be found [here](./evaluation.csv).

### Result category "category unknown"

Models in this category could not be categorized.

- [`custom-nvidia/google/gemma-2-27b-it`](./custom-nvidia_google_gemma-2-27b-it/)
- [`custom-nvidia/google/gemma-2-9b-it`](./custom-nvidia_google_gemma-2-9b-it/)
@@ -0,0 +1,5 @@
model-id,model-name,cost,language,repository,task,score,coverage,files-executed,generate-tests-for-file-character-count,processing-time,response-character-count,response-no-error,response-no-excess,response-with-code
custom-nvidia/google/gemma-2-27b-it,gemma-2-27b-it,0,golang,golang/light,write-tests,4095,3650,100,83660,950489,85270,115,115,115
custom-nvidia/google/gemma-2-27b-it,gemma-2-27b-it,0,golang,golang/plain,write-tests,70,50,5,370,7029,440,5,5,5
custom-nvidia/google/gemma-2-27b-it,gemma-2-27b-it,0,java,java/light,write-tests,13812,13360,107,125031,1023420,126411,115,115,115
custom-nvidia/google/gemma-2-27b-it,gemma-2-27b-it,0,java,java/plain,write-tests,70,50,5,940,9753,1000,5,5,5
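
The per-repository rows above can be rolled up per language. The following Python sketch is illustrative only and is not part of the benchmark: `sum_by_language` and `METRICS` are hypothetical names introduced here, and the data is copied verbatim from the four rows above. It assumes every metric column holds an integer and can be added across repositories.

```python
import csv
import io
from collections import defaultdict

# Rows copied from the detailed CSV shown in this diff.
DETAILED_CSV = """\
model-id,model-name,cost,language,repository,task,score,coverage,files-executed,generate-tests-for-file-character-count,processing-time,response-character-count,response-no-error,response-no-excess,response-with-code
custom-nvidia/google/gemma-2-27b-it,gemma-2-27b-it,0,golang,golang/light,write-tests,4095,3650,100,83660,950489,85270,115,115,115
custom-nvidia/google/gemma-2-27b-it,gemma-2-27b-it,0,golang,golang/plain,write-tests,70,50,5,370,7029,440,5,5,5
custom-nvidia/google/gemma-2-27b-it,gemma-2-27b-it,0,java,java/light,write-tests,13812,13360,107,125031,1023420,126411,115,115,115
custom-nvidia/google/gemma-2-27b-it,gemma-2-27b-it,0,java,java/plain,write-tests,70,50,5,940,9753,1000,5,5,5
"""

# Metric columns to add up (assumption: all of them are additive integers).
METRICS = [
    "score",
    "coverage",
    "files-executed",
    "generate-tests-for-file-character-count",
    "processing-time",
    "response-character-count",
    "response-no-error",
    "response-no-excess",
    "response-with-code",
]


def sum_by_language(csv_text):
    """Sum every metric column over all rows that share a language."""
    totals = defaultdict(lambda: dict.fromkeys(METRICS, 0))
    for row in csv.DictReader(io.StringIO(csv_text)):
        for metric in METRICS:
            totals[row["language"]][metric] += int(row[metric])
    return {language: dict(values) for language, values in totals.items()}


totals = sum_by_language(DETAILED_CSV)
for language, values in totals.items():
    print(language, values["score"], values["coverage"], values["files-executed"])
```

For this data the per-language totals equal the first two single-row summaries that appear later in this diff (for example, a `golang` score of 4165 and a `java` score of 13882).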
docs/reports/v0.5.0/gemma-2-27b-it/evaluation.log: 97,515 additions & 0 deletions. Large diffs are not rendered by default.
@@ -0,0 +1,2 @@
model,score,coverage,files-executed,generate-tests-for-file-character-count,processing-time,response-character-count,response-no-error,response-no-excess,response-with-code
custom-nvidia/google/gemma-2-27b-it,4165,3700,105,84030,957518,85710,120,120,120
@@ -0,0 +1,2 @@
model,score,coverage,files-executed,generate-tests-for-file-character-count,processing-time,response-character-count,response-no-error,response-no-excess,response-with-code
custom-nvidia/google/gemma-2-27b-it,13882,13410,112,125971,1033173,127411,120,120,120
@@ -0,0 +1,2 @@
model,score,coverage,files-executed,generate-tests-for-file-character-count,processing-time,response-character-count,response-no-error,response-no-excess,response-with-code
custom-nvidia/google/gemma-2-27b-it,18047,17110,217,210001,1990691,213121,240,240,240
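
The three single-row summaries in this diff appear to be related by addition: each column of the third row equals the sum of the same column in the first two. The Python sketch below checks that. The diff view does not show the file names, so `FIRST_ROW`, `SECOND_ROW`, and `THIRD_ROW` are placeholder names, and `parse` and `add` are hypothetical helpers written for this illustration.

```python
import csv
import io

# Header and rows copied from the three single-row summaries in this diff.
HEADER = (
    "model,score,coverage,files-executed,"
    "generate-tests-for-file-character-count,processing-time,"
    "response-character-count,response-no-error,response-no-excess,"
    "response-with-code"
)
FIRST_ROW = "custom-nvidia/google/gemma-2-27b-it,4165,3700,105,84030,957518,85710,120,120,120"
SECOND_ROW = "custom-nvidia/google/gemma-2-27b-it,13882,13410,112,125971,1033173,127411,120,120,120"
THIRD_ROW = "custom-nvidia/google/gemma-2-27b-it,18047,17110,217,210001,1990691,213121,240,240,240"


def parse(row):
    """Turn one CSV row into a dict of integer metrics, dropping the model name."""
    record = next(csv.DictReader(io.StringIO(HEADER + "\n" + row)))
    return {key: int(value) for key, value in record.items() if key != "model"}


def add(left, right):
    """Add two metric dicts column by column."""
    return {key: left[key] + right[key] for key in left}


first, second, third = parse(FIRST_ROW), parse(SECOND_ROW), parse(THIRD_ROW)

# True when every column of the third row is the sum of the first two.
is_consistent = add(first, second) == third
print(is_consistent)
```

A check like this is a cheap guard when results for several models were evaluated together and split afterwards, as the report's remark describes.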