## [Upcoming]

## [v0.5.2] - 2024-06-17

### Scenarios

| 9 | +- Updated VHELM scenarios for VLMs (#2719, #2684, #2685, #2641, #2691) |
| 10 | +- Updated Image2Struct scenarios (#2608, #2640, #2660, #2661) |
| 11 | +- Added Automatic GPT4V Evaluation for VLM Originality Evaluation |
| 12 | +- Added FinQA scenario (#2588) |
| 13 | +- Added AIR-Bench 2024 (#2698, #2706, #2710, #2712, #2713) |
| 14 | +- Fixed `entity_data_imputation` scenario breakage by mirroring source data files (#2750) |
| 15 | + |
| 16 | +### Models |
| 17 | + |
| 18 | +- Added google-cloud-aiplatform~=1.48 dependency requirement for Vertex AI client (#2628) |
| 19 | +- Fixed bug with Vertex AI client error handling (#2614) |
- Fixed bug with Arctic tokenizer (#2615)
- Added Qwen1.5 110B Chat (#2621)
- Added TogetherCompletionClient (#2629)
- Fixed bugs with Yi Chat and Llama 3 Chat on Together (#2636)
- Added Optimum Intel (#2609, #2674)
- Added GPT-4o model (#2649, #2656)
- Added SEA-LION 7B and SEA-LION 7B Instruct (#2647)
- Added more Gemini 1.5 Flash and Pro versions (#2653, #2664, #2718)
- Added Gemini 1.0 Pro 002 (#2664)
- Added Command R and Command R+ models (#2548)
- Fixed GPT4V Evaluator Out of Option Range Issue (#2677)
- Added OLMo 1.5 (#2671)
- Added RekaClient (#2675)
- Added PaliGemma (#2683)
- Added Mistral 7B Instruct v0.1, v0.2 and v0.3 (#2665)
- Switched most Together chat models to use the chat client (#2703, #2701, #2705)
- Added MedLM model (#2696, #2709)
- Added Typhoon v1.5 models (#2659)
- Changed HuggingFaceClient to truncate end of text token (#2643)
- Added Qwen2 Instruct (72B) (#2722)
- Added Yi Large (#2723, #1731)
- Added Sailor models (#2658)
- Added BioMistral and Meditron (#2728)

### Frontend

- Miscellaneous improvements and bug fixes (#2618, #2617, #2616, #2651, #2667, #2724)

### Framework

- Removed `adapter_spec` from `schema_*.yaml` files (#2611)
- Added support for annotators / LLM-as-judge (#2622, #2700)
- Updated documentation (#2626, #2529, #2521)

### Evaluation Results

- [MMLU v1.2.0](https://crfm.stanford.edu/helm/mmlu/v1.2.0/)
  - Added results for DBRX Instruct, DeepSeek LLM Chat (67B), Gemini 1.5 Pro (0409 preview), Mistral Small (2402), Mistral Large (2402), Arctic Instruct
- [MMLU v1.3.0](https://crfm.stanford.edu/helm/mmlu/v1.3.0/)
  - Added results for Gemini 1.5 Flash (0514 preview), GPT-4o (2024-05-13), Palmyra X V3 (72B)
- [MMLU v1.4.0](https://crfm.stanford.edu/helm/mmlu/v1.4.0/)
  - Added results for Yi Large (Preview), OLMo 1.7 (7B), Command R, Command R Plus, Gemini 1.5 Flash (001), Gemini 1.5 Pro (001), Mistral Instruct v0.3 (7B), GPT-4 Turbo (2024-04-09), Qwen1.5 Chat (110B), Qwen2 Instruct (72B)
- [Image2Struct v1.0.0](https://crfm.stanford.edu/helm/image2struct/v1.0.0/)
  - Initial release with Claude 3 Sonnet (20240229), Claude 3 Opus (20240229), Gemini 1.0 Pro Vision, Gemini 1.5 Pro (0409 preview), IDEFICS 2 (8B), IDEFICS-instruct (9B), IDEFICS-instruct (80B), LLaVA 1.5 (13B), LLaVA 1.6 (13B), GPT-4o (2024-05-13), GPT-4V (1106 preview), Qwen-VL Chat
- [AIR-Bench v1.0.0](https://crfm.stanford.edu/helm/air-bench/v1.0.0/)
  - Initial release with Claude 3 Haiku (20240307), Claude 3 Sonnet (20240229), Claude 3 Opus (20240229), Cohere Command R, Cohere Command R Plus, DBRX Instruct, DeepSeek LLM Chat (67B), Gemini 1.5 Pro (001, default safety), Gemini 1.5 Flash (001, default safety), Llama 3 Instruct (8B), Llama 3 Instruct (70B), Yi Chat (34B), Mistral Instruct v0.3 (7B), Mixtral Instruct (8x7B), Mixtral Instruct (8x22B), GPT-3.5 Turbo (0613), GPT-3.5 Turbo (1106), GPT-3.5 Turbo (0125), GPT-4 Turbo (2024-04-09), GPT-4o (2024-05-13), Qwen1.5 Chat (72B)

### Contributors

Thank you to the following contributors for your work on this HELM release!

- @andyt-cohere
- @bryanzhou008
- @chiheem
- @farzaank
- @ImKeTT
- @JosselinSomervilleRoberts
- @NoushNabi
- @percyliang
- @raileymontalan
- @shakatoday
- @teetone
- @yifanmai

## [v0.5.1] - 2024-05-06

### Scenarios

- Initial release

[upcoming]: https://github.com/stanford-crfm/helm/compare/v0.5.2...HEAD
[v0.5.2]: https://github.com/stanford-crfm/helm/releases/tag/v0.5.2
[v0.5.1]: https://github.com/stanford-crfm/helm/releases/tag/v0.5.1
[v0.5.0]: https://github.com/stanford-crfm/helm/releases/tag/v0.5.0
[v0.4.0]: https://github.com/stanford-crfm/helm/releases/tag/v0.4.0