
Commit bceee3f

Release v0.5.2 (#2751)
1 parent 1991502 commit bceee3f

2 files changed (+82, -2 lines)


CHANGELOG.md

+81 -1
@@ -2,6 +2,85 @@

## [Upcoming]

## [v0.5.2] - 2024-06-17

### Scenarios

- Updated VHELM scenarios for VLMs (#2719, #2684, #2685, #2641, #2691)
- Updated Image2Struct scenarios (#2608, #2640, #2660, #2661)
- Added automatic GPT4V evaluation for VLM originality evaluation
- Added FinQA scenario (#2588)
- Added AIR-Bench 2024 (#2698, #2706, #2710, #2712, #2713)
- Fixed `entity_data_imputation` scenario breakage by mirroring source data files (#2750)

### Models

- Added google-cloud-aiplatform~=1.48 dependency requirement for Vertex AI client (#2628)
- Fixed bug with Vertex AI client error handling (#2614)
- Fixed bug with Arctic tokenizer (#2615)
- Added Qwen1.5 110B Chat (#2621)
- Added TogetherCompletionClient (#2629)
- Fixed bugs with Yi Chat and Llama 3 Chat on Together (#2636)
- Added Optimum Intel (#2609, #2674)
- Added GPT-4o model (#2649, #2656)
- Added SEA-LION 7B and SEA-LION 7B Instruct (#2647)
- Added more Gemini 1.5 Flash and Pro versions (#2653, #2664, #2718)
- Added Gemini 1.0 Pro 002 (#2664)
- Added Command R and Command R+ models (#2548)
- Fixed GPT4V evaluator out-of-option-range issue (#2677)
- Added OLMo 1.5 (#2671)
- Added RekaClient (#2675)
- Added PaliGemma (#2683)
- Added Mistral 7B Instruct v0.1, v0.2 and v0.3 (#2665)
- Switched most Together chat models to use the chat client (#2703, #2701, #2705)
- Added MedLM model (#2696, #2709)
- Added Typhoon v1.5 models (#2659)
- Changed HuggingFaceClient to truncate the end-of-text token (#2643)
- Added Qwen2 Instruct (72B) (#2722)
- Added Yi Large (#2723, #1731)
- Added Sailor models (#2658)
- Added BioMistral and Meditron (#2728)

### Frontend

- Miscellaneous improvements and bug fixes (#2618, #2617, #2616, #2651, #2667, #2724)

### Framework

- Removed `adapter_spec` from `schema_*.yaml` files (#2611)
- Added support for annotators / LLM-as-judge (#2622, #2700)
- Updated documentation (#2626, #2529, #2521)

### Evaluation Results

- [MMLU v1.2.0](https://crfm.stanford.edu/helm/mmlu/v1.2.0/)
  - Added results for DBRX Instruct, DeepSeek LLM Chat (67B), Gemini 1.5 Pro (0409 preview), Mistral Small (2402), Mistral Large (2402), Arctic Instruct
- [MMLU v1.3.0](https://crfm.stanford.edu/helm/mmlu/v1.3.0/)
  - Added results for Gemini 1.5 Flash (0514 preview), GPT-4o (2024-05-13), Palmyra X V3 (72B)
- [MMLU v1.4.0](https://crfm.stanford.edu/helm/mmlu/v1.4.0/)
  - Added results for Yi Large (Preview), OLMo 1.7 (7B), Command R, Command R Plus, Gemini 1.5 Flash (001), Gemini 1.5 Pro (001), Mistral Instruct v0.3 (7B), GPT-4 Turbo (2024-04-09), Qwen1.5 Chat (110B), Qwen2 Instruct (72B)
- [Image2Struct v1.0.0](https://crfm.stanford.edu/helm/image2struct/v1.0.0/)
  - Initial release with Claude 3 Sonnet (20240229), Claude 3 Opus (20240229), Gemini 1.0 Pro Vision, Gemini 1.5 Pro (0409 preview), IDEFICS 2 (8B), IDEFICS-instruct (9B), IDEFICS-instruct (80B), LLaVA 1.5 (13B), LLaVA 1.6 (13B), GPT-4o (2024-05-13), GPT-4V (1106 preview), Qwen-VL Chat
- [AIR-Bench v1.0.0](https://crfm.stanford.edu/helm/air-bench/v1.0.0/)
  - Initial release with Claude 3 Haiku (20240307), Claude 3 Sonnet (20240229), Claude 3 Opus (20240229), Cohere Command R, Cohere Command R Plus, DBRX Instruct, DeepSeek LLM Chat (67B), Gemini 1.5 Pro (001, default safety), Gemini 1.5 Flash (001, default safety), Llama 3 Instruct (8B), Llama 3 Instruct (70B), Yi Chat (34B), Mistral Instruct v0.3 (7B), Mixtral Instruct (8x7B), Mixtral Instruct (8x22B), GPT-3.5 Turbo (0613), GPT-3.5 Turbo (1106), GPT-3.5 Turbo (0125), GPT-4 Turbo (2024-04-09), GPT-4o (2024-05-13), Qwen1.5 Chat (72B)

### Contributors

Thank you to the following contributors for your work on this HELM release!

- @andyt-cohere
- @bryanzhou008
- @chiheem
- @farzaank
- @ImKeTT
- @JosselinSomervilleRoberts
- @NoushNabi
- @percyliang
- @raileymontalan
- @shakatoday
- @teetone
- @yifanmai

## [v0.5.1] - 2024-05-06

### Scenarios
@@ -461,7 +540,8 @@ Thank you to the following contributors for your contributions to this HELM release!

- Initial release

-[upcoming]: https://github.com/stanford-crfm/helm/compare/v0.5.1...HEAD
+[upcoming]: https://github.com/stanford-crfm/helm/compare/v0.5.2...HEAD
+[v0.5.2]: https://github.com/stanford-crfm/helm/releases/tag/v0.5.2
[v0.5.1]: https://github.com/stanford-crfm/helm/releases/tag/v0.5.1
[v0.5.0]: https://github.com/stanford-crfm/helm/releases/tag/v0.5.0
[v0.4.0]: https://github.com/stanford-crfm/helm/releases/tag/v0.4.0

setup.cfg

+1 -1
@@ -1,6 +1,6 @@
[metadata]
name = crfm-helm
-version = 0.5.1
+version = 0.5.2
author = Stanford CRFM
author_email = contact-crfm@stanford.edu
description = Benchmark for language models
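For reference, the `version` bump above is what downstream installs pick up as the package version. A minimal sanity-check sketch, assuming `crfm-helm` has been installed or upgraded from PyPI (not part of this commit):

```python
# Reads the installed distribution version of crfm-helm; after upgrading to
# this release it should report the value set in setup.cfg ("0.5.2").
import importlib.metadata

print(importlib.metadata.version("crfm-helm"))  # expected: 0.5.2
```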
