## [Upcoming]

## [v0.5.2] - 2024-06-17

### Scenarios

| 9 | +- Updated VHELM scenarios for VLMs (#2719, #2684, #2685, #2641, #2691) |
| 10 | +- Updated Image2Struct scenarios (#2608, #2640, #2660, #2661) |
| 11 | +- Added Automatic GPT4V Evaluation for VLM Originality Evaluation |
| 12 | +- Added FinQA scenario (#2588) |
| 13 | +- Added AIR-Bench 2024 (#2698, #2706, #2710, #2712, #2713) |
| 14 | +- Fixed `entity_data_imputation` scenario breakage by mirroring source data files (#2750) |
| 15 | + |
| 16 | +### Models |
| 17 | + |
| 18 | +- Added google-cloud-aiplatform~=1.48 dependency requirement for Vertex AI client (#2628) |
| 19 | +- Fixed bug with Vertex AI client error handling (#2614) |
- Fixed bug with Arctic tokenizer (#2615)
- Added Qwen1.5 110B Chat (#2621)
- Added TogetherCompletionClient (#2629)
- Fixed bugs with Yi Chat and Llama 3 Chat on Together (#2636)
- Added Optimum Intel (#2609, #2674)
- Added GPT-4o model (#2649, #2656)
- Added SEA-LION 7B and SEA-LION 7B Instruct (#2647)
- Added more Gemini 1.5 Flash and Pro versions (#2653, #2664, #2718)
- Added Gemini 1.0 Pro 002 (#2664)
- Added Command R and Command R+ models (#2548)
- Fixed GPT4V Evaluator Out of Option Range Issue (#2677)
- Added OLMo 1.5 (#2671)
- Added RekaClient (#2675)
- Added PaliGemma (#2683)
- Added Mistral 7B Instruct v0.1, v0.2 and v0.3 (#2665)
- Switched most Together chat models to use the chat client (#2703, #2701, #2705)
- Added MedLM model (#2696, #2709)
- Added Typhoon v1.5 models (#2659)
- Changed HuggingFaceClient to truncate end of text token (#2643)
- Added Qwen2 Instruct (72B) (#2722)
- Added Yi Large (#2723, #1731)
- Added Sailor models (#2658)
- Added BioMistral and Meditron (#2728)

### Frontend

- Miscellaneous improvements and bug fixes (#2618, #2617, #2616, #2651, #2667, #2724)

### Framework

- Removed `adapter_spec` from `schema_*.yaml` files (#2611)
- Added support for annotators / LLM-as-judge (#2622, #2700)
- Updated documentation (#2626, #2529, #2521)

### Evaluation Results

- [MMLU v1.2.0](https://crfm.stanford.edu/helm/mmlu/v1.2.0/)
  - Added results for DBRX Instruct, DeepSeek LLM Chat (67B), Gemini 1.5 Pro (0409 preview), Mistral Small (2402), Mistral Large (2402), Arctic Instruct
- [MMLU v1.3.0](https://crfm.stanford.edu/helm/mmlu/v1.3.0/)
  - Added results for Gemini 1.5 Flash (0514 preview), GPT-4o (2024-05-13), Palmyra X V3 (72B)
- [MMLU v1.4.0](https://crfm.stanford.edu/helm/mmlu/v1.4.0/)
  - Added results for Yi Large (Preview), OLMo 1.7 (7B), Command R, Command R Plus, Gemini 1.5 Flash (001), Gemini 1.5 Pro (001), Mistral Instruct v0.3 (7B), GPT-4 Turbo (2024-04-09), Qwen1.5 Chat (110B), Qwen2 Instruct (72B)
- [Image2Struct v1.0.0](https://crfm.stanford.edu/helm/image2struct/v1.0.0/)
  - Initial release with Claude 3 Sonnet (20240229), Claude 3 Opus (20240229), Gemini 1.0 Pro Vision, Gemini 1.5 Pro (0409 preview), IDEFICS 2 (8B), IDEFICS-instruct (9B), IDEFICS-instruct (80B), LLaVA 1.5 (13B), LLaVA 1.6 (13B), GPT-4o (2024-05-13), GPT-4V (1106 preview), Qwen-VL Chat
- [AIR-Bench v1.0.0](https://crfm.stanford.edu/helm/air-bench/v1.0.0/)
  - Initial release with Claude 3 Haiku (20240307), Claude 3 Sonnet (20240229), Claude 3 Opus (20240229), Cohere Command R, Cohere Command R Plus, DBRX Instruct, DeepSeek LLM Chat (67B), Gemini 1.5 Pro (001, default safety), Gemini 1.5 Flash (001, default safety), Llama 3 Instruct (8B), Llama 3 Instruct (70B), Yi Chat (34B), Mistral Instruct v0.3 (7B), Mixtral Instruct (8x7B), Mixtral Instruct (8x22B), GPT-3.5 Turbo (0613), GPT-3.5 Turbo (1106), GPT-3.5 Turbo (0125), GPT-4 Turbo (2024-04-09), GPT-4o (2024-05-13), Qwen1.5 Chat (72B)

### Contributors

Thank you to the following contributors for your work on this HELM release!

- @andyt-cohere
- @bryanzhou008
- @chiheem
- @farzaank
- @ImKeTT
- @JosselinSomervilleRoberts
- @NoushNabi
- @percyliang
- @raileymontalan
- @shakatoday
- @teetone
- @yifanmai

## [v0.5.1] - 2024-05-06

### Scenarios

- Initial release

[upcoming]: https://github.com/stanford-crfm/helm/compare/v0.5.2...HEAD
[v0.5.2]: https://github.com/stanford-crfm/helm/releases/tag/v0.5.2
[v0.5.1]: https://github.com/stanford-crfm/helm/releases/tag/v0.5.1
[v0.5.0]: https://github.com/stanford-crfm/helm/releases/tag/v0.5.0
[v0.4.0]: https://github.com/stanford-crfm/helm/releases/tag/v0.4.0