
Commit faa25f4

[Version] v1.8.0. (#22)
1 parent 275b673 commit faa25f4

File tree

2 files changed: +25 -1 lines changed

CHANGELOG.md (+24)
@@ -1,4 +1,28 @@

# CHANGELOG

# [Version v1.8.0](https://github.com/intel/xFasterTransformer/releases/tag/v1.8.0)

v1.8.0 - Continuous batching on a single ARC GPU and AMX_FP16 support.

## Highlight

- Continuous batching on a single ARC GPU is supported and can be integrated via `vllm-xft` (see the client sketch after this diff).
- Introduced Intel AMX instruction support for the `float16` data type (see the AMX_FP16 sketch after this diff).

## Models

- Support ChatGLM4 series models.
- Introduced BF16/FP16 full-path support for Qwen series models.

## BUG fix

- Fixed a memory leak in the oneDNN primitive cache.
- Fixed an SPR-HBM flat QUAD mode detection issue in the benchmark scripts.
- Fixed a heads-split error in distributed grouped-query attention (GQA).
- Fixed an issue with the invokeAttentionLLaMA API.

# [Version v1.7.3](https://github.com/intel/xFasterTransformer/releases/tag/v1.7.3)

v1.7.3

## BUG fix

- Fixed SHM reduceAdd & RoPE errors when the batch size is large.
- Fixed abnormal usage of the oneDNN primitive cache.

# [Version v1.7.2](https://github.com/intel/xFasterTransformer/releases/tag/v1.7.2)

v1.7.2 - Continuous batching feature supports Qwen 1.0 & hybrid data types.
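The `vllm-xft` integration exposes vLLM's OpenAI-compatible server, so continuous batching is driven from the server side and any OpenAI client can submit concurrent requests. Below is a minimal client sketch, assuming a vllm-xft server is already running at `http://localhost:8000/v1` and serving a model under the name `xft`; the endpoint, model name, and prompts are illustrative assumptions, not taken from this commit.

```python
# Hedged sketch: firing concurrent requests at an assumed vllm-xft
# OpenAI-compatible endpoint; the server interleaves them via continuous batching.
# The base_url "http://localhost:8000/v1" and model name "xft" are assumptions.
from concurrent.futures import ThreadPoolExecutor

from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-needed")

def ask(prompt: str) -> str:
    # Each request is independent; batching happens server-side.
    resp = client.completions.create(model="xft", prompt=prompt, max_tokens=64)
    return resp.choices[0].text

prompts = ["What is Intel AMX?", "Summarize xFasterTransformer.", "Hello!"]
with ThreadPoolExecutor(max_workers=len(prompts)) as pool:
    for answer in pool.map(ask, prompts):
        print(answer.strip())
```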

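For the AMX_FP16 highlight, here is a minimal sketch of selecting the `float16` path through xFasterTransformer's Python `AutoModel` API (the loading pattern shown in the project README). The model and tokenizer paths are hypothetical placeholders, and whether the GEMMs actually land on AMX fp16 tiles depends on the CPU generation.

```python
# Hedged sketch: loading an xFT-converted model with dtype="fp16" so the
# float16 compute path (AMX_FP16 on CPUs that support it) is exercised.
# MODEL_PATH and TOKEN_PATH are hypothetical placeholders.
import xfastertransformer
from transformers import AutoTokenizer

MODEL_PATH = "/data/chatglm4-6b-xft"  # xFT-format weights (assumed path)
TOKEN_PATH = "/data/chatglm4-6b-hf"   # matching HF tokenizer (assumed path)

tokenizer = AutoTokenizer.from_pretrained(TOKEN_PATH, trust_remote_code=True)
model = xfastertransformer.AutoModel.from_pretrained(MODEL_PATH, dtype="fp16")

input_ids = tokenizer("Hello, xFasterTransformer!", return_tensors="pt").input_ids
generated_ids = model.generate(input_ids, max_length=64)
print(tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0])
```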
VERSION (+1 -1)

@@ -1 +1 @@
-1.7.2
+1.8.0

0 commit comments
