2 files changed: +25 −1 lines changed
# CHANGELOG
+ # [Version v1.8.0](https://github.com/intel/xFasterTransformer/releases/tag/v1.8.0)
+ v1.8.0 - Continuous batching on a single ARC GPU and AMX_FP16 support.
+
+ ## Highlight
+ - Continuous batching on a single ARC GPU is supported and can be integrated via `vllm-xft` (usage sketch below).
+ - Introduce Intel AMX instruction support for the `float16` data type.
+
+ ## Models
+ - Support ChatGLM4 series models.
+ - Introduce BF16/FP16 full-path support for Qwen series models (loading sketch below).
+
+ ## BUG fix
+ - Fixed a memory leak in the oneDNN primitive cache.
+ - Fixed the SPR-HBM flat QUAD mode detection issue in the benchmark scripts.
+ - Fixed a heads-split error in distributed grouped-query attention (GQA).
+ - Fixed an issue with the invokeAttentionLLaMA API.
+
+ # [Version v1.7.3](https://github.com/intel/xFasterTransformer/releases/tag/v1.7.3)
+ v1.7.3
+
+ ## BUG fix
+ - Fixed SHM reduceAdd & RoPE errors when the batch size is large.
+ - Fixed abnormal usage of the oneDNN primitive cache.
+
# [Version v1.7.2](https://github.com/intel/xFasterTransformer/releases/tag/v1.7.2)
v1.7.2 - Continuous batching feature supports Qwen 1.0 & hybrid data types.

- 1.7.2
+ 1.8.0
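
The continuous-batching highlight above is surfaced through the `vllm-xft` package. Assuming the standard vLLM offline API carries over to that fork, the sketch below shows how a batch of prompts would be served with continuous batching; it is untested, the model and tokenizer paths are hypothetical, the `dtype` string may be spelled differently in `vllm-xft`, and selecting a single ARC GPU is not shown, so the vllm-xft documentation is the authority on the exact arguments.

```python
# Minimal, untested sketch: continuous batching through the vLLM-style offline
# API that the vllm-xft package exposes. Paths and the dtype string are
# placeholders; see the vllm-xft docs for accepted values and for how to
# target a single ARC GPU.
from vllm import LLM, SamplingParams

prompts = [
    "What is Intel AMX?",
    "Summarize continuous batching in one sentence.",
]
sampling_params = SamplingParams(temperature=0.8, top_p=0.95, max_tokens=128)

# `model` points at a model converted to xFT format, `tokenizer` at the
# original Hugging Face directory (assumption based on the usual xFT layout).
llm = LLM(
    model="/data/models/qwen2-7b-xft",     # hypothetical path
    tokenizer="/data/models/qwen2-7b-hf",  # hypothetical path
    trust_remote_code=True,
    dtype="bfloat16",                      # fp16/bf16 naming may differ in vllm-xft
)

# The engine batches requests continuously; generate() returns once all
# prompts have finished.
for output in llm.generate(prompts, sampling_params):
    print(output.outputs[0].text)
```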
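The AMX_FP16 highlight and the Qwen BF16/FP16 item both come down to the data type chosen when a model is loaded. Below is a minimal, untested sketch using xFasterTransformer's Python `AutoModel` interface; the paths are hypothetical and the accepted `dtype` strings should be verified against the xFT documentation.

```python
# Minimal, untested sketch: loading a converted Qwen model on the float16 path
# via xFasterTransformer's Python API. Paths are placeholders.
import xfastertransformer
from transformers import AutoTokenizer

MODEL_PATH = "/data/models/qwen2-7b-xft"  # hypothetical: model converted to xFT format
TOKEN_PATH = "/data/models/qwen2-7b-hf"   # hypothetical: original HF checkpoint for the tokenizer

tokenizer = AutoTokenizer.from_pretrained(TOKEN_PATH, trust_remote_code=True)

# dtype="fp16" selects the float16 path; on CPUs whose AMX units support FP16
# tiles, the GEMMs can use the AMX_FP16 support added in v1.8.0.
model = xfastertransformer.AutoModel.from_pretrained(MODEL_PATH, dtype="fp16")

input_ids = tokenizer("What is continuous batching?", return_tensors="pt").input_ids
generated_ids = model.generate(input_ids, max_length=128)
print(tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0])
```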