@@ -10,7 +10,7 @@ This document aims to publish the specific recipes we achieved for the popular L
> - The quantization algorithms are provided by [Intel® Neural Compressor](https://github.com/intel/neural-compressor) and the evaluation functions are provided by [Intel® Extension for Transformers](https://github.com/intel/intel-extension-for-transformers); a minimal usage sketch is shown below the recipe table.
> - The model list is continuously being updated; expect to find more LLMs here in the future.

- ## IPEX key models
+ ## Large Language Models Recipes

| Models | SQ INT8 | WOQ INT8 | WOQ INT4 |
| :-----------------------------: | :-----: | :------: | :------: |
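
As noted above, the quantization algorithms behind these recipes come from Intel® Neural Compressor. The snippet below is a minimal sketch of what an SQ (SmoothQuant) INT8 pass and a WOQ (weight-only) INT4 pass look like with the INC 2.x post-training API; the model choice, calibration text, `alpha`, `bits`, `group_size`, and the simple RTN algorithm are illustrative placeholders rather than the tuned per-model recipes this document publishes.

```python
# Illustrative sketch only, not the published recipes: post-training quantization of an LLM
# with the Intel Neural Compressor 2.x API. All knobs below are placeholder values.
from torch.utils.data import DataLoader
from transformers import AutoModelForCausalLM, AutoTokenizer
from neural_compressor import PostTrainingQuantConfig, quantization

model_name = "EleutherAI/gpt-j-6b"  # any model from the table above
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

# Tiny stand-in calibration loader; the real recipes calibrate on a proper dataset.
ids = tokenizer("A tiny calibration sample.", return_tensors="pt")["input_ids"]
calib_dataloader = DataLoader(ids.repeat(8, 1), batch_size=1)

# SQ INT8: SmoothQuant with a placeholder alpha (the published recipes tune this per model).
sq_conf = PostTrainingQuantConfig(
    recipes={"smooth_quant": True, "smooth_quant_args": {"alpha": 0.5}}
)
sq_model = quantization.fit(model, sq_conf, calib_dataloader=calib_dataloader)
sq_model.save("./saved_sq_int8")

# WOQ INT4: weight-only quantization; RTN is used here for simplicity, while the accuracy
# table below reports GPTQ and AutoRound results.
model = AutoModelForCausalLM.from_pretrained(model_name)  # fresh FP32 copy for the second pass
woq_conf = PostTrainingQuantConfig(
    approach="weight_only",
    op_type_dict={".*": {"weight": {"bits": 4, "group_size": 128, "algorithm": "RTN"}}},
)
woq_model = quantization.fit(model, woq_conf, calib_dataloader=calib_dataloader)
woq_model.save("./saved_woq_int4")
```
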
@@ -38,3 +38,237 @@ This document aims to publish the specific recipes we achieved for the popular L
>
> - This model list comes from [IPEX](https://intel.github.io/intel-extension-for-pytorch/cpu/latest/tutorials/llm.html).
> - The WIP (work-in-progress) recipes will be published soon.
+
+ ## Large Language Models Accuracy
+ <table>
+ <thead>
+ <tr>
+ <th rowspan="3">Model</th>
+ <th colspan="9">lambada_openai</th>
+ </tr>
+ <tr>
+ <th>FP32</th>
+ <th colspan="2">SQ INT8</th>
+ <th colspan="2">WOQ INT8</th>
+ <th colspan="2">WOQ INT4 GPTQ</th>
+ <th colspan="2">WOQ INT4 AutoRound</th>
+ </tr>
+ <tr>
+ <th>ACC</th>
+ <th>ACC</th>
+ <th>Ratio</th>
+ <th>ACC</th>
+ <th>Ratio</th>
+ <th>ACC</th>
+ <th>Ratio</th>
+ <th>ACC</th>
+ <th>Ratio</th>
+ </tr>
+ </thead>
+ <tbody>
+ <tr>
+ <td>baichuan-inc/Baichuan-13B-Chat</td>
+ <td>67.57%</td>
+ <td>68.23%</td>
+ <td>1.0098</td>
+ <td>67.57%</td>
+ <td>1.0000</td>
+ <td>67.84%</td>
+ <td>1.0040</td>
+ <td>NA</td>
+ <td>NA</td>
+ </tr>
+ <tr>
+ <td>baichuan-inc/Baichuan2-13B-Chat</td>
+ <td>71.51%</td>
+ <td>70.89%</td>
+ <td>0.9913</td>
+ <td>71.53%</td>
+ <td>1.0003</td>
+ <td>71.76%</td>
+ <td>1.0035</td>
+ <td>NA</td>
+ <td>NA</td>
+ </tr>
+ <tr>
+ <td>baichuan-inc/Baichuan2-7B-Chat</td>
+ <td>67.67%</td>
+ <td>67.96%</td>
+ <td>1.0043</td>
+ <td>67.59%</td>
+ <td>0.9988</td>
+ <td>67.24%</td>
+ <td>0.9936</td>
+ <td>67.42%</td>
+ <td>0.9963</td>
+ </tr>
+ <tr>
+ <td>bigscience/bloom-1b7</td>
+ <td>46.34%</td>
+ <td>47.99%</td>
+ <td>1.0356</td>
+ <td>46.38%</td>
+ <td>1.0009</td>
+ <td>46.19%</td>
+ <td>0.9968</td>
+ <td>NA</td>
+ <td>NA</td>
+ </tr>
+ <tr>
+ <td>databricks/dolly-v2-12b</td>
+ <td>64.35%</td>
+ <td>NA</td>
+ <td>NA</td>
+ <td>64.10%</td>
+ <td>0.9961</td>
+ <td>NA</td>
+ <td>NA</td>
+ <td>NA</td>
+ <td>NA</td>
+ </tr>
+ <tr>
+ <td>EleutherAI/gpt-j-6b</td>
+ <td>68.31%</td>
+ <td>68.33%</td>
+ <td>1.0003</td>
+ <td>68.23%</td>
+ <td>0.9988</td>
+ <td>68.79%</td>
+ <td>1.0070</td>
+ <td>68.43%</td>
+ <td>1.0018</td>
+ </tr>
+ <tr>
+ <td>EleutherAI/gpt-neox-20b</td>
+ <td>72.33%</td>
+ <td>NA</td>
+ <td>NA</td>
+ <td>72.25%</td>
+ <td>0.9989</td>
+ <td>71.96%</td>
+ <td>0.9949</td>
+ <td>NA</td>
+ <td>NA</td>
+ </tr>
+ <tr>
+ <td>facebook/opt-1.3b</td>
+ <td>57.89%</td>
+ <td>57.54%</td>
+ <td>0.9940</td>
+ <td>58.08%</td>
+ <td>1.0033</td>
+ <td>58.57%</td>
+ <td>1.0117</td>
+ <td>NA</td>
+ <td>NA</td>
+ </tr>
+ <tr>
+ <td>facebook/opt-30b</td>
+ <td>71.49%</td>
+ <td>71.51%</td>
+ <td>1.0003</td>
+ <td>71.51%</td>
+ <td>1.0003</td>
+ <td>71.82%</td>
+ <td>1.0046</td>
+ <td>72.11%</td>
+ <td>1.0087</td>
+ </tr>
+ <tr>
+ <td>meta-llama/Llama-2-13b-hf</td>
+ <td>76.77%</td>
+ <td>76.25%</td>
+ <td>0.9932</td>
+ <td>76.75%</td>
+ <td>0.9997</td>
+ <td>77.43%</td>
+ <td>1.0086</td>
+ <td>76.75%</td>
+ <td>0.9997</td>
+ </tr>
+ <tr>
+ <td>meta-llama/Llama-2-70b-hf</td>
+ <td>79.64%</td>
+ <td>79.55%</td>
+ <td>0.9989</td>
+ <td>79.57%</td>
+ <td>0.9991</td>
+ <td>80.09%</td>
+ <td>1.0057</td>
+ <td>79.97%</td>
+ <td>1.0041</td>
+ </tr>
+ <tr>
+ <td>meta-llama/Llama-2-7b-hf</td>
+ <td>73.92%</td>
+ <td>73.45%</td>
+ <td>0.9936</td>
+ <td>73.96%</td>
+ <td>1.0005</td>
+ <td>73.45%</td>
+ <td>0.9936</td>
+ <td>73.49%</td>
+ <td>0.9942</td>
+ </tr>
+ <tr>
+ <td>mistralai/Mistral-7B-v0.1</td>
+ <td>75.90%</td>
+ <td>NA</td>
+ <td>NA</td>
+ <td>75.80%</td>
+ <td>0.9987</td>
+ <td>76.13%</td>
+ <td>1.0030</td>
+ <td>75.61%</td>
+ <td>0.9962</td>
+ </tr>
+ <tr>
+ <td>THUDM/chatglm2-6b</td>
+ <td>53.23%</td>
+ <td>NA</td>
+ <td>NA</td>
+ <td>53.19%</td>
+ <td>0.9992</td>
+ <td>52.77%</td>
+ <td>0.9914</td>
+ <td>53.35%</td>
+ <td>1.0023</td>
+ </tr>
+ <tr>
+ <td>THUDM/chatglm3-6b</td>
+ <td>59.09%</td>
+ <td>NA</td>
+ <td>NA</td>
+ <td>59.01%</td>
+ <td>0.9986</td>
+ <td>NA</td>
+ <td>NA</td>
+ <td>58.61%</td>
+ <td>0.9919</td>
+ </tr>
+ <tr>
+ <td>tiiuae/falcon-40b</td>
+ <td>77.22%</td>
+ <td>77.04%</td>
+ <td>0.9977</td>
+ <td>77.22%</td>
+ <td>1.0000</td>
+ <td>77.94%</td>
+ <td>1.0093</td>
+ <td>78.79%</td>
+ <td>1.0203</td>
+ </tr>
+ <tr>
+ <td>tiiuae/falcon-7b</td>
+ <td>74.67%</td>
+ <td>76.44%</td>
+ <td>1.0237</td>
+ <td>74.77%</td>
+ <td>1.0013</td>
+ <td>75.00%</td>
+ <td>1.0044</td>
+ <td>NA</td>
+ <td>NA</td>
+ </tr>
+ </tbody>
+ </table>
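
For readers of the table: ACC is lambada_openai accuracy, and each Ratio column is the quantized accuracy divided by the FP32 accuracy of the same model (consistent with the values shown, e.g. 68.33% / 68.31% ≈ 1.0003 for EleutherAI/gpt-j-6b). Below is a trivial sketch of that arithmetic using the GPT-J row; the 1% acceptance threshold is a typical target (Intel Neural Compressor's default relative-accuracy criterion), not something stated by this table.

```python
# Recompute a Ratio cell from the table: ratio = quantized ACC / FP32 ACC.
# Values are taken from the EleutherAI/gpt-j-6b row above.
fp32_acc = 0.6831     # FP32 lambada_openai accuracy
sq_int8_acc = 0.6833  # SQ INT8 lambada_openai accuracy

ratio = sq_int8_acc / fp32_acc
print(f"SQ INT8 ratio: {ratio:.4f}")  # -> 1.0003

# Typical acceptance target: at most a 1% relative accuracy drop, i.e. ratio >= 0.99.
assert ratio >= 0.99
```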