docs/3x/PT_WeightOnlyQuant.md (+8 -8)
@@ -31,13 +31,13 @@ Theoretically, round-to-nearest (RTN) is the most straightforward way to quantiz
## Supported Matrix

-| Algorithms/Backend | PyTorch eager mode |
+| Algorithms/Backend | PyTorch eager mode |
|--------------|----------|
| RTN |✔|
| GPTQ |✔|
| AutoRound|✔|
| AWQ |✔|
-| TEQ |✔|
+| TEQ |✔|
| HQQ |✔|
> **RTN:** An intuitive quantization method. It requires no additional dataset and is very fast. RTN generally converts weights into a uniformly distributed integer data type, but some algorithms, such as QLoRA, propose a non-uniform NF4 data type and prove its theoretical optimality.
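
As a rough illustration of the RTN note above, the following minimal sketch maps a list of floating-point weights onto a uniform signed integer grid and back. The helper names `rtn_quantize` and `rtn_dequantize` are hypothetical and not part of any library API, and symmetric per-tensor scaling is an assumption made here for brevity; real implementations typically work per channel or per group on tensors.

```python
def rtn_quantize(weights, bits=4):
    """Symmetric per-tensor round-to-nearest quantization (illustrative only)."""
    qmax = 2 ** (bits - 1) - 1   # largest signed level, e.g. 7 for 4 bits
    qmin = -(2 ** (bits - 1))    # smallest signed level, e.g. -8 for 4 bits
    max_abs = max(abs(w) for w in weights)
    # One scale for the whole list; fall back to 1.0 for all-zero input.
    scale = max_abs / qmax if max_abs > 0 else 1.0
    # Round each scaled weight to the nearest integer and clamp to the range.
    # Note: Python's round() resolves exact ties to the nearest even integer.
    q = [min(qmax, max(qmin, round(w / scale))) for w in weights]
    return q, scale


def rtn_dequantize(q, scale):
    """Map integer levels back to approximate floating-point weights."""
    return [v * scale for v in q]


weights = [0.7, -0.3, 0.12, 0.0]
q, scale = rtn_quantize(weights, bits=4)
restored = rtn_dequantize(q, scale)
```

No calibration data is involved: the scale comes from the weights alone, which is why this style of quantization needs no extra dataset. Within the clamping range, each restored weight differs from the original by at most half a quantization step (`scale / 2`).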
@@ -64,8 +64,8 @@ WeightOnlyQuant quantization for PyTorch is using prepare and convert [APIs](./P