Quantization
Quantization significantly accelerates model inference. Since quantizing a model causes only a small reduction in accuracy (performance), feel free to quantize your models!
To quantize a model, call `Quantize` from `LLamaQuantizer`, which is a static method.
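The original code example was lost in extraction; the following is a minimal sketch of the call, assuming an overload of `LLamaQuantizer.Quantize` that takes the source path, the destination path, and the quantization type as a string, and returns a `bool`. The paths and the exact signature here are illustrative and may differ in your LLamaSharp version:

```csharp
using LLama;

string srcFileName = "<Your source model path>"; // path of the model file to quantize
string dstFileName = "<Your destination path>";  // where the quantized model will be written
string ftype = "q4_0";                           // target quantization type

// Quantize is a static method; here it is assumed to return true on success.
if (LLamaQuantizer.Quantize(srcFileName, dstFileName, ftype))
{
    Console.WriteLine("Quantization succeeded!");
}
else
{
    Console.WriteLine("Quantization failed!");
}
```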
After calling it, a quantized model file will be saved.
The following quantization types are currently supported:
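The original listing was lost in extraction. As an illustrative, abridged sketch, the supported types mirror llama.cpp's `llama_ftype` enum; the exact members and numeric values depend on the LLamaSharp and llama.cpp versions you use:

```csharp
public enum LLamaFtype
{
    LLAMA_FTYPE_ALL_F32 = 0,     // no quantization, full 32-bit floats
    LLAMA_FTYPE_MOSTLY_F16 = 1,  // 16-bit floats
    LLAMA_FTYPE_MOSTLY_Q4_0 = 2, // 4-bit quantization
    LLAMA_FTYPE_MOSTLY_Q4_1 = 3,
    LLAMA_FTYPE_MOSTLY_Q8_0 = 7, // 8-bit quantization
    LLAMA_FTYPE_MOSTLY_Q5_0 = 8, // 5-bit quantization
    LLAMA_FTYPE_MOSTLY_Q5_1 = 9,
    // ... further variants (e.g. K-quants) exist in newer versions
}
```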