LLamaModelQuantizeParams
Namespace: LLama.Native
Quantizer parameters used in the native API
Inheritance Object → ValueType → LLamaModelQuantizeParams
Remarks:
Corresponds to the native llama.cpp struct llama_model_quantize_params
Fields
nthread
number of threads to use for quantizing; if <= 0, std::thread::hardware_concurrency() will be used
ftype
quantize to this llama_ftype
output_tensor_type
output tensor type
token_embedding_type
token embeddings tensor type
imatrix
pointer to importance matrix data
kv_overrides
pointer to vector containing overrides
tensor_types
pointer to vector containing tensor types
Properties
allow_requantize
allow quantizing non-f32/f16 tensors
Property Value
Boolean
quantize_output_tensor
quantize output.weight
Property Value
Boolean
only_copy
only copy tensors; ftype, allow_requantize and quantize_output_tensor are ignored
Property Value
Boolean
pure
quantize all tensors to the default type
Property Value
Boolean
keep_split
quantize to the same number of shards
Property Value
Boolean
Methods
Default()
Create a LLamaModelQuantizeParams with default values
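A minimal sketch of how these parameters might be configured, assuming only the fields, properties, and Default() method documented on this page; the ftype enum member and the native quantize entry point named in the comments are assumptions and may differ by LLamaSharp version:

```csharp
using LLama.Native;

// Start from the library defaults, then override selected values.
var p = LLamaModelQuantizeParams.Default();

// <= 0 means: use std::thread::hardware_concurrency()
p.nthread = 0;

// Target format; the exact LLamaFtype member name is an assumption here.
// p.ftype = LLamaFtype.LLAMA_FTYPE_MOSTLY_Q4_K_M;

// Boolean flags exposed as properties on this struct.
p.quantize_output_tensor = true;
p.allow_requantize = false;

// The struct would then be passed to the native quantize call; the
// entry point (e.g. something like NativeApi.llama_model_quantize)
// is hypothetical and depends on the LLamaSharp version in use:
// NativeApi.llama_model_quantize("model-f16.gguf", "model-q4.gguf", ref p);
```

Because only_copy causes ftype, allow_requantize and quantize_output_tensor to be ignored, it should be left false when an actual quantization (rather than a tensor copy) is intended.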