LLamaModelQuantizeParams
Namespace: LLama.Native
Quantizer parameters used in the native API
public struct LLamaModelQuantizeParams
Inheritance Object → ValueType → LLamaModelQuantizeParams
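Since this is a plain value type, it is typically default-constructed and then populated member by member before being handed to the native quantization call. A minimal orientation sketch (the individual members are documented in the sections below):

```csharp
using LLama.Native;

// Default-constructed; all members start zeroed and are filled in afterwards.
var quantizeParams = new LLamaModelQuantizeParams();
```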
Fields
nthread
Number of threads to use for quantizing; if <= 0, std::thread::hardware_concurrency() will be used.
public int nthread;
ftype
Quantize to this llama_ftype (the target quantization format).
public LLamaFtype ftype;
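Putting the two fields together, a hedged sketch of a typical setup. The LLamaFtype member name used here (LLAMA_FTYPE_MOSTLY_Q4_0) is an assumption and may be named differently in your version of the enum:

```csharp
using System;
using LLama.Native;

var quantizeParams = new LLamaModelQuantizeParams
{
    // Any value <= 0 defers to std::thread::hardware_concurrency() on the native side.
    nthread = Environment.ProcessorCount,

    // Target quantization format; the member name is assumed, check your LLamaFtype enum.
    ftype = LLamaFtype.LLAMA_FTYPE_MOSTLY_Q4_0
};
```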
Properties
allow_requantize
Allow quantizing tensors that are not f32/f16 (i.e. allow re-quantizing already-quantized tensors).
public bool allow_requantize { get; set; }
quantize_output_tensor
Quantize the output.weight tensor.
public bool quantize_output_tensor { get; set; }
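For completeness, a fuller sketch showing all four members together with a call into the native quantization entry point. The NativeApi.llama_model_quantize wrapper name, its signature, and the return-value convention are assumptions based on the underlying llama.cpp function llama_model_quantize, not something this page documents; the file paths and the LLamaFtype member are placeholders:

```csharp
using System;
using LLama.Native;

var quantizeParams = new LLamaModelQuantizeParams
{
    nthread = Environment.ProcessorCount,        // <= 0 would defer to hardware_concurrency()
    ftype = LLamaFtype.LLAMA_FTYPE_MOSTLY_Q4_0   // assumed member name for Q4_0 quantization
};

// Only quantize tensors that are still f32/f16; do not re-quantize already-quantized tensors.
quantizeParams.allow_requantize = false;

// Also quantize the output.weight tensor.
quantizeParams.quantize_output_tensor = true;

// Assumed wrapper around llama.cpp's llama_model_quantize, which returns 0 on success.
// Source and destination paths are placeholders.
var status = NativeApi.llama_model_quantize("model-f16.gguf", "model-q4_0.gguf", ref quantizeParams);
if (status != 0)
    throw new Exception($"Quantization failed with status {status}");
```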