IContextParams
Namespace: LLama.Abstractions
The parameters for initializing a LLama context from a model.
Attributes NullableContextAttribute
Properties
ContextSize
Model context size (n_ctx)
Property Value
BatchSize
Maximum batch size that can be submitted at once (must be >= 32 to use BLAS) (n_batch)
Property Value
UBatchSize
Physical batch size
Property Value
SeqMax
Maximum number of sequences (i.e. distinct states for recurrent models)
Property Value
Embeddings
If true, extract embeddings (together with logits).
Property Value
RopeFrequencyBase
RoPE base frequency (null to fetch from the model)
Property Value
RopeFrequencyScale
RoPE frequency scaling factor (null to fetch from the model)
Property Value
Encoding
The encoding to use for models
Property Value
Threads
Number of threads (null = autodetect) (n_threads)
Property Value
BatchThreads
Number of threads to use for batch processing (null = autodetect) (n_threads_batch)
Property Value
YarnExtrapolationFactor
YaRN extrapolation mix factor (null = from model)
Property Value
YarnAttentionFactor
YaRN magnitude scaling factor (null = from model)
Property Value
YarnBetaFast
YaRN low correction dim (null = from model)
Property Value
YarnBetaSlow
YaRN high correction dim (null = from model)
Property Value
YarnOriginalContext
YaRN original context length (null = from model)
Property Value
YarnScalingType
YaRN scaling method to use.
Property Value
TypeK
Override the type of the K cache
Property Value
TypeV
Override the type of the V cache
Property Value
NoKqvOffload
Whether to disable offloading the KQV cache to the GPU
Property Value
FlashAttention
Whether to use flash attention
Property Value
DefragThreshold
Defragment the KV cache if holes/size > defrag_threshold. Set to a value < 0 to disable (default).
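The threshold rule above can be sketched as a small predicate. This is only an illustration of the stated condition: `DefragSketch` and `ShouldDefragment` are hypothetical names, not part of LLamaSharp, and the `holes` and `size` counts are assumed to be supplied by the caller.

```csharp
using System;

public static class DefragSketch
{
    // Hypothetical helper (not a LLamaSharp API): applies the rule
    // "defragment if holes/size > threshold; a threshold < 0 disables it".
    public static bool ShouldDefragment(int holes, int size, float threshold)
    {
        // A negative threshold means defragmentation is disabled.
        if (threshold < 0f)
            return false;

        // Guard against division by zero for an empty cache (an assumption of this sketch).
        if (size <= 0)
            return false;

        return (float)holes / size > threshold;
    }
}
```

Under these assumptions, a threshold of 0.1 triggers defragmentation for 30 holes in 100 cells (ratio 0.3) but not for 5 holes in 100 cells (ratio 0.05), and any negative threshold never triggers it.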
Property Value
PoolingType
How to pool (sum) embedding results by sequence id (ignored if no pooling layer)
Property Value
AttentionType
Attention type to use for embeddings
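As a rough usage sketch, the properties above are typically populated through a concrete implementation of this interface. The example assumes the LLamaSharp package is referenced, that `LLama.Common.ModelParams` implements `IContextParams` with settable properties in the version in use, and that `model.gguf` is a placeholder path; none of these are stated on this page.

```csharp
using LLama.Abstractions;
using LLama.Common;

// Assumption: "model.gguf" is a placeholder path, not a real file.
var parameters = new ModelParams("model.gguf")
{
    ContextSize = 4096, // n_ctx
    BatchSize = 512,    // n_batch
};

// View the same object through the interface documented on this page.
IContextParams contextParams = parameters;
```

Properties left unset keep the implementation's defaults. For the nullable ones described above, null means the value is taken from the model or autodetected. The resulting object would then be handed to whatever creates the context, for example `LLamaWeights.CreateContext`.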