SafeLLamaSamplerChainHandle
Namespace: LLama.Native
A chain of sampler stages that can be used to select tokens from logits.
```csharp
public sealed class SafeLLamaSamplerChainHandle : SafeLLamaHandleBase
```
Inheritance Object → CriticalFinalizerObject → SafeHandle → SafeLLamaHandleBase → SafeLLamaSamplerChainHandle
Implements IDisposable
Attributes NullableContextAttribute, NullableAttribute
Remarks:
Wraps a handle returned from llama_sampler_chain_init. Other samplers are owned by this chain and are never directly exposed.
Fields
handle
```csharp
protected IntPtr handle;
```
Properties
Count
Get the number of samplers in this chain
```csharp
public int Count { get; }
```
Property Value
Int32
IsInvalid
```csharp
public override bool IsInvalid { get; }
```
Property Value
Boolean
IsClosed
```csharp
public bool IsClosed { get; }
```
Property Value
Boolean
Constructors
SafeLLamaSamplerChainHandle()
```csharp
public SafeLLamaSamplerChainHandle()
```
Methods
ReleaseHandle()
```csharp
protected override bool ReleaseHandle()
```
Returns
Boolean
Apply(LLamaTokenDataArrayNative&)
Apply this sampler to a set of candidates
```csharp
public void Apply(ref LLamaTokenDataArrayNative candidates)
```
Parameters
candidates
LLamaTokenDataArrayNative&
Sample(SafeLLamaContextHandle, Int32)
Sample and accept a token from the idx-th output of the last evaluation. Shorthand for the apply/accept sequence sketched below.
```csharp
public LLamaToken Sample(SafeLLamaContextHandle context, int index)
```
Parameters
context
SafeLLamaContextHandle
index
Int32
Returns
LLamaToken
Reset()
Reset the state of this sampler
```csharp
public void Reset()
```
Accept(LLamaToken)
Accept a token and update the internal state of this sampler
```csharp
public void Accept(LLamaToken token)
```
Parameters
token
LLamaToken
GetName(Int32)
Get the name of the sampler at the given index
```csharp
public string GetName(int index)
```
Parameters
index
Int32
Returns
String
GetSeed(Int32)
Get the seed of the sampler at the given index, if applicable; returns LLAMA_DEFAULT_SEED otherwise.
```csharp
public uint GetSeed(int index)
```
Parameters
index
Int32
Returns
UInt32
Create(LLamaSamplerChainParams)
Create a new sampler chain
```csharp
public static SafeLLamaSamplerChainHandle Create(LLamaSamplerChainParams @params)
```
Parameters
params
LLamaSamplerChainParams
Returns
SafeLLamaSamplerChainHandle
AddClone(SafeLLamaSamplerChainHandle, Int32)
Clone a sampler stage from another chain and add it to this chain
```csharp
public void AddClone(SafeLLamaSamplerChainHandle src, int index)
```
Parameters
src
SafeLLamaSamplerChainHandle
The chain to clone a stage from
index
Int32
The index of the stage to clone
Remove(Int32)
Remove a sampler stage from this chain
```csharp
public void Remove(int index)
```
Parameters
index
Int32
Exceptions
AddCustom<TSampler>(TSampler)
Add a custom sampler stage
```csharp
public void AddCustom<TSampler>(TSampler sampler)
```
Type Parameters
TSampler
Parameters
sampler
TSampler
AddGreedySampler()
Add a sampler which picks the most likely token.
```csharp
public void AddGreedySampler()
```
AddDistributionSampler(UInt32)
Add a sampler which picks from the probability distribution of all tokens
```csharp
public void AddDistributionSampler(uint seed)
```
Parameters
seed
UInt32
AddMirostat1Sampler(Int32, UInt32, Single, Single, Int32)
Mirostat 1.0 algorithm described in the paper https://arxiv.org/abs/2007.14966. Uses tokens instead of words.
```csharp
public void AddMirostat1Sampler(int vocabCount, uint seed, float tau, float eta, int m)
```
Parameters
vocabCount
Int32
seed
UInt32
tau
Single
The target cross-entropy (or surprise) value you want to achieve for the generated text. A higher value corresponds to more surprising or less predictable text, while a lower value corresponds to less surprising or more predictable text.
eta
Single
The learning rate used to update mu based on the error between the target and observed surprisal of the sampled word. A larger learning rate will cause mu to be updated more quickly, while a smaller learning rate will result in slower updates.
m
Int32
The number of tokens considered in the estimation of s_hat. This is an arbitrary value that is used to calculate s_hat, which in turn helps to calculate the value of k. In the paper, they use m = 100, but you can experiment with different values to see how it affects the performance of the algorithm.
AddMirostat2Sampler(UInt32, Single, Single)
Mirostat 2.0 algorithm described in the paper https://arxiv.org/abs/2007.14966. Uses tokens instead of words.
```csharp
public void AddMirostat2Sampler(uint seed, float tau, float eta)
```
Parameters
seed
UInt32
tau
Single
The target cross-entropy (or surprise) value you want to achieve for the generated text. A higher value corresponds to more surprising or less predictable text, while a lower value corresponds to less surprising or more predictable text.
eta
Single
The learning rate used to update mu based on the error between the target and observed surprisal of the sampled word. A larger learning rate will cause mu to be updated more quickly, while a smaller learning rate will result in slower updates.
AddTopK(Int32)
Top-K sampling described in academic paper "The Curious Case of Neural Text Degeneration" https://arxiv.org/abs/1904.09751
```csharp
public void AddTopK(int k)
```
Parameters
k
Int32
Remarks:
Setting k <= 0 makes this a no-op.
AddTopNSigma(Single)
Top n sigma sampling as described in academic paper "Top-nσ: Not All Logits Are You Need" https://arxiv.org/pdf/2411.07641
```csharp
public void AddTopNSigma(float n)
```
Parameters
n
Single
AddTopP(Single, IntPtr)
Nucleus sampling described in academic paper "The Curious Case of Neural Text Degeneration" https://arxiv.org/abs/1904.09751
```csharp
public void AddTopP(float p, IntPtr minKeep)
```
Parameters
p
Single
minKeep
IntPtr
AddMinP(Single, IntPtr)
Minimum P sampling as described in https://github.com/ggerganov/llama.cpp/pull/3841
```csharp
public void AddMinP(float p, IntPtr minKeep)
```
Parameters
p
Single
minKeep
IntPtr
AddTypical(Single, IntPtr)
Locally Typical Sampling implementation described in the paper https://arxiv.org/abs/2202.00666.
```csharp
public void AddTypical(float p, IntPtr minKeep)
```
Parameters
p
Single
minKeep
IntPtr
AddTemperature(Single)
Apply temperature to the logits. If temperature is less than zero the maximum logit is left unchanged and the rest are set to -infinity
```csharp
public void AddTemperature(float t)
```
Parameters
t
Single
AddDynamicTemperature(Single, Single, Single)
Dynamic temperature implementation (a.k.a. entropy) described in the paper https://arxiv.org/abs/2309.02772.
```csharp
public void AddDynamicTemperature(float t, float delta, float exponent)
```
Parameters
t
Single
delta
Single
exponent
Single
AddXTC(Single, Single, Int32, UInt32)
XTC sampler as described in https://github.com/oobabooga/text-generation-webui/pull/6335
```csharp
public void AddXTC(float p, float t, int minKeep, uint seed)
```
Parameters
p
Single
t
Single
minKeep
Int32
seed
UInt32
AddFillInMiddleInfill(SafeLlamaModelHandle)
This sampler is meant to be used for fill-in-the-middle infilling, after top_k + top_p sampling:
1. if the sum of the EOG probs times the number of candidates is higher than the sum of the other probs -> pick EOG
2. combine probs of tokens that have the same prefix
example:
- before:
"abc": 0.5
"abcd": 0.2
"abcde": 0.1
"dummy": 0.1
- after:
"abc": 0.8
"dummy": 0.1
3. discard non-EOG tokens with low prob
4. if no tokens are left -> pick EOT
```csharp
public void AddFillInMiddleInfill(SafeLlamaModelHandle model)
```
Parameters
model
SafeLlamaModelHandle
AddGrammar(SafeLlamaModelHandle, String, String)
Create a sampler which makes tokens impossible unless they match the grammar.
```csharp
public void AddGrammar(SafeLlamaModelHandle model, string grammar, string root)
```
Parameters
model
SafeLlamaModelHandle
The model that this grammar will be used with
grammar
String
Grammar in GBNF form
root
String
Root rule of the grammar
AddGrammar(Vocabulary, String, String)
Create a sampler which makes tokens impossible unless they match the grammar.
```csharp
public void AddGrammar(Vocabulary vocab, string grammar, string root)
```
Parameters
vocab
Vocabulary
The vocabulary that this grammar will be used with
grammar
String
Grammar in GBNF form
root
String
Root rule of the grammar
AddLazyGrammar(SafeLlamaModelHandle, String, String, ReadOnlySpan<String>, ReadOnlySpan<LLamaToken>)
Create a sampler using lazy grammar sampling: https://github.com/ggerganov/llama.cpp/pull/9639
```csharp
public void AddLazyGrammar(SafeLlamaModelHandle model, string grammar, string root, ReadOnlySpan<string> patterns, ReadOnlySpan<LLamaToken> triggerTokens)
```
Parameters
model
SafeLlamaModelHandle
grammar
String
Grammar in GBNF form
root
String
Root rule of the grammar
patterns
ReadOnlySpan<String>
A list of patterns that will trigger the grammar sampler. Patterns are matched from the start of the generation output, and the grammar sampler will be fed content starting from the first match group.
triggerTokens
ReadOnlySpan<LLamaToken>
A list of tokens that will trigger the grammar sampler. The grammar sampler will be fed content starting from the trigger token, inclusive.
AddPenalties(Int32, Single, Single, Single)
Create a sampler that applies various repetition penalties.
Avoid using on the full vocabulary as searching for repeated tokens can become slow. For example, apply top-k or top-p sampling first.
```csharp
public void AddPenalties(int penaltyCount, float repeat, float freq, float presence)
```
Parameters
penaltyCount
Int32
How many tokens of history to consider when calculating penalties
repeat
Single
Repetition penalty
freq
Single
Frequency penalty
presence
Single
Presence penalty
AddDry(SafeLlamaModelHandle, ReadOnlySpan<String>, Single, Single, Int32, Int32)
DRY sampler, designed by p-e-w, as described in https://github.com/oobabooga/text-generation-webui/pull/5677. Ported from the Koboldcpp implementation authored by pi6am: https://github.com/LostRuins/koboldcpp/pull/982
```csharp
public void AddDry(SafeLlamaModelHandle model, ReadOnlySpan<string> sequenceBreakers, float multiplier, float @base, int allowedLength, int penaltyLastN)
```
Parameters
model
SafeLlamaModelHandle
The model this sampler will be used with
sequenceBreakers
ReadOnlySpan<String>
multiplier
Single
penalty multiplier, 0.0 = disabled
base
Single
exponential base
allowedLength
Int32
repeated sequences longer than this are penalized
penaltyLastN
Int32
how many tokens to scan for repetitions (0 = entire context)
AddLogitBias(Int32, Span<LLamaLogitBias>)
Create a sampler that applies a bias directly to the logits
```csharp
public void AddLogitBias(int vocabSize, Span<LLamaLogitBias> biases)
```
Parameters
vocabSize
Int32
biases
Span<LLamaLogitBias>