SafeLLamaSamplerChainHandle
Namespace: LLama.Native
A chain of sampler stages that can be used to select tokens from logits.
```csharp
public sealed class SafeLLamaSamplerChainHandle : SafeLLamaHandleBase
```
Inheritance Object → CriticalFinalizerObject → SafeHandle → SafeLLamaHandleBase → SafeLLamaSamplerChainHandle
Implements IDisposable
Attributes NullableContextAttribute, NullableAttribute
Remarks:
Wraps a handle returned from llama_sampler_chain_init. Other samplers are owned by this chain and are never directly exposed.
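As a rough sketch of typical usage (values are illustrative, and obtaining default parameters via `LLamaSamplerChainParams.Default()` is an assumption, not something this page documents): create a chain, append filtering stages in the order they should run, and finish with a stage that actually selects a token.

```csharp
using System;
using LLama.Native;

// Build a chain: filtering stages run in the order they are added,
// and the final stage is the one that actually selects a token.
var chainParams = LLamaSamplerChainParams.Default(); // assumed factory for defaults
using var chain = SafeLLamaSamplerChainHandle.Create(chainParams);

chain.AddTopK(40);                 // keep the 40 most likely tokens
chain.AddTopP(0.9f, (IntPtr)1);    // nucleus sampling, keep at least 1 candidate
chain.AddTemperature(0.7f);        // rescale the surviving logits
chain.AddDistributionSampler(42);  // final stage: pick a token, seeded with 42
```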
Fields
handle
```csharp
protected IntPtr handle;
```
Properties
Count
Get the number of samplers in this chain
```csharp
public int Count { get; }
```
Property Value

Int32
IsInvalid
```csharp
public override bool IsInvalid { get; }
```
Property Value

Boolean
IsClosed
```csharp
public bool IsClosed { get; }
```
Property Value

Boolean
Constructors
SafeLLamaSamplerChainHandle()
```csharp
public SafeLLamaSamplerChainHandle()
```
Methods
ReleaseHandle()
```csharp
protected override bool ReleaseHandle()
```
Returns

Boolean
Apply(LLamaTokenDataArrayNative&)
Apply this sampler to a set of candidates
```csharp
public void Apply(ref LLamaTokenDataArrayNative candidates)
```
Parameters
candidates LLamaTokenDataArrayNative&
Sample(SafeLLamaContextHandle, Int32)
Sample and accept a token from the idx-th output of the last evaluation. Shorthand for:
```csharp
var logits = ctx.GetLogitsIth(idx);
var token_data_array = LLamaTokenDataArray.Create(logits);
using var _ = LLamaTokenDataArrayNative.Create(token_data_array, out var native_token_data);
sampler_chain.Apply(ref native_token_data);
var token = native_token_data.Data[native_token_data.Selected];
sampler_chain.Accept(token.Id);
return token.Id;
```

```csharp
public LLamaToken Sample(SafeLLamaContextHandle context, int index)
```
Parameters
context SafeLLamaContextHandle
index Int32
Returns

LLamaToken
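A hedged sketch of how Sample might sit in a generation loop; the evaluation step is elided because it happens outside this class:

```csharp
// Sketch only: `context` is a SafeLLamaContextHandle that has just been
// used to evaluate the model, so it holds fresh logits.
for (var i = 0; i < maxTokens; i++)
{
    // ... run a decode step through `context` here ...

    // Sample from output 0 of the last evaluation. Per the shorthand above,
    // Sample() already Accept()s the token, so no extra bookkeeping is needed.
    LLamaToken token = chain.Sample(context, 0);

    // ... detect end-of-generation and feed `token` back as the next input ...
}
```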
Reset()
Reset the state of this sampler
```csharp
public void Reset()
```
Accept(LLamaToken)
Accept a token and update the internal state of this sampler
```csharp
public void Accept(LLamaToken token)
```
Parameters
token LLamaToken
GetName(Int32)
Get the name of the sampler at the given index
```csharp
public string GetName(int index)
```
Parameters
index Int32
Returns

String
GetSeed(Int32)
Get the seed of the sampler at the given index, if applicable. Returns LLAMA_DEFAULT_SEED otherwise.
```csharp
public uint GetSeed(int index)
```
Parameters
index Int32
Returns

UInt32
Create(LLamaSamplerChainParams)
Create a new sampler chain
```csharp
public static SafeLLamaSamplerChainHandle Create(LLamaSamplerChainParams @params)
```
Parameters
params LLamaSamplerChainParams
Returns

SafeLLamaSamplerChainHandle
AddClone(SafeLLamaSamplerChainHandle, Int32)
Clone a sampler stage from another chain and add it to this chain
```csharp
public void AddClone(SafeLLamaSamplerChainHandle src, int index)
```
Parameters
src SafeLLamaSamplerChainHandle
The chain to clone a stage from
index Int32
The index of the stage to clone
Remove(Int32)
Remove a sampler stage from this chain
```csharp
public void Remove(int index)
```
Parameters
index Int32
Exceptions
AddCustom<TSampler>(TSampler)
Add a custom sampler stage
```csharp
public void AddCustom<TSampler>(TSampler sampler)
    where TSampler : class, ICustomSampler
```
Type Parameters
TSampler
Parameters
sampler TSampler
AddGreedySampler()
Add a sampler which picks the most likely token.
```csharp
public void AddGreedySampler()
```
AddDistributionSampler(UInt32)
Add a sampler which picks from the probability distribution of all tokens
```csharp
public void AddDistributionSampler(uint seed)
```
Parameters
seed UInt32
AddMirostat1Sampler(Int32, UInt32, Single, Single, Int32)
Mirostat 1.0 algorithm described in the paper https://arxiv.org/abs/2007.14966. Uses tokens instead of words.
```csharp
public void AddMirostat1Sampler(int vocabCount, uint seed, float tau, float eta, int m)
```
Parameters
vocabCount Int32
seed UInt32
tau Single
The target cross-entropy (or surprise) value you want to achieve for the generated text. A higher value corresponds to more surprising or less predictable text, while a lower value corresponds to less surprising or more predictable text.
eta Single
The learning rate used to update mu based on the error between the target and observed surprisal of the sampled word. A larger learning rate will cause mu to be updated more quickly, while a smaller learning rate will result in slower updates.
m Int32
The number of tokens considered in the estimation of s_hat. This is an arbitrary value that is used to calculate s_hat, which in turn helps to calculate the value of k. In the paper, they use m = 100, but you can experiment with different values to see how it affects the performance of the algorithm.
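For instance, a Mirostat 1.0 stage might be configured like this; tau and eta are common starting points rather than documented defaults, m = 100 follows the paper, and the vocabulary size shown is a placeholder:

```csharp
chain.AddMirostat1Sampler(
    vocabCount: 32000, // placeholder: must be the model's real vocabulary size
    seed: 42,
    tau: 5.0f,         // target surprise
    eta: 0.1f,         // learning rate for the internal mu estimate
    m: 100);           // tokens used to estimate s_hat, as in the paper
```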
AddMirostat2Sampler(UInt32, Single, Single)
Mirostat 2.0 algorithm described in the paper https://arxiv.org/abs/2007.14966. Uses tokens instead of words.
```csharp
public void AddMirostat2Sampler(uint seed, float tau, float eta)
```
Parameters
seed UInt32
tau Single
The target cross-entropy (or surprise) value you want to achieve for the generated text. A higher value corresponds to more surprising or less predictable text, while a lower value corresponds to less surprising or more predictable text.
eta Single
The learning rate used to update mu based on the error between the target and observed surprisal of the sampled word. A larger learning rate will cause mu to be updated more quickly, while a smaller learning rate will result in slower updates.
AddTopK(Int32)
Top-K sampling described in the academic paper "The Curious Case of Neural Text Degeneration" https://arxiv.org/abs/1904.09751
```csharp
public void AddTopK(int k)
```
Parameters
k Int32
Remarks:
Setting k <= 0 makes this sampler a no-op.
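A small illustration of the remark (values are arbitrary):

```csharp
chain.AddTopK(40); // keep only the 40 highest-probability candidates
chain.AddTopK(0);  // k <= 0: this stage passes candidates through unchanged
```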
AddTopNSigma(Single)
Top-nσ sampling as described in the academic paper "Top-nσ: Not All Logits Are You Need" https://arxiv.org/pdf/2411.07641
```csharp
public void AddTopNSigma(float n)
```
Parameters
n Single
AddTopP(Single, IntPtr)
Nucleus sampling described in the academic paper "The Curious Case of Neural Text Degeneration" https://arxiv.org/abs/1904.09751
```csharp
public void AddTopP(float p, IntPtr minKeep)
```
Parameters
p Single
minKeep IntPtr
AddMinP(Single, IntPtr)
Minimum P sampling as described in https://github.com/ggerganov/llama.cpp/pull/3841
```csharp
public void AddMinP(float p, IntPtr minKeep)
```
Parameters
p Single
minKeep IntPtr
AddTypical(Single, IntPtr)
Locally Typical Sampling implementation described in the paper https://arxiv.org/abs/2202.00666.
```csharp
public void AddTypical(float p, IntPtr minKeep)
```
Parameters
p Single
minKeep IntPtr
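The three truncation stages above share the same shape; a hedged sketch with arbitrary values, where minKeep sets a floor on how many candidates survive each stage:

```csharp
chain.AddTopP(0.95f, (IntPtr)1);   // keep tokens covering 95% of probability mass
chain.AddMinP(0.05f, (IntPtr)1);   // drop tokens under 5% of the top probability
chain.AddTypical(0.9f, (IntPtr)1); // locally typical sampling with p = 0.9
```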
AddTemperature(Single)
Apply temperature to the logits. If temperature is less than zero, the maximum logit is left unchanged and the rest are set to -infinity.
```csharp
public void AddTemperature(float t)
```
Parameters
t Single
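For example (values are illustrative; the second line simply demonstrates the negative-temperature behavior described above):

```csharp
chain.AddTemperature(0.7f); // below 1 sharpens the distribution, above 1 flattens it
chain.AddTemperature(-1f);  // t < 0: only the max logit survives, i.e. greedy selection
```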
AddDynamicTemperature(Single, Single, Single)
Dynamic temperature implementation (a.k.a. entropy) described in the paper https://arxiv.org/abs/2309.02772.
```csharp
public void AddDynamicTemperature(float t, float delta, float exponent)
```
Parameters
t Single
delta Single
exponent Single
AddXTC(Single, Single, Int32, UInt32)
XTC sampler as described in https://github.com/oobabooga/text-generation-webui/pull/6335
```csharp
public void AddXTC(float p, float t, int minKeep, uint seed)
```
Parameters
p Single
t Single
minKeep Int32
seed UInt32
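A hedged configuration sketch, using values similar to those discussed in the linked pull request; the interpretation of p and t in the comments is an assumption:

```csharp
chain.AddXTC(
    p: 0.5f,    // assumed: probability that the exclusion step runs at all
    t: 0.1f,    // assumed: probability threshold above which top tokens may be cut
    minKeep: 1, // always keep at least one candidate
    seed: 42);
```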
AddFillInMiddleInfill(SafeLlamaModelHandle)
This sampler is meant to be used for fill-in-the-middle infilling, after top_k + top_p sampling
1. if the sum of the EOG probs times the number of candidates is higher than the sum of the other probs -> pick EOG
2. combine probs of tokens that have the same prefix
example:
- before:
"abc": 0.5
"abcd": 0.2
"abcde": 0.1
"dummy": 0.1
- after:
"abc": 0.8
"dummy": 0.1
3. discard non-EOG tokens with low prob
4. if no tokens are left -> pick EOT
```csharp
public void AddFillInMiddleInfill(SafeLlamaModelHandle model)
```
Parameters
model SafeLlamaModelHandle
AddGrammar(SafeLlamaModelHandle, String, String)
Create a sampler which makes tokens impossible unless they match the grammar.
```csharp
public void AddGrammar(SafeLlamaModelHandle model, string grammar, string root)
```
Parameters
model SafeLlamaModelHandle
The model that this grammar will be used with
grammar String
root String
Root rule of the grammar
AddGrammar(Vocabulary, String, String)
Create a sampler which makes tokens impossible unless they match the grammar.
```csharp
public void AddGrammar(Vocabulary vocab, string grammar, string root)
```
Parameters
vocab Vocabulary
The vocabulary that this grammar will be used with
grammar String
root String
Root rule of the grammar
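For example, a sampler that only ever permits "yes" or "no" could be added with a one-rule GBNF grammar (the grammar text is illustrative):

```csharp
// Constrain generation to a trivial grammar whose root rule is "root".
const string gbnf = "root ::= \"yes\" | \"no\"";
chain.AddGrammar(vocab, gbnf, "root");
```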
AddLazyGrammar(SafeLlamaModelHandle, String, String, ReadOnlySpan<String>, ReadOnlySpan<LLamaToken>)
Create a sampler using lazy grammar sampling: https://github.com/ggerganov/llama.cpp/pull/9639
```csharp
public void AddLazyGrammar(
    SafeLlamaModelHandle model,
    string grammar, string root,
    ReadOnlySpan<string> patterns,
    ReadOnlySpan<LLamaToken> triggerTokens)
```
Parameters
model SafeLlamaModelHandle
grammar String
Grammar in GBNF form
root String
Root rule of the grammar
patterns ReadOnlySpan<String>
A list of patterns that will trigger the grammar sampler. Patterns are matched from the start of the generation output, and the grammar sampler is fed content starting from the first match group.
triggerTokens ReadOnlySpan<LLamaToken>
A list of tokens that will trigger the grammar sampler. The grammar sampler is fed content starting from (and including) the trigger token.
AddPenalties(Int32, Single, Single, Single)
Create a sampler that applies various repetition penalties.
Avoid using this on the full vocabulary, as searching for repeated tokens can become slow. For example, apply top-k or top-p sampling first.
```csharp
public void AddPenalties(int penaltyCount, float repeat, float freq, float presence)
```
Parameters
penaltyCount Int32
How many tokens of history to consider when calculating penalties
repeat Single
Repetition penalty
freq Single
Frequency penalty
presence Single
Presence penalty
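Following the advice above, a sketch that narrows the candidates before applying penalties; the numeric values are common starting points, not recommendations from this library:

```csharp
chain.AddTopK(40);        // narrow the candidate set first, so the scan stays cheap
chain.AddPenalties(
    penaltyCount: 64,     // consider the last 64 tokens of history
    repeat: 1.1f,         // mild repetition penalty
    freq: 0.0f,           // frequency penalty disabled
    presence: 0.0f);      // presence penalty disabled
```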
AddDry(SafeLlamaModelHandle, ReadOnlySpan<String>, Single, Single, Int32, Int32)
DRY sampler, designed by p-e-w, as described in: https://github.com/oobabooga/text-generation-webui/pull/5677. Ports the Koboldcpp implementation authored by pi6am: https://github.com/LostRuins/koboldcpp/pull/982
```csharp
public void AddDry(
    SafeLlamaModelHandle model,
    ReadOnlySpan<string> sequenceBreakers,
    float multiplier, float @base,
    int allowedLength, int penaltyLastN)
```
Parameters
model SafeLlamaModelHandle
The model this sampler will be used with
sequenceBreakers ReadOnlySpan<String>
multiplier Single
Penalty multiplier (0.0 = disabled)
base Single
Exponential base
allowedLength Int32
Repeated sequences longer than this are penalized
penaltyLastN Int32
How many tokens to scan for repetitions (0 = entire context)
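A hedged sketch; the sequence breakers and numeric values follow common defaults from the linked pull requests rather than anything this page specifies:

```csharp
ReadOnlySpan<string> breakers = new[] { "\n", ":", "\"", "*" };
chain.AddDry(
    model,
    breakers,
    multiplier: 0.8f,  // 0.0 would disable the sampler entirely
    @base: 1.75f,      // exponential base for the penalty
    allowedLength: 2,  // repeats longer than this are penalized
    penaltyLastN: 0);  // 0 = scan the whole context for repetitions
```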
AddLogitBias(Int32, Span<LLamaLogitBias>)
Create a sampler that applies a bias directly to the logits
```csharp
public void AddLogitBias(int vocabSize, Span<LLamaLogitBias> biases)
```
Parameters
vocabSize Int32
biases Span<LLamaLogitBias>
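A sketch of pinning biases onto specific tokens; the LLamaLogitBias field names used here (Token, Bias) are an assumption for illustration:

```csharp
// someToken / otherToken stand in for real LLamaToken values.
var biases = new LLamaLogitBias[]
{
    new() { Token = someToken,  Bias = 5.0f },    // strongly encourage this token
    new() { Token = otherToken, Bias = -100.0f }, // effectively forbid this token
};
chain.AddLogitBias(vocabSize, biases); // vocabSize: the model's vocabulary size
```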