SafeLLamaSamplerChainHandle
Namespace: LLama.Native
A chain of sampler stages that can be used to select tokens from logits.
```csharp
public sealed class SafeLLamaSamplerChainHandle : SafeLLamaHandleBase
```
Inheritance Object → CriticalFinalizerObject → SafeHandle → SafeLLamaHandleBase → SafeLLamaSamplerChainHandle
Implements IDisposable
Attributes NullableContextAttribute, NullableAttribute
Remarks:
Wraps a handle returned from llama_sampler_chain_init. Other samplers are owned by this chain and are never directly exposed.
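As a rough sketch of typical usage (values are illustrative, and obtaining default parameters via `LLamaSamplerChainParams.Default()` is an assumption, not something this page documents): create a chain, append filtering stages in the order they should run, and finish with a stage that actually selects a token.

```csharp
using System;
using LLama.Native;

// Build a chain: filtering stages run in the order they are added,
// and the final stage is the one that actually selects a token.
var chainParams = LLamaSamplerChainParams.Default(); // assumed factory for defaults
using var chain = SafeLLamaSamplerChainHandle.Create(chainParams);

chain.AddTopK(40);                 // keep the 40 most likely tokens
chain.AddTopP(0.9f, (IntPtr)1);    // nucleus sampling, keep at least 1 candidate
chain.AddTemperature(0.7f);        // rescale the surviving logits
chain.AddDistributionSampler(42);  // final stage: pick a token, seeded with 42
```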
Fields
handle
```csharp
protected IntPtr handle;
```
Properties
Count
Get the number of samplers in this chain
```csharp
public int Count { get; }
```
Property Value

Int32
IsInvalid
```csharp
public override bool IsInvalid { get; }
```
Property Value

Boolean
IsClosed
```csharp
public bool IsClosed { get; }
```
Property Value

Boolean
Constructors
SafeLLamaSamplerChainHandle()
```csharp
public SafeLLamaSamplerChainHandle()
```
Methods
ReleaseHandle()
```csharp
protected override bool ReleaseHandle()
```
Returns

Boolean
Apply(LLamaTokenDataArrayNative&)
Apply this sampler to a set of candidates
```csharp
public void Apply(ref LLamaTokenDataArrayNative candidates)
```
Parameters
candidates LLamaTokenDataArrayNative&
Sample(SafeLLamaContextHandle, Int32)
Sample and accept a token from the idx-th output of the last evaluation. Shorthand for:
```csharp
var logits = ctx.GetLogitsIth(idx);
var token_data_array = LLamaTokenDataArray.Create(logits);
using var _ = LLamaTokenDataArrayNative.Create(token_data_array, out var native_token_data);
sampler_chain.Apply(ref native_token_data);
var token = native_token_data.Data[native_token_data.Selected];
sampler_chain.Accept(token.Id);
return token.Id;
```

```csharp
public LLamaToken Sample(SafeLLamaContextHandle context, int index)
```
Parameters
context SafeLLamaContextHandle
index Int32
Returns

LLamaToken
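A hedged sketch of how Sample might sit in a generation loop; the evaluation step is elided because it happens outside this class:

```csharp
// Sketch only: `context` is a SafeLLamaContextHandle that has just been
// used to evaluate the model, so it holds fresh logits.
for (var i = 0; i < maxTokens; i++)
{
    // ... run a decode step through `context` here ...

    // Sample from output 0 of the last evaluation. Per the shorthand above,
    // Sample() already Accept()s the token, so no extra bookkeeping is needed.
    LLamaToken token = chain.Sample(context, 0);

    // ... detect end-of-generation and feed `token` back as the next input ...
}
```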
Reset()
Reset the state of this sampler
```csharp
public void Reset()
```
Accept(LLamaToken)
Accept a token and update the internal state of this sampler
```csharp
public void Accept(LLamaToken token)
```
Parameters
token LLamaToken
GetName(Int32)
Get the name of the sampler at the given index
```csharp
public string GetName(int index)
```
Parameters
index Int32
Returns

String
GetSeed(Int32)
Get the seed of the sampler at the given index, if applicable. Returns LLAMA_DEFAULT_SEED otherwise.
```csharp
public uint GetSeed(int index)
```
Parameters
index Int32
Returns

UInt32
Create(LLamaSamplerChainParams)
Create a new sampler chain
```csharp
public static SafeLLamaSamplerChainHandle Create(LLamaSamplerChainParams @params)
```
Parameters
params LLamaSamplerChainParams
Returns

SafeLLamaSamplerChainHandle
AddClone(SafeLLamaSamplerChainHandle, Int32)
Clone a sampler stage from another chain and add it to this chain
```csharp
public void AddClone(SafeLLamaSamplerChainHandle src, int index)
```
Parameters
src SafeLLamaSamplerChainHandle
The chain to clone a stage from
index Int32
The index of the stage to clone
Remove(Int32)
Remove a sampler stage from this chain
```csharp
public void Remove(int index)
```
Parameters
index Int32
Exceptions
AddCustom<TSampler>(TSampler)
Add a custom sampler stage
```csharp
public void AddCustom<TSampler>(TSampler sampler)
    where TSampler : class, ICustomSampler
```
Type Parameters
TSampler
Parameters
sampler TSampler
AddGreedySampler()
Add a sampler which picks the most likely token.
```csharp
public void AddGreedySampler()
```
AddDistributionSampler(UInt32)
Add a sampler which picks from the probability distribution of all tokens
```csharp
public void AddDistributionSampler(uint seed)
```
Parameters
seed UInt32
AddMirostat1Sampler(Int32, UInt32, Single, Single, Int32)
Mirostat 1.0 algorithm described in the paper https://arxiv.org/abs/2007.14966. Uses tokens instead of words.
```csharp
public void AddMirostat1Sampler(int vocabCount, uint seed, float tau, float eta, int m)
```
Parameters
vocabCount Int32
seed UInt32
tau Single
The target cross-entropy (or surprise) value you want to achieve for the generated text. A higher value corresponds to more surprising or less predictable text, while a lower value corresponds to less surprising or more predictable text.
eta Single
The learning rate used to update mu based on the error between the target and observed surprisal of the sampled word. A larger learning rate will cause mu to be updated more quickly, while a smaller learning rate will result in slower updates.
m Int32
The number of tokens considered in the estimation of s_hat. This is an arbitrary value that is used to calculate s_hat, which in turn helps to calculate the value of k. In the paper, they use m = 100, but you can experiment with different values to see how it affects the performance of the algorithm.
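For instance, a Mirostat 1.0 stage might be configured like this; tau and eta are common starting points rather than documented defaults, m = 100 follows the paper, and the vocabulary size shown is a placeholder:

```csharp
chain.AddMirostat1Sampler(
    vocabCount: 32000, // placeholder: must be the model's real vocabulary size
    seed: 42,
    tau: 5.0f,         // target surprise
    eta: 0.1f,         // learning rate for the internal mu estimate
    m: 100);           // tokens used to estimate s_hat, as in the paper
```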
AddMirostat2Sampler(UInt32, Single, Single)
Mirostat 2.0 algorithm described in the paper https://arxiv.org/abs/2007.14966. Uses tokens instead of words.
```csharp
public void AddMirostat2Sampler(uint seed, float tau, float eta)
```
Parameters
seed UInt32
tau Single
The target cross-entropy (or surprise) value you want to achieve for the generated text. A higher value corresponds to more surprising or less predictable text, while a lower value corresponds to less surprising or more predictable text.
eta Single
The learning rate used to update mu based on the error between the target and observed surprisal of the sampled word. A larger learning rate will cause mu to be updated more quickly, while a smaller learning rate will result in slower updates.
AddTopK(Int32)
Top-K sampling described in the academic paper "The Curious Case of Neural Text Degeneration" https://arxiv.org/abs/1904.09751
```csharp
public void AddTopK(int k)
```
Parameters
k Int32
Remarks:
Setting k <= 0 makes this sampler a no-op.
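A small illustration of the remark (values are arbitrary):

```csharp
chain.AddTopK(40); // keep only the 40 highest-probability candidates
chain.AddTopK(0);  // k <= 0: this stage passes candidates through unchanged
```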
AddTopNSigma(Single)
Top-nσ sampling as described in the academic paper "Top-nσ: Not All Logits Are You Need" https://arxiv.org/pdf/2411.07641
```csharp
public void AddTopNSigma(float n)
```
Parameters
n Single
AddTopP(Single, IntPtr)
Nucleus sampling described in the academic paper "The Curious Case of Neural Text Degeneration" https://arxiv.org/abs/1904.09751
```csharp
public void AddTopP(float p, IntPtr minKeep)
```
Parameters
p Single
minKeep IntPtr
AddMinP(Single, IntPtr)
Minimum P sampling as described in https://github.com/ggerganov/llama.cpp/pull/3841
```csharp
public void AddMinP(float p, IntPtr minKeep)
```
Parameters
p Single
minKeep IntPtr
AddTypical(Single, IntPtr)
Locally Typical Sampling implementation described in the paper https://arxiv.org/abs/2202.00666.
```csharp
public void AddTypical(float p, IntPtr minKeep)
```
Parameters
p Single
minKeep IntPtr
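The three truncation stages above share the same shape; a hedged sketch with arbitrary values, where minKeep sets a floor on how many candidates survive each stage:

```csharp
chain.AddTopP(0.95f, (IntPtr)1);   // keep tokens covering 95% of probability mass
chain.AddMinP(0.05f, (IntPtr)1);   // drop tokens under 5% of the top probability
chain.AddTypical(0.9f, (IntPtr)1); // locally typical sampling with p = 0.9
```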
AddTemperature(Single)
Apply temperature to the logits. If temperature is less than zero, the maximum logit is left unchanged and the rest are set to -infinity.
```csharp
public void AddTemperature(float t)
```
Parameters
t Single
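For example (values are illustrative; the second line simply demonstrates the negative-temperature behavior described above):

```csharp
chain.AddTemperature(0.7f); // below 1 sharpens the distribution, above 1 flattens it
chain.AddTemperature(-1f);  // t < 0: only the max logit survives, i.e. greedy selection
```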
AddDynamicTemperature(Single, Single, Single)
Dynamic temperature implementation (a.k.a. entropy) described in the paper https://arxiv.org/abs/2309.02772.
```csharp
public void AddDynamicTemperature(float t, float delta, float exponent)
```
Parameters
t Single
delta Single
exponent Single
AddXTC(Single, Single, Int32, UInt32)
XTC sampler as described in https://github.com/oobabooga/text-generation-webui/pull/6335
```csharp
public void AddXTC(float p, float t, int minKeep, uint seed)
```
Parameters
p Single
t Single
minKeep Int32
seed UInt32
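A hedged configuration sketch, using values similar to those discussed in the linked pull request; the interpretation of p and t in the comments is an assumption:

```csharp
chain.AddXTC(
    p: 0.5f,    // assumed: probability that the exclusion step runs at all
    t: 0.1f,    // assumed: probability threshold above which top tokens may be cut
    minKeep: 1, // always keep at least one candidate
    seed: 42);
```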
AddFillInMiddleInfill(SafeLlamaModelHandle)
This sampler is meant to be used for fill-in-the-middle infilling, after top_k + top_p sampling
1. if the sum of the EOG probs times the number of candidates is higher than the sum of the other probs -> pick EOG
2. combine probs of tokens that have the same prefix
example:
- before:
"abc": 0.5
"abcd": 0.2
"abcde": 0.1
"dummy": 0.1
- after:
"abc": 0.8
"dummy": 0.1
3. discard non-EOG tokens with low prob
4. if no tokens are left -> pick EOT
```csharp
public void AddFillInMiddleInfill(SafeLlamaModelHandle model)
```
Parameters
model SafeLlamaModelHandle
AddGrammar(SafeLlamaModelHandle, String, String)
Create a sampler which makes tokens impossible unless they match the grammar.
```csharp
public void AddGrammar(SafeLlamaModelHandle model, string grammar, string root)
```
Parameters
model SafeLlamaModelHandle
The model that this grammar will be used with
grammar String
root String
Root rule of the grammar
AddGrammar(Vocabulary, String, String)
Create a sampler which makes tokens impossible unless they match the grammar.
```csharp
public void AddGrammar(Vocabulary vocab, string grammar, string root)
```
Parameters
vocab Vocabulary
The vocabulary that this grammar will be used with
grammar String
root String
Root rule of the grammar
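For example, a sampler that only ever permits "yes" or "no" could be added with a one-rule GBNF grammar (the grammar text is illustrative):

```csharp
// Constrain generation to a trivial grammar whose root rule is "root".
const string gbnf = "root ::= \"yes\" | \"no\"";
chain.AddGrammar(vocab, gbnf, "root");
```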
AddLazyGrammar(SafeLlamaModelHandle, String, String, ReadOnlySpan<String>, ReadOnlySpan<LLamaToken>)
Create a sampler using lazy grammar sampling: https://github.com/ggerganov/llama.cpp/pull/9639
```csharp
public void AddLazyGrammar(
    SafeLlamaModelHandle model,
    string grammar, string root,
    ReadOnlySpan<string> patterns,
    ReadOnlySpan<LLamaToken> triggerTokens)
```
Parameters
model SafeLlamaModelHandle
grammar String
Grammar in GBNF form
root String
Root rule of the grammar
patterns ReadOnlySpan<String>
A list of patterns that will trigger the grammar sampler. Patterns are matched from the start of the generation output, and the grammar sampler is fed content starting from the first match group.
triggerTokens ReadOnlySpan<LLamaToken>
A list of tokens that will trigger the grammar sampler. The grammar sampler is fed content starting from (and including) the trigger token.
AddPenalties(Int32, Single, Single, Single)
Create a sampler that applies various repetition penalties.
Avoid using this on the full vocabulary, as searching for repeated tokens can become slow. For example, apply top-k or top-p sampling first.
```csharp
public void AddPenalties(int penaltyCount, float repeat, float freq, float presence)
```
Parameters
penaltyCount Int32
How many tokens of history to consider when calculating penalties
repeat Single
Repetition penalty
freq Single
Frequency penalty
presence Single
Presence penalty
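Following the advice above, a sketch that narrows the candidates before applying penalties; the numeric values are common starting points, not recommendations from this library:

```csharp
chain.AddTopK(40);        // narrow the candidate set first, so the scan stays cheap
chain.AddPenalties(
    penaltyCount: 64,     // consider the last 64 tokens of history
    repeat: 1.1f,         // mild repetition penalty
    freq: 0.0f,           // frequency penalty disabled
    presence: 0.0f);      // presence penalty disabled
```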
AddDry(SafeLlamaModelHandle, ReadOnlySpan<String>, Single, Single, Int32, Int32)
DRY sampler, designed by p-e-w, as described in: https://github.com/oobabooga/text-generation-webui/pull/5677. Ports the Koboldcpp implementation authored by pi6am: https://github.com/LostRuins/koboldcpp/pull/982
```csharp
public void AddDry(
    SafeLlamaModelHandle model,
    ReadOnlySpan<string> sequenceBreakers,
    float multiplier, float @base,
    int allowedLength, int penaltyLastN)
```
Parameters
model SafeLlamaModelHandle
The model this sampler will be used with
sequenceBreakers ReadOnlySpan<String>
multiplier Single
Penalty multiplier (0.0 = disabled)
base Single
Exponential base
allowedLength Int32
Repeated sequences longer than this are penalized
penaltyLastN Int32
How many tokens to scan for repetitions (0 = entire context)
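A hedged sketch; the sequence breakers and numeric values follow common defaults from the linked pull requests rather than anything this page specifies:

```csharp
ReadOnlySpan<string> breakers = new[] { "\n", ":", "\"", "*" };
chain.AddDry(
    model,
    breakers,
    multiplier: 0.8f,  // 0.0 would disable the sampler entirely
    @base: 1.75f,      // exponential base for the penalty
    allowedLength: 2,  // repeats longer than this are penalized
    penaltyLastN: 0);  // 0 = scan the whole context for repetitions
```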
AddLogitBias(Int32, Span<LLamaLogitBias>)
Create a sampler that applies a bias directly to the logits
```csharp
public void AddLogitBias(int vocabSize, Span<LLamaLogitBias> biases)
```
Parameters
vocabSize Int32
biases Span<LLamaLogitBias>
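A sketch of pinning biases onto specific tokens; the LLamaLogitBias field names used here (Token, Bias) are an assumption for illustration:

```csharp
// someToken / otherToken stand in for real LLamaToken values.
var biases = new LLamaLogitBias[]
{
    new() { Token = someToken,  Bias = 5.0f },    // strongly encourage this token
    new() { Token = otherToken, Bias = -100.0f }, // effectively forbid this token
};
chain.AddLogitBias(vocabSize, biases); // vocabSize: the model's vocabulary size
```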