LLamaTokenDataArray

Namespace: LLama.Native

Contains an array of LLamaTokenData, potentially sorted.

public struct LLamaTokenDataArray

Inheritance Object → ValueType → LLamaTokenDataArray

Fields

data

The array of LLamaTokenData

public Memory<LLamaTokenData> data;

sorted

Indicates whether the data is sorted by logits in descending order. If this is false, the token data is in no particular order.

public bool sorted;

Constructors

LLamaTokenDataArray(Memory<LLamaTokenData>, Boolean)

Create a new LLamaTokenDataArray

LLamaTokenDataArray(Memory<LLamaTokenData> tokens, bool isSorted)

Parameters

tokens Memory<LLamaTokenData>

isSorted Boolean

Methods

Create(ReadOnlySpan<Single>)

Create a new LLamaTokenDataArray, copying the data from the given logits

LLamaTokenDataArray Create(ReadOnlySpan<float> logits)

Parameters

logits ReadOnlySpan<Single>

Returns

LLamaTokenDataArray
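
For example, a minimal sketch of building a candidate array. Here logits is assumed to be the per-token logits produced by a decode step; the examples for the methods below continue from this candidates variable.

// Copy the raw logits for the whole vocabulary into a fresh, unsorted
// candidate array. Create copies the data, so the source span can be
// reused afterwards.
var candidates = LLamaTokenDataArray.Create(logits);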

OverwriteLogits(ReadOnlySpan<ValueTuple<LLamaToken, Single>>)

Overwrite the logit values for all given tokens

void OverwriteLogits(ReadOnlySpan<ValueTuple<LLamaToken, float>> values)

Parameters

values ReadOnlySpan<ValueTuple<LLamaToken, Single>>
tuples of token and logit value to overwrite
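
A sketch of banning and boosting individual tokens before sampling; bannedToken and boostedToken are hypothetical placeholder LLamaToken values, and 10f is an illustrative logit.

// Overwrite the logits of specific tokens in the candidates array.
candidates.OverwriteLogits(new (LLamaToken, float)[]
{
    (bannedToken, float.NegativeInfinity), // effectively never sampled
    (boostedToken, 10f),                   // strongly favoured
});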

ApplyGrammar(SafeLLamaContextHandle, SafeLLamaGrammarHandle)

Apply grammar rules to candidate tokens

void ApplyGrammar(SafeLLamaContextHandle ctx, SafeLLamaGrammarHandle grammar)

Parameters

ctx SafeLLamaContextHandle

grammar SafeLLamaGrammarHandle
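
A sketch, assuming ctx and grammar handles were created elsewhere. Applying the grammar before the other samplers ensures tokens the grammar forbids are filtered out first.

// Reject candidates that the grammar's current parse state does not allow.
candidates.ApplyGrammar(ctx, grammar);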

TopK(SafeLLamaContextHandle, Int32, UInt64)

Top-K sampling described in the academic paper "The Curious Case of Neural Text Degeneration" https://arxiv.org/abs/1904.09751

void TopK(SafeLLamaContextHandle context, int k, ulong minKeep)

Parameters

context SafeLLamaContextHandle

k Int32
Number of tokens to keep

minKeep UInt64
Minimum number to keep
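
Continuing the candidates sketch from Create above (40 is an illustrative value, not a recommendation):

// Keep only the 40 highest-logit candidates, but never fewer than 1.
candidates.TopK(ctx, 40, 1);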

TopP(SafeLLamaContextHandle, Single, UInt64)

Nucleus sampling described in the academic paper "The Curious Case of Neural Text Degeneration" https://arxiv.org/abs/1904.09751

void TopP(SafeLLamaContextHandle context, float p, ulong minKeep)

Parameters

context SafeLLamaContextHandle

p Single

minKeep UInt64
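
For example (0.95 is illustrative):

// Keep the smallest set of candidates whose cumulative probability
// reaches 0.95, but always keep at least 1.
candidates.TopP(ctx, 0.95f, 1);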

MinP(SafeLLamaContextHandle, Single, UInt64)

Minimum P sampling as described in https://github.com/ggerganov/llama.cpp/pull/3841

void MinP(SafeLLamaContextHandle context, float p, ulong minKeep)

Parameters

context SafeLLamaContextHandle

p Single
All tokens with a probability of at least this fraction of the most probable token's probability will be kept

minKeep UInt64
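
For example (0.05 is illustrative):

// Drop candidates whose probability is below 5% of the most probable
// token's probability, but always keep at least 1.
candidates.MinP(ctx, 0.05f, 1);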

TailFree(SafeLLamaContextHandle, Single, UInt64)

Tail Free Sampling described in https://www.trentonbricken.com/Tail-Free-Sampling/.

void TailFree(SafeLLamaContextHandle context, float z, ulong min_keep)

Parameters

context SafeLLamaContextHandle

z Single

min_keep UInt64
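
For example (0.95 is illustrative; a z of 1.0 or above leaves the candidates untouched):

// Trim the low-probability tail as described in the linked article.
candidates.TailFree(ctx, 0.95f, 1);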

LocallyTypical(SafeLLamaContextHandle, Single, UInt64)

Locally Typical Sampling implementation described in the paper https://arxiv.org/abs/2202.00666.

void LocallyTypical(SafeLLamaContextHandle context, float p, ulong min_keep)

Parameters

context SafeLLamaContextHandle

p Single

min_keep UInt64
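
For example (0.9 is illustrative; a p of 1.0 or above leaves the candidates untouched):

// Keep the tokens whose information content is closest to the expected
// information content of the distribution.
candidates.LocallyTypical(ctx, 0.9f, 1);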

RepetitionPenalty(SafeLLamaContextHandle, ReadOnlySpan<LLamaToken>, Single, Single, Single)

Repetition penalty described in CTRL academic paper https://arxiv.org/abs/1909.05858, with negative logit fix. Frequency and presence penalties described in OpenAI API https://platform.openai.com/docs/api-reference/parameter-details.

void RepetitionPenalty(SafeLLamaContextHandle context, ReadOnlySpan<LLamaToken> last_tokens, float penalty_repeat, float penalty_freq, float penalty_present)

Parameters

context SafeLLamaContextHandle

last_tokens ReadOnlySpan<LLamaToken>

penalty_repeat Single

penalty_freq Single

penalty_present Single
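
A sketch, assuming lastTokens is a ReadOnlySpan<LLamaToken> of recently generated tokens; the penalty values are illustrative, not recommended defaults.

// penalty_repeat > 1 discourages tokens already present in lastTokens;
// penalty_freq scales with how often a token occurred;
// penalty_present is a flat penalty for any prior occurrence.
candidates.RepetitionPenalty(ctx, lastTokens, 1.1f, 0.0f, 0.0f);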

Guidance(SafeLLamaContextHandle, ReadOnlySpan<Single>, Single)

Apply classifier-free guidance to the logits as described in the academic paper "Stay on topic with Classifier-Free Guidance" https://arxiv.org/abs/2306.17806

void Guidance(SafeLLamaContextHandle context, ReadOnlySpan<float> guidanceLogits, float guidance)

Parameters

context SafeLLamaContextHandle

guidanceLogits ReadOnlySpan<Single>
Logits extracted from a separate context from the same model. Other than a negative prompt at the beginning, it should have all generated and user input tokens copied from the main context.

guidance Single
Guidance strength. 0 means no guidance; higher values apply stronger guidance
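
A sketch, assuming guidanceLogits was extracted from a second context prepared as described above (1.5 is an illustrative strength):

// A strength of 0 leaves the candidates unchanged; higher values steer
// the output more strongly away from the guidance context's predictions.
candidates.Guidance(ctx, guidanceLogits, 1.5f);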

Temperature(SafeLLamaContextHandle, Single)

Sample with temperature. As temperature increases, the prediction becomes more diverse but also more vulnerable to hallucinations (generating tokens that are sensible but not factual)

void Temperature(SafeLLamaContextHandle context, float temp)

Parameters

context SafeLLamaContextHandle

temp Single
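
For example (0.8 is illustrative):

// Values below 1 sharpen the distribution (more deterministic output);
// values above 1 flatten it (more diverse output).
candidates.Temperature(ctx, 0.8f);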

Softmax(SafeLLamaContextHandle)

Sorts candidate tokens by their logits in descending order and calculates probabilities based on them.

void Softmax(SafeLLamaContextHandle context)

Parameters

context SafeLLamaContextHandle
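
For example:

// After this call the candidates are sorted by logit, descending, so
// the most likely candidate sits at index 0 with its probability set.
candidates.Softmax(ctx);
var mostLikely = candidates.data.Span[0];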

SampleToken(SafeLLamaContextHandle)

Randomly selects a token from the candidates based on their probabilities.

LLamaToken SampleToken(SafeLLamaContextHandle context)

Parameters

context SafeLLamaContextHandle

Returns

LLamaToken
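
Putting the pieces together, a minimal end-to-end sketch of one sampling step. The filter values are illustrative, and ctx, logits, and lastTokens are assumed to come from the surrounding decode loop.

using System;
using LLama.Native;

static LLamaToken SampleNext(
    SafeLLamaContextHandle ctx,
    ReadOnlySpan<float> logits,
    ReadOnlySpan<LLamaToken> lastTokens)
{
    var candidates = LLamaTokenDataArray.Create(logits);

    // Filter and reshape the distribution, then draw a token at random
    // according to the remaining candidates' probabilities.
    candidates.RepetitionPenalty(ctx, lastTokens, 1.1f, 0.0f, 0.0f);
    candidates.TopK(ctx, 40, 1);
    candidates.TopP(ctx, 0.95f, 1);
    candidates.Temperature(ctx, 0.8f);

    return candidates.SampleToken(ctx);
}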

SampleTokenGreedy(SafeLLamaContextHandle)

Selects the token with the highest probability.

LLamaToken SampleTokenGreedy(SafeLLamaContextHandle context)

Parameters

context SafeLLamaContextHandle

Returns

LLamaToken
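
For example, as a deterministic alternative to SampleToken:

// Always pick the single most probable candidate.
LLamaToken token = candidates.SampleTokenGreedy(ctx);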

SampleTokenMirostat(SafeLLamaContextHandle, Single, Single, Int32, Single&)

Mirostat 1.0 algorithm described in the paper https://arxiv.org/abs/2007.14966. Uses tokens instead of words.

LLamaToken SampleTokenMirostat(SafeLLamaContextHandle context, float tau, float eta, int m, Single& mu)

Parameters

context SafeLLamaContextHandle

tau Single
The target cross-entropy (or surprise) value you want to achieve for the generated text. A higher value corresponds to more surprising or less predictable text, while a lower value corresponds to less surprising or more predictable text.

eta Single
The learning rate used to update mu based on the error between the target and observed surprisal of the sampled word. A larger learning rate will cause mu to be updated more quickly, while a smaller learning rate will result in slower updates.

m Int32
The number of tokens considered in the estimation of s_hat. This is an arbitrary value that is used to calculate s_hat, which in turn helps to calculate the value of k. In the paper, they use m = 100, but you can experiment with different values to see how it affects the performance of the algorithm.

mu Single&
Maximum cross-entropy. This value is initialized to be twice the target cross-entropy (2 * tau) and is updated in the algorithm based on the error between the target and observed surprisal.

Returns

LLamaToken
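
A sketch; the tau and eta values are illustrative, m = 100 follows the paper's suggestion quoted above, and mu must persist across sampling steps:

float tau = 5.0f;
float eta = 0.1f;
float mu = 2.0f * tau; // initialise to 2 * tau, as described above

LLamaToken token = candidates.SampleTokenMirostat(ctx, tau, eta, 100, ref mu);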

SampleTokenMirostat2(SafeLLamaContextHandle, Single, Single, Single&)

Mirostat 2.0 algorithm described in the paper https://arxiv.org/abs/2007.14966. Uses tokens instead of words.

LLamaToken SampleTokenMirostat2(SafeLLamaContextHandle context, float tau, float eta, Single& mu)

Parameters

context SafeLLamaContextHandle

tau Single
The target cross-entropy (or surprise) value you want to achieve for the generated text. A higher value corresponds to more surprising or less predictable text, while a lower value corresponds to less surprising or more predictable text.

eta Single
The learning rate used to update mu based on the error between the target and observed surprisal of the sampled word. A larger learning rate will cause mu to be updated more quickly, while a smaller learning rate will result in slower updates.

mu Single&
Maximum cross-entropy. This value is initialized to be twice the target cross-entropy (2 * tau) and is updated in the algorithm based on the error between the target and observed surprisal.

Returns

LLamaToken
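
A sketch, analogous to Mirostat 1.0 but without the m parameter; mu must again persist across sampling steps (tau and eta are illustrative):

float tau = 5.0f;
float eta = 0.1f;
float mu = 2.0f * tau; // initialised to 2 * tau, updated by each call

LLamaToken token = candidates.SampleTokenMirostat2(ctx, tau, eta, ref mu);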