LLamaTokenDataArray
Namespace: LLama.Native
Contains an array of LLamaTokenData, potentially sorted.
public struct LLamaTokenDataArray
Inheritance Object → ValueType → LLamaTokenDataArray
Fields
data
The LLamaTokenData entries held by this array.
public Memory<LLamaTokenData> data;
sorted
Indicates if data is sorted by logits in descending order. If this is false, the token data is in no particular order.
public bool sorted;
Constructors
LLamaTokenDataArray(Memory<LLamaTokenData>, Boolean)
Create a new LLamaTokenDataArray
LLamaTokenDataArray(Memory<LLamaTokenData> tokens, bool isSorted)
Parameters
tokens
Memory<LLamaTokenData>
isSorted
Boolean
Methods
Create(ReadOnlySpan<Single>)
Create a new LLamaTokenDataArray, copying the data from the given logits
LLamaTokenDataArray Create(ReadOnlySpan<float> logits)
Parameters
logits
ReadOnlySpan<Single>
Returns
LLamaTokenDataArray
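As a rough illustration of what "copying the data from the given logits" could look like, the sketch below builds one candidate per vocabulary index. It does not call LLamaSharp: the tuple `(Id, Logit, P)` is only a stand-in for LLamaTokenData, and the helper name `FromLogits` and the tuple field names are assumptions made for this example.

```csharp
using System;

// Hypothetical helper: builds one (Id, Logit, P) entry per vocabulary index.
// The tuple stands in for LLamaTokenData; its field names are assumptions.
(int Id, float Logit, float P)[] FromLogits(ReadOnlySpan<float> logits)
{
    var entries = new (int Id, float Logit, float P)[logits.Length];
    for (var i = 0; i < logits.Length; i++)
    {
        // The token id is the index; the probability stays 0 until a softmax runs.
        entries[i] = (i, logits[i], 0f);
    }
    return entries;
}

var demoEntries = FromLogits(new float[] { 1.5f, -0.25f, 3.0f });
Console.WriteLine($"{demoEntries.Length} candidates, first logit {demoEntries[0].Logit}");
```

A freshly built array of this kind is in vocabulary order, which is consistent with the sorted flag being false until a sorting step has run.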
OverwriteLogits(ReadOnlySpan<ValueTuple<LLamaToken, Single>>)
Overwrite the logit values for all given tokens
void OverwriteLogits(ReadOnlySpan<ValueTuple<LLamaToken, float>> values)
Parameters
values
ReadOnlySpan<ValueTuple<LLamaToken, Single>>
tuples of token and logit value to overwrite
ApplyGrammar(SafeLLamaContextHandle, SafeLLamaGrammarHandle)
Apply grammar rules to candidate tokens
void ApplyGrammar(SafeLLamaContextHandle ctx, SafeLLamaGrammarHandle grammar)
Parameters
ctx
SafeLLamaContextHandle
grammar
SafeLLamaGrammarHandle
TopK(SafeLLamaContextHandle, Int32, UInt64)
Top-K sampling, as described in the paper "The Curious Case of Neural Text Degeneration" https://arxiv.org/abs/1904.09751
void TopK(SafeLLamaContextHandle context, int k, ulong minKeep)
Parameters
context
SafeLLamaContextHandle
k
Int32
Number of tokens to keep
minKeep
UInt64
Minimum number to keep
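The following is a minimal standalone sketch of the top-k idea, not the library's implementation: sort candidates by logit and keep the best `k`, but never fewer than `minKeep`. The tuple type and the local function name are assumptions for illustration only.

```csharp
using System;
using System.Linq;

// Hypothetical sketch of top-k filtering over (Id, Logit) candidates.
(int Id, float Logit)[] KeepTopK((int Id, float Logit)[] candidates, int k, int minKeep)
{
    // Keep at least minKeep entries, and never more than are available.
    var keep = Math.Min(Math.Max(k, minKeep), candidates.Length);
    return candidates.OrderByDescending(c => c.Logit).Take(keep).ToArray();
}

var demoCandidates = new (int Id, float Logit)[] { (0, 0.1f), (1, 2.0f), (2, 1.0f) };
var demoTop = KeepTopK(demoCandidates, 2, 1);
Console.WriteLine(string.Join(", ", demoTop.Select(c => c.Id)));
```

Because the result is ordered by logit, a real implementation could also mark the array as sorted after this step.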
TopP(SafeLLamaContextHandle, Single, UInt64)
Nucleus sampling, as described in the paper "The Curious Case of Neural Text Degeneration" https://arxiv.org/abs/1904.09751
void TopP(SafeLLamaContextHandle context, float p, ulong minKeep)
Parameters
context
SafeLLamaContextHandle
p
Single
minKeep
UInt64
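To make the nucleus idea concrete, here is a small sketch under stated assumptions: the probabilities are already normalized and sorted in descending order, and the function only reports how many leading candidates would be kept. The name `NucleusSize` is hypothetical and nothing here calls LLamaSharp.

```csharp
using System;

// Hypothetical sketch: size of the smallest prefix whose cumulative probability
// reaches p, while keeping at least minKeep candidates.
int NucleusSize(float[] sortedProbs, float p, int minKeep)
{
    var cumulative = 0f;
    for (var i = 0; i < sortedProbs.Length; i++)
    {
        cumulative += sortedProbs[i];
        if (cumulative >= p && i + 1 >= minKeep)
            return i + 1;
    }
    // The threshold was never reached, so everything is kept.
    return sortedProbs.Length;
}

var demoSorted = new float[] { 0.5f, 0.3f, 0.15f, 0.05f };
Console.WriteLine(NucleusSize(demoSorted, 0.7f, 1));
```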
MinP(SafeLLamaContextHandle, Single, UInt64)
Minimum P sampling as described in https://github.com/ggerganov/llama.cpp/pull/3841
void MinP(SafeLLamaContextHandle context, float p, ulong minKeep)
Parameters
context
SafeLLamaContextHandle
p
Single
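Minimum probability threshold, scaled by the probability of the most likely token. Tokens whose probability is at least p times that of the most likely token are kept.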
minKeep
UInt64
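A minimal sketch of min-p filtering, assuming the scheme from the linked pull request in which the cutoff is `p` times the highest probability. The function name and the use of a plain `float[]` are assumptions for illustration; this is not the library's code.

```csharp
using System;
using System.Linq;

// Hypothetical sketch: keep probabilities that are at least p * max,
// but never fewer than minKeep of the most likely entries.
float[] FilterMinP(float[] probs, float p, int minKeep)
{
    if (probs.Length == 0) return probs;
    var ordered = probs.OrderByDescending(x => x).ToArray();
    var threshold = p * ordered[0];
    return ordered.TakeWhile((x, i) => x >= threshold || i < minKeep).ToArray();
}

var demoProbs = new float[] { 0.6f, 0.3f, 0.08f, 0.02f };
Console.WriteLine(FilterMinP(demoProbs, 0.2f, 1).Length);
```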
TailFree(SafeLLamaContextHandle, Single, UInt64)
Tail Free Sampling described in https://www.trentonbricken.com/Tail-Free-Sampling/.
void TailFree(SafeLLamaContextHandle context, float z, ulong min_keep)
Parameters
context
SafeLLamaContextHandle
z
Single
min_keep
UInt64
LocallyTypical(SafeLLamaContextHandle, Single, UInt64)
Locally Typical Sampling implementation described in the paper https://arxiv.org/abs/2202.00666.
void LocallyTypical(SafeLLamaContextHandle context, float p, ulong min_keep)
Parameters
context
SafeLLamaContextHandle
p
Single
min_keep
UInt64
RepetitionPenalty(SafeLLamaContextHandle, ReadOnlySpan<LLamaToken>, Single, Single, Single)
Repetition penalty as described in the CTRL paper https://arxiv.org/abs/1909.05858, with a fix for negative logits. Frequency and presence penalties as described in the OpenAI API documentation https://platform.openai.com/docs/api-reference/parameter-details.
void RepetitionPenalty(SafeLLamaContextHandle context, ReadOnlySpan<LLamaToken> last_tokens, float penalty_repeat, float penalty_freq, float penalty_present)
Parameters
context
SafeLLamaContextHandle
last_tokens
ReadOnlySpan<LLamaToken>
penalty_repeat
Single
penalty_freq
Single
penalty_present
Single
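The sketch below shows one common way these three penalties are combined, written as a standalone function. It assumes the scheme used by llama.cpp-style samplers: the repeat penalty divides positive logits and multiplies non-positive ones (the "negative logit fix"), and the frequency and presence penalties are then subtracted. The function name and parameter names are hypothetical.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical sketch: penalise logits of tokens that occur in lastTokens.
float[] Penalize(float[] logits, int[] lastTokens, float repeat, float freq, float present)
{
    // Count how often each recent token occurred.
    var counts = new Dictionary<int, int>();
    foreach (var token in lastTokens)
    {
        counts.TryGetValue(token, out var seen);
        counts[token] = seen + 1;
    }

    var result = (float[])logits.Clone();
    foreach (var pair in counts)
    {
        var id = pair.Key;
        if (id < 0 || id >= result.Length) continue;
        // Negative logit fix: dividing a negative logit would make it more likely.
        result[id] = result[id] <= 0 ? result[id] * repeat : result[id] / repeat;
        // Frequency penalty scales with the count; presence penalty applies once.
        result[id] -= pair.Value * freq + present;
    }
    return result;
}

var demoPenalized = Penalize(new float[] { 2f, -1f, 0.5f }, new[] { 0, 0, 1 }, 2f, 0.1f, 0.5f);
Console.WriteLine(string.Join(", ", demoPenalized));
```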
Guidance(SafeLLamaContextHandle, ReadOnlySpan<Single>, Single)
Apply classifier-free guidance to the logits, as described in the paper "Stay on topic with Classifier-Free Guidance" https://arxiv.org/abs/2306.17806
void Guidance(SafeLLamaContextHandle context, ReadOnlySpan<float> guidanceLogits, float guidance)
Parameters
context
SafeLLamaContextHandle
guidanceLogits
ReadOnlySpan<Single>
Logits extracted from a separate context from the same model.
Other than a negative prompt at the beginning, it should have all generated and user input tokens copied from the main context.
guidance
Single
Guidance strength. 0 means no guidance; higher values apply stronger guidance.
Temperature(SafeLLamaContextHandle, Single)
Sample with temperature. As the temperature increases, the prediction becomes more diverse but also more prone to hallucinations, that is, generated tokens that are plausible but not factual.
void Temperature(SafeLLamaContextHandle context, float temp)
Parameters
context
SafeLLamaContextHandle
temp
Single
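Temperature scaling is commonly implemented by dividing every logit by the temperature before probabilities are computed. The sketch below assumes that convention and a strictly positive `temp`; the function name is hypothetical.

```csharp
using System;

// Hypothetical sketch: divide each logit by the temperature (assumed > 0).
float[] ApplyTemperature(float[] logits, float temp)
{
    var scaled = new float[logits.Length];
    for (var i = 0; i < logits.Length; i++)
        scaled[i] = logits[i] / temp;
    return scaled;
}

// A temperature above 1 shrinks the gaps between logits, flattening the distribution.
var demoScaled = ApplyTemperature(new float[] { 2f, 4f }, 2f);
Console.WriteLine(string.Join(", ", demoScaled));
```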
Softmax(SafeLLamaContextHandle)
Sorts candidate tokens by their logits in descending order and calculates probabilities based on the logits.
void Softmax(SafeLLamaContextHandle context)
Parameters
context
SafeLLamaContextHandle
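A standalone sketch of the two steps described above, sorting and then normalizing, is shown here. The tuple type stands in for LLamaTokenData and the function name is an assumption; subtracting the maximum logit before exponentiating is a standard numerical-stability measure, not something stated in this reference.

```csharp
using System;
using System.Linq;

// Hypothetical sketch: sort by logit (descending), then compute softmax probabilities.
(int Id, float Logit, float P)[] SortAndSoftmax((int Id, float Logit)[] candidates)
{
    if (candidates.Length == 0) return new (int Id, float Logit, float P)[0];
    var ordered = candidates.OrderByDescending(c => c.Logit).ToArray();
    // Subtracting the largest logit keeps exp() from overflowing.
    var maxLogit = ordered[0].Logit;
    var exps = ordered.Select(c => MathF.Exp(c.Logit - maxLogit)).ToArray();
    var sum = exps.Sum();
    return ordered
        .Select((c, i) => (Id: c.Id, Logit: c.Logit, P: exps[i] / sum))
        .ToArray();
}

var demoSoftmax = SortAndSoftmax(new (int Id, float Logit)[] { (0, 1f), (1, 3f) });
Console.WriteLine($"most likely token: {demoSoftmax[0].Id}");
```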
SampleToken(SafeLLamaContextHandle)
Randomly selects a token from the candidates based on their probabilities.
LLamaToken SampleToken(SafeLLamaContextHandle context)
Parameters
context
SafeLLamaContextHandle
Returns
LLamaToken
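Random selection "based on probabilities" is typically done by drawing a uniform number and walking the cumulative distribution. The sketch below illustrates that approach on a plain array; the function name and the use of `System.Random` are assumptions, and it makes no claim about how the native sampler draws its random numbers.

```csharp
using System;

// Hypothetical sketch: pick an index with probability proportional to probs[i].
int SampleIndex(float[] probs, Random rng)
{
    var total = 0f;
    foreach (var p in probs) total += p;

    // Draw a point in [0, total) and find the bucket it falls into.
    var remaining = rng.NextDouble() * total;
    for (var i = 0; i < probs.Length; i++)
    {
        remaining -= probs[i];
        if (remaining < 0) return i;
    }
    return probs.Length - 1; // fallback for floating-point rounding
}

var demoRng = new Random(1234);
Console.WriteLine(SampleIndex(new float[] { 0.1f, 0.7f, 0.2f }, demoRng));
```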
SampleTokenGreedy(SafeLLamaContextHandle)
Selects the token with the highest probability.
LLamaToken SampleTokenGreedy(SafeLLamaContextHandle context)
Parameters
context
SafeLLamaContextHandle
Returns
LLamaToken
SampleTokenMirostat(SafeLLamaContextHandle, Single, Single, Int32, Single&)
Mirostat 1.0 algorithm described in the paper https://arxiv.org/abs/2007.14966. Uses tokens instead of words.
LLamaToken SampleTokenMirostat(SafeLLamaContextHandle context, float tau, float eta, int m, ref float mu)
Parameters
context
SafeLLamaContextHandle
tau
Single
The target cross-entropy (or surprise) value you want to achieve for the generated text. A higher value corresponds to more surprising or less predictable text, while a lower value corresponds to less surprising or more predictable text.
eta
Single
The learning rate used to update mu based on the error between the target and observed surprisal of the sampled word. A larger learning rate will cause mu to be updated more quickly, while a smaller learning rate will result in slower updates.
m
Int32
The number of tokens considered in the estimation of s_hat. This is an arbitrary value that is used to calculate s_hat, which in turn helps to calculate the value of k. In the paper, they use m = 100, but you can experiment with different values to see how it affects the performance of the algorithm.
mu
Single&
Maximum cross-entropy. This value is initialized to be twice the target cross-entropy (2 * tau) and is updated in the algorithm based on the error between the target and observed surprisal.
Returns
LLamaToken
SampleTokenMirostat2(SafeLLamaContextHandle, Single, Single, Single&)
Mirostat 2.0 algorithm described in the paper https://arxiv.org/abs/2007.14966. Uses tokens instead of words.
LLamaToken SampleTokenMirostat2(SafeLLamaContextHandle context, float tau, float eta, ref float mu)
Parameters
context
SafeLLamaContextHandle
tau
Single
The target cross-entropy (or surprise) value you want to achieve for the generated text. A higher value corresponds to more surprising or less predictable text, while a lower value corresponds to less surprising or more predictable text.
eta
Single
The learning rate used to update mu based on the error between the target and observed surprisal of the sampled word. A larger learning rate will cause mu to be updated more quickly, while a smaller learning rate will result in slower updates.
mu
Single&
Maximum cross-entropy. This value is initialized to be twice the target cross-entropy (2 * tau) and is updated in the algorithm based on the error between the target and observed surprisal.
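To show how tau, eta and mu interact, here is a rough standalone sketch of a Mirostat 2.0 style step on a plain probability array. It is an illustration under assumptions, not the library's implementation: surprise is measured as -log2(p), candidates with surprise above mu are dropped (the most likely one is always kept), and mu is then nudged by `eta * (observed - tau)`. The function name, the helper `Surprise`, and the use of `System.Random` are all hypothetical.

```csharp
using System;
using System.Linq;

// Surprise (in bits) of an outcome with probability p.
float Surprise(float p) => -MathF.Log(p) / MathF.Log(2f);

// Hypothetical sketch of one Mirostat 2.0 style sampling step.
int MirostatV2Step(float[] probs, float tau, float eta, ref float mu, Random rng)
{
    var limit = mu; // copy, because a ref parameter cannot be captured by a lambda
    var best = Array.IndexOf(probs, probs.Max());

    // Truncate: drop candidates that are more surprising than mu.
    var kept = Enumerable.Range(0, probs.Length)
        .Where(i => i == best || Surprise(probs[i]) <= limit)
        .ToArray();
    var total = kept.Sum(i => probs[i]);

    // Sample from the renormalised survivors.
    var remaining = rng.NextDouble() * total;
    var chosen = kept[kept.Length - 1];
    foreach (var i in kept)
    {
        remaining -= probs[i];
        if (remaining < 0) { chosen = i; break; }
    }

    // Move mu toward the target surprise tau at learning rate eta.
    var observed = Surprise(probs[chosen] / total);
    mu -= eta * (observed - tau);
    return chosen;
}

var demoTau = 5f;
var demoMu = 2 * demoTau; // initial value suggested in the parameter description
var demoToken = MirostatV2Step(new float[] { 0.7f, 0.2f, 0.1f }, demoTau, 0.1f, ref demoMu, new Random(7));
Console.WriteLine($"token {demoToken}, updated mu {demoMu}");
```

Passing mu by reference lets the caller carry the updated value into the next sampling step, which matches the by-reference mu parameter in the signatures above.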