LLamaTokenDataArray
Namespace: LLama.Native
Contains an array of LLamaTokenData, potentially sorted.
Inheritance Object → ValueType → LLamaTokenDataArray
Fields
data
The LLamaTokenData
sorted
Indicates if data is sorted by logits in descending order. If this is false, the token data is in no particular order.
Constructors
LLamaTokenDataArray(Memory<LLamaTokenData>, Boolean)
Create a new LLamaTokenDataArray
Parameters
tokens
Memory<LLamaTokenData>
isSorted
Boolean
Methods
Create(ReadOnlySpan<Single>)
Create a new LLamaTokenDataArray, copying the data from the given logits
Parameters
logits
ReadOnlySpan<Single>
Returns
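As a language-neutral illustration of "copying the data from the given logits", the sketch below builds one candidate entry per logit. It is written in Python rather than C#, the TokenData class and create function are hypothetical stand-ins for LLamaTokenData and Create, and the assumptions that the token id equals the index into the logits and that the probability starts at zero are not confirmed by this page.

```python
from dataclasses import dataclass


@dataclass
class TokenData:
    """Hypothetical stand-in for LLamaTokenData."""
    id: int        # token id (assumed to be the index into the logits)
    logit: float   # raw score copied from the logits
    p: float       # probability, assumed to start at 0 until a softmax runs


def create(logits):
    """Copy the logits into a fresh list of candidates, one per token."""
    return [TokenData(id=i, logit=value, p=0.0) for i, value in enumerate(logits)]


candidates = create([0.5, -1.0, 2.0])
```

Because the values are copied, later changes to the candidates do not affect the original logits.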
OverwriteLogits(ReadOnlySpan<ValueTuple<LLamaToken, Single>>)
Overwrite the logit values for all given tokens
Parameters
values
ReadOnlySpan<ValueTuple<LLamaToken, Single>>
tuples of token and logit value to overwrite
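A minimal Python sketch of the idea, not the library's code: each (token, logit) tuple replaces the logit of the matching candidate. The overwrite_logits function and the dictionary layout of a candidate are hypothetical.

```python
def overwrite_logits(candidates, values):
    """Replace the logit of every candidate whose id appears in values.

    candidates: list of dicts with "id" and "logit" keys (hypothetical layout).
    values: iterable of (token, logit) tuples.
    """
    replacements = dict(values)
    for entry in candidates:
        if entry["id"] in replacements:
            entry["logit"] = replacements[entry["id"]]
    # Note: new logits can invalidate any earlier descending sort.
    return candidates


candidates = [{"id": 0, "logit": 1.0}, {"id": 1, "logit": 2.0}]
overwrite_logits(candidates, [(1, -5.0)])
```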
ApplyGrammar(SafeLLamaContextHandle, SafeLLamaGrammarHandle)
Apply grammar rules to candidate tokens
Parameters
grammar
SafeLLamaGrammarHandle
TopK(SafeLLamaContextHandle, Int32, UInt64)
Top-K sampling described in academic paper "The Curious Case of Neural Text Degeneration" https://arxiv.org/abs/1904.09751
Parameters
context
SafeLLamaContextHandle
k
Int32
Number of tokens to keep
minKeep
UInt64
Minimum number to keep
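The effect of k and minKeep can be sketched in a few lines of Python. This is an illustration of top-k filtering in general, not the native implementation; the top_k function is hypothetical and assumes that minKeep acts as a lower bound on k.

```python
def top_k(logits, k, min_keep=1):
    """Keep the k highest-logit candidates, but never fewer than min_keep."""
    k = max(k, min_keep)
    k = min(k, len(logits))
    # Pair each logit with its token id and sort by logit, highest first.
    ranked = sorted(enumerate(logits), key=lambda pair: pair[1], reverse=True)
    return ranked[:k]


kept = top_k([1.0, 3.5, 0.2, 2.7], k=2)
```

The result is a list of (token id, logit) pairs in descending logit order, which is also the order the sorted field describes.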
TopP(SafeLLamaContextHandle, Single, UInt64)
Nucleus sampling described in academic paper "The Curious Case of Neural Text Degeneration" https://arxiv.org/abs/1904.09751
Parameters
context
SafeLLamaContextHandle
p
Single
minKeep
UInt64
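Nucleus sampling keeps the smallest set of top tokens whose cumulative probability reaches p. The Python sketch below illustrates that rule under the assumption that minKeep is a lower bound on the number of tokens kept; the top_p function is hypothetical and is not the native implementation.

```python
import math


def top_p(logits, p, min_keep=1):
    """Keep the most likely tokens until their cumulative probability reaches p."""
    ranked = sorted(enumerate(logits), key=lambda pair: pair[1], reverse=True)
    # Numerically stable softmax over the sorted logits.
    max_logit = ranked[0][1]
    weights = [math.exp(logit - max_logit) for _, logit in ranked]
    total = sum(weights)

    kept, cumulative = [], 0.0
    for (token, _), weight in zip(ranked, weights):
        prob = weight / total
        kept.append((token, prob))
        cumulative += prob
        if cumulative >= p and len(kept) >= min_keep:
            break
    return kept


# Logits chosen so that the probabilities are roughly 0.5, 0.3 and 0.2.
logits = [math.log(0.5), math.log(0.3), math.log(0.2)]
nucleus = top_p(logits, p=0.7)
```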
MinP(SafeLLamaContextHandle, Single, UInt64)
Minimum P sampling as described in https://github.com/ggerganov/llama.cpp/pull/3841
Parameters
context
SafeLLamaContextHandle
p
Single
Minimum probability, relative to the most likely token. Tokens whose probability is at least p times that of the most likely token are kept.
minKeep
UInt64
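In the linked pull request the threshold is relative: a token survives if its probability is at least p times the probability of the most likely token. The Python sketch below illustrates that rule; the min_p function is hypothetical, takes already-normalized probabilities, and assumes minKeep is a lower bound on the number of tokens kept.

```python
def min_p(probs, p, min_keep=1):
    """Keep tokens whose probability is at least p times the top probability."""
    ranked = sorted(enumerate(probs), key=lambda pair: pair[1], reverse=True)
    threshold = p * ranked[0][1]
    return [
        pair
        for index, pair in enumerate(ranked)
        # The first min_keep entries survive regardless of the threshold.
        if pair[1] >= threshold or index < min_keep
    ]


# Threshold is 0.2 * 0.6 = 0.12, so only the first two tokens survive.
kept = min_p([0.6, 0.3, 0.06, 0.04], p=0.2)
```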
TailFree(SafeLLamaContextHandle, Single, UInt64)
Tail Free Sampling described in https://www.trentonbricken.com/Tail-Free-Sampling/.
Parameters
context
SafeLLamaContextHandle
z
Single
min_keep
UInt64
LocallyTypical(SafeLLamaContextHandle, Single, UInt64)
Locally Typical Sampling implementation described in the paper https://arxiv.org/abs/2202.00666.
Parameters
context
SafeLLamaContextHandle
p
Single
min_keep
UInt64
RepetitionPenalty(SafeLLamaContextHandle, ReadOnlySpan<LLamaToken>, Single, Single, Single)
Repetition penalty described in CTRL academic paper https://arxiv.org/abs/1909.05858, with negative logit fix. Frequency and presence penalties described in OpenAI API https://platform.openai.com/docs/api-reference/parameter-details.
Parameters
context
SafeLLamaContextHandle
last_tokens
ReadOnlySpan<LLamaToken>
penalty_repeat
Single
penalty_freq
Single
penalty_present
Single
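The three penalties combine as sketched below in Python. This follows the CTRL-style rule with the negative logit fix (divide positive logits, multiply non-positive ones) plus OpenAI-style frequency and presence terms; the apply_penalties function is hypothetical and assumes token ids index directly into the logits, so treat it as an illustration rather than the native code.

```python
from collections import Counter


def apply_penalties(logits, last_tokens, penalty_repeat, penalty_freq, penalty_present):
    """Return a penalized copy of logits for tokens seen in last_tokens."""
    counts = Counter(last_tokens)
    result = list(logits)
    for token, count in counts.items():
        if not 0 <= token < len(result):
            continue
        logit = result[token]
        # Negative logit fix: dividing a negative logit would raise it,
        # so non-positive logits are multiplied instead.
        logit = logit * penalty_repeat if logit <= 0 else logit / penalty_repeat
        # Frequency penalty scales with the count; presence penalty is flat.
        result[token] = logit - count * penalty_freq - penalty_present
    return result


penalized = apply_penalties(
    [2.0, -1.0, 0.5], last_tokens=[0, 0, 1],
    penalty_repeat=2.0, penalty_freq=0.1, penalty_present=0.5,
)
```

Token 2 never appears in last_tokens, so its logit is left unchanged.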
Guidance(SafeLLamaContextHandle, ReadOnlySpan<Single>, Single)
Apply classifier-free guidance to the logits as described in academic paper "Stay on topic with Classifier-Free Guidance" https://arxiv.org/abs/2306.17806
Parameters
context
SafeLLamaContextHandle
guidanceLogits
ReadOnlySpan<Single>
Logits extracted from a separate context from the same model.
Other than a negative prompt at the beginning, it should have all generated and user input tokens copied from the main context.
guidance
Single
Guidance strength. 0 means no guidance; higher values apply stronger guidance.
Temperature(SafeLLamaContextHandle, Single)
Sample with temperature. As temperature increases, the prediction becomes more diverse but also more vulnerable to hallucinations, that is, generating tokens that are sensible but not factual.
Parameters
context
SafeLLamaContextHandle
temp
Single
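Temperature scaling divides every logit by temp before probabilities are computed. The Python sketch below shows why higher values give more diverse output: the distribution flattens. Both functions are hypothetical helpers written for this illustration.

```python
import math


def softmax(logits):
    """Numerically stable softmax."""
    peak = max(logits)
    exps = [math.exp(value - peak) for value in logits]
    total = sum(exps)
    return [value / total for value in exps]


def apply_temperature(logits, temp):
    """Divide every logit by temp (temp must be positive)."""
    if temp <= 0:
        raise ValueError("temp must be positive")
    return [value / temp for value in logits]


logits = [2.0, 1.0, 0.0]
sharp = softmax(apply_temperature(logits, 0.5))  # low temperature: peaked
flat = softmax(apply_temperature(logits, 2.0))   # high temperature: flatter
```

With a low temperature the top token takes a larger share of the probability mass; with a high temperature the mass spreads across more tokens.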
Softmax(SafeLLamaContextHandle)
Sorts candidate tokens by their logits in descending order and calculates probabilities based on logits.
Parameters
context
SafeLLamaContextHandle
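The two steps described above, sorting and normalizing, can be sketched in Python as follows. The softmax_sorted function is a hypothetical illustration, not the native routine.

```python
import math


def softmax_sorted(logits):
    """Sort tokens by logit (descending) and attach softmax probabilities."""
    ranked = sorted(enumerate(logits), key=lambda pair: pair[1], reverse=True)
    # Subtract the largest logit before exponentiating to avoid overflow.
    max_logit = ranked[0][1]
    weights = [math.exp(logit - max_logit) for _, logit in ranked]
    total = sum(weights)
    return [(token, weight / total) for (token, _), weight in zip(ranked, weights)]


probs = softmax_sorted([1.0, 3.0, 2.0])
```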
SampleToken(SafeLLamaContextHandle)
Randomly selects a token from the candidates based on their probabilities.
Parameters
context
SafeLLamaContextHandle
Returns
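Sampling "based on their probabilities" means each candidate is drawn with a chance proportional to its probability. A minimal Python sketch, using the standard library's random.choices; the sample_token function and the explicit rng argument are hypothetical and only serve this illustration.

```python
import random


def sample_token(candidates, rng):
    """Draw one token id, weighted by its probability.

    candidates: list of (token, probability) pairs.
    rng: a random.Random instance, passed in so results can be reproduced.
    """
    tokens = [token for token, _ in candidates]
    weights = [prob for _, prob in candidates]
    return rng.choices(tokens, weights=weights, k=1)[0]


rng = random.Random(0)
token = sample_token([(10, 0.7), (11, 0.2), (12, 0.1)], rng)
```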
SampleTokenGreedy(SafeLLamaContextHandle)
Selects the token with the highest probability.
Parameters
context
SafeLLamaContextHandle
Returns
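Greedy selection is deterministic: it returns the candidate with the largest score. Since softmax preserves ordering, picking the highest logit gives the same token as picking the highest probability. The sample_token_greedy function below is a hypothetical Python illustration that assumes token ids are indices into the logits.

```python
def sample_token_greedy(logits):
    """Return the index (token id) of the highest logit."""
    return max(range(len(logits)), key=lambda index: logits[index])


best = sample_token_greedy([0.1, 2.5, -1.0])  # token 1 has the highest logit
```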
SampleTokenMirostat(SafeLLamaContextHandle, Single, Single, Int32, Single&)
Mirostat 1.0 algorithm described in the paper https://arxiv.org/abs/2007.14966. Uses tokens instead of words.
Parameters
context
SafeLLamaContextHandle
tau
Single
The target cross-entropy (or surprise) value you want to achieve for the generated text. A higher value corresponds to more surprising or less predictable text, while a lower value corresponds to less surprising or more predictable text.
eta
Single
The learning rate used to update mu based on the error between the target and observed surprisal of the sampled word. A larger learning rate will cause mu to be updated more quickly, while a smaller learning rate will result in slower updates.
m
Int32
The number of tokens considered in the estimation of s_hat. This is an arbitrary value that is used to calculate s_hat, which in turn helps to calculate the value of k. In the paper, they use m = 100, but you can experiment with different values to see how it affects the performance of the algorithm.
mu
Single&
Maximum cross-entropy. This value is initialized to be twice the target cross-entropy (2 * tau) and is updated in the algorithm based on the error between the target and observed surprisal.
Returns
SampleTokenMirostat2(SafeLLamaContextHandle, Single, Single, Single&)
Mirostat 2.0 algorithm described in the paper https://arxiv.org/abs/2007.14966. Uses tokens instead of words.
Parameters
context
SafeLLamaContextHandle
tau
Single
The target cross-entropy (or surprise) value you want to achieve for the generated text. A higher value corresponds to more surprising or less predictable text, while a lower value corresponds to less surprising or more predictable text.
eta
Single
The learning rate used to update mu based on the error between the target and observed surprisal of the sampled word. A larger learning rate will cause mu to be updated more quickly, while a smaller learning rate will result in slower updates.
mu
Single&
Maximum cross-entropy. This value is initialized to be twice the target cross-entropy (2 * tau) and is updated in the algorithm based on the error between the target and observed surprisal.
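The interaction of tau, eta and mu can be sketched as a single Mirostat 2.0 step in Python. This is an illustration of the update rule described above, not the native implementation: the mirostat_v2_step function is hypothetical, takes normalized probabilities, measures surprisal in bits, and returns the new mu instead of updating it by reference.

```python
import math
import random


def mirostat_v2_step(probs, tau, eta, mu, rng):
    """Sample one token and return (token, updated mu)."""
    ranked = sorted(enumerate(probs), key=lambda pair: pair[1], reverse=True)
    # Drop tokens whose surprisal exceeds mu; always keep at least one.
    kept = [(t, p) for t, p in ranked if p > 0 and -math.log2(p) <= mu] or ranked[:1]
    # Renormalize the surviving probabilities.
    total = sum(p for _, p in kept)
    kept = [(t, p / total) for t, p in kept]
    token = rng.choices([t for t, _ in kept], weights=[p for _, p in kept], k=1)[0]
    # Move mu against the error between observed and target surprisal.
    observed = -math.log2(dict(kept)[token])
    return token, mu - eta * (observed - tau)


# With mu = 1.5 only token 0 (surprisal of 1 bit) survives the truncation.
token, mu = mirostat_v2_step([0.5, 0.25, 0.25], tau=1.0, eta=0.1, mu=1.5,
                             rng=random.Random(0))
```

In a generation loop, the returned mu would be fed into the next step, starting from 2 * tau as the mu description above notes.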