SamplingApi
Namespace: LLama.Native
Direct translation of the llama.cpp sampling API.
public class SamplingApi
Inheritance Object → SamplingApi
Constructors
SamplingApi()
public SamplingApi()
Methods
llama_sample_grammar(SafeLLamaContextHandle, LLamaTokenDataArray, SafeLLamaGrammarHandle)
Applies grammar rules to the candidate tokens.
public static void llama_sample_grammar(SafeLLamaContextHandle ctx, LLamaTokenDataArray candidates, SafeLLamaGrammarHandle grammar)
Parameters
candidates
LLamaTokenDataArray
grammar
SafeLLamaGrammarHandle
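Example
A minimal sketch, assuming ctx, candidates and grammar already exist (creating the handles is out of scope here); llama_sample_token is documented below:
// Constrain the candidates to tokens permitted by the grammar,
// then pick a token from what remains.
SamplingApi.llama_sample_grammar(ctx, candidates, grammar);
int token = SamplingApi.llama_sample_token(ctx, candidates);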
llama_sample_repetition_penalty(SafeLLamaContextHandle, LLamaTokenDataArray, Memory<Int32>, UInt64, Single)
Caution
last_tokens_size parameter is no longer needed
Repetition penalty described in the CTRL academic paper https://arxiv.org/abs/1909.05858, with negative logit fix.
public static void llama_sample_repetition_penalty(SafeLLamaContextHandle ctx, LLamaTokenDataArray candidates, Memory<int> last_tokens, ulong last_tokens_size, float penalty)
Parameters
candidates
LLamaTokenDataArray
The LLamaTokenDataArray of candidate tokens, modified in place.
last_tokens
Memory<Int32>
last_tokens_size
UInt64
penalty
Single
llama_sample_repetition_penalty(SafeLLamaContextHandle, LLamaTokenDataArray, Memory<Int32>, Single)
Repetition penalty described in the CTRL academic paper https://arxiv.org/abs/1909.05858, with negative logit fix.
public static void llama_sample_repetition_penalty(SafeLLamaContextHandle ctx, LLamaTokenDataArray candidates, Memory<int> last_tokens, float penalty)
Parameters
candidates
LLamaTokenDataArray
The LLamaTokenDataArray of candidate tokens, modified in place.
last_tokens
Memory<Int32>
penalty
Single
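Example
A sketch with an illustrative penalty value, assuming ctx and candidates already exist and lastTokens is an int[] of recently generated token ids:
// A penalty of 1.0f is a no-op; values above 1.0f discourage repeating
// any token that appears in lastTokens.
SamplingApi.llama_sample_repetition_penalty(ctx, candidates, lastTokens.AsMemory(), 1.1f);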
llama_sample_frequency_and_presence_penalties(SafeLLamaContextHandle, LLamaTokenDataArray, Memory<Int32>, UInt64, Single, Single)
Caution
last_tokens_size parameter is no longer needed
Frequency and presence penalties described in the OpenAI API https://platform.openai.com/docs/api-reference/parameter-details.
public static void llama_sample_frequency_and_presence_penalties(SafeLLamaContextHandle ctx, LLamaTokenDataArray candidates, Memory<int> last_tokens, ulong last_tokens_size, float alpha_frequency, float alpha_presence)
Parameters
candidates
LLamaTokenDataArray
The LLamaTokenDataArray of candidate tokens, modified in place.
last_tokens
Memory<Int32>
last_tokens_size
UInt64
alpha_frequency
Single
alpha_presence
Single
llama_sample_frequency_and_presence_penalties(SafeLLamaContextHandle, LLamaTokenDataArray, Memory<Int32>, Single, Single)
Frequency and presence penalties described in the OpenAI API https://platform.openai.com/docs/api-reference/parameter-details.
public static void llama_sample_frequency_and_presence_penalties(SafeLLamaContextHandle ctx, LLamaTokenDataArray candidates, Memory<int> last_tokens, float alpha_frequency, float alpha_presence)
Parameters
candidates
LLamaTokenDataArray
The LLamaTokenDataArray of candidate tokens, modified in place.
last_tokens
Memory<Int32>
alpha_frequency
Single
alpha_presence
Single
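Example
A sketch under the same assumptions as above (ctx, candidates and lastTokens already exist); the 0.1f values are illustrative, not recommendations:
// alpha_frequency scales with how often a token has already occurred;
// alpha_presence applies once if it occurred at all. 0.0f disables both.
SamplingApi.llama_sample_frequency_and_presence_penalties(ctx, candidates, lastTokens.AsMemory(), alpha_frequency: 0.1f, alpha_presence: 0.1f);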
llama_sample_softmax(SafeLLamaContextHandle, LLamaTokenDataArray)
Sorts candidate tokens by their logits in descending order and calculates probabilities based on the logits.
public static void llama_sample_softmax(SafeLLamaContextHandle ctx, LLamaTokenDataArray candidates)
Parameters
candidates
LLamaTokenDataArray
The LLamaTokenDataArray of candidate tokens, modified in place.
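Example
A one-line sketch, assuming ctx and candidates already exist:
// After this call the candidates are sorted by logit (descending)
// and each candidate's p field holds its normalized probability.
SamplingApi.llama_sample_softmax(ctx, candidates);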
llama_sample_top_k(SafeLLamaContextHandle, LLamaTokenDataArray, Int32, UInt64)
Top-K sampling described in the academic paper "The Curious Case of Neural Text Degeneration" https://arxiv.org/abs/1904.09751
public static void llama_sample_top_k(SafeLLamaContextHandle ctx, LLamaTokenDataArray candidates, int k, ulong min_keep)
Parameters
candidates
LLamaTokenDataArray
The LLamaTokenDataArray of candidate tokens, modified in place.
k
Int32
min_keep
UInt64
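Example
A sketch with illustrative values, assuming ctx and candidates already exist:
// Keep only the 40 highest-logit candidates, but never fewer than min_keep.
SamplingApi.llama_sample_top_k(ctx, candidates, k: 40, min_keep: 1);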
llama_sample_top_p(SafeLLamaContextHandle, LLamaTokenDataArray, Single, UInt64)
Nucleus sampling described in the academic paper "The Curious Case of Neural Text Degeneration" https://arxiv.org/abs/1904.09751
public static void llama_sample_top_p(SafeLLamaContextHandle ctx, LLamaTokenDataArray candidates, float p, ulong min_keep)
Parameters
candidates
LLamaTokenDataArray
The LLamaTokenDataArray of candidate tokens, modified in place.
p
Single
min_keep
UInt64
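Example
A sketch with an illustrative p, assuming ctx and candidates already exist:
// Keep the smallest set of candidates whose cumulative probability
// exceeds p; min_keep guards against discarding everything.
SamplingApi.llama_sample_top_p(ctx, candidates, p: 0.95f, min_keep: 1);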
llama_sample_tail_free(SafeLLamaContextHandle, LLamaTokenDataArray, Single, UInt64)
Tail Free Sampling described in https://www.trentonbricken.com/Tail-Free-Sampling/.
public static void llama_sample_tail_free(SafeLLamaContextHandle ctx, LLamaTokenDataArray candidates, float z, ulong min_keep)
Parameters
candidates
LLamaTokenDataArray
The LLamaTokenDataArray of candidate tokens, modified in place.
z
Single
min_keep
UInt64
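Example
A sketch with an illustrative z, assuming ctx and candidates already exist:
// z = 1.0f disables tail free sampling; smaller values trim more of
// the low-probability tail.
SamplingApi.llama_sample_tail_free(ctx, candidates, z: 0.97f, min_keep: 1);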
llama_sample_typical(SafeLLamaContextHandle, LLamaTokenDataArray, Single, UInt64)
Locally Typical Sampling implementation described in the paper https://arxiv.org/abs/2202.00666.
public static void llama_sample_typical(SafeLLamaContextHandle ctx, LLamaTokenDataArray candidates, float p, ulong min_keep)
Parameters
candidates
LLamaTokenDataArray
The LLamaTokenDataArray of candidate tokens, modified in place.
p
Single
min_keep
UInt64
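Example
A sketch with an illustrative p, assuming ctx and candidates already exist:
// p = 1.0f disables locally typical sampling; lower values keep only
// tokens whose information content is close to the expected value.
SamplingApi.llama_sample_typical(ctx, candidates, p: 0.9f, min_keep: 1);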
llama_sample_temperature(SafeLLamaContextHandle, LLamaTokenDataArray, Single)
Sample with temperature. As temperature increases, the prediction becomes more diverse but also more vulnerable to hallucinations: generating tokens that are sensible but not factual.
public static void llama_sample_temperature(SafeLLamaContextHandle ctx, LLamaTokenDataArray candidates, float temp)
Parameters
candidates
LLamaTokenDataArray
temp
Single
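Example
A sketch of the common ordering (truncate, then apply temperature, then sample); all values are illustrative, ctx and candidates are assumed to exist, and llama_sample_token is documented below:
// temp = 1.0f leaves the distribution unchanged; lower values sharpen it,
// higher values flatten it.
SamplingApi.llama_sample_top_k(ctx, candidates, k: 40, min_keep: 1);
SamplingApi.llama_sample_temperature(ctx, candidates, temp: 0.8f);
int token = SamplingApi.llama_sample_token(ctx, candidates);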
llama_sample_token_mirostat(SafeLLamaContextHandle, LLamaTokenDataArray, Single, Single, Int32, Single&)
Mirostat 1.0 algorithm described in the paper https://arxiv.org/abs/2007.14966. Uses tokens instead of words.
public static int llama_sample_token_mirostat(SafeLLamaContextHandle ctx, LLamaTokenDataArray candidates, float tau, float eta, int m, Single& mu)
Parameters
candidates
LLamaTokenDataArray
A vector of LLamaTokenData containing the candidate tokens, their probabilities (p), and log-odds (logit) for the current position in the generated text.
tau
Single
The target cross-entropy (or surprise) value you want to achieve for the generated text. A higher value corresponds to more surprising or less predictable text, while a lower value corresponds to less surprising or more predictable text.
eta
Single
The learning rate used to update mu based on the error between the target and observed surprisal of the sampled word. A larger learning rate will cause mu to be updated more quickly, while a smaller learning rate will result in slower updates.
m
Int32
The number of tokens considered in the estimation of s_hat. This is an arbitrary value that is used to calculate s_hat, which in turn helps to calculate the value of k. In the paper, they use m = 100, but you can experiment with different values to see how it affects the performance of the algorithm.
mu
Single&
Maximum cross-entropy. This value is initialized to be twice the target cross-entropy (2 * tau) and is updated in the algorithm based on the error between the target and observed surprisal.
Returns
Int32
The id of the sampled token.
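Example
A sketch using the paper's m = 100 and illustrative tau/eta values; ctx and candidates are assumed to exist, and mu must persist across sampling calls:
// Initialize mu to 2 * tau once, then let the algorithm update it
// through the ref parameter on every sampling step.
float tau = 5.0f, eta = 0.1f;
float mu = 2.0f * tau;
int token = SamplingApi.llama_sample_token_mirostat(ctx, candidates, tau, eta, 100, ref mu);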
llama_sample_token_mirostat_v2(SafeLLamaContextHandle, LLamaTokenDataArray, Single, Single, Single&)
Mirostat 2.0 algorithm described in the paper https://arxiv.org/abs/2007.14966. Uses tokens instead of words.
public static int llama_sample_token_mirostat_v2(SafeLLamaContextHandle ctx, LLamaTokenDataArray candidates, float tau, float eta, Single& mu)
Parameters
candidates
LLamaTokenDataArray
A vector of LLamaTokenData containing the candidate tokens, their probabilities (p), and log-odds (logit) for the current position in the generated text.
tau
Single
The target cross-entropy (or surprise) value you want to achieve for the generated text. A higher value corresponds to more surprising or less predictable text, while a lower value corresponds to less surprising or more predictable text.
eta
Single
The learning rate used to update mu based on the error between the target and observed surprisal of the sampled word. A larger learning rate will cause mu to be updated more quickly, while a smaller learning rate will result in slower updates.
mu
Single&
Maximum cross-entropy. This value is initialized to be twice the target cross-entropy (2 * tau) and is updated in the algorithm based on the error between the target and observed surprisal.
Returns
Int32
The id of the sampled token.
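Example
Same pattern as Mirostat 1.0 but without the m parameter; the tau and eta values are illustrative:
float tau = 5.0f, eta = 0.1f;
float mu = 2.0f * tau; // persists and is updated across calls
int token = SamplingApi.llama_sample_token_mirostat_v2(ctx, candidates, tau, eta, ref mu);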
llama_sample_token_greedy(SafeLLamaContextHandle, LLamaTokenDataArray)
Selects the token with the highest probability.
public static int llama_sample_token_greedy(SafeLLamaContextHandle ctx, LLamaTokenDataArray candidates)
Parameters
candidates
LLamaTokenDataArray
The LLamaTokenDataArray of candidate tokens, modified in place.
Returns
Int32
The id of the token with the highest probability.
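Example
A one-line sketch, assuming ctx and candidates already exist; greedy selection is deterministic, so no temperature step is needed first:
// Returns the id of the single highest-probability candidate.
int token = SamplingApi.llama_sample_token_greedy(ctx, candidates);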
llama_sample_token(SafeLLamaContextHandle, LLamaTokenDataArray)
Randomly selects a token from the candidates based on their probabilities.
public static int llama_sample_token(SafeLLamaContextHandle ctx, LLamaTokenDataArray candidates)
Parameters
candidates
LLamaTokenDataArray
The LLamaTokenDataArray of candidate tokens, modified in place.
Returns
Int32
The id of the selected token.
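Example
A hedged end-to-end sketch combining the methods on this page into a single sampling step; every constant is illustrative and the types come straight from the signatures above:
using System;        // for MemoryExtensions.AsMemory
using LLama.Native;

static int SampleNextToken(SafeLLamaContextHandle ctx, LLamaTokenDataArray candidates, int[] lastTokens)
{
    // Penalize recent repeats, truncate the distribution, apply
    // temperature, then draw a token at random from what remains.
    SamplingApi.llama_sample_repetition_penalty(ctx, candidates, lastTokens.AsMemory(), 1.1f);
    SamplingApi.llama_sample_top_k(ctx, candidates, 40, 1);
    SamplingApi.llama_sample_top_p(ctx, candidates, 0.95f, 1);
    SamplingApi.llama_sample_temperature(ctx, candidates, 0.8f);
    return SamplingApi.llama_sample_token(ctx, candidates);
}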