NativeApi
Namespace: LLama.Native
Direct translation of the llama.cpp API
Inheritance Object → NativeApi
Methods
llama_sample_token_mirostat(SafeLLamaContextHandle, LLamaTokenDataArrayNative&, Single, Single, Int32, Single&)
Mirostat 1.0 algorithm described in the paper https://arxiv.org/abs/2007.14966. Uses tokens instead of words.
Parameters
candidates
LLamaTokenDataArrayNative&
A vector of llama_token_data containing the candidate tokens, their probabilities (p), and log-odds (logit) for the current position in the generated text.
tau
Single
The target cross-entropy (or surprise) value you want to achieve for the generated text. A higher value corresponds to more surprising or less predictable text, while a lower value corresponds to less surprising or more predictable text.
eta
Single
The learning rate used to update mu based on the error between the target and observed surprisal of the sampled word. A larger learning rate will cause mu to be updated more quickly, while a smaller learning rate will result in slower updates.
m
Int32
The number of tokens considered in the estimation of s_hat. This is an arbitrary value that is used to calculate s_hat, which in turn helps to calculate the value of k. In the paper, they use m = 100, but you can experiment with different values to see how it affects the performance of the algorithm.
mu
Single&
Maximum cross-entropy. This value is initialized to be twice the target cross-entropy (2 * tau) and is updated in the algorithm based on the error between the target and observed surprisal.
Returns
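A minimal sketch of one Mirostat 1.0 sampling step, assuming the return value is the sampled token and that ctx and candidates have already been prepared from the current logits (both names are placeholders, not part of this reference):

```csharp
using LLama.Native;

// Sketch only: the parameter values are illustrative and the LLamaToken return type is assumed.
static LLamaToken SampleMirostatV1(SafeLLamaContextHandle ctx, ref LLamaTokenDataArrayNative candidates, ref float mu)
{
    const float tau = 5.0f;  // target cross-entropy (surprise)
    const float eta = 0.1f;  // learning rate for mu
    const int m = 100;       // number of tokens used to estimate s_hat (value used in the paper)

    // mu should be initialized to 2 * tau before the first call, and the updated
    // value carried over between sampling steps.
    return NativeApi.llama_sample_token_mirostat(ctx, ref candidates, tau, eta, m, ref mu);
}
```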
llama_sample_token_mirostat_v2(SafeLLamaContextHandle, LLamaTokenDataArrayNative&, Single, Single, Single&)
Mirostat 2.0 algorithm described in the paper https://arxiv.org/abs/2007.14966. Uses tokens instead of words.
Parameters
candidates
LLamaTokenDataArrayNative&
A vector of llama_token_data containing the candidate tokens, their probabilities (p), and log-odds (logit) for the current position in the generated text.
tau
Single
The target cross-entropy (or surprise) value you want to achieve for the generated text. A higher value corresponds to more surprising or less predictable text, while a lower value corresponds to less surprising or more predictable text.
eta
Single
The learning rate used to update mu based on the error between the target and observed surprisal of the sampled word. A larger learning rate will cause mu to be updated more quickly, while a smaller learning rate will result in slower updates.
mu
Single&
Maximum cross-entropy. This value is initialized to be twice the target cross-entropy (2 * tau) and is updated in the algorithm based on the error between the target and observed surprisal.
Returns
llama_sample_token_greedy(SafeLLamaContextHandle, LLamaTokenDataArrayNative&)
Selects the token with the highest probability.
Parameters
candidates
LLamaTokenDataArrayNative&
Pointer to LLamaTokenDataArray
Returns
llama_sample_token(SafeLLamaContextHandle, LLamaTokenDataArrayNative&)
Randomly selects a token from the candidates based on their probabilities.
Parameters
candidates
LLamaTokenDataArrayNative&
Pointer to LLamaTokenDataArray
Returns
<llama_get_embeddings>g__llama_get_embeddings_native|30_0(SafeLLamaContextHandle)
Parameters
Returns
<llama_token_to_piece>g__llama_token_to_piece_native|44_0(SafeLlamaModelHandle, LLamaToken, Byte*, Int32)
Parameters
model
SafeLlamaModelHandle
llamaToken
LLamaToken
buffer
Byte*
length
Int32
Returns
<TryLoadLibraries>g__TryLoad|84_0(String)
Parameters
path
String
Returns
<TryLoadLibraries>g__TryFindPath|84_1(String, <>c__DisplayClass84_0&)
Parameters
filename
String
Returns
llama_set_n_threads(SafeLLamaContextHandle, UInt32, UInt32)
Set the number of threads used for decoding
Parameters
n_threads
UInt32
n_threads is the number of threads used for generation (single token)
n_threads_batch
UInt32
n_threads_batch is the number of threads used for prompt and batch processing (multiple tokens)
llama_vocab_type(SafeLlamaModelHandle)
Parameters
model
SafeLlamaModelHandle
Returns
llama_rope_type(SafeLlamaModelHandle)
Parameters
model
SafeLlamaModelHandle
Returns
llama_grammar_init(LLamaGrammarElement**, UInt64, UInt64)
Create a new grammar from the given set of grammar rules
Parameters
rules
LLamaGrammarElement**
n_rules
UInt64
start_rule_index
UInt64
Returns
llama_grammar_free(IntPtr)
Free all memory from the given SafeLLamaGrammarHandle
Parameters
grammar
IntPtr
llama_grammar_copy(SafeLLamaGrammarHandle)
Create a copy of an existing grammar instance
Parameters
grammar
SafeLLamaGrammarHandle
Returns
llama_sample_grammar(SafeLLamaContextHandle, LLamaTokenDataArrayNative&, SafeLLamaGrammarHandle)
Apply constraints from grammar
Parameters
candidates
LLamaTokenDataArrayNative&
grammar
SafeLLamaGrammarHandle
llama_grammar_accept_token(SafeLLamaContextHandle, SafeLLamaGrammarHandle, LLamaToken)
Accepts the sampled token into the grammar
Parameters
grammar
SafeLLamaGrammarHandle
token
LLamaToken
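A hedged sketch of how these grammar calls typically fit together in one sampling step; greedy sampling is used only for brevity, and the LLamaToken return type of the sampler is assumed:

```csharp
using LLama.Native;

static LLamaToken SampleWithGrammar(SafeLLamaContextHandle ctx, SafeLLamaGrammarHandle grammar, ref LLamaTokenDataArrayNative candidates)
{
    // Suppress candidates the grammar cannot accept at the current position.
    NativeApi.llama_sample_grammar(ctx, ref candidates, grammar);

    // Pick a token from the remaining candidates.
    var token = NativeApi.llama_sample_token_greedy(ctx, ref candidates);

    // Advance the grammar state so the next step is constrained correctly.
    NativeApi.llama_grammar_accept_token(ctx, grammar, token);
    return token;
}
```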
llava_validate_embed_size(SafeLLamaContextHandle, SafeLlavaModelHandle)
Sanity check for clip <-> llava embed size match
Parameters
ctxLlama
SafeLLamaContextHandle
LLama Context
ctxClip
SafeLlavaModelHandle
Llava Model
Returns
Boolean
True if validated successfully
llava_image_embed_make_with_bytes(SafeLlavaModelHandle, Int32, Byte[], Int32)
Build an image embed from image file bytes
Parameters
ctx_clip
SafeLlavaModelHandle
SafeHandle to the Clip Model
n_threads
Int32
Number of threads
image_bytes
Byte[]
Binary image in jpeg format
image_bytes_length
Int32
Byte length of the image
Returns
SafeLlavaImageEmbedHandle
SafeHandle to the Embeddings
llava_image_embed_make_with_filename(SafeLlavaModelHandle, Int32, String)
Build an image embed from the path to an image file
Parameters
ctx_clip
SafeLlavaModelHandle
SafeHandle to the Clip Model
n_threads
Int32
Number of threads
image_path
String
Image filename (jpeg) to generate embeddings from
Returns
SafeLlavaImageEmbedHandle
SafeHandle to the embeddings
llava_image_embed_free(IntPtr)
Free an embedding made with llava_image_embed_make_*
Parameters
embed
IntPtr
Embeddings to release
llava_eval_image_embed(SafeLLamaContextHandle, SafeLlavaImageEmbedHandle, Int32, Int32&)
Write the image represented by embed into the llama context with batch size n_batch, starting at context pos n_past. On completion, n_past points to the next position in the context after the image embed.
Parameters
ctx_llama
SafeLLamaContextHandle
Llama Context
embed
SafeLlavaImageEmbedHandle
Embedding handle
n_batch
Int32
n_past
Int32&
Returns
Boolean
True on success
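A hedged sketch combining the llava entries above: validate the clip/llama embedding sizes, build an embed from raw jpeg bytes, and write it into the context. Disposing the SafeLlavaImageEmbedHandle is assumed to release the embed, and the thread count is illustrative:

```csharp
using System;
using LLama.Native;

static bool EvalImage(SafeLLamaContextHandle ctxLlama, SafeLlavaModelHandle ctxClip, byte[] jpegBytes, int nBatch, ref int nPast)
{
    // Sanity check that the clip and llama embedding sizes match.
    if (!NativeApi.llava_validate_embed_size(ctxLlama, ctxClip))
        return false;

    // Build the image embedding from the raw jpeg bytes.
    using var embed = NativeApi.llava_image_embed_make_with_bytes(ctxClip, Environment.ProcessorCount, jpegBytes, jpegBytes.Length);

    // On success, nPast is advanced past the image embed.
    return NativeApi.llava_eval_image_embed(ctxLlama, embed, nBatch, ref nPast);
}
```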
llama_model_quantize(String, String, LLamaModelQuantizeParams*)
Returns 0 on success
Parameters
fname_inp
String
fname_out
String
param
LLamaModelQuantizeParams*
Returns
UInt32
Returns 0 on success
llama_sample_repetition_penalties(SafeLLamaContextHandle, LLamaTokenDataArrayNative&, LLamaToken*, UInt64, Single, Single, Single)
Repetition penalty described in CTRL academic paper https://arxiv.org/abs/1909.05858, with negative logit fix. Frequency and presence penalties described in OpenAI API https://platform.openai.com/docs/api-reference/parameter-details.
Parameters
candidates
LLamaTokenDataArrayNative&
Pointer to LLamaTokenDataArray
last_tokens
LLamaToken*
last_tokens_size
UInt64
penalty_repeat
Single
Repetition penalty described in CTRL academic paper https://arxiv.org/abs/1909.05858, with negative logit fix.
penalty_freq
Single
Frequency and presence penalties described in OpenAI API https://platform.openai.com/docs/api-reference/parameter-details.
penalty_present
Single
Frequency and presence penalties described in OpenAI API https://platform.openai.com/docs/api-reference/parameter-details.
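A hedged sketch of applying these penalties to the candidate array before other sampling filters; the penalty values shown are illustrative only:

```csharp
using LLama.Native;

static unsafe void ApplyPenalties(SafeLLamaContextHandle ctx, ref LLamaTokenDataArrayNative candidates, LLamaToken[] lastTokens)
{
    fixed (LLamaToken* lastPtr = lastTokens)
    {
        NativeApi.llama_sample_repetition_penalties(
            ctx, ref candidates,
            lastPtr, (ulong)lastTokens.Length,
            1.1f,   // penalty_repeat (CTRL-style repetition penalty)
            0.0f,   // penalty_freq (OpenAI-style frequency penalty)
            0.0f);  // penalty_present (OpenAI-style presence penalty)
    }
}
```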
llama_sample_apply_guidance(SafeLLamaContextHandle, Span<Single>, ReadOnlySpan<Single>, Single)
Apply classifier-free guidance to the logits as described in academic paper "Stay on topic with Classifier-Free Guidance" https://arxiv.org/abs/2306.17806
Parameters
logits
Span<Single>
Logits extracted from the original generation context.
logits_guidance
ReadOnlySpan<Single>
Logits extracted from a separate context from the same model.
Other than a negative prompt at the beginning, it should have all generated and user input tokens copied from the main context.
scale
Single
Guidance strength. 1.0f means no guidance. Higher values mean stronger guidance.
llama_sample_apply_guidance(SafeLLamaContextHandle, Single*, Single*, Single)
Apply classifier-free guidance to the logits as described in academic paper "Stay on topic with Classifier-Free Guidance" https://arxiv.org/abs/2306.17806
Parameters
logits
Single*
Logits extracted from the original generation context.
logits_guidance
Single*
Logits extracted from a separate context from the same model.
Other than a negative prompt at the beginning, it should have all generated and user input tokens copied from the main context.
scale
Single
Guidance strength. 1.0f means no guidance. Higher values mean stronger guidance.
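A hedged sketch of the Span overload, assuming the caller knows the vocabulary size used to size the spans and that llama_get_logits (documented later in this reference) returns a raw float pointer:

```csharp
using System;
using LLama.Native;

static unsafe void ApplyGuidance(SafeLLamaContextHandle ctx, SafeLLamaContextHandle guidanceCtx, int nVocab, float scale)
{
    // Logits from the main generation context (mutated in place).
    var logits = new Span<float>(NativeApi.llama_get_logits(ctx), nVocab);

    // Logits from the separate guidance context (negative prompt plus copied tokens).
    var guidance = new ReadOnlySpan<float>(NativeApi.llama_get_logits(guidanceCtx), nVocab);

    // scale = 1.0f means no guidance; higher values mean stronger guidance.
    NativeApi.llama_sample_apply_guidance(ctx, logits, guidance, scale);
}
```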
llama_sample_softmax(SafeLLamaContextHandle, LLamaTokenDataArrayNative&)
Sorts candidate tokens by their logits in descending order and calculates probabilities based on the logits.
Parameters
candidates
LLamaTokenDataArrayNative&
Pointer to LLamaTokenDataArray
llama_sample_top_k(SafeLLamaContextHandle, LLamaTokenDataArrayNative&, Int32, UInt64)
Top-K sampling described in academic paper "The Curious Case of Neural Text Degeneration" https://arxiv.org/abs/1904.09751
Parameters
candidates
LLamaTokenDataArrayNative&
Pointer to LLamaTokenDataArray
k
Int32
min_keep
UInt64
llama_sample_top_p(SafeLLamaContextHandle, LLamaTokenDataArrayNative&, Single, UInt64)
Nucleus sampling described in academic paper "The Curious Case of Neural Text Degeneration" https://arxiv.org/abs/1904.09751
Parameters
candidates
LLamaTokenDataArrayNative&
Pointer to LLamaTokenDataArray
p
Single
min_keep
UInt64
llama_sample_min_p(SafeLLamaContextHandle, LLamaTokenDataArrayNative&, Single, UInt64)
Minimum P sampling as described in https://github.com/ggerganov/llama.cpp/pull/3841
Parameters
candidates
LLamaTokenDataArrayNative&
Pointer to LLamaTokenDataArray
p
Single
min_keep
UInt64
llama_sample_tail_free(SafeLLamaContextHandle, LLamaTokenDataArrayNative&, Single, UInt64)
Tail Free Sampling described in https://www.trentonbricken.com/Tail-Free-Sampling/.
Parameters
candidates
LLamaTokenDataArrayNative&
Pointer to LLamaTokenDataArray
z
Single
min_keep
UInt64
llama_sample_typical(SafeLLamaContextHandle, LLamaTokenDataArrayNative&, Single, UInt64)
Locally Typical Sampling implementation described in the paper https://arxiv.org/abs/2202.00666.
Parameters
candidates
LLamaTokenDataArrayNative&
Pointer to LLamaTokenDataArray
p
Single
min_keep
UInt64
llama_sample_typical(SafeLLamaContextHandle, LLamaTokenDataArrayNative&, Single, Single, Single)
Dynamic temperature implementation described in the paper https://arxiv.org/abs/2309.02772.
Parameters
candidates
LLamaTokenDataArrayNative&
Pointer to LLamaTokenDataArray
min_temp
Single
max_temp
Single
exponent_val
Single
llama_sample_temp(SafeLLamaContextHandle, LLamaTokenDataArrayNative&, Single)
Modify logits by temperature
Parameters
candidates
LLamaTokenDataArrayNative&
temp
Single
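A hedged sketch of a typical filter chain built from the samplers above (top-k, then top-p, then temperature, then draw a token); the cutoff values are illustrative and the LLamaToken return type of llama_sample_token is assumed:

```csharp
using LLama.Native;

static LLamaToken SampleNextToken(SafeLLamaContextHandle ctx, ref LLamaTokenDataArrayNative candidates)
{
    NativeApi.llama_sample_top_k(ctx, ref candidates, 40, 1);    // keep the 40 most likely tokens
    NativeApi.llama_sample_top_p(ctx, ref candidates, 0.95f, 1); // nucleus filtering
    NativeApi.llama_sample_temp(ctx, ref candidates, 0.8f);      // scale the remaining logits
    return NativeApi.llama_sample_token(ctx, ref candidates);    // sample from what is left
}
```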
llama_get_embeddings(SafeLLamaContextHandle)
Get the embeddings for the input
Parameters
Returns
llama_chat_apply_template(SafeLlamaModelHandle, Char*, LLamaChatMessage*, IntPtr, Boolean, Char*, Int32)
Apply chat template. Inspired by hf apply_chat_template() in Python. Both "model" and "custom_template" are optional, but at least one is required. "custom_template" has higher precedence than "model". NOTE: This function does not use a jinja parser. It only supports a pre-defined list of templates. See more: https://github.com/ggerganov/llama.cpp/wiki/Templates-supported-by-llama_chat_apply_template
Parameters
model
SafeLlamaModelHandle
tmpl
Char*
A Jinja template to use for this chat. If this is nullptr, the model’s default chat template will be used instead.
chat
LLamaChatMessage*
Pointer to a list of multiple llama_chat_message
n_msg
IntPtr
Number of llama_chat_message in this chat
add_ass
Boolean
Whether to end the prompt with the token(s) that indicate the start of an assistant message.
buf
Char*
A buffer to hold the output formatted prompt. The recommended alloc size is 2 * (total number of characters of all messages)
length
Int32
The size of the allocated buffer
Returns
Int32
The total number of bytes of the formatted prompt. If it is larger than the size of the buffer, you may need to re-allocate the buffer and then re-apply the template.
llama_token_bos(SafeLlamaModelHandle)
Get the "Beginning of sentence" token
Parameters
model
SafeLlamaModelHandle
Returns
llama_token_eos(SafeLlamaModelHandle)
Get the "End of sentence" token
Parameters
model
SafeLlamaModelHandle
Returns
llama_token_nl(SafeLlamaModelHandle)
Get the "new line" token
Parameters
model
SafeLlamaModelHandle
Returns
llama_add_bos_token(SafeLlamaModelHandle)
Returns -1 if unknown, 1 for true or 0 for false.
Parameters
model
SafeLlamaModelHandle
Returns
llama_add_eos_token(SafeLlamaModelHandle)
Returns -1 if unknown, 1 for true or 0 for false.
Parameters
model
SafeLlamaModelHandle
Returns
llama_token_prefix(SafeLlamaModelHandle)
codellama infill tokens, Beginning of infill prefix
Parameters
model
SafeLlamaModelHandle
Returns
llama_token_middle(SafeLlamaModelHandle)
codellama infill tokens, Beginning of infill middle
Parameters
model
SafeLlamaModelHandle
Returns
llama_token_suffix(SafeLlamaModelHandle)
codellama infill tokens, Beginning of infill suffix
Parameters
model
SafeLlamaModelHandle
Returns
llama_token_eot(SafeLlamaModelHandle)
codellama infill tokens, End of infill middle
Parameters
model
SafeLlamaModelHandle
Returns
llama_print_timings(SafeLLamaContextHandle)
Print out timing information for this context
Parameters
llama_reset_timings(SafeLLamaContextHandle)
Reset all collected timing information for this context
Parameters
llama_print_system_info()
Print system information
Returns
llama_token_to_piece(SafeLlamaModelHandle, LLamaToken, Span<Byte>)
Convert a single token into text
Parameters
model
SafeLlamaModelHandle
llamaToken
LLamaToken
buffer
Span<Byte>
buffer to write string into
Returns
Int32
The length written, or, if the buffer is too small, a negative number indicating the length required
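A hedged sketch of the negative-return convention described above: try a small stack buffer first and retry once with the required size (decoding the bytes as UTF-8 is an assumption about the contents):

```csharp
using System;
using System.Text;
using LLama.Native;

static string TokenToString(SafeLlamaModelHandle model, LLamaToken token)
{
    Span<byte> buffer = stackalloc byte[32];
    int written = NativeApi.llama_token_to_piece(model, token, buffer);
    if (written < 0)
    {
        // A negative result indicates the required length; retry with a buffer of that size.
        buffer = new byte[-written];
        written = NativeApi.llama_token_to_piece(model, token, buffer);
    }
    return Encoding.UTF8.GetString(buffer.Slice(0, written));
}
```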
llama_tokenize(SafeLlamaModelHandle, Byte*, Int32, LLamaToken*, Int32, Boolean, Boolean)
Convert text into tokens
Parameters
model
SafeLlamaModelHandle
text
Byte*
text_len
Int32
tokens
LLamaToken*
n_max_tokens
Int32
add_bos
Boolean
special
Boolean
Allow tokenizing special and/or control tokens which otherwise are not exposed and treated as plaintext. Does not insert a leading space.
Returns
Int32
Returns the number of tokens on success, no more than n_max_tokens.
Returns a negative number on failure - the number of tokens that would have been returned
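A hedged sketch of a two-pass call built on the negative-return convention above; the initial capacity guess and the UTF-8 encoding of the input are assumptions:

```csharp
using System;
using System.Text;
using LLama.Native;

static unsafe LLamaToken[] Tokenize(SafeLlamaModelHandle model, string text, bool addBos, bool special)
{
    byte[] utf8 = Encoding.UTF8.GetBytes(text);
    var tokens = new LLamaToken[utf8.Length + (addBos ? 1 : 0)];

    fixed (byte* textPtr = utf8)
    fixed (LLamaToken* tokenPtr = tokens)
    {
        int n = NativeApi.llama_tokenize(model, textPtr, utf8.Length, tokenPtr, tokens.Length, addBos, special);
        if (n >= 0)
            return tokens.AsSpan(0, n).ToArray();

        // A negative result is the number of tokens that would have been returned; retry with that capacity.
        tokens = new LLamaToken[-n];
        fixed (LLamaToken* retryPtr = tokens)
            NativeApi.llama_tokenize(model, textPtr, utf8.Length, retryPtr, tokens.Length, addBos, special);
        return tokens;
    }
}
```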
llama_log_set(LLamaLogCallback)
Register a callback to receive llama log messages
Parameters
logCallback
LLamaLogCallback
llama_kv_cache_clear(SafeLLamaContextHandle)
Clear the KV cache
Parameters
llama_kv_cache_seq_rm(SafeLLamaContextHandle, LLamaSeqId, LLamaPos, LLamaPos)
Removes all tokens that belong to the specified sequence and have positions in [p0, p1)
Parameters
seq
LLamaSeqId
p0
LLamaPos
p1
LLamaPos
llama_kv_cache_seq_cp(SafeLLamaContextHandle, LLamaSeqId, LLamaSeqId, LLamaPos, LLamaPos)
Copy all tokens that belong to the specified sequence to another sequence. Note that this does not allocate extra KV cache memory - it simply assigns the tokens to the new sequence.
Parameters
src
LLamaSeqId
dest
LLamaSeqId
p0
LLamaPos
p1
LLamaPos
llama_kv_cache_seq_keep(SafeLLamaContextHandle, LLamaSeqId)
Removes all tokens that do not belong to the specified sequence
Parameters
seq
LLamaSeqId
llama_kv_cache_seq_add(SafeLLamaContextHandle, LLamaSeqId, LLamaPos, LLamaPos, Int32)
Adds relative position "delta" to all tokens that belong to the specified sequence and have positions in [p0, p1).
If the KV cache is RoPEd, the KV data is updated accordingly:
- lazily on next llama_decode()
- explicitly with llama_kv_cache_update()
Parameters
seq
LLamaSeqId
p0
LLamaPos
p1
LLamaPos
delta
Int32
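A hedged sketch of the common "context shift" pattern built from llama_kv_cache_seq_rm and llama_kv_cache_seq_add; it assumes LLamaPos and the delta accept these integer values and that a negative p1 means "to the end of the sequence", as documented for llama_kv_cache_seq_div below:

```csharp
using LLama.Native;

static void ShiftContext(SafeLLamaContextHandle ctx, LLamaSeqId seq, int keepPrefix, int discard)
{
    // Drop `discard` tokens after the kept prefix: positions [keepPrefix, keepPrefix + discard).
    NativeApi.llama_kv_cache_seq_rm(ctx, seq, keepPrefix, keepPrefix + discard);

    // Slide the remaining tokens back so the sequence is contiguous again
    // (delta is negative; p1 < 0 is assumed to mean "up to the end").
    NativeApi.llama_kv_cache_seq_add(ctx, seq, keepPrefix + discard, -1, -discard);
}
```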
llama_kv_cache_seq_div(SafeLLamaContextHandle, LLamaSeqId, LLamaPos, LLamaPos, Int32)
Integer division of the positions by factor of d > 1
If the KV cache is RoPEd, the KV data is updated accordingly:
- lazily on next llama_decode()
- explicitly with llama_kv_cache_update()
p0 < 0 : [0, p1]
p1 < 0 : [p0, inf)
Parameters
seq
LLamaSeqId
p0
LLamaPos
p1
LLamaPos
d
Int32
llama_kv_cache_seq_pos_max(SafeLLamaContextHandle, LLamaSeqId)
Returns the largest position present in the KV cache for the specified sequence
Parameters
seq
LLamaSeqId
Returns
llama_kv_cache_defrag(SafeLLamaContextHandle)
Defragment the KV cache. This will be applied:
- lazily on next llama_decode()
- explicitly with llama_kv_cache_update()
Parameters
Returns
llama_kv_cache_update(SafeLLamaContextHandle)
Apply the KV cache updates (such as K-shifts, defragmentation, etc.)
Parameters
llama_batch_init(Int32, Int32, Int32)
Allocates a batch of tokens on the heap. Each token can be assigned up to n_seq_max sequence ids. The batch has to be freed with llama_batch_free(). If embd != 0, llama_batch.embd will be allocated with size of n_tokens * embd * sizeof(float); otherwise, llama_batch.token will be allocated to store n_tokens llama_token. The rest of the llama_batch members are allocated with size n_tokens. All members are left uninitialized.
Parameters
n_tokens
Int32
embd
Int32
n_seq_max
Int32
Each token can be assigned up to n_seq_max sequence ids
Returns
llama_batch_free(LLamaNativeBatch)
Frees a batch of tokens allocated with llama_batch_init()
Parameters
batch
LLamaNativeBatch
llama_decode(SafeLLamaContextHandle, LLamaNativeBatch)
Parameters
batch
LLamaNativeBatch
Returns
Int32
Positive return values do not mean a fatal error, but rather a warning:
- 0: success
- 1: could not find a KV slot for the batch (try reducing the size of the batch or increasing the context)
- < 0: error
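A hedged sketch of the batch lifecycle around llama_decode; the LLamaNativeBatch return type of llama_batch_init is assumed, and filling the batch members is omitted because their layout is not covered in this reference:

```csharp
using LLama.Native;

static int DecodeBatch(SafeLLamaContextHandle ctx, int nTokens, int nSeqMax)
{
    // embd == 0, so llama_batch.token is allocated for nTokens tokens.
    LLamaNativeBatch batch = NativeApi.llama_batch_init(nTokens, 0, nSeqMax);
    try
    {
        // ... fill the batch members (tokens, positions, sequence ids, logits flags) here ...

        int rc = NativeApi.llama_decode(ctx, batch);
        if (rc == 1)
        {
            // Warning, not fatal: no KV slot found. Try a smaller batch or a larger context.
        }
        return rc; // 0 = success, < 0 = error
    }
    finally
    {
        NativeApi.llama_batch_free(batch);
    }
}
```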
llama_kv_cache_view_init(SafeLLamaContextHandle, Int32)
Create an empty KV cache view. (use only for debugging purposes)
Parameters
n_max_seq
Int32
Returns
llama_kv_cache_view_free(LLamaKvCacheView&)
Free a KV cache view. (use only for debugging purposes)
Parameters
view
LLamaKvCacheView&
llama_kv_cache_view_update(SafeLLamaContextHandle, LLamaKvCacheView&)
Update the KV cache view structure with the current state of the KV cache. (use only for debugging purposes)
Parameters
view
LLamaKvCacheView&
llama_get_kv_cache_token_count(SafeLLamaContextHandle)
Returns the number of tokens in the KV cache (slow, use only for debug). If a KV cell has multiple sequences assigned to it, it will be counted multiple times.
Parameters
Returns
llama_get_kv_cache_used_cells(SafeLLamaContextHandle)
Returns the number of used KV cells (i.e. have at least one sequence assigned to them)
Parameters
Returns
llama_beam_search(SafeLLamaContextHandle, LLamaBeamSearchCallback, IntPtr, UInt64, Int32, Int32, Int32)
Deterministically returns the entire sentence constructed by a beam search.
Parameters
ctx
SafeLLamaContextHandle
Pointer to the llama_context.
callback
LLamaBeamSearchCallback
Invoked for each iteration of the beam_search loop, passing in beams_state.
callback_data
IntPtr
A pointer that is simply passed back to callback.
n_beams
UInt64
Number of beams to use.
n_past
Int32
Number of tokens already evaluated.
n_predict
Int32
Maximum number of tokens to predict. EOS may occur earlier.
n_threads
Int32
Number of threads.
llama_empty_call()
A method that does nothing. This is a native method; calling it will force the llama native dependencies to be loaded.
llama_max_devices()
Get the maximum number of devices supported by llama.cpp
Returns
llama_model_default_params()
Create a LLamaModelParams with default values
Returns
llama_context_default_params()
Create a LLamaContextParams with default values
Returns
llama_model_quantize_default_params()
Create a LLamaModelQuantizeParams with default values
Returns
llama_supports_mmap()
Check if memory mapping is supported
Returns
llama_supports_mlock()
Check if memory locking is supported
Returns
llama_supports_gpu_offload()
Check if GPU offload is supported
Returns
llama_set_rng_seed(SafeLLamaContextHandle, UInt32)
Sets the current rng seed.
Parameters
seed
UInt32
llama_get_state_size(SafeLLamaContextHandle)
Returns the maximum size in bytes of the state (rng, logits, embedding and kv_cache) - will often be smaller after compacting tokens
Parameters
Returns
llama_copy_state_data(SafeLLamaContextHandle, Byte*)
Copies the state to the specified destination address. Destination needs to have allocated enough memory.
Parameters
dest
Byte*
Returns
UInt64
the number of bytes copied
llama_set_state_data(SafeLLamaContextHandle, Byte*)
Set the state reading from the specified address
Parameters
src
Byte*
Returns
UInt64
the number of bytes read
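A hedged sketch pairing these three calls to snapshot and restore a context; it assumes llama_get_state_size returns the size as an unsigned integer usable for allocation:

```csharp
using System;
using LLama.Native;

static unsafe byte[] SaveState(SafeLLamaContextHandle ctx)
{
    // Allocate the maximum possible size, then trim to the bytes actually copied.
    var buffer = new byte[NativeApi.llama_get_state_size(ctx)];
    fixed (byte* ptr = buffer)
    {
        ulong written = NativeApi.llama_copy_state_data(ctx, ptr);
        return buffer.AsSpan(0, (int)written).ToArray();
    }
}

static unsafe void RestoreState(SafeLLamaContextHandle ctx, byte[] state)
{
    fixed (byte* ptr = state)
        NativeApi.llama_set_state_data(ctx, ptr);
}
```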
llama_load_session_file(SafeLLamaContextHandle, String, LLamaToken[], UInt64, UInt64&)
Load session file
Parameters
path_session
String
tokens_out
LLamaToken[]
n_token_capacity
UInt64
n_token_count_out
UInt64&
Returns
llama_save_session_file(SafeLLamaContextHandle, String, LLamaToken[], UInt64)
Save session file
Parameters
path_session
String
tokens
LLamaToken[]
n_token_count
UInt64
Returns
llama_token_get_text(SafeLlamaModelHandle, LLamaToken)
Parameters
model
SafeLlamaModelHandle
token
LLamaToken
Returns
llama_token_get_score(SafeLlamaModelHandle, LLamaToken)
Parameters
model
SafeLlamaModelHandle
token
LLamaToken
Returns
llama_token_get_type(SafeLlamaModelHandle, LLamaToken)
Parameters
model
SafeLlamaModelHandle
token
LLamaToken
Returns
llama_n_ctx(SafeLLamaContextHandle)
Get the size of the context window for the model for this context
Parameters
Returns
llama_n_batch(SafeLLamaContextHandle)
Get the batch size for this context
Parameters
Returns
llama_get_logits(SafeLLamaContextHandle)
Token logits obtained from the last call to llama_decode
The logits for the last token are stored in the last row
Can be mutated in order to change the probabilities of the next token.
Rows: n_tokens
Cols: n_vocab
Parameters
Returns
llama_get_logits_ith(SafeLLamaContextHandle, Int32)
Logits for the ith token. Equivalent to: llama_get_logits(ctx) + i*n_vocab
Parameters
i
Int32
Returns
llama_get_embeddings_ith(SafeLLamaContextHandle, Int32)
Get the embeddings for the ith sequence. Equivalent to: llama_get_embeddings(ctx) + i*n_embd
Parameters
i
Int32