NativeApi
Namespace: LLama.Native
Direct translation of the llama.cpp API
Inheritance Object → NativeApi
Attributes NullableContextAttribute, NullableAttribute
Methods
llama_empty_call()
A method that does nothing. This is a native method; calling it forces the llama native dependencies to be loaded.
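For example, this call can be issued once during application startup to trigger loading of the native backend. A minimal sketch using only the method documented above:

```csharp
using LLama.Native;

// Force the native llama dependencies to be resolved and loaded up front,
// before any model or context work is attempted.
NativeApi.llama_empty_call();
```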
llama_backend_free()
Call once at the end of the program - currently only used for MPI
llama_max_devices()
Get the maximum number of devices supported by llama.cpp
Returns
llama_supports_mmap()
Check if memory mapping is supported
Returns
llama_supports_mlock()
Check if memory locking is supported
Returns
llama_supports_gpu_offload()
Check if GPU offload is supported
Returns
llama_supports_rpc()
Check if RPC offload is supported
Returns
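As a rough illustration, the capability checks above can be combined into a single startup probe. This is a minimal sketch using only the calls documented above; the printed values are whatever types the binding declares.

```csharp
using System;
using LLama.Native;

// Probe what the loaded llama.cpp backend supports before configuring
// model/context parameters (e.g. memory mapping, locking, GPU layers).
static void PrintBackendCapabilities()
{
    Console.WriteLine($"max devices:           {NativeApi.llama_max_devices()}");
    Console.WriteLine($"mmap supported:        {NativeApi.llama_supports_mmap()}");
    Console.WriteLine($"mlock supported:       {NativeApi.llama_supports_mlock()}");
    Console.WriteLine($"GPU offload supported: {NativeApi.llama_supports_gpu_offload()}");
    Console.WriteLine($"RPC offload supported: {NativeApi.llama_supports_rpc()}");
}
```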
llama_state_load_file(SafeLLamaContextHandle, String, LLamaToken[], UInt64, UInt64&)
Load session file
Parameters
path_session
String
tokens_out
LLamaToken[]
n_token_capacity
UInt64
n_token_count_out
UInt64&
Returns
llama_state_save_file(SafeLLamaContextHandle, String, LLamaToken[], UInt64)
Save session file
Parameters
path_session
String
tokens
LLamaToken[]
n_token_count
UInt64
Returns
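A minimal sketch of a session save/load round trip, assuming ctx is a valid SafeLLamaContextHandle and tokens holds the tokens evaluated so far. The UInt64& output parameter is shown here as `out`; use `ref` if the binding declares it that way.

```csharp
using System;
using LLama.Native;

static void SaveSession(SafeLLamaContextHandle ctx, string path, LLamaToken[] tokens)
{
    // Persist the context state plus the prompt tokens to a session file.
    NativeApi.llama_state_save_file(ctx, path, tokens, (ulong)tokens.Length);
}

static LLamaToken[] LoadSession(SafeLLamaContextHandle ctx, string path, int capacity)
{
    // Restore the context state; n_token_count_out reports how many tokens
    // were actually read back into the buffer.
    var tokens = new LLamaToken[capacity];
    NativeApi.llama_state_load_file(ctx, path, tokens, (ulong)tokens.Length, out ulong count);
    Array.Resize(ref tokens, checked((int)count));
    return tokens;
}
```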
llama_state_seq_save_file(SafeLLamaContextHandle, String, LLamaSeqId, LLamaToken*, UIntPtr)
Saves the specified sequence as a file at the specified filepath. Can later be loaded via NativeApi.llama_state_seq_load_file(SafeLLamaContextHandle, String, LLamaSeqId, LLamaToken*, UIntPtr, UIntPtr&)
Parameters
filepath
String
seq_id
LLamaSeqId
tokens
LLamaToken*
n_token_count
UIntPtr
Returns
llama_state_seq_load_file(SafeLLamaContextHandle, String, LLamaSeqId, LLamaToken*, UIntPtr, UIntPtr&)
Loads a sequence saved as a file via NativeApi.llama_state_seq_save_file(SafeLLamaContextHandle, String, LLamaSeqId, LLamaToken*, UIntPtr) into the specified sequence
Parameters
filepath
String
dest_seq_id
LLamaSeqId
tokens_out
LLamaToken*
n_token_capacity
UIntPtr
n_token_count_out
UIntPtr&
Returns
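The per-sequence variants take raw token pointers. A hedged sketch of the same round trip using fixed buffers, assuming the count parameters map to UIntPtr and the UIntPtr& output maps to an `out` parameter as documented above:

```csharp
using System;
using LLama.Native;

static unsafe void SaveSequence(SafeLLamaContextHandle ctx, string path, LLamaSeqId seq, LLamaToken[] tokens)
{
    fixed (LLamaToken* ptr = tokens)
    {
        NativeApi.llama_state_seq_save_file(ctx, path, seq, ptr, new UIntPtr((uint)tokens.Length));
    }
}

static unsafe LLamaToken[] LoadSequence(SafeLLamaContextHandle ctx, string path, LLamaSeqId destSeq, int capacity)
{
    var tokens = new LLamaToken[capacity];
    UIntPtr count;
    fixed (LLamaToken* ptr = tokens)
    {
        NativeApi.llama_state_seq_load_file(ctx, path, destSeq, ptr, new UIntPtr((uint)tokens.Length), out count);
    }
    Array.Resize(ref tokens, checked((int)count.ToUInt64()));
    return tokens;
}
```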
llama_set_causal_attn(SafeLLamaContextHandle, Boolean)
Set whether to use causal attention or not. If set to true, the model will only attend to the past tokens.
Parameters
causalAttn
Boolean
llama_set_embeddings(SafeLLamaContextHandle, Boolean)
Set whether the model is in embeddings mode or not.
Parameters
embeddings
Boolean
If true, embeddings will be returned but logits will not
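A minimal sketch of toggling a context between generation and embedding extraction. Disabling causal attention here is a typical choice for encoder-style embedding models, not something these methods require.

```csharp
using LLama.Native;

static void RunEmbeddingPass(SafeLLamaContextHandle ctx)
{
    NativeApi.llama_set_embeddings(ctx, true);    // return embeddings, no logits
    NativeApi.llama_set_causal_attn(ctx, false);  // allow attending to the whole input

    // ... decode the batch and read the embeddings here ...

    NativeApi.llama_set_causal_attn(ctx, true);   // back to normal causal generation
    NativeApi.llama_set_embeddings(ctx, false);
}
```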
llama_set_abort_callback(SafeLlamaModelHandle, IntPtr, IntPtr)
Set abort callback
Parameters
abortCallback
IntPtr
abortCallbackData
IntPtr
llama_n_seq_max(SafeLLamaContextHandle)
Get the n_seq_max for this context
Parameters
Returns
llama_get_embeddings(SafeLLamaContextHandle)
Get all output token embeddings. When pooling_type == LLAMA_POOLING_TYPE_NONE or when using a generative model, the embeddings for which llama_batch.logits[i] != 0 are stored contiguously in the order they appeared in the batch (shape: [n_outputs * n_embd]). Otherwise, returns an empty span.
Parameters
Returns
llama_chat_apply_template(Byte*, LLamaChatMessage*, UIntPtr, Boolean, Byte*, Int32)
Apply a chat template. Inspired by Hugging Face's apply_chat_template() in Python.
Parameters
tmpl
Byte*
A Jinja template to use for this chat. If this is nullptr, the model’s default chat template will be used instead.
chat
LLamaChatMessage*
Pointer to a list of multiple llama_chat_message
n_msg
UIntPtr
Number of llama_chat_message in this chat
add_ass
Boolean
Whether to end the prompt with the token(s) that indicate the start of an assistant message.
buf
Byte*
A buffer to hold the output formatted prompt. The recommended alloc size is 2 * (total number of characters of all messages)
length
Int32
The size of the allocated buffer
Returns
Int32
The total number of bytes of the formatted prompt. If it is larger than the size of the buffer, you may need to re-allocate the buffer and then re-apply the template.
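The grow-and-retry contract described above can be wrapped as follows. This is a sketch, not the library's own helper: it assumes the caller has already built the LLamaChatMessage array, passes null for tmpl to use the model's default template, and treats a negative return as failure.

```csharp
using System;
using System.Text;
using LLama.Native;

static unsafe string ApplyChatTemplate(LLamaChatMessage[] messages, bool addAssistant)
{
    var buffer = new byte[1024];
    while (true)
    {
        int written;
        fixed (LLamaChatMessage* chat = messages)
        fixed (byte* buf = buffer)
        {
            written = NativeApi.llama_chat_apply_template(
                null, chat, (UIntPtr)(uint)messages.Length, addAssistant, buf, buffer.Length);
        }

        if (written < 0)
            throw new InvalidOperationException("llama_chat_apply_template failed");

        if (written <= buffer.Length)
            return Encoding.UTF8.GetString(buffer, 0, written);

        // Output did not fit: re-allocate to the reported size and re-apply.
        buffer = new byte[written];
    }
}
```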
llama_chat_builtin_templates(Char**, UIntPtr)
Get list of built-in chat templates
Parameters
output
Char**
len
UIntPtr
Returns
llama_print_timings(SafeLLamaContextHandle)
Print out timing information for this context
Parameters
llama_print_system_info()
Print system information
Returns
llama_token_to_piece(Vocabulary, LLamaToken, Span<Byte>, Int32, Boolean)
Convert a single token into text
Parameters
vocab
Vocabulary
llamaToken
LLamaToken
buffer
Span<Byte>
buffer to write string into
lstrip
Int32
User can skip up to 'lstrip' leading spaces before copying (useful when encoding/decoding multiple tokens with 'add_space_prefix')
special
Boolean
If true, special tokens are rendered in the output
Returns
Int32
The length written, or, if the buffer is too small, a negative value indicating the length required
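A minimal sketch of the negative-return contract. The vocabulary type is assumed to be the one exposed by the model handle (e.g. SafeLlamaModelHandle.Vocabulary); everything else follows the signature above.

```csharp
using System;
using System.Text;
using LLama.Native;

static string TokenToText(SafeLlamaModelHandle.Vocabulary vocab, LLamaToken token)
{
    Span<byte> buffer = stackalloc byte[32];
    int written = NativeApi.llama_token_to_piece(vocab, token, buffer, 0, true);

    if (written < 0)
    {
        // Buffer was too small: the magnitude of the negative value is the required length.
        buffer = new byte[-written];
        written = NativeApi.llama_token_to_piece(vocab, token, buffer, 0, true);
    }

    return Encoding.UTF8.GetString(buffer.Slice(0, Math.Max(written, 0)));
}
```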
llama_log_set(LLamaLogCallback)
Caution: use NativeLogConfig.llama_log_set instead.
Register a callback to receive llama log messages
Parameters
logCallback
LLamaLogCallback
llama_kv_self_seq_rm(SafeLLamaContextHandle, LLamaSeqId, LLamaPos, LLamaPos)
Removes all tokens that belong to the specified sequence and have positions in [p0, p1)
Parameters
seq
LLamaSeqId
p0
LLamaPos
p1
LLamaPos
Returns
Boolean
Returns false if a partial sequence cannot be removed. Removing a whole sequence never fails.
llama_batch_init(Int32, Int32, Int32)
Allocates a batch of tokens on the heap. Each token can be assigned up to n_seq_max sequence ids. The batch has to be freed with llama_batch_free(). If embd != 0, llama_batch.embd will be allocated with size n_tokens * embd * sizeof(float); otherwise, llama_batch.token will be allocated to store n_tokens llama_token. The rest of the llama_batch members are allocated with size n_tokens. All members are left uninitialized.
Parameters
n_tokens
Int32
embd
Int32
n_seq_max
Int32
Each token can be assigned up to n_seq_max sequence ids
Returns
llama_batch_free(LLamaNativeBatch)
Frees a batch of tokens allocated with llama_batch_init()
Parameters
batch
LLamaNativeBatch
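A minimal sketch of the allocate/use/free lifecycle, assuming llama_batch_init returns the LLamaNativeBatch that llama_batch_free expects:

```csharp
using LLama.Native;

static void BatchLifecycle()
{
    // Token batch (embd == 0) with room for 512 tokens, each assignable to one sequence id.
    // All members are left uninitialized and must be filled before decoding.
    var batch = NativeApi.llama_batch_init(512, 0, 1);
    try
    {
        // ... populate the batch and decode it here ...
    }
    finally
    {
        NativeApi.llama_batch_free(batch);
    }
}
```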
llama_apply_adapter_cvec(SafeLLamaContextHandle, Single*, UIntPtr, Int32, Int32, Int32)
Apply a loaded control vector to a llama_context, or, if data is NULL, clear the currently loaded vector. n_embd should be the size of a single layer's control, and data should point to an n_embd x n_layers buffer starting from layer 1. il_start and il_end are the layer range the vector should apply to (both inclusive). See llama_control_vector_load in common to load a control vector.
Parameters
data
Single*
len
UIntPtr
n_embd
Int32
il_start
Int32
il_end
Int32
Returns
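As noted above, passing NULL data clears any currently loaded control vector. A hedged sketch of the clearing case; the remaining arguments are passed as zeros here on the assumption that they are ignored when clearing.

```csharp
using System;
using LLama.Native;

static unsafe void ClearControlVector(SafeLLamaContextHandle ctx)
{
    // data == null clears the currently loaded control vector; a real vector would be
    // an n_embd x n_layers float buffer starting from layer 1.
    NativeApi.llama_apply_adapter_cvec(ctx, null, UIntPtr.Zero, 0, 0, 0);
}
```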
llama_split_path(String, UIntPtr, String, Int32, Int32)
Build a split GGUF final path for this chunk. llama_split_path(split_path, sizeof(split_path), "/models/ggml-model-q4_0", 2, 4) => split_path = "/models/ggml-model-q4_0-00002-of-00004.gguf"
Parameters
split_path
String
maxlen
UIntPtr
path_prefix
String
split_no
Int32
split_count
Int32
Returns
Int32
Returns the split_path length.
llama_split_prefix(String, UIntPtr, String, Int32, Int32)
Extract the path prefix from the split_path if and only if the split_no and split_count match. llama_split_prefix(split_prefix, 64, "/models/ggml-model-q4_0-00002-of-00004.gguf", 2, 4) => split_prefix = "/models/ggml-model-q4_0"
Parameters
split_prefix
String
maxlen
UIntPtr
split_path
String
split_no
Int32
split_count
Int32
Returns
Int32
Returns the split_prefix length.
ggml_backend_dev_count()
Get the number of available backend devices
Returns
UIntPtr
Count of available backend devices
ggml_backend_dev_get(UIntPtr)
Get a backend device by index
Parameters
i
UIntPtr
Device index
Returns
IntPtr
Pointer to the backend device
ggml_backend_dev_buffer_type(IntPtr)
Get the buffer type for a backend device
Parameters
dev
IntPtr
Backend device pointer
Returns
IntPtr
Pointer to the buffer type
ggml_backend_buft_name(IntPtr)
Get the name of a buffer type
Parameters
buft
IntPtr
Buffer type pointer
Returns
IntPtr
Name of the buffer type
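A minimal sketch tying the four calls above together: enumerate the devices and print the name of each device's buffer type. The returned name pointer is assumed to be a native UTF-8 string.

```csharp
using System;
using System.Runtime.InteropServices;
using LLama.Native;

static void ListBackendDevices()
{
    ulong count = (ulong)NativeApi.ggml_backend_dev_count();
    for (ulong i = 0; i < count; i++)
    {
        IntPtr dev = NativeApi.ggml_backend_dev_get((UIntPtr)i);
        IntPtr buft = NativeApi.ggml_backend_dev_buffer_type(dev);
        string? name = Marshal.PtrToStringUTF8(NativeApi.ggml_backend_buft_name(buft));
        Console.WriteLine($"device {i}: buffer type = {name}");
    }
}
```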
llava_validate_embed_size(SafeLLamaContextHandle, SafeLlavaModelHandle)
Sanity check for clip <-> llava embed size match
Parameters
ctxLlama
SafeLLamaContextHandle
LLama Context
ctxClip
SafeLlavaModelHandle
Llava Model
Returns
Boolean
True if the validation succeeds
llava_image_embed_make_with_bytes(SafeLlavaModelHandle, Int32, Byte[], Int32)
Build an image embed from image file bytes
Parameters
ctx_clip
SafeLlavaModelHandle
SafeHandle to the Clip Model
n_threads
Int32
Number of threads
image_bytes
Byte[]
Binary image in JPEG format
image_bytes_length
Int32
Bytes length of the image
Returns
SafeLlavaImageEmbedHandle
SafeHandle to the Embeddings
llava_image_embed_make_with_filename(SafeLlavaModelHandle, Int32, String)
Build an image embed from a path to an image filename
Parameters
ctx_clip
SafeLlavaModelHandle
SafeHandle to the Clip Model
n_threads
Int32
Number of threads
image_path
String
Image filename (JPEG) to generate embeddings from
Returns
SafeLlavaImageEmbedHandle
SafeHandle to the embeddings
llava_image_embed_free(IntPtr)
Free an embedding made with llava_image_embed_make_*
Parameters
embed
IntPtr
Embeddings to release
llava_eval_image_embed(SafeLLamaContextHandle, SafeLlavaImageEmbedHandle, Int32, Int32&)
Write the image represented by embed into the llama context with batch size n_batch, starting at context position n_past. On completion, n_past points to the next position in the context after the image embed.
Parameters
ctx_llama
SafeLLamaContextHandle
Llama Context
embed
SafeLlavaImageEmbedHandle
Embedding handle
n_batch
Int32
n_past
Int32&
Returns
Boolean
True on success
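A minimal sketch of the flow across the llava methods above: validate the embed sizes, build an embed from a JPEG on disk, then write it into the context. The embed handle is disposed when it goes out of scope, which is assumed to release it via llava_image_embed_free.

```csharp
using LLama.Native;

static bool EvalImage(SafeLLamaContextHandle ctxLlama, SafeLlavaModelHandle ctxClip,
                      string imagePath, int nBatch, ref int nPast)
{
    // Make sure the clip and llama embedding sizes agree before doing any work.
    if (!NativeApi.llava_validate_embed_size(ctxLlama, ctxClip))
        return false;

    using SafeLlavaImageEmbedHandle embed =
        NativeApi.llava_image_embed_make_with_filename(ctxClip, 4, imagePath);

    // On success, nPast is advanced to the first position after the image embed.
    return NativeApi.llava_eval_image_embed(ctxLlama, embed, nBatch, ref nPast);
}
```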
GetLoadedNativeLibrary(NativeLibraryName)
Get the loaded native library. If you are using netstandard2.0, it will always return null.
Parameters
name
NativeLibraryName
Returns
Exceptions
llama_model_quantize(String, String, LLamaModelQuantizeParams&)
Returns 0 on success
Parameters
fname_inp
String
fname_out
String
param
LLamaModelQuantizeParams&
Returns
UInt32
Returns 0 on success