NativeApi
Namespace: LLama.Native
Direct translation of the llama.cpp API
Inheritance Object → NativeApi
Attributes NullableContextAttribute, NullableAttribute
Methods
llama_empty_call()
A method that does nothing. This is a native method; calling it will force the llama native dependencies to be loaded.
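A minimal usage sketch (calling it once at startup is an example pattern, not a requirement of the API):

```csharp
using LLama.Native;

// Force the native llama.cpp libraries to load eagerly at startup, so a missing
// or incompatible native dependency fails fast instead of on the first real call.
NativeApi.llama_empty_call();
```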
llama_backend_free()
Call once at the end of the program - currently only used for MPI
llama_max_devices()
Get the maximum number of devices supported by llama.cpp
Returns
llama_supports_mmap()
Check if memory mapping is supported
Returns
llama_supports_mlock()
Check if memory locking is supported
Returns
llama_supports_gpu_offload()
Check if GPU offload is supported
Returns
llama_supports_rpc()
Check if RPC offload is supported
Returns
llama_state_load_file(SafeLLamaContextHandle, String, LLamaToken[], UInt64, UInt64&)
Load session file
Parameters
path_session String
tokens_out LLamaToken[]
n_token_capacity UInt64
n_token_count_out UInt64&
Returns
llama_state_save_file(SafeLLamaContextHandle, String, LLamaToken[], UInt64)
Save session file
Parameters
path_session String
tokens LLamaToken[]
n_token_count UInt64
Returns
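A sketch of pairing the two session-file calls above; the `out` mapping of the UInt64& parameter, the file name, and the restore capacity are assumptions, not part of the generated reference:

```csharp
using LLama.Native;

// Sketch: persist the current session tokens, then restore them later.
static void SaveAndRestore(SafeLLamaContextHandle ctx, LLamaToken[] tokens)
{
    // Save the evaluated tokens alongside the context state.
    NativeApi.llama_state_save_file(ctx, "session.bin", tokens, (ulong)tokens.Length);

    // Restore: the capacity must be large enough for every token stored in the file.
    var restored = new LLamaToken[4096];
    NativeApi.llama_state_load_file(ctx, "session.bin", restored, (ulong)restored.Length, out var restoredCount);
    // restoredCount now holds how many tokens were actually read back.
}
```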
llama_state_seq_save_file(SafeLLamaContextHandle, String, LLamaSeqId, LLamaToken*, UIntPtr)
Saves the specified sequence as a file at the specified filepath. It can later be loaded via NativeApi.llama_state_seq_load_file(SafeLLamaContextHandle, String, LLamaSeqId, LLamaToken*, UIntPtr, UIntPtr&)
Parameters
filepath String
seq_id LLamaSeqId
tokens LLamaToken*
n_token_count UIntPtr
Returns
llama_state_seq_load_file(SafeLLamaContextHandle, String, LLamaSeqId, LLamaToken*, UIntPtr, UIntPtr&)
Loads a sequence saved to a file via NativeApi.llama_state_seq_save_file(SafeLLamaContextHandle, String, LLamaSeqId, LLamaToken*, UIntPtr) into the specified sequence
Parameters
filepath String
dest_seq_id LLamaSeqId
tokens_out LLamaToken*
n_token_capacity UIntPtr
n_token_count_out UIntPtr&
Returns
llama_set_causal_attn(SafeLLamaContextHandle, Boolean)
Set whether to use causal attention or not. If set to true, the model will only attend to the past tokens
Parameters
causalAttn Boolean
llama_set_embeddings(SafeLLamaContextHandle, Boolean)
Set whether the model is in embeddings mode or not.
Parameters
embeddings Boolean
If true, embeddings will be returned but logits will not
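A sketch of toggling embeddings mode around a decode call; the decode itself and creation of the context handle are assumed to happen elsewhere:

```csharp
using LLama.Native;

static void DecodeForEmbeddings(SafeLLamaContextHandle ctx)
{
    NativeApi.llama_set_embeddings(ctx, true);   // embeddings are returned, logits are not
    // ... decode the batch whose embeddings you want here ...
    NativeApi.llama_set_embeddings(ctx, false);  // resume normal logits output
}
```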
llama_set_abort_callback(SafeLlamaModelHandle, IntPtr, IntPtr)
Set abort callback
Parameters
abortCallback IntPtr
abortCallbackData IntPtr
llama_n_seq_max(SafeLLamaContextHandle)
Get the n_seq_max for this context
Parameters
Returns
llama_get_embeddings(SafeLLamaContextHandle)
Get all output token embeddings. When pooling_type == LLAMA_POOLING_TYPE_NONE or when using a generative model, the embeddings for which llama_batch.logits[i] != 0 are stored contiguously in the order they appeared in the batch (shape: [n_outputs * n_embd]). Otherwise, an empty span is returned.
Parameters
Returns
llama_chat_apply_template(Byte*, LLamaChatMessage*, UIntPtr, Boolean, Byte*, Int32)
Apply chat template. Inspired by Hugging Face's apply_chat_template() in Python.
Parameters
tmpl Byte*
A Jinja template to use for this chat. If this is nullptr, the model’s default chat template will be used instead.
chat LLamaChatMessage*
Pointer to a list of multiple llama_chat_message
n_msg UIntPtr
Number of llama_chat_message in this chat
add_ass Boolean
Whether to end the prompt with the token(s) that indicate the start of an assistant message.
buf Byte*
A buffer to hold the output formatted prompt. The recommended alloc size is 2 * (total number of characters of all messages)
length Int32
The size of the allocated buffer
Returns
Int32
The total number of bytes of the formatted prompt. If it is larger than the size of the buffer, you may need to re-alloc it and then re-apply the template.
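A sketch of the grow-and-retry pattern this return value implies; constructing the LLamaChatMessage array and the template bytes is assumed to happen elsewhere:

```csharp
using LLama.Native;

// If the returned length exceeds the buffer, re-alloc and re-apply the template.
static unsafe int ApplyTemplate(byte* tmpl, LLamaChatMessage* chat, nuint n_msg)
{
    var buf = new byte[1024];
    int written;
    fixed (byte* bufPtr = buf)
        written = NativeApi.llama_chat_apply_template(tmpl, chat, n_msg, true, bufPtr, buf.Length);

    if (written > buf.Length)
    {
        buf = new byte[written];    // grow to the reported size
        fixed (byte* bufPtr = buf)
            written = NativeApi.llama_chat_apply_template(tmpl, chat, n_msg, true, bufPtr, buf.Length);
    }
    return written;                 // bytes of the formatted prompt now held in buf
}
```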
llama_chat_builtin_templates(Char**, UIntPtr)
Get list of built-in chat templates
Parameters
output Char**
len UIntPtr
Returns
llama_print_timings(SafeLLamaContextHandle)
Print out timing information for this context
Parameters
llama_print_system_info()
Print system information
Returns
llama_token_to_piece(Vocabulary, LLamaToken, Span<Byte>, Int32, Boolean)
Convert a single token into text
Parameters
vocab Vocabulary
llamaToken LLamaToken
buffer Span<Byte>
buffer to write string into
lstrip Int32
User can skip up to 'lstrip' leading spaces before copying (useful when encoding/decoding multiple tokens with 'add_space_prefix')
special Boolean
If true, special tokens are rendered in the output
Returns
Int32
The length written, or, if the buffer is too small, a negative value that indicates the length required
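A sketch of the retry-on-negative-return pattern; the nested SafeLlamaModelHandle.Vocabulary type name and the UTF-8 decoding of the bytes are assumptions:

```csharp
using System;
using System.Text;
using LLama.Native;

// Decode one token to text, retrying with a larger buffer when the negative
// return value reports the required length.
static string TokenToText(SafeLlamaModelHandle.Vocabulary vocab, LLamaToken token)
{
    Span<byte> buffer = stackalloc byte[32];
    int written = NativeApi.llama_token_to_piece(vocab, token, buffer, 0, true);
    if (written < 0)
    {
        buffer = new byte[-written];    // buffer was too small; -written is the required size
        written = NativeApi.llama_token_to_piece(vocab, token, buffer, 0, true);
    }
    return Encoding.UTF8.GetString(buffer.Slice(0, written));
}
```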
llama_log_set(LLamaLogCallback)
Caution
Use NativeLogConfig.llama_log_set instead
Register a callback to receive llama log messages
Parameters
logCallback LLamaLogCallback
llama_kv_self_seq_rm(SafeLLamaContextHandle, LLamaSeqId, LLamaPos, LLamaPos)
Removes all tokens that belong to the specified sequence and have positions in [p0, p1)
Parameters
seq LLamaSeqId
p0 LLamaPos
p1 LLamaPos
Returns
Boolean
Returns false if a partial sequence cannot be removed. Removing a whole sequence never fails
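A sketch of trimming a sequence's cache; the implicit int conversions for LLamaSeqId/LLamaPos and the p1 = -1 "up to the end of the sequence" convention are assumptions carried over from llama.cpp:

```csharp
using LLama.Native;

// Remove everything sequence 0 holds in the KV cache from position 10 onward.
static bool TrimSequenceTail(SafeLLamaContextHandle ctx)
{
    LLamaSeqId seq = 0;   // assumption: converts from int
    LLamaPos p0 = 10;
    LLamaPos p1 = -1;     // assumption: -1 means "to the end of the sequence"
    return NativeApi.llama_kv_self_seq_rm(ctx, seq, p0, p1);
}
```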
llama_batch_init(Int32, Int32, Int32)
Allocates a batch of tokens on the heap. Each token can be assigned up to n_seq_max sequence ids. The batch has to be freed with llama_batch_free(). If embd != 0, llama_batch.embd will be allocated with size n_tokens * embd * sizeof(float); otherwise, llama_batch.token will be allocated to store n_tokens llama_token. The rest of the llama_batch members are allocated with size n_tokens. All members are left uninitialized.
Parameters
n_tokens Int32
embd Int32
n_seq_max Int32
Each token can be assigned up to n_seq_max sequence ids
Returns
llama_batch_free(LLamaNativeBatch)
Frees a batch of tokens allocated with llama_batch_init()
Parameters
batch LLamaNativeBatch
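A sketch of pairing llama_batch_init with llama_batch_free; the sizes used here are arbitrary example values:

```csharp
using LLama.Native;

// Allocate a token batch (token mode, embd = 0) with room for 512 tokens and one
// sequence id per token, and make sure it is always released again.
static void WithBatch()
{
    LLamaNativeBatch batch = NativeApi.llama_batch_init(512, 0, 1);
    try
    {
        // ... fill the batch and decode it here; all members start uninitialized ...
    }
    finally
    {
        // Batches from llama_batch_init must be freed with llama_batch_free.
        NativeApi.llama_batch_free(batch);
    }
}
```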
llama_apply_adapter_cvec(SafeLLamaContextHandle, Single*, UIntPtr, Int32, Int32, Int32)
Apply a loaded control vector to a llama_context, or, if data is NULL, clear the currently loaded vector. n_embd should be the size of a single layer's control, and data should point to an n_embd x n_layers buffer starting from layer 1. il_start and il_end are the layer range the vector should apply to (both inclusive). See llama_control_vector_load in common to load a control vector.
Parameters
data Single*
len UIntPtr
n_embd Int32
il_start Int32
il_end Int32
Returns
llama_split_path(String, UIntPtr, String, Int32, Int32)
Build a split GGUF final path for this chunk. llama_split_path(split_path, sizeof(split_path), "/models/ggml-model-q4_0", 2, 4) => split_path = "/models/ggml-model-q4_0-00002-of-00004.gguf"
Parameters
split_path String
maxlen UIntPtr
path_prefix String
split_no Int32
split_count Int32
Returns
Int32
Returns the split_path length.
llama_split_prefix(String, UIntPtr, String, Int32, Int32)
Extract the path prefix from the split_path if and only if the split_no and split_count match. llama_split_prefix(split_prefix, 64, "/models/ggml-model-q4_0-00002-of-00004.gguf", 2, 4) => split_prefix = "/models/ggml-model-q4_0"
Parameters
split_prefix String
maxlen UIntPtr
split_path String
split_no Int32
split_count Int32
Returns
Int32
Returns the split_prefix length.
ggml_backend_dev_count()
Get the number of available backend devices
Returns
UIntPtr
Count of available backend devices
ggml_backend_dev_get(UIntPtr)
Get a backend device by index
Parameters
i UIntPtr
Device index
Returns
IntPtr
Pointer to the backend device
ggml_backend_dev_buffer_type(IntPtr)
Get the buffer type for a backend device
Parameters
dev IntPtr
Backend device pointer
Returns
IntPtr
Pointer to the buffer type
ggml_backend_buft_name(IntPtr)
Get the name of a buffer type
Parameters
buft IntPtr
Buffer type pointer
Returns
IntPtr
Name of the buffer type
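A sketch that ties the four ggml_backend_* entries above together; converting the returned name pointer with Marshal.PtrToStringAnsi is an assumption:

```csharp
using System;
using System.Runtime.InteropServices;
using LLama.Native;

// Walk the available backend devices and print each device's buffer type name.
static void ListBackendDevices()
{
    nuint count = NativeApi.ggml_backend_dev_count();
    for (nuint i = 0; i < count; i++)
    {
        IntPtr dev  = NativeApi.ggml_backend_dev_get(i);
        IntPtr buft = NativeApi.ggml_backend_dev_buffer_type(dev);
        IntPtr name = NativeApi.ggml_backend_buft_name(buft);
        Console.WriteLine($"device {i}: {Marshal.PtrToStringAnsi(name)}");
    }
}
```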
llava_validate_embed_size(SafeLLamaContextHandle, SafeLlavaModelHandle)
Sanity check for clip <-> llava embed size match
Parameters
ctxLlama SafeLLamaContextHandle
LLama Context
ctxClip SafeLlavaModelHandle
Llava Model
Returns
Boolean
True if validated successfully
llava_image_embed_make_with_bytes(SafeLlavaModelHandle, Int32, Byte[], Int32)
Build an image embed from image file bytes
Parameters
ctx_clip SafeLlavaModelHandle
SafeHandle to the Clip Model
n_threads Int32
Number of threads
image_bytes Byte[]
Binary image in jpeg format
image_bytes_length Int32
Bytes length of the image
Returns
SafeLlavaImageEmbedHandle
SafeHandle to the Embeddings
llava_image_embed_make_with_filename(SafeLlavaModelHandle, Int32, String)
Build an image embed from a path to an image filename
Parameters
ctx_clip SafeLlavaModelHandle
SafeHandle to the Clip Model
n_threads Int32
Number of threads
image_path String
Image filename (jpeg) to generate embeddings from
Returns
SafeLlavaImageEmbedHandle
SafeHandle to the embeddings
llava_image_embed_free(IntPtr)
Free an embedding made with llava_image_embed_make_*
Parameters
embed IntPtr
Embeddings to release
llava_eval_image_embed(SafeLLamaContextHandle, SafeLlavaImageEmbedHandle, Int32, Int32&)
Write the image represented by embed into the llama context with batch size n_batch, starting at context position n_past. On completion, n_past points to the next position in the context after the image embed.
Parameters
ctx_llama SafeLLamaContextHandle
Llama Context
embed SafeLlavaImageEmbedHandle
Embedding handle
n_batch Int32
n_past Int32&
Returns
Boolean
True on success
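A sketch combining the llava entries above into one call path; the thread count, batch size, and the ref mapping of n_past are example assumptions:

```csharp
using LLama.Native;

// Validate sizes, embed an image file, write it into the context, then let the
// SafeHandle release the embedding when it is disposed.
static bool EvalImage(SafeLLamaContextHandle ctxLlama, SafeLlavaModelHandle ctxClip, string imagePath)
{
    if (!NativeApi.llava_validate_embed_size(ctxLlama, ctxClip))
        return false;   // clip and llama embedding sizes do not match

    using var embed = NativeApi.llava_image_embed_make_with_filename(ctxClip, 4, imagePath);

    int n_past = 0;
    // Writes the image embedding into the context with batch size 512; n_past is
    // advanced to the first position after the image.
    return NativeApi.llava_eval_image_embed(ctxLlama, embed, 512, ref n_past);
}
```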
GetLoadedNativeLibrary(NativeLibraryName)
Get the loaded native library. If you are using netstandard2.0, it will always return null.
Parameters
name NativeLibraryName
Returns
Exceptions
llama_model_quantize(String, String, LLamaModelQuantizeParams&)
Returns 0 on success
Parameters
fname_inp String
fname_out String
param LLamaModelQuantizeParams&
Returns
UInt32
Returns 0 on success
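A sketch of a quantize call; LLamaModelQuantizeParams.Default() and the file names are assumptions about how default parameters are obtained, not part of the generated reference:

```csharp
using System;
using LLama.Native;

// Quantize a GGUF model file and surface a non-zero error code as an exception.
static void QuantizeModel()
{
    var qparams = LLamaModelQuantizeParams.Default();   // assumption: default-params factory
    uint result = NativeApi.llama_model_quantize("model-f16.gguf", "model-q4_0.gguf", ref qparams);
    if (result != 0)
        throw new InvalidOperationException($"Quantization failed with code {result}");
}
```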