SafeLLamaContextHandle
Namespace: LLama.Native
A safe wrapper around a llama_context
Inheritance Object → CriticalFinalizerObject → SafeHandle → SafeLLamaHandleBase → SafeLLamaContextHandle
Implements IDisposable
Attributes NullableContextAttribute, NullableAttribute
Fields
handle
Properties
ContextSize
Total number of tokens in the context
Property Value
EmbeddingSize
Dimension of embedding vectors
Property Value
BatchSize
Get the maximum batch size for this context
Property Value
UBatchSize
Get the physical maximum batch size for this context
Property Value
GenerationThreads
Get or set the number of threads used for generation of a single token.
Property Value
BatchThreads
Get or set the number of threads used for prompt and batch processing (multiple tokens).
Property Value
PoolingType
Get the pooling type for this context
Property Value
ModelHandle
Get the model which this context is using
Property Value
Vocab
Get the vocabulary for the model this context is using
Property Value
KvCacheCanShift
Check if the context supports KV cache shifting
Property Value
IsInvalid
Property Value
IsClosed
Property Value
Constructors
SafeLLamaContextHandle()
Methods
ReleaseHandle()
Returns
Create(SafeLlamaModelHandle, LLamaContextParams)
Create a new llama_context for the given model
Parameters
model
SafeLlamaModelHandle
lparams
LLamaContextParams
Returns
Exceptions
AddLoraAdapter(LoraAdapter, Single)
Add a LoRA adapter to this context
Parameters
lora
LoraAdapter
scale
Single
Exceptions
RemoveLoraAdapter(LoraAdapter)
Remove a LoRA adapter from this context
Parameters
lora
LoraAdapter
Returns
Boolean
Indicates whether the LoRA adapter was present in this context and was removed
ClearLoraAdapters()
Remove all LoRA adapters from this context
GetLogits(Int32)
Token logits obtained from the last call to llama_decode.
The logits for the last token are stored in the last row.
Only tokens for which logits were requested (logits = true) are present.
Can be mutated in order to change the probabilities of the next token.
Rows: n_tokens
Cols: n_vocab
Parameters
numTokens
Int32
The number of tokens whose logits should be retrieved, returned in [numTokens X n_vocab] format.
Tokens are ordered as they appear in the LLamaBatch (the first tokens come first, etc.).
This is helpful when requesting logits for many tokens in a sequence, or when decoding multiple sequences in one go.
Returns
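The layout described above can be illustrated with a minimal, self-contained sketch. It treats the logits as one flat, row-major buffer of numTokens X n_vocab floats, slices out a single token's row, and picks the highest-scoring entry of the last row (greedy selection). LogitsLayout and its methods are hypothetical helpers, and a float[] stands in for the span that GetLogits actually returns.

```csharp
using System;

// Hypothetical helpers for the [numTokens X nVocab] row-major layout
// described for GetLogits. A float[] stands in for the real span.
public static class LogitsLayout
{
    // Row r occupies elements [r * nVocab, (r + 1) * nVocab).
    public static ReadOnlySpan<float> Row(ReadOnlySpan<float> logits, int row, int nVocab)
        => logits.Slice(row * nVocab, nVocab);

    // Index of the largest logit in a row, i.e. greedy selection.
    public static int ArgMax(ReadOnlySpan<float> row)
    {
        int best = 0;
        for (int i = 1; i < row.Length; i++)
            if (row[i] > row[best]) best = i;
        return best;
    }

    // The last row holds the logits of the last token that requested them.
    public static int GreedyNextToken(float[] logits, int numTokens, int nVocab)
        => ArgMax(Row(logits, numTokens - 1, nVocab));
}
```

The real buffer can be mutated, so writing float.NegativeInfinity into an entry of the last row before sampling is one way to rule out that token.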
GetLogitsIth(Int32)
Logits for the ith token. Equivalent to: llama_get_logits(ctx) + i*n_vocab
Parameters
i
Int32
Returns
GetEmbeddingsIth(LLamaPos)
Get the embeddings for the ith token. Equivalent to: llama_get_embeddings(ctx) + ctx->output_ids[i]*n_embd
Parameters
pos
LLamaPos
Returns
Span<Single>
A span over the embedding vector, of length ctx.EmbeddingSize
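The offset formula above involves an indirection. output_ids[i] maps the ith token of the batch to its row in a packed output buffer, and each row is n_embd floats wide. The following sketch shows that lookup on plain arrays. It assumes, for illustration, that tokens which did not request output map to -1. EmbeddingLookup is a hypothetical helper and is not part of LLamaSharp.

```csharp
using System;

// Hypothetical sketch of: llama_get_embeddings(ctx) + ctx->output_ids[i] * n_embd
// outputIds maps a batch index to its row in the packed output buffer
// (assumed here to be -1 for tokens that did not request output).
public static class EmbeddingLookup
{
    public static float[] EmbeddingForToken(float[] packed, int[] outputIds, int i, int nEmbd)
    {
        int row = outputIds[i];
        if (row < 0)
            throw new ArgumentException($"Token {i} did not produce an output.");

        // Each row is nEmbd floats wide, so row r starts at r * nEmbd.
        return packed.AsSpan(row * nEmbd, nEmbd).ToArray();
    }
}
```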
GetEmbeddingsSeq(LLamaSeqId)
Get the embeddings for a specific sequence. These are the pooled embeddings for that sequence, so they are not available when the context's pooling type is none.
Parameters
seq
LLamaSeqId
Returns
Span<Single>
A span over the embedding vector, of length ctx.EmbeddingSize
Tokenize(String, Boolean, Boolean, Encoding)
Convert the given text into tokens
Parameters
text
String
The text to tokenize
add_bos
Boolean
Whether the "BOS" token should be added
special
Boolean
Allow tokenizing special and/or control tokens which otherwise are not exposed and treated as plaintext.
encoding
Encoding
Encoding to use for the text
Returns
Exceptions
TokenToSpan(LLamaToken, Span<Byte>)
Convert a single llama token into bytes
Parameters
token
LLamaToken
Token to decode
dest
Span<Byte>
A span to attempt to write into. If it is too small, nothing will be written.
Returns
UInt32
The size of this token in bytes. Nothing will be written if this is larger than dest.
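The contract of TokenToSpan is to return the token's size and to write nothing if dest is too small. That contract suggests a try-then-retry pattern: attempt a small buffer first, and allocate an exact-size one only when needed. The sketch below uses a hypothetical FakeTokenToSpan, backed by a three-entry vocabulary, in place of the real method. The real method also takes an LLamaToken rather than an int.

```csharp
using System;
using System.Text;

// Sketch of the retry pattern implied by TokenToSpan: the call returns the
// token's size in bytes and writes nothing if dest is too small.
// FakeTokenToSpan is a hypothetical stand-in backed by a tiny vocabulary.
public static class TokenBytes
{
    private static readonly string[] Vocab = { "<s>", "Hello", " wörld" };

    public static uint FakeTokenToSpan(int token, Span<byte> dest)
    {
        byte[] bytes = Encoding.UTF8.GetBytes(Vocab[token]);
        if (bytes.Length <= dest.Length)
            bytes.CopyTo(dest);             // only write when it fits
        return (uint)bytes.Length;
    }

    public static byte[] GetTokenBytes(int token)
    {
        // First attempt with a deliberately small stack buffer.
        Span<byte> small = stackalloc byte[4];
        uint size = FakeTokenToSpan(token, small);
        if (size <= (uint)small.Length)
            return small.Slice(0, (int)size).ToArray();

        // Too small: nothing was written, so retry with an exact-size buffer.
        var buffer = new byte[size];
        FakeTokenToSpan(token, buffer);
        return buffer;
    }
}
```

Working with bytes rather than strings matters because a single UTF-8 character may be split across several tokens. Bytes should therefore be accumulated until they decode to complete characters.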
Synchronize()
Wait until all computations are finished. This is automatically done when using any of the functions to obtain computation results and is not necessary to call it explicitly in most cases.
Encode(LLamaBatch)
Processes a batch of tokens with the encoder part of the encoder-decoder model. Stores the encoder output internally for later use by the decoder cross-attention layers.
Parameters
batch
LLamaBatch
Returns
DecodeResult
- 0: success
- < 0: error (the KV cache state is restored to the state before this call)
Decode(LLamaBatch)
Parameters
batch
LLamaBatch
Returns
DecodeResult
Positive return values do not indicate a fatal error, but rather a warning:
- 0: success
- 1: could not find a KV slot for the batch (try reducing the size of the batch or increasing the context size)
- < 0: error (the KV cache state is restored to the state before this call)
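The sketch below reacts to the documented return codes. It advances on 0, shrinks the batch and retries on 1, and gives up on anything else. The decode delegate is a hypothetical stand-in for building an LLamaBatch from a token range and calling Decode on it. Plain ints are used instead of the DecodeResult type.

```csharp
using System;

// Hypothetical sketch of handling Decode's documented return codes:
// 0 = success, 1 = no KV slot (warning: retry with a smaller batch),
// anything else is treated as an error. decode(start, count) stands in for
// decoding tokens [start, start + count) as one batch.
public static class DecodeRetry
{
    public static void DecodeAll(int totalTokens, int batchSize, Func<int, int, int> decode)
    {
        int start = 0;
        while (start < totalTokens)
        {
            int count = Math.Min(batchSize, totalTokens - start);
            int result = decode(start, count);
            if (result == 0)
            {
                start += count;            // success: move on to the next chunk
            }
            else if (result == 1 && batchSize > 1)
            {
                batchSize /= 2;            // no KV slot: halve the batch and retry
            }
            else
            {
                throw new InvalidOperationException($"Decode failed with code {result}");
            }
        }
    }
}
```

Halving the batch is only one possible policy. The documentation itself only suggests reducing the batch size or increasing the context size.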
Decode(LLamaBatchEmbeddings)
Parameters
batch
LLamaBatchEmbeddings
Returns
DecodeResult
Positive return values do not indicate a fatal error, but rather a warning:
- 0: success
- 1: could not find a KV slot for the batch (try reducing the size of the batch or increasing the context size)
- < 0: error
GetStateSize()
Get the size of the state, when saved as bytes
Returns
GetStateSize(LLamaSeqId)
Get the size of the KV cache for a single sequence ID, when saved as bytes
Parameters
sequence
LLamaSeqId
Returns
GetState(Byte*, UIntPtr)
Get the raw state of this context, encoded as bytes. Data is written into the dest pointer.
Parameters
dest
Byte*
Destination to write to
size
UIntPtr
Number of bytes available to write to in dest (check the required size with GetStateSize())
Returns
UIntPtr
The number of bytes written to dest
Exceptions
ArgumentOutOfRangeException
Thrown if dest is too small
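A save/restore follows these steps: ask for the size, allocate a buffer of exactly that size, then copy the state out. The state can later be copied back in with SetState. The sketch below models that contract with a hypothetical in-memory FakeContextState. The real methods take a Byte* and a UIntPtr. Spans and ulong are used here instead so the code compiles without unsafe blocks.

```csharp
using System;

// Hypothetical in-memory stand-in for a context's state, modelling the
// contract of GetStateSize / GetState / SetState with spans instead of
// raw pointers.
public sealed class FakeContextState
{
    private byte[] _state;

    public FakeContextState(byte[] initial) => _state = (byte[])initial.Clone();

    public ulong GetStateSize() => (ulong)_state.Length;

    public ulong GetState(Span<byte> dest)
    {
        // Mirrors the documented ArgumentOutOfRangeException for a short dest.
        if ((ulong)dest.Length < GetStateSize())
            throw new ArgumentOutOfRangeException(nameof(dest), "Destination is too small");
        _state.CopyTo(dest);
        return (ulong)_state.Length;
    }

    public ulong SetState(ReadOnlySpan<byte> src)
    {
        _state = src.ToArray();
        return (ulong)src.Length;
    }
}

public static class StateSnapshot
{
    // Query the size first, allocate exactly that much, then copy the state out.
    public static byte[] Save(FakeContextState ctx)
    {
        var buffer = new byte[ctx.GetStateSize()];
        ulong written = ctx.GetState(buffer);
        return buffer.AsSpan(0, (int)written).ToArray();
    }
}
```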
GetState(Byte*, UIntPtr, LLamaSeqId)
Get the raw state of a single sequence from this context, encoded as bytes. Data is written into the dest pointer.
Parameters
dest
Byte*
Destination to write to
size
UIntPtr
Number of bytes available to write to in dest (check the required size with GetStateSize(LLamaSeqId))
sequence
LLamaSeqId
The sequence to get state data for
Returns
UIntPtr
The number of bytes written to dest
SetState(Byte*, UIntPtr)
Set the raw state of this context
Parameters
src
Byte*
The pointer to read the state from
size
UIntPtr
Number of bytes that can be safely read from the pointer
Returns
UIntPtr
Number of bytes read from the src pointer
SetState(Byte*, UIntPtr, LLamaSeqId)
Set the raw state of a single sequence
Parameters
src
Byte*
The pointer to read the state from
size
UIntPtr
Number of bytes that can be safely read from the pointer
sequence
LLamaSeqId
Sequence ID to set
Returns
UIntPtr
Number of bytes read from the src pointer
GetTimings()
Get performance information
Returns
ResetTimings()
Reset all performance information for this context
KvCacheUpdate()
Apply KV cache updates (such as K-shifts, defragmentation, etc.)
KvCacheDefrag()
Defragment the KV cache. This will be applied:
- lazily on the next llama_decode()
- explicitly with llama_kv_self_update()
KvCacheGetDebugView(Int32)
Get a new KV cache view that can be used to debug the KV cache
Parameters
maxSequences
Int32
Returns
KvCacheCountCells()
Count the number of used cells in the KV cache (i.e. have at least one sequence assigned to them)
Returns
KvCacheCountTokens()
Returns the number of tokens in the KV cache (slow, use only for debugging). If a KV cell has multiple sequences assigned to it, it will be counted multiple times.
Returns
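The sketch below illustrates the difference between KvCacheCountCells and KvCacheCountTokens. It models each KV cell as a position plus the set of sequence IDs that share it. KvCell and KvCounts are hypothetical types for illustration only.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical model of KV cache cells: one position per cell, plus the set
// of sequence IDs assigned to it.
public sealed class KvCell
{
    public int Pos;
    public HashSet<int> Seqs = new HashSet<int>();
}

public static class KvCounts
{
    // Like KvCacheCountCells: cells with at least one sequence assigned.
    public static int CountCells(IEnumerable<KvCell> cells)
        => cells.Count(c => c.Seqs.Count > 0);

    // Like KvCacheCountTokens: a cell counts once per sequence assigned to it.
    public static int CountTokens(IEnumerable<KvCell> cells)
        => cells.Sum(c => c.Seqs.Count);
}
```

For example, a prompt shared by two sequences (see KvCacheSequenceCopy) occupies each cell once but is counted twice by KvCacheCountTokens.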
KvCacheClear()
Clear the KV cache: cell info is erased and KV data is zeroed.
KvCacheRemove(LLamaSeqId, LLamaPos, LLamaPos)
Removes all tokens that belong to the specified sequence and have positions in [p0, p1)
Parameters
seq
LLamaSeqId
p0
LLamaPos
p1
LLamaPos
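The range [p0, p1) is half-open, so p0 is removed and p1 is kept. The sketch below models the KV cache as a flat list of (Seq, Pos) entries. KvRemoveSketch is a hypothetical helper.

```csharp
using System.Collections.Generic;

// Hypothetical sketch of KvCacheRemove's half-open range: entries of
// sequence `seq` with p0 <= pos < p1 are removed; position p1 is kept.
public static class KvRemoveSketch
{
    // Returns the number of removed entries.
    public static int Remove(List<(int Seq, int Pos)> cache, int seq, int p0, int p1)
        => cache.RemoveAll(t => t.Seq == seq && t.Pos >= p0 && t.Pos < p1);
}
```

Rolling back the last n tokens of a sequence whose next position is nPast therefore corresponds to the range [nPast - n, nPast).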
KvCacheSequenceCopy(LLamaSeqId, LLamaSeqId, LLamaPos, LLamaPos)
Copy all tokens that belong to the specified sequence to another sequence. Note that this does not allocate extra KV cache memory; it simply assigns the tokens to the new sequence.
Parameters
src
LLamaSeqId
dest
LLamaSeqId
p0
LLamaPos
p1
LLamaPos
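Copying only assigns existing cells to an additional sequence, so the number of cells stays the same. The sketch below uses a hypothetical KvCopySketch.Cell type. The [p0, p1) range is assumed to be half-open, as for KvCacheRemove.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical model of KvCacheSequenceCopy: no new cells are created; the
// destination sequence ID is added to each source cell in [p0, p1).
public static class KvCopySketch
{
    public sealed class Cell
    {
        public int Pos;
        public HashSet<int> Seqs = new HashSet<int>();
    }

    public static void Copy(List<Cell> cells, int src, int dest, int p0, int p1)
    {
        foreach (var cell in cells.Where(c => c.Seqs.Contains(src) && c.Pos >= p0 && c.Pos < p1))
            cell.Seqs.Add(dest);
    }
}
```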
KvCacheSequenceKeep(LLamaSeqId)
Removes all tokens that do not belong to the specified sequence
Parameters
seq
LLamaSeqId
KvCacheSequenceAdd(LLamaSeqId, LLamaPos, LLamaPos, Int32)
Adds relative position "delta" to all tokens that belong to the specified sequence and have positions in [p0, p1). If the KV cache is RoPEd, the KV data is updated accordingly.
Parameters
seq
LLamaSeqId
p0
LLamaPos
p1
LLamaPos
delta
Int32
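KvCacheRemove and KvCacheSequenceAdd can be combined into a "context shift". The first step drops a chunk of tokens after the first nKeep. The second slides the remaining tail back by the number of discarded tokens, so positions are contiguous again. The sketch below works on a (Seq, Pos) list. ContextShiftSketch, nKeep, nDiscard and nPast are hypothetical names. Only positions are modelled here, not the RoPE update of the KV data.

```csharp
using System.Collections.Generic;

// Hypothetical sketch of a context shift on (Seq, Pos) entries:
//   1. remove [nKeep, nKeep + nDiscard)               (like KvCacheRemove)
//   2. add -nDiscard to [nKeep + nDiscard, nPast)     (like KvCacheSequenceAdd)
public static class ContextShiftSketch
{
    public static void Add(List<(int Seq, int Pos)> cache, int seq, int p0, int p1, int delta)
    {
        for (int i = 0; i < cache.Count; i++)
        {
            var (s, pos) = cache[i];
            if (s == seq && pos >= p0 && pos < p1)
                cache[i] = (s, pos + delta);
        }
    }

    public static void Shift(List<(int Seq, int Pos)> cache, int seq, int nKeep, int nDiscard, int nPast)
    {
        cache.RemoveAll(t => t.Seq == seq && t.Pos >= nKeep && t.Pos < nKeep + nDiscard);
        Add(cache, seq, nKeep + nDiscard, nPast, -nDiscard);
    }
}
```

Before shifting, KvCacheCanShift can be checked to confirm that the context supports KV cache shifting.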
KvCacheSequenceDivide(LLamaSeqId, LLamaPos, LLamaPos, Int32)
Integer division of the positions by a factor of d > 1 (the divisor parameter).
If the KV cache is RoPEd, the KV data is updated accordingly.
- p0 < 0: [0, p1]
- p1 < 0: [p0, inf)
Parameters
seq
LLamaSeqId
p0
LLamaPos
p1
LLamaPos
divisor
Int32
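The sketch below covers only the position arithmetic, on a (Seq, Pos) list. KvDivideSketch is hypothetical. It treats a negative p0 or p1 as described above, and otherwise assumes the range is half-open, [p0, p1), as in KvCacheSequenceAdd.

```csharp
using System.Collections.Generic;

// Hypothetical sketch of KvCacheSequenceDivide on (Seq, Pos) entries:
// positions of `seq` inside the range become pos / divisor (integer division).
public static class KvDivideSketch
{
    public static void Divide(List<(int Seq, int Pos)> cache, int seq, int p0, int p1, int divisor)
    {
        if (divisor <= 1) return;                  // only d > 1 is meaningful; ignored here
        if (p0 < 0) p0 = 0;                        // negative p0: start from 0
        if (p1 < 0) p1 = int.MaxValue;             // negative p1: no upper bound
        for (int i = 0; i < cache.Count; i++)
        {
            var (s, pos) = cache[i];
            if (s == seq && pos >= p0 && pos < p1)
                cache[i] = (s, pos / divisor);
        }
    }
}
```

After division, neighbouring positions can coincide: with a divisor of 2, positions 4 and 5 both become 2, so several tokens may share a position.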
KvCacheMaxPosition(LLamaSeqId)
Returns the largest position present in the KV cache for the specified sequence
Parameters
seq
LLamaSeqId
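One use of the maximum position is finding where the next token of a sequence should go, at max + 1. The sketch below works on a (Seq, Pos) list. KvMaxPosSketch is hypothetical. Returning -1 for an empty sequence is an assumption made so that a new sequence starts at position 0.

```csharp
using System.Collections.Generic;

// Hypothetical sketch of KvCacheMaxPosition on (Seq, Pos) entries.
// Returning -1 for an empty sequence is an assumption of this sketch.
public static class KvMaxPosSketch
{
    public static int MaxPosition(List<(int Seq, int Pos)> cache, int seq)
    {
        int max = -1;
        foreach (var (s, pos) in cache)
            if (s == seq && pos > max) max = pos;
        return max;
    }

    // The next token of the sequence goes right after the largest position.
    public static int NextPosition(List<(int Seq, int Pos)> cache, int seq)
        => MaxPosition(cache, seq) + 1;
}
```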