SafeLLamaContextHandle

Namespace: LLama.Native

A safe wrapper around a llama_context

public sealed class SafeLLamaContextHandle : SafeLLamaHandleBase, System.IDisposable

Inheritance Object → CriticalFinalizerObject → SafeHandle → SafeLLamaHandleBase → SafeLLamaContextHandle
Implements IDisposable
Attributes NullableContextAttribute, NullableAttribute

Fields

handle

protected IntPtr handle;

Properties

ContextSize

Total number of tokens in the context

public uint ContextSize { get; }

Property Value

UInt32

EmbeddingSize

Dimension of embedding vectors

public int EmbeddingSize { get; }

Property Value

Int32

BatchSize

Get the maximum batch size for this context

public uint BatchSize { get; }

Property Value

UInt32

UBatchSize

Get the physical maximum batch size for this context

public uint UBatchSize { get; }

Property Value

UInt32

GenerationThreads

Get or set the number of threads used for generation of a single token.

public int GenerationThreads { get; set; }

Property Value

Int32

BatchThreads

Get or set the number of threads used for prompt and batch processing (multiple tokens).

public int BatchThreads { get; set; }

Property Value

Int32

PoolingType

Get the pooling type for this context

public LLamaPoolingType PoolingType { get; }

Property Value

LLamaPoolingType

ModelHandle

Get the model which this context is using

public SafeLlamaModelHandle ModelHandle { get; }

Property Value

SafeLlamaModelHandle

Vocab

Get the vocabulary for the model this context is using

public Vocabulary Vocab { get; }

Property Value

Vocabulary

KvCacheCanShift

Check if the context supports KV cache shifting

public bool KvCacheCanShift { get; }

Property Value

Boolean

IsInvalid

public bool IsInvalid { get; }

Property Value

Boolean

IsClosed

public bool IsClosed { get; }

Property Value

Boolean

Constructors

SafeLLamaContextHandle()

public SafeLLamaContextHandle()

Methods

ReleaseHandle()

protected bool ReleaseHandle()

Returns

Boolean

Create(SafeLlamaModelHandle, LLamaContextParams)

Create a new llama_context for the given model

public static SafeLLamaContextHandle Create(SafeLlamaModelHandle model, LLamaContextParams lparams)

AddLoraAdapter(LoraAdapter, Single)

Add a LoRA adapter to this context

public void AddLoraAdapter(LoraAdapter lora, float scale)

Parameters

lora LoraAdapter

scale Single

Exceptions

ArgumentException

RuntimeError

RemoveLoraAdapter(LoraAdapter)

Remove a LoRA adapter from this context

public bool RemoveLoraAdapter(LoraAdapter lora)

Parameters

lora LoraAdapter

Returns

Boolean
Indicates whether the LoRA adapter was in this context and was removed

ClearLoraAdapters()

Remove all LoRA adapters from this context

public void ClearLoraAdapters()

GetLogits(Int32)

Token logits obtained from the last call to llama_decode. The logits for the last token are stored in the last row. Only tokens for which logits were requested (logits = true) are present.
Can be mutated in order to change the probabilities of the next token.
Rows: n_tokens
Cols: n_vocab

public Span<float> GetLogits(int numTokens)

Parameters

numTokens Int32
The number of tokens whose logits should be retrieved, in [numTokens X n_vocab] format.
Tokens are ordered as they appear in the LlamaBatch (so, first tokens are first, etc).
This is helpful when requesting logits for many tokens in a sequence, or when decoding multiple sequences in one go.

Returns

Span<Single>
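
The row and column layout described above can be sketched without a model. This is a minimal illustration, assuming a flat row-major buffer; the array, its sizes, and the helpers `Row` and `ArgMax` are hypothetical stand-ins and not part of this class.

```csharp
using System;

// Stand-in for the flat buffer described above: numTokens rows of nVocab columns.
// The sizes and values are made up for illustration.
const int nVocab = 4;
const int numTokens = 3;
float[] logits =
{
    0.1f, 0.2f, 0.3f, 0.4f,   // row 0
    1.1f, 1.2f, 1.3f, 1.4f,   // row 1
    2.1f, 2.2f, 2.3f, 2.4f,   // row 2 (last token)
};

// Row i starts at offset i * cols and is cols floats long.
static Span<float> Row(Span<float> flat, int i, int cols)
    => flat.Slice(i * cols, cols);

// Index of the largest logit in a row (a greedy choice).
static int ArgMax(ReadOnlySpan<float> row)
{
    int best = 0;
    for (int t = 1; t < row.Length; t++)
        if (row[t] > row[best]) best = t;
    return best;
}

Span<float> last = Row(logits, numTokens - 1, nVocab);
int before = ArgMax(last);   // 3

// Mutating the span changes what a sampler would see: suppress token 3.
last[before] = float.NegativeInfinity;
int after = ArgMax(last);    // 2

Console.WriteLine($"before={before}, after={after}");   // prints "before=3, after=2"
```

Because the span is a view and not a copy, the write to `last` is visible through `logits` as well, which is what makes in-place logit editing possible.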

GetLogitsIth(Int32)

Logits for the ith token. Equivalent to: llama_get_logits(ctx) + i*n_vocab

public Span<float> GetLogitsIth(int i)

Parameters

i Int32

Returns

Span<Single>

GetEmbeddingsIth(LLamaPos)

Get the embeddings for the ith sequence. Equivalent to: llama_get_embeddings(ctx) + ctx->output_ids[i]*n_embd

public Span<float> GetEmbeddingsIth(LLamaPos pos)

Parameters

pos LLamaPos

Returns

Span<Single>
A pointer to the first float in an embedding, length = ctx.EmbeddingSize

GetEmbeddingsSeq(LLamaSeqId)

Get the embeddings for a specific sequence. Equivalent to: llama_get_embeddings(ctx) + ctx->output_ids[i]*n_embd

public Span<float> GetEmbeddingsSeq(LLamaSeqId seq)

Parameters

seq LLamaSeqId

Returns

Span<Single>
A pointer to the first float in an embedding, length = ctx.EmbeddingSize

Tokenize(String, Boolean, Boolean, Encoding)

Convert the given text into tokens

public LLamaToken[] Tokenize(string text, bool add_bos, bool special, Encoding encoding)

Parameters

text String
The text to tokenize

add_bos Boolean
Whether the "BOS" token should be added

special Boolean
Allow tokenizing special and/or control tokens which otherwise are not exposed and treated as plaintext.

encoding Encoding
Encoding to use for the text

Returns

LLamaToken[]

Exceptions

RuntimeError

TokenToSpan(LLamaToken, Span<Byte>)

Convert a single llama token into bytes

public uint TokenToSpan(LLamaToken token, Span<byte> dest)

Parameters

token LLamaToken
Token to decode

dest Span<Byte>
A span to attempt to write into. If this is too small nothing will be written

Returns

UInt32
The size of this token. Nothing will be written if this is larger than dest.
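
The return value makes a grow-and-retry pattern possible. The sketch below assumes only the contract stated above; `FakeTokenToSpan` is a hypothetical stand-in that mimics that contract so the example runs without a model, and the token bytes are made up.

```csharp
using System;
using System.Text;

// Hypothetical stand-in that mimics the documented contract of TokenToSpan:
// it returns the size of the token and writes nothing if dest is too small.
static uint FakeTokenToSpan(ReadOnlySpan<byte> tokenBytes, Span<byte> dest)
{
    if (tokenBytes.Length <= dest.Length)
        tokenBytes.CopyTo(dest);
    return (uint)tokenBytes.Length;
}

byte[] token = Encoding.UTF8.GetBytes(" hello");   // 6 bytes

// First attempt with a buffer that is too small.
byte[] buffer = new byte[4];
uint size = FakeTokenToSpan(token, buffer);

// A returned size larger than the buffer means nothing was written: grow and retry.
if (size > buffer.Length)
{
    buffer = new byte[size];
    size = FakeTokenToSpan(token, buffer);
}

string text = Encoding.UTF8.GetString(buffer, 0, (int)size);
Console.WriteLine($"{size} bytes: '{text}'");   // prints "6 bytes: ' hello'"
```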

Synchronize()

Wait until all computations are finished. This is automatically done when using any of the functions to obtain computation results and is not necessary to call it explicitly in most cases.

public void Synchronize()

Encode(LLamaBatch)

Processes a batch of tokens with the encoder part of the encoder-decoder model. Stores the encoder output internally for later use by the decoder cross-attention layers.

public DecodeResult Encode(LLamaBatch batch)

Parameters

batch LLamaBatch

Returns

DecodeResult
0 = success
< 0 = error (the KV cache state is restored to the state before this call)

Decode(LLamaBatch)

public DecodeResult Decode(LLamaBatch batch)

Parameters

batch LLamaBatch

Returns

DecodeResult
Positive return values do not indicate a fatal error, but rather a warning:
- 0: success
- 1: could not find a KV slot for the batch (try reducing the size of the batch or increasing the context)
- < 0: error (the KV cache state is restored to the state before this call)
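
The three cases above can be handled with a small classifier. This is a sketch only: the code is modelled as a plain integer, not as a DecodeResult, and both `Describe` and the halving policy in `NextBatchSize` are hypothetical helpers, not part of this class.

```csharp
using System;

// Classify a raw return code using the three cases listed above.
static string Describe(int code)
{
    if (code == 0) return "success";
    if (code < 0) return "error";
    if (code == 1) return "warning: no KV slot";
    return "warning: other";
}

// Hypothetical retry policy: on "no KV slot", halve the batch and try again.
static int NextBatchSize(int code, int batchSize)
    => code == 1 && batchSize > 1 ? batchSize / 2 : batchSize;

foreach (int code in new[] { 0, 1, -1 })
    Console.WriteLine($"{code}: {Describe(code)}");

Console.WriteLine(NextBatchSize(1, 512));   // prints "256"
```

Treating code 1 as recoverable and negative codes as failures follows the wording of the list; whether halving the batch is the right response depends on the application.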

Decode(LLamaBatchEmbeddings)

public DecodeResult Decode(LLamaBatchEmbeddings batch)

Parameters

batch LLamaBatchEmbeddings

Returns

DecodeResult
Positive return values do not indicate a fatal error, but rather a warning:
- 0: success
- 1: could not find a KV slot for the batch (try reducing the size of the batch or increasing the context)
- < 0: error

GetStateSize()

Get the size of the state, when saved as bytes

public UIntPtr GetStateSize()

Returns

UIntPtr

GetStateSize(LLamaSeqId)

Get the size of the KV cache for a single sequence ID, when saved as bytes

public UIntPtr GetStateSize(LLamaSeqId sequence)

Parameters

sequence LLamaSeqId

Returns

UIntPtr

GetState(Byte*, UIntPtr)

Get the raw state of this context, encoded as bytes. Data is written into the dest pointer.

public UIntPtr GetState(Byte* dest, UIntPtr size)

Parameters

dest Byte*
Destination to write to

size UIntPtr
Number of bytes available to write to in dest (check required size with GetStateSize())

Returns

UIntPtr
The number of bytes written to dest

Exceptions

ArgumentOutOfRangeException
Thrown if dest is too small

GetState(Byte*, UIntPtr, LLamaSeqId)

Get the raw state of a single sequence from this context, encoded as bytes. Data is written into the dest pointer.

public UIntPtr GetState(Byte* dest, UIntPtr size, LLamaSeqId sequence)

Parameters

dest Byte*
Destination to write to

size UIntPtr
Number of bytes available to write to in dest (check required size with GetStateSize())

sequence LLamaSeqId
The sequence to get state data for

Returns

UIntPtr
The number of bytes written to dest

SetState(Byte*, UIntPtr)

Set the raw state of this context

public UIntPtr SetState(Byte* src, UIntPtr size)

Parameters

src Byte*
The pointer to read the state from

size UIntPtr
Number of bytes that can be safely read from the pointer

Returns

UIntPtr
Number of bytes read from the src pointer

SetState(Byte*, UIntPtr, LLamaSeqId)

Set the raw state of a single sequence

public UIntPtr SetState(Byte* src, UIntPtr size, LLamaSeqId sequence)

Parameters

src Byte*
The pointer to read the state from

size UIntPtr
Number of bytes that can be safely read from the pointer

sequence LLamaSeqId
Sequence ID to set

Returns

UIntPtr
Number of bytes read from the src pointer

GetTimings()

Get performance information

public LLamaPerfContextTimings GetTimings()

Returns

LLamaPerfContextTimings

ResetTimings()

Reset all performance information for this context

public void ResetTimings()

KvCacheUpdate()

Apply KV cache updates (such as K-shifts, defragmentation, etc.)

public void KvCacheUpdate()

KvCacheDefrag()

Defragment the KV cache. This will be applied:
- lazily on the next llama_decode()
- explicitly with llama_kv_self_update()

public void KvCacheDefrag()

KvCacheGetDebugView(Int32)

Get a new KV cache view that can be used to debug the KV cache

public LLamaKvCacheViewSafeHandle KvCacheGetDebugView(int maxSequences)

Parameters

maxSequences Int32

Returns

LLamaKvCacheViewSafeHandle

KvCacheCountCells()

Count the number of used cells in the KV cache (i.e. have at least one sequence assigned to them)

public int KvCacheCountCells()

Returns

Int32

KvCacheCountTokens()

Returns the number of tokens in the KV cache (slow, use only for debugging). If a KV cell has multiple sequences assigned to it, it will be counted multiple times.

public int KvCacheCountTokens()

Returns

Int32
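
The difference between counting cells and counting tokens can be shown on a toy model. The sketch assumes each cell holds a set of sequence IDs; that layout and the helper names are illustrative assumptions, not the real cache structure.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Toy model of a KV cache: each cell holds the set of sequence IDs assigned to it.
// An empty set means the cell is unused.
var cells = new List<HashSet<int>>
{
    new HashSet<int> { 0 },      // used by sequence 0
    new HashSet<int> { 0, 1 },   // shared by sequences 0 and 1
    new HashSet<int>(),          // unused
    new HashSet<int> { 1 },      // used by sequence 1
};

// "Used cells": cells with at least one sequence assigned.
static int CountCells(IEnumerable<HashSet<int>> cache)
    => cache.Count(c => c.Count > 0);

// "Tokens": a cell shared by k sequences contributes k.
static int CountTokens(IEnumerable<HashSet<int>> cache)
    => cache.Sum(c => c.Count);

int usedCells = CountCells(cells);
int tokens = CountTokens(cells);
Console.WriteLine($"cells={usedCells}, tokens={tokens}");   // prints "cells=3, tokens=4"
```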

KvCacheClear()

Clear the KV cache - both cell info is erased and KV data is zeroed

public void KvCacheClear()

KvCacheRemove(LLamaSeqId, LLamaPos, LLamaPos)

Removes all tokens that belong to the specified sequence and have positions in [p0, p1)

public void KvCacheRemove(LLamaSeqId seq, LLamaPos p0, LLamaPos p1)

Parameters

seq LLamaSeqId

p0 LLamaPos

p1 LLamaPos
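
The range [p0, p1) is half-open: p0 is included and p1 is excluded. A minimal sketch on a toy list of (sequence, position) entries, which is an assumed model and not the real cache layout:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Toy cache: one (Seq, Pos) entry per token.
var cache = new List<(int Seq, int Pos)>
{
    (0, 0), (0, 1), (0, 2), (0, 3), (0, 4),
    (1, 0), (1, 1),
};

// Drop entries of `seq` whose position lies in [p0, p1): p0 included, p1 excluded.
static void Remove(List<(int Seq, int Pos)> c, int seq, int p0, int p1)
    => c.RemoveAll(e => e.Seq == seq && e.Pos >= p0 && e.Pos < p1);

Remove(cache, seq: 0, p0: 1, p1: 3);

int[] remaining = cache.Where(e => e.Seq == 0).Select(e => e.Pos).ToArray();
Console.WriteLine(string.Join(",", remaining));   // prints "0,3,4"
```

Position 3 survives because the upper bound is exclusive, and the entries of sequence 1 are untouched.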

KvCacheSequenceCopy(LLamaSeqId, LLamaSeqId, LLamaPos, LLamaPos)

Copy all tokens that belong to the specified sequence to another sequence. Note that this does not allocate extra KV cache memory - it simply assigns the tokens to the new sequence

public void KvCacheSequenceCopy(LLamaSeqId src, LLamaSeqId dest, LLamaPos p0, LLamaPos p1)

Parameters

src LLamaSeqId

dest LLamaSeqId

p0 LLamaPos

p1 LLamaPos

KvCacheSequenceKeep(LLamaSeqId)

Removes all tokens that do not belong to the specified sequence

public void KvCacheSequenceKeep(LLamaSeqId seq)

Parameters

seq LLamaSeqId

KvCacheSequenceAdd(LLamaSeqId, LLamaPos, LLamaPos, Int32)

Adds relative position "delta" to all tokens that belong to the specified sequence and have positions in [p0, p1). If the KV cache is RoPEd, the KV data is updated accordingly

public void KvCacheSequenceAdd(LLamaSeqId seq, LLamaPos p0, LLamaPos p1, int delta)

Parameters

seq LLamaSeqId

p0 LLamaPos

p1 LLamaPos

delta Int32
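
One use of a negative delta is to close the gap left by a removal, so that the remaining positions stay contiguous. The sketch below models a single sequence as an array of positions; the helpers `Add` and `Remove` and the `keep`/`discard` values are hypothetical and only mirror the range semantics described for these methods.

```csharp
using System;
using System.Linq;

// Toy model: positions of the tokens of one sequence.
int[] positions = { 0, 1, 2, 3, 4, 5, 6, 7 };

// Add `delta` to every position in [p0, p1).
static int[] Add(int[] pos, int p0, int p1, int delta)
    => pos.Select(p => p >= p0 && p < p1 ? p + delta : p).ToArray();

// Drop every position in [p0, p1).
static int[] Remove(int[] pos, int p0, int p1)
    => pos.Where(p => p < p0 || p >= p1).ToArray();

// Keep the first 2 tokens, discard the next 3, then shift the rest down by 3.
const int keep = 2, discard = 3;
int end = positions.Length;   // 8

int[] shifted = Add(Remove(positions, keep, keep + discard), keep + discard, end, -discard);
Console.WriteLine(string.Join(",", shifted));   // prints "0,1,2,3,4"
```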

KvCacheSequenceDivide(LLamaSeqId, LLamaPos, LLamaPos, Int32)

Integer division of the positions by factor of d > 1. If the KV cache is RoPEd, the KV data is updated accordingly.
p0 < 0 : [0, p1]
p1 < 0 : [p0, inf)

public void KvCacheSequenceDivide(LLamaSeqId seq, LLamaPos p0, LLamaPos p1, int divisor)

Parameters

seq LLamaSeqId

p0 LLamaPos

p1 LLamaPos

divisor Int32
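
The effect of integer division on positions can be sketched on a plain array. This assumes the same half-open [p0, p1) range used by the other sequence operations; the `Divide` helper is a hypothetical stand-in, not the method above.

```csharp
using System;
using System.Linq;

// Toy model: positions of the tokens of one sequence.
int[] positions = { 0, 1, 2, 3, 4, 5, 6, 7 };

// Integer-divide every position in [p0, p1) by `divisor`.
static int[] Divide(int[] pos, int p0, int p1, int divisor)
    => pos.Select(p => p >= p0 && p < p1 ? p / divisor : p).ToArray();

int[] divided = Divide(positions, p0: 0, p1: 8, divisor: 2);
Console.WriteLine(string.Join(",", divided));   // prints "0,0,1,1,2,2,3,3"
```

With a divisor of 2, pairs of neighbouring tokens end up sharing a position, which halves the position span covered by the sequence.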

KvCacheMaxPosition(LLamaSeqId)

Returns the largest position present in the KV cache for the specified sequence

public LLamaPos KvCacheMaxPosition(LLamaSeqId seq)

Parameters

seq LLamaSeqId

Returns

LLamaPos
