
SafeLLamaContextHandle

Namespace: LLama.Native

A safe wrapper around a llama_context

public sealed class SafeLLamaContextHandle : SafeLLamaHandleBase, System.IDisposable

Inheritance Object → CriticalFinalizerObject → SafeHandle → SafeLLamaHandleBase → SafeLLamaContextHandle
Implements IDisposable
Attributes NullableContextAttribute, NullableAttribute

Fields

handle

protected IntPtr handle;

Properties

ContextSize

Total number of tokens in the context

public uint ContextSize { get; }

Property Value

UInt32

EmbeddingSize

Dimension of embedding vectors

public int EmbeddingSize { get; }

Property Value

Int32

BatchSize

Get the maximum batch size for this context

public uint BatchSize { get; }

Property Value

UInt32

UBatchSize

Get the physical maximum batch size for this context

public uint UBatchSize { get; }

Property Value

UInt32

GenerationThreads

Get or set the number of threads used for generation of a single token.

public int GenerationThreads { get; set; }

Property Value

Int32

BatchThreads

Get or set the number of threads used for prompt and batch processing (multiple tokens).

public int BatchThreads { get; set; }

Property Value

Int32

PoolingType

Get the pooling type for this context

public LLamaPoolingType PoolingType { get; }

Property Value

LLamaPoolingType

ModelHandle

Get the model which this context is using

public SafeLlamaModelHandle ModelHandle { get; }

Property Value

SafeLlamaModelHandle

Vocab

Get the vocabulary for the model this context is using

public Vocabulary Vocab { get; }

Property Value

Vocabulary

KvCacheCanShift

Check if the context supports KV cache shifting

public bool KvCacheCanShift { get; }

Property Value

Boolean

IsInvalid

public bool IsInvalid { get; }

Property Value

Boolean

IsClosed

public bool IsClosed { get; }

Property Value

Boolean

Constructors

SafeLLamaContextHandle()

public SafeLLamaContextHandle()

Methods

ReleaseHandle()

protected bool ReleaseHandle()

Returns

Boolean

Create(SafeLlamaModelHandle, LLamaContextParams)

Create a new llama_context for the given model

public static SafeLLamaContextHandle Create(SafeLlamaModelHandle model, LLamaContextParams lparams)

Parameters

model SafeLlamaModelHandle

lparams LLamaContextParams

Returns

SafeLLamaContextHandle

Exceptions

RuntimeError

AddLoraAdapter(LoraAdapter, Single)

Add a LoRA adapter to this context

public void AddLoraAdapter(LoraAdapter lora, float scale)

Parameters

lora LoraAdapter

scale Single

Exceptions

ArgumentException

RuntimeError

RemoveLoraAdapter(LoraAdapter)

Remove a LoRA adapter from this context

public bool RemoveLoraAdapter(LoraAdapter lora)

Parameters

lora LoraAdapter

Returns

Boolean
Indicates whether the LoRA adapter was in this context and was removed

ClearLoraAdapters()

Remove all LoRA adapters from this context

public void ClearLoraAdapters()

GetLogits(Int32)

Token logits obtained from the last call to llama_decode. The logits for the last token are stored in the last row. Only tokens for which logits were requested (logits = true) are present.
Can be mutated in order to change the probabilities of the next token.
Rows: n_tokens
Cols: n_vocab

public Span<float> GetLogits(int numTokens)

Parameters

numTokens Int32
The number of tokens whose logits should be retrieved, in [numTokens X n_vocab] format.
Token order matches the order in the LLamaBatch (first tokens first, and so on).
This is helpful when requesting logits for many tokens in a sequence, or when decoding multiple sequences in one go.

Returns

Span<Single>

GetLogitsIth(Int32)

Logits for the ith token. Equivalent to: llama_get_logits(ctx) + i*n_vocab

public Span<float> GetLogitsIth(int i)

Parameters

i Int32

Returns

Span<Single>
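
As noted above, the returned span can be mutated before sampling. A minimal sketch (the handle, index, and token id are illustrative, assuming the last Decode call requested logits for index i):

```csharp
// Sketch: suppress a specific token before sampling. Assumes
// `context` is a valid SafeLLamaContextHandle and the last Decode()
// call requested logits for index `i`.
Span<float> logits = context.GetLogitsIth(i);

// Setting a logit to negative infinity makes the corresponding
// token (hypothetical id `bannedToken`) unsampleable.
logits[bannedToken] = float.NegativeInfinity;
```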

GetEmbeddingsIth(LLamaPos)

Get the embeddings for the ith sequence. Equivalent to: llama_get_embeddings(ctx) + ctx->output_ids[i]*n_embd

public Span<float> GetEmbeddingsIth(LLamaPos pos)

Parameters

pos LLamaPos

Returns

Span<Single>
A pointer to the first float in an embedding, length = ctx.EmbeddingSize

GetEmbeddingsSeq(LLamaSeqId)

Get the embeddings for a specific sequence. Equivalent to: llama_get_embeddings(ctx) + ctx->output_ids[i]*n_embd

public Span<float> GetEmbeddingsSeq(LLamaSeqId seq)

Parameters

seq LLamaSeqId

Returns

Span<Single>
A pointer to the first float in an embedding, length = ctx.EmbeddingSize

Tokenize(String, Boolean, Boolean, Encoding)

Convert the given text into tokens

public LLamaToken[] Tokenize(string text, bool add_bos, bool special, Encoding encoding)

Parameters

text String
The text to tokenize

add_bos Boolean
Whether the "BOS" token should be added

special Boolean
Allow tokenizing special and/or control tokens which otherwise are not exposed and treated as plaintext.

encoding Encoding
Encoding to use for the text

Returns

LLamaToken[]

Exceptions

RuntimeError
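
A typical call, sketched with placeholder text (the parameter names follow the signature above; `context` is assumed to be a valid handle):

```csharp
// Sketch: tokenize a prompt with a leading BOS token, without
// exposing special/control tokens as tokens.
LLamaToken[] tokens = context.Tokenize(
    "Hello, world!",
    add_bos: true,      // prepend the model's BOS token
    special: false,     // treat special-token text as plain text
    Encoding.UTF8);     // encoding used to read the input string
```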

TokenToSpan(LLamaToken, Span<Byte>)

Convert a single llama token into bytes

public uint TokenToSpan(LLamaToken token, Span<byte> dest)

Parameters

token LLamaToken
Token to decode

dest Span<Byte>
A span to attempt to write into. If this is too small nothing will be written

Returns

UInt32
The size of this token. Nothing will be written if this is larger than dest

Synchronize()

Wait until all computations are finished. This is done automatically when using any of the functions that obtain computation results, so it is not necessary to call this explicitly in most cases.

public void Synchronize()

Encode(LLamaBatch)

Processes a batch of tokens with the encoder part of the encoder-decoder model. Stores the encoder output internally for later use by the decoder cross-attention layers.

public DecodeResult Encode(LLamaBatch batch)

Parameters

batch LLamaBatch

Returns

DecodeResult
0 = success
< 0 = error (the KV cache state is restored to the state before this call)

Decode(LLamaBatch)

public DecodeResult Decode(LLamaBatch batch)

Parameters

batch LLamaBatch

Returns

DecodeResult
A positive return value does not indicate a fatal error, but rather a warning:
- 0: success
- 1: could not find a KV slot for the batch (try reducing the size of the batch or increase the context)
- < 0: error (the KV cache state is restored to the state before this call)
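
Because a positive result is a recoverable warning rather than a fatal error, callers should branch on the result. A sketch (the DecodeResult member names are assumptions, not confirmed by this page):

```csharp
// Sketch: decode a batch and react to the documented result codes.
// `DecodeResult.NoKvSlot` is an assumed member name for the "1" code
// described above.
DecodeResult result = context.Decode(batch);
if (result == DecodeResult.NoKvSlot)
{
    // Warning: no KV slot found. Reduce the batch size or create the
    // context with a larger context window, then retry.
}
else if ((int)result < 0)
{
    // Error: the KV cache was restored to its pre-call state.
    throw new InvalidOperationException($"llama_decode failed: {result}");
}
```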

Decode(LLamaBatchEmbeddings)

public DecodeResult Decode(LLamaBatchEmbeddings batch)

Parameters

batch LLamaBatchEmbeddings

Returns

DecodeResult
A positive return value does not indicate a fatal error, but rather a warning:
- 0: success
- 1: could not find a KV slot for the batch (try reducing the size of the batch or increase the context)
- < 0: error

GetStateSize()

Get the size of the state, when saved as bytes

public UIntPtr GetStateSize()

Returns

UIntPtr

GetStateSize(LLamaSeqId)

Get the size of the KV cache for a single sequence ID, when saved as bytes

public UIntPtr GetStateSize(LLamaSeqId sequence)

Parameters

sequence LLamaSeqId

Returns

UIntPtr

GetState(Byte*, UIntPtr)

Get the raw state of this context, encoded as bytes. Data is written into the dest pointer.

public UIntPtr GetState(Byte* dest, UIntPtr size)

Parameters

dest Byte*
Destination to write to

size UIntPtr
Number of bytes available to write to in dest (check required size with GetStateSize())

Returns

UIntPtr
The number of bytes written to dest

Exceptions

ArgumentOutOfRangeException
Thrown if dest is too small

GetState(Byte*, UIntPtr, LLamaSeqId)

Get the raw state of a single sequence from this context, encoded as bytes. Data is written into the dest pointer.

public UIntPtr GetState(Byte* dest, UIntPtr size, LLamaSeqId sequence)

Parameters

dest Byte*
Destination to write to

size UIntPtr
Number of bytes available to write to in dest (check required size with GetStateSize())

sequence LLamaSeqId
The sequence to get state data for

Returns

UIntPtr
The number of bytes written to dest

SetState(Byte*, UIntPtr)

Set the raw state of this context

public UIntPtr SetState(Byte* src, UIntPtr size)

Parameters

src Byte*
The pointer to read the state from

size UIntPtr
Number of bytes that can be safely read from the pointer

Returns

UIntPtr
Number of bytes read from the src pointer

SetState(Byte*, UIntPtr, LLamaSeqId)

Set the raw state of a single sequence

public UIntPtr SetState(Byte* src, UIntPtr size, LLamaSeqId sequence)

Parameters

src Byte*
The pointer to read the state from

size UIntPtr
Number of bytes that can be safely read from the pointer

sequence LLamaSeqId
Sequence ID to set

Returns

UIntPtr
Number of bytes read from the src pointer
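
GetStateSize, GetState and SetState combine into a save/restore round trip. A sketch using the whole-context overloads (an `unsafe` context is required for the raw pointers; buffer handling is illustrative):

```csharp
// Sketch: snapshot the full context state into a managed buffer,
// then restore it later.
UIntPtr size = context.GetStateSize();
byte[] buffer = new byte[checked((int)(ulong)size)];

unsafe
{
    fixed (byte* ptr = buffer)
    {
        // Save: returns the number of bytes actually written.
        UIntPtr written = context.GetState(ptr, size);

        // ... decode more tokens, then roll back ...

        // Restore: reads the snapshot back into the context.
        context.SetState(ptr, written);
    }
}
```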

GetTimings()

Get performance information

public LLamaPerfContextTimings GetTimings()

Returns

LLamaPerfContextTimings

ResetTimings()

Reset all performance information for this context

public void ResetTimings()

KvCacheUpdate()

Apply KV cache updates (such as K-shifts, defragmentation, etc.)

public void KvCacheUpdate()

KvCacheDefrag()

Defragment the KV cache. This will be applied:
- lazily on the next llama_decode()
- explicitly with llama_kv_self_update()

public void KvCacheDefrag()

KvCacheGetDebugView(Int32)

Get a new KV cache view that can be used to debug the KV cache

public LLamaKvCacheViewSafeHandle KvCacheGetDebugView(int maxSequences)

Parameters

maxSequences Int32

Returns

LLamaKvCacheViewSafeHandle

KvCacheCountCells()

Count the number of used cells in the KV cache (i.e. have at least one sequence assigned to them)

public int KvCacheCountCells()

Returns

Int32

KvCacheCountTokens()

Returns the number of tokens in the KV cache (slow, use only for debug) If a KV cell has multiple sequences assigned to it, it will be counted multiple times

public int KvCacheCountTokens()

Returns

Int32

KvCacheClear()

Clear the KV cache - both cell info is erased and KV data is zeroed

public void KvCacheClear()

KvCacheRemove(LLamaSeqId, LLamaPos, LLamaPos)

Removes all tokens that belong to the specified sequence and have positions in [p0, p1)

public void KvCacheRemove(LLamaSeqId seq, LLamaPos p0, LLamaPos p1)

Parameters

seq LLamaSeqId

p0 LLamaPos

p1 LLamaPos

KvCacheSequenceCopy(LLamaSeqId, LLamaSeqId, LLamaPos, LLamaPos)

Copy all tokens that belong to the specified sequence to another sequence. Note that this does not allocate extra KV cache memory - it simply assigns the tokens to the new sequence

public void KvCacheSequenceCopy(LLamaSeqId src, LLamaSeqId dest, LLamaPos p0, LLamaPos p1)

Parameters

src LLamaSeqId

dest LLamaSeqId

p0 LLamaPos

p1 LLamaPos

KvCacheSequenceKeep(LLamaSeqId)

Removes all tokens that do not belong to the specified sequence

public void KvCacheSequenceKeep(LLamaSeqId seq)

Parameters

seq LLamaSeqId

KvCacheSequenceAdd(LLamaSeqId, LLamaPos, LLamaPos, Int32)

Adds relative position "delta" to all tokens that belong to the specified sequence and have positions in [p0, p1). If the KV cache is RoPEd, the KV data is updated accordingly

public void KvCacheSequenceAdd(LLamaSeqId seq, LLamaPos p0, LLamaPos p1, int delta)

Parameters

seq LLamaSeqId

p0 LLamaPos

p1 LLamaPos

delta Int32
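
KvCacheRemove and KvCacheSequenceAdd together implement the classic llama.cpp context-shift pattern: drop the oldest tokens and slide the rest down. A sketch (assumes LLamaPos converts implicitly from int, and that a negative p1 means "to the end", as described under KvCacheSequenceDivide):

```csharp
// Sketch: free up `discard` positions at the start of sequence `seq`
// by removing the oldest tokens and shifting the remainder left.
// Check KvCacheCanShift first; not every context supports shifting.
if (context.KvCacheCanShift)
{
    context.KvCacheRemove(seq, 0, discard);                 // drop [0, discard)
    context.KvCacheSequenceAdd(seq, discard, -1, -discard); // shift the rest
    context.KvCacheUpdate();                                // apply the K-shift
}
```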

KvCacheSequenceDivide(LLamaSeqId, LLamaPos, LLamaPos, Int32)

Integer division of the positions by factor of d > 1. If the KV cache is RoPEd, the KV data is updated accordingly.
p0 < 0 : [0, p1]
p1 < 0 : [p0, inf)

public void KvCacheSequenceDivide(LLamaSeqId seq, LLamaPos p0, LLamaPos p1, int divisor)

Parameters

seq LLamaSeqId

p0 LLamaPos

p1 LLamaPos

divisor Int32

KvCacheMaxPosition(LLamaSeqId)

Returns the largest position present in the KV cache for the specified sequence

public LLamaPos KvCacheMaxPosition(LLamaSeqId seq)

Parameters

seq LLamaSeqId

Returns

LLamaPos

