SafeLLamaContextHandle
Namespace: LLama.Native
A safe wrapper around a llama_context
public sealed class SafeLLamaContextHandle : SafeLLamaHandleBase, System.IDisposable
Inheritance Object → CriticalFinalizerObject → SafeHandle → SafeLLamaHandleBase → SafeLLamaContextHandle
Implements IDisposable
Properties
VocabCount
Total number of tokens in the vocabulary of this model
public int VocabCount { get; }
Property Value
Int32
ContextSize
Total number of tokens in the context
public uint ContextSize { get; }
Property Value
UInt32
EmbeddingSize
Dimension of embedding vectors
public int EmbeddingSize { get; }
Property Value
Int32
BatchSize
Get the maximum batch size for this context
public uint BatchSize { get; }
Property Value
UInt32
ModelHandle
Get the model which this context is using
public SafeLlamaModelHandle ModelHandle { get; }
Property Value
SafeLlamaModelHandle
IsInvalid
public bool IsInvalid { get; }
Property Value
Boolean
IsClosed
public bool IsClosed { get; }
Property Value
Boolean
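Example
The properties above simply surface values from the underlying native context. A minimal sketch of reading them, assuming `ctx` is an already-created SafeLLamaContextHandle (see Create below):
```csharp
// Minimal sketch: inspect the basic dimensions of an existing context.
static void PrintContextInfo(SafeLLamaContextHandle ctx)
{
    Console.WriteLine($"Vocab size:     {ctx.VocabCount}");
    Console.WriteLine($"Context size:   {ctx.ContextSize}");
    Console.WriteLine($"Embedding size: {ctx.EmbeddingSize}");
    Console.WriteLine($"Batch size:     {ctx.BatchSize}");
}
```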
Constructors
SafeLLamaContextHandle()
public SafeLLamaContextHandle()
Methods
ReleaseHandle()
protected bool ReleaseHandle()
Returns
Boolean
Create(SafeLlamaModelHandle, LLamaContextParams)
Create a new llama_context for the given model
public static SafeLLamaContextHandle Create(SafeLlamaModelHandle model, LLamaContextParams lparams)
Parameters
model
SafeLlamaModelHandle
lparams
LLamaContextParams
Returns
SafeLLamaContextHandle
Exceptions
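Example
A minimal sketch of creating a context. How the model handle and the LLamaContextParams value are obtained is assumed and not shown here, since that depends on the rest of your code:
```csharp
// Minimal sketch: wrap an already-loaded model in a new context.
// How `modelHandle` and `contextParams` are produced is assumed and not shown here.
static SafeLLamaContextHandle CreateContext(SafeLlamaModelHandle modelHandle, LLamaContextParams contextParams)
{
    var ctx = SafeLLamaContextHandle.Create(modelHandle, contextParams);
    Console.WriteLine($"Created context: n_ctx={ctx.ContextSize}, n_batch={ctx.BatchSize}");
    return ctx;
}
```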
GetLogits()
Token logits obtained from the last call to llama_decode. The logits for the last token are stored in the last row. They can be mutated in order to change the probabilities of the next token.
Rows: n_tokens
Cols: n_vocab
public Span<float> GetLogits()
Returns
Span&lt;Single&gt;
GetLogitsIth(Int32)
Logits for the ith token. Equivalent to: llama_get_logits(ctx) + i*n_vocab
public Span<float> GetLogitsIth(int i)
Parameters
i
Int32
Returns
Span&lt;Single&gt;
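Example
A minimal sketch of greedy (argmax) selection over the logits of one batch position, assuming logits were requested for that position in the last call to Decode:
```csharp
// Minimal sketch: pick the raw id of the most likely next token from batch position `index`.
// Assumes logits were requested for that position in the last call to Decode.
static int GreedyArgMax(SafeLLamaContextHandle ctx, int index)
{
    Span<float> logits = ctx.GetLogitsIth(index);   // one row, length == ctx.VocabCount

    var best = 0;
    for (var i = 1; i < logits.Length; i++)
    {
        if (logits[i] > logits[best])
            best = i;
    }

    return best;   // raw token id; convert to LLamaToken however the rest of your code does
}
```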
Tokenize(String, Boolean, Boolean, Encoding)
Convert the given text into tokens
public LLamaToken[] Tokenize(string text, bool add_bos, bool special, Encoding encoding)
Parameters
text
String
The text to tokenize
add_bos
Boolean
Whether the "BOS" token should be added
special
Boolean
Allow tokenizing special and/or control tokens, which otherwise are not exposed and are treated as plain text.
encoding
Encoding
Encoding to use for the text
Returns
LLamaToken[]
Exceptions
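Example
A minimal sketch of tokenizing a prompt, where `ctx` is an existing SafeLLamaContextHandle:
```csharp
using System.Text;

// Minimal sketch: tokenize a prompt, prepending BOS and treating special tokens as plain text.
LLamaToken[] tokens = ctx.Tokenize("Once upon a time", add_bos: true, special: false, Encoding.UTF8);
Console.WriteLine($"Prompt is {tokens.Length} tokens (context holds {ctx.ContextSize})");
```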
TokenToSpan(LLamaToken, Span<Byte>)
Convert a single llama token into bytes
public uint TokenToSpan(LLamaToken token, Span<byte> dest)
Parameters
token
LLamaToken
Token to decode
dest
Span<Byte>
A span to attempt to write into. If it is too small, nothing will be written
Returns
UInt32
The size of this token. Nothing will be written if this is larger than dest
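Example
A minimal sketch of converting one token back into text, retrying when the destination span is too small:
```csharp
using System.Text;

// Minimal sketch: decode one token into text, growing the buffer if the first attempt is too small.
static string TokenToText(SafeLLamaContextHandle ctx, LLamaToken token, Encoding encoding)
{
    Span<byte> buffer = stackalloc byte[32];
    var size = ctx.TokenToSpan(token, buffer);

    if (size > buffer.Length)
    {
        // Nothing was written on the first call; retry with a buffer of exactly the reported size.
        buffer = new byte[size];
        size = ctx.TokenToSpan(token, buffer);
    }

    return encoding.GetString(buffer.Slice(0, (int)size));
}
```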
Decode(LLamaBatch)
public DecodeResult Decode(LLamaBatch batch)
Parameters
batch
LLamaBatch
Returns
DecodeResult
A positive return value does not indicate a fatal error, but rather a warning:
- 0: success
- 1: could not find a KV slot for the batch (try reducing the size of the batch or increasing the context size)
- < 0: error
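Example
A minimal sketch of decoding a tokenized prompt, where `tokens` is the LLamaToken[] from Tokenize above. The parameterless LLamaBatch constructor, the LLamaBatch.Add overload, LLamaSeqId.Zero, DecodeResult.Ok and the implicit int to LLamaPos conversion used here are assumptions; adjust them to match your version of the library:
```csharp
// Minimal sketch: feed a tokenized prompt to the model and check the result.
// LLamaBatch.Add, LLamaSeqId.Zero, DecodeResult.Ok and the int -> LLamaPos conversion
// are assumed here and may differ in your version of the library.
var batch = new LLamaBatch();
for (var i = 0; i < tokens.Length; i++)
{
    // Only request logits for the last token of the prompt.
    batch.Add(tokens[i], i, LLamaSeqId.Zero, i == tokens.Length - 1);
}

var result = ctx.Decode(batch);
if (result != DecodeResult.Ok)
    throw new InvalidOperationException($"llama_decode failed: {result}");
```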
Decode(List<LLamaToken>, LLamaSeqId, LLamaBatch, Int32&)
Decode a set of tokens in batch-size chunks.
internal ValueTuple<DecodeResult, int> Decode(List<LLamaToken> tokens, LLamaSeqId id, LLamaBatch batch, Int32& n_past)
Parameters
tokens
List<LLamaToken>
id
LLamaSeqId
batch
LLamaBatch
n_past
Int32&
Returns
ValueTuple<DecodeResult, Int32>
A tuple containing the decode result and the number of tokens that have not yet been decoded.
GetStateSize()
Get the size of the state, when saved as bytes
public ulong GetStateSize()
Returns
UInt64
GetState(Byte*, UInt64)
Get the raw state of this context, encoded as bytes. Data is written into the dest pointer.
public ulong GetState(Byte* dest, ulong size)
Parameters
dest
Byte*
Destination to write to
size
UInt64
Number of bytes available to write to in dest (check the required size with GetStateSize())
Returns
UInt64
The number of bytes written to dest
Exceptions
ArgumentOutOfRangeException
Thrown if dest is too small
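Example
A minimal sketch of snapshotting the context state into a managed buffer using GetStateSize and the Byte* overload (requires compiling with unsafe enabled):
```csharp
// Minimal sketch: copy the full context state into a managed byte array.
static unsafe byte[] SaveState(SafeLLamaContextHandle ctx)
{
    var size = ctx.GetStateSize();
    var buffer = new byte[size];

    ulong written;
    fixed (byte* dest = buffer)
    {
        written = ctx.GetState(dest, size);
    }

    // The state actually written may be smaller than the reported size.
    if (written < size)
        Array.Resize(ref buffer, checked((int)written));

    return buffer;
}
```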
GetState(IntPtr, UInt64)
Get the raw state of this context, encoded as bytes. Data is written into the dest pointer.
public ulong GetState(IntPtr dest, ulong size)
Parameters
dest
IntPtr
Destination to write to
size
UInt64
Number of bytes available to write to in dest (check the required size with GetStateSize())
Returns
UInt64
The number of bytes written to dest
Exceptions
ArgumentOutOfRangeException
Thrown if dest is too small
SetState(Byte*)
Set the raw state of this context
public ulong SetState(Byte* src)
Parameters
src
Byte*
The pointer to read the state from
Returns
UInt64
Number of bytes read from the src pointer
SetState(IntPtr)
Set the raw state of this context
public ulong SetState(IntPtr src)
Parameters
src
IntPtr
The pointer to read the state from
Returns
UInt64
Number of bytes read from the src pointer
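Example
A minimal sketch of restoring a state blob previously produced by GetState, using the Byte* overload (requires compiling with unsafe enabled):
```csharp
// Minimal sketch: restore a state blob previously produced by GetState.
static unsafe void LoadState(SafeLLamaContextHandle ctx, byte[] state)
{
    fixed (byte* src = state)
    {
        var read = ctx.SetState(src);
        Console.WriteLine($"Restored context state ({read} bytes read)");
    }
}
```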
SetSeed(UInt32)
Set the RNG seed
public void SetSeed(uint seed)
Parameters
seed
UInt32
SetThreads(UInt32, UInt32)
Set the number of threads used for decoding
public void SetThreads(uint threads, uint threadsBatch)
Parameters
threads
UInt32
n_threads is the number of threads used for generation (single token)
threadsBatch
UInt32
n_threads_batch is the number of threads used for prompt and batch processing (multiple tokens)
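Example
A minimal sketch of configuring determinism and threading for an existing context:
```csharp
// Minimal sketch: fix the RNG seed and split threads between generation and batch processing.
ctx.SetSeed(42);

var cores = (uint)Environment.ProcessorCount;
ctx.SetThreads(threads: Math.Max(1u, cores / 2), threadsBatch: cores);
```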
KvCacheGetDebugView(Int32)
Get a new KV cache view that can be used to debug the KV cache
public LLamaKvCacheViewSafeHandle KvCacheGetDebugView(int maxSequences)
Parameters
maxSequences
Int32
Returns
LLamaKvCacheViewSafeHandle
KvCacheCountCells()
Count the number of used cells in the KV cache (i.e. cells that have at least one sequence assigned to them)
public int KvCacheCountCells()
Returns
Int32
KvCacheCountTokens()
Returns the number of tokens in the KV cache (slow, use only for debugging). If a KV cell has multiple sequences assigned to it, it will be counted multiple times.
public int KvCacheCountTokens()
Returns
Int32
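Example
A minimal sketch of quick KV cache diagnostics using the two counters above:
```csharp
// Minimal sketch: both calls are intended for debugging only.
Console.WriteLine($"KV cache cells in use: {ctx.KvCacheCountCells()}");
Console.WriteLine($"KV cache tokens:       {ctx.KvCacheCountTokens()}");
```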
KvCacheClear()
Clear the KV cache
public void KvCacheClear()
KvCacheRemove(LLamaSeqId, LLamaPos, LLamaPos)
Removes all tokens that belong to the specified sequence and have positions in [p0, p1)
public void KvCacheRemove(LLamaSeqId seq, LLamaPos p0, LLamaPos p1)
Parameters
seq
LLamaSeqId
p0
LLamaPos
p1
LLamaPos
KvCacheSequenceCopy(LLamaSeqId, LLamaSeqId, LLamaPos, LLamaPos)
Copy all tokens that belong to the specified sequence to another sequence. Note that this does not allocate extra KV cache memory - it simply assigns the tokens to the new sequence
public void KvCacheSequenceCopy(LLamaSeqId src, LLamaSeqId dest, LLamaPos p0, LLamaPos p1)
Parameters
src
LLamaSeqId
dest
LLamaSeqId
p0
LLamaPos
p1
LLamaPos
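Example
A minimal sketch of forking a sequence so that two continuations can share one decoded prefix. The implicit int to LLamaPos conversion for position 0 is an assumption:
```csharp
// Minimal sketch: share the prefix [0, decodedCount) between `source` and `fork`
// without allocating extra KV cache memory. The int -> LLamaPos conversion for `0`
// is assumed; construct the position explicitly if your version requires it.
static void ForkSequence(SafeLLamaContextHandle ctx, LLamaSeqId source, LLamaSeqId fork, LLamaPos decodedCount)
{
    ctx.KvCacheSequenceCopy(source, fork, 0, decodedCount);
}
```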
KvCacheSequenceKeep(LLamaSeqId)
Removes all tokens that do not belong to the specified sequence
public void KvCacheSequenceKeep(LLamaSeqId seq)
Parameters
seq
LLamaSeqId
KvCacheSequenceAdd(LLamaSeqId, LLamaPos, LLamaPos, Int32)
Adds relative position "delta" to all tokens that belong to the specified sequence and have positions in [p0, p1). If the KV cache is RoPEd, the KV data is updated accordingly
public void KvCacheSequenceAdd(LLamaSeqId seq, LLamaPos p0, LLamaPos p1, int delta)
Parameters
seq
LLamaSeqId
p0
LLamaPos
p1
LLamaPos
delta
Int32
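Example
A minimal sketch of a "context shift" that combines KvCacheRemove and KvCacheSequenceAdd to make room for new tokens. The implicit int to LLamaPos conversions are assumptions:
```csharp
// Minimal sketch: drop the oldest `discard` tokens (keeping the first `keep`, e.g. a system
// prompt) and slide the remaining tokens back so generation can continue.
// The int -> LLamaPos conversions are assumed.
static void ShiftContext(SafeLLamaContextHandle ctx, LLamaSeqId seq, ref int nPast, int keep, int discard)
{
    // Remove positions [keep, keep + discard) ...
    ctx.KvCacheRemove(seq, keep, keep + discard);

    // ... and shift the surviving positions [keep + discard, nPast) back by `discard`.
    ctx.KvCacheSequenceAdd(seq, keep + discard, nPast, -discard);

    nPast -= discard;
}
```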
KvCacheSequenceDivide(LLamaSeqId, LLamaPos, LLamaPos, Int32)
Integer division of the positions by a factor of d > 1.
If the KV cache is RoPEd, the KV data is updated accordingly.
p0 < 0 : [0, p1]
p1 < 0 : [p0, inf)
public void KvCacheSequenceDivide(LLamaSeqId seq, LLamaPos p0, LLamaPos p1, int divisor)
Parameters
seq
LLamaSeqId
p0
LLamaPos
p1
LLamaPos
divisor
Int32
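Example
A minimal sketch of compressing part of a sequence's positions, as used by grouped-attention / self-extend style schemes. The implicit int to LLamaPos conversions are assumptions:
```csharp
// Minimal sketch: divide the positions in [groupStart, groupEnd) of `seq` by `groupSize`.
// The int -> LLamaPos conversions are assumed.
static void CompressPositions(SafeLLamaContextHandle ctx, LLamaSeqId seq, int groupStart, int groupEnd, int groupSize)
{
    ctx.KvCacheSequenceDivide(seq, groupStart, groupEnd, groupSize);
}
```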