SafeLLamaContextHandle
Namespace: LLama.Native
A safe wrapper around a llama_context
public sealed class SafeLLamaContextHandle : SafeLLamaHandleBase, System.IDisposable
Inheritance Object → CriticalFinalizerObject → SafeHandle → SafeLLamaHandleBase → SafeLLamaContextHandle
Implements IDisposable
Properties
VocabCount
Total number of tokens in the vocabulary of this model
public int VocabCount { get; }
Property Value
Int32
ContextSize
Total number of tokens in the context
public uint ContextSize { get; }
Property Value
UInt32
EmbeddingSize
Dimension of embedding vectors
public int EmbeddingSize { get; }
Property Value
Int32
BatchSize
Get the maximum batch size for this context
public uint BatchSize { get; }
Property Value
UInt32
ModelHandle
Get the model which this context is using
public SafeLlamaModelHandle ModelHandle { get; }
Property Value
SafeLlamaModelHandle
IsInvalid
public bool IsInvalid { get; }
Property Value
Boolean
IsClosed
public bool IsClosed { get; }
Property Value
Boolean
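Example
The properties above simply surface values from the underlying native context. A minimal sketch of reading them, assuming `ctx` is an already-created SafeLLamaContextHandle (see Create below):
```csharp
// Minimal sketch: inspect the basic dimensions of an existing context.
static void PrintContextInfo(SafeLLamaContextHandle ctx)
{
    Console.WriteLine($"Vocab size:     {ctx.VocabCount}");
    Console.WriteLine($"Context size:   {ctx.ContextSize}");
    Console.WriteLine($"Embedding size: {ctx.EmbeddingSize}");
    Console.WriteLine($"Batch size:     {ctx.BatchSize}");
}
```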
Constructors
SafeLLamaContextHandle()
public SafeLLamaContextHandle()
Methods
ReleaseHandle()
protected bool ReleaseHandle()
Returns
Boolean
Create(SafeLlamaModelHandle, LLamaContextParams)
Create a new llama_context for the given model
public static SafeLLamaContextHandle Create(SafeLlamaModelHandle model, LLamaContextParams lparams)
Parameters
model
SafeLlamaModelHandle
lparams
LLamaContextParams
Returns
SafeLLamaContextHandle
Exceptions
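Example
A minimal sketch of creating a context. How the model handle and the LLamaContextParams value are obtained is assumed and not shown here, since that depends on the rest of your code:
```csharp
// Minimal sketch: wrap an already-loaded model in a new context.
// How `modelHandle` and `contextParams` are produced is assumed and not shown here.
static SafeLLamaContextHandle CreateContext(SafeLlamaModelHandle modelHandle, LLamaContextParams contextParams)
{
    var ctx = SafeLLamaContextHandle.Create(modelHandle, contextParams);
    Console.WriteLine($"Created context: n_ctx={ctx.ContextSize}, n_batch={ctx.BatchSize}");
    return ctx;
}
```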
GetLogits()
Token logits obtained from the last call to llama_decode. The logits for the last token are stored in the last row. They can be mutated in order to change the probabilities of the next token.
Rows: n_tokens
Cols: n_vocab
public Span<float> GetLogits()
Returns
Span&lt;Single&gt;
GetLogitsIth(Int32)
Logits for the ith token. Equivalent to: llama_get_logits(ctx) + i*n_vocab
public Span<float> GetLogitsIth(int i)
Parameters
i
Int32
Returns
Span&lt;Single&gt;
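Example
A minimal sketch of greedy (argmax) selection over the logits of one batch position, assuming logits were requested for that position in the last call to Decode:
```csharp
// Minimal sketch: pick the raw id of the most likely next token from batch position `index`.
// Assumes logits were requested for that position in the last call to Decode.
static int GreedyArgMax(SafeLLamaContextHandle ctx, int index)
{
    Span<float> logits = ctx.GetLogitsIth(index);   // one row, length == ctx.VocabCount

    var best = 0;
    for (var i = 1; i < logits.Length; i++)
    {
        if (logits[i] > logits[best])
            best = i;
    }

    return best;   // raw token id; convert to LLamaToken however the rest of your code does
}
```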
Tokenize(String, Boolean, Boolean, Encoding)
Convert the given text into tokens
public LLamaToken[] Tokenize(string text, bool add_bos, bool special, Encoding encoding)
Parameters
text
String
The text to tokenize
add_bos
Boolean
Whether the "BOS" token should be added
special
Boolean
Allow tokenizing special and/or control tokens, which otherwise are not exposed and are treated as plain text.
encoding
Encoding
Encoding to use for the text
Returns
LLamaToken[]
Exceptions
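Example
A minimal sketch of tokenizing a prompt, where `ctx` is an existing SafeLLamaContextHandle:
```csharp
using System.Text;

// Minimal sketch: tokenize a prompt, prepending BOS and treating special tokens as plain text.
LLamaToken[] tokens = ctx.Tokenize("Once upon a time", add_bos: true, special: false, Encoding.UTF8);
Console.WriteLine($"Prompt is {tokens.Length} tokens (context holds {ctx.ContextSize})");
```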
TokenToSpan(LLamaToken, Span<Byte>)
Convert a single llama token into bytes
public uint TokenToSpan(LLamaToken token, Span<byte> dest)
Parameters
token
LLamaToken
Token to decode
dest
Span<Byte>
A span to attempt to write into. If it is too small, nothing will be written
Returns
UInt32
The size of this token. Nothing will be written if this is larger than dest
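Example
A minimal sketch of converting one token back into text, retrying when the destination span is too small:
```csharp
using System.Text;

// Minimal sketch: decode one token into text, growing the buffer if the first attempt is too small.
static string TokenToText(SafeLLamaContextHandle ctx, LLamaToken token, Encoding encoding)
{
    Span<byte> buffer = stackalloc byte[32];
    var size = ctx.TokenToSpan(token, buffer);

    if (size > buffer.Length)
    {
        // Nothing was written on the first call; retry with a buffer of exactly the reported size.
        buffer = new byte[size];
        size = ctx.TokenToSpan(token, buffer);
    }

    return encoding.GetString(buffer.Slice(0, (int)size));
}
```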
Decode(LLamaBatch)
public DecodeResult Decode(LLamaBatch batch)
Parameters
batch
LLamaBatch
Returns
DecodeResult
A positive return value does not indicate a fatal error, but rather a warning:
- 0: success
- 1: could not find a KV slot for the batch (try reducing the size of the batch or increasing the context size)
- < 0: error
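Example
A minimal sketch of decoding a tokenized prompt, where `tokens` is the LLamaToken[] from Tokenize above. The parameterless LLamaBatch constructor, the LLamaBatch.Add overload, LLamaSeqId.Zero, DecodeResult.Ok and the implicit int to LLamaPos conversion used here are assumptions; adjust them to match your version of the library:
```csharp
// Minimal sketch: feed a tokenized prompt to the model and check the result.
// LLamaBatch.Add, LLamaSeqId.Zero, DecodeResult.Ok and the int -> LLamaPos conversion
// are assumed here and may differ in your version of the library.
var batch = new LLamaBatch();
for (var i = 0; i < tokens.Length; i++)
{
    // Only request logits for the last token of the prompt.
    batch.Add(tokens[i], i, LLamaSeqId.Zero, i == tokens.Length - 1);
}

var result = ctx.Decode(batch);
if (result != DecodeResult.Ok)
    throw new InvalidOperationException($"llama_decode failed: {result}");
```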
Decode(List<LLamaToken>, LLamaSeqId, LLamaBatch, Int32&)
Decode a set of tokens in batch-size chunks.
internal ValueTuple<DecodeResult, int> Decode(List<LLamaToken> tokens, LLamaSeqId id, LLamaBatch batch, Int32& n_past)
Parameters
tokens
List<LLamaToken>
id
LLamaSeqId
batch
LLamaBatch
n_past
Int32&
Returns
ValueTuple<DecodeResult, Int32>
A tuple containing the decode result and the number of tokens that have not yet been decoded.
GetStateSize()
Get the size of the state, when saved as bytes
public ulong GetStateSize()
Returns
UInt64
GetState(Byte*, UInt64)
Get the raw state of this context, encoded as bytes. Data is written into the dest pointer.
public ulong GetState(Byte* dest, ulong size)
Parameters
dest
Byte*
Destination to write to
size
UInt64
Number of bytes available to write to in dest (check the required size with GetStateSize())
Returns
UInt64
The number of bytes written to dest
Exceptions
ArgumentOutOfRangeException
Thrown if dest is too small
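Example
A minimal sketch of snapshotting the context state into a managed buffer using GetStateSize and the Byte* overload (requires compiling with unsafe enabled):
```csharp
// Minimal sketch: copy the full context state into a managed byte array.
static unsafe byte[] SaveState(SafeLLamaContextHandle ctx)
{
    var size = ctx.GetStateSize();
    var buffer = new byte[size];

    ulong written;
    fixed (byte* dest = buffer)
    {
        written = ctx.GetState(dest, size);
    }

    // The state actually written may be smaller than the reported size.
    if (written < size)
        Array.Resize(ref buffer, checked((int)written));

    return buffer;
}
```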
GetState(IntPtr, UInt64)
Get the raw state of this context, encoded as bytes. Data is written into the dest pointer.
public ulong GetState(IntPtr dest, ulong size)
Parameters
dest
IntPtr
Destination to write to
size
UInt64
Number of bytes available to write to in dest (check the required size with GetStateSize())
Returns
UInt64
The number of bytes written to dest
Exceptions
ArgumentOutOfRangeException
Thrown if dest is too small
SetState(Byte*)
Set the raw state of this context
public ulong SetState(Byte* src)
Parameters
src
Byte*
The pointer to read the state from
Returns
UInt64
Number of bytes read from the src pointer
SetState(IntPtr)
Set the raw state of this context
public ulong SetState(IntPtr src)
Parameters
src
IntPtr
The pointer to read the state from
Returns
UInt64
Number of bytes read from the src pointer
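Example
A minimal sketch of restoring a state blob previously produced by GetState, using the Byte* overload (requires compiling with unsafe enabled):
```csharp
// Minimal sketch: restore a state blob previously produced by GetState.
static unsafe void LoadState(SafeLLamaContextHandle ctx, byte[] state)
{
    fixed (byte* src = state)
    {
        var read = ctx.SetState(src);
        Console.WriteLine($"Restored context state ({read} bytes read)");
    }
}
```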
SetSeed(UInt32)
Set the RNG seed
public void SetSeed(uint seed)
Parameters
seed
UInt32
SetThreads(UInt32, UInt32)
Set the number of threads used for decoding
public void SetThreads(uint threads, uint threadsBatch)
Parameters
threads
UInt32
n_threads is the number of threads used for generation (single token)
threadsBatch
UInt32
n_threads_batch is the number of threads used for prompt and batch processing (multiple tokens)
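Example
A minimal sketch of configuring determinism and threading for an existing context:
```csharp
// Minimal sketch: fix the RNG seed and split threads between generation and batch processing.
ctx.SetSeed(42);

var cores = (uint)Environment.ProcessorCount;
ctx.SetThreads(threads: Math.Max(1u, cores / 2), threadsBatch: cores);
```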
KvCacheGetDebugView(Int32)
Get a new KV cache view that can be used to debug the KV cache
public LLamaKvCacheViewSafeHandle KvCacheGetDebugView(int maxSequences)
Parameters
maxSequences
Int32
Returns
LLamaKvCacheViewSafeHandle
KvCacheCountCells()
Count the number of used cells in the KV cache (i.e. cells that have at least one sequence assigned to them)
public int KvCacheCountCells()
Returns
Int32
KvCacheCountTokens()
Returns the number of tokens in the KV cache (slow, use only for debugging). If a KV cell has multiple sequences assigned to it, it will be counted multiple times.
public int KvCacheCountTokens()
Returns
Int32
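Example
A minimal sketch of quick KV cache diagnostics using the two counters above:
```csharp
// Minimal sketch: both calls are intended for debugging only.
Console.WriteLine($"KV cache cells in use: {ctx.KvCacheCountCells()}");
Console.WriteLine($"KV cache tokens:       {ctx.KvCacheCountTokens()}");
```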
KvCacheClear()
Clear the KV cache
public void KvCacheClear()
KvCacheRemove(LLamaSeqId, LLamaPos, LLamaPos)
Removes all tokens that belong to the specified sequence and have positions in [p0, p1)
public void KvCacheRemove(LLamaSeqId seq, LLamaPos p0, LLamaPos p1)
Parameters
seq
LLamaSeqId
p0
LLamaPos
p1
LLamaPos
KvCacheSequenceCopy(LLamaSeqId, LLamaSeqId, LLamaPos, LLamaPos)
Copy all tokens that belong to the specified sequence to another sequence. Note that this does not allocate extra KV cache memory - it simply assigns the tokens to the new sequence
public void KvCacheSequenceCopy(LLamaSeqId src, LLamaSeqId dest, LLamaPos p0, LLamaPos p1)
Parameters
src
LLamaSeqId
dest
LLamaSeqId
p0
LLamaPos
p1
LLamaPos
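Example
A minimal sketch of forking a sequence so that two continuations can share one decoded prefix. The implicit int to LLamaPos conversion for position 0 is an assumption:
```csharp
// Minimal sketch: share the prefix [0, decodedCount) between `source` and `fork`
// without allocating extra KV cache memory. The int -> LLamaPos conversion for `0`
// is assumed; construct the position explicitly if your version requires it.
static void ForkSequence(SafeLLamaContextHandle ctx, LLamaSeqId source, LLamaSeqId fork, LLamaPos decodedCount)
{
    ctx.KvCacheSequenceCopy(source, fork, 0, decodedCount);
}
```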
KvCacheSequenceKeep(LLamaSeqId)
Removes all tokens that do not belong to the specified sequence
public void KvCacheSequenceKeep(LLamaSeqId seq)
Parameters
seq
LLamaSeqId
KvCacheSequenceAdd(LLamaSeqId, LLamaPos, LLamaPos, Int32)
Adds relative position "delta" to all tokens that belong to the specified sequence and have positions in [p0, p1). If the KV cache is RoPEd, the KV data is updated accordingly
public void KvCacheSequenceAdd(LLamaSeqId seq, LLamaPos p0, LLamaPos p1, int delta)
Parameters
seq
LLamaSeqId
p0
LLamaPos
p1
LLamaPos
delta
Int32
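Example
A minimal sketch of a "context shift" that combines KvCacheRemove and KvCacheSequenceAdd to make room for new tokens. The implicit int to LLamaPos conversions are assumptions:
```csharp
// Minimal sketch: drop the oldest `discard` tokens (keeping the first `keep`, e.g. a system
// prompt) and slide the remaining tokens back so generation can continue.
// The int -> LLamaPos conversions are assumed.
static void ShiftContext(SafeLLamaContextHandle ctx, LLamaSeqId seq, ref int nPast, int keep, int discard)
{
    // Remove positions [keep, keep + discard) ...
    ctx.KvCacheRemove(seq, keep, keep + discard);

    // ... and shift the surviving positions [keep + discard, nPast) back by `discard`.
    ctx.KvCacheSequenceAdd(seq, keep + discard, nPast, -discard);

    nPast -= discard;
}
```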
KvCacheSequenceDivide(LLamaSeqId, LLamaPos, LLamaPos, Int32)
Integer division of the positions by a factor of d > 1.
If the KV cache is RoPEd, the KV data is updated accordingly.
p0 < 0 : [0, p1]
p1 < 0 : [p0, inf)
public void KvCacheSequenceDivide(LLamaSeqId seq, LLamaPos p0, LLamaPos p1, int divisor)
Parameters
seq
LLamaSeqId
p0
LLamaPos
p1
LLamaPos
divisor
Int32
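Example
A minimal sketch of compressing part of a sequence's positions, as used by grouped-attention / self-extend style schemes. The implicit int to LLamaPos conversions are assumptions:
```csharp
// Minimal sketch: divide the positions in [groupStart, groupEnd) of `seq` by `groupSize`.
// The int -> LLamaPos conversions are assumed.
static void CompressPositions(SafeLLamaContextHandle ctx, LLamaSeqId seq, int groupStart, int groupEnd, int groupSize)
{
    ctx.KvCacheSequenceDivide(seq, groupStart, groupEnd, groupSize);
}
```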