SafeLLamaContextHandle
Namespace: LLama.Native
A safe wrapper around a llama_context
Inheritance Object → CriticalFinalizerObject → SafeHandle → SafeLLamaHandleBase → SafeLLamaContextHandle
Implements IDisposable
Properties
VocabCount
Total number of tokens in vocabulary of this model
Property Value
ContextSize
Total number of tokens in the context
Property Value
EmbeddingSize
Dimension of embedding vectors
Property Value
BatchSize
Get the maximum batch size for this context
Property Value
ModelHandle
Get the model which this context is using
Property Value
IsInvalid
Property Value
IsClosed
Property Value
Constructors
SafeLLamaContextHandle()
Methods
ReleaseHandle()
Returns
Create(SafeLlamaModelHandle, LLamaContextParams)
Create a new llama_context for the given model
Parameters
model
SafeLlamaModelHandle
lparams
LLamaContextParams
Returns
Exceptions
GetLogits()
Token logits obtained from the last call to llama_decode.
The logits for the last token are stored in the last row.
Can be mutated in order to change the probabilities of the next token.
Rows: n_tokens
Cols: n_vocab
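To illustrate what mutating the logits does, here is a minimal sketch that does not call LLamaSharp. It assumes a plain `float[]` as a stand-in for the buffer described above, laid out row by row; `ArgMax` and all variable names are hypothetical.

```csharp
using System;

// Hypothetical stand-in for the logits buffer:
// 2 rows (n_tokens) by 4 columns (n_vocab), stored row by row.
const int nTokens = 2;
const int nVocab = 4;
float[] logits =
{
    0.1f, 0.2f, 0.3f, 0.4f, // row 0
    1.0f, 3.0f, 2.0f, 0.5f, // row 1 (last token)
};

// The last row holds the logits for the last token.
Span<float> lastRow = logits.AsSpan((nTokens - 1) * nVocab, nVocab);
int before = ArgMax(lastRow);

// Suppress token 1 by pushing its logit to negative infinity.
lastRow[1] = float.NegativeInfinity;
int after = ArgMax(lastRow);

Console.WriteLine($"before={before}, after={after}"); // before=1, after=2

// Index of the largest value in a row (greedy choice of next token).
static int ArgMax(ReadOnlySpan<float> row)
{
    int best = 0;
    for (int i = 1; i < row.Length; i++)
        if (row[i] > row[best]) best = i;
    return best;
}
```

Because the span is a view over the same memory, the write to `lastRow[1]` changes the underlying buffer, which is why a sampler reading it afterwards would see the modified value.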
Returns
GetLogitsIth(Int32)
Logits for the ith token. Equivalent to: llama_get_logits(ctx) + i*n_vocab
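The pointer arithmetic `llama_get_logits(ctx) + i*n_vocab` amounts to slicing row `i` out of a flat, row-major buffer. The sketch below shows that indexing on a plain array; `RowAt` is a hypothetical helper, not a LLamaSharp method.

```csharp
using System;

// Hypothetical helper: returns row i of a flat, row-major logits buffer
// with cols values per row (cols plays the role of n_vocab).
static Span<float> RowAt(float[] all, int i, int cols) => all.AsSpan(i * cols, cols);

// Stand-in data: 3 rows (tokens) by 3 columns (vocabulary entries).
float[] flat = { 10f, 11f, 12f, 20f, 21f, 22f, 30f, 31f, 32f };

Span<float> row1 = RowAt(flat, 1, 3);
Console.WriteLine(string.Join(", ", row1.ToArray())); // 20, 21, 22
```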
Parameters
i
Int32
Returns
Tokenize(String, Boolean, Boolean, Encoding)
Convert the given text into tokens
Parameters
text
String
The text to tokenize
add_bos
Boolean
Whether the "BOS" token should be added
special
Boolean
Allow tokenizing special and/or control tokens which otherwise are not exposed and treated as plaintext.
encoding
Encoding
Encoding to use for the text
Returns
Exceptions
TokenToSpan(LLamaToken, Span<Byte>)
Convert a single llama token into bytes
Parameters
token
LLamaToken
Token to decode
dest
Span<Byte>
A span to attempt to write into. If this is too small, nothing will be written.
Returns
UInt32
The size of this token. Nothing will be written if this is larger than dest.
Decode(LLamaBatch)
Parameters
batch
LLamaBatch
Returns
DecodeResult
A positive return value does not indicate a fatal error, but rather a warning:
- 0: success
- 1: could not find a KV slot for the batch (try reducing the size of the batch or increase the context)
- < 0: error
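The return-value convention above can be sketched as a small classifier. This is only an illustration: it works on the raw integer codes listed here, and `Describe` is a hypothetical helper rather than anything provided by LLamaSharp.

```csharp
using System;

// Hypothetical helper: maps a raw decode status code to a description,
// following the convention listed above.
static string Describe(int code) => code switch
{
    0 => "success",
    1 => "no KV slot: reduce the batch size or increase the context, then retry",
    < 0 => "error",
    _ => "warning",
};

Console.WriteLine(Describe(0));  // success
Console.WriteLine(Describe(-1)); // error
```

Treating `1` separately from negative values lets a caller retry with a smaller batch instead of aborting.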
Decode(List<LLamaToken>, LLamaSeqId, LLamaBatch, Int32&)
Decode a set of tokens in batch-size chunks.
Parameters
tokens
List<LLamaToken>
id
LLamaSeqId
batch
LLamaBatch
n_past
Int32&
Returns
ValueTuple<DecodeResult, Int32>
A tuple, containing the decode result and the number of tokens that have not been decoded yet.
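The chunking behaviour can be pictured with the following sketch. It is an assumption about the general shape of such a loop, not the library's implementation: `DecodeInChunks` and the `decodeBatch` callback are hypothetical, tokens are plain integers, and `0` is taken to mean success.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper, not part of LLamaSharp: feeds tokens to decodeBatch
// in chunks of at most batchSize. decodeBatch stands in for a single
// batch decode and returns 0 on success.
static (int Result, int Remaining) DecodeInChunks(
    List<int> tokens, int batchSize, Func<List<int>, int> decodeBatch, ref int nPast)
{
    for (int start = 0; start < tokens.Count; start += batchSize)
    {
        int count = Math.Min(batchSize, tokens.Count - start);
        int result = decodeBatch(tokens.GetRange(start, count));
        if (result != 0)
            return (result, tokens.Count - start); // stop at the first non-zero result
        nPast += count; // advance the position only for decoded tokens
    }
    return (0, 0);
}

var prompt = new List<int> { 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 };
int past = 0;
var outcome = DecodeInChunks(prompt, 4, chunk => 0, ref past);
Console.WriteLine($"result={outcome.Result}, remaining={outcome.Remaining}, past={past}");
// result=0, remaining=0, past=10
```

With ten tokens and a batch size of four, the callback sees chunks of 4, 4, and 2 tokens. If a chunk fails, the second tuple element reports how many tokens were left undecoded, including the failed chunk.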
GetStateSize()
Get the size of the state, when saved as bytes
Returns
GetState(Byte*, UInt64)
Get the raw state of this context, encoded as bytes. Data is written into the dest pointer.
Parameters
dest
Byte*
Destination to write to
size
UInt64
Number of bytes available to write to in dest (check required size with GetStateSize())
Returns
UInt64
The number of bytes written to dest
Exceptions
ArgumentOutOfRangeException
Thrown if dest is too small
GetState(IntPtr, UInt64)
Get the raw state of this context, encoded as bytes. Data is written into the dest pointer.
Parameters
dest
IntPtr
Destination to write to
size
UInt64
Number of bytes available to write to in dest (check required size with GetStateSize())
Returns
UInt64
The number of bytes written to dest
Exceptions
ArgumentOutOfRangeException
Thrown if dest is too small
SetState(Byte*)
Set the raw state of this context
Parameters
src
Byte*
The pointer to read the state from
Returns
UInt64
Number of bytes read from the src pointer
SetState(IntPtr)
Set the raw state of this context
Parameters
src
IntPtr
The pointer to read the state from
Returns
UInt64
Number of bytes read from the src pointer
SetSeed(UInt32)
Set the RNG seed
Parameters
seed
UInt32
SetThreads(UInt32, UInt32)
Set the number of threads used for decoding
Parameters
threads
UInt32
n_threads is the number of threads used for generation (single token)
threadsBatch
UInt32
n_threads_batch is the number of threads used for prompt and batch processing (multiple tokens)
KvCacheGetDebugView(Int32)
Get a new KV cache view that can be used to debug the KV cache
Parameters
maxSequences
Int32
Returns
KvCacheCountCells()
Count the number of used cells in the KV cache (i.e. cells that have at least one sequence assigned to them)
Returns
KvCacheCountTokens()
Returns the number of tokens in the KV cache (slow, use only for debug). If a KV cell has multiple sequences assigned to it, it will be counted multiple times.
Returns
KvCacheClear()
Clear the KV cache
KvCacheRemove(LLamaSeqId, LLamaPos, LLamaPos)
Removes all tokens that belong to the specified sequence and have positions in [p0, p1)
Parameters
seq
LLamaSeqId
p0
LLamaPos
p1
LLamaPos
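The half-open range `[p0, p1)` includes `p0` and excludes `p1`. The sketch below models the cache as a plain list of (sequence, position) pairs to show which entries a removal would affect. The list model and the `RemoveRange` helper are hypothetical and say nothing about how the real KV cache is stored.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical model: removes entries of sequence seq whose position
// lies in the half-open range [p0, p1).
static void RemoveRange(List<(int Seq, int Pos)> cache, int seq, int p0, int p1) =>
    cache.RemoveAll(c => c.Seq == seq && c.Pos >= p0 && c.Pos < p1);

var cells = new List<(int Seq, int Pos)>
{
    (0, 0), (0, 1), (0, 2), (0, 3), (1, 0), (1, 1),
};

// Removes positions 1 and 2 of sequence 0; position 3 is kept because p1 is exclusive.
RemoveRange(cells, 0, 1, 3);
Console.WriteLine(cells.Count); // 4
```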
KvCacheSequenceCopy(LLamaSeqId, LLamaSeqId, LLamaPos, LLamaPos)
Copy all tokens that belong to the specified sequence to another sequence. Note that this does not allocate extra KV cache memory; it simply assigns the tokens to the new sequence.
Parameters
src
LLamaSeqId
dest
LLamaSeqId
p0
LLamaPos
p1
LLamaPos
KvCacheSequenceKeep(LLamaSeqId)
Removes all tokens that do not belong to the specified sequence
Parameters
seq
LLamaSeqId
KvCacheSequenceAdd(LLamaSeqId, LLamaPos, LLamaPos, Int32)
Adds relative position "delta" to all tokens that belong to the specified sequence and have positions in [p0, p1). If the KV cache is RoPEd, the KV data is updated accordingly.
Parameters
seq
LLamaSeqId
p0
LLamaPos
p1
LLamaPos
delta
Int32
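As a rough illustration of the position arithmetic only, the sketch below shifts the positions of a single sequence held in a plain array. `AddDelta` is a hypothetical helper; the real call also updates the cached KV data when RoPE is in use, which this sketch does not model.

```csharp
using System;

// Hypothetical helper: adds delta to every position in the half-open range [p0, p1).
static void AddDelta(int[] positions, int p0, int p1, int delta)
{
    for (int i = 0; i < positions.Length; i++)
        if (positions[i] >= p0 && positions[i] < p1)
            positions[i] += delta;
}

// Suppose positions 0..3 were removed earlier and 4..7 remain.
int[] remainingPositions = { 4, 5, 6, 7 };

// Shift the survivors back so the sequence starts at position 0 again.
AddDelta(remainingPositions, 4, 8, -4);
Console.WriteLine(string.Join(", ", remainingPositions)); // 0, 1, 2, 3
```

A negative delta like this is one way to reclaim context space after discarding the oldest tokens.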
KvCacheSequenceDivide(LLamaSeqId, LLamaPos, LLamaPos, Int32)
Integer division of the positions by a factor of d > 1.
If the KV cache is RoPEd, the KV data is updated accordingly.
p0 < 0 : [0, p1]
p1 < 0 : [p0, inf)
Parameters
seq
LLamaSeqId
p0
LLamaPos
p1
LLamaPos
divisor
Int32
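The effect on positions can be sketched on a plain array, as below. This is an assumption-laden illustration: `DividePositions` is hypothetical, it treats `p1` as exclusive, it treats a negative `p1` as "no upper bound", and it ignores the RoPE update of the cached data.

```csharp
using System;

// Hypothetical helper: integer-divides every position in the range by divisor.
// A negative p1 means the range has no upper bound; a negative p0 starts at 0.
static void DividePositions(int[] positions, int p0, int p1, int divisor)
{
    for (int i = 0; i < positions.Length; i++)
    {
        bool inRange = positions[i] >= p0 && (p1 < 0 || positions[i] < p1);
        if (inRange)
            positions[i] /= divisor; // integer division truncates
    }
}

int[] seqPositions = { 0, 1, 2, 3, 4, 5, 6, 7 };

// Halve every position from 4 onwards.
DividePositions(seqPositions, 4, -1, 2);
Console.WriteLine(string.Join(", ", seqPositions)); // 0, 1, 2, 3, 2, 2, 3, 3
```

After the division several tokens share a position, which is the point of the operation: it compresses a range of positions into fewer distinct values.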