BatchedExecutor
Namespace: LLama.Batched
A batched executor that can infer multiple separate "conversations" simultaneously.
public sealed class BatchedExecutor : System.IDisposable
Inheritance Object → BatchedExecutor
Implements IDisposable
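The sketch below shows the intended pattern: several Conversation objects share one executor, and a single call to Infer advances all of them in one batch. It assumes an already constructed executor (see the constructor below); the prompting API lives on the Conversation type and is only referenced in a comment here.
using System.Threading;
using System.Threading.Tasks;
using LLama.Batched;

static async Task RunBatchedAsync(BatchedExecutor executor)
{
    // Each conversation is an independent sequence evaluated in the shared batch.
    using var first = executor.Create();
    using var second = executor.Create();

    // Queue prompt tokens on each conversation here, via the prompting
    // methods on the Conversation type, before running inference.

    // One call evaluates the pending tokens of every conversation at once.
    var result = await executor.Infer(CancellationToken.None);
    // Inspect 'result' (e.g. NoKvSlot) and sample from each conversation afterwards.
}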
Properties
Context
The LLamaContext this executor is using
public LLamaContext Context { get; }
Property Value
LLamaContext
Model
The LLamaWeights this executor is using
public LLamaWeights Model { get; }
Property Value
LLamaWeights
BatchedTokenCount
The number of tokens currently queued in the batch, waiting for BatchedExecutor.Infer(CancellationToken) to be called
public int BatchedTokenCount { get; }
Property Value
Int32
IsDisposed
Check if this executor has been disposed.
public bool IsDisposed { get; private set; }
Property Value
Boolean
Constructors
BatchedExecutor(LLamaWeights, IContextParams)
Create a new batched executor
public BatchedExecutor(LLamaWeights model, IContextParams contextParams)
Parameters
model
LLamaWeights
The model to use
contextParams
IContextParams
Parameters to create a new context
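A construction sketch. ModelParams and LLamaWeights.LoadFromFile come from elsewhere in the library and the model path is hypothetical; ModelParams is assumed to implement IContextParams, so the same object supplies both the weights and the context parameters.
using LLama;
using LLama.Batched;
using LLama.Common;

// Hypothetical model path; ModelParams is assumed to implement IContextParams.
var parameters = new ModelParams("models/example-7b.gguf");
using var weights = LLamaWeights.LoadFromFile(parameters);

// The executor creates its own LLamaContext from the supplied parameters.
using var executor = new BatchedExecutor(weights, parameters);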
Methods
Prompt(String)
Caution
Use BatchedExecutor.Create instead
Start a new Conversation with the given prompt
public Conversation Prompt(string prompt)
Parameters
prompt
String
The prompt to start the conversation with
Returns
Conversation
Create()
Start a new Conversation
public Conversation Create()
Returns
Conversation
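A new conversation holds no tokens until it is prompted. A minimal sketch, continuing from the constructor example above and assuming the prompting methods documented on the Conversation type:
// Start an empty conversation on the shared executor.
using var conversation = executor.Create();

// Queue the initial prompt tokens before calling Infer; the exact
// prompting method is documented on the Conversation type.
// conversation.Prompt(...);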
Infer(CancellationToken)
Run inference for all conversations in the batch which have pending tokens.
If the result is NoKvSlot then there is not enough space in the KV cache for inference; try disposing some conversations and running inference again.
public Task<DecodeResult> Infer(CancellationToken cancellation)
Parameters
cancellation
CancellationToken
Returns
Task&lt;DecodeResult&gt;
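A sketch of the recovery pattern described above, continuing from the earlier examples. DecodeResult.NoKvSlot is named on this page; the conversations list and its disposal policy are hypothetical and chosen only for illustration.
// Run inference for everything queued in the batch; if the KV cache is
// full (NoKvSlot), free one conversation and retry.
var result = await executor.Infer(CancellationToken.None);
if (result == DecodeResult.NoKvSlot)
{
    // Hypothetical list of active conversations owned by the caller.
    var victim = conversations[conversations.Count - 1];
    conversations.RemoveAt(conversations.Count - 1);
    victim.Dispose();

    result = await executor.Infer(CancellationToken.None);
}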
Dispose()
public void Dispose()
GetNextSequenceId()
internal LLamaSeqId GetNextSequenceId()