BatchedExecutor

Namespace: LLama.Batched

A batched executor that can infer multiple separate "conversations" simultaneously.

public sealed class BatchedExecutor : System.IDisposable

Inheritance Object → BatchedExecutor
Implements IDisposable
Attributes NullableContextAttribute, NullableAttribute
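
A minimal sketch of batched inference with two independent conversations. The ModelParams setup, the Prompt call on Conversation, and the sampling step are based on the wider LLamaSharp API rather than this page; check the Conversation reference for the exact overloads.

using System.Threading;
using LLama;
using LLama.Batched;
using LLama.Common;

// Placeholder model path; ModelParams implements IContextParams.
var parameters = new ModelParams("model.gguf");
using var model = LLamaWeights.LoadFromFile(parameters);
using var executor = new BatchedExecutor(model, parameters);

// Two separate conversations share one context and are evaluated in one batch.
var chatA = executor.Create();
var chatB = executor.Create();
chatA.Prompt(executor.Context.Tokenize("Write a haiku about spring."));
chatB.Prompt(executor.Context.Tokenize("Explain binary search."));

// A single Infer call runs the pending tokens of both conversations.
var result = await executor.Infer(CancellationToken.None);

// Sample the next token for each conversation with your preferred sampling
// pipeline, prompt it back in, and call Infer again until both are finished.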

Properties

Context

The LLamaContext this executor is using

public LLamaContext Context { get; }

Property Value

LLamaContext

Model

The LLamaWeights this executor is using

public LLamaWeights Model { get; }

Property Value

LLamaWeights

BatchedTokenCount

The number of tokens currently in the batch, waiting for BatchedExecutor.Infer(CancellationToken) to be called

public int BatchedTokenCount { get; }

Property Value

Int32

BatchQueueCount

The number of batches in the queue, waiting for BatchedExecutor.Infer(CancellationToken) to be called

public int BatchQueueCount { get; }

Property Value

Int32
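
As a sketch, BatchQueueCount can be used to drain all pending work before sampling. Here executor is assumed to be a constructed BatchedExecutor with prompted conversations; DecodeResult lives in LLama.Native, and treating DecodeResult.Ok as the success value is an assumption.

while (executor.BatchQueueCount > 0)
{
    var result = await executor.Infer(CancellationToken.None);
    if (result != DecodeResult.Ok)
        throw new InvalidOperationException($"Inference failed: {result}");
}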

IsDisposed

Check if this executor has been disposed.

public bool IsDisposed { get; private set; }

Property Value

Boolean

Constructors

BatchedExecutor(LLamaWeights, IContextParams)

Create a new batched executor

public BatchedExecutor(LLamaWeights model, IContextParams contextParams)

Parameters

model LLamaWeights
The model to use

contextParams IContextParams
Parameters to create a new context
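
A hedged construction sketch. ModelParams (from LLama.Common) implements IContextParams, so one object can configure both the weights and the executor's context; the path and sizes below are placeholders.

using LLama;
using LLama.Batched;
using LLama.Common;

var parameters = new ModelParams("model.gguf")
{
    ContextSize = 4096,
    BatchSize = 512
};
using var model = LLamaWeights.LoadFromFile(parameters);
using var executor = new BatchedExecutor(model, parameters);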

Methods

Create()

Start a new Conversation

public Conversation Create()

Returns

Conversation
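
A brief sketch: a freshly created Conversation holds no tokens, so nothing is scheduled for it until it is prompted (the Prompt overload shown is an assumption; see the Conversation reference).

var conversation = executor.Create();

// Nothing is queued for this conversation until it is prompted.
conversation.Prompt(executor.Context.Tokenize("Hello!"));
await executor.Infer(CancellationToken.None);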

Load(String)

Load a conversation that was previously saved to a file. Once loaded the conversation will need to be prompted.

public Conversation Load(string filepath)

Parameters

filepath String

Returns

Conversation

Exceptions

ObjectDisposedException
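
A round-trip sketch. Saving is assumed to be done with a matching Save method on Conversation (not documented on this page), and the file path is a placeholder.

// Persist a conversation to disk, then restore it later with the same model.
conversation.Save("chat.state");
conversation.Dispose();

var restored = executor.Load("chat.state");

// As noted above, a loaded conversation must be prompted before inference continues.
restored.Prompt(executor.Context.Tokenize("And then?"));
await executor.Infer(CancellationToken.None);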

Load(State)

Load a conversation that was previously saved into memory. Once loaded the conversation will need to be prompted.

public Conversation Load(State state)

Parameters

state State

Returns

Conversation

Exceptions

ObjectDisposedException

Infer(CancellationToken)

Run inference for all conversations in the batch which have pending tokens.

If the result is NoKvSlot, there is not enough memory for inference; try disposing some conversations and running inference again.

public Task<DecodeResult> Infer(CancellationToken cancellation)

Parameters

cancellation CancellationToken

Returns

Task<DecodeResult>
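
A hedged sketch of handling the NoKvSlot result described above. The conversations list is hypothetical bookkeeping: the live Conversation instances, oldest first.

var result = await executor.Infer(cancellationToken);
while (result == DecodeResult.NoKvSlot && conversations.Count > 0)
{
    // The KV cache is full: free space by disposing the oldest conversation, then retry.
    conversations[0].Dispose();
    conversations.RemoveAt(0);
    result = await executor.Infer(cancellationToken);
}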

Dispose()

public void Dispose()
