BatchedExecutor
Namespace: LLama.Batched
A batched executor that can infer multiple separate "conversations" simultaneously.
public sealed class BatchedExecutor : System.IDisposable
Inheritance Object → BatchedExecutor
Implements IDisposable
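The sketch below shows the intended pattern: several Conversation objects share one executor, and a single call to Infer advances all of them in one batch. It assumes an already constructed executor (see the constructor below); the prompting API lives on the Conversation type and is only referenced in a comment here.
using System.Threading;
using System.Threading.Tasks;
using LLama.Batched;

static async Task RunBatchedAsync(BatchedExecutor executor)
{
    // Each conversation is an independent sequence evaluated in the shared batch.
    using var first = executor.Create();
    using var second = executor.Create();

    // Queue prompt tokens on each conversation here, via the prompting
    // methods on the Conversation type, before running inference.

    // One call evaluates the pending tokens of every conversation at once.
    var result = await executor.Infer(CancellationToken.None);
    // Inspect 'result' (e.g. NoKvSlot) and sample from each conversation afterwards.
}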
Properties
Context
The LLamaContext this executor is using
public LLamaContext Context { get; }
Property Value
LLamaContext
Model
The LLamaWeights this executor is using
public LLamaWeights Model { get; }
Property Value
LLamaWeights
BatchedTokenCount
The number of tokens currently queued in the batch, waiting for BatchedExecutor.Infer(CancellationToken) to be called
public int BatchedTokenCount { get; }
Property Value
Int32
IsDisposed
Check if this executor has been disposed.
public bool IsDisposed { get; private set; }
Property Value
Boolean
Constructors
BatchedExecutor(LLamaWeights, IContextParams)
Create a new batched executor
public BatchedExecutor(LLamaWeights model, IContextParams contextParams)
Parameters
model
LLamaWeights
The model to use
contextParams
IContextParams
Parameters to create a new context
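A construction sketch. ModelParams and LLamaWeights.LoadFromFile come from elsewhere in the library and the model path is hypothetical; ModelParams is assumed to implement IContextParams, so the same object supplies both the weights and the context parameters.
using LLama;
using LLama.Batched;
using LLama.Common;

// Hypothetical model path; ModelParams is assumed to implement IContextParams.
var parameters = new ModelParams("models/example-7b.gguf");
using var weights = LLamaWeights.LoadFromFile(parameters);

// The executor creates its own LLamaContext from the supplied parameters.
using var executor = new BatchedExecutor(weights, parameters);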
Methods
Prompt(String)
Caution
Use BatchedExecutor.Create instead
Start a new Conversation with the given prompt
public Conversation Prompt(string prompt)
Parameters
prompt
String
The prompt to start the conversation with
Returns
Conversation
Create()
Start a new Conversation
public Conversation Create()
Returns
Conversation
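A new conversation holds no tokens until it is prompted. A minimal sketch, continuing from the constructor example above and assuming the prompting methods documented on the Conversation type:
// Start an empty conversation on the shared executor.
using var conversation = executor.Create();

// Queue the initial prompt tokens before calling Infer; the exact
// prompting method is documented on the Conversation type.
// conversation.Prompt(...);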
Infer(CancellationToken)
Run inference for all conversations in the batch which have pending tokens.
If the result is NoKvSlot then there is not enough space in the KV cache for inference; try disposing some conversations and running inference again.
public Task<DecodeResult> Infer(CancellationToken cancellation)
Parameters
cancellation
CancellationToken
Returns
Task&lt;DecodeResult&gt;
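A sketch of the recovery pattern described above, continuing from the earlier examples. DecodeResult.NoKvSlot is named on this page; the conversations list and its disposal policy are hypothetical and chosen only for illustration.
// Run inference for everything queued in the batch; if the KV cache is
// full (NoKvSlot), free one conversation and retry.
var result = await executor.Infer(CancellationToken.None);
if (result == DecodeResult.NoKvSlot)
{
    // Hypothetical list of active conversations owned by the caller.
    var victim = conversations[conversations.Count - 1];
    conversations.RemoveAt(conversations.Count - 1);
    victim.Dispose();

    result = await executor.Infer(CancellationToken.None);
}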
Dispose()
public void Dispose()
GetNextSequenceId()
internal LLamaSeqId GetNextSequenceId()