Overview

LLamaSharp is the C#/.NET binding of llama.cpp. It provides APIs for running inference on LLaMa models and deploying them in native environments or on the web. It helps C# developers deploy an LLM (Large Language Model) locally and integrate it with C# apps.

Main features

Model inference
Model quantization
Generating embeddings
Interactive/Instruct/Stateless executor mode
Chat session APIs
Save/load the state
Integration with other applications like BotSharp and semantic-kernel

Essential insights for novice learners

If you are new to LLMs, here are some tips to help you get started with LLamaSharp. If you are experienced in this field, we still recommend taking a few minutes to read them, because some things behave differently compared to cpp/python.

The main ability of LLamaSharp is to provide an efficient way to run LLM (Large Language Model) inference locally (and to fine-tune models in the future). The model weights, however, need to be downloaded from other sources, such as huggingface.
Since LLamaSharp supports multiple platforms, the NuGet package is split into LLamaSharp and LLama.Backend. After installing LLamaSharp, install one of LLama.Backend.Cpu, LLama.Backend.Cuda11, or LLama.Backend.Cuda12. If you build from the source code, the dynamic libraries can be found in LLama/Runtimes; rename the one you want to use to libllama.dll.
LLaMa originally refers to the weights released by Meta (Facebook Research). Since then, many models have been fine-tuned from it, such as Vicuna, GPT4All, and Pygmalion. Although all of these models are supported by LLamaSharp, different file formats require different steps. There are mainly three kinds of files: .pth, .bin (ggml), and .bin (quantized). A .bin (quantized) file can be used directly by LLamaSharp. A .bin (ggml) file can also be used directly, but inference is faster after quantization. A .pth file must first be converted to a .bin (ggml) file by following the instructions in llama.cpp.
LLamaSharp supports GPU acceleration, but it requires a CUDA installation. To enable the GPU, install CUDA 11 or CUDA 12 on your system before using LLamaSharp. If you have another CUDA version, you can compile llama.cpp from source to get the dll. For building from source, please refer to issue #5.
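
The backend choice described in the tips above can be sketched as a small shell helper. This is only an illustration: the function names `backend_package` and `install_commands` are hypothetical, the package IDs are copied as they are written in this document (check the exact IDs on NuGet before running anything), and falling back to the CPU backend for any other CUDA version is an assumption made here for simplicity. The script only prints the `dotnet add package` commands; it does not run them.

```shell
#!/bin/sh
# Hypothetical helper: map a CUDA major version (or "none") to the backend
# package named in this document. Package IDs are copied from the text above
# and should be verified on NuGet.
backend_package() {
  case "$1" in
    11) echo "LLama.Backend.Cuda11" ;;
    12) echo "LLama.Backend.Cuda12" ;;
    *)  echo "LLama.Backend.Cpu" ;;  # assumption: fall back to CPU otherwise
  esac
}

# Print (do not execute) the two install commands: the main package first,
# then the backend chosen for the given CUDA major version.
install_commands() {
  echo "dotnet add package LLamaSharp"
  echo "dotnet add package $(backend_package "$1")"
}

# CUDA_MAJOR is an illustrative environment variable, not something
# LLamaSharp reads; it defaults to "none" (CPU backend).
install_commands "${CUDA_MAJOR:-none}"
```

For example, running the script with `CUDA_MAJOR=12` prints the commands for the CUDA 12 backend, which can then be run from the project directory. With an unsupported CUDA version, building llama.cpp from source (as described above) may be a better route than the CPU fallback.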

Welcome to join the development!

Community effort is always one of the most important things in open-source projects. Contributions of any kind are welcome here. For example, the following things mean a lot to LLamaSharp:

Open an issue when you find something wrong.
Open a PR if you've fixed something. Even correcting a typo is a valuable contribution.
Help to optimize the documentation.
Write an example or blog post about how to integrate LLamaSharp with your apps.
Ask for a missing feature and discuss it with other developers.

If you'd like to get deeply involved in development, please reach out to us in the discord channel or send an email to AsakusaRinne@gmail.com. :)