# wllama WASM Files (PrismML Fork)

**Note:** This build uses PrismML-Eng's fork of llama.cpp, which supports the Q1_0_g128 quantization format required for Bonsai-8B.

## Multi-threaded Version (Recommended)

- **wllama.js** (91KB)
- **wllama.wasm** (2.1MB)

## Single-threaded Version

- **wllama-single.js** (73KB)
- **wllama-single.wasm** (2.1MB)
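Since the multi-threaded build only works on cross-origin-isolated pages, a loader can check the browser's `crossOriginIsolated` flag and fall back to the single-threaded files when isolation is absent. A small sketch of that choice (the config keys mirror the Usage section; the root-relative paths are placeholders):

```javascript
// Pick the wllama build matching the current environment. The multi-threaded
// files need cross-origin isolation (COOP/COEP headers); otherwise fall back
// to the single-threaded files. Paths here are placeholder assumptions.
function pickBuildConfig() {
  // `crossOriginIsolated` is a browser global; the typeof guard keeps this
  // safe in environments (like Node) where it does not exist.
  const isolated =
    typeof crossOriginIsolated !== 'undefined' && crossOriginIsolated;
  return isolated
    ? { wasmUrl: '/wllama.wasm', wasmJsUrl: '/wllama.js' }
    : { wasmUrl: '/wllama-single.wasm', wasmJsUrl: '/wllama-single.js' };
}
```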

## Usage

To use these files in your wllama project:

```js
import { Wllama } from 'wllama';

// Point wllama at the hosted artifacts from this build.
const wllama = new Wllama({
  wasmUrl: 'https://your-domain.com/wllama.wasm',
  wasmJsUrl: 'https://your-domain.com/wllama.js'
});

// Load the Q1_0_g128-quantized Bonsai-8B model.
await wllama.loadModelFromUrl('path/to/bonsai-8b-q1_0_g128.gguf');

// Run a short completion to verify the model is working.
const output = await wllama.createCompletion('Hello', { nPredict: 32 });
console.log(output);
```

## Repository

Built from https://github.com/ngxson/wllama, with llama.cpp replaced by https://github.com/PrismML-Eng/llama.cpp.