Tools

A small, growing set of interactive tools that pair with the blog posts.

KV Cache Calculator

Interactive calculator that estimates KV cache size and total VRAM for serving an LLM. Pick a model preset (or enter a custom config), batch size, sequence length, KV cache precision, and GPU class, then see whether the setup fits before you hit an out-of-memory (OOM) error.

Open tool →
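
For a rough sense of the arithmetic behind an estimate like this, here is a minimal Python sketch. It uses the commonly cited KV cache size formula (two tensors, K and V, per layer), which may differ from what the calculator actually computes. The function names, the model shape (32 layers, 32 KV heads, head dimension 128), the 14 GiB weight figure, and the 24 GiB GPU are illustrative assumptions, not values taken from the tool.

```python
GIB = 1024 ** 3

def kv_cache_bytes(num_layers, num_kv_heads, head_dim,
                   seq_len, batch_size, bytes_per_elem):
    """Estimate KV cache size in bytes.

    The factor of 2 accounts for storing both keys and values
    for every layer, KV head, and token position.
    """
    return (2 * num_layers * num_kv_heads * head_dim
            * seq_len * batch_size * bytes_per_elem)

def fits_in_vram(weight_bytes, kv_bytes, gpu_bytes):
    """Simplified check: weights plus KV cache against GPU memory.

    Ignores activations, framework overhead, and fragmentation,
    so a real deployment needs extra headroom.
    """
    return weight_bytes + kv_bytes <= gpu_bytes

# Hypothetical config: 32 layers, 32 KV heads, head_dim 128,
# 4096-token sequences, batch size 1, 16-bit cache (2 bytes/element).
kv = kv_cache_bytes(32, 32, 128, 4096, 1, 2)
print(f"KV cache: {kv / GIB:.1f} GiB")  # KV cache: 2.0 GiB

# Assumed 14 GiB of weights on an assumed 24 GiB GPU.
print(fits_in_vram(14 * GIB, kv, 24 * GIB))  # True
```

Under these assumptions, raising the batch size to 8 grows the cache to 16 GiB, and the same check reports that the setup no longer fits. That linear scaling with batch size and sequence length is why the cache, not the weights, often decides whether a serving config runs out of memory.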