Data Incoming

Know your hardware. Run your models.

Tokens/sec benchmarks, cost-per-GB tracking, and compatibility matrices across NVIDIA, Apple Silicon, and AMD -- for people who run models, not rent them.

Stats tracked: GPUs benchmarked, average t/s (RTX 3090 + 8B Q4), RTX 3090 eBay median price, max unified memory (GB).

Launching 2026. Inference benchmarks, cost tracking, compatibility data.

The numbers behind every inference decision

Which GPU runs your model, how fast, and at what cost. Benchmarks across platforms, new and used pricing, and setup guides to get you running.

Tokens/sec Benchmarks

Real inference speed on real models. We test every GPU across Llama, DeepSeek, Mistral, and more -- with llama.cpp, MLX, and vLLM backends. Quantized and full precision.
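At its core, the tokens/sec figure is just generated tokens divided by wall-clock decode time. A minimal sketch of the metric (the run numbers below are hypothetical, not benchmark data):

```python
def tokens_per_second(n_tokens: int, elapsed_s: float) -> float:
    """Throughput metric: generated tokens divided by wall-clock seconds."""
    return n_tokens / elapsed_s

# Hypothetical run: 512 tokens decoded in 10.8 s.
print(round(tokens_per_second(512, 10.8), 1))  # → 47.4
```

Real benchmarks also separate prompt-processing speed from decode speed, since the two differ by an order of magnitude on most hardware.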

Cost/GB Tracker

New retail, eBay used, Apple refurb. We track cost-per-GB of usable memory across every inference-capable GPU and unified memory config. Updated daily.
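The metric itself is simple division: listing price over usable memory. A sketch with illustrative prices (not tracked data):

```python
def cost_per_gb(price_usd: float, memory_gb: float) -> float:
    """Dollars per gigabyte of usable inference memory."""
    return price_usd / memory_gb

# Illustrative listings only -- real prices move daily.
listings = {
    "RTX 3090 24GB (used)": (700.0, 24),
    "RTX 4090 24GB (new)": (1800.0, 24),
    "Mac mini 64GB unified": (1999.0, 64),
}
for name, (price, gb) in sorted(listings.items(),
                                key=lambda kv: kv[1][0] / kv[1][1]):
    print(f"{name}: ${cost_per_gb(price, gb):.2f}/GB")
```

Sorting by $/GB rather than raw price is what makes used 24GB cards and high-memory unified configs directly comparable.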

Model Compatibility Matrix

Can your GPU run it? Enter a model name and size. We show every GPU that fits it, the expected tokens/sec at each quantization level, and what it costs.
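The fit check behind a matrix like this is a back-of-envelope estimate: parameter count times bits per weight, plus headroom for the KV cache and runtime buffers. A rough sketch (the 20% overhead factor is an assumption, not measured data):

```python
def est_model_gb(params_b: float, bits_per_weight: float,
                 overhead: float = 1.2) -> float:
    """Rough memory needed: weights at the given quantization, plus ~20%
    headroom for KV cache and buffers (the overhead factor is an assumption)."""
    weight_gb = params_b * bits_per_weight / 8  # 1B params at 8 bits ≈ 1 GB
    return weight_gb * overhead

# Does an 8B model at Q4 (~4.5 bits/weight) fit in 12 GB of VRAM?
need = est_model_gb(8, 4.5)
print(f"{need:.1f} GB needed -> fits 12 GB: {need <= 12}")  # → 5.4 GB, True
```

Real answers also depend on context length, since the KV cache grows with it; that is why measured numbers beat estimates.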

Setup Guides by Platform

Mac + ollama. NVIDIA + llama.cpp. AMD + ROCm. Step-by-step from unboxing to first inference, with benchmarked settings for your exact hardware.

Run models locally. Make better hardware decisions.

Get early access to our inference benchmarks, cost trackers, and the compatibility data that saves you from expensive mistakes.