Tokens/sec benchmarks, cost-per-GB tracking, and compatibility matrices across NVIDIA, Apple Silicon, and AMD -- for people who run models, not rent them.
Launching 2026. Inference benchmarks, cost tracking, compatibility data.
What We Track
Which GPU runs your model, how fast, and at what cost. Benchmarks across platforms, new and used pricing, and the setup guides to get you running.
Real inference speed on real models. We benchmark every GPU on Llama, DeepSeek, Mistral, and more, using the llama.cpp, MLX, and vLLM backends. Quantized and full precision.
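As a rough illustration of what one measurement looks like, here is a sketch using the llama-cpp-python bindings. The model path, prompt, and settings are placeholders, and a wall-clock timer like this folds prompt processing into the rate, so treat it as the shape of the method rather than our harness:

```python
# Minimal sketch of one tokens/sec measurement via llama-cpp-python.
# The GGUF path and settings are placeholders, not our test rig.
import time
from llama_cpp import Llama

llm = Llama(
    model_path="models/llama-3-8b.Q4_K_M.gguf",  # placeholder path
    n_gpu_layers=-1,  # offload every layer to the GPU
)

start = time.perf_counter()
out = llm("Explain KV caching in one paragraph.", max_tokens=128)
elapsed = time.perf_counter() - start

completion_tokens = out["usage"]["completion_tokens"]
# Wall-clock rate; includes prompt processing, so it slightly
# understates pure decode speed.
print(f"{completion_tokens / elapsed:.1f} tokens/sec")
```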
New retail, eBay used, Apple refurb. We track cost-per-GB of usable memory across every inference-capable GPU and unified memory config. Updated daily.
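The metric itself is simple division: listing price over usable memory. A toy version, with invented prices and capacities standing in for real listings:

```python
# Toy cost-per-GB calculation. Prices and capacities below are
# made-up placeholders; the live tracker pulls real listings daily.
listings = [
    ("used 24 GB card", 800.00, 24),   # (label, price in USD, usable GB)
    ("new 16 GB card", 1100.00, 16),
]

for label, price_usd, usable_gb in listings:
    print(f"{label}: ${price_usd / usable_gb:.2f} per GB of usable memory")
```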
Can your GPU run it? Enter a model name and size. We show every GPU that fits it, the expected tokens/sec at each quantization level, and what it costs.
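Under the hood, the fit check is a memory-footprint estimate: parameter count times bytes per parameter at a given quantization, plus headroom for KV cache and runtime overhead. A simplified sketch, where the byte counts and the flat 20% headroom are assumptions for illustration rather than our production numbers:

```python
# Simplified fit check: weight size is parameter count times bytes per
# parameter at a given quantization, padded for KV cache and overhead.
# The byte counts and the flat 20% headroom are illustrative
# assumptions, not our production model.
BYTES_PER_PARAM = {"FP16": 2.0, "Q8_0": 1.0, "Q4_K_M": 0.56}

def fits(params_billions: float, quant: str, memory_gb: float) -> bool:
    weights_gb = params_billions * BYTES_PER_PARAM[quant]
    return weights_gb * 1.2 <= memory_gb  # 20% headroom

# e.g. a 13B model against a 24 GB card:
for quant in BYTES_PER_PARAM:
    print(f"13B @ {quant} in 24 GB: {fits(13, quant, 24.0)}")
```

At these assumed figures, a 13B model fits a 24 GB card at Q8_0 and Q4_K_M but not at FP16, which is exactly the kind of answer the matrix returns.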
Mac + Ollama. NVIDIA + llama.cpp. AMD + ROCm. Step-by-step guides, from unboxing to first inference, with benchmarked settings for your exact hardware.
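On the Mac + Ollama path, for example, each guide ends with a smoke test along these lines, assuming a default Ollama install with a model already pulled (the model name here is just an example):

```python
# "First inference" smoke test against a local Ollama server on its
# default port. Assumes `ollama pull llama3` has already been run;
# the model name is an example.
import json
import urllib.request

req = urllib.request.Request(
    "http://localhost:11434/api/generate",
    data=json.dumps({
        "model": "llama3",
        "prompt": "Say hello in five words.",
        "stream": False,
    }).encode(),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    body = json.load(resp)

print(body["response"])
# Ollama reports eval_count and eval_duration (in nanoseconds), so the
# same call doubles as a quick tokens/sec check:
print(f'{body["eval_count"] / (body["eval_duration"] / 1e9):.1f} tokens/sec')
```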
Get early access to our inference benchmarks, cost trackers, and the compatibility data that saves you from expensive mistakes.