Tokens/sec benchmarks, cost-per-GB tracking, and compatibility matrices across NVIDIA, Apple Silicon, and AMD -- for people who run models, not rent them.
Launching 2026. Inference benchmarks, cost tracking, compatibility data.
What We Track
Which GPU runs your model, how fast, and at what cost. Benchmarks across platforms, new and used pricing, and the setup guides to get you running.
Real inference speed on real models. We benchmark every GPU on Llama, DeepSeek, Mistral, and more, using the llama.cpp, MLX, and vLLM backends. Quantized and full precision.
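As a rough illustration of what one measurement looks like, here is a sketch using the llama-cpp-python bindings. The model path, prompt, and settings are placeholders, and a wall-clock timer like this folds prompt processing into the rate, so treat it as the shape of the method rather than our harness:

```python
# Minimal sketch of one tokens/sec measurement via llama-cpp-python.
# The GGUF path and settings are placeholders, not our test rig.
import time
from llama_cpp import Llama

llm = Llama(
    model_path="models/llama-3-8b.Q4_K_M.gguf",  # placeholder path
    n_gpu_layers=-1,  # offload every layer to the GPU
)

start = time.perf_counter()
out = llm("Explain KV caching in one paragraph.", max_tokens=128)
elapsed = time.perf_counter() - start

completion_tokens = out["usage"]["completion_tokens"]
# Wall-clock rate; includes prompt processing, so it slightly
# understates pure decode speed.
print(f"{completion_tokens / elapsed:.1f} tokens/sec")
```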
New retail, eBay used, Apple refurb. We track cost-per-GB of usable memory across every inference-capable GPU and unified memory config. Updated daily.
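The metric itself is simple division: listing price over usable memory. A toy version, with invented prices and capacities standing in for real listings:

```python
# Toy cost-per-GB calculation. Prices and capacities below are
# made-up placeholders; the live tracker pulls real listings daily.
listings = [
    ("used 24 GB card", 800.00, 24),   # (label, price in USD, usable GB)
    ("new 16 GB card", 1100.00, 16),
]

for label, price_usd, usable_gb in listings:
    print(f"{label}: ${price_usd / usable_gb:.2f} per GB of usable memory")
```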
Can your GPU run it? Enter a model name and size. We show every GPU that fits it, the expected tokens/sec at each quantization level, and what it costs.
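Under the hood, the fit check is a memory-footprint estimate: parameter count times bytes per parameter at a given quantization, plus headroom for KV cache and runtime overhead. A simplified sketch, where the byte counts and the flat 20% headroom are assumptions for illustration rather than our production numbers:

```python
# Simplified fit check: weight size is parameter count times bytes per
# parameter at a given quantization, padded for KV cache and overhead.
# The byte counts and the flat 20% headroom are illustrative
# assumptions, not our production model.
BYTES_PER_PARAM = {"FP16": 2.0, "Q8_0": 1.0, "Q4_K_M": 0.56}

def fits(params_billions: float, quant: str, memory_gb: float) -> bool:
    weights_gb = params_billions * BYTES_PER_PARAM[quant]
    return weights_gb * 1.2 <= memory_gb  # 20% headroom

# e.g. a 13B model against a 24 GB card:
for quant in BYTES_PER_PARAM:
    print(f"13B @ {quant} in 24 GB: {fits(13, quant, 24.0)}")
```

At these assumed figures, a 13B model fits a 24 GB card at Q8_0 and Q4_K_M but not at FP16, which is exactly the kind of answer the matrix returns.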
Mac + Ollama. NVIDIA + llama.cpp. AMD + ROCm. Step-by-step guides, from unboxing to first inference, with benchmarked settings for your exact hardware.
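On the Mac + Ollama path, for example, each guide ends with a smoke test along these lines, assuming a default Ollama install with a model already pulled (the model name here is just an example):

```python
# "First inference" smoke test against a local Ollama server on its
# default port. Assumes `ollama pull llama3` has already been run;
# the model name is an example.
import json
import urllib.request

req = urllib.request.Request(
    "http://localhost:11434/api/generate",
    data=json.dumps({
        "model": "llama3",
        "prompt": "Say hello in five words.",
        "stream": False,
    }).encode(),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    body = json.load(resp)

print(body["response"])
# Ollama reports eval_count and eval_duration (in nanoseconds), so the
# same call doubles as a quick tokens/sec check:
print(f'{body["eval_count"] / (body["eval_duration"] / 1e9):.1f} tokens/sec')
```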
Get early access to our inference benchmarks, cost trackers, and the compatibility data that saves you from expensive mistakes.