> [BENCHMARKS]: Local vs Cloud — Live Performance & Cost Comparison

> [ METHODOLOGY ]: HOW THESE NUMBERS ARE MEASURED

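All benchmarks run on the live A1AI cluster. TPS (tokens per second) is measured from Ollama's native metrics during real workload runs, not synthetic tests. Cost per 1M tokens is listed as $0.00 because the hardware is owned outright: the figure is the marginal per-token spend and excludes electricity and the hardware purchase itself, which the ROI table below accounts for. Cloud costs use public API pricing at the time of measurement.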
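As a rough illustration of how a generation TPS figure can be derived from Ollama's metrics: a non-streaming Ollama generate response reports `eval_count` (tokens generated) and `eval_duration` (nanoseconds spent generating). The sketch below assumes such a response is already in hand as a dictionary. The function name and the sample values are made up for illustration and are not taken from the A1AI benchmark runs.

```python
def tokens_per_second(response: dict) -> float:
    """Compute generation TPS from an Ollama-style response dictionary.

    Assumes the dictionary carries `eval_count` (generated tokens) and
    `eval_duration` (generation time in nanoseconds).
    """
    eval_count = response["eval_count"]
    eval_duration_ns = response["eval_duration"]
    if eval_duration_ns <= 0:
        raise ValueError("eval_duration must be positive")
    # Convert nanoseconds to seconds before dividing.
    return eval_count / (eval_duration_ns / 1e9)


# Illustrative values only: 500 tokens generated in 10 seconds.
sample = {"eval_count": 500, "eval_duration": 10_000_000_000}
print(f"{tokens_per_second(sample):.1f} TPS")
```

Averaging this value over many real requests, rather than a single prompt, gives a figure closer to the workload numbers reported here.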

> [ INFERENCE_BENCHMARKS ]: MOOLAH (RTX 5070 — 8GB VRAM)

Model	Quantization	TPS (gen)	Cost / 1M tokens	Cloud equivalent
Llama 3.1 8B	Q4_K_M	~50 TPS	$0.00	~$0.20 (Groq)
Gemma 3 12B	Q4_K_M	~30 TPS	$0.00	~$0.40 (Groq)
Phi-3 Mini	Q4_K_M	~70 TPS	$0.00	~$0.10 (Groq)
SDXL Base 1.0	fp16	~12 s/img	$0.00/img	~$0.04/img

> [ LARGE_MODEL_BENCHMARKS ]: MAC STUDIO (M1 Max — 32GB unified)

Model	Quantization	TPS (gen)	Cost / 1M tokens	Cloud equivalent
Qwen2.5 32B	Q4_K_M	~18 TPS	$0.00	~$1.80 (Together AI)
Llama 3.1 70B	Q2_K	~8 TPS	$0.00	~$0.90 (Groq)

> [ ROI_CALCULATOR ]: WHEN DOES LOCAL PAY FOR ITSELF?

Monthly API Spend	Annual Cloud Cost	Local Hardware	Break-even
$500/mo	$6,000/yr	$1,200 (Moolah tier)	~73 days
$1,000/mo	$12,000/yr	$2,500 (Studio tier)	~76 days
$2,000/mo	$24,000/yr	$2,500 (Studio tier)	~38 days
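The break-even column can be reproduced with a short calculation. This is a minimal sketch, assuming a 365-day year, that the local hardware replaces all of the monthly API spend, and that electricity is ignored. The function name is hypothetical. Results can differ from a hand calculation by a day depending on the month-length convention used.

```python
def break_even_days(hardware_cost: float, monthly_api_spend: float) -> float:
    """Days until the avoided API spend equals the hardware purchase price.

    Assumes the hardware fully replaces the API spend and ignores
    electricity, maintenance, and resale value.
    """
    if monthly_api_spend <= 0:
        raise ValueError("monthly_api_spend must be positive")
    # Annualize the spend, then spread it over a 365-day year.
    daily_spend = monthly_api_spend * 12 / 365
    return hardware_cost / daily_spend


# The three scenarios from the table: (monthly spend, hardware cost) in USD.
for spend, hardware in [(500, 1200), (1000, 2500), (2000, 2500)]:
    days = break_even_days(hardware, spend)
    print(f"${spend}/mo vs ${hardware} hardware: ~{days:.0f} days")
```

If only part of the API spend moves to local hardware, pass that portion as `monthly_api_spend`; the break-even period lengthens in proportion.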

> [ GET_STARTED ]

Want these numbers for your specific workload? The AI Homelab Blueprint includes the full cost spreadsheet template. For a custom migration plan, book an audit.