> [ METHODOLOGY ]: HOW THESE NUMBERS ARE MEASURED
All benchmarks run on the live A1AI cluster. TPS (tokens per second) measured using Ollama's native metrics during real workload runs — not synthetic tests. Cost per 1M tokens calculated as $0.00 because the hardware is owned outright. Cloud costs use current public API pricing.
> [ INFERENCE_BENCHMARKS ]: MOOLAH (RTX 5070 — 8GB VRAM)
| Model | Quantization | TPS (gen) | Cost / 1M tokens | Cloud equivalent |
| Llama 3.1 8B | Q4_K_M | ~50 TPS | $0.00 | ~$0.20 (Groq) |
| Gemma 3 12B | Q4_K_M | ~30 TPS | $0.00 | ~$0.40 (Groq) |
| Phi-3 Mini | Q4_K_M | ~70 TPS | $0.00 | ~$0.10 (Groq) |
| SDXL Base 1.0 | fp16 | ~12s/img | $0.00 | ~$0.04/img |
> [ LARGE_MODEL_BENCHMARKS ]: MAC STUDIO (M1 Max — 32GB unified)
| Model | Quantization | TPS (gen) | Cost / 1M tokens | Cloud equivalent |
| Qwen2.5 32B | Q4_K_M | ~18 TPS | $0.00 | ~$1.80 (Together AI) |
| Llama 3.1 70B | Q2_K | ~8 TPS | $0.00 | ~$0.90 (Groq) |
> [ ROI_CALCULATOR ]: WHEN DOES LOCAL PAY FOR ITSELF?
| Monthly API Spend | Annual Cloud Cost | Local Hardware | Break-even |
| $500/mo | $6,000/yr | $1,200 (Moolah tier) | ~73 days |
| $1,000/mo | $12,000/yr | $2,500 (Studio tier) | ~75 days |
| $2,000/mo | $24,000/yr | $2,500 (Studio tier) | ~38 days |
> [ GET_STARTED ]
Want these numbers for your specific workload? The AI Homelab Blueprint includes the full cost spreadsheet template. For a custom migration plan, book an audit.