Did this land?

How to Run LLMs Locally on Your iPhone in 2026 (Completely Offline, No Subscription)

Apple’s Metal GPU and Neural Engine exist in every iPhone since 2017. They’re dedicated AI accelerators, sitting mostly idle while you pay a monthly subscription to send queries to someone else’s server.

Off Grid changes that. Run Qwen 3.5, Gemma 4, Mistral, and other leading models directly on your iPhone - offline, private, with no ongoing cost. Inference runs via llama.cpp with Metal GPU acceleration.


Requirements

  • iPhone 12 or newer (A14 Bionic or later)
  • iOS 16 or later
  • 3GB free storage minimum
  • Internet once for the model download

Step 1 - Install Off Grid

Download from the App Store


Step 2 - Choose your model

All models use Q4_K_M quantisation - the best balance of quality and size for mobile.

Model Min iPhone RAM needed Size Best for
Qwen 3.5 0.8B iPhone 12 3GB ~0.8GB Ultra-fast, 262K context
Qwen 3.5 2B iPhone 12 4GB ~1.7GB Best for 4–6GB devices
Gemma 4 E2B iPhone 12 4GB ~1.5GB Vision + thinking mode
Mistral 7B iPhone 14 6GB ~4.1GB Fast, reliable general purpose
Gemma 4 E4B iPhone 14 6GB ~2.5GB Reasoning + vision, thinking mode
Qwen 3.5 9B iPhone 15 Pro 8GB ~5.5GB Best on-device quality overall

iPhone 12/13 users: start with Qwen 3.5 2B. iPhone 15 Pro / 16 users: try Qwen 3.5 9B.


Step 3 - Download, load, chat

  1. Open Off Grid → Models
  2. Tap a model → Download
  3. Tap Load - the model loads via Metal (Apple’s GPU framework)
  4. Open Chat

You’re now running inference locally on Apple Silicon. Nothing leaves your phone.


Why iPhone is great for local AI

iPhones have a key advantage: unified memory. The Metal GPU and CPU share the same memory pool, which means models load faster and inference is more efficient than CPU-only devices.

Qwen 3.5 2B on an iPhone 14 generates around 20–30 tokens per second. That’s fast enough for a fluid conversation.

Thinking mode (Qwen 3.5, Gemma 4) works particularly well on iPhone because Metal acceleration keeps the longer reasoning sequences from feeling slow.


Performance by device

iPhone RAM Recommended model Approx tok/s
iPhone 16 Pro Max 8GB Qwen 3.5 9B 18–28
iPhone 16 / 16 Plus 8GB Qwen 3.5 9B 18–28
iPhone 15 Pro 8GB Qwen 3.5 9B 15–25
iPhone 14 Pro 6GB Gemma 4 E4B 15–22
iPhone 14 6GB Qwen 3.5 2B 20–30
iPhone 13 4GB Qwen 3.5 2B 18–26
iPhone 12 4GB Qwen 3.5 0.8B 25–40