How to Run LLMs Locally on Your Android Phone in 2026 (No Cloud, No Account)
Every time you ask ChatGPT a question, it's logged on a server: your query, the response, the time, your account. It's stored indefinitely, and it can be used to improve models, inform advertising, and satisfy law enforcement requests.
Off Grid removes that entire layer. The model runs in your phone’s RAM via llama.cpp on ARM64. Nothing is sent anywhere.
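To make "runs in your phone's RAM" concrete, here is a minimal Kotlin sketch of how an app like this might call into a bundled llama.cpp build over JNI. The library name and the `LlamaBridge` functions are hypothetical illustrations, not Off Grid's actual code; the point is that inference is a plain native function call with no network I/O anywhere in the path.

```kotlin
// Hypothetical sketch of an app-side bridge to a bundled llama.cpp build.
// These function names are illustrative, not Off Grid's real API.
object LlamaBridge {
    init {
        // Loads the compiled llama.cpp shared library shipped inside the APK.
        // (This sketch only runs if such a native library is actually present.)
        System.loadLibrary("llama_android")
    }

    // Each call crosses into native code and back: no sockets, no HTTP.
    external fun loadModel(ggufPath: String): Long   // returns a native handle
    external fun generate(handle: Long, prompt: String): String
    external fun freeModel(handle: Long)
}

fun main() {
    val handle = LlamaBridge.loadModel("models/qwen3.5-2b-q4_k_m.gguf")
    println(LlamaBridge.generate(handle, "Explain quantisation in one sentence."))
    LlamaBridge.freeModel(handle)
}
```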
Here’s how to set it up.
What you need
- Android phone with 4GB RAM or more (Android 10+)
- 2–5GB free storage depending on the model you choose
- Internet access once, for the initial download - then never again
Step 1 - Download Off Grid
Install the Off Grid app on your phone. Together with the model download in Step 3, this is the only part of setup that needs a connection.
Step 2 - Choose a model
All models use Q4_K_M quantisation by default - the best balance of quality and size for mobile.
| Model | Min RAM | Size | Best for |
|---|---|---|---|
| Qwen 3.5 0.8B | 3GB | ~0.8GB | Ultra-fast, 262K context, budget devices |
| Qwen 3.5 2B | 4GB | ~1.7GB | Best for 4–6GB RAM devices, 262K context |
| Gemma 4 E2B | 4GB | ~1.5GB | Vision + thinking mode, MoE architecture |
| Mistral 7B | 6GB | ~4.1GB | Fast, reliable general purpose |
| Gemma 4 E4B | 6GB | ~2.5GB | Strong reasoning + vision, thinking mode |
| Qwen 3.5 9B | 8GB | ~5.5GB | Best on-device quality overall |
If you're unsure, start with Qwen 3.5 2B on a 4–6GB device, or with Qwen 3.5 9B if you have 8GB+ RAM.
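If you want to sanity-check whether a model will actually fit your phone, the arithmetic is simple: model file size, plus the KV cache for the context you plan to use, plus OS and app overhead, must stay under device RAM. The sketch below uses assumed round numbers (roughly 0.1 MB of KV cache per context token and 2GB of baseline overhead), not figures measured from Off Grid:

```kotlin
// Back-of-envelope check that a GGUF model fits in device RAM.
// The overhead figures are rough assumptions, not Off Grid measurements.
fun fitsInRam(
    modelFileGb: Double,      // e.g. 1.7 for Qwen 3.5 2B at Q4_K_M
    contextTokens: Int,       // the context you plan to use, not the 262K max
    deviceRamGb: Double,
    kvBytesPerToken: Double = 100_000.0, // ~0.1 MB/token; varies by architecture
    osAndAppOverheadGb: Double = 2.0     // Android + app + runtime buffers, assumed
): Boolean {
    val kvCacheGb = contextTokens * kvBytesPerToken / 1e9
    return modelFileGb + kvCacheGb + osAndAppOverheadGb <= deviceRamGb
}

fun main() {
    // Qwen 3.5 2B with an 8K context on a 6GB phone:
    println(fitsInRam(modelFileGb = 1.7, contextTokens = 8_192, deviceRamGb = 6.0))   // true
    // Qwen 3.5 9B with a 32K context on an 8GB phone is already too tight:
    println(fitsInRam(modelFileGb = 5.5, contextTokens = 32_768, deviceRamGb = 8.0))  // false
}
```

This is also why a 262K context is a ceiling rather than a default: under these assumptions, a full 262K-token KV cache alone would need on the order of 26GB.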
Step 3 - Download and load
- Open Off Grid → tap Models
- Select your model → tap Download
- Once downloaded, tap Load
- Open Chat and start
The model runs entirely on your device from this point. No network requests.
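Behind the Download button, this step amounts to fetching a single GGUF file once and keeping it in app storage; Load then maps it into RAM. A minimal Kotlin sketch of that one-time fetch, with a placeholder URL rather than Off Grid's real model host:

```kotlin
import java.io.File
import java.net.URL

// What the Download button boils down to: fetch one GGUF file, once,
// and keep it in app storage. The URL below is a placeholder.
fun downloadModelOnce(url: String, dest: File) {
    if (dest.exists()) return  // already on disk; no network needed again
    dest.parentFile?.mkdirs()
    URL(url).openStream().use { input ->
        dest.outputStream().use { output -> input.copyTo(output) }
    }
}

fun main() {
    val model = File("models/qwen3.5-2b-q4_k_m.gguf")
    downloadModelOnce("https://example.com/qwen3.5-2b-q4_k_m.gguf", model)
    println("Model on disk: ${model.length() / 1_000_000} MB")
    // Tapping Load then maps this file into RAM via llama.cpp
    // (see the LlamaBridge sketch earlier in this article).
}
```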
Step 4 - Go offline
Turn on airplane mode. Open a chat. It still works.
This is the point. You now have a capable AI assistant that works with no network connection at all, in any country, with no monthly bill.
Performance by device
Off Grid uses llama.cpp on ARM64 with NEON, i8mm, and dotprod SIMD instructions. Optional OpenCL GPU offloading is available on Qualcomm Adreno GPUs.
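You can check which of these SIMD extensions your own CPU reports by reading `/proc/cpuinfo`, where the Linux kernel lists them as `asimd` (NEON), `asimddp` (dot product), and `i8mm`. A small Kotlin sketch you can run in a terminal app or over `adb shell` with a Kotlin runtime:

```kotlin
import java.io.File

// Reads the ARM64 feature flags the kernel advertises for this CPU.
// llama.cpp's ARM kernels can use these instructions when present.
fun main() {
    val features = File("/proc/cpuinfo").readLines()
        .firstOrNull { it.startsWith("Features") }
        ?.substringAfter(":")
        ?.trim()
        ?.split(" ")
        ?: emptyList()

    println("NEON (asimd):      ${"asimd" in features}")
    println("dotprod (asimddp): ${"asimddp" in features}")
    println("i8mm:              ${"i8mm" in features}")
}
```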
| Device | RAM | Recommended model | Approx tok/s |
|---|---|---|---|
| Pixel 9 Pro | 16GB | Qwen 3.5 9B | 15–25 |
| Samsung Galaxy S25 | 12GB | Qwen 3.5 9B | 15–25 |
| Pixel 8 Pro | 12GB | Qwen 3.5 9B | 12–20 |
| Samsung Galaxy S24 | 8GB | Qwen 3.5 9B or Gemma 4 E4B | 10–18 |
| Pixel 7 | 8GB | Qwen 3.5 9B | 8–15 |
| OnePlus 12 | 12GB | Qwen 3.5 9B | 12–20 |
| Samsung Galaxy A55 | 8GB | Qwen 3.5 2B | 15–25 |
| Budget 4GB device | 4GB | Qwen 3.5 0.8B | 20–35 |
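The tok/s figures above are decode-speed measurements: generate a fixed number of tokens and divide by wall-clock time. A Kotlin sketch of the measurement itself, where `generateOneToken()` is a stand-in for whatever your llama.cpp binding exposes, not a real Off Grid function:

```kotlin
// Measures decode speed: tokens generated per second of wall-clock time.
fun measureTokensPerSecond(
    tokensToGenerate: Int,
    generateOneToken: () -> Unit  // stand-in for one decode step
): Double {
    val start = System.nanoTime()
    repeat(tokensToGenerate) { generateOneToken() }
    val seconds = (System.nanoTime() - start) / 1e9
    return tokensToGenerate / seconds
}

fun main() {
    // Simulated 50 ms/token decode step (~20 tok/s), just to show the math:
    val rate = measureTokensPerSecond(100) { Thread.sleep(50) }
    println("%.1f tok/s".format(rate))
}
```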
Why run LLMs locally instead of using the cloud?
- Privacy. Your queries never leave your device.
- No cost. No API fees, no subscription. The model is free to download and runs forever.
- Offline. Works on planes, in areas with bad signal, and in countries where cloud AI services are restricted.
- Speed. For short queries, local inference on modern ARM chips is surprisingly fast - often faster than waiting for a cloud response on a slow connection.