Did this land?

How to Run LLMs Locally on Your Android Phone in 2026 (No Cloud, No Account)

Every time you ask ChatGPT a question, it’s logged on a server. Your query, the response, the time, your account. It’s stored indefinitely. That data is used to improve models, inform advertising, comply with law enforcement requests.

Off Grid removes that entire layer. The model runs in your phone’s RAM via llama.cpp on ARM64. Nothing is sent anywhere.

Here’s how to set it up.


What you need

  • Android phone with 4GB RAM or more (Android 10+)
  • 2–5GB free storage depending on the model you choose
  • Internet once for the initial download - then never again

Step 1 - Download Off Grid

Get Off Grid on Google Play


Step 2 - Choose a model

All models use Q4_K_M quantisation by default - the best balance of quality and size for mobile.

Model Min RAM Size Best for
Qwen 3.5 0.8B 3GB ~0.8GB Ultra-fast, 262K context, budget devices
Qwen 3.5 2B 4GB ~1.7GB Best for 4–6GB RAM devices, 262K context
Gemma 4 E2B 4GB ~1.5GB Vision + thinking mode, MoE architecture
Mistral 7B 6GB ~4.1GB Fast, reliable general purpose
Gemma 4 E4B 6GB ~2.5GB Strong reasoning + vision, thinking mode
Qwen 3.5 9B 8GB ~5.5GB Best on-device quality overall

Start with Qwen 3.5 2B on a 4–6GB device. Start with Qwen 3.5 9B if you have 8GB+ RAM.


Step 3 - Download and load

  1. Open Off Grid → tap Models
  2. Select your model → tap Download
  3. Once downloaded, tap Load
  4. Open Chat and start

The model runs entirely on your device from this point. No network requests.


Step 4 - Go offline

Turn on airplane mode. Open a chat. It still works.

This is the point. You now have a capable AI assistant that works without any network connection, on any network, in any country, with no monthly bill.


Performance by device

Off Grid uses llama.cpp on ARM64 with NEON, i8mm, and dotprod SIMD instructions. Optional OpenCL GPU offloading is available on Qualcomm Adreno GPUs.

Device RAM Recommended model Approx tok/s
Pixel 9 Pro 16GB Qwen 3.5 9B 15–25
Samsung Galaxy S25 12GB Qwen 3.5 9B 15–25
Pixel 8 Pro 12GB Qwen 3.5 9B 12–20
Samsung S24 8GB Qwen 3.5 9B or Gemma 4 E4B 10–18
Pixel 7 8GB Qwen 3.5 9B 8–15
OnePlus 12 12GB Qwen 3.5 9B 12–20
Samsung A55 8GB Qwen 3.5 2B 15–25
Budget 4GB device 4GB Qwen 3.5 0.8B 20–35

Why run LLMs locally instead of using the cloud?

Privacy. Your queries never leave your device.

No cost. No API fees, no subscription. The model is free to download and runs forever.

Offline. Works on planes, in areas with bad signal, in countries where cloud AI services are restricted.

Speed. For short queries, local inference on modern ARM chips is surprisingly fast - often faster than waiting for a cloud response on a slow connection.