Guides
Everything you need to get the most out of running AI locally on your phone.
Getting started
Which Model Should I Use?
Pick the right model for your device's RAM and use case. Full catalogue with performance numbers.
iOS Setup
Download, install, and run your first model on iPhone. Metal GPU acceleration, supported devices.
Android Setup
Get up and running on Android. Vulkan acceleration, background behaviour, tested devices.
Running LLMs locally
Run LLMs on Your iPhone in 2026
Qwen 3.5, Gemma 4, Phi-4 Mini running locally on iPhone via llama.cpp and Metal. Real tok/s numbers.
Run LLMs on Your Android Phone in 2026
Local LLMs on Android with CPU and Vulkan GPU acceleration. Device performance table.
Image generation
Stable Diffusion on iPhone
On-device image generation using Core ML and the Apple Neural Engine. SD 1.5, 2.1, SDXL.
Stable Diffusion on Android
MNN (CPU, all devices) and QNN NPU (Snapdragon 8 Gen 1+). 5-10s images on flagship chips.
Vision, voice and documents
Vision AI
Point your camera at anything and ask questions. SmolVLM, Qwen3-VL, and Gemma 4, all on-device.
Voice Input with Whisper
On-device speech-to-text via whisper.cpp. Hold to record, auto-transcribe. 99 languages, no audio leaves your phone.
Document Analysis
Attach PDFs, CSVs, and code files directly to your chat. Native PDF extraction on iOS and Android.
Knowledge Base and RAG
Upload documents to a project. Off Grid embeds and indexes them on-device and retrieves relevant context automatically.
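Conceptually, this kind of on-device RAG reduces to three steps: embed each document chunk, store the vectors, and rank chunks by similarity to the query at chat time. A toy sketch of the retrieval step follows; the embed() function here is a crude stand-in, not Off Grid's actual encoder.

```python
import math

def embed(text: str) -> list[float]:
    # Stand-in for a real on-device embedding model: hashes characters
    # into a tiny fixed-size vector. Real systems use a neural encoder.
    vec = [0.0] * 8
    for i, ch in enumerate(text.lower()):
        vec[i % 8] += ord(ch) / 1000.0
    return vec

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.hypot(*a) * math.hypot(*b)
    return dot / norm if norm else 0.0

# "Indexing": embed each document chunk once, up front.
chunks = [
    "Metal accelerates inference on iPhone GPUs.",
    "Vulkan is the GPU compute path on Android.",
    "Whisper transcribes speech entirely on-device.",
]
index = [(chunk, embed(chunk)) for chunk in chunks]

# "Retrieval": embed the question, rank chunks by cosine similarity,
# and prepend the top matches to the prompt as context.
query_vec = embed("How does GPU acceleration work on Android?")
best = max(index, key=lambda item: cosine(query_vec, item[1]))
print(best[0])
```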
Remote servers
Remote Servers
Connect to Ollama, LM Studio, LocalAI, or vLLM on your home network. Access larger models from your phone over WiFi.
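All four servers expose an HTTP API over your network, so you can sanity-check the connection from any machine before pointing the app at it. A minimal reachability sketch, assuming the desktop sits at 192.168.1.50 (a placeholder address) and each server is on its default port:

```python
import json
from urllib.request import urlopen

# Placeholder LAN address for the desktop running the server; use your own.
HOST = "192.168.1.50"

# Default ports: Ollama 11434 (native API, models at /api/tags);
# LM Studio 1234, LocalAI 8080, and vLLM 8000 (OpenAI-compatible /v1/models).
ENDPOINTS = {
    "Ollama":    f"http://{HOST}:11434/api/tags",
    "LM Studio": f"http://{HOST}:1234/v1/models",
    "LocalAI":   f"http://{HOST}:8080/v1/models",
    "vLLM":      f"http://{HOST}:8000/v1/models",
}

for name, url in ENDPOINTS.items():
    try:
        with urlopen(url, timeout=3) as resp:
            data = json.load(resp)
        # Ollama returns {"models": [...]}; OpenAI-style servers {"data": [...]}.
        count = len(data.get("models") or data.get("data") or [])
        print(f"{name}: reachable, {count} model(s)")
    except OSError as err:
        print(f"{name}: not reachable ({err})")
```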
Ollama from Android
Run Llama 3.1 70B on your desktop, control it from your Android phone over WiFi.
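Out of the box, Ollama only listens on localhost; the documented fix is to start it with OLLAMA_HOST=0.0.0.0 so devices on your WiFi can reach it. A sketch of the request the phone ends up making, with 192.168.1.50 standing in for your desktop's address:

```python
import json
from urllib.request import Request, urlopen

# Desktop side must be started with: OLLAMA_HOST=0.0.0.0 ollama serve
# 192.168.1.50 is a placeholder; use your desktop's LAN address.
url = "http://192.168.1.50:11434/api/generate"
payload = {
    "model": "llama3.1:70b",   # ~40 GB quantized; desktop hardware, not phone
    "prompt": "Why run models locally?",
    "stream": False,           # one JSON object instead of a token stream
}
req = Request(url, data=json.dumps(payload).encode(),
              headers={"Content-Type": "application/json"})
with urlopen(req, timeout=120) as resp:
    print(json.load(resp)["response"])
```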
LM Studio from Android
Use LM Studio's local server from your phone. Port 1234, network access enabled.
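LM Studio's server speaks the OpenAI chat-completions API, so once network access is on, any OpenAI-compatible client can talk to it on port 1234. A minimal sketch, again with a placeholder address; the model identifier is hypothetical and should match whatever GET /v1/models reports for your loaded model:

```python
import json
from urllib.request import Request, urlopen

# 192.168.1.50 is a placeholder for the machine running LM Studio.
url = "http://192.168.1.50:1234/v1/chat/completions"
payload = {
    "model": "qwen2.5-7b-instruct",  # hypothetical; use your loaded model's id
    "messages": [{"role": "user", "content": "Hello from my phone!"}],
}
req = Request(url, data=json.dumps(payload).encode(),
              headers={"Content-Type": "application/json"})
with urlopen(req, timeout=60) as resp:
    print(json.load(resp)["choices"][0]["message"]["content"])
```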