Guides
Everything you need to get the most out of running AI locally on your phone.
Getting started
Which Model Should I Use?
Pick the right model for your device's RAM and use case. Full catalogue with performance numbers.
iOS Setup
Download, install, and run your first model on iPhone. Metal GPU acceleration, supported devices.
Android Setup
Get up and running on Android. Vulkan acceleration, background behaviour, tested devices.
Running LLMs locally
Run LLMs on Your iPhone in 2026
Qwen 3.5, Gemma 4, Phi-4 Mini running locally on iPhone via llama.cpp and Metal. Real tok/s numbers.
Run LLMs on Your Android Phone in 2026
Local LLMs on Android with CPU and Vulkan GPU acceleration. Device performance table.
Image generation
Stable Diffusion on iPhone
On-device image generation using Core ML and the Apple Neural Engine. SD 1.5, 2.1, SDXL.
Stable Diffusion on Android
MNN (CPU, all devices) and QNN NPU (Snapdragon 8 Gen 1+). 5-10s images on flagship chips.
Vision, voice and documents
Vision AI
Point your camera at anything and ask questions. SmolVLM, Qwen3-VL, and Gemma 4, all on-device.
Voice Input with Whisper
On-device speech-to-text via whisper.cpp. Hold to record, auto-transcribe. 99 languages, no audio leaves your phone.
Document Analysis
Attach PDFs, CSVs, and code files directly to your chat. Native PDF extraction on iOS and Android.
Knowledge Base and RAG
Upload documents to a project. Off Grid embeds and indexes them on-device and retrieves relevant context automatically.
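Conceptually, this kind of on-device RAG reduces to three steps: embed each document chunk, store the vectors, and rank chunks by similarity to the query at chat time. A toy sketch of the retrieval step follows; the embed() function here is a crude stand-in, not Off Grid's actual encoder.

```python
import math

def embed(text: str) -> list[float]:
    # Stand-in for a real on-device embedding model: hashes characters
    # into a tiny fixed-size vector. Real systems use a neural encoder.
    vec = [0.0] * 8
    for i, ch in enumerate(text.lower()):
        vec[i % 8] += ord(ch) / 1000.0
    return vec

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.hypot(*a) * math.hypot(*b)
    return dot / norm if norm else 0.0

# "Indexing": embed each document chunk once, up front.
chunks = [
    "Metal accelerates inference on iPhone GPUs.",
    "Vulkan is the GPU compute path on Android.",
    "Whisper transcribes speech entirely on-device.",
]
index = [(chunk, embed(chunk)) for chunk in chunks]

# "Retrieval": embed the question, rank chunks by cosine similarity,
# and prepend the top matches to the prompt as context.
query_vec = embed("How does GPU acceleration work on Android?")
best = max(index, key=lambda item: cosine(query_vec, item[1]))
print(best[0])
```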
Remote servers
Remote Servers
Connect to Ollama, LM Studio, LocalAI, or vLLM on your home network. Access larger models from your phone over WiFi.
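All four servers expose an HTTP API over your network, so you can sanity-check the connection from any machine before pointing the app at it. A minimal reachability sketch, assuming the desktop sits at 192.168.1.50 (a placeholder address) and each server is on its default port:

```python
import json
from urllib.request import urlopen

# Placeholder LAN address for the desktop running the server; use your own.
HOST = "192.168.1.50"

# Default ports: Ollama 11434 (native API, models at /api/tags);
# LM Studio 1234, LocalAI 8080, and vLLM 8000 (OpenAI-compatible /v1/models).
ENDPOINTS = {
    "Ollama":    f"http://{HOST}:11434/api/tags",
    "LM Studio": f"http://{HOST}:1234/v1/models",
    "LocalAI":   f"http://{HOST}:8080/v1/models",
    "vLLM":      f"http://{HOST}:8000/v1/models",
}

for name, url in ENDPOINTS.items():
    try:
        with urlopen(url, timeout=3) as resp:
            data = json.load(resp)
        # Ollama returns {"models": [...]}; OpenAI-style servers {"data": [...]}.
        count = len(data.get("models") or data.get("data") or [])
        print(f"{name}: reachable, {count} model(s)")
    except OSError as err:
        print(f"{name}: not reachable ({err})")
```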
Ollama from Android
Run Llama 3.1 70B on your desktop, control it from your Android phone over WiFi.
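Out of the box, Ollama only listens on localhost; the documented fix is to start it with OLLAMA_HOST=0.0.0.0 so devices on your WiFi can reach it. A sketch of the request the phone ends up making, with 192.168.1.50 standing in for your desktop's address:

```python
import json
from urllib.request import Request, urlopen

# Desktop side must be started with: OLLAMA_HOST=0.0.0.0 ollama serve
# 192.168.1.50 is a placeholder; use your desktop's LAN address.
url = "http://192.168.1.50:11434/api/generate"
payload = {
    "model": "llama3.1:70b",   # ~40 GB quantized; desktop hardware, not phone
    "prompt": "Why run models locally?",
    "stream": False,           # one JSON object instead of a token stream
}
req = Request(url, data=json.dumps(payload).encode(),
              headers={"Content-Type": "application/json"})
with urlopen(req, timeout=120) as resp:
    print(json.load(resp)["response"])
```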
LM Studio from Android
Use LM Studio's local server from your phone. Port 1234, network access enabled.
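LM Studio's server speaks the OpenAI chat-completions API, so once network access is on, any OpenAI-compatible client can talk to it on port 1234. A minimal sketch, again with a placeholder address; the model identifier is hypothetical and should match whatever GET /v1/models reports for your loaded model:

```python
import json
from urllib.request import Request, urlopen

# 192.168.1.50 is a placeholder for the machine running LM Studio.
url = "http://192.168.1.50:1234/v1/chat/completions"
payload = {
    "model": "qwen2.5-7b-instruct",  # hypothetical; use your loaded model's id
    "messages": [{"role": "user", "content": "Hello from my phone!"}],
}
req = Request(url, data=json.dumps(payload).encode(),
              headers={"Content-Type": "application/json"})
with urlopen(req, timeout=60) as resp:
    print(json.load(resp)["choices"][0]["message"]["content"])
```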