How to Run Stable Diffusion on Your Android Phone (On-Device AI Image Generation)
Every image you generate on Midjourney, DALL-E, or Adobe Firefly is created and stored on their servers, along with your prompts and metadata. Depending on the service’s terms, that data may be kept indefinitely and used for training.
Off Grid runs Stable Diffusion entirely on your phone using Alibaba’s MNN framework (CPU) or Qualcomm’s QNN engine (NPU). Nothing is uploaded.
Requirements
- Android phone with 4GB RAM minimum (6GB+ recommended)
- Android 10 or later
- ~1–2GB free storage per model
- Internet once for the model download
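If you want to sanity-check a phone against this list before downloading anything, the thresholds can be written down as a small check. This is only a sketch: the `Device` class and `check_requirements` function are hypothetical names made up for illustration, not part of Off Grid, and the storage figure assumes the upper end of the ~1–2GB range.

```python
from dataclasses import dataclass

# Thresholds copied from the requirements list above.
MIN_RAM_GB = 4
MIN_ANDROID_VERSION = 10
STORAGE_PER_MODEL_GB = 2  # assumption: budget the upper end of ~1-2GB


@dataclass
class Device:
    """Hypothetical device description; not an Off Grid type."""
    ram_gb: float
    android_version: int
    free_storage_gb: float


def check_requirements(device: Device, models: int = 1) -> list[str]:
    """Return a list of problems; an empty list means the device qualifies."""
    problems = []
    if device.ram_gb < MIN_RAM_GB:
        problems.append(f"needs at least {MIN_RAM_GB}GB RAM")
    if device.android_version < MIN_ANDROID_VERSION:
        problems.append(f"needs Android {MIN_ANDROID_VERSION} or later")
    needed_gb = models * STORAGE_PER_MODEL_GB
    if device.free_storage_gb < needed_gb:
        problems.append(f"needs about {needed_gb}GB free storage")
    return problems


if __name__ == "__main__":
    # A 6GB phone on Android 13 with 5GB free, planning to keep two models.
    print(check_requirements(Device(6, 13, 5), models=2))
```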
Step 1 - Install Off Grid
Step 2 - Download an image model
- Open Off Grid → Models → switch to the Image tab
- Choose a model based on your chipset:
All devices (MNN/CPU):
- Anything V5 - anime/stylised art
- Absolute Reality - photorealistic
- QteaMix - versatile
- ChilloutMix - portrait-focused
- CuteYukiMix - stylised
Snapdragon 8 Gen 1+ (QNN/NPU) - faster:
- DreamShaper, Realistic Vision, MajicmixRealistic, and 15+ more
- Tap Download (~1–1.2GB per model)
Step 3 - Generate your first image
- Open Off Grid → Image Generation
- Type a prompt:
a mountain valley at sunset, photorealistic, golden hour
- Tap Generate
Off Grid automatically detects whether your device supports QNN NPU and uses it if available, falling back to MNN (CPU) otherwise.
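The detect-then-fall-back behaviour described above is a common pattern: try the preferred backend first and use the next one if it is unavailable. The sketch below illustrates the idea only. It is not Off Grid’s actual code, and `pick_backend` and the availability probes are hypothetical stand-ins for whatever checks the app really performs.

```python
from typing import Callable

# Each candidate is a (name, probe) pair; the probe returns True when
# the backend can be used on this device.
Candidate = tuple[str, Callable[[], bool]]


def pick_backend(candidates: list[Candidate]) -> str:
    """Return the name of the first backend whose probe succeeds."""
    for name, is_available in candidates:
        if is_available():
            return name
    raise RuntimeError("no usable backend found")


if __name__ == "__main__":
    # Assumption for the demo: a device without a supported NPU,
    # so the QNN probe fails and the CPU path is chosen.
    backend = pick_backend([
        ("QNN (NPU)", lambda: False),
        ("MNN (CPU)", lambda: True),
    ])
    print(backend)
```

Listing the candidates in priority order keeps the faster NPU path first while guaranteeing that the CPU path is still reached on every other device.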
Performance
| Backend | Chipset | Time for 512×512 @ 20 steps |
|---|---|---|
| QNN NPU | Snapdragon 8 Gen 2/3/4 | ~5–10s |
| QNN NPU | Snapdragon 8 Gen 1 | ~10–15s |
| MNN CPU | Any ARM64 (timed on Snapdragon 8 Gen 3) | ~15s |
| MNN CPU | Mid-range | ~25–40s |
Tips for better images
Prompt structure - [subject], [style], [lighting], [quality descriptors]. Example: a red fox in a forest, digital art, golden hour lighting, highly detailed, sharp focus
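If you reuse the same style and quality descriptors across many prompts, the structure above is easy to assemble programmatically. A minimal sketch follows; `build_prompt` is a hypothetical helper written for this article, not an Off Grid function.

```python
def build_prompt(subject: str, style: str = "", lighting: str = "",
                 quality: tuple[str, ...] = ()) -> str:
    """Join the parts in [subject], [style], [lighting], [quality] order.

    Empty or whitespace-only parts are skipped so no stray commas appear.
    """
    parts = [subject, style, lighting, *quality]
    return ", ".join(p.strip() for p in parts if p and p.strip())


if __name__ == "__main__":
    print(build_prompt(
        "a red fox in a forest",
        style="digital art",
        lighting="golden hour lighting",
        quality=("highly detailed", "sharp focus"),
    ))
```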
Use prompt enhancement - Off Grid can use your loaded text model to automatically expand a short prompt into a detailed one. Enable it in the generation screen, then type something as short as "a fox in a forest" and let the LLM do the rest.
Steps - 20 steps is a good default. 30 gives marginally better quality at the cost of ~50% more time.
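The ~50% figure follows from treating generation time as roughly proportional to the step count: 30 steps is 1.5 times 20. The sketch below applies that rule of thumb to a measured 20-step time. It assumes purely linear scaling and ignores any fixed overhead, so read the result as an estimate; `estimate_seconds` is a hypothetical helper.

```python
def estimate_seconds(base_seconds: float, steps: int, base_steps: int = 20) -> float:
    """Scale a measured time linearly with the number of steps.

    Assumption: time is proportional to steps, with no fixed overhead.
    """
    return base_seconds * steps / base_steps


if __name__ == "__main__":
    # A device that needs ~15s for 20 steps, rerun at 30 steps.
    print(estimate_seconds(15, 30))
```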
Negative prompt - Add "blurry, low quality, distorted, deformed" to suppress common artifacts.