Which Model Should I Use?
The recommendations below are not generic suggestions: they come directly from the model catalogue built into Off Grid. Every model defaults to Q4_K_M quantisation.
Quick pick by RAM
| Your device RAM | Best text model | Best vision model |
|---|---|---|
| 3GB | Qwen 3.5 0.8B | SmolVLM2 500M |
| 4GB | Qwen 3.5 2B | Gemma 4 E2B |
| 6GB | Gemma 4 E4B or Phi-4 Mini | Gemma 4 E4B |
| 8GB+ | Qwen 3.5 9B | Qwen 3.5 9B |
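The table above amounts to a tier lookup: take the highest RAM tier that the device reaches. A minimal Python sketch of that idea follows; `QUICK_PICKS` and `quick_pick` are hypothetical names used for illustration and are not part of Off Grid.

```python
# Quick-pick tiers copied from the table above:
# (minimum device RAM in GB, best text model, best vision model).
QUICK_PICKS = [
    (3, "Qwen 3.5 0.8B", "SmolVLM2 500M"),
    (4, "Qwen 3.5 2B", "Gemma 4 E2B"),
    (6, "Gemma 4 E4B or Phi-4 Mini", "Gemma 4 E4B"),
    (8, "Qwen 3.5 9B", "Qwen 3.5 9B"),
]


def quick_pick(device_ram_gb):
    """Return (text_model, vision_model) for the highest tier the device reaches.

    Returns None below 3GB, since the table lists no tier for that case.
    """
    choice = None
    for min_ram_gb, text_model, vision_model in QUICK_PICKS:
        if device_ram_gb >= min_ram_gb:
            choice = (text_model, vision_model)
    return choice
```

A device that falls between tiers, such as one with 5GB, gets the 4GB recommendation under this sketch.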
Full model catalogue
Text models
| Model | Params | Min RAM | Context | Best for |
|---|---|---|---|---|
| SmolLM2 360M | 0.36B | 3GB | 8K | Ultra-light, low-RAM devices only |
| Qwen 3.5 0.8B | 0.8B | 3GB | 262K | Fast responses, long context on budget devices |
| Qwen 3.5 2B | 2B | 4GB | 262K | Best general-purpose model for 4GB devices |
| SmolLM3 3B | 3B | 6GB | 128K | Purpose-built for constrained devices |
| Phi-4 Mini | 3.8B | 6GB | 128K | Reasoning, math, structured tasks |
| Mistral 7B | 7B | 6GB | 32K | Fast, reliable general purpose |
| Qwen 3.5 9B | 9B | 8GB | 262K | Best on-device quality overall |
Vision models (can see images)
| Model | Params | Min RAM | Best for |
|---|---|---|---|
| SmolVLM2 500M | 0.5B | 3GB | Tiny vision model for low-RAM devices |
| SmolVLM 2B | 2B | 4GB | General vision tasks on mid-range phones |
| SmolVLM2 2.2B | 2.2B | 4GB | Vision + video understanding |
| Gemma 4 E2B | 2B (MoE) | 4GB | Best vision quality for 4GB devices, thinking mode |
| Gemma 4 E4B | 4B (MoE) | 6GB | Strongest reasoning + vision, thinking mode |
Gemma 4 uses a Mixture-of-Experts (MoE) architecture, so the effective parameter count is lower than the headline figure suggests. That is why it fits in less RAM than you’d expect while delivering quality above its weight class.
What is thinking mode?
Qwen 3.5 and Gemma 4 models support thinking mode: the model reasons through a problem step by step before producing its final answer. It is similar to chain-of-thought prompting, but the behaviour is built into the model weights rather than added by the prompt.
Use it for complex reasoning, math, and multi-step problems. Skip it for quick Q&A, summarisation, and casual chat, where the extra reasoning only makes responses slower.
Understanding Q4_K_M
Off Grid defaults to Q4_K_M quantisation for all models. This means:
- ~4.5 bits per weight
- ~5–8% quality loss vs the full-precision original
- ~70% smaller than the float16 version (4.5 bits vs 16 bits per weight)
- Recommended by the llama.cpp community as the best mobile tradeoff
Don’t go below Q4_K_S unless you’re severely constrained on storage. Q2/Q3 models have noticeable quality degradation.
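The size implied by a bits-per-weight figure is simple arithmetic: parameters × bits ÷ 8 gives bytes. A rough sketch, assuming the ~4.5 bits per weight quoted above and 1 GB = 10^9 bytes. `quantised_size_gb` is a hypothetical helper, and real model files will differ somewhat because they also carry metadata and may store some tensors at other precisions.

```python
def quantised_size_gb(params_billions, bits_per_weight=4.5):
    """Approximate size of the model weights in GB (1 GB = 1e9 bytes).

    params_billions: parameter count in billions, e.g. 2 for a 2B model.
    bits_per_weight: average bits stored per weight (~4.5 assumed for Q4_K_M).
    """
    # (params_billions * 1e9 weights) * bits / 8 bits-per-byte / 1e9 bytes-per-GB
    return params_billions * bits_per_weight / 8
```

Under these assumptions a 2B-parameter model comes to roughly 1.1 GB and a 9B-parameter model to roughly 5.1 GB.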
RAM safety thresholds
Off Grid automatically checks if a model fits safely before loading:
- 4GB RAM devices: model budget = 40% of total RAM
- 6GB+ RAM devices: model budget = 60% of total RAM
- Text models need ~1.5x their raw size in RAM (KV cache + activations)
- Image models need ~1.5x on iOS (CoreML), ~1.8x on Android (Vulkan)
If a model is marked as incompatible with your device, it did not pass this check.
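The rules above can be sketched as a small fit check. This is an illustration, not Off Grid's actual code: the function names and dictionary keys are hypothetical, and two details are assumptions because the text does not spell them out. First, devices with less than 6GB are all given the 40% budget. Second, the overhead-adjusted size is the value compared against the budget.

```python
# Overhead multipliers from the list above; the (kind, platform) keys are
# hypothetical labels chosen for this sketch.
OVERHEAD = {
    ("text", "ios"): 1.5,
    ("text", "android"): 1.5,
    ("image", "ios"): 1.5,      # CoreML
    ("image", "android"): 1.8,  # Vulkan
}


def model_budget_gb(device_ram_gb):
    """Share of total RAM a model may use.

    Assumption: every device under 6GB uses the 40% rule; the text only
    states it for 4GB devices.
    """
    fraction = 0.6 if device_ram_gb >= 6 else 0.4
    return device_ram_gb * fraction


def fits(model_size_gb, device_ram_gb, kind="text", platform="android"):
    """True if the overhead-adjusted size is within the budget.

    Assumption: the raw size times its overhead multiplier is what gets
    compared against the budget.
    """
    required_gb = model_size_gb * OVERHEAD[(kind, platform)]
    return required_gb <= model_budget_gb(device_ram_gb)
```

For example, under these assumptions a 2.2GB image model passes on a 6GB iOS device (about 3.3GB needed against a 3.6GB budget) but not on a 6GB Android device (about 4.0GB needed).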
FAQ
What is the best model for 4GB RAM? Qwen 3.5 2B (Q4_K_M). For vision tasks, Gemma 4 E2B.
What quantisation does Off Grid use? Q4_K_M by default - the best balance of quality and size for mobile.
What is the best model for reasoning? Gemma 4 E4B (6GB RAM) or Qwen 3.5 9B (8GB RAM). Both have thinking mode.