Top 5 Best Open Models to Use with Ollama and OpenClaw
One of the best things about running Ollama alongside OpenClaw is that you get a fully local, private AI assistant with zero ongoing API costs. But with hundreds of models available, which ones are actually worth running? Here are the top five.
1. Llama 3.1 8B — Best All-Rounder
Pull: ollama pull llama3.1:8b
Meta’s Llama 3.1 8B is the sweet spot for most setups. It handles general chat, writing, summarisation, and basic coding well enough for everyday tasks, and at 4.9GB it fits comfortably on machines with 8GB of RAM. It follows instructions reliably and is the most widely tested model in the Ollama ecosystem. If you only pull one model, make it this one.
- Size: ~4.9GB
- Best for: General assistant tasks, writing, Q&A
- Min RAM: 8GB
2. Mistral 7B — Fast and Sharp
Pull: ollama pull mistral
Mistral 7B punches above its weight. It is notably faster than Llama 3.1 8B on CPU-only setups while keeping quality surprisingly high. It excels at structured tasks — formatting responses, following multi-step instructions, and light reasoning. A great choice if response speed matters more than raw capability.
- Size: ~4.1GB
- Best for: Speed-sensitive tasks, structured outputs
- Min RAM: 8GB
3. Phi-3 Mini — Lightweight Powerhouse
Pull: ollama pull phi3:mini
Microsoft’s Phi-3 Mini is only 2.3GB but delivers performance that rivals much larger models on reasoning and instruction-following tasks. It is ideal for low-resource VPS setups or as a heartbeat model in OpenClaw — fast enough to handle quick checks without eating through RAM. A genuinely impressive small model.
- Size: ~2.3GB
- Best for: Low-resource environments, heartbeat tasks, quick Q&A
- Min RAM: 4GB
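On a small VPS, Phi-3 Mini can even serve as your primary model, with a bigger local model as backup. A sketch using the same openclaw.json keys shown in the configuration section of this article (the specific model line-up here is just one sensible choice, not a requirement):

```json
{
  "agents": {
    "defaults": {
      "model": {
        "primary": "ollama/phi3:mini",
        "fallbacks": ["ollama/llama3.1:8b"]
      }
    }
  }
}
```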
4. Qwen2.5 7B — Best for Coding
Pull: ollama pull qwen2.5:7b
Alibaba’s Qwen2.5 7B is one of the strongest open-source models for code generation and technical tasks. If you use OpenClaw for scripting, automation, or working with APIs, Qwen2.5 is worth having around. It also handles multilingual content well, which is a bonus if you work in languages other than English.
- Size: ~4.7GB
- Best for: Coding, scripting, technical tasks, multilingual
- Min RAM: 8GB
5. Gemma 2 9B — Google’s Best Open Model
Pull: ollama pull gemma2:9b
Google’s Gemma 2 9B is one of the most capable open models in this size range. It is particularly strong at nuanced reasoning, long-form writing, and maintaining context across longer conversations — which makes it a natural fit for OpenClaw sessions that involve complex, multi-step workflows. It requires a bit more RAM than the others, so it is best suited to machines with 16GB or more.
- Size: ~5.4GB
- Best for: Long conversations, complex reasoning, writing
- Min RAM: 16GB
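The five capsules above boil down to a simple RAM-based decision. A minimal helper to illustrate it — this is not part of Ollama or OpenClaw, just the article's size and RAM figures expressed as code:

```python
# Illustrative helper: pick the most capable model from this article's
# list that fits the RAM you have available. Sizes and minimum-RAM
# figures are taken from the capsules above.
MODELS = [
    # (tag, download size in GB, minimum RAM in GB), most capable first
    ("gemma2:9b",   5.4, 16),
    ("llama3.1:8b", 4.9, 8),
    ("qwen2.5:7b",  4.7, 8),
    ("mistral",     4.1, 8),
    ("phi3:mini",   2.3, 4),
]

def pick_model(ram_gb: float):
    """Return the first model tag whose minimum RAM requirement fits."""
    for tag, _size, min_ram in MODELS:
        if ram_gb >= min_ram:
            return tag
    return None  # under 4GB, none of the five is a comfortable fit

# pick_model(16) -> "gemma2:9b"
# pick_model(8)  -> "llama3.1:8b"
# pick_model(4)  -> "phi3:mini"
```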
How to Set Any of These as Primary in OpenClaw
Once you have pulled a model with Ollama, you can set it as the primary model in your openclaw.json config:
{
  "agents": {
    "defaults": {
      "model": {
        "primary": "ollama/llama3.1:8b",
        "fallbacks": ["anthropic/claude-sonnet-4-6"]
      }
    }
  }
}
This gives you a free local model as the first line of response, with a cloud model as a fallback for anything too complex. It is one of the most cost-effective setups you can run.
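The primary-plus-fallbacks behaviour that config describes can be sketched in a few lines. This is not OpenClaw's actual implementation — just an illustration of the routing pattern, with stub functions standing in for the local and cloud models:

```python
# Conceptual sketch of primary/fallback routing: try each model in
# order, and fall through to the next entry only when a call fails.
def route(prompt, models):
    """models: list of (name, callable) pairs, tried in order."""
    errors = []
    for name, call in models:
        try:
            return name, call(prompt)
        except Exception as exc:  # a real router would be more selective
            errors.append((name, exc))
    raise RuntimeError(f"all models failed: {errors}")

# Stubs standing in for a local model that is down and a cloud fallback.
def local(prompt):
    raise TimeoutError("ollama not reachable")

def cloud(prompt):
    return "answer from the cloud fallback"

name, reply = route("hello", [
    ("ollama/llama3.1:8b", local),
    ("anthropic/claude-sonnet-4-6", cloud),
])
# name == "anthropic/claude-sonnet-4-6"
```

When the local model answers, the cloud entry is never called — which is exactly why this setup keeps API costs near zero.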
Final Thoughts
For most people running OpenClaw on a standard VPS, Llama 3.1 8B is the default recommendation. Add Phi-3 Mini if you want something lighter for background tasks, and Qwen2.5 if you do a lot of coding. The beauty of Ollama is that pulling a new model takes minutes — so experimenting costs nothing but disk space.