Want to run Claude Cowork with local models, completely free, offline, and private? Since January 2026, Ollama v0.14 ships native Anthropic Messages API compatibility, which means Claude's agentic desktop tool can now talk directly to open-source models running on your hardware. No API key. No subscription. No data leaving your machine.
This guide covers everything: installation, configuration, model selection, performance benchmarks, limitations, and a full comparison chart between cloud Claude and local inference. Whether you're a developer concerned about privacy or someone who wants unlimited Claude Cowork usage at zero cost, this is the definitive setup guide for 2026.
What Is Claude Cowork?
Claude Cowork is Anthropic's agentic desktop tool that brings Claude Code's capabilities to Claude Desktop for knowledge work beyond coding. Instead of responding to prompts one at a time, Claude can take on complex, multi-step tasks and execute them on your behalf: formatting documents, organizing files, synthesizing research, and automating workflows.
Key Capabilities
- Multi-step task execution: Describe an outcome, step away, and come back to finished work
- File system access: Read, write, and organize files on your computer
- Scheduled tasks: Automate recurring work (cloud-only feature)
- Projects: Persistent workspaces with their own files, links, instructions, and memory
- Plugins: Extend functionality with skills, connectors, and sub-agents
- Computer Use: Control desktop apps by seeing, clicking, and typing
Cowork runs directly on your computer in an isolated VM, giving Claude access to files you choose to share. Code executes safely in sandboxed environments while Claude makes real changes to your files.
Why Use Local Models with Claude Cowork?
Running Claude Cowork against cloud APIs costs money and sends your data to external servers. Here's why local models change the equation:
| Factor | Cloud Claude | Local Models |
|---|---|---|
| Cost | $20-200/month (Pro/Max plans) | $0 after hardware |
| Privacy | Data sent to Anthropic servers | Everything stays on your machine |
| Rate Limits | Usage caps, especially on heavy Cowork tasks | Unlimited; run as much as you want |
| Offline | Requires internet | Works completely offline |
| Data Residency | Cross-border transfer concerns | Full GDPR/compliance control |
| Speed | 60-80 tokens/sec | 8-25 tokens/sec (hardware dependent) |
The tradeoff is clear: local models trade speed for privacy, cost savings, and unlimited usage. For many workflows, especially those involving sensitive code, proprietary documents, or air-gapped environments, that tradeoff makes perfect sense.
Prerequisites & Hardware Requirements
Before setting up local models with Claude Cowork, ensure your system meets these requirements:
Software Requirements
- Ollama v0.14.0+ (required for Anthropic Messages API compatibility)
- Claude Code CLI installed via curl -fsSL https://claude.ai/install.sh | bash
- macOS 13+, Windows 10+, or Linux (Ubuntu 20.04+ recommended)
Hardware Requirements
| Tier | Hardware | Best Model | Experience |
|---|---|---|---|
| Minimum Viable | 16GB RAM (M1/M2) or RTX 3060 12GB | GLM-4.7-Flash (Q4) | Usable for single-file tasks. Slower on complex operations. |
| Recommended | 32GB RAM (M1 Pro/Max) or RTX 4070 Ti 16GB | Qwen3-Coder 30B (Q4) | Solid for most coding workflows. Multi-file works but slower. |
| Ideal | 64GB+ RAM (M2/M3/M4 Max) or RTX 4090 24GB | Qwen2.5-Coder-32B (Q6) | Best local experience. Higher quantization, faster throughput. |
Step-by-Step Setup: Ollama + Claude Code
Step 1: Install Ollama
macOS (Homebrew):
brew install ollama
Linux:
curl -fsSL https://ollama.com/install.sh | sh
Windows: Download from ollama.com
Verify installation:
ollama --version
# Must be v0.14.0 or later
Step 2: Pull a Local Model
Choose a model with tool calling support (required for Claude Code's agentic features):
# Top pick: 30B MoE, only 3B active params, runs on 16GB RAM
ollama pull glm-4.7-flash
# Alternative: strong coding model
ollama pull qwen3-coder
# Budget option for 8GB machines
ollama pull devstral-small-2
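Before wiring a model into Claude Code, it's worth sanity-checking that it actually advertises tool support. Recent Ollama builds list a model's capabilities in ollama show (the exact output format varies by version, so treat this as a quick sketch):
# Inspect the model card; look for "tools" in the capabilities section
ollama show glm-4.7-flash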
Step 3: Install Claude Code
macOS/Linux:
curl -fsSL https://claude.ai/install.sh | bash
Windows:
irm https://claude.ai/install.ps1 | iex
Step 4: Connect Claude Code to Ollama
Fastest method (one command):
ollama launch claude
This automatically sets ANTHROPIC_AUTH_TOKEN and ANTHROPIC_BASE_URL, then launches Claude Code pointed at your local Ollama instance. Select your model from the list and hit Enter.
Manual method (explicit environment variables):
export ANTHROPIC_AUTH_TOKEN=ollama
export ANTHROPIC_BASE_URL=http://localhost:11434
# Launch Claude Code
claude
Or inline without modifying your shell profile:
ANTHROPIC_AUTH_TOKEN=ollama ANTHROPIC_BASE_URL=http://localhost:11434 claude
Step 5: Verify Connection
Once Claude Code launches, try a simple command:
> Read the current directory and list all files
If the model reads files and responds with actual file listings (not just describing what it would do), tool calling is working correctly.
Setup with LM Studio
LM Studio provides a graphical interface for managing local models:
- Download LM Studio from lmstudio.ai
- Search and download GLM-4.7-Flash or Qwen3-Coder
- Go to the Local Server tab and click Start Server (default port: 1234)
- Configure Claude Code:
export ANTHROPIC_AUTH_TOKEN=lm-studio
export ANTHROPIC_BASE_URL=http://localhost:1234
claude
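To confirm the LM Studio server is actually listening before launching Claude Code, you can hit its OpenAI-compatible model listing endpoint (a quick check, assuming the default port from step 3):
# Should return a JSON list of the models LM Studio has loaded
curl http://localhost:1234/v1/models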
Best Local Models for Claude Cowork
| Model | Parameters | Context | Tool Calling | RAM/VRAM Needed | Best For |
|---|---|---|---|---|---|
| GLM-4.7-Flash (top pick) | 30B MoE (3B active) | 128K | Yes (79.5%) | ~6.5GB (Q4) | Best balance of speed + capability |
| Qwen3-Coder | 30B | 128K | Yes | ~20GB (Q4) | Strong coding tasks |
| GPT-OSS:20B | 20B | 32K | Yes | ~12GB (Q4) | Good general purpose |
| Devstral-Small-2 | 24B | 128K | Yes | ~16GB (Q4) | Code-focused tasks |
| Qwen2.5-Coder:32B | 32B | 128K | Limited | ~24GB (Q4) | Complex coding (needs strong hardware) |
Top recommendation: GLM-4.7-Flash. Its mixture-of-experts architecture means only 3B parameters activate per token despite being a 30B model. That translates to fast inference on modest hardware (16GB RAM) while maintaining 128K context and strong tool-calling abilities (79.5% on agent benchmarks).
Free Cloud Models via Ollama
Don't want to run inference locally? Ollama also proxies free cloud models with generous rate limits:
| Model | Context | Speed | Cost |
|---|---|---|---|
| qwen3.5:cloud | 128K+ | 30-60 tok/s | Free (rate limited) |
| glm-5:cloud | 128K+ | 30-60 tok/s | Free (rate limited) |
| kimi-k2.5:cloud | 128K+ | 30-60 tok/s | Free (rate limited) |
| qwen3-coder:480b-cloud | 128K+ | 30-60 tok/s | Free (rate limited) |
# Use free cloud model through Ollama
ollama launch claude --model qwen3.5:cloud
These models run on remote infrastructure but use the same Ollama interface. Your code still goes to external servers (not truly private), but it's free and significantly faster than local inference.
Full Comparison: Cloud Claude vs Local Models
| Aspect | Cloud Claude (Sonnet/Opus) | Local Models (Ollama) | Ollama Cloud Models |
|---|---|---|---|
| Speed | 60-80 tok/s | 8-25 tok/s | 30-60 tok/s |
| Code Quality | 98% edit accuracy | 70-80% edit accuracy | 85-95% edit accuracy |
| Multi-file Reasoning | Excellent | Fair (degrades with complexity) | Good |
| Tool Calling | Always reliable | Model-dependent (GLM best) | Reliable |
| Monthly Cost | $20-200 | $0 (electricity only) | $0 |
| Privacy | Data sent to Anthropic | 100% local | Data sent to model provider |
| Offline | No | Yes | No |
| Rate Limits | Yes (heavy Cowork tasks consume more) | None | Yes (generous) |
| Scheduled Tasks | Yes | No | No |
| Computer Use | Yes | No | No |
| Plugins | Full support | Limited | Limited |
| Context Window | 200K+ | 32K-128K (model dependent) | 128K+ |
Performance Benchmarks
Real-world numbers from published benchmarks comparing local and cloud inference:
Token Throughput
| Setup | Tokens/sec | Notes |
|---|---|---|
| Claude API (Sonnet 4) | 60-80 | Anthropic's infrastructure |
| Ollama cloud model | 30-60 | Varies by model and load |
| RTX 4070 Ti Super (32B Q4) | 15-25 | $489 GPU, 16GB VRAM |
| M1 Max 64GB (GLM-4.7-Flash) | 10-20 | Apple Silicon unified memory |
| RTX 3060 12GB (GLM-4.7-Flash) | 8-15 | Budget GPU |
Real-World Task Timing
| Task | Cloud Claude | GLM-4.7 Local (M1 Max) | Difference |
|---|---|---|---|
| Simple file read + edit | ~3 seconds | ~15 seconds | 5x slower |
| Multi-file refactoring | ~1 minute | ~12 minutes | 12x slower |
| Full repo analysis | ~1.2 minutes | ~82 minutes | 68x slower |
Coding Quality Scores (50-task benchmark)
| Task Type | GLM-4.7-Flash | Qwen3-Coder | Cloud Claude Sonnet |
|---|---|---|---|
| Function generation | 3.9/5 | 4.1/5 | 4.4/5 |
| Bug detection | 3.5/5 | 3.8/5 | 4.6/5 |
| Refactoring | 3.7/5 | 4.0/5 | 4.3/5 |
| Multi-file context | 2.5/5 | 2.8/5 | 4.5/5 |
| Code explanation | 4.0/5 | 4.2/5 | 4.1/5 |
Cost Analysis
| Option | Upfront | Monthly | 6-Month Total | 12-Month Total |
|---|---|---|---|---|
| Claude Pro Plan | $0 | $20 | $120 | $240 |
| Claude Max Plan | $0 | $100-200 | $600-1,200 | $1,200-2,400 |
| Local GPU (RTX 4070 Ti) | $489 | $8-12 (electricity) | $537-561 | $585-633 |
| Local (Apple Silicon, existing Mac) | $0 | $3-5 (electricity) | $18-30 | $36-60 |
| Ollama Cloud Models | $0 | $0 | $0 | $0 |
Breakeven point: A heavy Claude Max user ($200/month) recoups a $489 GPU in roughly 2.5-3 months. Claude Pro users ($20/month) who already own capable hardware come out ahead almost immediately, since the only ongoing cost is a few dollars of electricity.
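As a quick back-of-the-envelope check (a sketch using the illustrative figures from the cost table above, not a precise forecast):
# Breakeven in months = GPU cost / (subscription avoided - monthly electricity)
echo "scale=1; 489 / (200 - 10)" | bc
# => 2.5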
Limitations of Local Models
Be realistic about what local models cannot do:
- Slower inference (3-68x): Simple tasks take 5x longer. Complex repo analysis can take 68x longer than cloud Claude.
- Lower edit accuracy (70-80% vs 98%): Local models produce patches with wrong line numbers, bad whitespace, and mismatched context. Over a 50-edit session, you'll spend more time fixing broken patches than writing code.
- Weaker multi-file reasoning: Cloud Claude excels at understanding relationships across large codebases. Local models degrade significantly with complexity.
- Tool calling reliability: Not all models support tool calling. Without it, Claude Code becomes a plain text generator that describes actions instead of executing them.
- No scheduled tasks: Recurring automated work only runs with cloud Cowork.
- No Computer Use: Desktop control (clicking, typing in apps) requires cloud Claude.
- No plugins: Most Cowork plugins require cloud infrastructure.
- Context window limits: Local models typically max at 128K tokens vs 200K+ for cloud Claude.
- Streaming tool calls require Ollama 0.14.3-rc1+: The stable release may not handle all tool-use scenarios correctly.
What's Possible with Local Models
Despite the limitations, local models unlock significant capabilities:
- 100% offline development: Write code on planes, in cafes without WiFi, or in restricted networks.
- Complete data privacy: Proprietary code, PII, medical records, and defense contracts never leave your machine.
- GDPR/compliance: Eliminate cross-border data transfer concerns entirely. No DPAs needed.
- Air-gapped environments: Defense, healthcare, and government organizations can use AI coding assistance without network access.
- Unlimited usage: No rate limits, no monthly caps, no throttling during heavy use.
- Custom fine-tuned models: Train models on your codebase for domain-specific assistance.
- Hybrid workflows: Use local for sensitive work, cloud for complex tasks. Switch instantly (see the sketch after this list).
- Zero-cost experimentation: Try different models, approaches, and prompts without watching a billing meter.
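For hybrid workflows, a pair of small shell helpers makes switching explicit. This is only a sketch built on the environment variables from the setup steps above; the function names claude-local and claude-cloud are made up for illustration:
# Add to ~/.zshrc or ~/.bashrc
claude-local() {
  ANTHROPIC_AUTH_TOKEN=ollama ANTHROPIC_BASE_URL=http://localhost:11434 claude "$@"
}
claude-cloud() {
  # Strip the Ollama overrides so Claude Code falls back to your normal Anthropic login
  env -u ANTHROPIC_AUTH_TOKEN -u ANTHROPIC_BASE_URL claude "$@"
}
Run claude-local inside sensitive repos and claude-cloud everywhere else.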
Troubleshooting
"Connection refused" error
- Ensure Ollama is running: ollama serve
- Check the port isn't blocked: curl http://localhost:11434/api/tags
- Verify the Ollama version: ollama --version (must be 0.14.0+)
Model just talks instead of acting
If Claude Code responds with "I would read the file..." instead of actually reading it, tool calling isn't working:
- Switch to a model with confirmed tool support: GLM-4.7-Flash or any cloud model
- Update Ollama to 0.14.3-rc1+ for streaming tool calls
- Ensure ANTHROPIC_AUTH_TOKEN is set to ollama, not an actual API key
Slow generation (under 5 tok/s)
- Drop to smaller quantization: Q4_K_M instead of Q6_K
- Reduce context: ollama run glm-4.7-flash --num-ctx 32768
- Switch to GLM-4.7-Flash if you're using a dense model (MoE = faster)
- Consider Ollama cloud models: ollama launch claude --model qwen3.5:cloud
"Role model" request failures
Claude Code tries to use "haiku" for background tasks. Fix by setting the small model override in your Claude Code settings to use the same local model.
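If your Claude Code build supports overriding the background model via an environment variable (recent releases have documented ANTHROPIC_SMALL_FAST_MODEL for this, though the variable name has changed over time, so check your version's docs), pointing it at the same local model is usually enough. A minimal sketch:
# Route Claude Code's background ("haiku") requests to the local model as well
export ANTHROPIC_SMALL_FAST_MODEL=glm-4.7-flash
claude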
Frequently Asked Questions
Can I use Claude Cowork completely offline with local models?
Yes. Once you've pulled your model via Ollama, everything runs locally. No internet required for inference. However, some Cowork features (scheduled tasks, plugins, Computer Use) are cloud-only and won't work offline.
Is it really free?
Running local models through Ollama is completely free. No API keys, no billing, no subscription. Ollama's cloud models (like qwen3.5:cloud) are also free with generous rate limits. Your only cost for truly local inference is hardware and electricity.
What's the best model for Claude Code with Ollama?
GLM-4.7-Flash is the top recommendation: 128K context, native tool calling (79.5% benchmark), and it runs on 16GB RAM thanks to mixture-of-experts architecture. For Ollama cloud models, Qwen 3.5 and GLM-5 offer frontier-level quality at zero cost.
How much slower is local compared to cloud?
Expect 3-5x slower for simple tasks and up to 68x slower for complex multi-file analysis. The speed gap is the primary tradeoff. However, for many single-file tasks (code explanation, simple edits, documentation), the delay is tolerable (10-20 seconds vs 3-5 seconds).
Can I switch between local and cloud models?
Yes. Use local models for sensitive/private work and cloud Claude for complex tasks. You can switch by simply changing environment variables or using separate terminal profiles.
Does the quality match cloud Claude?
No. Local models score 85-90% of cloud Claude on single-file tasks but significantly worse on multi-file reasoning (50-60% of cloud quality). Edit accuracy drops from 98% to 70-80%, meaning more manual corrections needed.
The Bottom Line
Claude Cowork with local models is not a replacement for cloud Claude; it's a complement. The ideal workflow in 2026 looks like this:
- Local models: sensitive codebases, unlimited experimentation, offline work, privacy-first environments
- Ollama cloud models: free, faster than local, good quality, acceptable for non-sensitive work
- Cloud Claude: complex multi-file reasoning, scheduled automation, Computer Use, maximum quality
The setup takes 5 minutes. The cost is zero. If you have a Mac with 16GB+ RAM or a GPU with 12GB+ VRAM, there's no reason not to try it. Start with ollama pull glm-4.7-flash and ollama launch claude, and you'll be coding with a local AI agent within minutes.
For more AI coding tools, explore our Claude Opus 4.6 review and our free AI Image Generator.