
How to Use Claude Cowork with Local Models: Complete Setup Guide (2026)

Run Claude Cowork with local models via Ollama for free. Step-by-step setup, best models, performance benchmarks, limitations, and full comparison chart. No API key needed.


Want to run Claude Cowork with local models — completely free, offline, and private? Since January 2026, Ollama v0.14 ships native Anthropic Messages API compatibility, which means Claude's agentic desktop tool can now talk directly to open-source models running on your hardware. No API key. No subscription. No data leaving your machine.

This guide covers everything: installation, configuration, model selection, performance benchmarks, limitations, and a full comparison chart between cloud Claude and local inference. Whether you're a developer concerned about privacy or someone who wants unlimited Claude Cowork usage at zero cost — this is the definitive setup guide for 2026.

> 💡 Already using AI coding tools? Check our Cursor vs Windsurf vs Claude Code comparison to see how Cowork fits into the broader landscape.

What Is Claude Cowork?

Claude Cowork is Anthropic's agentic desktop tool that brings Claude Code's capabilities to Claude Desktop for knowledge work beyond coding. Instead of responding to prompts one at a time, Claude can take on complex, multi-step tasks and execute them on your behalf — formatting documents, organizing files, synthesizing research, and automating workflows.

Key Capabilities

  • Multi-step task execution: Describe an outcome, step away, and come back to finished work
  • File system access: Read, write, and organize files on your computer
  • Scheduled tasks: Automate recurring work (cloud-only feature)
  • Projects: Persistent workspaces with their own files, links, instructions, and memory
  • Plugins: Extend functionality with skills, connectors, and sub-agents
  • Computer Use: Control desktop apps by seeing, clicking, and typing

Cowork runs directly on your computer in an isolated VM, giving Claude access to files you choose to share. Code executes safely in sandboxed environments while Claude makes real changes to your files.

Why Use Local Models with Claude Cowork?

Running Claude Cowork against cloud APIs costs money and sends your data to external servers. Here's why local models change the equation:

| Factor | Cloud Claude | Local Models |
|---|---|---|
| Cost | $20-200/month (Pro/Max plans) | $0 after hardware |
| Privacy | Data sent to Anthropic servers | Everything stays on your machine |
| Rate Limits | Usage caps, especially heavy Cowork tasks | Unlimited — run as much as you want |
| Offline | Requires internet | Works completely offline |
| Data Residency | Cross-border transfer concerns | Full GDPR/compliance control |
| Speed | 60-80 tokens/sec | 8-25 tokens/sec (hardware dependent) |

The tradeoff is clear: local models trade speed for privacy, cost savings, and unlimited usage. For many workflows — especially those involving sensitive code, proprietary documents, or air-gapped environments — that tradeoff makes perfect sense.

Prerequisites & Hardware Requirements

Before setting up local models with Claude Cowork, ensure your system meets these requirements:

Software Requirements

  • Ollama v0.14.0+ (required for Anthropic Messages API compatibility)
  • Claude Code CLI, installed via `curl -fsSL https://claude.ai/install.sh | bash`
  • macOS 13+, Windows 10+, or Linux (Ubuntu 20.04+ recommended)

Hardware Requirements

| Tier | Hardware | Best Model | Experience |
|---|---|---|---|
| Minimum Viable | 16GB RAM (M1/M2) or RTX 3060 12GB | GLM-4.7-Flash (Q4) | Usable for single-file tasks. Slower on complex operations. |
| Recommended | 32GB RAM (M1 Pro/Max) or RTX 4070 Ti 16GB | Qwen3-Coder 30B (Q4) | Solid for most coding workflows. Multi-file works but slower. |
| Ideal | 64GB+ RAM (M2/M3/M4 Max) or RTX 4090 24GB | Qwen2.5-Coder-32B (Q6) | Best local experience. Higher quantization, faster throughput. |

Step-by-Step Setup: Ollama + Claude Code

Step 1: Install Ollama

macOS (Homebrew):

```shell
brew install ollama
```

Linux:

```shell
curl -fsSL https://ollama.com/install.sh | sh
```

Windows: Download from ollama.com

Verify installation:

```shell
ollama --version
# Must be v0.14.0 or later
```

Step 2: Pull a Local Model

Choose a model with tool calling support (required for Claude Code's agentic features):

```shell
# Top pick — 30B MoE, only 3B active params, runs on 16GB RAM
ollama pull glm-4.7-flash

# Alternative — strong coding model
ollama pull qwen3-coder

# Budget option for 8GB machines
ollama pull devstral-small-2
```

Step 3: Install Claude Code

macOS/Linux:

```shell
curl -fsSL https://claude.ai/install.sh | bash
```

Windows:

```shell
irm https://claude.ai/install.ps1 | iex
```

Step 4: Connect Claude Code to Ollama

Fastest method — one command:

```shell
ollama launch claude
```

This sets `ANTHROPIC_AUTH_TOKEN` and `ANTHROPIC_BASE_URL` automatically, then launches Claude Code pointed at your local Ollama instance. Select your model from the list and hit Enter.

Manual method — explicit environment variables:

```shell
export ANTHROPIC_AUTH_TOKEN=ollama
export ANTHROPIC_BASE_URL=http://localhost:11434

# Launch Claude Code
claude
```

Or inline without modifying your shell profile:

```shell
ANTHROPIC_AUTH_TOKEN=ollama ANTHROPIC_BASE_URL=http://localhost:11434 claude
```

Step 5: Verify Connection

Once Claude Code launches, try a simple command:

```text
> Read the current directory and list all files
```

If the model reads files and responds with actual file listings (not just describing what it would do), tool calling is working correctly.
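Before launching, you can also sanity-check the setup from the shell. A minimal preflight sketch, reusing only details stated in this guide (the `0.14.0` version floor, the default port 11434, and the `/api/tags` endpoint); `version_ge` is a made-up helper name:

```shell
#!/bin/sh
# Succeeds if version $1 >= version $2 (e.g. 0.14.3 >= 0.14.0).
version_ge() {
  [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n1)" = "$2" ]
}

# 1. Is Ollama installed and new enough for the Anthropic-compatible API?
v=$(ollama --version 2>/dev/null | grep -oE '[0-9]+\.[0-9]+\.[0-9]+' | head -n1)
if version_ge "${v:-0}" "0.14.0"; then
  echo "ollama $v: OK"
else
  echo "ollama missing or older than 0.14.0"
fi

# 2. Is the server reachable on the default port?
curl -fsS http://localhost:11434/api/tags >/dev/null \
  && echo "server: OK" \
  || echo "server not running: start it with 'ollama serve'"
```

If both checks print OK, `claude` should connect without further configuration.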

Setup with LM Studio

LM Studio provides a graphical interface for managing local models:

  1. Download LM Studio from lmstudio.ai
  2. Search for and download GLM-4.7-Flash or Qwen3-Coder
  3. Go to the Local Server tab → Start Server (default port: 1234)
  4. Configure Claude Code:

```shell
export ANTHROPIC_AUTH_TOKEN=lm-studio
export ANTHROPIC_BASE_URL=http://localhost:1234
claude
```
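If you keep both backends around, switching is just a matter of swapping the two environment variables. A small sketch for your shell profile, assuming the default ports used above (11434 for Ollama, 1234 for LM Studio); the function names are invented:

```shell
# Point Claude Code at the local Ollama server (default port 11434).
use_ollama() {
  export ANTHROPIC_AUTH_TOKEN=ollama
  export ANTHROPIC_BASE_URL=http://localhost:11434
  echo "Claude Code -> Ollama at $ANTHROPIC_BASE_URL"
}

# Point Claude Code at the LM Studio server (default port 1234).
use_lmstudio() {
  export ANTHROPIC_AUTH_TOKEN=lm-studio
  export ANTHROPIC_BASE_URL=http://localhost:1234
  echo "Claude Code -> LM Studio at $ANTHROPIC_BASE_URL"
}
```

Drop these into `~/.bashrc` or `~/.zshrc`, run `use_lmstudio` (or `use_ollama`), then launch `claude` as usual.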

Best Local Models for Claude Cowork

| Model | Parameters | Context | Tool Calling | RAM/VRAM Needed | Best For |
|---|---|---|---|---|---|
| GLM-4.7-Flash ⭐ | 30B MoE (3B active) | 128K | Yes (79.5%) | ~6.5GB (Q4) | Best balance of speed + capability |
| Qwen3-Coder | 30B | 128K | Yes | ~20GB (Q4) | Strong coding tasks |
| GPT-OSS:20B | 20B | 32K | Yes | ~12GB (Q4) | Good general purpose |
| Devstral-Small-2 | 24B | 128K | Yes | ~16GB (Q4) | Code-focused tasks |
| Qwen2.5-Coder:32B | 32B | 128K | Limited | ~24GB (Q4) | Complex coding (needs strong hardware) |

Top recommendation: GLM-4.7-Flash. Its mixture-of-experts architecture means only 3B parameters activate per token despite being a 30B model. That translates to fast inference on modest hardware (16GB RAM) while maintaining 128K context and strong tool-calling abilities (79.5% on agent benchmarks).

Free Cloud Models via Ollama

Don't want to run inference locally? Ollama also proxies free cloud models with generous rate limits:

| Model | Context | Speed | Cost |
|---|---|---|---|
| qwen3.5:cloud | 128K+ | 30-60 tok/s | Free (rate limited) |
| glm-5:cloud | 128K+ | 30-60 tok/s | Free (rate limited) |
| kimi-k2.5:cloud | 128K+ | 30-60 tok/s | Free (rate limited) |
| qwen3-coder:480b-cloud | 128K+ | 30-60 tok/s | Free (rate limited) |
```shell
# Use free cloud model through Ollama
ollama launch claude --model qwen3.5:cloud
```

These models run on remote infrastructure but use the same Ollama interface. Your code still goes to external servers (not truly private), but it's free and significantly faster than local inference.

Full Comparison: Cloud Claude vs Local Models

| Aspect | Cloud Claude (Sonnet/Opus) | Local Models (Ollama) | Ollama Cloud Models |
|---|---|---|---|
| Speed | 60-80 tok/s | 8-25 tok/s | 30-60 tok/s |
| Code Quality | 98% edit accuracy | 70-80% edit accuracy | 85-95% edit accuracy |
| Multi-file Reasoning | Excellent | Fair (degrades with complexity) | Good |
| Tool Calling | Always reliable | Model-dependent (GLM best) | Reliable |
| Monthly Cost | $20-200 | $0 (electricity only) | $0 |
| Privacy | Data sent to Anthropic | 100% local | Data sent to model provider |
| Offline | No | Yes | No |
| Rate Limits | Yes (heavy Cowork tasks consume more) | None | Yes (generous) |
| Scheduled Tasks | Yes | No | No |
| Computer Use | Yes | No | No |
| Plugins | Full support | Limited | Limited |
| Context Window | 200K+ | 32K-128K (model dependent) | 128K+ |

Performance Benchmarks

Real-world numbers from published benchmarks comparing local and cloud inference:

Token Throughput

| Setup | Tokens/sec | Notes |
|---|---|---|
| Claude API (Sonnet 4) | 60-80 | Anthropic's infrastructure |
| Ollama cloud model | 30-60 | Varies by model and load |
| RTX 4070 Ti Super (32B Q4) | 15-25 | $489 GPU, 16GB VRAM |
| M1 Max 64GB (GLM-4.7-Flash) | 10-20 | Apple Silicon unified memory |
| RTX 3060 12GB (GLM-4.7-Flash) | 8-15 | Budget GPU |

Real-World Task Timing

| Task | Cloud Claude | GLM-4.7 Local (M1 Max) | Difference |
|---|---|---|---|
| Simple file read + edit | ~3 seconds | ~15 seconds | 5x slower |
| Multi-file refactoring | ~1 minute | ~12 minutes | 12x slower |
| Full repo analysis | ~1.2 minutes | ~82 minutes | 68x slower |

Coding Quality Scores (50-task benchmark)

| Task Type | GLM-4.7-Flash | Qwen3-Coder | Cloud Claude Sonnet |
|---|---|---|---|
| Function generation | 3.9/5 | 4.1/5 | 4.4/5 |
| Bug detection | 3.5/5 | 3.8/5 | 4.6/5 |
| Refactoring | 3.7/5 | 4.0/5 | 4.3/5 |
| Multi-file context | 2.5/5 | 2.8/5 | 4.5/5 |
| Code explanation | 4.0/5 | 4.2/5 | 4.1/5 |

Cost Analysis

| Option | Upfront | Monthly | 6-Month Total | 12-Month Total |
|---|---|---|---|---|
| Claude Pro Plan | $0 | $20 | $120 | $240 |
| Claude Max Plan | $0 | $100-200 | $600-1,200 | $1,200-2,400 |
| Local GPU (RTX 4070 Ti) | $489 | $8-12 (electricity) | $537-561 | $585-633 |
| Local (Apple Silicon, existing Mac) | $0 | $3-5 (electricity) | $18-30 | $36-60 |
| Ollama Cloud Models | $0 | $0 | $0 | $0 |

Breakeven point: a heavy Claude Max user ($200/month) recoups the $489 GPU in roughly 2.5-3 months. Claude Pro users ($20/month) who already own capable hardware save from the first month; buying a dedicated GPU purely to replace a Pro plan takes years to pay off at that rate.
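The breakeven figure is easy to reproduce from the table's own numbers ($489 GPU, $200/month Max plan, roughly $10/month electricity); your exact result depends on your electricity rate:

```shell
# Months until a $489 GPU pays for itself versus a $200/month plan,
# assuming ~$10/month of electricity for local inference.
awk 'BEGIN {
  gpu = 489; plan = 200; power = 10
  printf "breakeven: %.1f months\n", gpu / (plan - power)
}'
# prints "breakeven: 2.6 months"
```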

Limitations of Local Models

Be realistic about what local models cannot do:

  • Slower inference (3-68x): Simple tasks take 3-5x longer. Complex repo analysis can take up to 68x longer than cloud Claude.
  • Lower edit accuracy (70-80% vs 98%): Local models produce patches with wrong line numbers, bad whitespace, and mismatched context. Over a 50-edit session, you'll spend more time fixing broken patches than writing code.
  • Weaker multi-file reasoning: Cloud Claude excels at understanding relationships across large codebases. Local models degrade significantly with complexity.
  • Tool calling reliability: Not all models support tool calling. Without it, Claude Code becomes a plain text generator that describes actions instead of executing them.
  • No scheduled tasks: Recurring automated work only runs with cloud Cowork.
  • No Computer Use: Desktop control (clicking, typing in apps) requires cloud Claude.
  • No plugins: Most Cowork plugins require cloud infrastructure.
  • Context window limits: Local models typically max at 128K tokens vs 200K+ for cloud Claude.
  • Streaming tool calls require Ollama 0.14.3-rc1+: The stable release may not handle all tool-use scenarios correctly.

What's Possible with Local Models

Despite the limitations, local models unlock significant capabilities:

  • 100% offline development: Write code on planes, in cafes without WiFi, or in restricted networks.
  • Complete data privacy: Proprietary code, PII, medical records, defense contracts — nothing leaves your machine.
  • GDPR/compliance: Eliminate cross-border data transfer concerns entirely. No DPAs needed.
  • Air-gapped environments: Defense, healthcare, and government organizations can use AI coding assistance without network access.
  • Unlimited usage: No rate limits, no monthly caps, no throttling during heavy use.
  • Custom fine-tuned models: Train models on your codebase for domain-specific assistance.
  • Hybrid workflows: Use local for sensitive work, cloud for complex tasks. Switch instantly.
  • Zero-cost experimentation: Try different models, approaches, and prompts without watching a billing meter.
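The hybrid workflow mentioned above can be two tiny shell helpers: one routes Claude Code to Ollama, the other clears the override so Claude Code falls back to your normal cloud login. A sketch with invented function names:

```shell
# Route Claude Code to the local Ollama server.
go_local() {
  export ANTHROPIC_AUTH_TOKEN=ollama
  export ANTHROPIC_BASE_URL=http://localhost:11434
}

# Clear the override so Claude Code uses your cloud account again.
go_cloud() {
  unset ANTHROPIC_AUTH_TOKEN
  unset ANTHROPIC_BASE_URL
}
```

Run `go_local` before opening a sensitive codebase, `go_cloud` when you need maximum quality, and restart `claude` after switching.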

Troubleshooting

"Connection refused" error

  • Ensure Ollama is running: `ollama serve`
  • Check the port isn't blocked: `curl http://localhost:11434/api/tags`
  • Verify the Ollama version: `ollama --version` (must be 0.14.0+)

Model just talks instead of acting

If Claude Code responds with "I would read the file..." instead of actually reading it, tool calling isn't working:

  • Switch to a model with confirmed tool support: GLM-4.7-Flash or any cloud model
  • Update Ollama to 0.14.3-rc1+ for streaming tool calls
  • Ensure `ANTHROPIC_AUTH_TOKEN` is set to `ollama`, not an actual API key

Slow generation (under 5 tok/s)

  • Drop to a smaller quantization: Q4_K_M instead of Q6_K
  • Reduce context: `ollama run glm-4.7-flash --num-ctx 32768`
  • Switch to GLM-4.7-Flash if you're using a dense model (MoE = faster)
  • Consider Ollama cloud models: `ollama launch claude --model qwen3.5:cloud`

"Role model" request failures

Claude Code tries to use "haiku" for background tasks. Fix by setting the small model override in your Claude Code settings to use the same local model.
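The override can also be set from the environment. A hedged example: `ANTHROPIC_SMALL_FAST_MODEL` is the Claude Code variable for the background-task model, but treat the name as an assumption and verify it against the docs for your installed version:

```shell
# Route Claude Code's background ("haiku") requests to the same local model.
# ANTHROPIC_SMALL_FAST_MODEL: assumed variable name; check your Claude Code docs.
export ANTHROPIC_SMALL_FAST_MODEL=glm-4.7-flash
```

Set it alongside `ANTHROPIC_AUTH_TOKEN` and `ANTHROPIC_BASE_URL`, then launch `claude` as usual.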

Frequently Asked Questions

Can I use Claude Cowork completely offline with local models?

Yes. Once you've pulled your model via Ollama, everything runs locally. No internet required for inference. However, some Cowork features (scheduled tasks, plugins, Computer Use) are cloud-only and won't work offline.

Is it really free?

Running local models through Ollama is completely free. No API keys, no billing, no subscription. Ollama's cloud models (like qwen3.5:cloud) are also free with generous rate limits. Your only cost for truly local inference is hardware and electricity.

What's the best model for Claude Code with Ollama?

GLM-4.7-Flash is the top recommendation: 128K context, native tool calling (79.5% benchmark), and it runs on 16GB RAM thanks to mixture-of-experts architecture. For Ollama cloud models, Qwen 3.5 and GLM-5 offer frontier-level quality at zero cost.

How much slower is local compared to cloud?

Expect 3-5x slower for simple tasks and up to 68x slower for complex multi-file analysis. The speed gap is the primary tradeoff. However, for many single-file tasks (code explanation, simple edits, documentation), the delay is tolerable (10-20 seconds vs 3-5 seconds).

Can I switch between local and cloud models?

Yes. Use local models for sensitive/private work and cloud Claude for complex tasks. You can switch by simply changing environment variables or using separate terminal profiles.

Does the quality match cloud Claude?

No. Local models score 85-90% of cloud Claude on single-file tasks but significantly worse on multi-file reasoning (50-60% of cloud quality). Edit accuracy drops from 98% to 70-80%, meaning more manual corrections needed.

The Bottom Line

Claude Cowork with local models is not a replacement for cloud Claude — it's a complement. The ideal workflow in 2026 looks like this:

  • Local models → sensitive codebases, unlimited experimentation, offline work, privacy-first environments
  • Ollama cloud models → free, faster than local, good quality, acceptable for non-sensitive work
  • Cloud Claude → complex multi-file reasoning, scheduled automation, Computer Use, maximum quality

The setup takes about 5 minutes. The cost is zero. If you have a Mac with 16GB+ RAM or a GPU with 12GB+ VRAM, there's no reason not to try it. Start with `ollama pull glm-4.7-flash` and `ollama launch claude`, and you'll have a local AI agent running in under a minute.

For more AI coding tools, explore our Claude Opus 4.6 review and our free AI Image Generator.

Tags: claude cowork, local models, ollama, claude code, ai coding, free ai tools, offline ai, claude desktop