PROMPT SPACE
7 min read · Updated April 15, 2026

How to Reduce Claude Code Token Usage — Skills That Cut Costs (2026)

Claude Code burns through tokens fast. These skills and techniques cut token usage by up to 65% without sacrificing output quality. Save money, code more.

Claude Code charges per token. A verbose agent that loves to explain itself burns through your usage limits fast. "Certainly! I'd be happy to help you refactor that function. Let me walk you through the changes step by step..." — that preamble costs money and adds nothing.
Token optimization is becoming the most practical skill in the Claude Code ecosystem. Here's what actually works.
> Quick Answer: The Caveman skill cuts token usage by making Claude communicate in terse, direct language. It eliminates filler and unnecessary explanations while retaining the same information, saving roughly 65% of tokens.

How does the Caveman skill reduce tokens?

The most talked-about token optimization skill right now is Caveman. It makes Claude communicate in terse, direct language. No filler, no pleasantries, no step-by-step explanations you didn't ask for.
The before and after difference is dramatic:
Without Caveman: "I've successfully completed the refactoring of the authentication module. The changes include updating the token validation logic to handle edge cases more gracefully, adding appropriate error handling, and ensuring backwards compatibility with the existing API contracts."
With Caveman: "Auth module refactored. Token validation handles edge cases. Error handling added. Backwards compatible."
Same information. Roughly 65% fewer tokens. Over a full work session, this adds up to significant savings — both in cost and in context window usage.
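You can sanity-check that figure yourself. The sketch below uses whitespace word counts as a crude proxy for tokens (real tokenizer counts will differ somewhat), applied to the before/after example above:

```python
# Rough check of the savings figure: whitespace word counts as a crude
# proxy for tokens. Real tokenizer counts will differ somewhat.
verbose = (
    "I've successfully completed the refactoring of the authentication "
    "module. The changes include updating the token validation logic to "
    "handle edge cases more gracefully, adding appropriate error handling, "
    "and ensuring backwards compatibility with the existing API contracts."
)
terse = (
    "Auth module refactored. Token validation handles edge cases. "
    "Error handling added. Backwards compatible."
)

def approx_tokens(text: str) -> int:
    """Crude proxy: one token per whitespace-separated word."""
    return len(text.split())

savings = 1 - approx_tokens(terse) / approx_tokens(verbose)
print(f"approx savings: {savings:.0%}")  # ~64% with this proxy
```

Even with this rough proxy, the terse version lands in the same ballpark as the 65% figure.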
The concept is simple: a SKILL.md that tells Claude to be concise. You can write your own in 10 lines:
```markdown
---
name: concise-output
description: Reduces token usage by producing concise responses. Always active.
---

# Output Rules

- No filler phrases. No "certainly", "I'd be happy to", "let me explain"
- No step-by-step narration unless explicitly asked
- Code changes: show the diff, not the explanation
- One-sentence summaries, not paragraphs
- Skip confirmations. Just do the work.
```
Drop this in `~/.claude/skills/concise-output/SKILL.md` and your token usage drops immediately.
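If you prefer to script the setup, a minimal Python sketch can write that file to the path named above (adjust the path if your setup differs):

```python
# Sketch: write the concise-output skill to the default Claude Code
# skills directory (~/.claude/skills/), as described above.
from pathlib import Path

SKILL_MD = """\
---
name: concise-output
description: Reduces token usage by producing concise responses. Always active.
---

# Output Rules

- No filler phrases. No "certainly", "I'd be happy to", "let me explain"
- No step-by-step narration unless explicitly asked
- Code changes: show the diff, not the explanation
- One-sentence summaries, not paragraphs
- Skip confirmations. Just do the work.
"""

def install_skill(skills_dir: Path) -> Path:
    """Create the skill directory and write SKILL.md; return the file path."""
    target = skills_dir / "concise-output"
    target.mkdir(parents=True, exist_ok=True)
    path = target / "SKILL.md"
    path.write_text(SKILL_MD)
    return path

installed = install_skill(Path.home() / ".claude" / "skills")
print(f"wrote {installed}")
```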

How do I manage Claude Code's context window?

Tokens aren't just about cost — they're about context window limits. Claude Code has a finite context window, and every token of fluff pushes out useful context. When your context window fills up, Claude loses track of earlier parts of the conversation.
Skills that reduce output verbosity keep more room for actual code context.
The /clear command. When Claude Code tells you "X tokens used," check if you're approaching limits. Use `/clear` to reset or let compaction handle it. Claude Code now shows a hint when you should clear.
Incremental requests. Instead of "refactor the entire auth module," say "refactor the login function in auth.ts." Smaller scope means less context needed, fewer tokens consumed, and more focused output.
The /recap command. New in April 2026. When you return to a session, `/recap` gives you a summary of where you left off without replaying the entire conversation. This saves tokens on session resumption.

Which skills reduce token usage indirectly?

Some skills reduce token usage not through output formatting, but by getting things right the first time:
Code review skills. A skill that catches bugs before you commit means fewer "fix the bug I just introduced" conversations. Each bug-fix round trip consumes tokens.
Testing skills. Tests that work on the first generation don't need "the test fails, fix it" follow-ups. A testing skill that knows your framework prevents false starts.
Architecture skills. A skill that knows your project conventions prevents Claude from generating code in the wrong pattern, which you then have to ask it to redo.
Browse skills that improve first-pass accuracy at Agensi.

How does the effort frontmatter reduce tokens?

Claude Code recently added `effort` frontmatter support for skills. You can set the model's effort level when a skill is invoked:
```markdown
---
name: quick-review
description: Fast code review for small changes
effort: low
---
```
Lower effort means fewer tokens spent on reasoning. For routine tasks like formatting, linting, or simple reviews, `effort: low` can cut token usage substantially without noticeable quality loss.
Use `effort: high` only for complex tasks where deep reasoning matters — architecture decisions, security audits, complex refactoring.
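As a sketch, you could generate that frontmatter programmatically. Note the validation set is an assumption: this article only mentions `low` and `high`; other levels may exist.

```python
# Sketch: build SKILL.md frontmatter with an effort level. Only "low"
# and "high" are mentioned in this article; the allowed set is an assumption.
ALLOWED_EFFORT = {"low", "high"}

def skill_frontmatter(name: str, description: str, effort: str) -> str:
    """Return a YAML frontmatter block for a skill with an effort level."""
    if effort not in ALLOWED_EFFORT:
        raise ValueError(f"unknown effort level: {effort!r}")
    return (
        "---\n"
        f"name: {name}\n"
        f"description: {description}\n"
        f"effort: {effort}\n"
        "---\n"
    )

print(skill_frontmatter("quick-review", "Fast code review for small changes", "low"))
```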

What are practical token budget strategies?

Set a daily target. Track your usage for a week, then set a target 30% lower. Install a concise output skill and measure the difference.
Batch related tasks. Instead of five separate conversations about five endpoints, handle them in one session where Claude already has the context loaded. Context reuse saves input tokens.
Install only what you need. Skills loaded into `~/.claude/skills/` are read by Claude Code at session start. Keep the directory focused: install only the skills you actively use (via Agensi's one-liner curl command) and remove the ones you no longer need. Fewer loaded skills means a lower input token cost per session.
Be specific in prompts. "Fix the bug" forces Claude to search your codebase and explain what it found. "In src/routes/users.ts line 42, the null check is wrong" gets straight to the fix. Fewer exploration tokens, more action tokens.
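The daily-target strategy above is simple arithmetic. A quick sketch, with made-up usage figures for illustration:

```python
# Sketch: turn a week of observed daily token usage into a daily target
# 30% below the average, per the strategy above. Figures are illustrative.
weekly_usage = [120_000, 95_000, 140_000, 110_000, 105_000, 60_000, 70_000]

average = sum(weekly_usage) / len(weekly_usage)
daily_target = average * 0.7  # 30% lower than the observed average

print(f"average: {average:,.0f} tokens/day")
print(f"target:  {daily_target:,.0f} tokens/day")
```

Track against the target for another week; if a concise-output skill is installed, the gap between average and target should close on its own.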

How do I monitor Claude Code token usage?

Claude Code now shows rate limit usage in the status line. Check your 5-hour and 7-day windows to understand your consumption patterns. If you consistently hit limits in the afternoon, your morning sessions might be too verbose.
The `/doctor` command also shows diagnostic information about your setup, including whether prompt caching is enabled. Prompt caching (via `ENABLE_PROMPT_CACHING_1H`) can dramatically reduce input token costs for long sessions by caching repeated context.
---
*Find skills that improve output quality and reduce rework at Agensi.*
Tags: #claude code #tokens #costs #optimization #caveman #skill.md

Creator of PromptSpace · AI Researcher & Prompt Engineer

