module 13 tokens performance

Cut your token usage

Full Lesson Reference

Three levers cut your token usage: the model you run, the tools that strip verbosity, and the habits you build. Most people only touch one. Stack all three and you save 60-80% of your daily token spend with no quality hit on deliverables.

This lesson covers all three in order of impact.

Lever 1: Pick the right model (Opus vs Sonnet)

The biggest single optimisation in Claude Code - and most people never touch it. Opus is the smartest + most expensive per token. Sonnet is faster + roughly 5x cheaper. Both are excellent.

Default habit is to run Opus on everything. Flipping that - Sonnet for execution, Opus only when the decision is hard - saves 40-60% of daily token spend before you install a single tool.
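To see where a 40-60% figure could come from, here is a rough back-of-envelope sketch. It assumes the "roughly 5x cheaper" ratio above and a simple blended-price model; `blended_cost` and its parameters are hypothetical names for illustration, not part of Claude Code.

```python
# Hypothetical helper: spend relative to an all-Opus baseline of 1.0.
def blended_cost(sonnet_share: float, price_ratio: float = 5.0) -> float:
    """Relative cost when `sonnet_share` of your tokens run on Sonnet.

    price_ratio=5.0 encodes the lesson's "roughly 5x cheaper" figure (an assumption).
    """
    if not 0.0 <= sonnet_share <= 1.0:
        raise ValueError("sonnet_share must be between 0 and 1")
    opus_part = 1.0 - sonnet_share            # still billed at full price
    sonnet_part = sonnet_share / price_ratio  # billed at the cheaper rate
    return opus_part + sonnet_part

for share in (0.50, 0.75):
    saving = 1.0 - blended_cost(share)
    print(f"{share:.0%} of tokens on Sonnet -> {saving:.0%} saved")
```

Under these assumptions, moving half of your tokens to Sonnet saves about 40%, and moving three quarters saves about 60%. Your real split will differ.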

Switch mid-session

terminal
/model

Claude shows the available models. Pick one. Switch takes effect from the next message.

Use Opus (latest) for

  • Complex multi-step reasoning - strategy decisions, full-project planning

  • Heavy analysis - data interpretation, trend calls, attribution arguments

  • Writing that has to land - launch emails, landing pages, sales copy

  • Anything client-facing where a flat answer would lose the deal

  • First draft of a new skill or brand voice guide

Use Sonnet (latest) for

  • File moves, renames, folder reorganisation

  • Git commits, pushes, merge conflict resolution

  • Pulling data from MCPs - queries are structured, not creative

  • Formatting + converting files - CSV to JSON, markdown to HTML

  • Simple fixes + typos inside an already-drafted deliverable

  • Following an already-written plan step-by-step

  • Installing MCPs, running audits, checking statuses

The workflow

Start: Opus (brainstorm, plan, strategy)

Execute: Sonnet (follow the plan)

Stuck: Opus (break the blocker)

Finish: Sonnet (commit, push, wrapup)

No install needed - /model is built into Claude Code.
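If it helps to see the workflow as a lookup, here is a minimal sketch. The phase names and the `pick_model` helper are hypothetical, and the strings "opus" and "sonnet" are plain labels, not real model identifiers. In practice you make the switch yourself with /model.

```python
# Hypothetical mapping of session phase to model, following the workflow above.
MODEL_FOR_PHASE = {
    "start": "opus",      # brainstorm, plan, strategy
    "execute": "sonnet",  # follow the plan
    "stuck": "opus",      # break the blocker
    "finish": "sonnet",   # commit, push, wrapup
}

def pick_model(phase: str) -> str:
    """Return the suggested model label; unknown phases default to Sonnet."""
    return MODEL_FOR_PHASE.get(phase.lower(), "sonnet")

print(pick_model("stuck"))
```

Defaulting to Sonnet for anything unlisted is a design choice in this sketch: escalate only when the decision is hard.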

Lever 2: RTK (Rust Token Killer)

RTK intercepts Claude Code's terminal output and strips verbosity before it counts against context. Progress messages, redundant confirmations, repeated headers - all stripped out.

How it works

RTK sits between Claude Code and your terminal as a hook that runs automatically on every command output. You don't change how you work - each round-trip just costs less.

What gets stripped

  • Redundant progress bars ("Downloading... [10%]... [20%]...")
  • Verbose tool confirmations that repeat what you already know
  • Duplicate context in nested tool outputs
  • Unnecessary metadata in API responses

Technical substance stays untouched - only the noise dies.
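The idea of filtering noise can be sketched in a few lines. This is not RTK's actual implementation, just an illustration of the concept: the `PROGRESS` pattern and the `strip_noise` function are assumptions made up for this example.

```python
import re

# Assumed pattern for progress-style lines such as "Downloading... [10%]".
PROGRESS = re.compile(r"\[\s*\d{1,3}%\s*\]")

def strip_noise(output: str) -> str:
    """Keep only the last line of each progress run and drop repeated lines."""
    lines = output.splitlines()
    kept = []
    for i, line in enumerate(lines):
        is_progress = bool(PROGRESS.search(line))
        next_is_progress = i + 1 < len(lines) and bool(PROGRESS.search(lines[i + 1]))
        if is_progress and next_is_progress:
            continue  # an intermediate progress update, safe to drop
        if kept and line == kept[-1]:
            continue  # consecutive duplicate line
        kept.append(line)
    return "\n".join(kept)

raw = "\n".join([
    "Downloading... [10%]",
    "Downloading... [20%]",
    "Downloading... [100%]",
    "Installed foo 1.2.3",
    "Installed foo 1.2.3",
    "error: missing file",
])
print(strip_noise(raw))
```

Note that the error line survives untouched. Only the repeated and intermediate lines go.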

Install

Install RTK from rtk-ai.app and set it up for this terminal.

Claude handles the install + configuration.

Verify

rtk --version
rtk gain # cumulative token savings

Typical users see 20-40% reduction in input tokens over time - and no workflow change to get it.

Lever 3: Caveman (compressed Claude responses)

Caveman is a Claude Code plugin that switches Claude's output into a compressed mode. Terse responses, no filler words, fragments over full sentences. All technical accuracy stays - just fewer tokens to say it.

Before caveman

Certainly! I'd be happy to help you with that. The issue you're experiencing is likely caused by an outdated cache. Here's what I'd recommend doing to fix it:

After caveman

Outdated cache. Fix

Same information. 90% fewer tokens.
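You can sanity-check that figure with a quick word count on the before/after pair above. Word count is only a rough stand-in for tokens (real tokenisers split text differently), so treat this as an approximation.

```python
before = ("Certainly! I'd be happy to help you with that. The issue you're "
          "experiencing is likely caused by an outdated cache. Here's what "
          "I'd recommend doing to fix it:")
after = "Outdated cache. Fix"

def word_count(text: str) -> int:
    """Count whitespace-separated words (a rough proxy for tokens)."""
    return len(text.split())

reduction = 1 - word_count(after) / word_count(before)
print(f"{word_count(before)} words -> {word_count(after)} words "
      f"({reduction:.0%} fewer)")
```

By word count the reduction comes out just under 90%, which lines up with the claim.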

Install + toggle

Install the Caveman plugin from JuliusBrussee/caveman.

Then in any session

/caveman # toggle on
/caveman full # full compression (default)
/caveman lite # lighter, still readable
/caveman ultra # maximum compression
/caveman off # back to verbose

Your prompts stay normal English - only Claude's output compresses.

Typical savings by session type

  • Code-heavy sessions - 20-30% (code can't compress much)
  • Analysis / report sessions - 40-50%
  • Conversation-heavy sessions - 50% or more

When Caveman isn't right

  • You're in a learning phase and need Claude's explanations verbose
  • You're working with a team who'll read the transcript - terse responses can feel curt
  • You need Claude to explicitly walk you through reasoning (Module 09 thinking skills)
  • You're generating client-facing output - though note Caveman only affects chat, not the output files

Toggle off when you need verbose. On again when you're back to execution.

Stack all three for the biggest savings

The three levers compound

  • Model - cheaper tokens per message
  • RTK - less terminal output per tool call
  • Caveman - shorter Claude responses

Full-stack execution mode = Sonnet + RTK + /caveman full. Use it for any session that's mostly follow-the-plan work. You still get technical accuracy, just with tight responses from a cheaper model with stripped terminal noise.

Switch to Opus (and optionally /caveman off) when you hit a judgement call, then switch back once the plan is clear again.
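Here is a rough sketch of how the three levers multiply. Every default below is an assumption for illustration. The 30% RTK cut and 45% Caveman cut are midpoints of the ranges quoted in this lesson. The 70/30 input/output split of spend and the 60% Sonnet share are made up. `stacked_cost` is a hypothetical helper.

```python
def stacked_cost(input_share: float = 0.70,   # assumed share of spend on input
                 rtk_cut: float = 0.30,       # midpoint of the 20-40% RTK range
                 caveman_cut: float = 0.45,   # midpoint of the 40-50% analysis range
                 sonnet_share: float = 0.60,  # assumed share of work run on Sonnet
                 price_ratio: float = 5.0) -> float:
    """Relative spend vs. an all-Opus, uncompressed baseline of 1.0."""
    output_share = 1.0 - input_share
    # RTK trims the input side, Caveman trims the output side.
    tokens = input_share * (1 - rtk_cut) + output_share * (1 - caveman_cut)
    # Model choice changes the average price of whatever tokens remain.
    price = (1 - sonnet_share) + sonnet_share / price_ratio
    return tokens * price

cost = stacked_cost()
print(f"Relative spend: {cost:.2f} (saving {1 - cost:.0%})")
```

With these particular assumptions the saving lands in the mid-60s percent. This simplified model treats the split as a share of spend and ignores that input and output tokens are priced differently, so use it to reason about direction, not to forecast a bill.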

Critical: protect your memory layer

RTK and Caveman will silently corrupt your memory if they touch the /startup and /wrapup skills:

  • RTK truncates the Supabase responses /startup reads to load context - you'd start sessions with partial history
  • Caveman compresses the session summary /wrapup writes to memory - future sessions would load a stripped-down version of what happened

One-time fix. Tell Claude

Update my /startup and /wrapup skills so they explicitly bypass RTK compression (use rtk proxy on any curl calls) and turn Caveman off at the start of the skill then restore the previous mode at the end. Memory load + save must be full-fidelity, not compressed.

Do this immediately after installing RTK or Caveman, so neither tool can corrupt your memory.
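The "turn it off, then restore the previous mode" pattern in that prompt looks like this in code. This is only an analogy: the `settings` dict and the `caveman_mode` key are hypothetical and say nothing about how Caveman actually stores its state.

```python
from contextlib import contextmanager

@contextmanager
def full_fidelity(settings: dict):
    """Temporarily force verbose output, then restore whatever mode was set."""
    previous = settings.get("caveman_mode", "off")
    settings["caveman_mode"] = "off"
    try:
        yield settings
    finally:
        settings["caveman_mode"] = previous  # restored even if the body fails

settings = {"caveman_mode": "ultra"}  # hypothetical session state
with full_fidelity(settings):
    print("during memory save:", settings["caveman_mode"])
print("after memory save:", settings["caveman_mode"])
```

The point of restoring rather than hard-coding a mode is that the skill leaves your session exactly as it found it.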

Model switching (Opus/Sonnet) doesn't affect memory - swap freely.

Habits that cut tokens further

Once the three main levers are in place, these habits tighten things further.

Read slices of files, not whole files

Bad: Read this 20-page PDF and summarise.

Good: Read pages 3-5 of this PDF (executive summary + conclusions). Summarise.

Files over 10-20K tokens hurt context fast. Scope your reads.
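The same scoping idea, sketched for a plain text file instead of a PDF. `read_slice` and `rough_token_estimate` are hypothetical helpers, and the 4-characters-per-token figure is a common rule of thumb for English text, not an exact measure.

```python
from itertools import islice

def read_slice(path: str, start: int, stop: int) -> str:
    """Return lines start..stop (1-indexed, inclusive) and stop reading after that."""
    with open(path, encoding="utf-8") as handle:
        return "".join(islice(handle, start - 1, stop))

def rough_token_estimate(text: str) -> int:
    """Very rough heuristic: about 4 characters per token for English text."""
    return len(text) // 4
```

Estimating before you read lets you decide whether a file needs slicing at all: if the estimate is well past the 10-20K mark, ask for a specific range instead.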

Use a database over repeated MCP pulls

If you query the same data often - last 30 days of Google Ads for a weekly report - put it in a database once and query the database in sessions. Module 04 covers this for memory; the same pattern works for ad platform data.
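A minimal sketch of the pull-once, query-many pattern, using SQLite from Python's standard library as a stand-in for whichever database you use. The `ad_stats` table, its columns, and the sample rows are placeholders invented for this example, not real data.

```python
import sqlite3

def cache_rows(conn: sqlite3.Connection, rows) -> None:
    """Store one pull of daily stats; re-running overwrites the same days."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS ad_stats ("
        "day TEXT PRIMARY KEY, clicks INTEGER, cost REAL)"
    )
    conn.executemany("INSERT OR REPLACE INTO ad_stats VALUES (?, ?, ?)", rows)
    conn.commit()

def totals_since(conn: sqlite3.Connection, since_day: str):
    """Answer the report question from the local copy, not a fresh pull."""
    cur = conn.execute(
        "SELECT SUM(clicks), SUM(cost) FROM ad_stats WHERE day >= ?",
        (since_day,),
    )
    return cur.fetchone()

conn = sqlite3.connect(":memory:")  # use a file path to persist between sessions
cache_rows(conn, [("2024-05-01", 120, 34.5), ("2024-05-02", 98, 29.0)])  # placeholder rows
print(totals_since(conn, "2024-05-01"))
```

The query returns two numbers instead of a full platform export, which is the token saving.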

Summarise + restart mid-session

If you've been going for a while and the conversation has drifted:

Summarise what we've discussed in one paragraph. I want to start fresh from this summary.

Then /wrapup + /startup. Your new session picks up from the summary without carrying the back-and-forth tokens.

One focused task per session

Jumping between 5 things in one session means context fills with irrelevant stuff. One task per session keeps context focused - and makes the /wrapup memory record sharper for next time.

Power-user tips

  • Default Sonnet, escalate to Opus - reverse the habit most people have. Sonnet handles most execution. Reach for Opus when the decision is hard, not as a comfort blanket.
  • Install RTK in week 1 - free savings, no workflow change
  • Install Caveman in week 2-3 - once you're comfortable, turn on compressed mode for productivity sessions
  • Toggle Caveman off for learning - when you need verbose explanations
  • Check rtk gain monthly - see cumulative savings, motivation to keep it running
  • Scope file reads + data pulls - specific slices, not whole things

Action items

☐ Practice /model switching - default Sonnet for execution, Opus for planning + heavy judgement

☐ Install RTK (tell Claude "install RTK from rtk-ai.app")

☐ Install Caveman (tell Claude "install the Caveman plugin from JuliusBrussee/caveman")

☐ Update /startup + /wrapup to bypass RTK + Caveman compression (protects memory fidelity)

☐ Try the full stack - Sonnet + RTK + /caveman full - in your next execution session

☐ Toggle Caveman off when you need verbose explanations

☐ Check rtk gain at the end of the week to see savings

Next lesson: Running multiple sessions + RAM.

Exercises

  1. Review the concepts covered in this lesson: Cut your token usage.
  2. Write down your key takeaway from this lesson.
  3. Practice running any commands or prompts mentioned above inside your terminal.