module 13 tokens performance

What Are Tokens

Full Lesson Reference

What are tokens?

Tokens are the units Claude Code uses to process text. Every word you type, every response Claude gives, and every file it reads is measured in tokens. Understanding tokens is the difference between fast, cheap sessions and slow, expensive ones.

The basics

  • 1 token ≈ 0.75 words in English
  • A 1,000-word document ≈ 1,300 tokens
  • A typical prompt + response round ≈ 500-2,000 tokens
  • A full session can use 100,000+ tokens
  • Long sessions with lots of file reads can use 500,000+ tokens
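
The word-to-token rule of thumb above can be turned into a quick estimate. This is a minimal sketch that assumes the 0.75 words-per-token ratio holds; real tokenizers vary with language, code, and punctuation, and the function name is hypothetical.

```python
# Rule of thumb from the list above; an approximation, not a tokenizer.
WORDS_PER_TOKEN = 0.75

def estimate_tokens(word_count: int) -> int:
    """Rough token estimate for English text, given a word count."""
    return round(word_count / WORDS_PER_TOKEN)

print(estimate_tokens(1000))  # 1333, in line with the ~1,300 figure above
```

Treat the result as a budget estimate only. Code, tables, and non-English text usually tokenize less efficiently than plain English prose.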

Claude's context window has a limit measured in tokens. Bigger context = more tokens = more cost + slower responses.

The compounding problem

Every message you send resends the ENTIRE conversation history. This is the single most important thing to understand about tokens.

Your first message costs ~20K tokens (CLAUDE.md, skills, MCPs, and your prompt). Your 5th message costs ~80K tokens (the 4 previous exchanges plus the new one). Your 15th message costs ~250K tokens.

Extra overhead compounds because it is resent with every message. 5K of unnecessary baseline tokens × 15 messages = 75K tokens wasted.
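
The arithmetic above can be checked with a small model. This sketch assumes the history grows by a flat 15K tokens per turn, a hypothetical value chosen so the 1st and 5th messages match the lesson's figures. Real sessions grow unevenly, and prompt caching (not modelled here) can lower the billed cost.

```python
# 20K baseline comes from the lesson; the flat 15K growth per turn is an
# assumed value picked so that message 5 lands on the lesson's ~80K.
BASELINE = 20_000
GROWTH_PER_TURN = 15_000

def input_tokens_per_message(messages: int) -> list[int]:
    """Tokens sent with each message when the full history is resent."""
    return [BASELINE + GROWTH_PER_TURN * i for i in range(messages)]

def resent_overhead(extra_tokens: int, messages: int) -> int:
    """Total cost of a fixed overhead that rides along with every message."""
    return extra_tokens * messages

per_message = input_tokens_per_message(15)
print(per_message[0])              # 20000
print(per_message[4])              # 80000
print(per_message[14])             # 230000
print(sum(per_message))            # 1875000 across the whole session
print(resent_overhead(5_000, 15))  # 75000
```

Under these assumptions the 15th message lands at 230K, close to the ~250K quoted above. The session total is far larger than any single message, because every earlier turn is paid for again each time.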

What uses tokens before you type

You haven't typed a single prompt and Claude has already loaded 15K-30K tokens:

  • System prompt + built-in tools - ~10K (can't change)
  • CLAUDE.md files - ~1-3K (keep lean, Module 02)
  • Skill metadata - up to 16K (the hard budget, Module 09)
  • MCP schemas - deferred loading now, but still some overhead

Your lean baseline is ~15K. A bloated baseline is 50K+. That gap of 35K or more is resent with every message, so it becomes a large cost over a week.
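
To put a rough number on the gap between the lean and bloated baselines, the sketch below multiplies it by a week of usage. The messages-per-session and sessions-per-week values are hypothetical, and the model assumes the baseline is resent in full with every message and ignores caching.

```python
LEAN_BASELINE = 15_000     # lean baseline from the lesson
BLOATED_BASELINE = 50_000  # lower bound of the bloated baseline

def weekly_extra_tokens(messages_per_session: int, sessions_per_week: int) -> int:
    """Extra tokens per week caused by carrying the bloated baseline."""
    gap = BLOATED_BASELINE - LEAN_BASELINE
    return gap * messages_per_session * sessions_per_week

# Hypothetical usage: 15 messages per session, 10 sessions per week.
print(weekly_extra_tokens(15, 10))  # 5250000
```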

What uses tokens as you work

  • Every prompt + response
  • Every file Claude reads
  • Every MCP tool call (schema loads when invoked)
  • Every skill you invoke (full skill body loads)
  • Every data query result
  • Every command output

In a single session, growing from 20K to 200K tokens over 15-20 messages is normal. Watch how fast context fills - that's your usage rate.
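
The usage rate is the average context growth per message. A minimal sketch using the session figures above (the function name is hypothetical):

```python
def usage_rate(start_tokens: int, end_tokens: int, messages: int) -> int:
    """Average tokens added to context per message (integer division)."""
    return (end_tokens - start_tokens) // messages

print(usage_rate(20_000, 200_000, 15))  # 12000
print(usage_rate(20_000, 200_000, 20))  # 9000
```

By this measure, a session like the one described adds roughly 9K-12K tokens per message.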

The efficiency hierarchy (repeated from Module 08)

Most efficient to least:

  1. Database queries - pre-aggregated, one call, minimal tokens
  2. CLIs - zero idle overhead, lean per call
  3. MCPs with deferred loading - small idle, moderate per call
  4. Large file reads - one big load hits context hard
  5. Web scraping + raw HTTP - most expensive + fragile

Practical impact

Token bloat doesn't usually break Claude. It just:

  • Makes sessions feel slower
  • Costs you more if you're paying per token
  • Fills context faster - you hit the 50% rule sooner
  • Reduces how much actual work fits in one session

Lean overhead = longer productive sessions.

Power-user tips

  • Install ccstatusline - shows live context % + model + cost at the bottom of your terminal. Full walkthrough in Lesson 3.
  • Ask Claude about overhead - "how much context have I used in this session?"
  • Prefer DB queries over MCP calls for data you query often
  • Read parts of files, not whole files - "read rows 1-100" not "read this 10MB CSV"
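
The last tip, reading a slice of a file instead of the whole thing, can also be applied in a helper script that Claude runs on your behalf. This is a sketch using only the Python standard library. The function name and the example path are hypothetical, and it assumes a CSV whose first line is a header.

```python
import csv
from itertools import islice

def read_rows(path: str, start: int = 0, count: int = 100):
    """Return the header and up to `count` data rows beginning at data-row
    index `start`, streaming the file instead of loading it all."""
    with open(path, newline="", encoding="utf-8") as f:
        reader = csv.reader(f)
        header = next(reader, None)  # None if the file is empty
        rows = list(islice(reader, start, start + count))
    return header, rows

# Example (hypothetical file): the first 100 data rows of a large export.
# header, rows = read_rows("export.csv", start=0, count=100)
```

Only the requested rows are returned, so the output that ends up in context stays small no matter how large the file is.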

Action items

☐ Understand: tokens are the currency of every session

☐ Remember: every message resends full history - cost compounds

☐ Know your baseline - what loads before you type

☐ Prefer databases > CLIs > MCPs > file reads for efficiency

Next lesson: Trimming your token overhead.

Exercises

  1. Review the concepts covered in this lesson: What Are Tokens.
  2. Write down your key takeaway from this lesson.
  3. Practice running any commands or prompts mentioned above inside your terminal.