module 10 build routines

Monitor + debug runs

Routines fail quietly by default. They run at 3am, something goes wrong, nobody notices until Friday when the weekly report is missing. Monitoring is what turns an agent from a liability into an asset.

This lesson is the observability layer. Set it up once per Routine and you'll catch failures before they compound.

What "failed" actually looks like

Not every bad run throws an obvious error. The 5 failure modes:

Hard failure

The run errors out. MCP auth rejected, API rate-limited, context file missing. The Routine status is failed. Easiest to catch.

Silent success

The run completes. Status is success. Output is wrong or empty - wrong account, missing data, truncated report. Status lies.

Drift

The run works for weeks, then slowly degrades. Account structure changes, a campaign is renamed, a column moves in a sheet. Output gets worse each run without failing.

Cost blowout

The run succeeds but burns 10x the expected tokens. Something in the prompt or context is bloating the context window. Bill surprises you at month-end.

Skipped runs

The Routine is paused, rate-limited, or the cron is wrong. It doesn't run at all. Nothing visible.

The 3-layer monitoring setup

Layer 1: Run history in the Routines dashboard

Tell Claude:

Show me the last 10 runs of the daily-gads-healthcheck-demo Routine. Include status, duration, and token cost.

You get a table. Look for patterns - a run that's suddenly 3x longer, a status that went from success to failure, a cost spike.

Check this weekly for every Routine you own. 60 seconds of reading catches 80% of problems.
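If you export the run table, the same pattern check can be sketched in a few lines of Python. This is only an illustration: the Run fields, the flag_anomalies name, and the 3x threshold are assumptions taken from the example above, not part of any Routines API.

```python
from dataclasses import dataclass
from statistics import median


@dataclass
class Run:
    status: str        # "success" or "failed"
    duration_s: float  # wall-clock duration in seconds
    token_cost: int    # tokens consumed by the run


def flag_anomalies(runs, factor=3.0):
    """Return warnings for the newest run in `runs` (ordered oldest to newest).

    The newest run is compared against the median of the earlier runs.
    """
    if len(runs) < 2:
        return []
    *history, latest = runs
    warnings = []
    if latest.status != "success" and history[-1].status == "success":
        warnings.append("status changed from success to " + latest.status)
    if latest.duration_s > factor * median(r.duration_s for r in history):
        warnings.append("duration is more than %gx the median" % factor)
    if latest.token_cost > factor * median(r.token_cost for r in history):
        warnings.append("token cost is more than %gx the median" % factor)
    return warnings
```

Comparing against the median rather than the mean keeps one earlier outlier from hiding the next one.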

Layer 2: Output validation in the Routine itself

Bake validation into the prompt. Before marking itself complete, the agent checks its own output:

Before saving the report, verify:

Row count for campaigns is between 5 and 50
Total spend is within 20% of the 7-day average
Every campaign has non-null CPA
The report covers the correct date range

If any check fails, write a fail summary to the errors folder and do NOT save the main report.

This catches silent successes. The agent refuses to ship bad output.
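The four checks above live in the prompt, but writing them out as code makes the thresholds explicit. A minimal sketch, assuming each campaign is a dict with hypothetical "spend" and "cpa" keys and that date ranges are (start, end) tuples:

```python
from datetime import date


def validate_report(campaigns, avg_7d_spend, report_range, expected_range):
    """Return a list of failed checks; an empty list means the report may be saved."""
    failures = []
    if not 5 <= len(campaigns) <= 50:
        failures.append("row count %d is outside 5-50" % len(campaigns))
    total_spend = sum(c["spend"] for c in campaigns)
    if abs(total_spend - avg_7d_spend) > 0.20 * avg_7d_spend:
        failures.append("total spend is not within 20% of the 7-day average")
    if any(c.get("cpa") is None for c in campaigns):
        failures.append("at least one campaign has a null CPA")
    if report_range != expected_range:
        failures.append("report covers the wrong date range")
    return failures


# Example with made-up numbers: six campaigns, spend equal to the average.
sample = [{"name": "c%d" % i, "spend": 100.0, "cpa": 12.5} for i in range(6)]
week = (date(2024, 5, 1), date(2024, 5, 7))
problems = validate_report(sample, 600.0, week, week)
```

A non-empty result corresponds to the "write a fail summary and do NOT save the main report" branch.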

Layer 3: External notification on failure

For any Routine you rely on, add a "ping me if this breaks" step:

If the run fails for any reason, post a message to Slack channel [ID] with:

Routine name
Error message
Link to the run log
Last successful run timestamp

Now a hard failure puts a notification in front of you the same day, not 3 weeks later.
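For reference, the four fields map onto a very small message. The sketch below only builds the JSON body in the {"text": ...} shape that Slack incoming webhooks accept. Sending it, and where the values come from, is left out. The function name and the example URL are made up for illustration.

```python
import json


def build_failure_alert(routine_name, error_message, run_log_url, last_success):
    """Build a JSON body containing the four fields listed above.

    All values are supplied by the caller; nothing is looked up here.
    """
    lines = [
        "Routine failed: " + routine_name,
        "Error: " + error_message,
        "Run log: " + run_log_url,
        "Last successful run: " + last_success,
    ]
    return json.dumps({"text": "\n".join(lines)})


# Hypothetical values, for illustration only.
body = build_failure_alert(
    "daily-gads-healthcheck-demo",
    "MCP auth rejected",
    "https://example.com/runs/123",
    "2024-05-02T03:15:00Z",
)
```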

Debugging a failed run

When a run fails, Claude can pull the full log for you:

Show me the full log of the last run of daily-gads-healthcheck-demo. Highlight the point it failed.

You get the prompt, the context loaded, the tool calls made, and the error. Read it top to bottom.

The 5 most common causes

Expired credential - OAuth token rotated, API key revoked, MCP disconnected. Fix: refresh the secret on the Routine.
Missing context file - a file you attached was deleted or renamed. Fix: reattach.
Rate limit - the Routine fires at the same moment as 10 other Routines. Fix: stagger the cron (3:15, 3:30, 3:45 instead of 3:00 on the dot).
Prompt ambiguity - the prompt works on Mondays and breaks on Saturdays because of a weekday assumption. Fix: test the prompt across edge cases.
Upstream change - the platform renamed a field, the sheet structure changed, a column moved. Fix: update the prompt to match the new structure.
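The staggering fix for rate limits is easy to generate rather than type by hand. A minimal sketch, assuming your Routines accept standard five-field cron expressions (minute hour day month weekday); the function name and defaults are illustrative:

```python
def staggered_crons(hour, count, step_minutes=15, start_minute=15):
    """Return `count` daily cron expressions spaced `step_minutes` apart."""
    crons = []
    for i in range(count):
        total = hour * 60 + start_minute + i * step_minutes
        h, m = (total // 60) % 24, total % 60  # roll over into the next hour
        crons.append("%d %d * * *" % (m, h))
    return crons


# Three Routines starting at 3:15, matching the example above.
schedule = staggered_crons(3, 3)
```

Here schedule holds "15 3 * * *", "30 3 * * *" and "45 3 * * *".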

When to pause vs fix vs delete

Pause when

The Routine is fine but the target account is down, the template is being redesigned, or you're on leave. Pausing keeps everything ready to resume.

Fix when

The Routine works 9 out of 10 runs. One intermittent failure doesn't warrant a rebuild: identify the cause, patch the prompt, move on.

Delete when

You haven't read the output in 2 months. The Routine has outlived its purpose. Cheaper to rebuild later than to leave dead agents running.

The weekly Routine review

Add a 10-minute block to your calendar every Friday:

List all active Routines
For each: last run status, last successful output, token cost trend
Pause anything that's failed 3+ times in a row
Delete anything you haven't read the output from in 4+ weeks
Note any cost trends worth investigating

Tell Claude to generate this review for you and paste it into the review block.
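The pause and delete rules in the checklist are mechanical enough to write down. A rough sketch, assuming you track the recent run statuses and the date you last read each Routine's output; review_action and its inputs are hypothetical names:

```python
from datetime import date


def review_action(recent_statuses, last_read, today):
    """Suggest "delete", "pause", or "keep" for one Routine.

    `recent_statuses` is ordered oldest to newest.
    """
    if (today - last_read).days >= 28:  # output unread for 4+ weeks
        return "delete"
    last_three = recent_statuses[-3:]
    if len(last_three) == 3 and all(s == "failed" for s in last_three):
        return "pause"  # 3+ failures in a row
    return "keep"


# Made-up example: three straight failures, output read two days ago.
today = date(2024, 5, 3)
action = review_action(["success", "failed", "failed", "failed"], date(2024, 5, 1), today)
```

Delete is checked first on the assumption that an unread Routine is not worth debugging; swap the order if you would rather pause first.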

Power-user tips

Log every run to Supabase - routine_id, started_at, ended_at, status, token_cost, output_hash. This gives you a permanent history beyond what the dashboard retains.
Use a single "routine-health" Slack channel - every failure across every Routine posts there. One place to check, not 15.
Version control the prompts - keep Routine prompts in GitHub as .md files. When you update, diff the change before pushing live.
Add a "why this matters" line to each Routine - future-you will thank you when you can't remember why this Routine exists.
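For the run-logging tip, one row might be assembled as below. This is a sketch only: it builds a plain dict with the six columns named above and leaves the database insert out. Using SHA-256 for output_hash is an assumption, since the lesson does not name a hash.

```python
import hashlib
from datetime import datetime, timezone


def build_run_record(routine_id, started_at, ended_at, status, token_cost, output_text):
    """Return one log row with the six columns named above."""
    return {
        "routine_id": routine_id,
        "started_at": started_at.isoformat(),
        "ended_at": ended_at.isoformat(),
        "status": status,
        "token_cost": token_cost,
        # Hashing the output lets you spot identical or changed reports
        # without storing the full text.
        "output_hash": hashlib.sha256(output_text.encode("utf-8")).hexdigest(),
    }


# Made-up values, for illustration only.
record = build_run_record(
    "daily-gads-healthcheck-demo",
    datetime(2024, 5, 3, 3, 15, tzinfo=timezone.utc),
    datetime(2024, 5, 3, 3, 17, tzinfo=timezone.utc),
    "success",
    4200,
    "report body",
)
```

Two consecutive runs with the same output_hash are a cheap hint that a report has stopped changing, which is one way drift shows up.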

Action items

☐ Add output validation to your first Routine's prompt

☐ Add a Slack or email notification on failure

☐ Block 10 minutes every Friday for the weekly Routine review

☐ Check the last 10 runs of your first Routine - confirm success rate and token cost

Next lesson: Routines + Skills + MCPs.

Exercises

Review the concepts covered in this lesson: Monitor + debug runs.
Write down your key takeaway from this lesson.
Practice running any commands or prompts mentioned above inside your terminal.