Why Your Claude Code Sessions Keep Dying (And How to Fix Them)#

Published on December 8, 2025

Context window mismanagement is the number one productivity killer in Claude Code. This guide shows you how to recognize context degradation, explains why it happens around 70-75% of capacity, and walks through the strategies experienced engineers use to maintain peak Claude Code performance throughout long coding sessions.

You've been working with Claude Code for hours. The first few tasks went smoothly - fast, accurate, exactly what you needed. But now, two hours in, something's changed. Claude starts making obvious mistakes. It ignores context from earlier in the conversation. It tries random solutions instead of reasoning through the problem. You find yourself saying "no, we already tried that" over and over.

This isn't Claude getting tired. It's your context window degrading.

After reviewing writing from Simon Willison, Geoffrey Huntley, Sourcegraph's Amp team, and the broader agentic coding community, I've identified the primary cause of Claude Code performance degradation - and, more importantly, how to fix it.


Understanding Context Degradation#

Geoffrey Huntley, engineer at Sourcegraph (makers of Amp), discovered something critical while building coding agents: context window quality degrades around 147,000-152,000 tokens, even though Claude Sonnet advertises a 200,000 token limit.

Here's what happens:

Phase 1: Peak Performance (0-25% capacity, 0-50k tokens)

  • Fast, accurate responses
  • Good reasoning and planning
  • Appropriate tool usage

Phase 2: Gradual Decline (25-73% capacity, 50k-147k tokens)

  • Slower responses as context fills
  • Occasional context misses
  • Still mostly functional

Phase 3: Degraded State (73%+ capacity, 147k-152k+ tokens)

  • Brute-force solutions replace reasoning
  • Ignores earlier conversation context
  • Tries the same failed approaches repeatedly
  • Quality drops significantly

Why this happens:

  1. System prompts consume tokens: Claude Sonnet's advertised 200k limit shrinks to roughly 176k usable once Claude Code's system prompt and tool definitions are loaded
  2. Quality does not equal capacity: The model can technically handle 176k tokens, but reasoning quality degrades much earlier
  3. Compaction makes it worse: When context gets too large, automatic compaction summarizes history - but summaries lose crucial details

Think of it like RAM fragmentation. Your computer might have 16GB of memory, but after running for days without a restart, performance tanks even when you're only using 8GB.

Diagnostic tip: Use the /context command periodically during your session to check your current token usage and see where you stand in the capacity spectrum.
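
To make the 70-75% threshold concrete, here's a rough shell sketch of the arithmetic - illustrative only, not a built-in feature. You would plug in the token count that /context reports:

# Illustrative check: plug in the usage number /context reports
TOKENS=150000          # current usage from /context
ADVERTISED=200000      # Claude Sonnet's advertised window
PCT=$(( TOKENS * 100 / ADVERTISED ))
echo "Context at ${PCT}% of advertised capacity"
if [ "$PCT" -ge 70 ]; then
  echo "Approaching the degradation zone - /clear or start a fresh instance"
fi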


Signals of Context Degradation#

You're experiencing context degradation if:

  • Claude tries solutions you already rejected
  • It asks for information you provided earlier
  • Solutions become increasingly random instead of reasoned
  • Simple tasks suddenly require multiple attempts
  • Claude edits wrong files or misunderstands the codebase structure
  • It starts brute-forcing instead of planning
  • You're constantly course-correcting with "no, not that"

The problem: This isn't obvious until you've already wasted 30+ minutes fighting with degraded performance.


The Golden Rule: One Task, One Context#

The single most important insight from the research, via Geoffrey Huntley:

"Use a context window for one task, and one task only. If your coding agent is misbehaving, it's time to create a new context window."

This sounds simple, but it requires a mindset shift. You're not having a "conversation" with Claude - you're running discrete work sessions.

What Counts as "One Task"?#

Good (focused):

  1. Add dark mode toggle to settings page
  2. Fix TypeScript errors in auth module
  3. Write unit tests for user service

Bad (context bloat):

  1. Fix dark mode, then update docs, then add tests
  2. "Help me with my project" (no clear end state)
  3. Debugging multiple unrelated issues in one session

The Fix: Strategic Context Management#

1. Use /clear Aggressively#

The /clear command isn't just cleanup - it's a performance optimization tool.

When to clear:

  • After completing a discrete task
  • Before starting unrelated work
  • When you notice degradation signals
  • After 1-2 hours of continuous work (regardless of task completion)

Example workflow:

# Task 1: Add feature
You: /task add dark mode toggle
Claude: [implements feature]
You: APPROVED
Claude: [completes work]

# CLEAR before next task
You: /clear

# Task 2: Fix tests (fresh context)
You: /task fix failing e2e tests

2. Run Parallel Instances for Different Concerns#

Peter Steinberger and others report running 3-8 Claude Code instances simultaneously in terminal grids.

The pattern:

  • Instance 1: Backend implementation
  • Instance 2: Frontend work
  • Instance 3: Code review and testing
  • Instance 4: Documentation

Benefits:

  • Each instance has a fresh, focused context
  • No cross-contamination between unrelated work
  • Natural task separation
  • Can leave instances running autonomously

How to set it up (a concrete sketch follows the list):

  • Use tmux or terminal tabs
  • Separate git worktrees for parallel work (git worktree add)
  • Or use /tmp checkouts for isolated experiments
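
Here's a minimal sketch of that setup with git worktree and tmux; the branch names, directories, and session name are placeholders for your own:

# One worktree per concern, so instances never edit the same checkout
git worktree add -b feature/backend-api ../myapp-backend
git worktree add -b feature/frontend-ui ../myapp-frontend

# One tmux window per worktree, each with its own Claude Code instance
tmux new-session -d -s agents -c ../myapp-backend claude
tmux new-window -t agents -c ../myapp-frontend claude
tmux attach -t agents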

3. Extended Thinking for Complex Tasks#

For complex tasks that need deeper up-front reasoning, use the thinking budget hierarchy:

  • "think" - 4,000 token budget
  • "think harder" - 10,000 tokens
  • "ultrathink" - 31,999 tokens (maximum)

When to use:

# Simple task - default
You: Add a new API endpoint for user settings

# Complex task - extended thinking
You: think harder about how to refactor the authentication
     system to support OAuth while maintaining backward
     compatibility

This front-loads reasoning while the context is still clean, rather than letting the model flail mid-task and bloat the context with failed attempts.

Important caveat: If you find yourself frequently needing "think harder" or "ultrathink", that's often a signal your task is too large and should be broken down into smaller, focused subtasks. These extended thinking modes are workarounds, not standard practice.
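
For example, instead of one "ultrathink" pass over the whole OAuth refactor above, a decomposed version might look like this (the task wording is illustrative):

# Instead of one huge extended-thinking session:
You: /task extract an AuthProvider interface from the current auth module
[review, approve]
You: /clear

You: /task implement an OAuthProvider against the new interface
[review, approve]
You: /clear

You: /task migrate existing call sites and remove the old code path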

4. Separate Research from Implementation#

Don't do this (common mistake):

You: Explain how the authentication system works, then add
     OAuth support
[Claude explains for 5k tokens]
[Context now bloated with explanatory text]
[Implementation quality suffers]

Do this instead:

# Session 1: Research (Plan Mode)
You: [Shift+Tab twice for Plan Mode]
     Explain how the authentication system works

[Take notes externally]
/clear

# Session 2: Implementation (fresh context)
You: /task add OAuth support to authentication system
     [paste relevant notes from research]

Use Plan Mode (Shift+Tab twice) for research - it prevents file changes and keeps context focused on analysis.


Pattern 1: The Sub-Agent Approach#

For massive tasks, spawn sub-agents to manage context:

# Parent agent
You: Create a plan to migrate from REST to GraphQL

[Review plan]

# Sub-agent 1
You: /clear
     /task implement GraphQL schema based on this plan:
     [paste schema section]

# Sub-agent 2 (different terminal)
You: /task update frontend to use GraphQL client
     [paste client section]

Each sub-agent consumes its own context allocation, then returns results to you for integration.

Pattern 2: The YOLO Sandbox#

Geoffrey Huntley and Simon Willison both advocate for --dangerously-skip-permissions mode, but only in sandboxed environments.

The workflow:

# Local machine: planning and approval
claude

# Docker container: autonomous execution
# ("claude-sandbox" is a placeholder for an image with Claude Code
# installed - see the build sketch below)
docker run -it --rm -v "$(pwd)":/workspace -w /workspace \
  claude-sandbox claude --dangerously-skip-permissions

# Or GitHub Codespaces for similar isolation

This eliminates permission interruptions (each approval round-trip adds messages to the context) while keeping any damage contained to the sandbox.
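
If you don't already have a sandbox image, here's one minimal way to build one on the fly. This is a sketch, not an official recipe: it assumes the npm-based install of Claude Code and a Node base image, so check Anthropic's current install docs before relying on it.

# Build a throwaway sandbox image (image name and base are illustrative)
docker build -t claude-sandbox - <<'EOF'
FROM node:20-slim
RUN apt-get update && apt-get install -y git \
    && npm install -g @anthropic-ai/claude-code
WORKDIR /workspace
EOF

You'll still need to provide credentials inside the container, for example by passing an API key with docker run -e ANTHROPIC_API_KEY=... at launch.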

Pattern 3: The Async Research Pattern#

Use Claude Code for Web (claude.ai/code) for long-running research tasks that don't need local files.

When to use:

  • Researching libraries/frameworks
  • Proof-of-concept experiments
  • Understanding unfamiliar codebases
  • Documentation analysis

Benefits:

  • Runs asynchronously while you work locally
  • Fresh context for each research task
  • Can queue multiple prompts
  • "Teleport" results back to local CLI

How to Keep Your Context Sane#

Use this checklist to maintain healthy context windows:

Before starting work:

  • Is this a discrete, focused task?
  • Do I need research first? (Use Plan Mode separately)
  • Should I use a fresh instance? (If previous task was unrelated)

During work:

  • Am I seeing degradation signals?
  • Have I been working for 1+ hours continuously?
  • Is Claude repeating failed approaches?

After completing work:

  • Run /clear before next unrelated task
  • Consider separate instance for next task
  • Document learnings in CLAUDE.md (not in conversation)

For complex work:

  • Should I use extended thinking?
  • Would sub-agents manage this better?
  • Can I break this into smaller discrete tasks?

The Deeper Principle: Curated Context Beats Comprehensive Context#

One of the key insights from Sourcegraph's Amp documentation is that focused, relevant context consistently outperforms throwing everything at the model.

Irrelevant information derails agents more than missing information.

This means:

  • A focused 50k token context outperforms a bloated 150k context
  • Quality of context matters more than quantity
  • Strategic /clear usage is a feature, not a workaround

Practical application: Don't try to "give Claude everything." Instead (a sample session follows the list):

  1. Start with a clean context
  2. Reference only relevant files (@src/auth/login.tsx)
  3. Include focused CLAUDE.md instructions
  4. Stay on task
  5. Clear and restart for new tasks
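
Put together, a clean curated session might look like this (the file path and bug are illustrative):

You: /clear
You: /task fix the token-refresh bug in @src/auth/login.tsx
     Constraint: refresh must fire before the access token
     expires - our conventions are in CLAUDE.md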

Summary: Five Rules for Effective Context Management#

Context window degradation is the silent productivity killer in Claude Code. The fix is concrete:

  1. One task, one context - clear aggressively between tasks
  2. Parallel instances - separate terminals for different concerns
  3. Research separately - use Plan Mode, then clear before implementing
  4. Extended thinking sparingly - a signal to break down tasks, not standard practice
  5. Curate, don't accumulate - focused context outperforms comprehensive context

The difference between frustrating Claude Code sessions and genuinely transformative productivity isn't the prompts you write - it's how you manage context windows.

Try it for a week. Clear context between tasks. Run parallel instances. Separate research from implementation. You'll notice the difference immediately.


Related Reading: Check out my Autonomous Task Workflow with /task Command for structured task management and Claude Code Productivity Tips for settings, custom commands, and best practices.


References#

  1. Geoffrey Huntley - I dream about AI subagents - Context window degradation research, 147k-152k token threshold discovery
  2. Geoffrey Huntley - How to build a coding agent - Workshop on coding agent patterns and YOLO mode
  3. Peter Steinberger - Just Talk To It - Parallel terminal grid workflow, 3-8 instances pattern
  4. Simon Willison - Embracing the parallel coding agent lifestyle - Parallel instances workflow analysis
  5. Sourcegraph - Amp Owner's Manual - Context curation best practices
  6. Anthropic - Context windows documentation - Official context window specifications
  7. Anthropic - Managing context on Claude - Compaction and context management strategies