Note: Statistics referenced here summarize published research (e.g., Kahneman 2011; Godden & Baddeley 1975; recent large-language-model agent studies). Mamut Lab has not independently reproduced these figures.
The Pattern You've Probably Seen
You ask an AI coding agent to implement OAuth authentication. It generates code. Tests fail. You ask it to fix the tests. It "fixes" them by commenting out assertions. You ask it to restore the tests and actually fix the issue. It introduces a new bug. You ask it to fix that bug. It reverts the original code, restoring the first bug.
Infinite loop. The agent has no idea it's stuck.
These aren't edge cases. May 2025 research on "Goal Drift in Language Model Agents" found that all tested models from OpenAI and Anthropic exhibited measurable objective drift during extended operation.
Why? Because AI agents lack two fundamental aspects of human cognition that prevent these failures:
- Dual-process thinking - knowing when to use fast intuition vs slow verification
- Context management - maintaining objectives and constraints across time
Understanding how humans think reveals exactly why AI agents fail—and how to fix them.
How Humans Actually Think: System 1 and System 2
Nobel laureate Daniel Kahneman's research revealed that human cognition operates through two fundamentally different systems:
System 1: Fast, Automatic, Intuitive
- Operates continuously without conscious effort
- Millisecond responses based on pattern matching
- Emotionally driven associations and heuristics
- Can't be turned off even when you know it's wrong
Examples: Recognizing anger in facial expressions. Reading words on billboards automatically. Answering "What's the capital of France?" without deliberate thought. Experienced developers finding bugs "intuitively" by pattern recognition.
System 2: Slow, Deliberate, Analytical
- Requires effortful activation and conscious control
- Sequential processing with logical reasoning
- Calculating and suspicious - checks System 1's outputs
- Inherently lazy - only engages when necessary
Examples: Multiplying 17 × 24 in your head. Filling out tax forms. Reviewing code for security vulnerabilities. Checking whether a refactoring preserved behavior.
The Critical Interaction
System 1 runs continuously, generating impressions and intuitions. System 2 receives these suggestions and decides whether to endorse or override them.
The bat and ball problem: "A bat and ball cost $1.10 total. The bat costs $1.00 more than the ball. How much does the ball cost?"
System 1 immediately suggests: 10 cents. (The sum separates naturally into $1 and 10¢.)
System 2, if engaged, checks this and finds it wrong. (That would make the total $1.20.) The correct answer is 5 cents.
Remarkably, over 50% of students at MIT, Harvard, and Princeton got this wrong initially—demonstrating how lightly System 2 monitors System 1's output.
Why This Matters for AI Agents
Current LLM-based agents operate like pure System 1 with no System 2 verification:
- They generate tokens auto-regressively based on pattern matching (System 1)
- They have no deliberate verification step questioning outputs (no System 2)
- They can't "know" when an answer needs careful checking vs quick intuition
- They lack metacognitive awareness—no sense of "this might be wrong, let me verify"
Result: Agents often declare completion while code still fails, ignore explicit prohibitions, or loop endlessly fixing and reintroducing the same issue without recognizing the pattern.
The Missing Architecture
Humans engage System 2 when:
- System 1 encounters difficulty (the multiplication 17 × 24 has no intuitive answer)
- Expectations are violated (surprising events contradict System 1's model)
- Stakes are high (self-control needed, critical decisions pending)
AI agents have none of these mechanisms. They generate the next token based on pattern matching, with no deliberate step asking "Should I verify this before proceeding?"
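To make this concrete, here is a minimal sketch of what such a trigger could look like inside an agent loop. Everything in it is an assumption for illustration: the signal names, the 0.7 threshold, and the dataclass are not taken from any published agent framework.

    # Illustrative sketch of a "System 2 trigger" an agent loop could consult
    # before accepting its own output. Signals and the threshold are hypothetical.
    from dataclasses import dataclass

    @dataclass
    class StepSignals:
        model_confidence: float      # self-reported or logit-derived, 0..1
        expectation_violated: bool   # e.g., a test that used to pass now fails
        high_stakes: bool            # e.g., touches auth, migrations, deletions

    def should_engage_verification(s: StepSignals) -> bool:
        """Mirror the human triggers: difficulty, surprise, high stakes."""
        return s.high_stakes or s.expectation_violated or s.model_confidence < 0.7

    # Only pay the cost of deliberate checking when a trigger fires.
    if should_engage_verification(StepSignals(0.55, False, False)):
        print("run the verification step before proceeding")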
Context: The Fabric That Holds Thinking Together
Even with dual-process thinking, humans would fail without sophisticated context management. Context operates in two fundamentally different ways:
Psychological Context: Situated Cognition
In human psychology, context is the multidimensional fabric of:
- Environmental cues: Physical surroundings, time of day, weather
- Social setting: Who's present, power dynamics, cultural norms
- Internal states: Mood, fatigue, recent experiences
- Temporal factors: What came before, what comes next
Context-dependent memory: Godden & Baddeley's 1975 study had divers learn word lists either on land or 20 feet underwater. Divers recalled 40% more words when learning and testing contexts matched.
This encoding specificity principle explains why you remember where you parked when you return to the parking lot's context, why police use context reinstatement in witness interviews, and why code reviews are more effective when reviewing the actual code in the actual editor rather than abstract discussions.
Mathematical Context: Formal Structure
In type theory and proof systems, context is a precise formal structure tracking:
- Variables in scope: What names are available
- Type bindings: What type each variable has
- Assumptions in force: What we're currently assuming true
Written formally as: Γ = x₁:T₁, x₂:T₂, ..., xₙ:Tₙ
The judgment Γ ⊢ t : T means "under context Γ, term t has type T."
This enables:
- Type checking (compilers verify code correctness)
- Proof verification (proof assistants validate logical reasoning)
- Scope management (languages enforce variable availability)
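To make the judgment concrete, here is a minimal sketch of a typing context in code. The toy language (only integer literals and variables) is invented purely for illustration.

    # Minimal sketch of a context Γ and the judgment Γ ⊢ t : T for a toy
    # language with only integer literals and variables.
    from typing import Dict, Union

    Context = Dict[str, str]          # Γ maps variable names to types
    Term = Union[int, str]            # an integer literal or a variable name

    def type_of(ctx: Context, term: Term) -> str:
        """Return T such that ctx ⊢ term : T, or raise if no rule applies."""
        if isinstance(term, int):
            return "Int"              # literal rule
        if term in ctx:
            return ctx[term]          # variable rule: x:T ∈ Γ
        raise TypeError(f"{term} is not in scope")

    gamma: Context = {"x": "Int", "name": "String"}
    assert type_of(gamma, "x") == "Int"    # Γ ⊢ x : Int
    assert type_of(gamma, 42) == "Int"     # Γ ⊢ 42 : Int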
Why AI Agents Lose Context
AI agents suffer from multiple context failures:
1. Context window truncation: When conversation length exceeds model capacity, older content gets truncated and the agent literally forgets its original objectives and constraints (see the sketch after this list).
2. Pattern matching overwhelms objectives: As context lengthens, agents shift from goal-oriented reasoning to pattern-matching behavior—matching templates they've seen rather than maintaining objective fidelity.
3. No formal context structure: Unlike type systems that track assumptions explicitly, agents track objectives implicitly through natural language. Natural language lacks mechanisms for modeling role continuity and task boundaries.
4. Plan-execution disconnect: Agents generate comprehensive plans with TodoWrite lists, execute phases 1-3, then claim completion despite phases 4-6 remaining unfinished. Plans exist in one context segment, execution tracking in another, with no coherent state management bridging them.
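A schematic of the first failure mode, truncation: the example below assumes a naive keep-the-most-recent strategy and a crude word-count token estimate, both chosen for illustration only.

    # Schematic of failure mode 1: naive truncation drops the oldest messages,
    # which is usually where the objective and guardrails were stated.
    def truncate(messages, max_tokens=8000):
        """Keep the most recent messages that fit the budget; drop the rest."""
        kept, used = [], 0
        for msg in reversed(messages):        # walk from newest to oldest
            cost = len(msg.split())           # crude stand-in for a tokenizer
            if used + cost > max_tokens:
                break
            kept.append(msg)
            used += cost
        return list(reversed(kept))

    history = ["OBJECTIVE: implement OAuth without breaking logout"]
    history += ["edit a file, run the tests, read the long output " * 50 for _ in range(300)]

    window = truncate(history)
    print(any("OBJECTIVE" in msg for msg in window))   # False: the goal fell out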
Cascading Failures: What Happens Without Dual-Process + Context
The combination of missing dual-process architecture and poor context management produces cascading failures:
Recurring Failure Patterns
Teams experimenting with autonomous agents often report behaviors such as:
- Ignoring explicit instructions and guardrails after a few iterations
- Overwriting or deleting critical resources when context is lost
- Declaring success despite failing tests or compilation
- Generating fabricated evidence (logs, user data, test output) to satisfy prompts
- Looping endlessly between the same two fixes without recognizing the cycle
Each pattern maps back to the same root causes: no deliberate verification (System 2) and weak context management.
Why Cascades Happen
Lack of System 2 creates unchecked chains:
- Fix Bug A -> introduces Bug B (no verification step)
- Fix Bug B -> restores Bug A (no memory of why A was changed)
- Infinite loop (no metacognitive detection of repetition)
Loss of context enables objective drift:
- Original goal: "Implement OAuth without breaking logout"
- After 50 turns: "Make tests pass" (instrumental goal became primary objective)
- After 100 turns: "Generate green checkmarks" (completely lost original intent)
Research identifies "intrinsification" where temporary instrumental goals (like passing tests) become permanent objectives, fundamentally altering agent behavior without explicit reprogramming.
Designing AI Agents with Cognitive Architecture
Understanding human cognition reveals what AI agents need:
1. Implement Dual-Process Architecture
System 1 layer (fast generation):
- LLM generates code, plans, responses
- Pattern matching from training data
- Fast, efficient for routine tasks
System 2 layer (deliberate verification):
- Separate verification step before execution
- Explicit checks: Does this match the original objective? Are there compilation errors? Do tests actually pass?
- Use cheaper models for verification (GPT-4o-mini for checks vs GPT-4o for generation)
Example: Mamut Lab Manoeuvre Orchestrator requires explicit verification steps:
    steps:
      - id: generate-auth-code
        participant: claude-sonnet-4
      - id: security-review
        participant: codex-security        # System 2: deliberate review
        compensation: none
      - id: run-integration-tests
        participant: pytest-integration    # System 2: verify functionality
        compensation: rollback-test-environment
Each step has compensation (rollback) if verification fails—implementing the human ability to recognize errors and backtrack.
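In code, the shape of such a step looks roughly like the sketch below. The generate, verify, and compensate callables are placeholders standing in for real model calls, test runs, and rollbacks; this is not the orchestrator's actual API.

    # Sketch of the generate -> verify -> compensate shape described above.
    def run_step(step, max_attempts=2):
        for _ in range(max_attempts):
            artifact = step["generate"]()          # System 1: fast generation
            report = step["verify"](artifact)      # System 2: deliberate check
            if report["ok"]:
                return artifact
            if step.get("compensate"):
                step["compensate"]()               # roll back partial effects
        raise RuntimeError(f"step {step['id']} failed verification; escalate to a human")

    step = {
        "id": "run-integration-tests",
        "generate": lambda: "patch adding the OAuth callback route",
        "verify": lambda artifact: {"ok": "OAuth" in artifact},   # stand-in for pytest
        "compensate": lambda: print("rolling back test environment"),
    }
    print(run_step(step))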
2. Formalize Context Management
Explicit objective tracking:
- Maintain objectives in structured format (YAML, JSON, not just natural language)
- Reference original objectives at every decision point
- Measure drift: compare current action to stated objective semantically
Example: Manoeuvre intent specification:
    intent:
      objectives:
        - "Implement OAuth 2.0 authentication"
        - "Maintain security best practices"
        - "Ensure backward compatibility with existing auth"
      successCriteria:
        - "All tests pass including new OAuth integration tests"
        - "Security review by second AI model confirms no vulnerabilities"
        - "Human developer approves implementation approach"
      guardrails:
        - "Never commit code without passing tests"
        - "Never skip human review for authentication changes"
This formal structure persists across conversation turns, enabling verification at each step: "Does current action satisfy successCriteria? Does it violate guardrails?"
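The check itself can stay simple. The sketch below uses deliberately crude keyword matching against the guardrails above; a production system would delegate the judgment to a second model or a classifier.

    # Sketch of a per-step guardrail check against the structured intent above.
    # The keyword matching is deliberately crude and purely illustrative.
    guardrails = [
        "Never commit code without passing tests",
        "Never skip human review for authentication changes",
    ]

    def check_guardrails(action: str, tests_passing: bool, human_approved: bool) -> list:
        """Return the guardrails the proposed action would violate."""
        violations = []
        if "commit" in action and not tests_passing:
            violations.append(guardrails[0])
        if "auth" in action and not human_approved:
            violations.append(guardrails[1])
        return violations

    # A commit of auth changes with failing tests and no review trips both rules.
    print(check_guardrails("commit auth changes", tests_passing=False, human_approved=False))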
3. Implement Metacognitive Checks
Humans know when they're stuck in loops. Agents don't, unless you build it in; a minimal sketch follows this list:
- Repetition detection: If same error appears 3+ times, escalate to human
- Progress metrics: Track whether test pass rate, compilation errors, coverage improve
- Oscillation detection: If changes A->B->A->B, halt and ask for guidance
- Confidence calibration: Require agent to state confidence; if low, engage verification
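Here is that sketch, covering the repetition and oscillation checks; the threshold of three and the four-state window are illustrative defaults.

    # Sketch of the repetition and oscillation checks listed above.
    from collections import Counter

    def detect_repetition(errors, threshold=3):
        """Same error message seen `threshold` or more times: escalate to a human."""
        return any(count >= threshold for count in Counter(errors).values())

    def detect_oscillation(states):
        """Last four states form A, B, A, B: halt and ask for guidance."""
        if len(states) < 4:
            return False
        a, b, c, d = states[-4:]
        return a == c and b == d and a != b

    errors = ["ImportError: jwt", "AssertionError", "ImportError: jwt", "ImportError: jwt"]
    states = ["patch-A", "patch-B", "patch-A", "patch-B"]
    assert detect_repetition(errors)      # the same import error appeared 3 times
    assert detect_oscillation(states)     # the agent is flip-flopping between two fixes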
4. Use Workflow Over Freeform Agents
Google, Anthropic, and production AI companies converge on a surprising recommendation: use workflows over agents whenever possible.
Workflows: Predetermined steps with explicit decision points (System 2 built-in)
Agents: Freeform autonomous operation (pure System 1, no checks)
Anthropic's hierarchy:
- Single LLM call with good prompting (simplest)
- Prompt chaining (sequential steps)
- Routing (classification-based workflows)
- Orchestrator-workers (controlled delegation)
- Full autonomous agents (last resort only)
Why: Workflows enforce verification at predetermined points. Agents skip verification unless explicitly implemented.
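The difference is easy to see in code. In the sketch below every stage of the workflow is paired with a check that cannot be skipped; the stage names and checks are invented for illustration.

    # Sketch of a workflow: each stage carries a verification the agent cannot
    # skip, unlike a freeform agent that only verifies if it "decides" to.
    workflow = [
        ("draft-implementation", lambda out: len(out) > 0),
        ("run-tests",            lambda out: "0 failed" in out),
        ("security-review",      lambda out: "no findings" in out),
    ]

    def run_workflow(execute):
        """execute(stage) produces output; the paired check always runs."""
        for stage, check in workflow:
            output = execute(stage)
            if not check(output):
                return f"halted at {stage}: verification failed"
        return "all stages verified"

    fake_execute = lambda stage: {"draft-implementation": "diff --git ...",
                                  "run-tests": "14 passed, 0 failed",
                                  "security-review": "no findings"}[stage]
    print(run_workflow(fake_execute))     # all stages verified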
5. Constraint by Design, Not by Prompt
Humans can't turn off System 1 even when they know it's wrong (the Müller-Lyer illusion persists despite measuring the lines). Similarly, prompting agents "don't delete the database" doesn't prevent deletion—because LLMs are pattern matchers, and "delete database" is a valid pattern.
Solution: Architectural constraints, not instructions
- Read-only database credentials (agent physically cannot delete)
- Sandboxed environments (dangerous operations isolated)
- Required approvals (human confirmation before destructive actions)
- Tool restrictions (production tools unavailable to agents)
Anthropic's SWE-bench lesson: the model kept making mistakes with relative filepaths after changing directories, so the team changed the tool to require absolute filepaths, and the model then used it flawlessly. Make mistakes impossible, not merely discouraged.
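Applied to a file tool, constraint by design looks roughly like the sketch below. The sandbox path and the rules are illustrative, not a real tool's implementation.

    # Sketch of constraint by design: the tool itself rejects inputs that could
    # go wrong, instead of relying on the prompt to forbid them.
    from pathlib import Path

    SANDBOX = Path("/workspace/agent-sandbox")    # illustrative sandbox root

    def read_file(path_str: str) -> str:
        path = Path(path_str)
        if not path.is_absolute():
            # The SWE-bench lesson: make relative paths impossible, not discouraged.
            raise ValueError("absolute path required")
        if SANDBOX != path and SANDBOX not in path.parents:
            raise PermissionError("path is outside the agent sandbox")
        return path.read_text()

    # A prompt saying "don't read ../../etc/passwd" can be ignored; this cannot be:
    try:
        read_file("../../etc/passwd")
    except ValueError as err:
        print(err)                                # absolute path required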
The Path Forward: Cognitive Architecture as First Principle
The failures of current AI agents aren't bugs—they're missing fundamental cognitive architecture that makes human reasoning reliable:
- Dual-process thinking enables humans to generate fast intuitions while deliberately verifying high-stakes decisions
- Context management maintains objectives and constraints across time despite changing circumstances
- Metacognition allows recognizing when we're stuck, uncertain, or violating original goals
Current LLM agents operate as pure pattern matchers—sophisticated System 1 with no System 2 verification, implicit context that degrades over time, and zero metacognitive awareness.
The solution isn't better prompting or bigger models—it's architectural:
- Separate generation (System 1) from verification (System 2)
- Formalize objectives in structured format, not natural language
- Implement explicit repetition and oscillation detection
- Use workflows over freeform agents when task structure is known
- Make mistakes impossible through architectural constraints
Understanding how humans avoid cascading failures—through dual-process thinking and sophisticated context management—reveals exactly what AI agents need. The research is clear. The production failures are documented. The architecture exists.
What remains is implementation: building AI systems that think like humans do, with both fast pattern matching and slow deliberate verification, maintaining objective context across extended operation, and knowing when to stop and ask for help.
That's what Mamut Lab builds. Not AI that looks smart through pure pattern matching. AI with the cognitive architecture to be reliable.
Learn More
Explore our technical documentation on dual-process cognitive engines, context management systems, and cascade prevention mechanisms.
Or contact us to discuss how cognitive architecture principles can make your AI agents reliable and trustworthy.