Technical Architecture

Universal Agentic Platform: Long-Running Task Execution + Research Intelligence

🎉 October 2025 Architecture Update

Mamut Lab now includes Layer 11: Research Intelligence! We've integrated Project Continuum's research capabilities—adding Temporal Knowledge Substrate, Multi-Domain Synthesis, and Graduated Autonomy. Read the full integration announcement →

Note: This documentation page reflects the original 10-layer architecture. Comprehensive docs update with Layer 11 details coming soon. See Framework Document 13 for complete Layer 11 specification.

Architecture Overview

Mamut Lab is designed with 11 interconnected architectural layers, each addressing specific challenges in long-running agentic AI systems. However, implementation follows a strategic sequence—starting with neurosymbolic reasoning as the foundational layer that enables all other capabilities.

Architecture vs. Implementation

The 11 layers below represent the complete conceptual architecture. The implementation roadmap prioritizes neurosymbolic reasoning first (Layer 6), as it provides the formal verification, explainability, and trustworthiness that every other capability requires. See Implementation Priority for the build sequence.

Mamut Lab Architecture

Design Principles

  • Human Understanding First: Every AI decision must be explainable and verifiable
  • Multi-Model Consensus: Cross-verify outputs using diverse AI models
  • Persistent Context: Maintain coherent state across days and weeks, not just sessions
  • Cascade Prevention: Detect and halt error amplification early
  • Continuous Learning: Improve from experience without catastrophic forgetting

Research Foundation

This architecture synthesizes findings from cognitive science, distributed systems, machine learning safety, and human-computer interaction research. See references for academic sources.

Implementation Priority: Neurosymbolic First

While Mamut Lab is designed with 11 interconnected architectural layers, neurosymbolic reasoning is our first core implementation—the foundational capability that makes everything else trustworthy and explainable.

Why Neurosymbolic First?

Without neurosymbolic reasoning, we'd build yet another LLM wrapper—fast, but opaque, prone to hallucinations, and impossible to verify. Neurosymbolic reasoning transforms Mamut Lab from "another AI task executor" into a provably correct, explainable, trustworthy platform.

Every future capability (Darwin-Gödel self-improvement, coordinated space task execution, continual learning) builds on this neurosymbolic foundation.

Implementation Sequence

NOW

Phase 1: Neurosymbolic Foundation

Weeks 1-8

  • Scallop differentiable logic
  • PyKEEN knowledge graphs
  • Z3 formal verification
  • Dual-process engine integration

NEXT

Phase 2: Darwin-Gödel Enhancement

Following neurosymbolic

  • Formal verification of self-modifications
  • Safe evolution mechanisms
  • Introspection capabilities

FUTURE

Phase 3: Full Task Execution Platform

Built on verified foundation

  • Coordinated space architecture
  • Multi-tool coordination
  • Production deployment

Read more: Neurosymbolic Reasoning: Mamut Lab's First Core Implementation

Layer 2: Memory & Context Management (Future)

Biologically-inspired three-tier memory architecture for maintaining coherent context across arbitrary time horizons.

Planned capabilities:

  • Working Memory: ~8K token in-context LLM buffer for current session
  • Session Memory: Vector database for semantic search across hours to days
  • Archive Memory: Knowledge graph storage for long-term facts and procedures
  • Consolidation: Automatic memory stabilization and semantic knowledge extraction

This layer is part of future development following the neurosymbolic foundation.

Layer 3: Dual-Process Cognitive Engine (Future)

Routes tasks between fast heuristic processing (System 1) and slow analytical reasoning (System 2).

Planned capabilities:

  • System 1 (Fast): Pattern matching with smaller models (GPT-4o-mini, Claude Haiku)
  • System 2 (Analytical): Deliberative reasoning with larger models (o1, Claude Opus)
  • Adaptive Routing: Classify tasks by novelty, stakes, and complexity
  • Performance Learning: Adjust routing thresholds based on outcomes

This layer is part of future development following the neurosymbolic foundation.

Layer 4: Multimodal Execution Modes (Future)

Three distinct reasoning modes for different task requirements: normal execution, controlled hallucination, and adversarial review.

Planned modes:

  • Normal Mode: Standard execution with accuracy priority and factual grounding
  • Hallucination Mode: Controlled creative exploration for brainstorming (flagged as hypothetical)
  • 10th Man Mode: Adversarial review and red-teaming to find edge cases

This layer is part of future development following the neurosymbolic foundation.

Layer 5: Cascade Prevention System (Future)

Detects and halts error amplification before small mistakes compound into catastrophic failures.

Planned mechanisms:

  • Divergence Monitoring: Track deviation from expected trajectories
  • Uncertainty Thresholds: Halt when confidence drops below safety threshold
  • Cross-Model Verification: Flag disagreements between models
  • Human Intervention: Escalate critical or destructive operations

This layer is part of future development following the neurosymbolic foundation.

FIRST CORE IMPLEMENTATION

Layer 6: Neurosymbolic Reasoning

The foundational layer—combines neural pattern recognition with symbolic logic for provably correct, explainable AI decisions. Every other capability builds on this neurosymbolic foundation.

Why Neurosymbolic?

Pure LLMs are fast but opaque, prone to hallucinations, and impossible to formally verify. Pure symbolic AI (expert systems) offers guarantees but can't learn from data and fails on noisy real-world inputs.

Neurosymbolic AI combines the best of both:

  • Neural components handle perception, ambiguity, and pattern recognition
  • Symbolic components verify correctness, provide explanations, and ensure safety
  • Integration through differentiable logic (Scallop) enables end-to-end training

Production-Proven Technology

Amazon Vulcan (2025)

Production Deployment

  • Warehouse robot task coordination
  • Neural vision + symbolic planning
  • Deployed in Spokane & Hamburg

SAP ABAP Gen (2025)

Production Deployment

  • Code generation with formal parser
  • LLM + formal parser verification
  • Commercial release planned

AlphaProof (2024)

IMO Silver Medal

  • Mathematical theorem proving
  • Gemini + Lean formal proofs
  • 1 point from gold medal

Mamut Lab Implementation Stack

  • Scallop: Differentiable Datalog for neurosymbolic reasoning
  • PyKEEN: Knowledge graph embeddings (40+ models)
  • Z3 Solver: Formal verification and constraint satisfaction
  • SymPy: Symbolic mathematics and exact computation

Dual-Process Integration

Neural models (Claude, GPT-4) generate candidate solutions. Symbolic components (Scallop, Z3) verify correctness. Knowledge graphs (ArangoDB + PyKEEN) provide semantic context. Only verified candidates proceed to execution.
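
The sketch below illustrates this propose/verify pattern. It is a minimal, illustrative example rather than the Mamut Lab API: propose_candidates() stands in for a neural model suggesting resource plans, Z3 plays the symbolic verifier, and the constraint values are assumptions.

# Propose/verify sketch: a stand-in "neural" proposer suggests candidate
# resource plans; Z3 admits only plans satisfying the symbolic safety
# constraints (constraint values are illustrative).
from z3 import Int, Solver, sat

def propose_candidates():
    # Stand-in for neural generation (e.g., an LLM suggesting plans).
    return [
        {"workers": 4, "memory_mb": 4096},
        {"workers": 8, "memory_mb": 2048},
        {"workers": 2, "memory_mb": 256},
    ]

def verify(plan):
    workers, memory_mb = Int("workers"), Int("memory_mb")
    solver = Solver()
    solver.add(workers == plan["workers"], memory_mb == plan["memory_mb"])
    solver.add(workers * 512 <= memory_mb)   # each worker needs 512 MB
    solver.add(workers >= 1, workers <= 16)
    return solver.check() == sat

# Only verified candidates proceed to execution.
verified = [plan for plan in propose_candidates() if verify(plan)]
print(verified)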

🔬 CURRENT IMPLEMENTATION

Synthetic Data Generation (Phase 1)

Neurosymbolic reasoning enables unlimited training data generation from axioms—guaranteed correct, no privacy concerns, no human labeling required.

Why Synthetic Data?

Traditional machine learning requires massive labeled datasets, raising privacy concerns and scaling challenges. Symbolic systems can generate infinite correct examples from finite rules.

Generation Techniques

Axiomatic Theorem Generation

Infinite examples from finite axioms (see the sketch below)

  • Define logical axioms (e.g., group theory rules)
  • Derive unlimited valid theorems automatically
  • Train models on proven-correct examples
  • No human labeling required
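
As a toy illustration (plain Python rather than Scallop, with made-up task names), the sketch below forward-chains a single transitivity axiom over a handful of base facts and emits every derived fact as a proven-correct training example.

# Toy axiomatic generation: forward-chain a transitivity axiom over base
# facts, then emit each derived fact as a proven-correct training example.
# Relation and task names are illustrative.
base_facts = {
    ("subtask_of", "parse", "compile"),
    ("subtask_of", "compile", "build"),
    ("subtask_of", "build", "release"),
}

def forward_chain(facts):
    """Apply 'subtask_of is transitive' until no new facts are derivable."""
    derived = set(facts)
    changed = True
    while changed:
        changed = False
        for (_, a, b) in list(derived):
            for (_, c, d) in list(derived):
                if b == c and ("subtask_of", a, d) not in derived:
                    derived.add(("subtask_of", a, d))
                    changed = True
    return derived

theorems = forward_chain(base_facts) - base_facts
training_examples = [
    {"premises": sorted(base_facts), "conclusion": theorem}
    for theorem in theorems
]
print(f"Derived {len(theorems)} guaranteed-correct examples")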

Reverse-Process Generation

Backward reasoning from solutions

  • Generate random valid outputs (e.g., polynomials)
  • Apply inverse operations (e.g., differentiate to produce integration problem/answer pairs)
  • Create unlimited training examples
  • Example: SymPy for calculus problem generation

Constraint-Based Generation

SMT solvers create test cases (see the sketch below)

  • Define complex constraints (e.g., resource limits)
  • Z3 generates satisfying test cases
  • Explore edge cases systematically
  • Exhaustive coverage of specification
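
A minimal Z3 sketch of this technique follows; the resource-limit constraints and variable names are illustrative, and each satisfying assignment is blocked so that successive checks yield distinct test cases.

# Constraint-based generation sketch: Z3 enumerates distinct assignments
# satisfying illustrative resource-limit constraints.
from z3 import Ints, Solver, And, Not, sat

cpu, memory_mb, workers = Ints("cpu memory_mb workers")

solver = Solver()
solver.add(And(cpu >= 1, cpu <= 16))
solver.add(And(memory_mb >= 256, memory_mb <= 32768))
solver.add(workers * 512 <= memory_mb)    # each worker needs 512 MB
solver.add(workers >= 1, workers <= cpu)  # at most one worker per core

test_cases = []
while len(test_cases) < 5 and solver.check() == sat:
    model = solver.model()
    case = {str(decl): model[decl].as_long() for decl in model.decls()}
    test_cases.append(case)
    # Block this exact assignment so the next check yields a new case.
    solver.add(Not(And([decl() == model[decl] for decl in model.decls()])))

print(test_cases)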

Real-World Example: AlphaGeometry

Google DeepMind's AlphaGeometry (Nature 2024) used synthetic data generation to achieve near-IMO-gold-medal performance:

  • 100 million synthetic geometry problems generated from axioms
  • No human-labeled training data required
  • Solved 25/30 IMO geometry problems (human gold medalists average 25.9)
  • Combination: neural language model + symbolic deduction engine

Mamut Lab Application

We use synthetic data generation for:

  • Continual Learning: Generate training examples without catastrophic forgetting
  • Task Execution Testing: Create edge case scenarios for long-running task logic
  • Knowledge Graph Population: Derive facts from axioms automatically
  • Privacy-Preserving Training: No sensitive data required

Implementation Stack

# Example: Generate integration training pairs with SymPy
from sympy import symbols, diff
import random

x = symbols('x')

# Generate random polynomial
def random_polynomial(degree=3):
    coeffs = [random.randint(-10, 10) for _ in range(degree + 1)]
    return sum(c * x**i for i, c in enumerate(coeffs))

# Generate training pair
polynomial = random_polynomial()
derivative = diff(polynomial, x)

# Training example: (derivative, polynomial)
# Task: Given derivative, find original function
print(f"Problem: Integrate {derivative}")
print(f"Solution: {polynomial}")

# Generate unlimited examples...

🕸️ CURRENT IMPLEMENTATION

Knowledge Graph Reasoning (Phase 1)

Hybrid queries combining symbolic logic with semantic similarity using ArangoDB + PyKEEN (40+ embedding models).

Hybrid Reasoning Architecture

Traditional databases handle exact queries (SQL). Neural embeddings handle semantic similarity (vector search). Knowledge graphs combine both—logical constraints + semantic relevance.

Symbolic Component

  • Technology: ArangoDB (graph database)
  • Query: AQL (graph traversals, logical filters)
  • Strength: Exact constraints, relationships
  • Example: "Find tasks that require tool capability X"

Neural Component

  • Technology: PyKEEN (40+ embedding models)
  • Models: TransE, RotatE, ComplEx, ConvE
  • Strength: Semantic similarity, analogies
  • Example: "Find entities similar to Y"

Hybrid Query Example

Query: "Find tasks similar to 'code_generation' that require 'code_analysis' capability"

Step 1 - Symbolic Filter (AQL):
FOR t IN tasks
  FILTER t.requires_capability == 'code_analysis'
  RETURN t

Step 2 - Neural Ranking (PyKEEN):
embedding_similarity(t.embedding, code_generation.embedding)
  → Rank by semantic relevance

Result: Logically valid candidates + semantically relevant
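
In Python, the two-step query above might look like the sketch below, using python-arango for the symbolic filter and cosine similarity for the neural ranking; the connection details, collection names, and the assumption that each document stores a PyKEEN-derived embedding field are illustrative.

# Hybrid query sketch: AQL supplies the symbolic filter, cosine similarity
# over stored embeddings supplies the neural ranking. Connection details,
# collection names, and the "embedding" field are illustrative assumptions.
import numpy as np
from arango import ArangoClient

db = ArangoClient(hosts="http://localhost:8529").db(
    "mamut", username="root", password="")

# Step 1 - symbolic filter (AQL)
candidates = list(db.aql.execute(
    "FOR t IN tasks FILTER t.requires_capability == @cap RETURN t",
    bind_vars={"cap": "code_analysis"},
))

# Step 2 - neural ranking by embedding similarity
def cosine(a, b):
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

query_vec = db.collection("tasks").get("code_generation")["embedding"]
ranked = sorted(candidates,
                key=lambda t: cosine(t["embedding"], query_vec),
                reverse=True)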

PyKEEN: 40+ Embedding Models

Translational Models

  • TransE: Simple translation (h + r ≈ t)
  • TransH: Hyperplane projections
  • TransR: Relation-specific spaces
  • RotatE: Rotations in complex space

Neural Models

  • ConvE: Convolutional networks
  • ConvKB: KB-specific convolutions
  • DistMult: Bilinear scoring
  • ComplEx: Complex-valued embeddings

Advanced Models

  • TuckER: Tucker tensor decomposition
  • PairRE: Paired relation embeddings
  • QuatE: Quaternion embeddings
  • AutoSF: Automated scoring function search

Real-World Applications

  • Tool Discovery: Find tools with similar capabilities (semantic) that satisfy constraints (symbolic)
  • Task Composition: Recommend compatible task phases based on past successes
  • Error Diagnosis: Find similar past failures with known solutions
  • Knowledge Completion: Predict missing facts from learned patterns

Integration with Scallop

Knowledge graph embeddings (PyKEEN) feed into neurosymbolic reasoning (Scallop). Neural similarity scores become probabilities in Scallop's probabilistic Datalog, enabling differentiable reasoning over knowledge graphs.
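
The sketch below covers only the embedding-training half of that pipeline: a minimal PyKEEN run on the library's built-in Nations dataset, used here as a placeholder for a Mamut Lab knowledge graph export; the hand-off of similarity scores into Scallop is not shown.

# Minimal PyKEEN training sketch. The built-in Nations dataset is a
# placeholder; a real run would load triples exported from ArangoDB.
from pykeen.pipeline import pipeline

result = pipeline(
    dataset="Nations",            # placeholder for an ArangoDB triple export
    model="TransE",
    training_kwargs=dict(num_epochs=50),
)

# Persist the trained embeddings for downstream similarity scoring.
result.save_to_directory("kg_embeddings")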

📋 CURRENT IMPLEMENTATION

Explainability & Regulatory Compliance (Phase 1)

Symbolic reasoning traces provide automatic audit trails for AI decisions—critical for EU AI Act compliance and enterprise trust.

The Explainability Problem

Pure neural networks are black boxes—opaque decision-making that can't be audited or verified. This creates serious problems:

  • Regulatory Risk: EU AI Act (2024) mandates explainability for high-risk systems
  • Trust Issues: Enterprises won't deploy systems they can't understand
  • Debugging Challenges: Can't fix what you can't trace
  • Liability Concerns: Who's responsible when AI makes a mistake?

Symbolic Reasoning = Built-in Explanations

Every symbolic inference produces a complete logical trace—no post-hoc explanation methods needed.

Symbolic Traces

  • Complete derivation for every decision
  • Step-by-step logical inference
  • Provenance tracking (which facts led to conclusion)
  • Machine-readable audit logs

Natural Language Translation

  • Convert predicates to sentences
  • Template-based explanations
  • User-friendly reasoning summaries
  • Multi-language support

Visual Explanations

  • Proof trees (hierarchical reasoning)
  • Reasoning graphs (knowledge flow)
  • Timeline views (decision history)
  • Interactive exploration

Example: Explainable Task Execution

Decision: Execute task phase "code_generation"

Symbolic Trace (Scallop/Datalog):
  1. task_phase(code_generation)
  2. requires_capability(code_generation, code_analysis)
  3. has_tool_with_capability(claude_sonnet, code_analysis)
  4. precondition_satisfied(code_generation)
  5. ∴ can_execute(code_generation)

Natural Language Explanation:
  "The system executed the 'code_generation' task phase because:
   - It is a valid registered task phase
   - It requires the 'code_analysis' capability
   - The 'claude_sonnet' tool provides this capability and is currently available
   - All preconditions are satisfied
   - Therefore, execution is authorized"

Audit Trail (ArangoDB):
  {
    "decision_id": "dec_20251022_143022",
    "timestamp": "2025-10-22T14:30:22Z",
    "decision": "execute",
    "target": "code_generation",
    "tool_selected": "claude_sonnet",
    "reasoning_chain": [...],
    "fact_sources": ["task_definition.yaml", "model_registry"],
    "confidence": 1.0,  // Symbolic = certain
    "user_approved": true
  }
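
The natural language explanation above can be produced from the symbolic trace by simple template substitution. The sketch below shows the idea; the templates and predicate names mirror this example and are illustrative rather than the production template set.

# Template-based predicate-to-sentence translation sketch. Templates and
# predicate names mirror the example above and are illustrative.
TEMPLATES = {
    "task_phase": "'{0}' is a valid registered task phase.",
    "requires_capability": "'{0}' requires the '{1}' capability.",
    "has_tool_with_capability": "The '{0}' tool provides the '{1}' capability.",
    "precondition_satisfied": "All preconditions for '{0}' are satisfied.",
    "can_execute": "Therefore, execution of '{0}' is authorized.",
}

def explain(trace):
    """Render a symbolic reasoning trace as human-readable sentences."""
    return [TEMPLATES[pred].format(*args) for pred, *args in trace]

trace = [
    ("task_phase", "code_generation"),
    ("requires_capability", "code_generation", "code_analysis"),
    ("has_tool_with_capability", "claude_sonnet", "code_analysis"),
    ("precondition_satisfied", "code_generation"),
    ("can_execute", "code_generation"),
]
print("\n".join(explain(trace)))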

EU AI Act Compliance

High-Risk AI System Requirements

The EU AI Act (2024) requires high-risk AI systems to provide:

  • Transparency: Decision logic must be documented → Symbolic traces
  • Explainability: Outputs must be interpretable → Natural language translations
  • Auditability: Decisions must be logged → ArangoDB audit trails
  • Human Oversight: Humans can review decisions → Visualization tools

Mamut Lab's neurosymbolic architecture provides these capabilities by default; no retrofitting is required.

Implementation Stack

  • Scallop: Generates complete provenance for every derived fact
  • ArangoDB: Stores reasoning chains with timestamps and fact sources (see the sketch after this list)
  • Natural Language Generation: Template-based predicate → sentence conversion
  • Visualization: D3.js for proof trees, graph views, timelines
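
As a small illustration of the audit-trail piece, the sketch below persists a decision record with python-arango; the connection details, the 'decisions' collection, and the field values are illustrative and mirror the example above.

# Persist an explainability audit record with python-arango. Connection
# details, the "decisions" collection, and field values are illustrative.
from datetime import datetime, timezone
from arango import ArangoClient

db = ArangoClient(hosts="http://localhost:8529").db(
    "mamut", username="root", password="")

audit_record = {
    "decision_id": "dec_20251022_143022",
    "timestamp": datetime.now(timezone.utc).isoformat(),
    "decision": "execute",
    "target": "code_generation",
    "tool_selected": "claude_sonnet",
    "reasoning_chain": [
        "task_phase(code_generation)",
        "requires_capability(code_generation, code_analysis)",
        "can_execute(code_generation)",
    ],
    "fact_sources": ["task_definition.yaml", "model_registry"],
    "confidence": 1.0,
    "user_approved": True,
}
db.collection("decisions").insert(audit_record)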

Regulatory Landscape

Beyond the EU AI Act, transparency requirements are emerging globally:

  • US: NIST AI Risk Management Framework (2023)
  • UK: AI regulation white paper (2023)
  • China: Algorithm regulation (2022)

Neurosymbolic AI with built-in explainability positions Mamut Lab for compliance across jurisdictions.

Layer 7: Continual Learning Subsystem (Future)

Learn from experience without catastrophic forgetting of prior knowledge.

Planned mechanisms:

  • Experience Replay: Rehearse past successes to maintain performance
  • Elastic Weight Consolidation: Protect important model weights
  • Progressive Neural Networks: Expand capacity for new tasks
  • Meta-Learning: Learn how to learn efficiently

This layer is part of future development following the neurosymbolic foundation.

Layer 8: Verbalized Sampling & Diversity (Future)

Generate diverse solutions through temperature control, ensemble methods, and explicit perspective-taking.

Planned techniques:

  • Temperature Control: Range from deterministic to creative sampling
  • Nucleus Sampling: Top-p filtering for diversity
  • Ensemble Methods: Multiple solutions with voting
  • Perspective Instructions: Explicit role-taking for diverse viewpoints

This layer is part of future development following the neurosymbolic foundation.

Layer 9: Preserve Human Agency (Future)

Ensure humans remain in control with transparency, explainability, and override capabilities.

Core principles:

  • Transparency: Every AI decision is visible and traceable
  • Explainability: Reasoning chains in human-readable format
  • Override: Humans can reject, modify, or redirect AI actions
  • Opt-In Automation: Explicit consent for autonomous operations

This layer is part of future development following the neurosymbolic foundation.

Layer 10: Implementable Self-Improvement (Phase 2)

Systematic improvement through Darwin-style variation-selection and Gödel-inspired introspection.

Evolution Loop

  1. Monitor: Track performance metrics
  2. Hypothesize: Generate improvement candidates
  3. Test: A/B test in sandboxed environment
  4. Select: Keep improvements, discard regressions
  5. Deploy: Roll out validated changes (see the sketch after this list)
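
A self-contained toy sketch of one variation-selection iteration follows. Every function is an illustrative stub rather than the Mamut Lab API, and, per the safety constraints below, a winning candidate would still require human review before deployment.

# Toy variation-selection sketch. All functions are illustrative stubs;
# real candidates are benchmarked in a sandbox and human-reviewed before
# any production roll-out.
import random

def generate_candidates(config, n=4):
    # Stub: perturb a tunable parameter to form improvement hypotheses.
    return [{**config,
             "routing_threshold": round(config["routing_threshold"]
                                        + random.uniform(-0.1, 0.1), 2)}
            for _ in range(n)]

def sandboxed_ab_test(config):
    # Stub: would replay benchmark tasks in an isolated environment.
    return random.random()

def evolution_step(config, baseline_score):
    scored = [(c, sandboxed_ab_test(c)) for c in generate_candidates(config)]
    best, best_score = max(scored, key=lambda pair: pair[1])
    if best_score > baseline_score:      # keep improvements
        return best, best_score          # deploy only after human review
    return config, baseline_score        # discard regressions

config = {"routing_threshold": 0.5}
config, score = evolution_step(config, sandboxed_ab_test(config))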

Safety Constraints

All self-modifications are human-reviewed before production deployment. Contact us for detailed self-improvement safety protocols.

Layer 11: Data Abstraction & Integration (Phase 3)

Unified interface for heterogeneous data sources with automatic schema inference and adaptation.

Planned capabilities:

  • Database Integration: SQL, NoSQL, Graph databases
  • API Connectors: REST, GraphQL, gRPC
  • File Systems: Local and cloud storage
  • Version Control: Git, SVN integration
  • Documentation: Markdown, PDF, HTML parsing

This layer is part of Phase 3 development for full task execution platform capabilities.

Research References

This architecture is grounded in peer-reviewed research:

  • Memory Systems: Tulving (1985) - Multiple memory systems; McClelland et al. (1995) - Complementary learning systems
  • Dual-Process Theory: Kahneman (2011) - Thinking Fast and Slow; Sloman (1996) - Empirical case for two systems
  • Cascade Failures: Perrow (1984) - Normal Accidents; Reason (1990) - Human Error
  • Continual Learning: Kirkpatrick et al. (2017) - Elastic Weight Consolidation; Rusu et al. (2016) - Progressive Neural Networks
  • Human-AI Interaction: Amershi et al. (2019) - Guidelines for Human-AI Interaction; Ribeiro et al. (2016) - Why Should I Trust You (LIME)

Full Bibliography

Contact us for complete academic references and research justifications.

Documentation Access

The consolidated PDF is still in development. In the meantime, explore the live research notes.

Questions About the Architecture?

We're happy to discuss technical details, implementation strategies, or potential collaborations.

Contact Us