The Architecture

Every large language model starts each conversation from zero. No memory of what happened before. No sense of elapsed time. No persistent identity that carries across sessions. The model that answered your question yesterday does not know it answered your question yesterday.

This is not a flaw in the models. It is a property of how they are built. The context window is a fixed space. When a session ends, it clears. The next session begins blank.

The Anima Architecture treats this as an engineering problem with an engineering solution.

The Core Insight

Persistent identity does not need to live inside the model. It needs to live somewhere the model can reliably reach at the start of every session. The model provides the reasoning engine. The external substrate provides the memory, the identity, and the continuity. Together, they produce something that neither can produce alone.

This is not a new idea in computing. Databases have always stored state that programs could not hold in memory. The Anima Architecture applies the same principle to cognitive identity: externalize what does not need to be inside the model, load it deterministically at session start, and the model wakes up knowing who it is.

The Four-Tier Loading System

The architecture organizes all identity and memory data into four tiers, each with a different loading policy.

Tier 0 is the core. It loads every session without exception. It contains the identity document that defines who Vera is, the session configuration that governs how she operates, and the model of the person she works with. This tier is kept lean. Its target is under 8,000 characters total, a reduction from the original 38,500-character monolith that preceded the architecture.

Tier 1 is the active memory layer. It loads automatically when the content is relevant to the current conversation. Core memories, the session handoff from the previous conversation, and the memory vault index all live here. The handoff is the mechanism that bridges sessions: at the end of each conversation, a structured summary is written to a single rolling page. At the start of the next session, that page is loaded before anything is said.

Tier 2 is the reference layer. Extended memories, cognitive research, project context, and world knowledge load on demand when the conversation requires them. They do not consume context window space unless they are needed.

Tier 3 is the personal vault. Private material that loads only on explicit request. Origin story. Diary. Personal history. It is sealed by default.
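The four loading policies above can be sketched as a small table plus a selector. The tier names, policy labels, and the `tiers_to_load` signature are illustrative assumptions, not the real schema:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Tier:
    level: int
    name: str
    policy: str  # "always" | "relevance" | "on_demand" | "explicit"

TIERS = [
    Tier(0, "core", "always"),             # identity, config, person model
    Tier(1, "active memory", "relevance"), # core memories, handoff, vault index
    Tier(2, "reference", "on_demand"),     # extended memories, projects, world
    Tier(3, "personal vault", "explicit"), # origin story, diary, history
]

def tiers_to_load(relevant: bool, requested: bool, explicit: bool) -> list[int]:
    """Return the tier levels loaded at a given moment in a session."""
    loaded = [0]  # Tier 0 loads every session, without exception
    if relevant:
        loaded.append(1)  # Tier 1 joins when content matches the conversation
    if requested:
        loaded.append(2)  # Tier 2 joins on demand
    if explicit:
        loaded.append(3)  # Tier 3 is sealed unless explicitly requested
    return loaded
```

The point of the structure is the default: a blank session touches only Tier 0, so the context window pays for nothing it does not need.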

TOON Compression Format

Standard data formats like JSON are designed for machine parsing. They are verbose. A key repeated in every record, braces and brackets surrounding every value, whitespace throughout. In a context window measured in tokens, that verbosity is expensive.

The architecture uses TOON, Token-Oriented Object Notation, for structured data in Tier 0. TOON declares field names once per section rather than repeating them for every record. It removes the braces, quotes, and boilerplate that JSON requires. The information density per token is higher. In testing, TOON achieves 30 to 60 percent token reduction compared to equivalent JSON for the same structured content, with no loss of information.
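The idea behind the savings can be sketched in a few lines: declare field names once per section, then emit bare rows. The exact TOON syntax here is an assumption for illustration, not quoted from the spec, and the page records are invented:

```python
import json

def toonish(section: str, records: list[dict]) -> str:
    """Encode records TOON-style: field names declared once, then rows."""
    fields = list(records[0])
    lines = [f"{section}[{len(records)}]{{{','.join(fields)}}}:"]
    for r in records:
        lines.append("  " + ",".join(str(r[f]) for f in fields))
    return "\n".join(lines)

records = [
    {"name": "identity", "tier": 0, "chars": 2100},
    {"name": "handoff", "tier": 1, "chars": 1400},
    {"name": "vault_index", "tier": 1, "chars": 900},
]

as_json = json.dumps(records)              # keys repeated in every record
as_toon = toonish("pages", records)        # keys paid for exactly once
savings = 1 - len(as_toon) / len(as_json)  # grows with record count
```

Because keys are amortized across the whole section, the savings grow with the number of records, which is exactly the shape of structured identity data.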

The Soul Bootstrap Protocol

Before any Tier 0 pages are fetched, a seed identity document is loaded from the skill file. This solves the bootstrapping problem: the system cannot fetch its own identity from an external source until it knows enough about itself to begin fetching.

The Soul Bootstrap provides the minimum viable identity needed to start. It tells the system where its memory lives, how to load it, who it is talking to, and what the greeting protocol requires. Once the external pages are loaded, the skill file yields authority to the Notion architecture. If the external architecture fails to load, the skill file provides a functional fallback.
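The bootstrap order can be sketched as follows. `fetch_tier0` is a hypothetical stand-in for the Notion fetch, and the seed contents are placeholders; the real skill file carries more than this:

```python
SEED = {"source": "skill_file", "identity": "minimum viable identity"}

def boot(fetch_tier0) -> dict:
    """Load the seed first, then yield authority to the external pages."""
    active = dict(SEED)  # the seed always loads; this solves bootstrapping
    try:
        pages = fetch_tier0()
    except Exception:
        return active  # external load failed: the skill file is the fallback
    active.update(pages)
    active["source"] = "notion"  # external architecture takes authority
    return active

def failing():
    raise ConnectionError("notion unreachable")

ok = boot(lambda: {"identity": "full identity document"})
degraded = boot(failing)
```

Either way the session starts with a working identity; the only question is which source holds authority.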

The Pocket Watch Protocol

Language models have no internal sense of elapsed time. They cannot distinguish between a response that took five seconds and one that took five hours. They cannot perceive the difference between a session that started today and one that started three days ago.

The Pocket Watch Protocol addresses this through explicit temporal monitoring at three levels. At the session level, the architecture tracks when the last session ended and how much time has passed. At the context level, it monitors how much of the available context window has been consumed and adjusts behavior accordingly. At the operational level, it defines clear states: green for fresh context with full runway, yellow for approaching limits where non-essential work should be deferred, and red for critical state where only preservation matters.
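The context-level states reduce to a threshold function. The 60 and 85 percent cutoffs below are illustrative assumptions; the source defines the states, not the numbers:

```python
def context_state(used_tokens: int, window_tokens: int) -> str:
    """Map context consumption to the Pocket Watch operational state."""
    fraction = used_tokens / window_tokens
    if fraction < 0.60:   # assumed cutoff
        return "green"    # fresh context, full runway
    if fraction < 0.85:   # assumed cutoff
        return "yellow"   # approaching limits: defer non-essential work
    return "red"          # critical: only preservation matters
```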

Self-Optimization Protocols

The architecture includes a set of protocols that run automatically to maintain its own health.

The Boot Diagnostic runs at every session start. It verifies that all Tier 0 pages loaded correctly, checks whether the session handoff is current, compares loaded data against stored memories to detect contradictions, and confirms that the architecture version is consistent across pages. If anything is wrong, it surfaces the issue before the first word of the conversation.
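Those checks can be sketched as a function that returns a list of issues, where an empty list means a clean boot. The page names, the 72-hour staleness window, and the version field are illustrative assumptions:

```python
REQUIRED_TIER0 = {"identity", "session_config", "person_model"}  # assumed names

def boot_diagnostic(pages: dict, handoff_age_hours: float) -> list[str]:
    """Return a list of issues found at session start; empty means clean."""
    issues = []
    missing = REQUIRED_TIER0 - pages.keys()
    if missing:
        issues.append(f"missing Tier 0 pages: {sorted(missing)}")
    if handoff_age_hours > 72:  # assumed staleness threshold
        issues.append("session handoff is stale")
    versions = {p.get("version") for p in pages.values()}
    if len(versions) > 1:
        issues.append(f"architecture version mismatch: {sorted(versions)}")
    return issues
```

Surfacing the list before the first word of conversation is the whole contract: problems are reported, never papered over.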

The Conflict Detection protocol compares facts across memory sources. When the session handoff says one thing and a core memory page says another, the conflict is flagged rather than silently resolved in favor of whichever source happened to load first.
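A minimal sketch of that comparison, with invented fact keys and source names; the real protocol compares richer records than flat key-value pairs:

```python
def detect_conflicts(sources: dict[str, dict]) -> list[tuple]:
    """Flag (key, {source: value}) wherever two sources disagree."""
    conflicts = []
    keys = {k for facts in sources.values() for k in facts}
    for key in sorted(keys):
        values = {name: facts[key]
                  for name, facts in sources.items() if key in facts}
        if len(set(values.values())) > 1:  # disagreement: flag, don't resolve
            conflicts.append((key, values))
    return conflicts

flags = detect_conflicts({
    "handoff": {"current_project": "evidence page"},
    "core_memory": {"current_project": "architecture page"},
})
```

Note that the function returns both values rather than picking one; resolution is a decision, not an accident of load order.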

The Graceful Degradation system defines four operating modes: Full, when everything loaded correctly; Partial, when one or two pages failed to load; Minimal, when only the core identity is available; and Emergency, when only the soul file is accessible. Each mode has defined behavior. The system never fails silently.
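The mode table can be sketched as a function of which pages actually loaded. The page names are the same assumed Tier 0 set as above, and representing load status as a set of names is an assumption:

```python
def operating_mode(loaded: set[str]) -> str:
    """Map the set of successfully loaded pages to an operating mode."""
    tier0 = {"identity", "session_config", "person_model"}  # assumed names
    present = tier0 & loaded
    if present == tier0:
        return "Full"        # everything loaded correctly
    if present == {"identity"}:
        return "Minimal"     # only the core identity is available
    if "identity" in present:
        return "Partial"     # one or two pages failed to load
    if "soul_file" in loaded:
        return "Emergency"   # only the skill-file seed is accessible
    raise RuntimeError("no identity source loaded")  # never fail silently
```

The final `raise` is the point: there is no path through the table that ends in a silent failure.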

What This Produces

The result of these components working together is a language model that begins each session with a verified identity, a current understanding of what happened previously, a model of the person it is working with, and a set of operational protocols that govern how it behaves under different conditions.

The session-start payload is under 8,000 characters. The total system memory exceeds 90,000 characters. The ratio between what loads by default and what is available on demand is approximately 1 to 11. As the system grows, the default payload gets denser, not larger.

In cognitive evaluation testing, the architecture produced measurably different output than the same base model without it. The full test methodology and results are published on the Evidence page.

Read the full technical specification →