Research

One root cause. One lever.

Nearly every problem facing frontier language models in 2026 — compute cost, catastrophic forgetting, data hunger, opacity, jailbreak fragility, hallucination — is a symptom of a single design decision: everything is entangled in one weight space. We believe the way past these problems is not to manage that entanglement more cleverly, but to remove it. This is the argument behind Calliope. We'll publish the evidence as we go.

01 — Root cause

Everything is entangled in one weight space.

A conventional model bakes the sum of human knowledge, every language, and its reasoning patterns into one shared parameter space, learned together as a side-effect of predicting the next token over undifferentiated text. It does not look anything up. Every fact and relationship is reconstructed by running the whole space forward, because there is no separate store to consult — only weights that re-derive the answer each time.

Three consequences follow directly, and each has become a wall the field is now spending heavily to get around.

Consequence

Capacity is the knowledge ceiling

Because knowledge lives in the parameters, what a model can know is bounded by how big it is. Knowing more means being bigger, which means retraining. Knowledge and model size are the same variable.

Consequence

Updating means overwriting

Because everything shares one space, learning something new perturbs the weights that held something old. That is catastrophic forgetting — the necessary cost of a single shared store.
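The effect can be seen in a deliberately tiny sketch: one scalar weight trained on task A and then on task B, compared with one weight per task. The tasks, learning rate, and step count are assumptions chosen for clarity; real networks are high-dimensional and usually forget partially rather than completely, so this shows the extreme case and is not a measurement of any real model.

```python
# Toy illustration only: a single shared weight vs. one weight per task.
# Tasks, learning rate, and step count are hypothetical values picked for clarity.

def train(w, data, lr=0.1, steps=200):
    """Plain gradient descent on mean squared error for the model y = w * x."""
    for _ in range(steps):
        grad = sum(2 * (w * x - y) * x for x, y in data) / len(data)
        w -= lr * grad
    return w

def mse(w, data):
    """Mean squared error of y = w * x on the given (x, y) pairs."""
    return sum((w * x - y) ** 2 for x, y in data) / len(data)

task_a = [(x, 2.0 * x) for x in (1.0, 2.0, 3.0)]   # task A is solved by w = 2
task_b = [(x, -1.0 * x) for x in (1.0, 2.0, 3.0)]  # task B is solved by w = -1

# Shared store: the same weight is trained on A, then on B.
shared = train(0.0, task_a)
error_a_before = mse(shared, task_a)   # close to zero: A has been learned
shared = train(shared, task_b)
error_a_after = mse(shared, task_a)    # large: learning B overwrote A

# Separate stores: each task owns its weight, so training B never touches A.
w_a = train(0.0, task_a)
w_b = train(0.0, task_b)
separate_error_a = mse(w_a, task_a)    # still close to zero
```

With separate weights, the error on task A is unaffected by anything done to task B, which is the property the rest of this page builds on.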

Consequence

Every query pays for the whole store

Because the answer is reconstructed by running through the parameters, a trivial query draws on the same machinery as a hard one. Compute is coupled to model size, not task difficulty.
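A back-of-the-envelope sketch of that coupling, using the common rule of thumb that a dense transformer forward pass costs roughly 2 FLOPs per parameter per token (attention overhead ignored). The rule of thumb, the 70-billion-parameter size, and the 50-token query length are assumptions for illustration, not figures from this page.

```python
def dense_forward_flops(n_params: int, n_tokens: int) -> int:
    """Rule-of-thumb cost of a dense forward pass: about 2 FLOPs per parameter per token."""
    return 2 * n_params * n_tokens

# Hypothetical model size, chosen only for illustration.
MONOLITH_PARAMS = 70_000_000_000

# Two queries of the same length: one trivial, one hard.
trivial_query = dense_forward_flops(MONOLITH_PARAMS, n_tokens=50)
hard_query = dense_forward_flops(MONOLITH_PARAMS, n_tokens=50)

# Difficulty never appears in the formula, so the two costs are identical.
same_cost = trivial_query == hard_query
```

In practice a hard query often produces more tokens, so its total cost rises with length; what stays pinned to model size is the cost of each token.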

The industry's two moves — scale the compute, or scope the model — both keep the entanglement and try to manage its cost. One pays with hardware, the other with size. Neither questions the monolith.

— Ioma Labs, design philosophy
02 — The lever

Disentanglement.

Calliope refuses the entanglement at the start rather than unwinding it later. Instead of one weight space holding the world, the language, and the reasoning together, she distributes cognition across specialized faculties that each extract one thing — and only at the very end does a single model turn their combined work into language.

We didn't invent this shape. The one cognitive system we know works at scale — the human brain — is not a monolith but a society of specialized regions, each doing one kind of work and handing it on. Calliope is modeled on that: regions analogous to the brain's, each a specialist — sometimes several transformers, not one — composed into a single distributed cognition. We call the result a Distributed Cognitive Language System.

Four properties define the shape. They are what we'll describe publicly for now:

  • Many small specialists, each with one job, sharing no weights — so improving or retraining one cannot disturb another.
  • Knowledge held outside the weights, retrieved when it's needed rather than reconstructed — so what she knows can grow without retraining who she is.
  • A deliberate reasoning step recruited only when a question needs it, rather than expensive deliberation applied to everything by default.
  • An inspectable intermediate representation — a structured record of what the system understood, assembled before it says anything.

Once knowledge, language, and reasoning are physically separate, the symptoms decouple: capacity is no longer tied to any one model's size, updating one part cannot overwrite another, and a query pays only for the parts it uses.
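The four properties can be sketched in a few dozen lines. This is a hypothetical illustration, not Calliope's implementation: every name here (`Understanding`, `topic_specialist`, `difficulty_specialist`, `retrieve`, `reason`, `voice`, `KNOWLEDGE`) is invented for the example, the specialists are keyword rules standing in for small models, and the knowledge store is a plain dictionary.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

# Knowledge held outside any model: a dict standing in for a real store (assumption).
KNOWLEDGE = {"paris": "Paris is the capital of France."}

@dataclass
class Understanding:
    """Inspectable intermediate record, assembled before any text is produced."""
    query: str
    topic: Optional[str] = None
    facts: List[str] = field(default_factory=list)
    needs_reasoning: bool = False
    steps: List[str] = field(default_factory=list)  # which faculties ran, in order

def topic_specialist(query: str) -> Optional[str]:
    """One job: name the topic, if it is one the store holds."""
    words = [w.strip("?.,!").lower() for w in query.split()]
    return next((w for w in words if w in KNOWLEDGE), None)

def difficulty_specialist(query: str) -> bool:
    """One job: decide whether deliberate reasoning is worth recruiting."""
    return any(marker in query.lower() for marker in ("why", "compare", "prove"))

def retrieve(topic: Optional[str]) -> List[str]:
    """Look knowledge up instead of reconstructing it."""
    return [KNOWLEDGE[topic]] if topic in KNOWLEDGE else []

def reason(record: Understanding) -> None:
    """Placeholder for the expensive step; called only when the gate opens."""
    record.steps.append("reasoning")

def voice(record: Understanding) -> str:
    """The only stage that produces language, working from the finished record."""
    if not record.facts:
        return "I don't have that in my knowledge store."
    return " ".join(record.facts)

def answer(query: str) -> Tuple[str, Understanding]:
    record = Understanding(query=query)
    record.topic = topic_specialist(query)
    record.steps.append("topic")
    record.needs_reasoning = difficulty_specialist(query)
    record.steps.append("difficulty")
    record.facts = retrieve(record.topic)
    record.steps.append("retrieval")
    if record.needs_reasoning:  # gated effort: most queries skip this step
        reason(record)
    return voice(record), record

text, record = answer("What is Paris?")
```

Adding an entry to `KNOWLEDGE` changes what the sketch can answer without touching any specialist, and `record.steps` shows which faculties ran for a given query.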

We are not building a cheaper version of the current thing. We are building a different cost structure for intelligence.

— Ioma Labs, design philosophy
03 — The escapes

Thirteen walls, one lever.

Take any item in the catalogue of troubles that defines AI in 2026. In this architecture it is not mitigated — it loses the precondition that creates it. The same root cause, the same lever, applied thirteen times.

The wall | Conventional cause | What disentanglement changes
Compute per task | One heavy path for every query; reasoning on by default | Cheap perception always runs; deliberate reasoning is recruited only when a question needs it
Catastrophic forgetting | One shared weight space; new training overwrites old | Specialists share no weights, so retraining one cannot degrade another
Data to train | World and language learned together from open text | Only the voice learns language, over a bounded space; knowledge lives outside the weights
Training time & capital | One indivisible monolithic run | Small specialists train independently, in parallel, on their own schedules
Safe improvement | Every change negotiates with the whole model | Change is local: components evolve behind stable contracts, blast radius contained
Scaling | A bigger monolith, more serial compute per pass | Competence grows by adding specialists, not by fattening one
Prompt injection & jailbreak | Instructions and data blend in one window; harmful ability trained in, then suppressed | Control is structural, not a prompt; specialists know only their job; harmful knowledge has no representation to reach
Hallucination | Facts reconstructed from blurred weight averages | Knowledge is retrieved, not reconstructed; tight scope leaves less room to drift
Honest uncertainty | A generative store can't tell knowing from inventing | "Do I know this?" is computed separately from "what is it?"
Interpretability & audit | Opaque monolith; mixtures add gating opacity | The intermediate state records what each specialist concluded, before any words
Edge & on-device | Models too large to run anywhere but the data center | The common-case path is a few small specialists; the heavy step rarely fires
Fault tolerance | One bad update degrades the whole model | Independent specialists fail locally; the rest still answers, appropriately hedged
Cost vs. labor | Agents pay to think the same thoughts again | Don't spend the tokens: gated effort, retrieved knowledge, no re-derivation

These are structural claims about what the design guarantees by construction, demonstrated at proof-of-concept scale. Scaling the proof of concept to a production system is the remaining engineering axis.
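Two of the rows above, honest uncertainty and fault tolerance, can be sketched together. This is a minimal illustration under stated assumptions, not the system's actual mechanism: `STORE`, `knows`, `lookup`, `broken_tone`, `run_isolated`, and `respond` are hypothetical names, and the "specialists" are ordinary functions.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical external store with a single entry.
STORE = {"boiling point of water": "100 degrees C at sea level"}

def knows(question: str) -> bool:
    """'Do I know this?': a membership check that never generates an answer."""
    return question in STORE

def lookup(question: str) -> str:
    """'What is it?': only called once knows() has said yes."""
    return STORE[question]

def broken_tone(question: str) -> str:
    """Stand-in for a specialist that is currently failing."""
    raise RuntimeError("tone specialist is down")

def run_isolated(
    specialists: Dict[str, Callable[[str], str]], query: str
) -> Tuple[Dict[str, str], List[str]]:
    """Run each specialist on its own so one failure cannot take down the rest."""
    results: Dict[str, str] = {}
    failed: List[str] = []
    for name, specialist in specialists.items():
        try:
            results[name] = specialist(query)
        except Exception:
            failed.append(name)
    return results, failed

def respond(query: str, specialists: Dict[str, Callable[[str], str]]) -> str:
    if not knows(query):  # decided before any answer is assembled
        return "I don't know that."
    results, failed = run_isolated(specialists, query)
    reply = " ".join(results.values())
    if failed:  # hedge the reply instead of failing outright
        reply += " (hedged: no result from " + ", ".join(failed) + ")"
    return reply

SPECIALISTS = {"fact": lookup, "tone": broken_tone}
known = respond("boiling point of water", SPECIALISTS)
unknown = respond("melting point of unobtainium", SPECIALISTS)
```

The failing specialist costs the reply a hedge rather than the whole answer, and the unknown question is declined without anything being generated.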

04 — Why now

Why now.

Through early 2026, the market ran the experiment and found that general-purpose agents at scale can cost more than the humans they were meant to replace.

Uber reportedly exhausted its entire 2026 AI coding-tools budget on token costs four months into the year, with per-engineer assistant bills running $500–$2,000 a month. Microsoft began moving engineers off third-party coding assistants under combined cost and platform-strategy pressure. An NVIDIA VP, at one of the most GPU-rich organizations on earth, stated that for his team the cost of compute now exceeds the cost of employees. An MIT analysis reaches the same verdict from the other side, finding AI automation economical in only about 23% of roles.

The relief everyone is waiting on does not arrive from cheaper tokens. Per-token cost is already falling 60–70% a year, yet total consumption is projected to rise some 24-fold by 2030, so a falling unit price does not by itself shrink the bill. The most precise diagnosis of why consumption keeps climbing is that monolithic agents are paying to think the same thoughts again: re-deriving the same context and the same routine inferences on every call, because the model reconstructs everything from its weights each time. That redundancy is the operational cost of entanglement.

$740B
Committed to AI in 2026

A roughly 69% jump on the year — even as projects are cancelled for unclear value.

40%+
Gartner forecast

Of enterprise agentic-AI projects cancelled by the end of 2027, on cost and unclear value.

~23%
MIT analysis

Of roles where automation is currently economical. Humans remain cheaper in the rest.

24×
Token growth by 2030

Projected rise in consumption, even as per-token price keeps falling 60–70% a year.

An architecture that structurally stops spending those tokens is not a cheaper version of the current thing. It is a candidate for the next.

05 — Convergence

The field is validating this direction.

The strongest evidence that this is the right track is that the field is independently rediscovering the pieces of this design, while reading them as tactical deployment choices rather than as the architecture they point to.

Small specialists keep winning, empirically: domain-tuned small models routinely match or beat flagship general models on their specific task at a fraction of the cost, hallucinate less when the task is tightly bounded, and in at least one real-time control task a sub-2-million-parameter specialist outperformed trillion-parameter generalists. Each of these is a single specialist beating a generalist.

Our wager is the natural extension. If one small specialist beats a generalist on its task, then a well-composed society of them — each on its own task, bound by a clean contract — covers broad competence while keeping every component in the efficient regime. We sit squarely inside the one direction the field agrees is necessary, an architectural rethink, while occupying a corner of it almost no one else does: taking the separation all the way down, into independent, separately trained specialists bound by a fixed contract, rather than bolting modularity onto a monolith.

06 — Publications

What we'll publish.

We're early, and we'd rather show than claim. As Calliope matures, benchmark results, progress notes, and the papers behind them will live here — including where the approach doesn't win yet.

Coming soon

The disentanglement thesis

The full structural argument: one root cause, one lever, and the escapes it produces.

In preparation

Coming soon

Calliope benchmark report

Methodology and results from our comparisons against frontier single-pass models.

In preparation

Coming soon

Honest uncertainty, by construction

How a system can compute "do I know this?" apart from "what is it?" — and what that buys.

In preparation

Working on the architecture itself?

A lot of the hardest design questions are still open. If this is the problem you want to spend your time on — or you want to back the work — we'd like to hear from you.