Research
One root cause. One lever.
Nearly every problem facing frontier language models in 2026 — compute cost, catastrophic forgetting, data hunger, opacity, jailbreak fragility, hallucination — is a symptom of a single design decision: everything is entangled in one weight space. We believe the way past these problems is not to manage that entanglement more cleverly, but to remove it. This is the argument behind Calliope. We'll publish the evidence as we go.
Everything is entangled in one weight space.
A conventional model bakes the sum of human knowledge, every language, and its reasoning patterns into one shared parameter space, learned together as a side-effect of predicting the next token over undifferentiated text. It does not look anything up. Every fact and relationship is reconstructed by running the whole space forward, because there is no separate store to consult — only weights that re-derive the answer each time.
Three consequences follow directly, and each has become a wall the field is now spending against.
Capacity is the knowledge ceiling
Because knowledge lives in the parameters, what a model can know is bounded by how big it is. Knowing more means being bigger, which means retraining. Knowledge and model size are the same variable.
Updating means overwriting
Because everything shares one space, learning something new perturbs the weights that held something old. That is catastrophic forgetting — the necessary cost of a single shared store.
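A toy illustration of the mechanism, not a measurement of any real model: a single scalar weight is fit to one task by gradient descent, then to a second, conflicting task. The two tasks, the learning rate, and the function names are assumptions made for this sketch.

```python
def train(w, slope, steps=200, lr=0.1):
    """Fit y = w * x to the target y = slope * x by gradient descent on squared error."""
    xs = [0.5, 1.0, 1.5]
    for _ in range(steps):
        # Gradient of mean((w*x - slope*x)^2) with respect to w.
        grad = sum(2 * x * x * (w - slope) for x in xs) / len(xs)
        w -= lr * grad
    return w


def error(w, slope):
    """Absolute gap between the learned weight and the task's true slope."""
    return abs(w - slope)


TASK_A, TASK_B = 2.0, -1.0  # hypothetical tasks: two different target slopes

# Shared store: the same weight is trained on A, then on B.
shared = train(0.0, TASK_A)
error_a_before = error(shared, TASK_A)
shared = train(shared, TASK_B)
error_a_after = error(shared, TASK_A)

# Separate stores: each task owns its weight, so training B never touches A.
w_a = train(0.0, TASK_A)
w_b = train(0.0, TASK_B)

print("shared weight, error on A before B:", round(error_a_before, 3))
print("shared weight, error on A after B: ", round(error_a_after, 3))
print("separate weights, error on A:      ", round(error(w_a, TASK_A), 3))
```

In the shared case the weight that fit task A is pulled to task B's solution, so A's error returns. With separate weights there is nothing to overwrite. Real networks have many parameters and forget less abruptly, but the coupling is the same in kind.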
Every query pays for the whole store
Because the answer is reconstructed by running through the parameters, a trivial query draws on the same machinery as a hard one. Compute is coupled to model size, not task difficulty.
The industry's two moves — scale the compute, or scope the model — both keep the entanglement and try to manage its cost. One pays with hardware, the other with size. Neither questions the monolith.
— Ioma Labs, design philosophy
Disentanglement.
Calliope refuses the entanglement at the start rather than unwinding it later. Instead of one weight space holding the world, the language, and the reasoning together, she distributes cognition across specialized faculties that each extract one thing — and only at the very end does a single model turn their combined work into language.
We didn't invent this shape. The one cognitive system we know works at scale — the human brain — is not a monolith but a society of specialized regions, each doing one kind of work and handing it on. Calliope is modeled on that: regions analogous to the brain's, each a specialist — sometimes several transformers, not one — composed into a single distributed cognition. We call the result a Distributed Cognitive Language System.
Four properties define the shape. They are what we'll describe publicly for now:
- Many small specialists, each with one job, sharing no weights — so improving or retraining one cannot disturb another.
- Knowledge held outside the weights, retrieved when it's needed rather than reconstructed — so what she knows can grow without retraining who she is.
- A deliberate reasoning step recruited only when a question needs it, rather than expensive deliberation applied to everything by default.
- An inspectable intermediate representation — a structured record of what the system understood, assembled before it says anything.
Once knowledge, language, and reasoning are physically separate, the symptoms decouple: capacity is no longer tied to any one model's size, updating one part cannot overwrite another, and a query pays only for the parts it uses.
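The four properties can be sketched in a few dozen lines. This is a minimal illustration under stated assumptions, not Calliope's implementation: the source does not describe her internals, so every name here (`Record`, `KNOWLEDGE`, `topic_specialist`, `difficulty_specialist`, `retrieve`, `reason`, `voice`, `answer`) is hypothetical, and plain functions stand in for trained models.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Knowledge held outside any model weights. Growing it is a dictionary write,
# not a training run. (Hypothetical store; a real one might be a database or index.)
KNOWLEDGE = {"boiling point of water": "100 degrees Celsius at standard pressure"}


@dataclass
class Record:
    """Inspectable intermediate state, assembled before any text is produced."""
    query: str
    topic: Optional[str] = None
    facts: List[str] = field(default_factory=list)
    needs_reasoning: bool = False
    steps_run: List[str] = field(default_factory=list)


def topic_specialist(record):
    """Cheap perception step: which stored topic, if any, is the query about?"""
    record.steps_run.append("topic")
    q = record.query.lower()
    record.topic = next((t for t in KNOWLEDGE if t in q), None)


def difficulty_specialist(record):
    """Cheap gate (a crude stand-in rule): does this query need deliberation?"""
    record.steps_run.append("difficulty")
    record.needs_reasoning = record.query.lower().startswith("why")


def retrieve(record):
    """Look the fact up instead of reconstructing it."""
    record.steps_run.append("retrieve")
    if record.topic is not None:
        record.facts.append(KNOWLEDGE[record.topic])


def reason(record):
    """Placeholder for the expensive step; it runs only when the gate says so."""
    record.steps_run.append("reason")
    record.facts.append("(deliberate reasoning would run here)")


def voice(record):
    """The only step that produces language, and it reads from the record."""
    if record.topic is None:
        return "I have nothing stored on that."
    return f"{record.topic}: {' '.join(record.facts)}"


def answer(query):
    record = Record(query)
    topic_specialist(record)
    difficulty_specialist(record)
    retrieve(record)
    if record.needs_reasoning:
        reason(record)
    return record, voice(record)


record, text = answer("What is the boiling point of water?")
print(record.steps_run)
print(text)
```

For the lookup question, `reason` never appears in `record.steps_run`. A query that begins with "Why" triggers it. Because the specialists are separate functions that communicate only through `Record`, replacing one leaves the others untouched, and the record can be inspected before `voice` says anything.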
We are not building a cheaper version of the current thing. We are building a different cost structure for intelligence.
— Ioma Labs, design philosophy
Thirteen walls, one lever.
Take any item in the catalogue of troubles that defines AI in 2026. In this architecture it is not mitigated — it loses the precondition that creates it. The same root cause, the same lever, applied thirteen times.
| The wall | Conventional cause | What disentanglement changes |
|---|---|---|
| Compute per task | One heavy path for every query; reasoning on by default | Cheap perception always runs; deliberate reasoning is recruited only when a question needs it |
| Catastrophic forgetting | One shared weight space; new training overwrites old | Specialists share no weights — retraining one cannot degrade another |
| Data to train | World and language learned together from open text | Only the voice learns language, over a bounded space; knowledge lives outside the weights |
| Training time & capital | One indivisible monolithic run | Small specialists train independently, in parallel, on their own schedules |
| Safe improvement | Every change negotiates with the whole model | Change is local — components evolve behind stable contracts, blast radius contained |
| Scaling | A bigger monolith, more serial compute per pass | Competence grows by adding specialists, not by fattening one |
| Prompt injection & jailbreak | Instructions and data blend in one window; harmful ability trained in, then suppressed | Control is structural, not a prompt; specialists know only their job; harmful knowledge has no representation to reach |
| Hallucination | Facts reconstructed from blurred weight averages | Knowledge is retrieved, not reconstructed; tight scope leaves less room to drift |
| Honest uncertainty | A generative store can't tell knowing from inventing | "Do I know this?" is computed separately from "what is it?" |
| Interpretability & audit | Opaque monolith; mixtures add gating opacity | The intermediate state records what each specialist concluded, before any words |
| Edge & on-device | Models too large to run anywhere but the data center | The common-case path is a few small specialists; the heavy step rarely fires |
| Fault tolerance | One bad update degrades the whole model | Independent specialists fail locally; the rest still answers, appropriately hedged |
| Cost vs. labor | Agents pay to think the same thoughts again | Don't spend the tokens: gated effort, retrieved knowledge, no re-derivation |
These are structural claims about what the design guarantees by construction, demonstrated at proof-of-concept scale. Scaling the proof of concept to a production system is the remaining engineering axis.
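The fault-tolerance row can be illustrated with a small sketch: independent components run behind a common calling convention, a failure in one is recorded rather than propagated, and the reply is hedged accordingly. The specialist names and the `run_specialists` and `respond` helpers are assumptions made for this example, not part of any published Calliope interface.

```python
def run_specialists(specialists, query):
    """Run each independent specialist; a failure is recorded, not propagated."""
    results, failed = {}, []
    for name, fn in specialists.items():
        try:
            results[name] = fn(query)
        except Exception:
            failed.append(name)
    return results, failed


def respond(results, failed):
    """Compose a reply from whatever succeeded, hedging if anything is missing."""
    parts = [f"{name}: {value}" for name, value in results.items()]
    text = "; ".join(parts) if parts else "no answer available"
    if failed:
        text += f" (partial answer; unavailable: {', '.join(failed)})"
    return text


def language_id(query):
    """Hypothetical healthy specialist."""
    return "en"


def sentiment(query):
    """Hypothetical specialist broken by a bad update."""
    raise RuntimeError("bad update")


specialists = {"language": language_id, "sentiment": sentiment}
results, failed = run_specialists(specialists, "hello")
print(respond(results, failed))
```

The broken `sentiment` component does not stop `language_id` from answering, and the reply states what is missing. The isolation comes from the components sharing no state, which is the structural property the table describes.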
Why now.
Through early 2026, the market ran the experiment and found that general-purpose agents at scale can cost more than the humans they were meant to replace.
Uber reportedly exhausted its entire 2026 AI coding-tools budget on token costs four months into the year, with per-engineer assistant bills running $500–$2,000 a month. Microsoft began moving engineers off third-party coding assistants under combined cost and platform-strategy pressure. An NVIDIA VP — at one of the most GPU-rich organizations on earth — stated that for his team the cost of compute now exceeds the cost of employees, a verdict an MIT analysis echoes from the other side, finding AI automation economical in only about 23% of roles.
The relief everyone is waiting on does not arrive from cheaper tokens. Per-token cost is already falling 60–70% a year, yet total consumption is projected to rise some 24-fold by 2030 — so the bill grows regardless of unit price. The most precise diagnosis of why is that monolithic agents are paying to think the same thoughts again: re-deriving the same context and the same routine inferences on every call, because the model reconstructs everything from its weights each time. That redundancy is the operational cost of entanglement.
A roughly 69% jump on the year — even as projects are cancelled for unclear value.
Of enterprise agentic-AI projects cancelled by the end of 2027, on cost and unclear value.
About 23%: the share of roles where automation is currently economical. Humans remain cheaper in the rest.
24-fold by 2030: the projected rise in consumption, even as per-token price keeps falling 60–70% a year.
An architecture that structurally stops spending those tokens is not a cheaper version of the current thing. It is a candidate for the next.
The field is validating this direction.
The strongest evidence that this is the right track is that the field is rediscovering its pieces independently — and reading them as tactical deployment choices rather than as the architecture they point to.
Small specialists keep winning, empirically: domain-tuned small models routinely match or beat flagship general models on their specific task at a fraction of the cost, hallucinate less when the task is tightly bounded, and in at least one real-time control task a sub-2-million-parameter specialist outperformed trillion-parameter generalists. Each of these is a single specialist beating a generalist.
Our wager is the natural extension. If one small specialist beats a generalist on its task, then a well-composed society of them — each on its own task, bound by a clean contract — covers broad competence while keeping every component in the efficient regime. We sit squarely inside the one direction the field agrees is necessary, an architectural rethink, while occupying a corner of it almost no one else does: taking the separation all the way down, into independent, separately trained specialists bound by a fixed contract, rather than bolting modularity onto a monolith.
What we'll publish.
We're early, and we'd rather show than claim. As Calliope matures, benchmark results, progress notes, and the papers behind them will live here — including where the approach doesn't win yet.
The disentanglement thesis
The full structural argument: one root cause, one lever, and the escapes it produces.
In preparation
Calliope benchmark report
Methodology and results from our comparisons against frontier single-pass models.
In preparation
Honest uncertainty, by construction
How a system can compute "do I know this?" apart from "what is it?" — and what that buys.
In preparation
Working on the architecture itself?
A lot of the hardest design questions are still open. If this is the problem you want to spend your time on — or you want to back the work — we'd like to hear from you.