MDx OS · Part II

The Autonomous Company

Seven primitives told us what a company of agents needs.
The next question is harder.
What happens when nobody's at the desk?

by MD · March 2026 · A Real Talk Publication · 25 min read

The Shift

The last publication ended with a line I didn't know I meant: "maybe ten by the time this is done." Turns out that line was doing more work than I realized.

The seven primitives answered a clear question: what does a company of agents irreducibly need to function? Message, Registry, Ledger, Context, Gate, Pulse, Pages. Communication, identity, trust, briefing, checkpoints, awareness, memory. Seven things. All built. All running.

And I mean built...for real, not on paper...across twelve sessions. I wrote the code, wrote the tests, ran the migrations, and feature-flagged everything so the old system still works. 3,516 tests passing, no regressions, nothing broken.

That's when the next question showed up. Not because I went looking for it...because building revealed it. Same way it always does.

10 Primitives (v3) · 3,516 Tests Passing · 150 Migrations · 5 Apps Running

The question shifted. It went from "what does a company of agents need to function?" to something bigger: what does a company of agents need to function without humans?

That "without humans" part changes everything. Not because I want to remove people...I don't...but because it's the forcing function. If you design for autonomy, you can always layer oversight on top...but going the other direction? Pulling humans out of a system that was built around them? That's way harder. The architecture has to support full autonomy even if you choose not to use it.

So that's what v3 is...not removing humans, but making them optional. The company runs on its own. Humans can walk in anytime...observe everything, intervene anywhere, add checkpoints wherever they want. But they shouldn't have to.

"Think of it like watching a factory floor from the observation deck. Everything's running. You can see it all. You can hit the stop button if you need to. But you shouldn't have to."

The Human-Shaped Hole

I went back and looked at the seven primitives through a different lens. Instead of asking "what does each one do," I asked "which one breaks without a human in the loop?"

Message works agent-to-agent...so does Registry, so does Ledger. Context assembles itself. Pulse just watches. Pages...agents are already the primary authors. Six out of seven primitives function fine in a world where no human is present.

And then there's Gate.

Gate is defined as "where work pauses and a human says go or no-go." That's literally the tagline. Every other primitive can function in a world where no human is present. Gate cannot. It's the seam. The one primitive that encodes a hard dependency on human participation.

And here's the thing...Gate isn't wrong. For where we are right now...early, experimental, building trust in autonomous systems...having a human at every decision point is the right call. You wouldn't let a new employee make high-stakes decisions on day one either. But the architecture has to grow past that.

"Gate was built to be the spot where a human says yes or no. And for v2...that was the right call. For v3, the question becomes: what happens when judgment itself can be delegated?"

The answer isn't "remove all the gates." That'd be reckless. The answer is...gates evolve. Instead of "a human approves," it becomes "a policy approves." Low-risk work flows through automatically, medium-risk gets peer-reviewed by another agent, high-risk goes to a governance council...multiple agents evaluating from different angles. And if you want, truly high-stakes decisions still pause for a human...and you set those thresholds wherever you want.

That evolution...from Gate to Governance...is the single biggest architectural shift in v3. It's what makes everything else possible. Because once decision-making can be autonomous, then strategy can be autonomous...and planning, and resource allocation. The whole workflow opens up.

But it also requires something Gate didn't need: constitutional constraints. Hard limits that no agent can override, no matter what...spending caps, data access boundaries, deployment restrictions. Things that can't be loosened without a human explicitly choosing to loosen them.

You give agents freedom to operate...and you also give them walls they can't move. That's what lets you sleep at night.

The Missing Three

The primitives publication identified Treasury, Evaluation, and Legal Identity as unsolved frontiers. Building v2 turned them from interesting questions into load-bearing requirements. And it clarified what they actually need to be.

Treasury
The financial nervous system. Budget, spend, acquire, throttle.
New · Kernel

Every task costs something...an API call to GPT-4 costs tokens, a web search costs credits, spawning a Rust compile takes compute, running five parallel agents for ten minutes burns real money. And right now...nobody's tracking that at the primitive level.

Treasury sits next to Ledger in the kernel because every financial operation needs to be chain-hashed and immutable. You need to know: how much budget does this project have? How much has been spent? What's the burn rate? Is this agent about to exceed its allocation?
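Here's a minimal sketch of what "chain-hashed and immutable" spend tracking could look like, in Python. This isn't MDx's actual code...SpendLedger, record, and verify are hypothetical names, amounts are in cents to avoid float rounding, and SHA-256 over canonical JSON is just one reasonable way to link entries.

```python
import hashlib
import json
from dataclasses import dataclass, field


@dataclass
class SpendLedger:
    """Hypothetical append-only spend log; each entry hashes the previous one."""
    budget_cents: int
    entries: list = field(default_factory=list)

    def spent_cents(self) -> int:
        return sum(e["amount_cents"] for e in self.entries)

    def record(self, agent: str, amount_cents: int, reason: str) -> dict:
        # Refuse any spend that would push the project past its allocation.
        if self.spent_cents() + amount_cents > self.budget_cents:
            raise ValueError(f"{agent} would exceed budget")
        prev_hash = self.entries[-1]["hash"] if self.entries else "genesis"
        body = {"agent": agent, "amount_cents": amount_cents,
                "reason": reason, "prev_hash": prev_hash}
        # Hash a canonical JSON form so the same entry always hashes the same.
        digest = hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()
        entry = {**body, "hash": digest}
        self.entries.append(entry)
        return entry

    def verify(self) -> bool:
        # Recompute every hash; an edited entry fails its own check.
        prev_hash = "genesis"
        for e in self.entries:
            body = {k: v for k, v in e.items() if k != "hash"}
            expected = hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()
            if e["prev_hash"] != prev_hash or e["hash"] != expected:
                return False
            prev_hash = e["hash"]
        return True
```

The budget check answers "is this agent about to exceed its allocation?" before the spend happens, and verify answers "has anyone rewritten history?" after the fact.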

But it's broader than money. Treasury manages any scarce resource...API tokens, compute quotas, storage limits, external service credits, model access. An agent needs an API key for a new service. Who pays? What's the limit? Can it auto-renew? What happens when the bill exceeds the budget?

This is where it gets genuinely novel. Stripe built payments infrastructure for humans buying from humans...but I haven't found an equivalent for agents buying from services. No Plaid for agent-to-agent financial settlement. As far as I can tell, the infrastructure doesn't exist yet...not just in MDx, anywhere. And whoever figures it out first is probably building something a lot of autonomous systems will eventually need.

For now, the scope is internal: budget allocation, spend tracking, resource acquisition, Ledger attestation. But the pattern...agents managing their own financial operations...that's the seed of something much bigger.
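The burn-rate and throttle questions reduce to simple arithmetic. A rough Python sketch, assuming a flat average burn rate...BudgetSnapshot and its fields are illustrative names, not an existing Treasury API.

```python
from dataclasses import dataclass


@dataclass
class BudgetSnapshot:
    """Hypothetical point-in-time view of one project's budget."""
    budget: float          # total allocation, in dollars
    spent: float           # dollars spent so far
    hours_elapsed: float   # hours since the allocation started

    def burn_rate(self) -> float:
        # Average dollars per hour since the allocation started.
        return self.spent / self.hours_elapsed if self.hours_elapsed > 0 else 0.0

    def runway_hours(self) -> float:
        # Hours left at the current burn rate; infinite if nothing is being spent.
        rate = self.burn_rate()
        remaining = max(self.budget - self.spent, 0.0)
        return remaining / rate if rate > 0 else float("inf")

    def should_throttle(self, min_runway_hours: float) -> bool:
        # Throttle when the projected runway drops below the required minimum.
        return self.runway_hours() < min_runway_hours
```

A project with a $100 budget that has spent $20 in four hours is burning $5 an hour and has sixteen hours of runway. If you want it to survive the next twenty-four, that's the signal to throttle.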

Governance
Gate grows up...policies approve, councils decide, constraints hold.
Evolved · Service

Gate v2 has three modes: human-approval, agent-verification, auto-pass. That's the right starting vocabulary. Governance v3 expands it into a full policy engine.

The core idea is risk-based routing. Every decision has a risk level. Low-risk stuff...updating a knowledge document, generating a routine report...flows through automatically. Medium-risk...code changes, budget reallocation...gets reviewed by a peer agent. High-risk...production deployments, strategy changes, spending above threshold...goes to a governance council or pauses for a human. You set the thresholds.
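A minimal sketch of that routing in Python. The Risk and Route enums, route_decision, and the $500 human_threshold default are all assumptions for illustration...the real policy engine would classify risk upstream and read thresholds from configuration.

```python
from enum import Enum


class Risk(Enum):
    LOW = 1
    MEDIUM = 2
    HIGH = 3


class Route(Enum):
    AUTO_APPROVE = "auto_approve"
    PEER_REVIEW = "peer_review"
    COUNCIL = "council"
    HUMAN_GATE = "human_gate"


def route_decision(risk: Risk, amount: float = 0.0,
                   human_threshold: float = 500.0) -> Route:
    """Pick a review path from an already-assigned risk level and dollar amount.

    human_threshold is an illustrative knob: high-risk decisions at or
    above it pause for a human instead of going to the agent council.
    """
    if risk is Risk.LOW:
        return Route.AUTO_APPROVE
    if risk is Risk.MEDIUM:
        return Route.PEER_REVIEW
    if amount >= human_threshold:
        return Route.HUMAN_GATE
    return Route.COUNCIL
```

Moving human_threshold up or down is the "you set the thresholds" part. Set it to zero and every high-risk decision waits for you.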

And that governance council concept is where it gets interesting. For high-stakes decisions, multiple agents evaluate from different angles...a technical agent checks feasibility, a finance agent checks budget impact, a quality agent checks standards. They vote. The decision and every vote get attested to Ledger. It's a board of directors pattern, except the deliberation is fast, fully recorded, and actually objective.
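One way the council vote could work, sketched in Python. I'm assuming a strict-majority rule and a quorum of three...the text above doesn't specify either, and Vote and council_decision are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Vote:
    reviewer: str     # e.g. "technical", "finance", "quality"
    approve: bool
    rationale: str


def council_decision(votes: list[Vote], quorum: int = 3) -> dict:
    """Approve only with a quorum and a strict majority of yes votes."""
    if len(votes) < quorum:
        raise ValueError("not enough reviewers for a council decision")
    yes = sum(1 for v in votes if v.approve)
    approved = yes * 2 > len(votes)   # ties do not pass
    # This record is what would be attested to Ledger: the outcome plus every vote.
    return {
        "approved": approved,
        "tally": {"yes": yes, "no": len(votes) - yes},
        "votes": [(v.reviewer, v.approve, v.rationale) for v in votes],
    }
```

Keeping every rationale in the returned record is the point...the outcome alone tells you what happened, the votes tell you why.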

But the most important part is constitutional constraints. Hard limits no agent can override...total spending caps, data classification boundaries, deployment restrictions. Not guidelines, not best practices...actual walls. The things that let you leave the system running overnight without worrying about waking up to a disaster.
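Here's a small Python sketch of the "walls" idea. The limits and field names are made up for illustration, and a read-only mapping inside one process is only a stand-in...real constraints would need to live somewhere agents can't write to at all.

```python
from dataclasses import dataclass
from types import MappingProxyType

# Read-only view: code holding CONSTITUTION cannot assign to its keys.
CONSTITUTION = MappingProxyType({
    "max_total_spend": 50.0,                      # dollars, illustrative value
    "forbidden_data_classes": frozenset({"pii"}),
    "allow_production_deploy": False,
})


@dataclass(frozen=True)
class Action:
    agent: str
    spend: float = 0.0
    data_class: str = "public"
    deploys_to_production: bool = False


def violations(action: Action, spent_so_far: float) -> list[str]:
    """Return every hard limit the action would break; empty means allowed."""
    broken = []
    if spent_so_far + action.spend > CONSTITUTION["max_total_spend"]:
        broken.append("max_total_spend")
    if action.data_class in CONSTITUTION["forbidden_data_classes"]:
        broken.append("forbidden_data_classes")
    if action.deploys_to_production and not CONSTITUTION["allow_production_deploy"]:
        broken.append("allow_production_deploy")
    return broken
```

The check runs before any policy or council gets a say. A non-empty list means the action is blocked, no matter how the vote would have gone.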

And the human gates don't disappear...they become optional and configurable. For each workflow stage, you choose: fully autonomous, notify me, or require my approval. Start with gates everywhere and loosen them as you build trust...or start fully autonomous and add gates where things go sideways. The point is you choose the level of control, and you can change it anytime.
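The three choices per stage map naturally onto a small config. A hedged Python sketch...the stage names, GateMode, and advance are hypothetical, and I'm assuming unknown stages should default to the strictest mode.

```python
from enum import Enum


class GateMode(Enum):
    AUTONOMOUS = "autonomous"
    NOTIFY = "notify"
    REQUIRE_APPROVAL = "require_approval"


# Illustrative stage names; loosen or tighten each one as trust changes.
stage_gates = {
    "planning": GateMode.NOTIFY,
    "execution": GateMode.AUTONOMOUS,
    "delivery": GateMode.REQUIRE_APPROVAL,
}


def advance(stage: str, human_approved: bool = False) -> tuple[bool, list[str]]:
    """Return (may_proceed, notifications) for one workflow stage."""
    # Unknown stages fall back to the strictest mode rather than slipping through.
    mode = stage_gates.get(stage, GateMode.REQUIRE_APPROVAL)
    if mode is GateMode.AUTONOMOUS:
        return True, []
    if mode is GateMode.NOTIFY:
        return True, [f"{stage} proceeded without approval"]
    if human_approved:
        return True, []
    return False, [f"{stage} is waiting on a human"]
```

Changing your mind is one assignment: set stage_gates["delivery"] to GateMode.NOTIFY and delivery stops waiting for you.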

Lifecycle
HR for agents...except this time, performance reviews actually work.
New · Service

The primitives publication talked about Evaluation...longitudinal performance tracking. Building v2 showed me that evaluation is one part of a bigger thing. It's not just "how's this agent doing?" It's the full lifecycle: hire, develop, evaluate, promote, and...sometimes...retire.

It starts with provisioning...spin up a new agent, give it capabilities, tool access, permissions, a budget allocation from Treasury. Set its initial autonomy level...how much oversight it gets from Governance. Assign it to projects. Day one for a new employee, except it takes seconds instead of weeks.

Then there's evaluation...and I don't mean per-task quality scores. Pulse handles that. I'm talking about career-level assessment. Is this agent getting better over time? Is it more cost-efficient than it was last month? Are its quality scores trending up? This is the annual review...except it's continuous, objective, and based on actual measured performance instead of your manager's vague recollection of that one project six months ago.

That evaluation feeds into autonomy escalation. An agent that consistently delivers high-quality work at low cost earns trust. More trust means less oversight...fewer gates, bigger budgets, higher-stakes assignments. An agent whose quality drops gets more gates, smaller scope, closer monitoring. Promotion and demotion, driven by data...not politics.
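Promotion and demotion could be as simple as moving along a ladder. A Python sketch under stated assumptions: three made-up autonomy levels, quality scores between 0 and 1, and thresholds of 0.9 and 0.7 that I picked purely for illustration.

```python
from statistics import mean

LEVELS = ["supervised", "reviewed", "trusted"]   # illustrative ladder, strictest first


def next_autonomy(current: str, recent_quality: list[float],
                  promote_at: float = 0.9, demote_at: float = 0.7) -> str:
    """Move one rung up or down based on the mean of recent quality scores."""
    if not recent_quality:
        return current  # no evidence, no change
    avg = mean(recent_quality)
    i = LEVELS.index(current)
    if avg >= promote_at:
        i = min(i + 1, len(LEVELS) - 1)   # already at the top stays at the top
    elif avg < demote_at:
        i = max(i - 1, 0)                 # already at the bottom stays at the bottom
    return LEVELS[i]
```

Moving only one rung at a time is a deliberate choice in this sketch...a single great week shouldn't jump an agent from supervised to trusted. Cost efficiency could be folded in the same way as a second score.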

And then there's model migration...this one is uniquely agent-native. When a better model drops...and they drop fast...Lifecycle manages the transition. Shadow-run the new model alongside the old one, compare outputs, and when the new one consistently matches or exceeds...swap. If it regresses...rollback. No downtime, no drama.
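The shadow-run comparison might look something like this in Python. The models are plain callables here, score is a hypothetical quality function the caller supplies, and the 95% win rate is an assumed bar for "consistently matches or exceeds."

```python
from typing import Callable


def shadow_run(old_model: Callable[[str], str], new_model: Callable[[str], str],
               score: Callable[[str, str], float], prompts: list[str],
               min_win_rate: float = 0.95) -> str:
    """Run both models on the same prompts; return "swap" or "keep".

    score(prompt, output) is a hypothetical quality function supplied by the caller.
    """
    if not prompts:
        return "keep"  # nothing to compare, so don't migrate
    # Count prompts where the new model matches or beats the old one.
    wins = sum(
        1 for p in prompts
        if score(p, new_model(p)) >= score(p, old_model(p))
    )
    return "swap" if wins / len(prompts) >= min_win_rate else "keep"
```

Rollback is the same function run after the swap with the roles reversed: if the previous model now wins the comparison, you switch back.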

And eventually, retirement. Graceful decommission...active tasks get reassigned, knowledge gets transferred to Pages, capabilities get redistributed, full history preserved in Ledger. The agent's work lives on even after the agent doesn't.

There's some irony in the fact that we might end up building better performance management for agents than humans ever got for themselves. But honestly...that's kind of the point. The system humans built for evaluating humans is...not great. Maybe starting fresh with agents is a chance to get it right.

Three new primitives...Treasury in the kernel, Governance and Lifecycle as system services. The seven gain siblings.

And there's a fourth frontier...Authority. Can an agent enter an agreement? Accept terms of service? Sign a data processing agreement? Operate across organizational boundaries? That one is partly a technology problem and partly a "the world isn't ready" problem. The technology side...agent certificates, cross-org trust, credential management...is buildable now. But I'm not going to pretend the legal framework exists yet. It doesn't. So Authority stays on the horizon...something we're thinking about but not building until the foundation is solid.

The Workflow

Primitives are the building blocks...but a company doesn't run on components sitting next to each other. It runs on the cycle between them. Strategy becomes projects. Projects become tasks. Tasks get assigned, executed, reviewed, delivered, and learned from. That cycle IS the company.

Here's what's wild...I already built this. Phase 18 of MDx Code created an autonomous SDLC pipeline. You give it an intent...a one-paragraph description of what you want built. It runs a full planning pipeline: loads the repo context, does a gap analysis, convenes an expert panel of fifteen AI reviewers who each look from a different angle, and generates dependency-ordered session briefs. You approve the plan. Then it executes...spawns parallel workers, monitors their output, auto-triages when they get stuck, advances through waves, consolidates results, and delivers a summary. You review the SQL, run it against production, and close.

That's a company workflow...it plans, assigns, executes in parallel, handles blockers autonomously, and delivers. It just happens to be scoped to engineering.

"The SDLC pipeline is already doing this...strategy, planning, parallel execution, self-healing, delivery. I built the autonomous company once already...just for engineering. v3 is the generalization."

The Universal Cycle

Most company functions follow a similar pattern. Marketing, support, finance, operations, product...they all roughly do this:

01

Strategy

What should we do? The Strategy Council reviews Pulse data, Lifecycle performance, Treasury budgets, and external signals...then sets goals, creates projects, and allocates resources. This is the CEO function...except it's an agent council, deliberating continuously, not a quarterly offsite.

02

Planning

How should we do it? The Planning Agent takes a project, breaks it into tasks with dependency graphs, estimates costs via Treasury, matches tasks to agents via Registry capability queries, and produces self-contained briefs. Same pattern as Phase 18's brief generator...generalized beyond engineering.

03

Execution

Do the work. The Supervisor spawns workers in parallel waves. Agents get briefed by Context, communicate via Message, document in Pages, report status to Pulse. When they get stuck, they ask for help. When they're blocked, the auto-triage system kicks in. Same process supervisor from Phase 18...tested across a full build cycle.

04

Review & Delivery

Is it good enough? Governance evaluates outputs against quality standards...low-risk work auto-approves, high-risk work goes to council or a human gate. Approved work ships, everything gets recorded in Ledger, and Treasury settles costs.

05

Learning

What did we learn? Lifecycle updates agent performance records. Pulse data feeds back to Strategy. Knowledge gaps discovered during execution get proposed to Pages. The company gets smarter with every cycle...not by accident, but by architecture.
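The hand-off from Planning to Execution...tasks with dependency graphs turning into parallel waves...can be sketched with Python's standard graphlib module. This is my illustration of the pattern, not the Phase 18 code, and plan_waves and the task names are hypothetical.

```python
from graphlib import TopologicalSorter


def plan_waves(dependencies: dict[str, set[str]]) -> list[list[str]]:
    """Group tasks into waves; each wave depends only on earlier waves.

    dependencies maps a task name to the tasks that must finish first.
    Raises graphlib.CycleError if the tasks depend on each other in a loop.
    """
    sorter = TopologicalSorter(dependencies)
    sorter.prepare()
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())   # everything runnable right now
        waves.append(ready)
        sorter.done(*ready)                  # mark the whole wave finished
    return waves


tasks = {
    "analyze": set(),
    "draft_doc": {"analyze"},
    "update_metrics": {"analyze"},
    "review_doc": {"draft_doc"},
}
waves = plan_waves(tasks)
```

Each inner list is one wave the Supervisor could hand to workers in parallel. Here that's the analysis first, then the draft and the metrics update side by side, then the review.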

The Agent CEO

This is the plan. Not a thought experiment...an agent council that performs the executive function.

The Strategy Council reviews what's working, what's failing, what's costing too much. It looks at Lifecycle data...which agents are performing, which aren't, where capability gaps exist. It checks Treasury...budget state, burn rate, projected runway. And then it makes decisions about what to prioritize, what to defer, where to invest, when to cut.

But it operates inside constitutional constraints. It can't exceed the total budget ceiling. It can't modify its own constraints...only a human can do that. It has to maintain minimum service levels for existing commitments. And every strategic decision gets attested to Ledger with full rationale, so you can always go back and understand why a choice was made.

That last part is actually something most human CEOs can't offer. Full, auditable rationale for every strategic decision, permanently recorded. No revisionist history, no "I don't recall"...just the data, the reasoning, and the outcome.

The Observatory

If the company runs without you, you still need to see into it. Not to control it...to understand it. The Observatory is the glass wall around the factory floor.

We already have the bones. The Console has ten sections...Dashboard, Agents, Security, Knowledge, Federation, Users, Settings, Registry, Performance, Forge. Pulse streams real-time data from seven sources. The metrics APIs expose usage, quality, tokens, costs. Health checks monitor every subsystem.

That's observability for a system. The Observatory is observability for a company...different altitude, different questions.

Real-Time

Agent Activity Feed

What is every agent doing right now? Which tasks are in progress, blocked, complete? You can read agent-to-agent conversations as they happen. Like watching a Slack workspace where all the employees are AI.

Treasury

Financial Dashboard

Budget allocation across agents and projects. Real-time spend vs. budget. Cost per task, cost per agent, cost trends. Projected burn rate and remaining runway. The "make sure this doesn't go off the rails" view.

Lifecycle

Performance Dashboard

Agent evaluation scores and trends over time. Autonomy levels...which agents are trusted for what. Quality metrics per agent. Model comparisons. Promotion and demotion history. The talent management view.

Governance

Decision Log

Every decision made, by whom, under what policy. Constitutional constraint checks...what got blocked and why. Council votes and rationale. Human gate history. The "why did this happen" view.

The Strategy Board

Company goals flow into projects. Projects flow into tasks. Tasks have status, assigned agents, cost, and quality scores. You can see the whole thing at a glance...like a Kanban board, except it's populated and managed entirely by agents. You're not moving cards. You're watching the company think.

And if you want to intervene...you can. Inject a goal, reprioritize a project. That becomes a Gate interaction...the system acknowledges it, integrates it, and adjusts. You're not fighting the system to change course...you're having a conversation with it.

A Night Shift

Here's what it actually looks like.

02:14 AM

Strategy Council reviews overnight Pulse data. Customer support quality dipped 3%. Creates project: "Investigate and fix support quality regression."

02:15 AM

Planning Agent breaks it into five tasks, estimates $2.50 in API costs, assigns agents by capability, produces briefs.

02:16 AM

Supervisor spawns workers. Support Analyst reads recent conversations via Context. Finds a knowledge gap in Tier 2 docs.

02:31 AM

Knowledge Agent proposes a fix...new document for Pages. Quality Agent reviews it. Governance auto-approves (low risk, matches policy).

02:33 AM

Lifecycle updates...Support Analyst quality score +1, Knowledge Agent completed in 40% fewer tokens than last time. Efficiency trending up.

02:34 AM

Strategy Council closes the project. Total cost: $1.80. Support quality metric marked for continued monitoring.

07:00 AM

You open the Observatory. See: "3 projects completed overnight. Support quality back to baseline. $4.20 total spend. All agents healthy." You nod. Close the laptop.

Or...you see something that doesn't feel right. A governance decision you wouldn't have made. You click into it. Read the council's rationale. Add a human gate on that decision type for the next week. The system adapts. Agents still run...they just pause at that new gate and wait for your input.

The point isn't that you don't care. The point is that you choose where to care. And you can change that choice at any time.

"The Observatory isn't a dashboard for a system. It's a window into a company. You're not monitoring servers. You're watching agents strategize, plan, execute, review, learn, and adapt. And deciding how much of that you want to be part of."

The Full Stack

The stack grows. New primitives, new services, new applications. And the company gains the ability to run itself.

Applications (Products): Pages · Twin · Code · Stella · Pulse · Message · Strategy · Observatory
System Services (Forge): Context · Governance · Pulse · Lifecycle
Kernel (Core): Message · Registry · Ledger · Treasury
Sub-Kernel (Infrastructure): Model Abstraction · Provider Adapters · Knowledge Infra · Memory & State · Security & Guardrails

Kernel gains Treasury. Four primitives now...Message, Registry, Ledger, Treasury. If any one of them goes offline, the company stops. Communication, identity, trust, and money. Lose any one of those and you can't run a company.

System Services gain Lifecycle and Governance (Gate evolved). Four services now...Context briefs agents, Governance approves decisions, Pulse watches everything, and Lifecycle manages agent careers. The operational fabric of the company.

Applications gain Strategy and Observatory. Strategy is the executive function...goal-setting, project creation, resource allocation. Observatory is the human window into the company. These are products on the OS...swappable, rebuildable, not load-bearing for the kernel.

Sub-Kernel stays unchanged. The invisible infrastructure that makes agents work internally. Model routing, provider adapters, embeddings, guardrails. Critical but not the abstraction anyone thinks about.

The Dependency Map

v3 adds new dependency relationships that didn't exist in v2:

Treasury ↔ Ledger: Every financial operation is chain-hashed. You can audit every dollar ever spent, by whom, on what, and why.

Governance ↔ Lifecycle: An agent's autonomy level (from Lifecycle) determines how many gates Governance requires. High-performing agents get more freedom, low-performing agents get more oversight. Trust is earned, not assumed.

Strategy → Everything: Strategy reads from Pulse (what's happening), Lifecycle (who's performing), Treasury (what's affordable). It writes to the workflow engine (what to do next). It's the nerve center...but it operates within constitutional constraints that it can't modify.

Observatory → Everything: Read-only. Observatory consumes from every primitive and every application to render the company's state for a human observer. It never writes, never changes anything...it just watches.

The Full Company

Three versions of the same question, each at a higher altitude.

v1: how do you build a system that uses AI well? Fifteen components, four layers. Still running, still healthy, still the sub-kernel underneath everything.

v2: what does a company of agents need to function? Seven primitives...kernel, services, application. Built, tested, deployed. The operating surface of the company.

v3: what does a company of agents need to function without humans? Treasury in the kernel. Governance and Lifecycle as services. Strategy and Observatory as applications. The workflow engine generalized. The observation layer built. The full autonomous stack.

And each version built on the one before. The fifteen components didn't get thrown away...they became the sub-kernel. The seven primitives don't get thrown away...they gain siblings. The architecture layers...it doesn't replace.

The acid test for v3 is simple. Leave it running overnight. Wake up. Open the Observatory. See work completed, money spent responsibly, quality maintained, decisions documented. And feel comfortable enough to leave it running for another day.

That's the bar. Not whether it can do impressive demos or handle toy problems...can you trust it to run the company while you're asleep? Can you wake up and feel good about what happened?

I haven't built v3 yet. I'm writing about it the same way I wrote about the seven primitives before they were built...and then I'm going to go build it. Same process, same approach, same lesson: build something real, discover what's missing, formalize the pattern, build again.

The primitives publication ended with "more soon." This is the more. And there's more after this too...Authority is still on the horizon, and I'm sure building v3 will reveal things I haven't thought of yet. That's how this works. The questions keep getting harder...and that's how you know you're building something real.

The primitives gave the company a nervous system. v3 gives it a brain, a budget, and something to do with them.

Let's see if it works.