The Arena Is the Product.
How Karpathy's Autoresearch Validates the Domain Operating Frame Thesis — From a Completely Different Direction
Mohamed Anis, Founder, Human-Edge.AI
The Development
On March 7, 2026 — the same week this paper was being finalised — Andrej Karpathy open-sourced autoresearch, a system that runs approximately 100 ML experiments on a single GPU overnight with no human intervention after initial setup. The system accumulated 83 experiments and 15 kept improvements in its first published run. A larger version running on 8xH100 logged 276 experiments and 29 kept improvements. Karpathy described it as "part code, part sci-fi, and a pinch of psychosis."
The system has exactly three files that matter. prepare.py handles one-time data preparation and is never modified. train.py contains the full model, optimiser, and training loop — the agent edits this freely. program.md is where the human shapes the agent's research strategy. That third file is the only thing the human writes.
- prepare.py: one-time data prep
- train.py: agent edits freely
- program.md: human writes the frame
The agent operates in an autonomous loop: it modifies the training code, trains for exactly five minutes, checks the score (validation bits per byte), keeps or discards the result, commits improvements to a git branch, and repeats. All night. Without the human.
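The loop above can be sketched in a few lines. This is a minimal illustration, not Karpathy's actual code: the agent's edit and the five-minute training run are stubbed out with a toy noisy objective, and all names are illustrative.

```python
import random

def propose_edit(code):
    """Stand-in for the agent editing train.py: perturb one hyperparameter."""
    return {**code, "lr": code["lr"] * random.uniform(0.8, 1.25)}

def train_and_eval(code):
    """Stand-in for a five-minute training run; returns val bits per byte.
    A toy noisy bowl, minimised near lr = 0.01 (purely illustrative)."""
    return (code["lr"] - 0.01) ** 2 * 1e4 + 1.0 + random.gauss(0, 0.01)

def research_loop(n_experiments=100):
    best_code = {"lr": 0.05}              # initial train.py "state"
    best_bpb = train_and_eval(best_code)  # baseline score
    kept = []
    for i in range(n_experiments):
        candidate = propose_edit(best_code)   # agent modifies the code
        bpb = train_and_eval(candidate)       # train, then check the score
        if bpb < best_bpb:                    # keep only strict improvements
            best_code, best_bpb = candidate, bpb
            kept.append((i, bpb))             # stand-in for a git commit
    return best_bpb, kept

final_bpb, improvements = research_loop()
print(f"kept {len(improvements)} improvements, final val_bpb={final_bpb:.3f}")
```

The essential property is that the keep/discard gate makes the score monotonically non-increasing across kept commits, which is what lets improvements compound overnight.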
When asked why the agent doesn't iterate on the prompt too, Karpathy responded: "definitely. the current one is already 90% AI written I ain't writing all that."
- 1x GPU: 83 experiments, 15 improvements kept
- 8x H100: 276 experiments, 29 improvements kept
Garry Tan, president of Y Combinator, wrote the most incisive analysis of the pattern: "You can't just ask the agent to self-improve. You have to design the arena and give it a push based on your knowledge of how it all works." His closing line: "The best program.md wins."
Why This Matters for Human-Edge
At first glance, autoresearch has nothing to do with what we are building. Karpathy is optimising neural network hyperparameters. We are building context infrastructure for business domains. The domains are unrelated.
The architecture is identical.
The Structural Mapping
| Karpathy's Autoresearch | Human-Edge.AI |
| --- | --- |
| program.md | Domain Operating Frame |
| train.py | Human Graph + Job Graph |
| val_bpb | Outcome telemetry (reply rates, meetings booked, etc.) |
| 5-minute training runs | Individual actions (emails, introductions, etc.) |
| Git commits of improvements | Calibration signals |
| The autonomous loop | The Execution and Feedback Loop |
Karpathy built a Domain Operating Frame for ML research and called it program.md. He closed the feedback loop with a clean metric. He let agents iterate autonomously. The system compounds improvements without human intervention.
That is exactly what we described in Sections 4, 9, and 10 of the main paper. We arrived at the same architecture from the business domain direction. Karpathy arrived at it from the ML research direction. The convergence is not coincidental. It is structural.
Three-Layer Validation
Frame + Model + Feedback Loop = Compounding Improvement
What happens when each is removed:
- Without the frame: the agent has no constraints; it seed-hacks and overfits.
- Without the model: the frame is inert; there is no execution layer.
- Without the loop: random search, not research; nothing compounds.
The paper's core equations validate this:
Domain Operating Frame + Frontier Model = Aligned Decisions
Aligned Decisions + Execution Loop + Feedback Loop = Business Outcomes
Karpathy proved the architecture works. The open question is whether it transfers from domains with clean loss functions to domains with multi-dimensional objective functions. That is our research and engineering challenge.
Clean Metrics vs. Messy Objectives
Karpathy's arena has one metric: validation bits per byte. Lower is better. Unambiguous, immediate, comparable.
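For concreteness, validation bits per byte is the model's cross-entropy on held-out data, converted from nats to bits and normalised per byte. A minimal sketch (function and variable names are illustrative):

```python
import math

def val_bpb(total_nll_nats: float, num_bytes: int) -> float:
    """Convert a summed negative log-likelihood (in nats) over a held-out
    split into bits per byte. Lower means better prediction."""
    return total_nll_nats / (math.log(2) * num_bytes)

# e.g. a total NLL of 2*ln(2) nats over 2 bytes is exactly 1 bit per byte
print(val_bpb(2 * math.log(2), 2))
```

A single scalar like this is trivially comparable across experiments, which is exactly what makes the keep/discard gate clean.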
Business domains are messier. Fundraising success is multi-dimensional: speed to close, valuation achieved, dilution accepted, strategic fit of the investor, founder time invested, relationship capital preserved or depleted, and signalling effects on future rounds. These dimensions trade off against one another.
This is precisely why the Domain Operating Frame must be engineered, not just written in a Markdown file.
The Domain Operating Frame must encode six components:
The ontology
The workflow state
The constraints
The actor archetypes
The operational norms
The multi-dimensional objective function
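The contrast can be made concrete. Where autoresearch minimises one scalar, a fundraising frame must score an outcome along several trading-off dimensions. A minimal sketch with hypothetical dimensions, normalisations, and hand-set weights; a real objective function would be calibrated against observed outcomes, not written by hand:

```python
from dataclasses import dataclass

@dataclass
class RoundOutcome:
    days_to_close: float   # speed to close
    valuation_usd: float   # valuation achieved
    dilution_pct: float    # dilution accepted
    investor_fit: float    # strategic fit, 0..1
    founder_hours: float   # founder time invested

# Illustrative weights; in practice these would be learned, not hand-set.
WEIGHTS = {"speed": 0.2, "valuation": 0.3, "dilution": 0.2,
           "fit": 0.2, "founder_time": 0.1}

def score(o: RoundOutcome) -> float:
    """Higher is better; each dimension is clipped to roughly 0..1."""
    return (
        WEIGHTS["speed"]        * max(0.0, 1 - o.days_to_close / 180)
        + WEIGHTS["valuation"]  * min(1.0, o.valuation_usd / 50e6)
        + WEIGHTS["dilution"]   * max(0.0, 1 - o.dilution_pct / 30)
        + WEIGHTS["fit"]        * o.investor_fit
        + WEIGHTS["founder_time"] * max(0.0, 1 - o.founder_hours / 400)
    )
```

Even this toy version shows the engineering burden: every weight and normalisation constant is a judgment call that a single clean metric like val_bpb never forces.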
A program.md for ML research is roughly 2,000 words of structured guidance. A Domain Operating Frame for fundraising is a living knowledge graph with hundreds of typed nodes, weighted edges, temporal dynamics, and an evolving objective function calibrated against real outcomes.
The architecture is the same. The arena is harder. That is the opportunity.
The Progression Timeline
- Pre-2025. Gen 1: Raw Prompting. The human writes code.
- Feb 2025. Gen 2: Prompt Engineering. The human directs the work.
- Feb 2026. Transition: Agentic Engineering. The human orchestrates agents.
- Mar 2026. Gen 3: Context-Driven Interaction. The human designs the arena; agents execute autonomously.
Each step removes one layer of human involvement. The human goes from writing code, to directing an agent that writes code, to designing the frame within which agents operate autonomously. The human's value migrates from execution to judgment to architecture.
This is exactly the shift Human-Edge is building for.
The Convergent Pattern
Three independent developments converging:
- Mid-2025. The Ralph Wiggum Technique (Geoffrey Huntley): a bash loop, one agent, specs as success criteria. A $50,000 contract delivered for $297 in API costs.
- January 2026. Gas Town (Steve Yegge): 20-30 coding agents in parallel. 75,000 lines of code, 2,000 commits, 17 days.
- March 2026. Autoresearch (Andrej Karpathy): one agent, ML experiments, a structured research frame. Roughly 100 experiments overnight.
The pattern:
1. A structured frame defines the arena.
2. An autonomous loop executes within the frame.
3. A feedback mechanism determines what is kept and what is discarded.
4. The human designs the arena, then steps back.
The only difference is the domain.
The Investor Implication
A year from now, the companies winning won't be the ones with the most engineers or the most compute. They'll be the ones whose agents never stopped running. The best program.md wins.
— Garry Tan, President, Y Combinator
Translated into our language: the best Domain Operating Frame wins.
The model is commodity. The agents are commodity. The arena is the product. The frame is the moat.
That is what Human-Edge.AI is building.
What This Changes in Our Architecture
The autonomy gradient: how much freedom the agent has depends on the stakes.
- Full autonomy: research, analysis, drafts, tracking.
- Human-in-the-loop: outreach sequencing, deck revisions, follow-up timing.
- Human approval required: investor comms, data room sharing, term sheet responses.
The frame governs not just what the agent does, but how much freedom the agent has at each stage.
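A gradient like this can be encoded directly in the frame as a gating policy. The sketch below is a hypothetical illustration of the idea (the action names and policy mapping are ours, not a real API), with the safety default that unknown action types fall to the most restrictive level:

```python
from enum import Enum

class Autonomy(Enum):
    FULL = "act autonomously"
    HUMAN_IN_LOOP = "act, but human can intervene"
    APPROVAL = "human approval required before acting"

# Illustrative policy: the frame maps action types to autonomy by stakes.
AUTONOMY_POLICY = {
    "research": Autonomy.FULL,
    "analysis": Autonomy.FULL,
    "draft": Autonomy.FULL,
    "outreach_sequencing": Autonomy.HUMAN_IN_LOOP,
    "deck_revision": Autonomy.HUMAN_IN_LOOP,
    "investor_comms": Autonomy.APPROVAL,
    "data_room_sharing": Autonomy.APPROVAL,
    "term_sheet_response": Autonomy.APPROVAL,
}

def gate(action_type: str) -> Autonomy:
    """Unknown action types default to the most restrictive level."""
    return AUTONOMY_POLICY.get(action_type, Autonomy.APPROVAL)
```

The point of the default is that the arena fails closed: an agent can never gain autonomy over an action the frame's designer has not explicitly considered.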
Conclusion
Karpathy did not set out to validate the Human-Edge thesis. He set out to automate ML research. But the architecture he built — a structured frame, an autonomous execution loop, a clear feedback mechanism, and improvements that compound without human intervention — is structurally identical to what this paper proposes for business domains.
The convergence from independent directions is the strongest possible signal that the pattern is real.
The arena is the product. The frame is the moat. The best Domain Operating Frame wins.
We intend to build it.
This addendum should be read alongside the main paper:
From Outputs to Outcomes: Context Is Not Enough

References
[8] Andrej Karpathy, "autoresearch," GitHub repository, March 7, 2026. github.com/karpathy/autoresearch
[9] Garry Tan, "Karpathy Just Turned One GPU Into a Research Lab," March 8, 2026.
[10] Geoffrey Huntley, "The Ralph Wiggum Technique," ghuntley.com, 2025.
[11] Steve Yegge, "Welcome to Gas Town," Medium, January 2026.
[12] Andrej Karpathy, @karpathy, Twitter/X, "agentic engineering" post, February 8, 2026.
Human-Edge.AI — March 2026
All rights reserved. Research paper.