The Trust Threshold

Name: The Trust Threshold
Author: Simon Willison

35 highlights

agentic-coding agentic-design-patterns agentic-insight 2026-roadmap-reflection agentic-philosophy-traces review bigideas-concepts agentic-workflow

Highlights & Annotations

But the insight that matters is not about speed. It is about what happens after the trust threshold is crossed. When a model is reliably good, you stop reading every line of its output. When you stop reading the code, you need a fundamentally different mechanism for knowing the code works. That mechanism—a layered stack of automated verification, conformance testing, and evidence-generating tools—is where the real intellectual action is happening right now. The developers who build this verification infrastructure will thrive. The ones who either cling to manual review or abandon verification entirely will suffer, just in different ways.

Ref. 66BA-A

“A lot of people worry about skill atrophy and being lazy. I think this is the opposite of that. You have to operate firing on all cylinders if you’re going to keep your trio or quadruple of agents busy solving all these different problems, and it’s mentally exhausting.”

Ref. 46F0-B

And then there is the fifth stage, the one that arrived in the past few weeks: zero-read. You do not write the code. You do not read the code. The proposition sounds like madness (Willison’s own phrase is “clear insanity”), and yet it works, if and only if you have invested in the right verification infrastructure.

Ref. 6DD4-C

Think of the trust ladder not as stages of laziness but as stages of abstraction. Each rung requires a new verification mechanism. Question-answering needed only your judgment. Code assistance needed your code review skills. Agent-written code needed your ability to evaluate full implementations. Zero-read needs automated proof that the work is correct, because your eyes are no longer on the output.

Ref. A804-D

The implication is subtle but important. The barrier to trust was never intelligence. It was consistency. A model that produces brilliant code 80% of the time and subtly broken code 20% of the time forces you to review everything, because you cannot predict which output you are holding. A model that produces reliable code 95%+ of the time changes the economics entirely. Review becomes sampling, not exhaustive inspection.

Ref. 3517-E
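The economics in the highlight above can be made concrete with a back-of-the-envelope sketch. It assumes failures are independent, that a human review catches every defect it looks at, and that unreviewed defects escape. The function names and the tolerance of two escaped defects per hundred outputs are illustrative assumptions, not figures from the article.

```python
def expected_escaped_defects(outputs: int, defect_rate: float, review_fraction: float) -> float:
    """Expected number of defective outputs that nobody reviewed.

    Assumes independent failures and a review that never misses a defect.
    """
    if not (0.0 <= defect_rate <= 1.0 and 0.0 <= review_fraction <= 1.0):
        raise ValueError("rates must be between 0 and 1")
    return outputs * defect_rate * (1.0 - review_fraction)


def review_fraction_needed(outputs: int, defect_rate: float, tolerated_escapes: float) -> float:
    """Smallest fraction of outputs to review so that the expected number
    of escaped defects stays at or below `tolerated_escapes`."""
    expected_defects = outputs * defect_rate
    if expected_defects <= tolerated_escapes:
        return 0.0  # no manual review needed under this (simplified) model
    return 1.0 - tolerated_escapes / expected_defects


if __name__ == "__main__":
    # Hypothetical tolerance: two escaped defects per 100 outputs.
    for rate in (0.20, 0.05, 0.01):
        share = review_fraction_needed(100, rate, tolerated_escapes=2)
        print(f"defect rate {rate:.0%}: review about {share:.0%} of outputs")
```

Under these assumptions the required review share falls from about 90% at a 20% defect rate to about 60% at 5%, and to none at 1%. The model is deliberately crude. It only shows why consistency, rather than peak quality, sets the review burden.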

If the trust threshold is the what, the verification stack is the how. Willison’s workflow is not simply “let the agent loose and hope for the best.” It is a carefully layered system of automated evidence that replaces human code reading with machine-generated proof. Each layer catches a different class of failure, and together they provide enough confidence to operate at the zero-read level, at least for certain categories of work.

Ref. 85C7-F
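One way to picture a layered stack like the one described above is as an ordered list of checks, each returning a piece of evidence. The sketch below is a hypothetical illustration, not Willison's tooling: the `Evidence` type, `run_stack`, `accepted`, and the placeholder layer functions are all assumptions, and the layer names are borrowed from the section headings in these highlights.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Evidence:
    layer: str    # which layer produced this result
    passed: bool  # did the layer's check succeed?
    detail: str   # human-readable summary to keep as a record


Check = Callable[[], Evidence]


def run_stack(layers: List[Check]) -> List[Evidence]:
    """Run each layer in order and stop at the first failure.

    Returns the evidence collected so far, so a failure still leaves a trail.
    """
    evidence: List[Evidence] = []
    for check in layers:
        result = check()
        evidence.append(result)
        if not result.passed:
            break
    return evidence


def accepted(evidence: List[Evidence], expected_layers: int) -> bool:
    """Accept the work only if every layer ran and every layer passed."""
    return len(evidence) == expected_layers and all(e.passed for e in evidence)


# Placeholder layers: real ones would run a test suite, exercise the
# program, and so on. These return fixed results for illustration only.
def red_green_tests() -> Evidence:
    return Evidence("red-green TDD", True, "placeholder: tests failed first, then passed")


def automated_exercise() -> Evidence:
    return Evidence("manual exercise via automation", True, "placeholder: program was driven end to end")


def conformance_suite() -> Evidence:
    return Evidence("conformance", True, "placeholder: matched the reference behaviour")


if __name__ == "__main__":
    stack = [red_green_tests, automated_exercise, conformance_suite]
    trail = run_stack(stack)
    for item in trail:
        print(f"[{'ok' if item.passed else 'FAIL'}] {item.layer}: {item.detail}")
    print("accepted:", accepted(trail, expected_layers=len(stack)))
```

Stopping at the first failure is a design choice made here for brevity. A stack that runs every layer regardless would produce a fuller evidence trail at the cost of more compute.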

TDD constrains the agent’s output in a way that improves quality. Without tests, agents will over-generate—producing code that covers cases you did not ask about, introducing complexity you did not need. The test-first discipline forces a question: what would prove to me that this task is done? That question keeps the output minimal and correct.

Ref. 3EFE-G

The lethal trifecta cannot be solved through better prompting, fine-tuning, or guardrails. Language models are gullible by design—they do what you tell them. The only reliable defense is architectural: remove one of the three legs. In practice, this usually means cutting the exfiltration vector (the model cannot send data externally) or cutting access to private data (the model operates only on information you would share publicly).

Ref. FE35-H
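The architectural defense above amounts to a configuration rule: never grant one agent all three legs at once. The highlight names two of them, access to private data and an exfiltration vector. The third, exposure to untrusted content, comes from Willison's published description of the lethal trifecta rather than from this excerpt. The class and function names below are hypothetical, chosen only for this sketch.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class AgentCapabilities:
    reads_private_data: bool      # leg 1: access to private data
    sees_untrusted_content: bool  # leg 2: exposure to untrusted content
    can_send_externally: bool     # leg 3: an exfiltration vector


def has_lethal_trifecta(caps: AgentCapabilities) -> bool:
    """True when all three legs are present in a single agent."""
    return caps.reads_private_data and caps.sees_untrusted_content and caps.can_send_externally


def cut_exfiltration(caps: AgentCapabilities) -> AgentCapabilities:
    """Remove one leg: the agent can no longer send data externally."""
    return replace(caps, can_send_externally=False)


def cut_private_data(caps: AgentCapabilities) -> AgentCapabilities:
    """Remove a different leg: the agent only sees shareable information."""
    return replace(caps, reads_private_data=False)


if __name__ == "__main__":
    risky = AgentCapabilities(True, True, True)
    print(has_lethal_trifecta(risky))
    print(has_lethal_trifecta(cut_exfiltration(risky)))
    print(has_lethal_trifecta(cut_private_data(risky)))
```

A check like this only records intent. The actual guarantee has to come from the sandbox or network policy that enforces the missing leg.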

Layer 1: Red-Green Test-Driven Development

Ref. 2080-I

Layer 2: Manual Exercise via Automation

Ref. 5604-J

Layer 3: Showboat—The Evidence Document

Ref. C054-K

PATTERN: EVIDENCE-OVER-INSPECTION

Ref. 84FE-L

Layer 4: Conformance-Driven Development

Ref. BE7D-M

TECHNIQUE: REVERSE-ENGINEERED CONFORMANCE

Ref. D19E-N

Code Quality Is a Choice You Make

Ref. E39D-O

The Codebase as Context Engine

Ref. 6AF0-P

The analogy to human teams is precise. When you are the first person to use Redis at your company, you have to do it perfectly, because the next person will copy and paste what you did. Agents behave identically—except they copy more faithfully and more consistently than any human colleague would.

Ref. 326A-Q

The Exhaustion Governor

Ref. 288A-R

PATTERN: THE EXHAUSTION GOVERNOR

Ref. 16C3-S

Sandboxing as Primary Defense

Ref. 1425-U

The Phone as Development Environment

Ref. 33DA-V

MENTAL MODEL: SPECIFICATION SURFACE VS. EXECUTION SURFACE

Ref. F706-W

The Component Library Collapse

Ref. 2175-X

Expand Your Ambition, Not Your Hours

Ref. 082F-Y

Have Fun with Weird Projects

Ref. B039-Z

The Verification Stack

Ref. BADE-B

The Exhaustion Governor

Ref. 288A-C

Conformance-Driven Development

Ref. E081-D

Specification Surface vs. Execution Surface

Ref. 27BA-E

For Individual Developers

Ref. 2288-F

cognitive exhaustion. Plan accordingly. The productivity model is bursty, not sustained.

Ref. 7199-G