No More Code Reviews: Lights-Out Codebases Ahead
Code reviews aren’t merely optional in the new world; they’re prohibitively unrealistic.
Code reviews are a thing of the past. They’re not only impractical, but soon impracticable, and after that, outright negligent.
I was going to end the entire post there, thinking most developers have already drawn this conclusion. Then a small voice reminded me that many friends don’t even think AI will replace junior developers anytime soon, so I’ll have to “show my work” like it’s geometry class.
Unexpectedly Early Fireside True Story™ Time: When I first joined Microsoft Money (the personal finance product) back in 1998, code reviews weren’t required. In fact, they were done only on rare occasions when developers were unsure of their work.
I, on the other hand, as a chipper 22-year-old, enthusiastically scheduled a meeting with Money’s Principal Dev Manager to pitch the idea of mandatory code reviews. The meeting ended after a few minutes, with the manager incredulous at my proposition.
“You’re telling me it makes sense for you, a fresh college grad, to review the code of someone who’s been coding 20 years professionally?”
He didn’t buy into any of the other benefits. Like me learning from being reviewed by experienced engineers. Or the fact that everyone makes mistakes regardless of seniority. Or the value of having multiple people understand the implementation of a feature.
Though several other Microsoft teams required code reviews at the time, Money didn’t until well after I left the team.
Though I was code review’s biggest fanboy in 1998, I’ll now try to convince you that code reviews are impracticable, and will soon be downright irresponsible.
The Paltry Information Content of Microsoft Reorg Emails
Three years ago, I made a then-humorous now-commonplace observation: “Soon we’ll have AIs writing emails, bombarding us with so much text that we’ll need AIs to summarize those AI-written emails. SMTP becomes the inverse of data compression: you start with the seed of an idea, blow it out with AI to several paragraphs during transport, only to have it be distilled back to the original seed by the time the recipient’s AI reads it.”
You know this phenomenon if you’ve ever worked at a large company. Microsoft reorg emails always follow this template:
Our team has achieved amazing results this past quarter: [list a few]. We should be proud of the great work we’ve done.
Over the past few months, there have been significant changes in our operating environment: [list a few; include numbers and statistics].
In looking ahead, we are reorganizing to align ourselves to future challenges:
Sally will head BooDeBah. You’ll remember her from such hits as: [list accomplishments]. Sally will support (editor’s note: “support!” Never “manage!”) Jack, Jill, and Mr. Hill.
Bob will head BahDeBoo. Though he joined us only 6 months ago, he’s already: [list accomplishments]. Bob will support Smith, Smythe, Smithe, and Schmidt.
We thank Timothy for his great work leading us over this past release. Timothy will report to me as an individual contributor doing [something or other inconsequential enshrouded in mysterious phrasing]. (editor’s note: Timothy always stays on payroll for 6-12 months, then silently disappears from LDAP).
We’re in an amazing domain driving key innovations across industry. The work each and every one of you do is essential to our success. Thank you for your hard work during Daikatana’s release, and I look forward to how we tackle the exciting challenges ahead!
Literally nobody reads these emails. Re-orgs were so common during my time at Microsoft that everyone knew the algorithm: scroll through many paragraphs and extract the bolded names above. Ignore it all if it doesn’t affect your world. If it does, freak out and start taking long coffee breaks with coworkers to discuss the scuttlebutt.
The point to realize about Microsoft reorg emails is that they contain very little informational content encapsulated inside a gooey glob of requisite social veneer. Nobody wants a VP who sends an email that just says, “Tim’s out. Sally and Bob will split his direct reports. Back to work.”
I’m actually going to land the plane now. The relevance of the above will become apparent shortly.
Filter My Explosively Growing Information Bubble
Days coding at Facebook used to begin with reviewing others’ PRs in the morning, approving the work they accomplished the previous evening. On a busy day, you might spend a good hour reviewing others’ code.
A former coworker of mine, Michael Novati, the fastest coder at Facebook for years, had a day this February where he landed 417 PRs on GitHub. How long would it take you to review that?
But let’s not get anywhere near that extreme. Take the typical person on LinkedIn these days claiming code production boosts of 5-20x. How many hours a day would you spend reviewing that?
We need to stop reviewing code because it’s simply impossible to keep up with the volume being produced. Furthermore, when you spot an issue and submit a comment, the original author can just “@codex fix this.” It’s like asking a human to QC a confetti cannon. You not only become the rate-limiting step; you’ll tire under a deluge of code you can never stay on top of.
“Lights-out datacenters” are fully automated, where no human walks across the shop floor, thus enabling you to do without lighting altogether. We need “lights-out codebases” where no human ever sees the code. This isn’t an aspiration. It’s a requirement predetermined by dynamics already manifest.
What remains is embracing this future.
Scary? Nirvana? Scary Nirvana?
Does the idea of lights-out codebases scare you?
If I’m honest, I’m nervous about it even as I acknowledge its inevitability. My two most recent apps (Tanya’s Snowfield and OTD: On This Day) are both lights-out codebases where I’ve not seen or edited code at all, literally from creating their GitHub repos all the way through releasing in production. I wouldn’t even know if there was SETI@home code running in both.
But several things are slowly changing my mind.
In the past 6 months, I’ve had both Codex and Cursor reviewing PRs on GitHub. I’ve been shocked by the number of times one or both have identified issues in both human- and AI-generated code. If anything, this experience has shown me that:
My code contained far more bugs than I realized. I had been working on Superphonic’s codebase for several years, dogfooding the product daily without noticing issues. The number of bugs that AI code reviewers surface has humbled my appraisal of the quality of my code.
There is a lot more runway for AI to improve itself. The same LLM often finds issues with code that another instance of itself produced. There’s a lot more we can do to increase the quality of AI output even if all progress on foundation models stops.
It’s not unthinkable we’ll pass a threshold in the next few years where human intervention in codebases is seen as downright risky and irresponsible.
We’ve already passed that point with autonomous driving on highways, though few want to concede it. Highway fatality statistics suggest that if we really cared about saving lives, we would mandate that all vehicles on highways be driven autonomously. Waymos cause 5x fewer injuries than humans driving the same urban roads. The only reason we haven’t collectively mandated adoption of autonomous vehicles despite their superior safety record is that most of us still struggle to believe they’re safer. At a gut level, it just doesn’t feel believable, regardless of statistics. This is why nearly every claim about autonomous vehicles’ comparative safety is met with a litany of what-about-isms. This is why there continue to be more automobile fatalities than necessary.
Similarly, if I’m honest, it’s very hard for me to believe AI will one day write better code than me. Never mind that it already spots issues in PRs I’ve pre-reviewed and deemed OK. I just can’t believe it’ll do The Special Snowflakey Stuff Only Humans Can™ — you know, like beating a human at chess, beating a human at Go, winning Jeopardy despite all its nuance, writing better pop songs. I’ve got an endless list of other goalposts I’m happy to move once it does. “AGI” is essentially “whatever AI can’t do today,” indefinitely into the future. Humans FTW!
The rational side of me knows we’re headed towards a day when humans will be considered a liability in code production. Humans may remain great at software development as a discipline, but I can easily imagine a day when people would be aghast to learn their cloud platform “lets humans touch the code.”
Seek the Leading Edge, Not the Trailing
Hardware chip companies already use black-box development to collaborate with other firms. When receiving a chip component design from a vendor, there’s no way humans could possibly review the entire layout. Instead, you run acceptance tests proving the chip’s design to be good. This is standard industry practice.
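The same black-box discipline translates directly to software: you verify a component’s observable contract and never look at its internals. A minimal sketch in Python, where `parse_price` is a hypothetical stand-in for whatever opaque, AI-generated component you received:

```python
"""Black-box acceptance tests: assert on observable behavior only,
never on how the implementation works internally."""

def parse_price(text):
    # Stand-in implementation. In a lights-out codebase you'd import
    # the AI-generated component here instead of defining it yourself.
    cleaned = text.replace("$", "").replace(",", "").strip()
    return round(float(cleaned), 2)

def test_acceptance():
    # Contract: accepts common currency formats, returns dollars as a float.
    assert parse_price("$1,234.50") == 1234.50
    assert parse_price("  99 ") == 99.0
    # Contract: rejects garbage rather than guessing a value.
    try:
        parse_price("free")
    except ValueError:
        pass
    else:
        raise AssertionError("expected ValueError for non-numeric input")
```

The tests encode the contract you care about; whether the implementation underneath is elegant, ugly, or inscrutable is deliberately out of scope, exactly as with a vendor’s chip layout.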
Software needs to go there. To embrace lights-out codebases, I’ve been using:
TDD-like approaches to develop testable component boundaries and greatly increase automated testing
AI to check AI via pre- and post-action reviews (i.e. before a plan is executed, then after the coding is done)
Different LLMs to augment each other’s strengths
Dedicated agents to reduce categories of errors (e.g. a security review agent that only looks for security issues)
CI/CD protections and pre-commit hooks to prevent the introduction of issues
Agent skills that further increase high-quality autonomy
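As one concrete instance of the CI/CD and pre-commit protections above, here’s a minimal sketch of a commit gate. The entries in `CHECKS` (pytest, ruff, an AI-review CLI) are illustrative placeholders, not a prescribed toolchain:

```python
#!/usr/bin/env python3
"""Minimal pre-commit gate (a sketch, not a full framework).
Installed as .git/hooks/pre-commit, a nonzero exit blocks the commit."""
import subprocess
import sys

# Illustrative checks; swap in your real test runner, linter,
# or AI-review CLI. Each entry is (label, command).
CHECKS = [
    ("unit tests", ["pytest", "-q"]),
    ("lint", ["ruff", "check", "."]),
]

def run_checks(checks):
    """Run each command; return labels of checks that failed or were missing."""
    failed = []
    for label, cmd in checks:
        try:
            result = subprocess.run(cmd, capture_output=True)
            ok = result.returncode == 0
        except FileNotFoundError:  # a missing tool counts as a failure
            ok = False
        if not ok:
            failed.append(label)
    return failed

def main():
    failures = run_checks(CHECKS)
    for label in failures:
        print(f"pre-commit: {label} failed", file=sys.stderr)
    return 1 if failures else 0

# When installing as an actual hook, finish the file with: sys.exit(main())
```

The point isn’t this particular script; it’s that every gate an AI must pass is mechanical and runs without a human reading the diff.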
In a way, none of this is new. Many of these approaches have long been known to help reduce human-introduced defects. They’ve just become far more important as AI, with its ability to run massively in parallel 24/7, begins to greatly outproduce humans.
We’ll soon no longer jeer at PMs vibe-coding toy apps. I myself have two simple apps in production on iOS using lights-out codebases. It’s not hard to believe we’ll soon build and maintain increasingly complex codebases using lights-out principles.
And — you heard it here first — we’ll one day be scared, positively petrified, to use any mission-critical software known to have allowed human interference in its codebase.



The lights-out argument works when you can define what correct looks like.
In enterprise retail that's the hard part - pricing rules nobody documented, compliance requirements that changed, checkout logic that accumulated 10 years of business decisions.
Code review was never just about bugs. It was the last checkpoint where someone who understood the domain could ask "does this actually reflect how the business works?"
Remove that and you don't get a cleaner codebase. You get drift that's invisible until a customer notices.
Really interesting framing. One thing I keep wondering: with the hardware chips that you brought up, verification works because it is anchored to explicit design artifacts and formalized constraints (DSE etc). In enterprise software, many of the critical invariants (business rules, cross-system dependencies, historical edge cases) are never fully formalized, but they are tacit knowledge. When AI scales code generation and we reduce human code review, the question becomes: what is the authoritative intent artifact that code verification runs against? Curious how you see the “design intent layer” evolving in a lights-out model especially in large enterprise codebases.