The 2 AM Problem

The Meeker Series · Week 3 of 7 · Issue #5 · April 2026

I was working in a clinic in Mexico when a patient came in with knee and hip pain. In the United States the next step is obvious: order X-rays, get the images, read the radiologist's report.

That clinic had none of that. No X-ray tech, no radiologist. What we had was a donated machine from a veterinary clinic and an instruction manual in a drawer. It took me a few hours. I read the documentation, figured out the technique, and got the studies I needed. Patient got diagnosed. Problem solved.

But something stayed with me from that afternoon. I had been practicing medicine for years on top of infrastructure I had never once needed to understand. The second it disappeared I was starting from scratch. In 2026 that feeling has a name. Researchers call it comprehension debt. And right now your team is accumulating it faster than you think.

The Number That Reframes Every Boardroom Conversation

In under two years, the cost of GPT-3.5-level AI inference fell roughly 280-fold.

Achieving GPT-3.5-level performance cost roughly $20 per million tokens in late 2022. By October 2024, that same performance level cost $0.07. Stanford's AI Index confirmed this figure. Intelligence, functionally, is becoming a commodity utility.

Source: Stanford HAI 2025 AI Index Report; Epoch AI
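The size of that drop is easy to check from the two prices quoted above. The short Python sketch below uses only those figures; the variable names are mine. Note that the rounded prices give a ratio of about 286, slightly above the 280-fold headline, most likely because the quoted prices are themselves rounded.

```python
# Prices quoted above, in USD per million tokens for GPT-3.5-level performance.
cost_late_2022 = 20.00
cost_oct_2024 = 0.07

# How many times cheaper the same performance became.
fold_reduction = cost_late_2022 / cost_oct_2024

# The same change expressed as a percentage drop in unit cost.
percent_drop = (1 - cost_oct_2024 / cost_late_2022) * 100

print(f"Fold reduction: about {fold_reduction:.0f}x")   # about 286x
print(f"Unit cost drop: about {percent_drop:.2f}%")
```

Put differently, a budget that bought one million tokens in late 2022 bought several hundred million at the same quality level two years later.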

This is not a marginal efficiency gain. A 280-fold collapse in unit cost in under two years has no clean historical precedent in enterprise software. The competitive advantage was never going to live in access to the tool. It was always going to live in the judgment of the person holding it. The leaders paying attention are not asking "should we use AI?" They are asking the 2 AM version: if this breaks at the worst possible moment, who on my team can still function without it?

Kaiser Permanente's ambient AI scribe program illustrates what commodity intelligence looks like at clinical scale. Over a 63-week period ending December 2024, 7,260 Permanente Medical Group physicians used the technology across more than 2.5 million patient encounters, saving a combined 15,791 hours of documentation time—the equivalent of nearly 1,800 eight-hour workdays. Frequent users saved roughly one hour per day at the keyboard. 84% reported improved patient interactions. 82% said their overall work satisfaction improved.
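In the spirit of checking the output, here is a quick Python sketch that runs the arithmetic on the figures quoted above. Nothing in it comes from outside this paragraph, and the variable names are mine. Two things stand out. Dividing the quoted hours by eight gives closer to 1,970 workdays than 1,800, so the published equivalence may rest on a different workday definition. And the average across all 7,260 physicians is far smaller than the hour a day reported for frequent users, which may suggest the savings were concentrated in a subset of heavy users.

```python
# Figures quoted in the paragraph above.
hours_saved = 15_791
physicians = 7_260
weeks = 63

# Equivalent eight-hour workdays.
workdays_8h = hours_saved / 8

# Average saving per physician over the whole period, and per week.
hours_per_physician = hours_saved / physicians
minutes_per_physician_per_week = hours_per_physician * 60 / weeks

print(f"Eight-hour workdays: {workdays_8h:,.0f}")                 # 1,974
print(f"Hours per physician: {hours_per_physician:.1f}")          # 2.2
print(f"Minutes per physician per week: {minutes_per_physician_per_week:.1f}")  # 2.1
```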

The clinical bandwidth freed up is real. But so is the question that follows it: when your scribe is always present, always accurate, and never tired, does the physician's own recall and documentation discipline weaken over time?

From Tool to Teammate: The Agentic Shift No One Is Training For

For the past three years, the dominant framing of AI in enterprise was the assistant model. You prompt. It responds. You decide. The interaction is bounded and linear.

That model is being retired. What replaces it is AI systems that don't wait to be asked. They monitor, infer, initiate, and complete multi-step work chains without human handholding at each node. Think less "smart search bar" and more "proactive junior analyst who never sleeps and can draft, send, and log before you have finished your coffee."

When AI moves from reactive tool to proactive teammate, three things shift simultaneously:

Decision velocity increases, and so does decision opacity. Dark code, meaning software that no human has written, read, or reviewed, is the most visible symptom, but the same dynamic runs through every domain where AI generates output faster than humans can audit it.
Human override capacity atrophies. This is the 2 AM Problem in its purest form. The more reliably the agent performs, the less frequently the human exercises independent judgment. Capability is not lost all at once. It degrades quietly, like a muscle that stops being loaded.
Accountability surfaces become ambiguous. When an AI agent executes a flawed decision autonomously, the question of who owns that outcome does not resolve itself. In regulated industries like healthcare, finance, and defense, this is not a philosophical question. It has legal and ethical teeth.

The Governance Gap Nobody Has Solved Yet

The industry now has a name for what accumulates in the space between AI output and human understanding: dark code, meaning lines of software that no human has written, read, or reviewed. The term was coined by engineer Jouke Waleson in March 2026. The metaphor comes from manufacturing: Japan's FANUC has run lights-out factories since the 1980s, where robots work in complete darkness. AI is bringing the same model to software.

The numbers are moving fast. Claude Code's codebase is now 100% written by Claude Code itself. Anthropic company-wide sits between 70% and 90% AI-generated. Google's Sundar Pichai disclosed in April 2026 that 75% of new code is AI-generated, up from 25% eighteen months prior. Microsoft's Satya Nadella put the figure at up to 30% across some repositories.

Here is the governance problem no one has solved: there is currently no reliable method to measure or trace AI-generated code in a repository after the fact. Unlike technical debt, which announces itself through failing tests, comprehension debt breeds false confidence. A January 2026 Anthropic randomized controlled trial found developers using AI for code generation averaged 50% on comprehension quizzes versus 67% for those coding manually—nearly two letter grades lower. The largest gap was in debugging skills specifically.

The tool making your team faster is simultaneously making them less capable of catching what the tool gets wrong.

What the Kaiser Data Actually Shows About Agentic Risk

The headline numbers from Kaiser's AI scribe rollout are compelling. Nearly 1,800 eight-hour workdays saved. 84% of physicians reporting improved patient interactions. Roughly one hour a day returned to frequent users.

But buried in the same NEJM Catalyst analysis is the detail that did not make the press release.

The AI scribe produced a small but documented rate of clinical hallucinations—instances where the system generated summaries that did not match what actually happened in the room. Two examples from the published study:

A physician mentioned scheduling a patient's prostate exam. The AI scribe recorded that the exam had been performed.
A doctor discussed issues with a patient's hands, feet and mouth. The AI-generated summary recorded it as a diagnosis of hand, foot and mouth disease.

Neither error is catastrophic in isolation. Both are exactly the kind of quiet, confident, plausible-sounding mistake that slips past a physician whose documentation vigilance has softened because the tool is right 98% of the time. Both errors made it into the published study. Neither made it into the press release.

The people getting the most from these tools are not the ones following blindly down the AI's path. They are the ones still diligent enough to check and verify that the output is correct.

Triage Your AI Stack Before It Triages You

Here is the honest leadership question the cost collapse forces: if your team's AI tools went dark tomorrow morning, which decisions would stall, which workflows would collapse, and which people would not know how to proceed?

That audit is not a technology audit. It is a capability audit. And most organizations are not running it.

Three disciplines that close the gap:

Deliberate override drills. Build structured moments where your team makes consequential decisions without the AI layer. Not to reject the tool, but to preserve the muscle. Most organizations have no version of this. They should.
Reasoning transparency as a team norm. When an agent surfaces a recommendation, the default response should not be acceptance or rejection. It should be interrogation. What data drove this? What was weighted? What edge case did it likely miss? Teams that build this habit develop the critical cognition that commodity intelligence cannot replace.
Identify your irreplaceable judgment nodes. Every organization has decisions where contextual nuance, relationship knowledge, or ethical weight means the human must be the final decision-maker. Map them. Protect them. Do not let automation creep into them by default.
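One way to make the capability audit concrete is to write the decisions down and ask, for each one, when a human last made it without the AI layer. The Python sketch below is a minimal illustration, not a prescribed method. The `Decision` fields, the example entries, and the 90-day threshold are all assumptions of mine; adjust them to your own workflows.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class Decision:
    name: str
    ai_assisted: bool                      # is an AI tool in the loop today?
    human_final_say: bool                  # an "irreplaceable judgment node"
    last_made_without_ai: Optional[date]   # None if nobody can remember


def audit(decisions, today, max_gap_days=90):
    """Flag AI-assisted decisions nobody has made manually in max_gap_days."""
    flagged = []
    for d in decisions:
        if not d.ai_assisted:
            continue  # no AI dependency, nothing to drill
        never = d.last_made_without_ai is None
        stale = never or (today - d.last_made_without_ai).days > max_gap_days
        if stale:
            kind = "judgment node" if d.human_final_say else "routine"
            flagged.append((d.name, kind))
    return flagged


# Hypothetical entries for illustration only.
decisions = [
    Decision("Sign off on visit note", True, True, date(2026, 1, 5)),
    Decision("Draft weekly status report", True, False, None),
    Decision("Approve vendor contract", False, True, date(2026, 4, 1)),
    Decision("Review pull request", True, True, date(2026, 3, 20)),
]

flagged = audit(decisions, today=date(2026, 4, 15))
for name, kind in flagged:
    print(f"Schedule an override drill: {name} ({kind})")
```

With these sample entries the audit flags the visit note sign-off, last done manually 100 days earlier, and the status report, which nobody remembers doing by hand. The flagged judgment nodes are the ones to drill first.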

The leaders who navigate the agentic era well will not be the ones who deployed the most tools. They will be the ones who knew with precision which decisions belonged to the machine and which ones—at 2 AM when everything was on the line—still required a human in the room.

How are you triaging AI in your own workflow?

I was competent for years on top of infrastructure I had never needed to question. I suspect I am not the only one. The difference is that for most people the audit hasn't happened yet.

You are already making decisions about which tasks to hand to AI. Most of those decisions are happening without a framework, without an audit, and without a 2 AM stress test.

"The question is no longer whether your team will work alongside AI agents. It is whether your team knows how to lead when the agent is wrong."

Try This Tomorrow: What is one decision in your current workflow that you have handed to an AI tool, and when was the last time you made that decision without it?

Hit reply and share your answer. The most insightful responses will be featured in next week's issue on the three questions every leader must answer before their next AI investment.

— Carlo
