The efficiency evidence is good, and it deserves to be said plainly. A large randomized trial in customer service measured productivity gains above 10%; a separate study of nearly 5,000 software developers measured gains above 25% (cited in Harvard Business Review, June 2026). These are rigorous field experiments, not vendor decks: controlled designs, real workers, measured output. Anyone dismissing AI productivity as hype is arguing against data.

The trouble starts when those true numbers get promoted into a strategy. A 25% gain in one function feels like it should compound into something enormous at the firm level. It doesn't, and the reason is not execution; it is arithmetic that an honest CFO can run in five minutes. Run it before the program is approved, not eighteen months after the budget has been spent.

Efficiency gains are real, measurable, and bounded. A floor, not a strategy.

The gains are real, and local

Notice what the field experiments actually measure: output per worker inside a single, well-bounded task. The 10%-plus customer-service result came from agents handling tickets; the 25%-plus developer result came from engineers writing code (cited in Harvard Business Review, June 2026). Neither study claims anything about revenue, market share, or firm value. They measure the speed of work that was already going to happen.

Local gains are worth taking, every time. A support function that resolves 10% more tickets per head is a genuinely better support function. The error is treating the local number as if it were the firm-level number, because between the task and the P&L sit two discount factors that the demo never mentions.

The ceiling, line by line

First factor: coverage. Only part of the cost base is amenable to AI at all: knowledge work, communication, analysis, coordination. Be generous and call it half; for a mid-market manufacturer or distributor with heavy physical costs, half flatters the case. Second factor: depth. Suppose agentic tooling cuts those addressable costs by a full 10%, in line with the customer-service field result. Half the cost base, cut by a tenth, is a total expense impact of about 5% (Harvard Business Review, June 2026).
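The two discount factors multiply rather than add. A minimal Python sketch of that step, using the inputs from the paragraph above (half the cost base addressable, a 10% cut); the function name `expense_impact` is illustrative and not part of any model the article cites:

```python
def expense_impact(coverage: float, depth: float) -> float:
    """Share of total expense removed by AI.

    coverage: fraction of the cost base AI can touch at all (e.g. 0.5)
    depth:    fraction of that addressable cost actually cut (e.g. 0.10)
    """
    if not (0.0 <= coverage <= 1.0 and 0.0 <= depth <= 1.0):
        raise ValueError("coverage and depth must be fractions in [0, 1]")
    return coverage * depth


# Generous case from the text: half the cost base, cut by a tenth.
print(f"total expense impact: {expense_impact(0.5, 0.10):.1%}")
```

Because the factors multiply, improving either one alone moves the result only in proportion; neither can push it past the value of the other.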

Now flow that 5% through a standard earnings model and the lift in firm value lands around 10%. That is the ceiling, reached only if everything goes right: the gains are captured rather than refilled with new work, and the program costs nothing to run. Compare it with what executives say they expect from AI: a 135% value premium within three years (Harvard Business Review, June 2026). The best possible efficiency outcome delivers less than a tenth of the stated expectation.
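The article does not spell out its earnings model. One simple reading that reproduces its numbers: if firm value is a constant multiple of earnings, cutting costs by a fraction e raises value by e times the ratio of costs to earnings. The sketch below assumes that ratio is 2, a figure backed out from the article's own 5%-to-roughly-10% step rather than taken from the HBR analysis:

```python
# Hypothetical multiplier backed out from the article's own figures
# (a 5% expense impact lifting firm value by ~10%); not taken from the HBR model.
COST_TO_EARNINGS = 2.0
EXPECTED_PREMIUM = 1.35  # executives' stated expectation: +135% within three years


def value_lift(expense_impact: float, cost_to_earnings: float = COST_TO_EARNINGS) -> float:
    """Firm-value lift, assuming value is a constant multiple of earnings.

    Cutting costs C by a fraction e raises earnings E by e * C,
    so value rises by e * C / E = e * cost_to_earnings.
    """
    return expense_impact * cost_to_earnings


ceiling = value_lift(0.5 * 0.10)    # coverage x depth, then through the multiplier
share = ceiling / EXPECTED_PREMIUM  # fraction of the stated expectation
print(f"efficiency ceiling {ceiling:.0%}, or {share:.1%} of the expectation")
```

The comparison in the last two lines is the one to put in front of the board: the ceiling expressed as a share of what leadership says it expects.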

Stress-test the assumptions and the conclusion barely moves. Declare the entire cost base addressable, every machine and every lease included, and the value lift roughly doubles to 20%, still only about a seventh of the expectation. Push the cut to an implausible 20% across the addressable half and the answer is the same 20%. The exercise is worth doing in front of the leadership team precisely because no realistic input produces a number with three digits in it. When every defensible scenario lands between 5% and 20% of firm value, the debate about execution detail is a distraction. The strategy question (what else is the program for?) is the only one left standing.
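The stress test is easy to tabulate. A sketch under the same hypothetical linear model as before (cost-to-earnings multiplier of 2, inferred from the article's figures). The first three scenarios come from the text; the "quarter addressable" row is an illustrative stand-in for a cost base where, as the text puts it, half flatters the case:

```python
COST_TO_EARNINGS = 2.0  # hypothetical, backed out from 5% expense -> ~10% value
EXPECTED_PREMIUM = 1.35  # stated expectation: +135% firm value


def value_lift(coverage: float, depth: float, k: float = COST_TO_EARNINGS) -> float:
    """Firm-value lift = coverage x depth x cost-to-earnings multiplier."""
    return coverage * depth * k


scenarios = {
    "base: half addressable, 10% cut": (0.5, 0.10),
    "entire cost base addressable": (1.0, 0.10),
    "implausible 20% cut on half": (0.5, 0.20),
    "quarter addressable (illustrative)": (0.25, 0.10),
}

for name, (coverage, depth) in scenarios.items():
    lift = value_lift(coverage, depth)
    print(f"{name:<36} value lift {lift:5.1%}, "
          f"1/{EXPECTED_PREMIUM / lift:.1f} of the expectation")
```

Every row stays in single or low double digits, which is the point of running it live: no plausible pair of inputs gets anywhere near 135%.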

~5%

total expense impact if AI trims a full 10% from the half of the cost base it can touch, flowing through to a firm-value lift of only around 10%, against an expectation of 135%

Harvard Business Review, valuation analysis (June 2026)

"Where are the outcomes?" has no hours-saved answer

Sometime around month twelve, a board member asks the question every efficiency program eventually faces: where are the outcomes? The program manager answers with hours, thousands of them, logged and dashboarded. The board member is asking about a different unit. Hours saved are an input metric; the board prices revenue, margin, and multiple, and 95% of corporate AI pilots never convert the first into the second (MIT, 2025).

The conversion fails for a mundane reason: saved time is reabsorbed. Without a redesign of roles and workload, a 10% time saving becomes 10% more meetings, not 10% lower cost or 10% more selling capacity. The field experiments measured potential. The P&L records only what the operating model actually harvested, which is why the honest CFO's sheet needs a third discount factor for capture, and why that factor is usually the smallest of the three.
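Adding capture as a third multiplier shows how quickly reabsorbed time erodes the ceiling. The text gives no capture figure, so the values looped over below are hypothetical inputs, as is the multiplier of 2 carried over from the earlier steps:

```python
def harvested_value_lift(coverage: float, depth: float, capture: float,
                         k: float = 2.0) -> float:
    """Value lift after the third discount factor.

    capture: fraction of saved time the operating model actually converts
             into lower cost or new capacity (hypothetical; the text gives no figure)
    k:       cost-to-earnings multiplier, hypothetically backed out from the
             article's 5% expense impact -> ~10% value lift
    """
    for name, x in (("coverage", coverage), ("depth", depth), ("capture", capture)):
        if not 0.0 <= x <= 1.0:
            raise ValueError(f"{name} must be a fraction in [0, 1]")
    return coverage * depth * capture * k


# Full capture reproduces the ~10% ceiling; anything less scales it down linearly.
for capture in (1.0, 0.5, 0.2):
    lift = harvested_value_lift(0.5, 0.10, capture)
    print(f"capture {capture:.0%}: value lift {lift:.1%}")
```

With zero capture, which is what happens when saved hours turn into more meetings, the lift is zero no matter how good the field experiments looked.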

Take the floor. Aim above it.

None of this argues against efficiency; it argues against efficiency as the thesis. The right posture is to take the bounded gain as a floor, bank it, measure it against your own baseline, and let it fund the program. One Milton engagement converted exactly that kind of discipline into a 23% raw-materials inventory reduction measured against the customer's own baseline, a real number on a real balance sheet (documented engagement outcome).

But the program was never bet on that line. The bet sits on the growth side, where the same valuation work shows a sustained two-point lift in organic growth adding roughly 50% to firm value (Harvard Business Review, June 2026), five times the efficiency ceiling. Run the five-minute math, present both numbers to the board, and let the program be judged against the lever that can actually move the firm. Efficiency is the floor. The strategy is what you build on top of it.
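The article does not describe how the valuation work arrives at roughly 50% for a two-point growth lift. For intuition only, the textbook Gordon growth model (value = next year's cash flow / (discount rate - growth)) can produce a lift of that size. The sketch below is not the HBR model: the 10% discount rate and 4% baseline growth are hypothetical inputs, and it holds next year's cash flow fixed:

```python
def gordon_value(cash_flow: float, discount_rate: float, growth: float) -> float:
    """Gordon growth model: value of a cash flow growing forever at `growth`."""
    if growth >= discount_rate:
        raise ValueError("growth must be below the discount rate")
    return cash_flow / (discount_rate - growth)


# Hypothetical inputs: 10% discount rate, 4% baseline organic growth.
base = gordon_value(1.0, 0.10, 0.04)
lifted = gordon_value(1.0, 0.10, 0.06)  # sustained two-point lift in growth
growth_lift = lifted / base - 1

efficiency_ceiling = 0.10  # the article's efficiency ceiling
print(f"growth lever {growth_lift:.0%} vs efficiency ceiling {efficiency_ceiling:.0%}: "
      f"{growth_lift / efficiency_ceiling:.1f}x")
```

Under these inputs the growth lever comes out at about five times the efficiency ceiling. The structural point carries over to other inputs: growth enters the denominator, so a sustained lift compounds across every future year, while a cost cut enters once.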