"It is not necessary to change. Survival is not mandatory."
- Dr. W. Edwards Deming (allegedly)
A Developer recently showed me a new AI-powered productivity dashboard. Green bars everywhere. Commits up. Pull requests up. Velocity through the roof.
So I asked: “And what percentage of that AI-generated code is anyone actually using and benefiting from?”
I didn't mean to ruin the party. The team had done great. But I genuinely wanted to know.
And the silence that followed was the most honest thing that happened in the meeting.
For years, enterprises paid fortunes to consultants promising them one thing: more code, faster. They bought into the fantasy that code generation was the thing holding their organizations back. If only developers typed faster. If only they were more predictable. If only the feature factory could run a wider, faster conveyor belt.
Then AI arrived, and quietly, without much ceremony, made a mockery of it all.
Writing code is now a commodity. No longer the bottleneck. Mass-produced. A single developer with a decent prompt can generate more lines in an afternoon than a team of senior programmers in a fortnight. And what has this revealed?
We have an uncomfortable truth to face.

Individuals and Interactions. Remember Those?
For twenty-five years and more, companies just kept adding more software processes and more tools to compensate for dysfunctional individuals and broken interactions.
Now the tools can do the typing.
So what’s left? The individuals. The interactions. The thinking. The sense-making. The judgment about what should exist in the first place.
This is precisely why so many laggard organizations are panicking. They built their entire management model around measuring output. More of it. Faster. But is it used? Is it valued?
Agile fatigue? No. Agile exposure. The mirror, yet again, showing what the org didn’t want to see.
The On-Product Index Has Just Expired
On-product index — the ratio of people actually contributing to the development of the product itself, versus those orbiting around it. This index shines a spotlight on those five managers arguing about how to make the one Bob in the code mines more productive.
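As a rough sketch of the arithmetic, with invented names and headcount (nothing below is a standard; it just makes the ratio concrete):

```typescript
// Hypothetical illustration of the on-product index.
// "onProduct" means directly contributing to the product itself,
// rather than orbiting around it.
interface TeamMember {
  name: string;
  onProduct: boolean;
}

const people: TeamMember[] = [
  { name: "Bob", onProduct: true }, // the one in the code mines
  { name: "Manager 1", onProduct: false },
  { name: "Manager 2", onProduct: false },
  { name: "Manager 3", onProduct: false },
  { name: "Manager 4", onProduct: false },
  { name: "Manager 5", onProduct: false },
];

const onProductIndex =
  people.filter(p => p.onProduct).length / people.length;

console.log(`On-product index: ${(onProductIndex * 100).toFixed(0)}%`); // 17%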
I’ve used this metric to open executive eyes. It’s devastating when applied honestly.
But here’s the problem with the on-product index in 2026.
With AI assistance, Bob in the code mines ships code at ten times his former rate. On paper, the enterprise’s on-product index just improved beautifully: Bob contributes far more code whenever he is on-product with AI. The PowerPoints look fantastic. The quarterly report practically writes itself.
Except… the product isn’t actually any better. The customer didn’t ask for half of what Bob built but now has to sift through more slop to find the gold.
We have optimized a metric that is no longer measuring what we thought it was measuring.
Enter the On-Value Index
If code is a commodity, the question is no longer “who is producing it?” but “whose output is actually producing value?”
I’d like to invite you to consider a sharpening of empiricism. Call it the on-value index.
The on-value index asks two honest questions:
- Of all the code in your product, which parts are demonstrably moving a key value metric?
- Of the people contributing to your product, what percentage of them contributed directly to moving a key value metric?
Suddenly, the picture looks very different.
- The “10x engineer” closing a hundred tickets a sprint, whose features sit unused? Not on-value.
- The quiet developer who ships one feature a quarter, but it’s the one driving retention? Deeply on-value.
- The architect whose guidance unblocked three teams with just a single git commit? On-value.
- The Product Owner who reorders the Product Backlog based on evidence rather than the loudest stakeholder in the room? On-value increases, too.
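To make that concrete, here is a minimal sketch of how a team might compute both sides of the index. It assumes a hypothetical list of shipped items, each tagged (with actual evidence, not vibes) as having moved a key value metric or not. Every name, item, and number is invented for illustration:

```typescript
// Hypothetical sketch of an on-value index over shipped work.
// "movedKeyValueMetric" would come from your own evidence:
// dashboards, experiments, support tickets, retention cohorts.
interface ShippedItem {
  title: string;
  contributors: string[];
  movedKeyValueMetric: boolean;
}

const shipped: ShippedItem[] = [
  { title: "Inline checkout validation", contributors: ["Dana"], movedKeyValueMetric: true },
  { title: "AI-suggested settings page", contributors: ["Bob"], movedKeyValueMetric: false },
  { title: "Export to CSV", contributors: ["Bob", "Dana"], movedKeyValueMetric: false },
];

// Question 1: what share of shipped work demonstrably moved a metric?
const onValueWork =
  shipped.filter(i => i.movedKeyValueMetric).length / shipped.length;

// Question 2: what share of contributors touched at least one on-value item?
const everyone = new Set(shipped.flatMap(i => i.contributors));
const onValuePeople = new Set(
  shipped.filter(i => i.movedKeyValueMetric).flatMap(i => i.contributors)
);

console.log(`On-value work: ${(onValueWork * 100).toFixed(0)}%`); // 33%
console.log(`On-value people: ${((onValuePeople.size / everyone.size) * 100).toFixed(0)}%`); // 50%
```

The arithmetic is trivial. The discipline is in refusing to set movedKeyValueMetric to true without evidence.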
The Uncomfortable Question
If you’re a Product Owner right now, ask yourself honestly: what percentage of the code shipped in the last two quarters was actually consumed by customers in a way that moved a Key Value Metric?
If you’re a developer, ask: how much of the AI-generated code actually made it to production? And how much of that demonstrably made things better for the people using it?
This is empiricism at its rawest. We finally have the instruments to measure what Agile always told us to care about — working stuff that delivers real value — and yet most organizations kept obsessing over a proxy. What gets measured gets managed, and we have been measuring the wrong thing for a very long time. We got away with it because code was scarce. It no longer is.
What the Scrum Master Does on Monday
If you’re a Scrum Master reading this (and I’m guessing a fair few of you are), here’s your homework.
Find out what your team’s on-value index actually is. Yes, that is damn hard. If your organization has no idea, congratulations — you have a finding. Bring it to the Retrospective. Bring it to the Product Owner. Bring it to your AI.
Go study Evidence-Based Management. I mean, really take it seriously! And if you find that hard or time-consuming, use AI as your study buddy. For example, drop the EBM guide into NotebookLM and generate a short podcast, create a quick quiz, and have it explain the guide back to you using simple metaphors. You can even have it explain the guide back to you as if it were a drunk co-worker who stuck around a bit too long at the company party.
Scrum… So What?!
Here’s where it gets interesting for Scrum Teams.
When code is cheap, the team’s competitive advantage collapses back onto the things AI still cannot do:
- Listening to customers.
- Reading the room.
- Challenging assumptions.
- Building shared (human-to-human) understanding.
- Recognizing when a feature should be killed.
- Creating psychological safety so a junior dares to say the emperor has no clothes.
These are precisely the activities that show up in Scrum events.
Flow efficiency, the share of lead time spent actively working rather than waiting, is mostly a function of how fast people can align; the waiting is usually waiting for humans to decide something. That alignment speed is what really determines both your delivery speed and your on-value index.
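As a back-of-the-envelope illustration (the numbers are invented):

```typescript
// Hypothetical flow-efficiency calculation for one work item:
// two days of hands-on work inside a twenty-day lead time.
// The other eighteen days were spent waiting on decisions.
const activeDays = 2;
const leadTimeDays = 20;
const flowEfficiency = activeDays / leadTimeDays;

console.log(`Flow efficiency: ${(flowEfficiency * 100).toFixed(0)}%`); // 10%
```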
Scrum Events are meant to be eventful. They are the rare spaces where shared human judgment gets made. If your team is yawning through them, that's a strategic failure.
But what about...
Let me steelman the critiques of my own case and take them seriously, because a metric worth defending should survive honest scrutiny.
“Not all valuable code is customer-visible. What about refactoring, security patches, tech debt, platform work?”
Fair. Absolutely fair. And nothing in the on-value index says otherwise. Value is not synonymous with "a feature a customer can click on". Reducing incident frequency is value. Shortening lead time is value. Closing a vulnerability before it hits the news is enormous value. Scrum.org’s Evidence-Based Management framework gives us four Key Value Areas for exactly this reason: Current Value, Unrealized Value, Ability to Innovate, and Time-to-Market. Tech debt work lands squarely in Ability to Innovate. A team clearing it to unlock future delivery is deeply on-value (provided they can articulate why and measure the impact).
“Attribution is messy. Achieving a result is collaborative. One couldn't have scored without the assist from the other.”
Also true. And this is where I have to be careful with my own framing, because the on-value index is not a performance review. The moment it becomes a stick to rank individual developers or features, it dies a well-deserved death. Goodhart’s Law is merciless. The index is a lens for teams and organizations to inspect their own flow of value. The honest question is "what proportion of our collective effort landed on something that mattered?", not "whose name is on the commit?".
“Some value takes years to materialize. You can’t measure on-value after two sprints for long-term bets.”
Granted. This is exactly where Unrealized Value earns its keep as a concept. A bet on a new market segment won’t move Current Value next sprint. But it should move something — qualitative customer signal, pilot adoption, a falsifiable leading indicator. If a team has been working on something for two quarters and still can’t articulate the leading indicator, then, well, it’s time to scratch some heads.
“This will create perverse incentives. Developers will chase shiny features and dodge the grunt work.”
Only if you implement it badly, which, let’s be honest, most will. It’s the laggard pattern: take a perfectly good empirical tool, weaponize it into an upward-reporting KPI, then blame the tool when it backfires. Velocity, burn-down charts, and story points survived maybe a year before they were gamed into oblivion. The on-value index needs to be owned by the Scrum Team and used in service of their Product Goal. The instant a VP of Engineering demands "on-value index improvement" in individual performance reviews, the game is over. I’d rather you didn’t measure it at all than measure it that way.
“We work on regulatory, compliance, or internal tooling. There’s no value signal.”
Then your stakeholder is the regulator, or the internal user, or the auditor. The value is trustworthiness. And they absolutely have signals: fines avoided, audits passed, support provided, internal NPS from the ops team that uses your tool. The absence of a consumer-facing metric is not the absence of value (perhaps just the absence of imagination about where to look for it).
So, the on-value index is neither elegant nor easy nor immune to misuse. I’m claiming only that it’s more honest than what most are measuring today. AI will certainly intensify the demand for 'more, faster'. And in an AI-accelerated world, it will be even harder for users to discover and extract true value, sifting through truckloads of slop.
So How To Actually Get Started With This?
I won't leave you hanging.
Where will you look for evidence?
If the answer is “I don’t know yet”, awaken your inner Sherlock, or use your AI as your Watson. The moment a team cannot point at where the proof will live, they’re doing hope-driven development. Wishful engineering. Let’s be true empiricists instead.
So, here’s the secret.
During refinement of a PBI, imagine stepping through a time portal. Imagine that item is done. Shipped. Live. In the wild. You now have to do an audit — an inspection — to prove that this item is actually doing its job. To check if it is genuinely useful and beneficial. So how will you perform the audit? Where will you look? Database? Dashboard? Live review? Talk to whom exactly?
So you found the outcome! Whoo! Now walk the value stream backward.
Here’s the part that gives this real teeth. Once you’ve identified where the evidence will live, walk the causal chain backward from there:
“This item was effective because [X]. X did its job because [Y]. Y was possible because of [Z]…” and so on, all the way back to A, the initiative or idea itself.
A concrete example. The PBI is “Add inline validation to the checkout form”.
Step through the time portal. Three weeks after release, you open the dashboard.
- Outcome (where the evidence lives): Cart-to-order conversion is up 4%.
- Because: Fewer users abandon after hitting submit.
- Because: They catch and correct their own errors mid-flow, without losing trust in the system.
- Because: Validation fires on blur, with specific human-readable messages.
- Back to A: So the PBI must specify blur-triggered validation, with specific error copy per field, rendered inline. Not “improve the form”. Not “add validation”. That version.
This way the team knows what they’re shooting at, why it is going to work (or not), and how they’ll know if it did. And the bit the organization often misses: they also know what they are not going to build, because the causal chain makes it visible that a red border alone isn’t going to cut it.
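If you want to make that chain a first-class artifact instead of a hallway conversation, a small structured record attached to the PBI is enough. Here is a minimal sketch with the checkout example filled in; the shape and every field name are my own invention, not any standard:

```typescript
// Hypothetical shape for recording a PBI's evidence chain.
// Nothing standard here; adapt the fields to your own team.
interface EvidenceChain {
  pbi: string;
  evidenceLivesIn: string; // where the audit will happen
  expectedOutcome: string; // the falsifiable claim
  becauseChain: string[];  // causal links, outcome back to A
  checkAfter: string;      // when to step back through the portal
}

const checkoutValidation: EvidenceChain = {
  pbi: "Add inline validation to the checkout form",
  evidenceLivesIn: "Conversion dashboard, checkout funnel report",
  expectedOutcome: "Cart-to-order conversion up ~4% within three weeks",
  becauseChain: [
    "Fewer users abandon after hitting submit",
    "They catch and correct their own errors mid-flow",
    "Validation fires on blur, with specific human-readable messages",
  ],
  checkAfter: "Three weeks after release",
};
```

Whether this lives in a repo, a wiki page, or a sticky note matters far less than the fact that the team wrote it down before building.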
Does this sound like a lot of work? Well, do I have some great news for you! AI will happily draft those hypotheses, KVMs, and causal chains for you. But it is still your job to own them.
Is this hard? Yes. Frustrating? A little. Time-consuming? Sure. But hey, you got AI now, Sherlock.
And honestly, I think that is pretty effin’ awesome.
Oh, one last thing. You gotta do the slop count too. Talk about the results nobody wants to own, and make them visible for everyone to see anyway. The running inventory of shipped code that just ain't doin' a damn thing. What early indicators could have avoided that slop?
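What might a slop count look like? A minimal sketch, under the assumption that you track shipped items and whether evidence of impact ever showed up; the grace period and all data are invented:

```typescript
// Hypothetical slop count: shipped items with no detected impact
// after a grace period. All names, dates, and thresholds invented.
interface ShippedItem {
  title: string;
  shippedOn: Date;
  movedKeyValueMetric: boolean; // backed by evidence, not vibes
}

const GRACE_PERIOD_DAYS = 21; // the "three weeks" from the example above

function slopCount(items: ShippedItem[], today: Date): ShippedItem[] {
  const msPerDay = 24 * 60 * 60 * 1000;
  return items.filter(i =>
    !i.movedKeyValueMetric &&
    (today.getTime() - i.shippedOn.getTime()) / msPerDay > GRACE_PERIOD_DAYS
  );
}

const shipped: ShippedItem[] = [
  { title: "Inline checkout validation", shippedOn: new Date("2026-01-05"), movedKeyValueMetric: true },
  { title: "AI-suggested settings page", shippedOn: new Date("2026-01-10"), movedKeyValueMetric: false },
];

const slop = slopCount(shipped, new Date("2026-03-01"));
console.log(`Slop: ${slop.length} of ${shipped.length} shipped items`); // Slop: 1 of 2
```

The point is not the code; it is that the inventory exists, is visible to everyone, and gets talked about in the Retrospective.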
Closing
AI has just handed us the bill. The organizations that thrive in the next few years will not be the ones with the fanciest AI stack. They will be the ones that finally — finally — take empiricism seriously. Transparency (a shared understanding of what is slop and what is not). Inspection (the gaps between expectation and reality). Adaptation (whoosh, slop to top).
Code is cheap now. Judgment isn’t. Relationships aren’t. Sense-making isn’t. Empathy isn’t.
That is where your on-value index lives.
Go find it.
Let’s continue to uncover better ways together.