Your AI bill is climbing, and you cannot tell how much of it is waste.
Unhealthy code makes the model burn 50 to 120% more tokens for the same
task. That waste is invisible until you measure it. This page shows the
measurement, and shows our work.
The thesis in one line: AI amplifies whatever you already are.
Google says the same thing in its own words: "AI amplifies the
engineering culture it lands in. It multiplies both your strengths and
your weaknesses." It does not create discipline. It reveals whether
discipline was already there.
The Invisible Gap
The assessment turns twenty behavioural statements into a single
readiness number, compares it against how widely your developers already
use AI, and reports the distance between the two. Your readiness equals
your total score out of 40, expressed as a percentage. Your adoption is
the figure you set yourself. The adoption-readiness gap is your adoption
minus your readiness, never below zero.
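As a minimal sketch of the arithmetic above, the Python below computes readiness and the gap. The function names are hypothetical, and treating adoption as a percentage from 0 to 100 is an assumption. It illustrates the published formula; it is not the scoring engine's own code.

```python
MAX_TOTAL_SCORE = 40  # four dimensions, each scored 0 to 10


def readiness_percent(total_score: int) -> float:
    """Express a total score out of 40 as a percentage."""
    if not 0 <= total_score <= MAX_TOTAL_SCORE:
        raise ValueError("total_score must be between 0 and 40")
    return 100.0 * total_score / MAX_TOTAL_SCORE


def adoption_readiness_gap(adoption_percent: float, total_score: int) -> float:
    """Adoption minus readiness, floored at zero.

    Assumption: adoption_percent is the self-set adoption figure,
    expressed on the same 0 to 100 scale as readiness.
    """
    if not 0 <= adoption_percent <= 100:
        raise ValueError("adoption_percent must be between 0 and 100")
    return max(0.0, adoption_percent - readiness_percent(total_score))
```

Under these assumptions, a total score of 22 is a readiness of 55%, so an adoption figure of 80 gives a gap of 25 points. A total score of 36 with adoption at 60 gives a gap of zero, because the gap never goes negative.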
A large positive gap means AI is being adopted faster than the
organization can absorb it safely. The gap is invisible because
perception diverges from reality: developers feel roughly 20% faster while
measuring about 19% slower, the perception trap METR documented. Green
dashboards hide it: volume rises, quality drops, rework multiplies, and
the dashboard still shows green.
The Readiness Quadrant
The model is not vibes. Four research-backed dimensions of
agentic-coding readiness converge on one core: the AI Readiness Gap.
Weakness in any one is amplified by AI, not mitigated.
Focus
Focus & Cognitive Capacity
When developers in your organization work with AI coding assistants, how protected is their ability to focus and critically supervise AI-generated work?
At full readiness: AI can be supervised and scaled safely.
Cognitive science, Gloria Mark (refocus cost after interruption)
Technical
Technical Validation & Engineering Discipline
When AI coding assistants generate or modify code, how reliably does your organization validate that the code is correct, secure, and behaviorally predictable before it reaches production?
At full readiness: AI safely amplifies engineering excellence.
CodeScene token-cost data + Stanford engineering-cleanliness research
Product
Product & Backlog Clarity
When work reaches developers (and AI coding assistants), how clear, intentional, and behaviorally precise is the product vision and backlog they work from?
At full readiness: AI accelerates delivery of the right things.
METR 2025: ambiguous intent compounds at AI speed
Feedback
Customer Feedback & Learning Speed
When AI accelerates delivery, how quickly and reliably does your organization learn whether it built the right thing for users?
At full readiness: AI accelerates learning and value creation.
DORA: delivery-stability erosion under unguarded AI adoption
At the centre: the AI Readiness Gap, the distance between adoption
velocity and preparedness. The four dimensions feed it.
Each dimension carries five behavioural statements. You check a statement
only when it is consistently and completely true. Every checked statement
adds two points, so each dimension scores between 0 and 10, and the four
sum to a total out of 40. Within each dimension your score lands in one
of three zones. The boundaries below are derived directly from the
scoring engine; there is no separate, hand-typed source.
Chaos: dimension score 0–2
AI supervision is statistically unsafe
Plateau: dimension score 4–6
Partial protection, high cognitive risk
Amplification: dimension score 8 and above
AI can be supervised and scaled safely
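The scoring rules and zone boundaries above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the scoring engine itself: the function names and the example dimension keys are hypothetical. Because every checked statement adds two points, a dimension score is always even, which is why the zones skip 3 and 7.

```python
from typing import Mapping, Sequence

POINTS_PER_STATEMENT = 2
STATEMENTS_PER_DIMENSION = 5
DIMENSION_COUNT = 4


def dimension_score(checked: Sequence[bool]) -> int:
    """Two points per checked statement, so the result is 0, 2, ..., 10."""
    if len(checked) != STATEMENTS_PER_DIMENSION:
        raise ValueError("each dimension carries exactly five statements")
    return POINTS_PER_STATEMENT * sum(1 for c in checked if c)


def zone(score: int) -> str:
    """Map a dimension score to its zone using the boundaries listed above."""
    if score not in range(0, 11, 2):
        raise ValueError("a dimension score is an even number from 0 to 10")
    if score <= 2:
        return "Chaos"
    if score <= 6:
        return "Plateau"
    return "Amplification"


def total_score(answers: Mapping[str, Sequence[bool]]) -> int:
    """Sum the four dimension scores into a total out of 40."""
    if len(answers) != DIMENSION_COUNT:
        raise ValueError("expected exactly four dimensions")
    return sum(dimension_score(checked) for checked in answers.values())
```

For example, a dimension with two of its five statements checked scores 4 and lands in Plateau.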
What Your Gap Means
The size of your adoption-readiness gap determines the severity band and
the risk narrative you receive. These thresholds come straight from the
scoring engine.
🟢 Your Gap Is Manageable
Manageable gap: 0–19 points
Your organization shows strong alignment between AI adoption and readiness. You are in a position to explore agentic AI coding with structured review.
🟡 Your Gap Carries Real Risk
Significant gap: 20–40 points
Your organization has adopted AI faster than it has built the capability to absorb it safely. Stanford research shows this pattern leads to mixed results: some teams gain, others stall.
🔴 Your Gap Demands Immediate Attention
Critical gap: 41 points and above
Your organization is operating in a zone where AI is statistically more likely to amplify dysfunction than deliver value. Stanford's research calls this the "rich get richer" effect, and it is working against you.
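The three severity bands can be sketched as a simple threshold lookup. The function name is hypothetical. The thresholds are the ones listed above; how the scoring engine treats a fractional gap such as 19.5 or 40.5 is not stated, so the handling here (below 20 is Manageable, above 40 is Critical) is an assumption.

```python
def severity_band(gap: float) -> str:
    """Return the severity band for an adoption-readiness gap in points.

    Published bands: 0-19 Manageable, 20-40 Significant, 41+ Critical.
    Assumption: a fractional gap falls into the band whose threshold it
    has not yet reached (19.5 -> Manageable, 40.5 -> Critical).
    """
    if gap < 0:
        raise ValueError("the gap is never below zero")
    if gap < 20:
        return "Manageable"
    if gap <= 40:
        return "Significant"
    return "Critical"
```

With this sketch, a gap of 25 points reads as Significant, and a gap of 41 points reads as Critical.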
This Isn't Our Opinion
Four independent research efforts, three proprietary datasets, and
Google's own engineering org arrive at the same conclusion. A single
vendor stat invites "cherry-picked." Independent convergence does not.
Every figure below is tied to its named source; only verified figures
appear as fact.
Google: Model + Harness
Addy Osmani names the lever: "Agent = Model + Harness." Configuration, not the model, drives outcomes; on Terminal Bench an agent moved from outside the Top 30 to the Top 5 by changing only the harness.
Google has a commercial interest (ADK, Jules, Gemini); the discipline argument is vendor-neutral and corroborated independently.
Adam Tornhill's controlled study (CodeScene): unhealthy code makes the model burn more tokens for the same task. This is the bridge that turns code quality into a financial argument.
FAROS: +861% code churn · 31.3% more PRs merged with no review
The "Acceleration Whiplash" across 22,000 developers over two years: throughput up, quality down at the same time, regardless of baseline engineering maturity.
GitClear: 57% of co-changed clones involved in bugs · 211M LOC analysed
GitClear, with its 211M-LOC dataset, is the third independent vendor to converge on the same conclusion: duplication rising, refactoring falling, clones carrying bugs.
METR's randomized trial with experienced open-source developers: they were measurably slower with AI, while feeling faster. That is the perception gap that hides the problem.
Stanford: engineering cleanliness predicts AI outcomes
The Software Engineering Productivity Research group finds engineering cleanliness (test coverage, modularity, documentation) explains a large share of the variance in AI outcomes.
The punchline: you cannot control model pricing, but you can control how
much waste your delivery posture generates. Readiness is the only
controllable lever on the climbing AI-coding bill; the score predicts your
token efficiency. In finance terms: undisciplined AI defers cost into a
compounding operating expense you cannot see; readiness is the capital
investment that collapses it.
So who actually solves this?
You just read why agentic coding is the present.
The throughput is real. So is the other half of the FAROS data above: 861%
more code churn, and 31.3% more pull requests merged with no review. Agentic
coding arrived; trust did not arrive with it.
The research already named the lever.
Google's own finding, cited above, is the whole story in one line:
Agent equals Model + Harness. The model is a commodity everyone rents. The
harness is what decides whether the output is trustworthy, and the harness
is the part you control.
nWave is that harness.
Predictable agents, running inside a harness you can verify, producing AI
coding you can actually trust. It is the engineered answer to the gap this
whole page measures: not faster code, code you do not have to second-guess.
You can see how nWave builds predictable agents and a trusted harness at
nwave.ai. What that looks like for your number, your codebase,
and your EU AI Act exposure is the conversation below.
Why Now, Not Next Quarter
Organizations that use AI are "deployers" under the EU AI Act; the
status is automatic, not chosen. The Act does not ask whether you chose
the AI intentionally; it asks whether you use it. Since February 2025,
Article 4 has required a sufficient level of AI literacy among staff, and
Article 26 requires human oversight by people with the necessary
competence, training, authority and support.
The behaviours that unlock AI productivity (tests, validation,
documentation, modularity, focused supervision) are the same behaviours
that constitute the evidence regulators will ask for. Readiness and
compliance are two views of one capability.
High-risk obligations carry their own enforcement deadline, which has
been the subject of a proposed postponement; confirm the current date
against a primary EU source before relying on it.
You Have the Model. Two Ways to Act.
The assessment shows you where you are. The conversation is where you
decide what to do about it.
Being the person who notices this first is a lonely seat. If your number
shows a real gap, the useful next step is usually not a tool; it is a
conversation with someone who has measured this across enough orgs to
tell you what is normal, what is urgent, and what can wait. No pitch, no
deck: a working session on your specific result and what it predicts for
your token bill, your quality trajectory, and your EU AI Act exposure.
You leave with a read on your situation whether or not we ever work
together.