ai-readiness.dev

The methodology · show your work

How We Measure Readiness for Agentic Coding

Your AI bill is climbing, and you cannot tell how much of it is waste. Unhealthy code makes the model burn 50 to 120% more tokens for the same task. That waste is invisible until you measure it. This page shows the measurement, and shows our work.

The thesis in one line: AI amplifies whatever you already are.

Google says the same thing in its own words: "AI amplifies the engineering culture it lands in. It multiplies both your strengths and your weaknesses." It does not create discipline. It reveals whether discipline was already there.

The Invisible Gap

The assessment turns twenty behavioural statements into a single readiness number, compares it against how widely your developers already use AI, and reports the distance between the two. Your readiness equals your total score out of 40, expressed as a percentage. Your adoption is the figure you set yourself. The adoption-readiness gap is your adoption minus your readiness, never below zero.
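The arithmetic in the paragraph above can be sketched in a few lines. This is a minimal illustration, not the assessment's actual scoring engine: the function names are hypothetical, and it assumes adoption is entered as a percentage between 0 and 100.

```python
MAX_TOTAL_SCORE = 40  # total score is reported out of 40


def readiness_percent(total_score: int) -> float:
    """Express a total score out of 40 as a percentage."""
    if not 0 <= total_score <= MAX_TOTAL_SCORE:
        raise ValueError("total_score must be between 0 and 40")
    return total_score * 100 / MAX_TOTAL_SCORE


def adoption_readiness_gap(adoption_percent: float, total_score: int) -> float:
    """Adoption minus readiness, never below zero."""
    # Assumption: adoption_percent is the self-reported figure, 0 to 100.
    return max(0.0, adoption_percent - readiness_percent(total_score))
```

For example, a total score of 18 gives a readiness of 45%; with adoption set at 80%, the gap is 35 points. An organization whose readiness exceeds its adoption gets a gap of zero rather than a negative number.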

A large positive gap means AI is being adopted faster than the organization can absorb it safely. The gap is invisible because perception diverges from reality: in METR's trial, developers felt roughly 20% faster while measuring about 19% slower. Green dashboards hide it too: volume rises, quality drops, rework multiplies, and the dashboard still shows green.

The Readiness Quadrant

The model is not vibes. Four research-backed dimensions of agentic-coding readiness converge on one core: the AI Readiness Gap. Weakness in any one is amplified by AI, not mitigated.

Figure: The Readiness Quadrant. Four dimensions (Focus, Technical, Product and Feedback) arranged around a central AI Readiness Gap core that each of them feeds.

Focus

Focus & Cognitive Capacity

When developers in your organization work with AI coding assistants, how protected is their ability to focus and critically supervise AI-generated work?

At full readiness: AI can be supervised and scaled safely.

Cognitive science, Gloria Mark (refocus cost after interruption)

Technical

Technical Validation & Engineering Discipline

When AI coding assistants generate or modify code, how reliably does your organization validate that the code is correct, secure, and behaviorally predictable before it reaches production?

At full readiness: AI safely amplifies engineering excellence.

CodeScene token-cost data + Stanford engineering-cleanliness research

Product

Product & Backlog Clarity

When work reaches developers (and AI coding assistants), how clear, intentional, and behaviorally precise is the product vision and backlog they work from?

At full readiness: AI accelerates delivery of the right things.

METR 2025: ambiguous intent compounds at AI speed

Feedback

Customer Feedback & Learning Speed

When AI accelerates delivery, how quickly and reliably does your organization learn whether it built the right thing for users?

At full readiness: AI accelerates learning and value creation.

DORA: delivery-stability erosion under unguarded AI adoption

At the centre: the AI Readiness Gap, the distance between adoption velocity and preparedness. The four dimensions feed it.

You've seen the model. Get your number.

5 minutes · No login · Anonymous · Free

From Statements to a Score

Each dimension carries five behavioural statements. You check a statement only when it is consistently and completely true. Every checked statement adds two points, so each dimension scores an even number between 0 and 10, and the four sum to a total out of 40. Within each dimension your score lands in one of three zones. Because scores are always even, the zones below cover every possible value. The boundaries are derived directly from the scoring engine rather than typed in by hand.

Chaos: dimension score 0–2

AI supervision is statistically unsafe

Plateau: dimension score 4–6

Partial protection, high cognitive risk

Amplification: dimension score 8 and above

AI can be supervised and scaled safely
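The scoring rule and the three zones can be sketched as follows. This is an illustration under stated assumptions, not the scoring engine itself: the function names are hypothetical, and the checked statements are assumed to arrive as a list of five booleans.

```python
POINTS_PER_STATEMENT = 2
STATEMENTS_PER_DIMENSION = 5


def dimension_score(checked: list[bool]) -> int:
    """Two points for every checked statement, giving 0, 2, 4, 6, 8 or 10."""
    if len(checked) != STATEMENTS_PER_DIMENSION:
        raise ValueError("each dimension has exactly five statements")
    return POINTS_PER_STATEMENT * sum(checked)


def zone(score: int) -> str:
    """Map a dimension score to its zone, using the boundaries listed above."""
    if score not in range(0, 11, 2):
        raise ValueError("dimension scores are even numbers from 0 to 10")
    if score <= 2:
        return "Chaos"
    if score <= 6:
        return "Plateau"
    return "Amplification"
```

A dimension with three of its five statements checked scores 6 and lands in Plateau; it takes a fourth checked statement to reach Amplification.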

What Your Gap Means

The size of your adoption-readiness gap determines the severity band and the risk narrative you receive. These thresholds come straight from the scoring engine.

Your Gap Is Manageable

Manageable gap: 0–19 points

Your organization shows strong alignment between AI adoption and readiness. You are in a position to explore agentic AI coding with structured review.

Your Gap Carries Real Risk

Significant gap: 20–40 points

Your organization has adopted AI faster than it has built the capability to absorb it safely. Stanford research shows this pattern leads to mixed results: some teams gain, others stall.

Your Gap Demands Immediate Attention

Critical gap: 41 points and above

Your organization is operating in a zone where AI is statistically more likely to amplify dysfunction than deliver value. Stanford's research calls this the "rich get richer" effect, and it is working against you.
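The three bands above amount to a simple threshold lookup. The sketch below is illustrative only: the function name is hypothetical, and it assumes the gap is a whole number of points, since the page does not say how a fractional gap such as 19.5 would be handled.

```python
def severity_band(gap: int) -> str:
    """Map an adoption-readiness gap (whole points) to its severity band."""
    if gap < 0:
        raise ValueError("the gap is never below zero")
    if gap <= 19:
        return "Manageable"
    if gap <= 40:
        return "Significant"
    return "Critical"
```

With these thresholds, a gap of 19 is Manageable, 20 through 40 is Significant, and 41 is already Critical.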

This Isn't Our Opinion

Four independent research efforts, three proprietary datasets, and Google's own engineering org arrive at the same conclusion. A single vendor stat invites "cherry-picked." Independent convergence does not. Every figure below is tied to its named source; only verified figures appear as fact.

  • Google: Model + Harness

    Addy Osmani names the lever: "Agent = Model + Harness." Configuration, not the model, drives outcomes; on Terminal Bench an agent moved from outside the Top 30 to the Top 5 by changing only the harness.

    Google has commercial interest (ADK, Jules, Gemini); the discipline argument is vendor-neutral and corroborated independently.

    Read the source
  • CodeScene: +50–120% more tokens

    Adam Tornhill's controlled study: unhealthy code makes the model burn more tokens for the same task. This is the bridge that turns code quality into a financial argument.

    Read the source
  • FAROS: +861% code churn · 31.3% more PRs merged with no review

    The "Acceleration Whiplash" across 22,000 developers over two years: throughput up, quality down at the same time, regardless of baseline engineering maturity.

    Read the source
  • GitClear: 57% of co-changed clones involved in bugs · 211M LOC analysed

    GitClear's 211M-LOC dataset makes it the third independent vendor to reach the same conclusion: duplication rising, refactoring falling, clones carrying bugs.

    Read the source
  • METR: 19% slower (felt 20% faster)

    A randomized trial with experienced open-source developers: they were measurably slower with AI, while feeling faster. That is the perception gap that hides the problem.

    Read the source
  • DORA: delivery-stability decrease under unguarded adoption

    The Accelerate State of DevOps report links AI adoption without readiness to small but real decreases in delivery stability and throughput.

    Read the source
  • Stanford: engineering cleanliness predicts AI outcomes

    The Software Engineering Productivity Research group finds engineering cleanliness (test coverage, modularity, documentation) explains a large share of the variance in AI outcomes.

    Read the source

The punchline: you cannot control model pricing, but you can control how much waste your delivery posture generates. Readiness is the only controllable lever on the climbing AI-coding bill; the score predicts your token efficiency. In finance terms: undisciplined AI defers cost into a compounding operating expense you cannot see; readiness is the capital investment that collapses it.
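To make the CodeScene range concrete, the sketch below applies the 50 to 120% overhead to a single task. It is a back-of-the-envelope illustration: the function name is hypothetical, and applying the range uniformly to any task is an assumption, since the study reports it for its own test conditions.

```python
OVERHEAD_LOW_PCT = 50    # lower bound of the cited overhead range
OVERHEAD_HIGH_PCT = 120  # upper bound of the cited overhead range


def unhealthy_token_range(healthy_tokens: int) -> tuple[float, float]:
    """Tokens for the same task on unhealthy code, low and high estimates."""
    # Assumption: the overhead scales linearly with the healthy-code baseline.
    low = healthy_tokens * (100 + OVERHEAD_LOW_PCT) / 100
    high = healthy_tokens * (100 + OVERHEAD_HIGH_PCT) / 100
    return low, high
```

Under that assumption, a task that costs 1,000 tokens on healthy code would cost between 1,500 and 2,200 tokens on unhealthy code.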

So who actually solves this?

You just read why agentic coding is the present. The throughput is real. So is the other half of the FAROS data above: 861% more code churn, and 31.3% more pull requests merged with no review. Agentic coding arrived; trust did not arrive with it.

The research already named the lever. The line from Google's Addy Osmani, cited above, is the whole story: Agent = Model + Harness. The model is a commodity everyone rents. The harness is what decides whether the output is trustworthy, and the harness is the part you control.

nWave is that harness. Predictable agents, running inside a harness you can verify, producing AI coding you can actually trust. It is the engineered answer to the gap this whole page measures: not faster code, code you do not have to second-guess.

You can see how nWave builds predictable agents and a trusted harness at nwave.ai. What that looks like for your number, your codebase, and your EU AI Act exposure is the conversation below.

Why Now, Not Next Quarter

Organizations that use AI are "deployers" under the EU AI Act; the status is automatic, not chosen. The Act does not ask whether you chose the AI intentionally; it asks whether you use it. Since February 2025, Article 4 has required a sufficient level of AI literacy among staff, and for high-risk systems Article 26 requires human oversight by people with the necessary competence, training, authority and support.

The behaviours that unlock AI productivity (tests, validation, documentation, modularity, focused supervision) are the same behaviours that constitute the evidence regulators will ask for. Readiness and compliance are two views of one capability.

High-risk obligations carry their own enforcement deadline, which has been the subject of a proposed postponement; confirm the current date against a primary EU source before relying on it.

You Have the Model. Two Ways to Act.

The assessment shows you where you are. The conversation is where you decide what to do about it.

Being the person who notices this first is a lonely seat. If your number shows a real gap, the useful next step is usually not a tool; it is a conversation with someone who has measured this across enough orgs to tell you what is normal, what is urgent, and what can wait. No pitch, no deck: a working session on your specific result and what it predicts for your token bill, your quality trajectory, and your EU AI Act exposure. You leave with a read on your situation whether or not we ever work together.

Get in touch for a full discussion

A peer conversation, not a pitch · we reply personally, fast

Or get your number first. Take the assessment: 5 minutes, anonymous, free.