AgileMechanics.com | Agile Health Checks & Maturity Models

Origins

Agile health checks emerged as teams looked for ways to assess their own state across more than one dimension. The most widely known is the Spotify Squad Health Check, published by Henrik Kniberg in 2014¹. Spotify's coaches needed a tool that worked for many squads at once, was fast to run, and surfaced patterns across the organization without becoming a scorecard.

Other models followed similar logic. AgilityHealth (Sally Elatta) developed a more elaborate radar across team, leadership, and enterprise dimensions. Comparative Agility (Mike Cohn and Kenny Rubin) built an industry-benchmark survey. The shared insight across all of them: teams that only measure delivery output miss the structural health that determines whether the output is sustainable.

What a Health Check Is

An Agile health check is a structured self-assessment across multiple dimensions of team or organizational performance. Common dimensions include:

Psychological safety: can the team disagree, admit mistakes, raise hard issues?
Delivery: do we ship what we plan, in the time we expect?
Quality: are we proud of the work we produce?
Value: are we building things that matter to users?
Speed: do we get fast feedback from production?
Fun: do we enjoy working together?
Support: does the organization help or block us?
Mission: do we know why we exist and care about it?

The team rates each dimension on a simple scale — often a traffic light (green / yellow / red) or a 1–5 scale — and discusses the result. The conversation matters more than the number.

The Spotify Format

Kniberg's original Spotify Squad Health Check uses 11 cards, each with a "feeling good about X" statement and an opposite "feeling bad about X" statement. The squad reads each card aloud, each member votes red/yellow/green, and the squad discusses the result. A trend arrow (improving, stable, declining) is captured alongside the current state.

The format is deliberately low-tech and low-stakes. The point is to surface differences within the team and patterns across squads, not to produce a performance rating. Spotify's coaches roll up the results to spot organization-level patterns, but never to compare or rank individual squads.
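The roll-up described above can be sketched in a few lines of Python. This is only an illustration, not Spotify's tooling: the squad names, card names, and the `roll_up` helper are all hypothetical. The point it shows is that the organization-level view counts colors per dimension and drops squad identity, so the output cannot be used to rank squads.

```python
from collections import Counter

# Hypothetical data: each squad's agreed color per card.
# Squad names appear only as input keys and are dropped in the roll-up.
squad_results = {
    "squad_a": {"Fun": "green", "Speed": "red", "Support": "yellow"},
    "squad_b": {"Fun": "green", "Speed": "red", "Support": "green"},
    "squad_c": {"Fun": "yellow", "Speed": "yellow", "Support": "red"},
}


def roll_up(results):
    """Count colors per dimension across squads, discarding squad identity."""
    summary = {}
    for ratings in results.values():
        for dimension, color in ratings.items():
            summary.setdefault(dimension, Counter())[color] += 1
    return summary


org_view = roll_up(squad_results)
for dimension, counts in sorted(org_view.items()):
    print(dimension, dict(counts))
```

With this sample data, two of three squads are red on Speed, which is the kind of pattern worth raising at the organization level. Nothing in `org_view` says which squads those are.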

AgilityHealth Radar

AgilityHealth uses a richer model — typically a four-quadrant radar covering Foundation (clarity, structure), Performance (delivery, value), Leadership, and Culture, each broken into sub-competencies. Teams complete a survey, results are visualized as a radar chart, and the team then runs a "Growth Portal" planning session to choose 1–3 areas to invest in next.

The richer model produces more granular insight but takes longer to run and risks information overload. AgilityHealth's biggest contribution is the explicit linking of diagnosis with growth planning: the assessment is not done until the team has named what it will work on.

Comparative Agility

Comparative Agility is a survey-based assessment that lets teams benchmark themselves against a large industry dataset. It is less a team diagnostic than an organizational tool, useful when leadership wants to know whether the organization is average, behind, or ahead in a given capability.

The benchmarking can be motivating or distracting. Teams that already know they want to improve can find specific gaps useful. Teams that don't yet have the will to change can use benchmarks as a way to avoid the local work of improvement — "we're average, so we're fine."

Running a Health Check Well

1. Frame it as the team's tool, not management's

The single most important predictor of whether a health check works is who owns the results. If the team owns them — raw data goes to the team first, the team decides what to share — trust is high. If management gets the data first, the team treats the assessment as evaluation and gives the answers that look good.

2. Run it at a cadence

One-off health checks produce one-off insights. The trend over time is more valuable than any single snapshot. Quarterly is typical; some teams run monthly with a lighter version.

3. End with a commitment, not just a chart

A health check that produces a radar chart and nothing else is a wasted hour. End the session by picking one or two dimensions to invest in next, with concrete experiments.

4. Anonymize honestly

Anonymous voting produces more honest ratings, especially on dimensions like psychological safety, where the rating is in effect a referendum on the leader. Use anonymous tools or paper.

5. Discuss the spread, not just the average

If five team members rate "delivery" as green and one rates it as red, the interesting conversation is about the red. Average ratings hide the patterns that matter.
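A small Python sketch makes the five-green, one-red case concrete. The numeric mapping (red = 1, yellow = 2, green = 3) and the `summarize_votes` helper are assumptions made for illustration; the traffic-light scale itself has no official numeric values.

```python
from collections import Counter

# Assumed numeric mapping, used only to show what an average would hide.
SCORE = {"red": 1, "yellow": 2, "green": 3}


def summarize_votes(votes):
    """Return the average score, the vote counts, and whether the team is split."""
    counts = Counter(votes)
    average = sum(SCORE[v] for v in votes) / len(votes)
    # A "split" here means at least one red and one green on the same dimension.
    split = counts["red"] > 0 and counts["green"] > 0
    return {"average": round(average, 2), "counts": dict(counts), "split": split}


delivery = ["green"] * 5 + ["red"]
result = summarize_votes(delivery)
print(result)
```

The average comes out at 2.67, which reads as "mostly green" and would pass without comment on a dashboard. The counts and the split flag are what tell the facilitator that one person sees delivery very differently and that the conversation should start there.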

What Health Checks Are Not For

The fastest way to destroy a health check practice is to use it for purposes it was not designed for.

Not for performance evaluation: teams that suspect ratings will affect their reviews or their funding will rate themselves higher. The data becomes garbage.
Not for comparison between teams: every team has a different context. Stack-ranking squad health is meaningless and actively harmful.
Not as a one-time assessment: the goal is trend, not snapshot. A single rating in isolation says almost nothing.
Not as a substitute for retrospectives: health checks surface patterns across the team; retros dig into specific events and problems in detail. They complement each other; neither replaces the other.

Common Pitfalls

Scorecard creep: management starts tracking team scores, teams start gaming them, the practice dies. The fix is to keep the raw data with the team.
Survey fatigue: too many dimensions, too long, too often. Pick a model proportional to the team's tolerance for self-reflection.
No follow-through: the team identifies a problem, names it red, and does nothing about it for three quarters. The chart loses meaning.
Cosmetic dimensions: dimensions that sound impressive but aren't actionable. If the team can't act on a dimension, why measure it?

Coaching Tips

Negotiate Data Ownership First

Before running the first check, agree explicitly: the raw data stays with the team. Without that agreement, you'll get ratings, not honesty.

Start Light

Begin with a Spotify-style 11-card check. Move to richer models like AgilityHealth only after the team has built the habit and can absorb more depth.

Watch the Outliers

If one person rates a dimension differently from everyone else, that is the most interesting data point in the session. Make space for it.

Track the Trend

Hold the chart from the last check up against this one. "We were yellow on safety three months ago; we're green now — what changed?" is the real conversation.
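For teams that keep their results in a file rather than on a wall, the comparison can be sketched as below. This is a minimal illustration under stated assumptions: the dimension names, the sample ratings, and the `trend` helper are hypothetical, and the trend is derived from two snapshots rather than voted on by the team as in the Spotify format.

```python
# Assumed ordering of the traffic-light scale, worst to best.
ORDER = {"red": 0, "yellow": 1, "green": 2}


def trend(previous, current):
    """Label each dimension present in both checks as improving, stable, or declining."""
    changes = {}
    for dimension in previous.keys() & current.keys():
        delta = ORDER[current[dimension]] - ORDER[previous[dimension]]
        if delta > 0:
            changes[dimension] = "improving"
        elif delta < 0:
            changes[dimension] = "declining"
        else:
            changes[dimension] = "stable"
    return changes


# Hypothetical results from two consecutive checks.
last_check = {"Safety": "yellow", "Quality": "red", "Fun": "green"}
this_check = {"Safety": "green", "Quality": "red", "Fun": "yellow"}

changes = trend(last_check, this_check)
for dimension in sorted(changes):
    print(dimension, changes[dimension])
```

The output is a prompt for discussion, not a verdict. A "stable" red on Quality across two checks is a signal that the last experiment did not work or was never run.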

Refuse to Rank

If leadership asks you to compare squad scores, push back. Comparison kills the practice. Aggregate themes are fine; rankings are not.

Convert Insight to Experiment

Don't end with "we're red on quality." End with "we're going to try X for the next two sprints and re-check."

Summary

Agile health checks are diagnostic tools, not management dashboards. Their value comes from giving teams a structured way to examine dimensions they would not otherwise discuss — psychological safety, value alignment, sustainability, support from the organization. The conversation the assessment produces is the actual product.

The practice fails when the data leaves the team, when ratings become benchmarks, or when the chart replaces the conversation. It succeeds when teams own their data, run the check at a regular cadence, and end every session with something concrete to try next.

Footnotes

¹ Kniberg, H. (2014). Squad Health Check Model. Spotify Engineering.
