OKR scoring is theater

At the end of the quarter the ritual begins. Every team opens its OKRs and assigns each key result a score between zero and one. There is a brief, slightly performative debate about whether something landed at 0.6 or 0.7. Someone repeats the received wisdom that 1.0 means you sandbagged and 0.7 is the healthy stretch target. The numbers are recorded. The quarter is closed. Everyone feels a faint sense of completion, and almost nobody can tell you what changed in the business as a result.

OKR scoring has the texture of measurement without the function of it. It produces a number. The number is defensible. It is also, in most companies, a near-perfect example of activity that looks like rigor and delivers none.

What the score is actually measuring

A key result is supposed to be a measurable outcome. "Increase activation from 61 percent to 75 percent." If you hit 70 percent, you closed 9 of the 14 points you were aiming for, the arithmetic gives you a clean score of about 0.64, and the score means something real. But most key results are not written like that. Most are written as "launch the new onboarding flow" or "ship three partner integrations," which are not outcomes at all. They are deliverables wearing the costume of a metric. This is the output trap, and scoring makes it worse, because once you put a number on a deliverable, the number launders the deliverable into something that feels like a result.
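The arithmetic for a metric-shaped key result can be sketched in a few lines. This assumes the common convention of linear progress from baseline to target, clamped to the 0-to-1 range; the function name kr_score is illustrative, not part of any OKR tool.

```python
def kr_score(baseline: float, target: float, actual: float) -> float:
    """Fraction of the baseline-to-target distance actually covered, clamped to [0, 1].

    Assumes linear scoring, which is a convention, not a standard.
    """
    if target == baseline:
        raise ValueError("target must differ from baseline")
    progress = (actual - baseline) / (target - baseline)
    return max(0.0, min(1.0, progress))


# The activation example: baseline 61 percent, target 75 percent, landed at 70 percent.
score = kr_score(baseline=61, target=75, actual=70)
print(round(score, 2))  # 0.64, i.e. 9 of the 14 points
```

Nothing comparable exists for "launch the new onboarding flow": there is no baseline, no target, and no actual to feed in, so any number attached to it is a judgment call rather than arithmetic.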

So the score ends up measuring one of two things. Either it measures genuine movement on a metric, in which case you did not need the 0-to-1 ceremony because the metric already told you. Or it measures how much of a planned activity got done, in which case you have built an elaborate ritual for tracking task completion and calling it strategy. The grading ritual cannot tell you which one you are doing, which is precisely the problem.

Why 0.7 corrupts the number

The folklore that 0.7 is the ideal score is repeated everywhere and it quietly destroys the instrument. The intent was reasonable: set ambitious targets, do not punish people for aiming high. The effect is that the score stops being a measurement and becomes a negotiation. If 0.7 is good and 1.0 is suspicious, then the rational move is to pick a target you expect to get about 70 percent of the way to, which means the target itself is now reverse-engineered from the desired score.
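A small sketch makes the circularity concrete. It reads a 0.7 score as landing 70 percent of the way from baseline to target, and it assumes linear scoring; the function names and the forecast figure are hypothetical, borrowed from the activation example for illustration.

```python
def reverse_engineered_target(baseline: float, forecast: float,
                              desired_score: float = 0.7) -> float:
    """Target that makes an honest forecast grade out at the desired score."""
    if not 0 < desired_score <= 1:
        raise ValueError("desired_score must be in (0, 1]")
    return baseline + (forecast - baseline) / desired_score


def kr_score(baseline: float, target: float, actual: float) -> float:
    """Linear progress from baseline to target (an assumed convention)."""
    return (actual - baseline) / (target - baseline)


# The team privately expects activation to go from 61 to 70 percent.
target = reverse_engineered_target(baseline=61, forecast=70)
score = kr_score(baseline=61, target=target, actual=70)
print(round(target, 1), round(score, 2))
```

If the forecast comes true, the score is 0.7 by construction. The grade reports back the number that was used to choose the target, which is why it tells an outside reader nothing.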

Once that happens, the whole apparatus is measuring its own calibration. A board member looking at a portfolio of 0.7s learns nothing, because the 0.7 was the goal of the goal-setting, not an observation about reality. This is a cousin of the number everyone rounds: a figure that has been socially agreed into meaninglessness while retaining the full visual authority of a metric.

The scoring ritual hides the question that matters

Here is what the grading session almost never asks: did this objective move the strategy, and if it did not, should we still be doing it? Scoring is backward-looking and self-referential. It evaluates the OKR against its own targets, inside its own frame, with no reference to whether the OKR was the right bet in the first place. A team can score a clean 0.8 on an objective that, in hindsight, was irrelevant to where the company actually needed to go. The score will look great. The quarter was still wasted.

This is why OKRs treated as a reporting exercise drift away from strategy over time. The reporting cadence rewards a tidy score, and a tidy score is easiest to produce when the objective is small, controllable, and disconnected from the messy bets that actually matter. The ritual selects for scorability, and scorability is not the same as importance.

What to measure instead

The honest alternative is not to abolish scoring. It is to score the thing that matters and stop dressing up the thing that does not. Two questions do more work than any 0-to-1 scale.

First: did the metric this objective targeted actually move, in the real instrumented system, regardless of what we planned? That is an observation, not a negotiation. It does not care about your confidence level at planning time. Second: of the work we did this quarter, how much of it is traceable to an objective that still matters, and how much was activity that scored well against a target nobody should have set? That question connects the grade back to the difference between a goal list and a strategy, which is where the value actually lives.

When OKRs sit on top of real work and real metrics rather than floating in a quarterly spreadsheet, the score becomes almost unnecessary. You do not grade the objective. You look at whether the metric moved and whether the work that moved it was the work you meant to do. The grade was always a proxy for those two facts. Measure the facts directly and the theater becomes optional.

What to do this quarter

Before the next grading session, pull last quarter's scores and sort them. Find every key result that scored above 0.7 and ask, for each one, whether a metric in a real system moved as a result, or whether the score reflects a deliverable getting shipped. The ones in the second category are your theater. They are not worthless, but they are not strategy, and counting them as strategy is how a company convinces itself it is executing while the actual numbers stand still.
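The sorting exercise can be sketched as a short script. Everything here is an assumption for illustration: the KeyResult structure, the metric_moved flag (which a person has to fill in honestly, since no code can decide it), and the scores, which are invented.

```python
from dataclasses import dataclass


@dataclass
class KeyResult:
    name: str
    score: float        # the 0-to-1 grade recorded last quarter
    metric_moved: bool  # human judgment: did an instrumented metric move as a result?


def find_theater(key_results, threshold=0.7):
    """Return key results that scored above the threshold with no metric
    movement behind them, highest score first."""
    high_scorers = sorted(
        (kr for kr in key_results if kr.score > threshold),
        key=lambda kr: kr.score,
        reverse=True,
    )
    return [kr for kr in high_scorers if not kr.metric_moved]


# Invented example data, for illustration only.
last_quarter = [
    KeyResult("Increase activation from 61% to 75%", 0.64, True),
    KeyResult("Launch the new onboarding flow", 1.0, False),
    KeyResult("Ship three partner integrations", 0.8, False),
    KeyResult("Hypothetical metric key result that overshot", 0.9, True),
]

theater = find_theater(last_quarter)
for kr in theater:
    print(f"{kr.score:.1f}  {kr.name}")
```

With this made-up data the list contains the onboarding launch and the partner integrations: the two highest-confidence "wins" that no metric can vouch for.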

Then try running one review with no scores at all. Just two columns: the metric we targeted, and where it actually landed. The conversation will be shorter, more uncomfortable, and far more useful than any debate about whether something was a 0.6 or a 0.7.

FAQ

Is OKR scoring useless? Not useless, but usually redundant or misleading. When a key result is a real metric, the metric already tells you what happened and the 0-to-1 score adds nothing. When the key result is a disguised deliverable, the score launders task completion into something that feels like strategic progress. Score the metric, not the activity.

Why is 0.7 considered the ideal OKR score? The convention was meant to encourage ambitious targets without punishing high aim. In practice it turns the score into a negotiation: teams reverse-engineer targets they expect to get about 70 percent of the way to, so a portfolio of 0.7s measures its own calibration rather than reality.

What should we do instead of grading OKRs? Run reviews on two facts: did the targeted metric move in the real system, and was the work that moved it traceable to an objective that still matters. Those observations do the job the grade was pretending to do, without the theater.

Does this mean OKRs don't work? No. It means scoring OKRs as a ritual disconnected from real metrics does not work. OKRs are useful as an encoding of intent when they sit on top of actual work and instrumented metrics. The failure is in the grading ceremony, not the framework.