Monday, March 15, 2021

Why WSJF is Nonsense

There's a common backlog prioritization technique, suggested as standard practice in SAFe but also used elsewhere: "WSJF" ("Weighted Shortest Job First"), also called "HCDF" ("Highest Cost of Delay First") by Don Reinertsen.

Now, let me explain this one in (slightly oversimplified) terms:

The idea behind WSJF

It's better to gain $5000 in 2 days than to gain $10000 for a year's work. 
You can still go for those 10 Grand once you have 5 Grand in your pocket, but if you do the 10 Grand job first, you'll have to see how you can survive a year penniless.

Always do the thing first that delivers the highest value and blocks your development pipeline for the shortest time. This lets you deliver as much value as early as possible.


How to do WSJF?

WSJF is a simple four-step process:

To find out what the optimal backlog position for a given item is, you estimate the impact of doing the item ("value"), divide that by the investment into said item ("size"), and then rank the items relative to each other.


For the estimates, it's often suggested to use the "Agile Fibonacci" scale, so "1, 2, 3, 5, 8, 13, 20, 40, 80..."
The idea is that every subsequent number is "a little more, but not quite twice as much" as the previous one, so a "13" is "a little more than 8, but not quite 20".
Since there are no in-between numbers, when you're not sure whether an item is 8 or 13, you can choose either, because these two numbers are adjacent and their difference is considered minuscule.

Step 1: Calculate "Value" Score for your backlog items.

Value (in SAFe) is actually three variables: User and/or Business Value, Time Criticality, Enablement and/or risk reduction. But let's not turn it into a science. It's presumed value.

Regardless of how you calculate "Value", either as one score or a sum or difference of multiple scores, you end up with a number. It becomes the numerator in your equation.

Step 2: Calculate "Size" Score for your backlog items.

"Size" is typically measured in the rubber-unit called Story Points, and regardless of what a Story Point means in your organization or how it's produced, you'll get another number - the denominator in your equation.

Step 3: Calculate "WSJF" Score for your backlog items.

"WSJF" score, in SAFe, is computed by dividing Value by Size.

For example, a Value of 20 divided by a size of 5 would give you a WSJF score of 4.

Step 4: Sort the backlog by "WSJF" Score.

As you add items, you just put them into the position that the WSJF sort order suggests, with the highest score at the top and the lowest score at the bottom of the backlog.
For example, if you get a WSJF of 3 and your topmost backlog item has a WSJF score of 2.5, the new item would go on top - it's assumed to be the most valuable item to deliver!
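The four steps above can be sketched in a few lines of Python. This is only an illustration, not an official SAFe implementation: the `BacklogItem` class, the `prioritize` function and the item names and scores are all made up for this example.

```python
from dataclasses import dataclass


@dataclass
class BacklogItem:
    name: str
    value: float  # Step 1: estimated "Value" score (numerator)
    size: float   # Step 2: estimated "Size" score (denominator)

    @property
    def wsjf(self) -> float:
        # Step 3: WSJF score = Value / Size
        return self.value / self.size


def prioritize(items):
    """Step 4: return the items sorted by WSJF score, highest first."""
    return sorted(items, key=lambda item: item.wsjf, reverse=True)


# Hypothetical backlog: the current top item scores 2.5, the new item 3.0.
backlog = [
    BacklogItem("existing top item", value=20, size=8),
    BacklogItem("low item", value=5, size=5),
    BacklogItem("new item", value=3, size=1),
]

for item in prioritize(backlog):
    print(f"{item.name}: WSJF {item.wsjf:.2f}")
```

With these made-up numbers, the new item (WSJF 3) lands above the previous top item (WSJF 2.5), just as described above.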

And now ... let me dismantle the entire concept of WSJF.

Disclaimer: After reading the subsequent portion, you may feel like a dunce if you've been using WSJF in the real world.


WSJF vs. Maths

WSJF assumes estimates to be accurate. They aren't. They're guesswork, based on incomplete and biased information: Neither do we know how much money we will make in the future (if you do, why are you working in Development, and not on the stock market?) nor do we actually know how much work something takes until we've done it. Our estimates are inaccurate.

Two terms with error

Let's keep the math simple, and just state that every estimate has an error term associated. We can ignore an estimator's bias, assuming that it will affect all items equally, although that, too, is often untrue. Anyway.

The estimated numbers for an item can be written as:
Value = A(V) + E(V)  [Actual Value + Error on the Value]
Size  = A(S) + E(S)  [Actual Size + Error on the Size]

Why is this important?
Because we divide two numbers, which both contain an error term. The error term propagates.

For the following section, it's important to know that we're on a Fibonacci scale, where two adjacent values are roughly 60% apart (for example, 8 is 60% more than 5, and 13 is 62.5% more than 8).

Slight estimation Error

If we over-estimate value by a single notch, our estimate is about 60% higher than the actual value, even if the difference between fact and assumption is minuscule. Likewise, if we under-estimate value by a single notch, our estimate is almost 40% lower than the actual value.

To take a specific example:
When an item is estimated at 8 (based on whatever benchmark), but turns out to actually be 5, we overestimated it by 60%. Likewise, if it turns out to actually be 13, we underestimated it by 38.5%.
If we're not 100% precise on our estimates, the actual value could be anywhere between 5 and 13, so we could be off by a factor of about 2.6!

The same holds true for Size. I don't want to repeat the calculation.

Larger estimation error

Remember - we're on a Fibonacci scale, and so far we only permitted a deviation by a single notch. If we now permit our estimates to be off by two notches, we get significantly worse numbers: an estimate of 8 could actually be anything from 3 to 20, so all of a sudden, we could be off by a factor of more than 6!
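To make the one-notch and two-notch cases concrete, here is a small sketch that looks up the neighbours of an estimate on the scale quoted earlier. The helper `possible_actuals` is a hypothetical name introduced just for this illustration.

```python
# The "Agile Fibonacci" scale as quoted earlier in this article.
SCALE = [1, 2, 3, 5, 8, 13, 20, 40, 80]


def possible_actuals(estimate, notches):
    """Lowest and highest scale value within `notches` steps of `estimate`."""
    i = SCALE.index(estimate)
    low = SCALE[max(i - notches, 0)]
    high = SCALE[min(i + notches, len(SCALE) - 1)]
    return low, high


for notches in (1, 2):
    low, high = possible_actuals(8, notches)
    print(f"{notches} notch(es) off: actual between {low} and {high}, "
          f"spread factor {high / low:.2f}")
```

For an estimate of 8, one notch gives a range of 5 to 13 (13 / 5 = 2.6), and two notches give a range of 3 to 20 (20 / 3 is roughly 6.67).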

Now, the real problem happens when we divide those two.

Square error terms

Imagine that we divide a number 6 times larger than it should be by a number 6 times smaller than it should be: the two errors multiply, and we get a squared error term. The result would be off by a factor of 36.

Let's talk in a specific example again:
Item A was estimated as 5 value, but it was actually a 2 value. It was estimated as 5 size, but it was actually a 13 size. As such, it had an error of +3 in value, and an error of -8 in size.
Estimated WSJF = (2 + 3) / (13 - 8) = 1
However, the Actual WSJF = 2 / 13 = 0.15
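The same calculation can be written out as a short sketch, using "estimate = actual + error" as defined above. The variable names are chosen for this illustration only. It also shows that the error factor on the value and the error factor on the size multiply in the quotient.

```python
def wsjf(value, size):
    return value / size


actual_value, actual_size = 2, 13
error_value, error_size = 3, -8  # estimate minus actual

estimated = wsjf(actual_value + error_value, actual_size + error_size)
actual = wsjf(actual_value, actual_size)

# How far each estimate is off, expressed as a factor.
value_factor = (actual_value + error_value) / actual_value  # value 2.5x too high
size_factor = actual_size / (actual_size + error_size)      # size 2.6x too low

print(f"Estimated WSJF: {estimated:.2f}")
print(f"Actual WSJF:    {actual:.2f}")
print(f"Overstated by a factor of {estimated / actual:.1f} "
      f"(= {value_factor:.1f} x {size_factor:.1f})")
```

Neither estimate is off by more than two notches, yet the WSJF score is overstated by a factor of 6.5.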


Now, I hear you arguing, "The actual numbers don't matter... it's their relationship towards one another!"


Errors aren't equal

There's a problem with estimation errors: we don't know where we make errors (otherwise we wouldn't make them), and we make different errors on different items (if every item were off by the same factor, the ranking wouldn't change at all). Errors are errors, and they are random.

So, let me draw a small table of estimates produced for your backlog:

Item  Est. WSJF  Est. Value  Est. Size  Act. Value  Act. Size  Act. WSJF
A     1.6        8           5          5           5          1
B     1          8           8          3           20         0.15
C     0.6        3           5          8           2          4
D     0.4        5           13         13          2          6.5

Feel free to sort by "Act. WSJF" to see how you should have ordered your backlog, had you had a better crystal ball.
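If you'd rather let a script do the sorting, here is a minimal sketch using the numbers from the table. The tuple layout and the `ranking` helper are assumptions made for this example.

```python
rows = [
    # (item, est. value, est. size, act. value, act. size)
    ("A", 8, 5, 5, 5),
    ("B", 8, 8, 3, 20),
    ("C", 3, 5, 8, 2),
    ("D", 5, 13, 13, 2),
]


def ranking(rows, value_idx, size_idx):
    """Item names ordered by value / size, highest WSJF first."""
    ordered = sorted(rows, key=lambda r: r[value_idx] / r[size_idx], reverse=True)
    return [r[0] for r in ordered]


estimated_order = ranking(rows, 1, 2)
actual_order = ranking(rows, 3, 4)

print("Backlog order by estimated WSJF:", estimated_order)
print("Backlog order by actual WSJF:   ", actual_order)
```

The estimates put the items in the order A, B, C, D. The actual numbers would have put them in the order D, C, A, B: the two items ranked last should have been done first.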

And that's the problem with WSJF

We turn haphazard guesswork into a science, and think we're making sound business decisions because we "have done the numbers", when in reality, we are the victim of an error that is explicitly built into our process. We make entirely pointless prioritization decisions, thinking them to be economically sound.


WSJF is merely a process to start a conversation about what we think should be priority, when our main problem is indecision.
It is a terrible process for making reliable business decisions, because it doesn't rely on facts. It relies on error-prone assumptions, and it exacerbates any error we make in the process.

Don't rely on WSJF to make sound decisions for you. 
It's a red herring.

The discussion about where and what the value is provides much more benefit than anything you can read from a WSJF table. Do the discussion. Forget the numbers.

 

9 comments:

  1. Hi Michael - great post, I'd like to reference in a post on my blog if I may?
    https://davebrowettagile.wordpress.com/

    Thanks,
    Dave Browett

    Replies
    1. Sorry for the late reply, yes of course you may reference me :)

  2. Thanks for putting this together. It makes sense except for one thing you mentioned. "WSJF assumes estimates to be accurate." I've not seen this with WSJF and only seen the creators of SAFe mention that they are not precise but educated estimates, or more so guesstimates. It is only for putting things in some sort of priority order in order to achieve the best value add. So when you mention WSJF assumes estimates to be accurate, I was a little confused.

    Replies
    1. Gil, thanks for your comment.
      People aren't aware of this hidden assumption.
      The problem exists because of how we normally think about estimates, and what WSJF does.
      In normal projects, we estimate the size of all work items, and then simply add up the numbers. We're usually underestimating some items, while overestimating others. Due to the randomness of error terms, the errors cancel out, and we end up with a fairly accurate estimate that we can use for planning.

      However, WSJF isn't like that, because we divide numbers rather than adding them up.
      So - what's the big deal? Errors in multiplication compound, rather than cancelling out.

      Practical example:
      What's the average value of a die roll? ... 3,5,4,2,6,1,2. Add up. 23. Divide by 7: 3.29. Close to the correct value of 3.5, about 94% accurate. We can live with that and plan with it.
      Now, take the same numbers, and multiply them. 1440. Take the 7th root: 2.8. Pretty bad prediction, even with a perfect distribution. 80% accurate. Wouldn't pass any statistical validity test.

      Note again - the same numbers, depending on whether you're using additive or multiplicative arithmetic, have an entirely different validity.

      And it's this multiplicative arithmetic which leads WSJF ad absurdum.
      Even minuscule mis-estimates in the denominator will lead to differences in orders of magnitude, while the numerator will almost be cancelled out.
      Just think of three backlog items: A has value 10, B has 11, C has 12, as a numerator. All of them have been sized at 3: Their WSJF is almost identical.
      Now imagine that someone argues that A might be possible with size 2: This small change immediately moves the item from the clear bottom (3.3 vs. 3.7 and 4) all the way to the uncontested top (5), although we've just changed the guess by a single point, and it might just be rounding error.

      Hence, if your guesses aren't perfectly accurate, you might as well throw dice instead of using WSJF prioritization.
      Proponents of WSJF won't say that - because they've never done the math.
      The table up there in the article does the math for you. It shows what an estimation error does to your backlog order.

  3. Hey Michael! Really enjoyed this post. I have a colleague who was touting WSJF, and the framework smelled fishy. This helped clarify my thoughts on how errors can impact prioritization by orders of magnitude. However, you mention at the end: "The discussion about where and what the value is provides much more benefit than anything you can read from a WSJF table." Do you think that organizations should not implement WSJF as a prioritization framework at all? Could the WSJF framework actually help incentivize deeper discussions on the elements of different ideas (risk reduction, time criticality, etc.)? At the end of the day, I'm wondering if the actual "sin" with using WSJF is not trying to use a framework to organize/prioritize ideas, but rather the "sin" is assuming that you have beat the odds. Assuming that the math is "correct," when it is obviously not.

  4. Great article, reposting it on Linked-In...

  5. Useful posting & will take some digesting for me - particularly because we don't use Fibonacci for the estimating.

    That aside, I am very interested in what you may recommend as alternative to WSJF in terms of deciding what the priority should be. Thank you!

  6. Another big problem with WSJF is in the real world sometimes backlog items have dependencies
