Tuesday, May 31, 2022

Why we need clearly defined goals

A common issue I observe in many organizations: there's no visible goal beyond "Get this work done," and people don't even see the point in "wasting time to set a goal." The problem: many organizations are just tactical engines, generating work and handling exceptions in the generated work. Yet most of this work has no goal. Goal-setting is not an esoteric exercise, and here's why.


The Hidden Factory

One concept of Lean Management is the "Hidden Factory" - and this concept deserves a brief explanation. A factory is an institution that applies work to inputs in order to generate outputs. So far, so good. A "hidden factory," on the other hand, is a factory within a factory, doing work without generating viable outputs: either nothing - or waste.

To understand the problem of hidden factories, think like a customer.

You get the option to buy your product from one of two providers.

Company A has a straightforward process, turning raw input into consumable output, and Company B has a process that turns the same input into something, then into something else, then into something else, then into the same consumable output.

This extra work makes Company B's product more expensive to produce, without any discernible difference from the product sold by Company A.

Company A and Company B thus sell products which are identical in all aspects - except price. Company B has to charge a premium to cover the extra work they do. Which product would you purchase?

As customers, we do not care how much work is required to do something. All other things being equal, we opt for the cheapest, fastest way of meeting our needs.

And companies are no different here. But how does that relate to goal-setting?

Induced and intermediate work

Many companies are great at inducing work - and once that work has been induced, it becomes necessary: the people doing it must keep doing it, or the outcome can no longer be achieved.

Let's pick a practical example.
We have chosen to use a Design Process that requires any changes to be made as a sequential series of steps:
  1. Request the element to be changed
  2. Describe the change to be made
  3. Prototype the change
  4. Validate the change
  5. Implement the change
  6. Verify the change
While you may argue that all of these steps are common sense and necessary, our specific process choice has just locked us into a process that turns replacing an image into a full day's work - a change that could be done by a competent developer in a couple of minutes.

How is that relevant to goal-setting?

Confusing Task and Outcome

Referring to our fictional process, an individual may have the specific responsibility of describing changes. Their input is a change request, and their output is a change description. As a customer, I approach this organization and say, "I want a new backdrop on my website." The person executing step 2 will say, "I am overburdened. I have too many change requests on my desk. I need someone to help me describe the changes." If we asked them, "Why do you need to describe the changes?" - they might say, "So that they can be prototyped." If we pressed on and asked, "And why do they need to be prototyped?" - the answer could be, "So that we can validate the change." - which, of course, begs the question, "And then - I get what?" - "An implementable change."

You see where this is going: everyone has reasons for the things they do, and given how this organization is set up, their reasons are indeed valid. And still, nobody really understands why they do the things they do.
Ideally, everyone we ask would answer, "So that you can get your new backdrop." In many companies, however, that is not the case.

And that's where goals come into play.

Goals

Every company has a few - usually very few - first-order goals, and a specific context that provides constraints within which these goals can be realized. Surviving and thriving on the market is a baseline almost all have, and most of the time, the primary goal is to achieve this by means of advancing the products which define the company. That would be an example of a first-order goal.

From that, we get into second-order goals, that is - into derived goals which help us achieve this primary goal. Build a better product. Build the product better. Sell more. Sell faster. Sell cheaper. You name it.

These, of course, would be realized via strategies - which in themselves have multiple subordinate goals. For example: Increase product quality, add features to the product, reduce discomfort with the current product, improve perception of the product in its current state - again, the possibilities are endless.

At some point down the rabbit hole, somewhere deep in the loop of operational delivery, we may then see a business analyst stating, "I am overburdened. I have too many change requests on my desk. I need someone to help me describe the changes." - but why are they doing it? Are we describing changes in order to increase product quality, in order to sell more, or to sell cheaper?
It's easy to see that "adding people to do the work" may indeed make sense when our goal is to add features that improve our product. And yet, it seems entirely backwards when our goal is to "sell cheaper."

That's why we set goals. Goals allow everyone, on all layers of an enterprise, regardless of their specific responsibility, to quickly and easily determine, "Do the things I am doing help us achieve our goals?"
If the answer to that simple and straightforward question is "No," this begs two follow-up questions:
  • Why am I doing things which are not helping this company achieve its goals?
  • What do we need to change, so that I am contributing to our goals?

The impact of goals


Well-defined goals immediately expose the Hidden Factories and the induced work, and they set the stage both for reducing waste in our processes and for leading employees toward more meaningful, more important work.

Poorly defined goals - such as "do X amount of work" - encourage establishing and inflating Hidden Factories, and they set the stage for wasteful processes and unhappy employees who may be doing utterly worthless work without ever realizing it.

Undefined goals - or the absence of goals - remove the yardstick by which we can measure whether we are contributing to anything of relevance or merely adding waste. Without a goal, work is meaningless and improvement impossible.


The importance of goals

Goal-setting is important for organizations both large and small, because it is key to:
  1. Guiding decision-making
  2. Enabling improvement
  3. Eliminating waste
While a goal itself doesn't do any of these things, it sets the stage for all of them. Once you know your goal, you can take it from there.

Friday, May 27, 2022

Why test coverage targets backfire

Most organizations adopt a "Percent-Test-Coverage" metric as part of their Definition of Done. Very often, it's set by managers and becomes a target that developers have to meet. This approach is often harmful. Here's why:

The good, bad and ugly of Coverage Targets

Setting coverage targets is usually intended as a means of improving both the product and the process - and of getting developers to think about testing.

The good

It's good that developers begin to pay closer attention to the questions, "How do I test this?" as well as, "Where are tests missing?" Unfortunately, that's about it.

The bad

The first problem is that large legacy code bases starting from zero test coverage put developers in a pickle: if I work on a component with 0% test coverage, any amount of tests I write will still keep that number close to zero. Hence, the necessary compromise becomes not setting a baseline number, but merely asking for the number to increase. The usually envisioned 80+% targets are visions for a distant future rather than something useful today.

Looking at the practice: as long as the organization is set up to reward minimizing the amount of time invested in unit testing, the outcome will be that developers try to meet the coverage targets with minimum effort.

Also, when developers have no experience in writing good tests, their tests may not do what they're supposed to do.

The ugly

There are many ways of meeting coverage targets that fulfill Goodhart's Law:

When a measure becomes a target, it ceases to be a good measure.

Often, developers will be unhappy producing tests they create only to meet the target, considering the activity a wasteful addition to the process that provides no benefit.

At worst, developers will spend a lot of time creating tests that provide a false sense of confidence, which is even worse than knowing that you have no tests to rely on.

But how can this happen? Let's see ...

The first antipattern is, of course, to write tests that check nothing.


Assertion-free testing

Let's take a look at this example:

Production code

int divide(int x, int y) { return x/y; }

Test

print divide(6,2);

When looking at our code coverage metrics, we will see 100% coverage. But the test will always pass - even when we break the production code. Only if we inspected the actual test output (manual effort that we probably won't spend) would we see what the test does. There is no automated failure detection, and the test isn't even written in a way that would let us detect a failure: who would notice if we suddenly got a "2" instead of a "3"? We don't even know which result would have been correct!
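
For contrast, here is what a minimal test with a real assertion could look like - a sketch in Java with JUnit 5 (an assumed translation, since the snippets above are pseudocode):

import static org.junit.jupiter.api.Assertions.assertEquals;
import org.junit.jupiter.api.Test;

class DivideTest {
    int divide(int x, int y) { return x / y; }

    @Test
    void dividesSixByTwo() {
        // Same 100% coverage as the assertion-free version, but a wrong
        // result now fails the build instead of passing silently.
        assertEquals(3, divide(6, 2));
    }
}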


Testing in the wrong place

Look at this example:
class Number {
    int x;
    void setX(int val) { x = val; }
    int getX() { return x; }
    void compute() { x = x ? x/x^x : x-x*(x+x); }
}

n = new Number();
n.setX(5);
assert (n.x == 5);
assert (n.getX() == 5);

In this case, we have a code coverage of 75% - we're testing x, we're testing the setter, and we're testing the getter.

The "only" thing we're not testing is the compute function, which is actually the one place where we would expect problems, where other developers might have questions as to "What does that do, and why are we doing it like that?" - or where various inputs could lead to undesirable outcomes.

Testing wrongly

Take a peek at this simple piece of code:
int inverse(int x) { return 1/x; }
void printInverse(int x) { print "inverse(x)"; } // prints the literal text, not the computed value

assert (inverse(1) == 1);
assert (inverse(5) == 0); // "passes" - but is 0 really the inverse of 5?
spyOn(print);
printInverse(5);
assert (print.toHaveBeenCalled());

The issues

There are numerous major problems here, all of which in combination make that test more dangerous than helpful:


Data Quality

We may get a null pointer exception if the method is called without initializing x first, but we neither catch nor test this case.


Unexpected results

If we feed 0 into the function, we get a division by zero. We're not testing for this case, and it will lead to undesired outcomes.


Missing the intent

The inverse function returns 0 for every number other than 1 and -1. It probably doesn't do what it's expected to do. How do we know? Is it poorly named, or poorly implemented?


Testing the wrong things

The print function's output is most likely not what we expect, but our tests still pass.
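
Tests that confront these issues head-on could look like the following Java sketch - the point being that honest assertions make the design questions impossible to ignore:

import static org.junit.jupiter.api.Assertions.*;
import org.junit.jupiter.api.Test;

class InverseTest {
    int inverse(int x) { return 1 / x; }

    @Test
    void zeroInputBlowsUp() {
        // Makes the division-by-zero case explicit: is throwing the
        // contract, or should the function guard against zero?
        assertThrows(ArithmeticException.class, () -> inverse(0));
    }

    @Test
    void integerDivisionTruncates() {
        // This passes - and that is the problem: an assertion stating that
        // the "inverse" of 5 is 0 exposes in plain sight that the name
        // promises a reciprocal the int return type cannot deliver.
        assertEquals(0, inverse(5));
    }
}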


Conclusion

If we rely on coverage metrics, we might report 100% test coverage while, in practice, shipping very unreliable software that doesn't even work as intended.

In short: this way of testing tests that the code does what it does, not that it does what it should.


Then what?

The situation can be remedied, but not with numerical quotas. Instead, developers need education on what to test, how to test, and how to test well.

While this is a long topic, this article already shows some extremely common pitfalls that developers can - and need to - steer clear of. Coverage metrics leave management none the wiser: the real issue is hidden behind a smoke screen of existing tests.

Sunday, May 15, 2022

Dice Estimation

Although I am a proponent of "#noestimates" - that is, the concept that estimation is often the consequence of unsound mental models and in itself somewhat wasteful, because we often simply don't know - I still propose an estimation technique that is as useful as it is simple. BUT - it requires context. Let's explore this context:

Problems with estimation

Regardless of how we put it, one major challenge is that many organizations use estimates as if there were certainty that they will turn out "correct" - a common dysfunction, because it results in undesirable, wasteful behaviours such as upfront design, padding, "challenging estimates" - and even post-hoc adjustment of the figures to match reality. None of these behaviours helps us deliver a better product, and they may even impede quality or time-to-market.


All these aside: we don't know until we've tried - there's this thing called "the high probability of low-probability events," so regardless of how much effort we waste trying to get those estimates "right," we could still be wrong due to something we didn't factor in.


These are the common reasons behind the #noestimates movement, which simply rejects the discipline of upfront estimation in favor of monitoring the flow of work via execution signals to gain both predictability and control. You may ask: "If estimation is so obviously flawed, then why would you propose yet another estimation technique?"

Category Errors

A common challenge teams face is that not all work is equal: to give a trivial non-tech example, it doesn't take much expertise to realize that building a tool shack isn't in the same category as building a house - and that's not in the same category as building a Burj Khalifa.

By simply letting uncategorized work flow into teams, we risk that extremely large packages block the flow of urgent high-value items for a prolonged time. Hence, a first categorization into "Huge, Large, Small" does indeed make sense - the key realization is that these aren't equal.

Hence, dice estimation first categorizes work into the right bin: everything that's workable can be taken up by the team, while everything that isn't workable should be managed appropriately before being worked upon.

Six-sided dice

Story Points worked out terribly and led to many dysfunctions - so let's avoid that side track. Instead, let's estimate items in dice values: for small items, our estimation unit is team-days; for large items, weeks; and for huge items, months.

That also leads us to the first logical connection:
A small dice at 5 or 6 could already qualify as a "1" on a large dice. We could think about splitting it in two if we want to take it on. On items bigger than 5, we should exercise caution: do we understand it properly, is the cost of implementation warranted - is it really the best thing to do next?

This also applies when an item exceeds 5 on the week count - at that point, it becomes a huge dice: we might want to check a lightweight business case and pivot before proceeding.


How Dice Estimation works

Dice estimation is very easy - you simply ask, "If each day of work were a dice roll, how many sides would our dice need so that - on average - we roll a '1' by the time we are Done?"

That is: if the team expects to be done in 2 days, they would choose a 2-sided dice. If they expect 3 days, they would choose 3 - and if they expect 6 days, a D6.


Measuring Confidence

Since dice rolls aren't linear, this form of estimation focuses on levels of confidence in completion within a predefined period of time. Hence, even though we estimate in days, dice estimation values confidence - and thereby clarity and understanding - above quantity of work.


Estimating with uncertainty

Just as a D6 doesn't mean "we will roll 6 times, and then we will have rolled a 1" - what we estimate is the number of faces on the dice, and thereby the expected number of days until the item is Done.
Think of it like this: with a coin, you have a 50% chance of getting heads on the first try - and while statistically you'd expect at least one heads within 2 attempts, it's theoretically possible to flip the coin an infinite number of times and never get heads.
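
The math behind this: the chance of having rolled at least one '1' within n rolls of a d-sided dice is 1 - (1 - 1/d)^n. A few lines of Java (a sketch with illustrative numbers) make the consequence tangible:

class DiceConfidence {
    // Probability of rolling at least one '1' in `day` rolls of a `sides`-sided dice.
    static double chanceDoneByDay(int sides, int day) {
        return 1.0 - Math.pow(1.0 - 1.0 / sides, day);
    }

    public static void main(String[] args) {
        System.out.println(chanceDoneByDay(6, 6));   // ~0.67 by day 6
        System.out.println(chanceDoneByDay(6, 12));  // ~0.89 by day 12
        // "Done in 6 days on average" is far from "certainly done in 6 days."
    }
}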

The estimate in team-days helps us manage our work in the Dailies in an efficient manner - as long as the estimated number of days hasn't elapsed, it's safe to assume we are on track. Any item reaching or exceeding its estimate should be scrutinized: are we stuck, did we misunderstand something - what is going on? The "estimate," thus, isn't even estimating effort - it's forecasting a pivot-or-persevere checkpoint so that we minimize sunk cost.

Lightweight estimation

We can estimate work items as they arrive, during refinement, or when we pick them up - that doesn't even matter. The key reasons for estimating are, first, avoiding category errors, and second, setting up an intuitive checkpoint so that we don't run out of bounds.
With Dice Estimation, we avoid category errors and institute checkpoints in execution.

Of course, you could use dice estimation during Sprint Planning or in a PI-Planning as well, but you don't even need cadences to take advantage of Dice Estimation.


Predictability and Forecasting

As we know, a D6 averages 3.5, so even if we did nothing yet except categorize by dice size, we'd know that three workable, small items amount to roughly two weeks (3 × 3.5 = 10.5 team-days). Once we've completed a few items and know how long they actually took, we can either run a simple Monte Carlo simulation or apply the Law of Large Numbers to figure out how much known work lies ahead of us. The same applies to the Large and Huge dice, which gives us a fairly decent understanding of when an item might be delivered, based on its backlog position.
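
As an illustration, a bare-bones Monte Carlo forecast can be as simple as resampling the durations of already-completed items - the history below is made up for the sketch:

import java.util.Arrays;
import java.util.Random;

class BacklogForecast {
    public static void main(String[] args) {
        int[] historyDays = {2, 3, 1, 4, 6, 2}; // durations of finished small items (made up)
        int remainingItems = 3;
        int runs = 100_000;
        Random rng = new Random();
        int[] totals = new int[runs];
        for (int i = 0; i < runs; i++) {
            int days = 0;
            for (int j = 0; j < remainingItems; j++) {
                days += historyDays[rng.nextInt(historyDays.length)]; // resample one finished item
            }
            totals[i] = days;
        }
        Arrays.sort(totals);
        System.out.println("50% confidence: " + totals[runs / 2] + " team-days");
        System.out.println("85% confidence: " + totals[(int) (runs * 0.85)] + " team-days");
    }
}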


Not a Performance Metric

The dice metaphor should make it easy to explain that one can neither measure the performance of a dice-roller by how often they roll a certain number, nor does it make sense to "squeeze" estimates - the probability of guessing the right number won't increase by restricting the "allowed" numbers that can be picked. If anything, squeezing estimates would lead the team to spend more time examining exceptions - hence, reducing performance.

Why you should switch from Story Points to Dice

Dice Estimation is a minimum-effort method of right-sizing work that helps with:
  1. Sufficient predictability
  2. Easy plannability
  3. Execution checkpoints
  4. Avoiding large batches that block flow
  5. "Split, Pivot or Defer" decisions