Thursday, April 14, 2022

Ten things eerily close to Scrum that you may misunderstand

There are some ideas around Scrum that sound a whole lot like they're based on the Scrum Guide - when, in fact, they aren't. Even worse: you might be practicing them in a way that causes problems. Let's dig in.


1 - Scrum Roles

If I ask you what the Scrum roles based on the Scrum Guide are, you're probably very quick to answer: "Scrum Master, Product Owner, Developer."

Wrong.

What? How can that be wrong?

Well, because the Scrum Guide doesn't mention any roles. In fact, it doesn't even use the term "role" any more. Scrum Master, Product Owner and Developer are merely accountabilities - that is, someone has to be accountable.


2 - Dedicated Scrum Master

The concept of a "dedicated Scrum Master" isn't mandatory based on the Scrum Guide. Neither is the concept that "a Scrum Master shouldn't be technical, so they don't dump their own bias on the team."

These ideas are often marketed in an attempt to provide job safety for the myriad people who can do nothing else. You can do Scrum very effectively even if Scrum Mastery is an accountability that rotates among developers - for example, on a per-Sprint basis.

Caveat - they need to know what they need to do, and they need enough time to do it.


3 - Product Owners aren't developers

Neither is the concept of a "Product Owner who isn't a developer, so they don't interfere in the work" prescribed by Scrum. It's entirely possible that, for example, a senior developer assumes the PO accountability. As long as the Product Goal is clear, the Product Backlog well-maintained, Sprints deliver high value and stakeholders know what to expect, there's not going to be much of an issue.


4 - Team autonomy

Are Scrum teams really autonomous? Doesn't Scrum rely on team autonomy?

Try searching for the term "autonom" in the current Scrum Guide - you'll be surprised! Team autonomy isn't mandatory for Scrum. In fact, in larger organizations, it can't be - because if you have larger products, multiple teams need to collaborate.

Before proceeding, let that sink in:
Scrum teams are not free radicals.


5 - Each team has their own Product Owner and Product Backlog

Certain "scaling" approaches suggest that each team has their own Product Owner and Backlog. Well - for a Sprint backlog, that's true. But having a separate Product Backlog for each team adds problems rather than solving them. The Scrum Guide states that multiple teams working on the same product, "should share the same Product Goal, Product Backlog, and Product Owner."

Note that this isn't a Scrum rule, only a suggestion. I'll leave it as an exercise for the reader to figure out why it's a good one. If the answer proves too difficult, try the book "Scaling Lean and Agile Development" by Larman and Vodde.


6 - Teams have their own Definition of Done

One of the first exercises Scrum Masters typically do with a new team is to draft their own Definition of Done. That's not necessary in larger organizations, because the organization may already provide a DoD. "But that's not Agile..." - no! On the contrary: it ensures that the term "Done" has a consistent meaning everywhere in the organization, which reduces complexity and misunderstanding. It's quite irritating when a stakeholder has to ask in every Sprint Review, "What ... exactly ... does 'Done' mean, when you say that?"

Teams are encouraged to add details to the organizational Definition of Done. If they deviate from the organization-wide DoD, though, they invite trouble. Caveat - an organizational DoD should be absolutely minimal, lest it slow teams down with pointless busywork.


7 - Refinement meetings

"In order to prepare for the upcoming Sprint, the Product Owner invites the team for at least one Refinement meeting during the ongoing Sprint." That's a really common practice - but it's not what Refinement is!

Refinement is not a Scrum event, for a good reason. Taking a peek at the top few backlog items, maybe doing a small spike, or creating a wireframe are all activities that can't really be done effectively in a centralized meeting. They're better done by mulling over the items at your own desk. And sometimes reaching out to a user involves a delay, and we really need the answer before moving forward. The asynchronous nature of refinement also ensures we don't disrupt the flow of work with yet another meeting on our calendar.


8 - PO acceptance

During the Review, the team demos their work to the Product Owner, who then accepts it.

Search for anything even remotely resembling this concept in the Scrum Guide. It just isn't there. This sentence contains so many flawed concepts that if you practice this, I wholeheartedly advise a Scrum refresher training. 

The Review is about stakeholder alignment, and it's a working session, not an approval meeting.

The Product Owner also doesn't supervise any work done by developers - together with all the stakeholders, they inspect the outcomes of the Sprint, regardless of how much work was done or not done. 

Nobody "accepts" individual work items of the team. Developers meet their DoD, and that's it. If the DoD isn't adequate or the backlog items not sufficiently valuable, that's not a problem we fix by adding an approval step to the Review. We fix it by improving our DoD, refinement and planning approaches.


9 - One Increment per Sprint

This is probably one of the oldest misunderstandings of Scrum - that at the end of each Sprint, the team delivers one Product Increment which contains all the changes made to the product for the duration of the Sprint.

An increment is produced whenever a modification has been made to the product, and it's in a usable state. Scrum and MinimumCD are not at odds. Scrum teams can - and actually should - produce as many increments as possible during a Sprint, and also deploy and/or release them as early as possible.

During the Sprint Review, the sum of all increments produced since the last Sprint Review is inspected, as these are the outcomes of the Sprint. This sum could be anywhere from zero to infinity. A team which only works towards a single, integrated Increment at the end of a Sprint will be much more likely to find themselves empty-handed, so that's a bad strategy to begin with.


10 - Backwards-facing Retrospectives

"The Scrum Team discusses what went well during the Sprint, what problems it encountered, and how those problems were (or were not) solved."

That's a literal quote from the Scrum Guide - so isn't the Retrospective pattern of "What went well, what didn't go well, what could we improve?" the best way to conduct a Retrospective? No.

"The Scrum Team identifies the most helpful changes to improve its effectiveness." - that's also a quote. Of course, we need to have consider what happened in the last Sprint. The purpose of our Retrospective, however, is looking forward, identifying the most helpful changes. If all you do in your Retrospectives is dwell on the past and make miniscule adjustments so that the same old problems don't constantly haunt you, you're not future-proofing your team.

The most helpful change you can make is that which will make you most successful in the future - which may not necessarily be fixing a problem you had in the past.


Bonus - Ceremonies

It sounds like an innocent mistake, but there's a huge issue hidden behind this label. Scrum events are called "event" instead of "ceremony" for a reason.

An event is, literally, "when something notable happens."
A ceremony is, literally, "doing what we always do in the way we always do it." 

Organizations implementing Scrum "ceremonies" usually find themselves getting very low value from these, not understanding why they're important - and not thinking about better ways to achieve better outcomes. 

As a consequence, we see purely mechanical Scrum which helps nobody, gets on developers' nerves - and the one thing that makes agility tick, Double Loop Learning, is thrown overboard before it ever gets started.



Friday, April 8, 2022

Why test coverage targets backfire

Most organizations adopt a "Percent-Test-Coverage" metric as part of their Definition of Done. Very often, it's set by managers, and becomes a target which developers have to meet. This approach is often harmful. And here's why:

The good, bad and ugly of Coverage Targets

Setting coverage targets is usually intended as a means of improving both the product and the process - and of getting developers to think about testing.

The good

It's good that developers begin to pay closer attention to the questions "How do I test this?" and "Where are tests missing?" Unfortunately, that's about it.

The bad

The first problem is that large legacy code bases starting at zero test coverage put developers in a pickle: if I work on a component with 0% test coverage, any amount of tests I write will still keep that number close to zero - covering 500 lines of a 50,000-line component moves the metric to a mere 1%. Hence, the necessary compromise is not to set a baseline number, but simply to ask for the number to increase. The usually envisioned 80+% targets are visions for a distant future rather than something useful today.

Looking at the practice: as long as the organization is set up to reward minimizing the time invested in unit testing, the outcome will be that developers try to meet the coverage targets with minimum effort.

Also, when developers have no experience in writing good tests, their tests may not do what they're supposed to do.

The ugly

There are many ways to meet coverage targets that exemplify Goodhart's Law:

Any metric that becomes a target stops being useful.

Often, developers are unhappy producing tests created only to meet the target, considering the activity a wasteful addition to the process that provides no benefit.

At worst, developers will spend a lot of time creating tests that provide a false sense of confidence, which is even worse than knowing that you have no tests to rely on.

But how can this happen? Let's see ...

The first antipattern is, of course, to write tests that check nothing.


Assertion-free testing

Let's take a look at this example:

Production code

int divide(int x, int y) { return x/y; }

Test

print divide(6,2);

When looking at our code coverage metrics, we will see 100% coverage. But the test will always pass - even when we break the production code. Only if we inspected the actual test output (manual effort that we probably won't spend) would we see what the test does. There is no automated failure detection, and the test isn't even written in a way that would let us detect a failure: who would notice if we suddenly got a "2" instead of a "3"? We don't even know which result would have been correct!
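
For contrast, here's a minimal sketch of the same test with a real assertion - written as JUnit-style Java, since the snippets above are pseudocode; names and setup are illustrative:

import static org.junit.jupiter.api.Assertions.assertEquals;
import org.junit.jupiter.api.Test;

class DivideTest {
    // the production code under test
    int divide(int x, int y) { return x / y; }

    @Test
    void divideSixByTwo() {
        // fails loudly if divide() ever stops returning 3 - no manual output inspection needed
        assertEquals(3, divide(6, 2));
    }
}

Coverage is still 100% - but now breaking the production code actually breaks the build.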


Testing in the wrong place

Look at this example:
class Number {
  int x;
  void setX(int val) { x = val; }
  int getX() { return x; }
  void compute() { x = x ? x/x^x : x-x*(x+x); }
}
n = new Number();
n.setX(5);
assert (n.x == 5);
assert (n.getX() == 5);

In this case, we have a code coverage of 75% - we're testing x, we're testing the setter, and we're testing the getter.

The "only" thing we're not testing is the compute function, which is actually the one place where we would expect problems, where other developers might have questions as to "What does that do, and why are we doing it like that?" - or where various inputs could lead to undesirable outcomes.

Testing wrongly

Take a peek at this simple piece of code:
int inverse(int x) { return 1/x; }
void printInverse(int x) { print "inverse(x)"; }
assert (inverse(1) == 1);
assert (inverse(5) == 0);
spyOn(print);
printInverse(5);
assert (print.toHaveBeenCalled());

The issues

There are numerous major problems here, all of which in combination make this test more dangerous than helpful:


Data Quality

We may get a null pointer exception if the method is called without initializing x first, but we neither catch nor test this case.


Unexpected results

If we feed 0 into the function, we get a Divide by Zero. We're not testing for this, and it will lead to undesired outcomes.


Missing the intent

The inverse function returns 0 for every number other than 1 and -1. It probably doesn't do what it's expected to do. How do we know? Is it poorly named, or poorly implemented?


Testing the wrong things

The print function's output is most likely not what we expect, but our tests still pass.
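
Here's a sketch of tests that would surface these issues - again JUnit-style Java, with printInverse repaired to print the computed value (the output format is hypothetical):

import static org.junit.jupiter.api.Assertions.*;
import java.io.ByteArrayOutputStream;
import java.io.PrintStream;
import org.junit.jupiter.api.Test;

class InverseTest {
    int inverse(int x) { return 1 / x; }
    void printInverse(int x) { System.out.println("inverse(" + x + ") = " + inverse(x)); }

    @Test
    void inverseOfZeroIsAnExplicitDecision() {
        // unexpected results: divide-by-zero becomes a visible, tested behavior
        assertThrows(ArithmeticException.class, () -> inverse(0));
    }

    @Test
    void inverseOfTwoExposesTheIntentQuestion() {
        // missing the intent: integer division yields 0 - is that what "inverse" should mean?
        assertEquals(0, inverse(2));
    }

    @Test
    void printInversePrintsTheComputedValue() {
        // testing the right things: assert on the actual output, not just that print was called
        ByteArrayOutputStream captured = new ByteArrayOutputStream();
        PrintStream original = System.out;
        System.setOut(new PrintStream(captured));
        printInverse(5);
        System.setOut(original);
        assertEquals("inverse(5) = 0", captured.toString().trim());
    }
}

None of this requires a coverage quota - it requires knowing which behavior each test should pin down.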


Conclusion

If we rely on coverage metrics, we might assume that we have 100% test coverage, but in practice, we may have very unreliable software that doesn't even work as intended.

In short: this way of testing tests that the code does what it does, not that it does what it should.


Then what?

The situation can be remedied, but not with numerical quotas. Instead, developers need education on what to test, how to test, and how to test well.

While this is a long topic, this article already shows some extremely common pitfalls that developers can - and need to - steer clear of. Coverage metrics alone leave management none the wiser: the real issue is hidden behind a smoke screen of existing tests.