Sunday, October 3, 2021

Five reasons for having a Definition of Done

"Do we really need to waste our time to come up with a Definition of Done?"  - well: of course, you're free to do that or not. Just give me your attention for a few minutes to consider the multiple benefits you gain from having one.

Before we start
Many people use the term "Done" in reference to their share of the work, as in, "I am done." This leads to multiple, oftentimes inconsistent, definitions of "development done," - "testing done," - "deployment done." While this may be a choice the team makes, customers don't care who specifically is done with their own work. They care when they receive a product - hence:
The Definition of Done is not set for specific phases of the working process
- it refers to product increments that have passed all stages of the team's process.
As such, it encompasses all the usual activities related to an organization's development process and applies to every single product backlog item.

Now - what does a Definition of Done offer?

#1 - A Quality Standard

"But I thought you had performance tested it?" - "No, a performance test was not part of the acceptance criteria!" - "It's obvious that with that kind of performance, the feature is entirely unusable!" - "Then you should have requested us to performance test it!"

Well - in such a conversation, nobody is a winner: the customer is dissatisfied because they don't get what they need, developers are dissatisfied because they just got an extra load of work, and the Product Owner is dissatisfied because they ended up getting no deliverable value.

The Definition of Done aligns stakeholder expectations on product quality.

Should the customer be bothered to re-confirm their quality expectations on every single user story, and should they be required to re-state that this expectation applies for every release? No.

Once everyone is clear that something is a universal quality requirement, adding a simple statement like, "all pages load within less than a second," would make it clear that the team's work isn't done until the customer's demand is satisfied.

#2 - Common understanding

"I'm done." - "When can I have it?" - "Well, I still need to commit it, then we'll produce a build package, then it goes to testing, let's see where it goes from there ..." - "But didn't you say you were done?" - "I'm done with development."

When different people have different mental models of what "done" means, everyone uses the term in the way that is most convenient for them.

The Definition of Done defines how the term, "Done" is used in the organization.

So much organizational waste - from extended meetings, to people trying to do things that can't possibly succeed, all the way to massive management escalations - can be attributed to misaligned use of this short, simple word: "Done."

#3 - Simpler communication

"I'm done." - "Did you do any necessary refactorings?" - "Will do later." - "If it's not refactored, you can't commit it! And did you get your code reviewed?" - "Do I really need to?" - "That's part of our policy. Where are your unit tests?" - "I don't think that module needs tests." - "Dude, without tests, it's un-maintainable! And did you check with the testers yet?" - "Why should I?" - "What if they found any problems?" --- "Sorry to disturb: I saw that the feature is marked as Done. Can I use it yet?" - "Almost." - "No!" - "Okay, I'm confused now, I'll set up a meeting later so you can explain the status to me."

When everyone understands what needs to be done to be "Done" (pun intended), communication is much simpler: long, deep probing to discover the actual condition of an individual work item becomes unnecessary.

The Definition of Done simplifies conversations.

Everyone should be clear what it means when someone uses the term "Done" - what must be understood, and what can be understood.

Everyone must understand, "When it's not covered by the DoD - you can't expect that someone did it, especially when it wasn't explicitly agreed beforehand."

Likewise, everyone can understand, "When it is covered by the DoD - you don't need to ask whether someone did it when people say it was done."

#4 - Providing clarity

"I need an online shop." - "No problem." - "Can you store the orders in the database?" - "We know how to do our job!" - "Can you make sure that baskets don't get lost when users close the browser?" - "That makes it much more complicated. How about we do a basic version now, and we'll add basket session persistence later on once you have a solid user base?" - "Can you make sure the shop has good performance?" - "What do you mean, 'good performance?'"

Stakeholders are often unaware what the team normally does or doesn't do and what they can definitely expect from the product. Hence, they may communicate incomplete or overspecific requirements to the team, both of which are problematic.

Incomplete requirements lead to low-value products that lack essential features, oftentimes leading both parties to conclude that the other party is an idiot.

Overly specific requirements, on the other hand, usually lead to sub-optimal implementations and waste effort when there are easier, better ways to meet user expectations than specified by the customer.

The Definition of Done avoids over-specification on items covered in the DoD.
It likewise avoids under-specification for points not covered in the DoD.

Within the confines of the Definition of Done, the team gains freedom in what to do and how to do it, as long as all aspects of the DoD are met. It allows customers to keep out of details that the team handles within the standards of their professional ethics.

#5 - Preventing spillover work

"We're done on this feature." - "Splendid. Did you do functional testing?" - "Yup." - "And?" - "3 defects." - "Are they fixed yet?" - "If we'd do that, we'd not meet the timeline, so we deferred them until after the Release." - "But ... doesn't that mean you still have work to do?" - "Not on the feature. Only on the defects." - "But don't defects mean the Acceptance Criteria aren't met?" - "The defects are so minor that they can be fixed later ..."

We see this happening in many organizations.  Unfortunately, there are two insidious problems here:

1. Based on the Pareto principle, the costs of the undone work could massively outweigh the cost of the done work, potentially toppling the product's entire business case. And nobody knows.

2. Forecasting future work is a challenge when capacity is drained in an ill-defined manner. The resulting loss of transparency decreases customer trust and generates stress within the team.

The Definition of Done ensures that there is no future work induced by past work.

The Definition of Done is a protection for the team, in that they will not accumulate a constantly rising pile of undone work which will eventually incapacitate them.

Likewise, a solid DoD protects the business, because there is a much lower risk that one day, developers will have to state that, "We can't deliver any further value until we invest a massive amount of time and money to clear our debt."


The reasons for having a Definition of Done may vary from team to team, and each person might find a different reason compelling. While it's definitely within the realm of possibility that none of the benefits outlined in this article are meaningful in your context, at least ponder whether the hour or two it takes to align on the key points of a Definition of Done is worth it, given the amount of stress you might have to endure by not having one.

A Definition of Done is not cast in stone - it's up to negotiation, and points can be added and removed during Inspection and Adaptation events, such as team Retrospectives. As long as everyone can agree to a change, that change is legitimate.

If you don't have a DoD yet, try with a very simple one and take it from there.

As a conclusion of this article, I'll even throw in my favorite minified DoD:

No work remaining.

Sunday, August 8, 2021

The Product Owner Role

What concerns me with regard to the Product Owner role: it's so horribly diluted by many organizations that sometimes, practitioners ask me, "What's the meaning of my work, and what should I do in order to do my job well?"

There's so much garbage out there on the Internet regarding the Product Owner Role that it's very difficult for someone without significant experience to discern what's a proper definition that would both help companies define proper job descriptions and give PO practitioners guidance on how to improve. So - let me give you my perspective.

Great Product Ownership

I would classify great Product Ownership into four key domains:

While one of the core responsibilities of the Product Owner is the Product Backlog, it should be nothing more than the expression of the Product Owner's intent. And this intent should be defined by acting on these four domains.

Product Leadership

The Product Owner should be providing vision and clarity to the team, the product's stakeholders, customers and users alike.

Product Vision

Who is better than the Product Owner at understanding what the product is, which problem it solves, why it's needed and where it's going? Great Product Owners build this vision, own it and inspire others to follow them in their pursuit of making it happen.

This vision then needs to be made specific by elaborating long-term, mid-term and short-term objectives - a guiding visionary goal, an actionable product goal and the immediate sprint goal.

Clarity of Purpose

While the Vision is often a bit lofty, developers need substantial clarity on "what happens now, what happens next?" - and customers will want to know "what's in it - for me?" The Product Owner must be crystal clear on where the product currently is, and where it's going next. They must be able to clearly articulate what the next steps are - and what the next steps are not. They need to be able to state at any given point in time what the highest priority is, and why that is so.

The Product Backlog is then the place where the PO maintains and communicates the order of upcoming objectives and content.


The Product Owner must communicate their product with both internal and external stakeholders. Life is never easy, so they must rally supporters, build rapport with sponsors, and resolve the inevitable conflicts amongst the various groups of interest.


The product can only be as successful as the support it receives. As such, the Product Owner must build a broad network of supporters, and continuously maintain and grow their influence in their organization - and for the product's market. Keeping a close eye on stakeholder satisfaction and interest, continuously re-kindling the fire of attention drawn to the product is essential in sustaining and fostering the product.


As soon as multiple people are involved, there tend to be conflicts of interest. Even if there is only one single stakeholder, that stakeholder has choices to make, and may need to resolve between conflicting priorities themselves.

In peace times, the Product Owner builds common ground with stakeholders, so that they are more likely to speak positively of the product.
In times of crisis, the Product Owner understands the sources of conflict, calms the waves, reconciles differences, brings people together to work out positive solutions, and mends wounds.


The Product Owner is the go-to source for both the team and the product's stakeholders when they want to know something about the product's purpose or intent. The Product Owner has both factual knowledge and inspiring stories to share.

Product Knowledge

Caution - the Product Owner isn't a personified Product Instruction Manual, and they don't need to be. Much rather, they should be the people to be able to explain why the product currently is the way it is, and why it's going to be the way it's going to be. They must be able to fully understand the product's capabilities and purpose - and they must be able to convey why these are good choices. 
On a more negative note, the Product Owner must understand the weaknesses of the current product and have ideas on how to leverage or compensate for these.
And for all of this, the Product Owner should have the domain expertise, market information and hard data to back up their statements.


"Facts tell, stories sell." - the Product Owner's role is to sell the product, both to the team, and to the customers. They should be able to tell a relatable, realistic story of what users want to and/or are doing with the product, what their current pains are, and what their future benefits will be.
"Speaking to pain and pleasure" is the game - touch hearts and minds alike. The Product Owner should be NEAR their users, and bring developers NEAR as well.

Business Acumen

The Product Owner's primary responsibility is to maximize the value of the product, by prioritizing the highest value first, and by making economically sensible choices both in terms of obtaining funding and spending.

Value Decisions

There are three key value decisions a Product Owner faces every day:
  1. What is our value proposal - and what isn't?
  2. What value will we deliver now, and what later?
  3. What is not part of our value proposal, and will therefore not be delivered at all?
The question oftentimes isn't whether the customer needs something, but whether they need it so urgently that other things have to be deferred or be ditched.

When anyone, customer or developer alike, asks the Product Owner what is on the agenda today, this week, or this month - the Product Owner must be able to answer in a way that the underlying value statements are clear to all.


With infinite money and infinite time, you could build everything - but since we don't have that luxury, the Product Owner must make investment decisions - what is a positive business case, what is a negative business case, what can we afford to do - and what can we afford to not do?

The Product Owner should be able to understand the economic impact of any choices they make: More people can do more work, but burn the budget faster. Every feature has an opportunity cost - all other features that get deferred because of it. Fines could be cheaper than implementations, so not everything "mandatory" must be done. These are just a few examples.
There is often no straightforward answer to "What should we spend our money on this month?" - and considering all of the trade-offs from every potential economic angle before bringing product related decisions to the team or towards stakeholders is quite a complex endeavour.

Economic decisions need to then be transported transparently towards the relevant organizational stakeholders - to team members, who may not understand where priorities come from, to customers who may not understand why they don't get their request served - to managers, who may not understand why yesterday's plan is already invalid.

Given all of these Product Owner responsibilities above, it should be quite clear that the Product Owner must focus and has little time to take care of things that are ...

Not the Product Owner's concern

Three domains are often seen as expectations of the Product Owner, although they are actually a distraction from their responsibilities. Putting them onto the PO's shoulders steals the time they need in order to do the things that make them a good Product Owner:

Project Management

The Product Owner is not responsible for creating a project plan, tracking its progress or reporting status.

Let's briefly describe how this is supposed to happen:

Planning is a collaborative whole-team exercise, and while the Product Owner participates and provides context, a goal and a sorted backlog as input, they are merely contributing as the team creates their plan.

Developers are autonomous in their work, and the Product Owner should rely on being requested for feedback whenever there's visible progress or any impediments hinder the planned outcomes. If the team can't bear the responsibility of their autonomy properly, that would be a problem for the Scrum Master to tackle. The PO should entirely keep out of the work.

Since Sprint Reviews are the perfect opportunity to inspect and adapt both outcomes and progress, no status reporting should be required. A "gemba mindset" would indicate that if stakeholders are concerned about progress, they need to attend the Reviews, and should not rely on hearsay, that is, report documents. 

Team Organization

The Product Owner is not responsible for how the team works, when they have meetings or who does what.

When Scrum is desired as a way of working, the team should have a Scrum Master. The worst thing a Product Owner can do with their time is bother with introducing, maintaining or optimizing Scrum - they should be able to rely on having proper Scrum in place.

Team events, such as Plannings or Reviews, help the team do their work, and as such, should be organized by the developers themselves, because only they know when and how they need these. The Scrum Master can support, and the Product Owner should attend - but the PO shouldn't be bothered with setting these up, and most definitely shouldn't run the entire show.

If anyone assigns tasks on a Scrum team, it's the team members self-organizing to do this. Having the Product Owner (or Scrum Master) do this job is an antipattern that will hurt the team's performance. The Product Owner should not even need to know who does what, or when.

Development Work

The Product Owner develops the product's position, not a technical solution. They have a team of experts to do this, and these experts (should) know better than the PO how to do this. That means the PO should be able to keep entirely out of design, implementation and testing.

Product Owners actively designing solutions often fall into the "premature optimization" trap, resulting in poor solutions. The best approach is to have the Product Owner collaborate with developers as needed to get sufficient clarity on how developers would proceed, but to focus fully on the "What" and keep entirely out of the "How."

When Product Owners have time for implementation, the product is most likely going to fail: while they're paying attention to the development, they aren't focusing on what's happening to the product and its customers out in the market.

Product Owners have a team of professionals around them who are supposed to deliver a high quality "Done" Increment. If their team has no quality assurance, the solution is to bring testers in, not to delegate testing to the Product Owner.

Thursday, August 5, 2021

Continuous Integration Benchmark Metrics

While Continuous Integration should be a professional software development standard by now, many organizations struggle to set it up in a way that actually works properly.

I've created a small infographic based on data taken from the CircleCI blog - to provide an overview of the key metrics you may want to control and some figures on what the numbers should look like when benchmarked against industry performance:

The underlying data is from 2019, as I could not find data from 2021.

Key Metrics

First things first - if you're successfully validating your build on every single changed line of code and it just takes a few seconds to get feedback, tracking the individual steps would be overkill. The metrics described in this article are intended to help you locate improvement potential when you're not there yet.

Build Frequency

Build frequency is concerned with how often you integrate code from your local environment. That's important because the assumption that your local version of the code is actually correct and consistent with the work of the remaining team is just that - an assumption, which becomes less and less feasible as time passes.

By committing and creating a verified, valid build, you reset the timer on that assumption, thereby reducing the risk of future failure and rework.

A good rule of thumb is to build at least daily per team member - the elite would validate their changes every couple of minutes! If you're not doing all of the following, you may have serious issues:

  • Commit isolated changes
  • Commit small changes
  • Validate the build on every single change instead of bulking up
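
To make the daily rule of thumb tangible, here is a minimal sketch of how you might check it. It assumes you can export your validated builds as (author, day) pairs - that export format, the function names and the sample data are hypothetical, not something your CI tool hands you in this shape.

```python
from collections import Counter
from datetime import date


def builds_per_member_per_day(builds):
    """Count validated builds per (author, day) pair.

    `builds` is an iterable of (author, day) tuples - an assumed export format.
    """
    return Counter(builds)


def members_without_build(builds, team, day):
    """Return the team members who did not produce a validated build on `day`."""
    counts = builds_per_member_per_day(builds)
    return sorted(member for member in team if counts[(member, day)] == 0)


# Hypothetical sample data.
builds = [
    ("alice", date(2021, 8, 2)),
    ("alice", date(2021, 8, 2)),
    ("bob", date(2021, 8, 3)),
]
print(members_without_build(builds, {"alice", "bob"}, date(2021, 8, 2)))
```

Use something like this as a conversation starter in the team, not as a KPI on individuals.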

Build Time

The amount of time it takes for a committed change until the pipeline has successfully completed - indicating that the build is valid and ready for deployment into production.

Some organizations go insanely fast, with the top projects averaging at 2 seconds from commit all the way into production - and it seems to work for them. I have no insight into whether there's much testing in the process - but hey, if their Mean Time to Restore (MTTR) on production failures is also just a couple of minutes, they have little to lose.

Well, let's talk about normal organizations - if you can go from Commit to Pass in about 3 and a half minutes, you're in the median range: half the organizations will still outperform you, half won't.

If you take longer than 28 minutes, you definitely have to improve - 95% of organizations can do better!
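
If you want to hold your own pipeline against these two figures, a small sketch could look like this. The thresholds are the 2019 numbers quoted above; the function name and the band labels are my own assumptions.

```python
# Benchmark figures quoted in this article (2019 data).
MEDIAN_SECONDS = 3.5 * 60  # "about 3 and a half minutes" from Commit to Pass
P95_SECONDS = 28 * 60      # slower than this: 95% of organizations do better


def classify_build_time(seconds: float) -> str:
    """Place a commit-to-pass duration into a rough benchmark band."""
    if seconds <= MEDIAN_SECONDS:
        return "median or better"
    if seconds <= P95_SECONDS:
        return "slower than median"
    return "bottom 5%"


print(classify_build_time(10 * 60))
```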

Build Failure Rate 

The percentage of committed changes causing a failure.

The specific root cause of the failure could be anything - from build verification and compilation errors to test automation. No matter what, I'm amazed to learn that 30% of projects seem to have their engineering practice and IDE tooling so well under control that they don't even have that problem at all, and that's great to hear.

Well, if that's a problem for you like 1/5th of the time, you'd still pass as average, but if a third or more of your changes are causing problems, you should look to improve quickly and drastically!
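
Here's a minimal sketch of how the failure rate could be calculated and compared to the figures above. It assumes you have your build outcomes as a simple list of pass/fail flags. The bands are my reading of the numbers in this section (no failures, about a fifth, a third or more), so treat the cut-offs as assumptions rather than official benchmark boundaries.

```python
def build_failure_rate(outcomes):
    """Return the fraction of builds that failed.

    `outcomes` is an iterable of booleans, True meaning the build passed.
    """
    outcomes = list(outcomes)
    if not outcomes:
        raise ValueError("need at least one build outcome")
    return outcomes.count(False) / len(outcomes)


def rate_failure_rate(rate: float) -> str:
    """Map a failure rate to a rough band (cut-offs are assumptions)."""
    if rate == 0:
        return "no failures"
    if rate <= 0.2:
        return "average or better"
    if rate < 1 / 3:
        return "worse than average"
    return "improve quickly and drastically"


rate = build_failure_rate([True, True, False, True, True])
print(rate, rate_failure_rate(rate))
```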

Pipeline Restoration Time

How long it takes to fix a problem in the pipeline.

Okay, failure happens. Not to everyone (see above), but to most. And when it does, you have failure demand - work only required because something failed. The top 10% of organizations can recover from such a failure within 10 minutes or less, so they don't sweat much when something goes awry. If you can recover within the hour, you're still average.

From there, we quickly get into a hugely spread distribution - the median moves between 3 hours and 18 hours, and the bottom 5% take multiple days. The massive variation between 3 and 18 hours is explained easily - if you can't fix it before EOB, there's an entire night between issue and resolution.

Nightly builds, which were a pretty decent practice just a decade ago, would immediately throw you at or below median - not working between 6pm and 8am would automatically push you above 12 hours, which puts you at the bottom already.
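
The overnight effect is easy to see once you measure it. The following is a minimal sketch, assuming a chronological list of (timestamp, passed) build events - a hypothetical format, as are the names and the sample data. It measures each red period from the first failing build to the next passing one.

```python
from datetime import datetime, timedelta


def restoration_times(events):
    """Return how long each red period lasted.

    `events` is a chronological list of (timestamp, passed) tuples.
    A red period runs from the first failing build to the next passing build.
    """
    durations = []
    red_since = None  # timestamp of the first failure in the current red period
    for timestamp, passed in events:
        if not passed and red_since is None:
            red_since = timestamp
        elif passed and red_since is not None:
            durations.append(timestamp - red_since)
            red_since = None
    return durations


# Hypothetical sample: one quick fix, one failure left until the next morning.
events = [
    (datetime(2021, 8, 4, 10, 0), False),
    (datetime(2021, 8, 4, 10, 8), True),
    (datetime(2021, 8, 4, 18, 0), False),
    (datetime(2021, 8, 5, 8, 0), True),
]
for duration in restoration_times(events):
    print(duration)
```

In this sample, the first failure is fixed within 8 minutes, while the one at 6pm stays red for 14 hours - simply because nobody looked at it before 8am.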

First-time Fix Rate

Assuming you do have problems in the pipeline - which many don't even have - you occasionally need to provide a fix to return your pipeline to Green condition.
If you do CI well, your only potential problem should be your latest commit. If you follow the rules on build frequency properly, the worst case scenario is reverting your change - and if you're not certain that your fix will work, that's the best thing you can do in order to return to a valid build state.

Half the organizations seem to have this under control, while the bottom quartile still seems to enjoy a little bit of tinkering - with fixes being ineffective or leading to additional failures. 
If that's you, you have homework to do.
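
If you'd like to know where you stand, here's a minimal sketch of one way to calculate a first-time fix rate. The definition used is my own assumption: an incident starts with the first failing build after a passing one, and it counts as fixed first-time when the very next build passes. Incidents without a follow-up build are ignored.

```python
def first_time_fix_rate(outcomes):
    """Return the share of incidents fixed by the first follow-up build.

    `outcomes` is a chronological list of booleans, True meaning the build passed.
    Returns None when there are no incidents with a follow-up build.
    """
    incidents = 0
    fixed_first_time = 0
    previous_passed = True
    for index, passed in enumerate(outcomes):
        starts_incident = not passed and previous_passed
        if starts_incident and index + 1 < len(outcomes):
            incidents += 1
            if outcomes[index + 1]:
                fixed_first_time += 1
        previous_passed = passed
    if incidents == 0:
        return None
    return fixed_first_time / incidents


# Two incidents: the first is fixed right away, the second needs two attempts.
print(first_time_fix_rate([True, False, True, False, False, True]))
```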

Deployment Frequency

The proof of the pudding: How often you successfully put an update into production.

Although Deployment Frequency is clearly located outside the core CI process, if you can't reliably and frequently deploy, you might have issues you maybe shouldn't have.

If you want to be great, aim for moving from valid build to installed build many times a day. If you're content with average, once a day is probably still fine. When you can't get at least one deployment a week, your deployment process is definitely ranking on the bottom of the barrel and you have definite room for improvement.

There are many root causes for lower deployment frequency, though: technical issues, organizational issues or just plain process issues. Depending on what they are, you're looking at an entirely different solution space: for example, improving technically won't help as long as your problem is an approval orgy with 17 different committees.


Continuous Integration is much more than having a pipeline.

Doing it well means:

  1. Integrating multiple times a day, preferably multiple times an hour
  2. Having such high quality that you can be pretty confident that there are no failures in the process
  3. Not breaking a sweat when a failure does happen and you have to fix it
And finally, your builds should always be in a deployable condition - and the deployment itself should be so safe and effortless that you can do it multiple times a day.

Thousands of companies world-wide can do that already. What's stopping you?

Wednesday, July 21, 2021

Definition of Done vs. Acceptance Criteria

Let's clear up a very common confusion: What's the difference between "Acceptance Criteria" and the Definition of Done, and which is which?

Sometimes, people ask "what is the Definition of Done for this User Story?"
And then, they proceed to document a feature's Acceptance Criteria like this:
  1. Feature works
  2. All tests passed
  3. Documentation written
To make it short: in this example, AC's and DoD are backwards. Now, let's clarify.

Acceptance Criteria

A backlog item's Acceptance Criteria are stated from the perspective of the item's consumer: what defines, for them, that the item is successful?
I hesitate to say, "user acceptance", because sometimes, backlog items are not intended for the end user. If we ignore that case, though, the Acceptance Criteria could be written like simplified User Acceptance Tests: verifiable statements that can be answered with "yes" or "no", and tested by examining the product itself.

Good Acceptance Criteria would be, for example:
  • I get informed about my current balance
  • I get a warning when my balance is negative
  • I get a notification when my balance has changed
As you see, Acceptance Criteria are things that the product does when the item is completed. They are not things that the development team does to the product.

Definition of Done

The Definition of Done is, as the term says, a definition of what everyone in the development organization, the Product Owner, as well as stakeholders, should understand when someone uses the term "Done."
The Definition of Done is a set of mandatory activities that apply to every single backlog item. It's a checklist-style itemization of things that the team does to the product in order to call a single backlog item "Done."

An example of a Definition of Done is,
  • Code written
  • All tests Passed
  • Successful deployment to Test environment
  • Mandatory documentation completed
When you have such a definition, it does not make sense to add these points again to every single backlog item.

But ... the DoD doesn't apply to this item!

Well, it's entirely feasible that the amount of effort to complete a specific DoD activity on an item is Zero. That means it's by default "Done", because all the work required is completed with zero work.
Another thing is when you specifically want to make an exception from your DoD for a specific item. Well, as long as it's an exception, you understand and accept the consequences, and everyone including the customers agrees to take this route, that's okay, because transparency is there.
Although I would caution that if it happens too often, you may want to check whether your DoD makes sense.

But ... it's a specific task for this item!

That's okay. Then add that task to the item. It's still not an Acceptance Criterion, and it doesn't affect the applicability of the overall Definition of Done.

Discerning: AC or DoD?

                 Acceptance Criteria              Definition of Done
Describes ...    What the product does            What the team does
Quality of ...   Product from user perspective    Work from process perspective
Applies to ...   Specifically this backlog item   Generally all backlog items
Met when ...     Verified by a test               Agreed inside team
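
For those who think better in code, the distinction can be sketched like this: the Definition of Done exists once for the whole team, while Acceptance Criteria live on each individual backlog item. This is only an illustration using the example lists from this article - the class and method names are made up, and it assumes Python 3.9 or newer.

```python
from dataclasses import dataclass, field

# One team-wide Definition of Done: things the team does to the product.
DEFINITION_OF_DONE = (
    "Code written",
    "All tests passed",
    "Successful deployment to Test environment",
    "Mandatory documentation completed",
)


@dataclass
class BacklogItem:
    title: str
    # Item-specific: things the product does, each verified by a test.
    acceptance_criteria: dict[str, bool] = field(default_factory=dict)
    # DoD activities the team has completed for this item.
    dod_completed: set[str] = field(default_factory=set)

    def meets_acceptance_criteria(self) -> bool:
        return all(self.acceptance_criteria.values())

    def meets_definition_of_done(self) -> bool:
        return set(DEFINITION_OF_DONE) <= self.dod_completed


item = BacklogItem(
    title="Show account balance",
    acceptance_criteria={
        "I get informed about my current balance": True,
        "I get a warning when my balance is negative": True,
    },
    dod_completed={"Code written", "All tests passed"},
)
print(item.meets_acceptance_criteria(), item.meets_definition_of_done())
```

Note that the DoD list is not repeated on the item: the item only records which of the shared activities have been completed.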

Thursday, July 8, 2021

The wrong professionals

Sometimes, I struggle with teams failing to understand the engineers' responsibility in quality: "I have asked the Product Owner whether we should apply Clean Code practices, and she said, she doesn't need it."

This is already not a conversation that should happen.

Here's a little metaphor I like to use:

When my electrician asks me whether they should insulate the wiring, I have the wrong electrician. 

And that has a number of consequences:

  • Professionals do not negotiate the standards of professionalism with their customer. 
  • Customers expect that the professional brings and adheres to professional standards, which is why they get hired. 
  • Customers are not in a position to judge whether a professional standard is applicable in their context, and asking them to do so shifts responsibility to someone who can't bear it. That itself is un-professional.

So, what does that mean for your Agile team?
  • Be as professional as you can, and continuously improve.
  • Do not ask the customer to tell you what "professional" is.
  • Instead, ask them whether your standards of professionalism satisfy their needs.
  • You can't delegate the responsibility for the quality of your work to anyone else.
    The attempt is already un-professional.

Sunday, June 27, 2021

The Code Review Department

Let me tell you the story of a company that had problems with their code quality, and solved it the Corporate way ... well, ehum ... "solved."

Poor Code Quality

When I met that organization, their development process basically looked like this:

Developers took their specifications, completed their coding and passed the code for testing. Simple as that. 

After a major initiative failed, the Lessons Learned revealed that the root cause of the failure was poor code quality, which basically made the code un-maintainable, hard to read, and difficult to fix without adding more defects elsewhere.

A problem that would require rules and regulation to fix. So, it was decided that Code Reviews should be done. And - of course, people complained that code reviews took time, and if developers would do the Reviews, they would be even slower in getting their work done.

Stage 1: Introducing: the Code Review Department

What would be the Corporate way, except to introduce a new department of specialists dedicated to the specific task? So, they opened the Code Review Department. Let's call them CRD.

To keep Code Reviews as neutral and objective as possible, the Review Department was strictly separated from the others: physically, functionally and logically. They were not involved in the actual development process, and only reviewed the code that was actually delivered.

With the backing of senior management, the CRD adopted a very strict policy to enforce better code quality and defined their departmental KPI's, most notably:

  • Remarks made per reviewer: How many review remarks each reviewer had found, per commit and per month. This allowed the CRD management to see which reviewers were most keen and could best spot problems.
Now, how could this possibly go wrong?

Stage 2: Code Quality improves

Since it's easy to objectively measure this data, the reviewers got busy and lots of review comments were made. Proudly, the CRD management presented statistics to IT leadership, including their own KPI's as well as the obvious metric they could present to measure the software delivered by the Software Development Department (SDD):

  • Code Quality per developer: How many review remarks were made, per commit and month, per developer. This allowed the SDD to put a KPI on the quality of code provided by developers. And, conveniently, it would also justify the existence of the CRD.

With the blessings of IT leadership, SDD Management thus adopted the KPI.

So, things were going well ...

Stage 3: Code Review has a problem

Now, developers aren't dumb people. They adopted Linting and got their Review remarks down to Zero within a pretty short time. Now, the CRD should be happy, shouldn't they?

Turns out, they weren't. And here was the problem: the Remarks per reviewer metric tanked. Not only did reviewers suddenly fail their quota, CRD management also figured they probably weren't looking in the right places.

So what did the CRD reviewers, not being stupid, either, do? When they looked at code, they screened for patterns they considered problematic and introduced a new rule.

The numbers for Review remarks per reviewer rose again, and management was doubly happy: not only were the numbers of the CRD fine again, reviewers were continuously improving their own work.

Great! Things are even getting better!

Stage 4: Developers have a problem

Developers started to get frustrated. Their Lint rules no longer got them 100% passed reviews. What was worse, they found that whenever their Linting got updated, code was rejected again, and they needed to figure out the new rule to add to their Lint configuration. Not only did this process consume a lot of time, it also distracted from actual development work.

Well, developers caught on that meeting the rules wasn't going to get them 100% reviews any more, so they began introducing honeypot infringements: they made obvious mistakes in their code so that reviewers would remark on them, and they'd already have the fix in place right when the review came in.

Everyone happy: CRD met their KPI's and developers were no longer forced to constantly adopt new rules. Except ...

Stage 5: Management catches on

CRD reviewers were content, because they had plenty of review comments again, until the CRD management started to measure policy violations by type and figured out that developers had stopped improving and were making beginner mistakes again. Of course, they reported their findings to higher management. And thus, a new KPI was born:

  • Obvious mistakes per developer: How many obvious review infringements were made by team, with a target of Zero and published transparently throughout the SDD.

Well, again, developers aren't stupid people. So, obviously, they would meet their KPI. How?

You might have guessed it: they would hide their, ehum, "mistakes" in the code so they were no longer obvious, and then placed bets on who could get most of them past Review without being caught.

Guess who won the game?

Stage 6: Code quality deteriorates

The CRD reviewers got stuck in a game of whack-a-mole with developers, who constantly improved their tricks of hiding more and more insidious coding errors, while updating their Linting rules right away when reviewers added a new rule.

Until that day when a developer hit the jackpot by slipping a Zero-Day exploit past Review.

The CRD management no longer trusted their own Reviewers, so they added peer review reviews and another KPI:

  • Issues slipped past Inspection: Reviews were now a staged process where after review by a Junior Reviewer, a Senior Reviewer would review again to figure out what the first reviewer had missed. Every Reviewer would get a Second-Review Score, and that score would need to be Zero. So, they started looking deeper.

You can guess where this is leading, right?

Stage 7: Code quality goes to hell

Now, with four-eye reviews and a whole set of KPI's, nothing could go wrong any more?

Code Reviewers were doing splendidly. They always had remarks and the department's numbers truly validated that a separate, neutral Code Review Department was absolutely essential. 

So the problem was fixed now.

Well, except one small fly in the ointment. Let's summarize the process from a development perspective:

  1. When developers make mistakes, they are reprimanded for doing a poor job.
  2. When developers make no mistakes, new rules are introduced, returning to (1).

Developers now felt like they were on a sinking ship. It was easier to simply continue making mistakes on existing rules than to adopt new rules. They came to accept that they couldn't meet their KPI anyways. 

Since they could no longer win, they stopped caring. Eventually, the review department was relabeled the "complaints department", and nobody took their remarks seriously any more. Developers would now simply add buffer time to their estimates and call it the "Review buffer".

By now, the CRD was firmly established, and they were also fighting a losing battle: try whatever they might, they got more and more overloaded, because truly necessary review remarks and more and more horrible code got more and more common. They needed to add staffing, and eventually outnumbered the developers.

The Code Review Department became the last bastion against poor code quality. A bulwark defying the storming seas of bad code. 

So, what's your take ... 

is a Code Review Department a good idea?

How would you do it?

Wednesday, June 16, 2021

A day in the life of an Enterprise Coach

 "Michael, what does an Enterprise Coach do?" - it's a good question that people sometimes ask me, and frankly, I can't say I have "the" answer. So, I will give you a small peek into my journal. 

ECDE* (* = European Company Developing Everything) is a fictional client.
Like the company, the day is fictional. The described events are real. 

Disclaimer: This day is not representative of what all enterprise coaches do, nor of all the things an enterprise coach does. There is no routine. Every day is different. Every client is different. Every situation is different. Connecting the dots between all the events is much more important than all of the activities combined.

Before we start

"Enterprise Coaching" isn't simple or straightforward. There's often more than one coaching objective to pursue simultaneously, and progress requires diplomacy, patience, tons of compromises and long-term strategic thinking. Some topics can be solved in a single session, while others may take a long time to change. It may take years until people understand the things they were told on day 1 of their Agile training.

Whereas I typically coach for effectiveness, sustainability and quality, there's a potentially infinite number of enterprise coaching objectives, including - without limitation - the introduction of frameworks, methods, practices, cultures, mindset and so on. I see the latter as means to an end, not as viable objectives to pursue.

My definition of coaching success is therefore not "X amounts of teams doing Scrum", "Y amount of Scrum Masters certified" or "Z Agile Release Trains Launched." I define success as, "The client has the means (attitude, knowledge, learning, innovation) for achieving what they want."

On the average day, I jump a lot between all levels and functions of the organization, from team member all the way to senior management - from IT over business towards administrative areas - and simultaneously work on short-term as well as long-term topics.

While I'm trying to "work myself out of a job", it's usually the lack of knowledge and experience regarding certain concepts or practices that may require me to stay involved longer and deeper than initially bargained for.

A coach's day

8:00 am - Getting started

I take some time to reflect. Yesterday was an eventful day at ECDE. A critical member in one of the teams just announced they would be leaving, we had a major production incident - and management announced they want to launch a new Agile Release Train. Business expressed dissatisfaction with one of the Release Trains and there are quarrels about funding. Okay, that's too much: I have to pick my battles.

So I choose to completely ignore the head-monopoly issue, the incidents and the business dissatisfaction. I trust the teams that they can handle this: I am aware, I wasn't asked for support.

There are no priorities for coaching in ECDE. My trains self-manage their Improvement Backlogs. I haven't gotten senior management to adopt a company-wide "ECDE improvements" backlog yet, which would create much more transparency about what's actually important.

The Tyranny of the Urgent is ever-present. I have to make time for strategy, otherwise I'd just run after the latest fires. Most of the stuff I came for are long-term topics anyways, but there are always some quick wins. 

So, what are the big roadblocks my client needs to overcome?

Ineffective organization, low adaptivity, lack of experience, and last but not least levels of technical debt that might exceed ECDE's net present value.

I check my calendar: Personal coaching, a strategy session and a Community workshop. Fair enough.

9:00 am - Personal Coaching / RTE

In a SAFe organization, Release Train Engineers (RTE) are multipliers of agile ways of working, practice and mindset within the organization, which is why I spend time with them as much as I can. They often serve as culture converters constantly struggling to protect their Agile Release Train from the continuously encroaching, pervasive Tayloristic, Command+Control mindset in the areas of management not yet participating in the transformation efforts.

With this come hundreds of small challenges, such as rethinking phase gates and reporting structures, decentralization, meeting information needs of developers and management alike, and driving changes to the organizational system to foster self-organization, growth and learning.

Some topics go straight into my backlog because they're over-arching and I need to address these with higher management. For others, we determine whether the RTE can handle these, needs methodology support (tools, methods, frameworks, canvases etc.) or self-learning resources (web articles, books etc.). I clarify the follow-ups and send some links with information.

The RTE role is challenging to do well, and is oftentimes pretty thankless. It's essential that the RTE has those precious moments where the seeds of change come to fruition.

10:00 am - Sourcing Strategy

ECDE has outsourced to many different vendors scattered across the globe. And of course, every piece of work goes to the lowest bidder, so senior managers are looking at a development value stream as fragmented as it could possibly be. The results are underwhelming... "but hey, we're saving costs!"

I'm not a fan of cost accounting, but here I am, discussing cost of delay, opportunity costs, hidden costs, sunk costs and all of the costs associated with the Seven Wastes of Lean, re-writing the business case for the current vendor selection strategy and making the Obvious visible. We can't change long-term contracts on a whim, so we need a strategy. We schedule a follow-up with Legal and Procurement to explore intermediate options.

When you know what you need to do, and can't do it.

12:00 pm - Lunch time

The business dissatisfaction explodes into a spontaneous escalation. The line manager insists the teams must do overtime to meet the deadline for the expected fixed scope. I politely invite him to an Agile Leadership training. He declines. He demands that we must, quote, "get a proper Project Manager by the end of the month" and ends the call.

One step forward, two steps back. Happens all the time.

1:00 pm - Finally. Lunch.

A Product Owner pings me, because she's unclear about team priorities. Turns out the team wants to apply Clean Code principles, but the PO is concerned about Velocity. While I have my meal, we're having a conversation about the impact of quality upon speed and quantity. She decides to give the team room for improving their engineering practices. We agree to follow up in a month.

I shake my head. ECDE developers still need permission to do a proper job.

2:00 pm - Product Workshop

I'm joining a Product People Community workshop to introduce the concept of a Demand Kanban. I gather some materials, prepare a Mural board and grab a cup of tea. During the workshop, I explain some basic concepts, and we spend most of our time design-thinking some changes to the current process. We create a small backlog of experiments they would like to try.

The "knowledge" these POs got from their certification training is a laughing stock. I do what I can, although this will take a lot more than a handful of workshops.


5:00 pm - Let's call it a day.

A Scrum Master spontaneously calls. I'm really happy to have 1:1 conversations with people close to the teams, so I pick up despite my working day being over. Her question is inconspicuous. Instead of giving a quick answer, I'm curious what her team tried and learned. I smell a systemic issue of which she has only barely scratched the surface.

I suggest that she could run a Topic Retro with her team. She's stumped. For her, a Retro was always a 30-minute, "Good/Bad/Improve" session focused on the last Sprint, so she asks: "How do I do a Topic Retro?" This turns into a two-hour call.

ECDE provides abysmal support for new Scrum Masters. I decide to let it go, because there's a dedicated team in charge of Scrum Mastery. I feel bad for a moment, but my energy is limited.

7:00 pm - Finally, done.

Almost. Now, all I need to do is organize my coaching Kanban, then do the administrative stuff.

I take a look at the board and scratch my head: "Solved two problems today, found five additional problems." I take a few stickies from the "Open" column and move them straight into the trashbin.

It's almost 9pm when I turn off the computer. I reflect and once again realize that while emphasizing "Sustainable Pace" all the time to my clients, I can't continue those long days forever. I should spend more time exercising.

Tomorrow, I'll do better.

Wednesday, May 5, 2021

Guess why nothing gets done?

There are great lessons to be learned from monitoring systems that directly translate to people. 

Take a look at this example graph:

When a CPU is running multiple tasks, it will optimize its performance to give - based on priority - adequate shares of time to each of its tasks. Some performance is required to operate (System Load) and some performance is required to buffer data in and out to operate the different parallel tasks (Context switching). 

When the "System" Load is high, we typically state that either the hardware is not meant for the operating system or that the Operating System is ill-configured, usually the latter. Every operation invested into the System, and every operation invested into Context is an operation not available to complete any task.

People and machines

People do exactly the same. But how would such a diagram relate to people in organizations? Let's translate.

Every person must divide their working hours into their different assignments, plus everything that accompanies it:

Tasks and Projects

Whether something is a machine's task or a person's task - it's a task. From a different level of abstraction, being assigned to a project is a major task. Working on 2 or 3 parallel projects is considered the norm in many organizations, so there are often multiple high-level tasks going on.


As people switch between projects, they have the same kind of context switching going on: close the thoughts on project 1, and pick up the thoughts for project 2. 

Additionally, when they pick up project 2, other people will have made progress, so they need to remember where they left off and then learn what happened in the meantime. By the time they are ready to do work on project 2, the situation will have shifted.

For example, if I work on project 1 exclusively on Mondays and on project 2 exclusively on Tuesdays, each time I pick up these projects, I need to catch up on the events and changes of four days. Let's just say that I can do this in a single hour - it still means that my effective progression time on that eight-hour day has been reduced by 12.5%!


Just like a machine running Windows or Linux, our organization has an Operating Model. At this level, it doesn't even matter whether that's Scrum, Waterfall or anything else. The Operating Model requires employee capacity in some form or fashion: routine meetings, the creation of reports, documentation etc., all of these take time away from actually conducting the task at hand.
Let's just take Scrum. A well-run Scrum initiative will take roughly 15% of a full-time, dedicated team member's time for Planning, Review, Retrospectives, Dailies and artifact generation. Other operating models are significantly less effective, with the bad ones easily taking 30-40% of a person's time.
Let's stick to the most effective one: Scrum, at 15%.

While the operating system's load should be mostly detached from the amount of a machine's activities, a Scrum team member assigned to multiple projects cannot function properly without attending the different teams' events. As such, 3 parallel projects would already burn 45% of a full-time employee's capacity. And remember: it doesn't get better if the organization's operations are less effective!

Adding it all up

Let's say I work on 3 parallel initiatives, and 45% of my capacity is usurped by the Operating Model.

Another 12.5% are taken by Context Switching.

That leaves 42.5% of my capacity on paper - and since a single hour of catch-up is an optimistic figure, let's call it roughly 1/3 for actual work on the projects.

Given 3 initiatives of equal priority, only about 10% of my capacity is dedicated to each single project!

Stopping the multitasking

In a parallel universe, Alternate Me has decided to only pick up a new initiative when an old initiative is completed. Alternate Me simply does not multi-task.

Alternate Me has 0% Context Switching.
Alternate Me spends 15% on the Operating System Scrum.

Alternate Me is free to spend 85% capacity to complete project 1.
When finished, Alternate Me then proceeds to spend 85% capacity to complete project 2, and so on...

Alternate Me, who doesn't multitask, completes each project 8.5 times as fast!

If project 1 takes Parallel Working Me 2 months, it will take Alternate Me 1 week.

If all three projects take Parallel Working Me 2 months, I have no results until 2 months have passed, then I have 3 projects to show.

Sequential Working Me will have 1 project to show after 1 week - and 3 projects to show after 3 weeks!

Sequential Me can take an entire month of vacation, and still has capacity to complete a 4th project by the time Parallel Working Me has completed 3 projects, working full-time.
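The arithmetic above can be replayed in a few lines of Python. This is only a rough sketch: the function name and the simple additive model are assumptions made for illustration, and the two percentages are the ones used in the text. With unrounded figures, 42.5% of capacity remains for three parallel projects, which is about 14% each and a factor of about 6 compared to working sequentially. The text rounds more pessimistically, to 10% per project, which yields the factor of 8.5. The direction of the result is the same either way.

```python
OPERATING_MODEL_SHARE = 0.15   # Scrum events per project (figure from the text)
CONTEXT_SWITCH_SHARE = 0.125   # one hour of catch-up in an eight-hour day


def capacity_per_project(parallel_projects: int) -> float:
    """Share of one person's time that reaches each project as actual work."""
    overhead = parallel_projects * OPERATING_MODEL_SHARE
    if parallel_projects > 1:
        # Assumption: no context switching at all when working sequentially.
        overhead += CONTEXT_SWITCH_SHARE
    return max(0.0, 1.0 - overhead) / parallel_projects


parallel = capacity_per_project(3)    # 42.5% split three ways
sequential = capacity_per_project(1)  # 85% for the one project in progress

print(f"parallel:   {parallel:.1%} per project")
print(f"sequential: {sequential:.1%} per project")
print(f"speed-up:   {sequential / parallel:.1f}x")
```

Changing the two constants to your own organization's numbers shows quickly how sensitive the result is to the overhead of the Operating Model.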

Why do you run parallel projects?

Monday, April 12, 2021

Stop measuring Sprint Predictability

In strongly plan-driven organizations, we often see a fascination with Sprint Predictability. So - what is it, and why would I advise against it?

Let's first take a look at how we can measure Sprint Predictability:

We have four key points of interest in this measurement system: 

  1.  What did the team plan based on their known/presumed velocity?
  2.  What did the team actually deliver based on their committed plan?
  3.  What did the team miss, based on their committed plan?
  4.  What did the team overachieve?

Charted, it could look like this:

We can thus tell whether a team can estimate their velocity realistically, and whether they are setting sufficiently SMART (Specific, Measurable, Ambitious, Realistic, Terminated) goals for themselves.

In a multi-team setting, we could even compare these metrics across teams, to learn which teams have a good control over their velocity and which don't.
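For illustration, the four measurement points could be captured like this. It is a hedged sketch, not a recommended tool: the class and field names are made up for this example, and the "predictability" ratio (delivered share of the committed plan) is an assumption, since no exact formula is prescribed here.

```python
from dataclasses import dataclass


@dataclass
class SprintResult:
    planned: int              # 1. points the team committed to
    delivered_planned: int    # 2. committed points actually delivered
    delivered_extra: int = 0  # 4. delivered points that were not in the plan

    @property
    def missed(self) -> int:
        """3. Committed points that were not delivered."""
        return self.planned - self.delivered_planned

    @property
    def predictability(self) -> float:
        """Delivered share of the committed plan (hypothetical index)."""
        return self.delivered_planned / self.planned if self.planned else 0.0


sprint = SprintResult(planned=40, delivered_planned=32, delivered_extra=5)
print(sprint.missed, f"{sprint.predictability:.0%}")  # prints: 8 80%
```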

If by now, you're convinced that Sprint predictability is a good idea - no! It's not! It's a horrible idea!

Here's why:

The Crystal Ball

Every prediction is based on what we know today. The biggest challenge is predicting things we don't know today.

Here are a couple of reasons why our forecast may be entirely wrong and why we may need to adapt. We may have ...

  • Vague objectives
  • Mis-estimated the work
  • Made some assumptions that turned out false
  • Encountered some unforeseen challenges
  • Discovered something else that has higher value

Of course, management in a plan-driven organization can and will argue, "That's exactly the point of planning: to remove such uncertainties and provide clarity." With which we are back to square 1: Trying to create the perfect plan, which requires us to have a perfect crystal ball. 

Predictability implicitly assumes that adaptation (ability to respond to change) is a luxury rather than a necessity. When we operate in an environment where adaptation really isn't required, we should not use an agile approach to begin with.

Let's now take a tabular look at the five reasons for getting forecasts wrong:

Cause | Challenge | Alternative
Vague objective | The communicated objective and the real goal may be miles apart. | It's better to pursue the actual goal than to meet the plan. Take small steps and constantly check whether these are steps in the right direction, changing course as new information arises.
Mis-estimation | Work was perceived as simpler than it turned out to be, mandating tasks nobody expected and consuming extra time. | Avoid aligning on the content of the work, instead align around the outcomes and break these into bite-sized portions that have little risk attached.
Wrong assumptions | Some things about our Product turned out differently than we had anticipated. We can do more pre-work, which does nothing other than trade "delivery time" for "preparation time": we still make un-validated assumptions. | Validating assumptions is always a regular part of the work. Set up experiments that determine the next step rather than trying to draw a straight line to the goal from the beginning. Accept "open points" as you set out.
Unforeseen challenges | An ambitious plan has risk, while an un-ambitious plan has risk buffers. Pondering all of the eventualities to "right-size" the risk buffer is a complete distraction from the actual Sprint Goal. | Avoid planning overly optimistically (e.g., assuming absolutely smooth sailing) as much as overly pessimistically (e.g., assuming WW3 breaks out) and just accept that unusual events take us out of our comfort zone of being predictable. Learn over time which level of randomness is "normal."
Value changed | Something happened that made new work more valuable than the work originally planned. While this shouldn't frequently happen within a Sprint, it could be part of discovery work. | Ensure there is clarity within the team and organization that the primary goal is maximizing value and customer satisfaction, not meeting plans.

As we can see from the above table, "Sprint Predictability" is a local optimization that gives people a cozy feeling of being fully in control, when in reality, they're distracted from creating value for the organization. 


As much as managers, and even some Scrum Masters, like to use metrics and numbers to see whether teams have high predictability on their Sprints, we need to re-focus our discussion towards:
  1. How well do we understand which goal we're trying to achieve? (Level of Transparency)
  2. Do we understand, and have the ability to generate, value? (Ability to Inspect)
  3. Since "The biggest risk is taking no risks" - let's agree on how much risk our organization can bear (Ability to Adapt)
When we focus on these pillars of Scrum, we will go in an entirely different direction from "becoming more predictable" - we need to improve our ability to respond swiftly and effectively as new information arises!

And once we have high responsiveness, we can argue formidably whether a "Sprint Predictability Index" has any value at all.

Sunday, April 4, 2021

A few things you have to understand about systems

The difference between a system and a compound is that while a compound is defined by the sum of its components, a system is defined by the product of the interactions of its components.

This very simple statement has profound consequences, regardless of whether we are talking about chemical, physical, social or entire economic systems.

Decomposition and Reassembly

Classic science has it that if you de-compose a complex problem into smaller units, the complexity can be handled in individual bites. While this works great when interactions are not as prevalent, it entirely fails when the behaviour of a system is predominantly defined by component interactions.

A de-composed system missing even one of its interactions will not display the same properties as the complete system.

Modifying a de-composed system may create an entirely different system when re-assembled.


Interaction generates friction. The mechanism of minimizing friction is synchronization.

As friction reduces the motion energy of the affected components, the amount of friction gradually reduces until the interacting components will have minimal friction.  As such, every interacting component of a system will enter into a synchronized state over time.

The momentum of a system in a synchronized state will be the cumulative momentum of all components. The same holds true for inertia.

Synchronization does not equate to stability. Indeed, the process of synchronization could destabilize, and potentially destroy, the entire system.


On a higher level of abstraction, a subsystem behaves like a component, assuming its internal and external interactions are separate and distinct.

Interacting subsystems will generate friction until they are synchronized.

Subsystem synchronization could oscillate between different states and have different driving forces until an equilibrium is achieved.

Independent subsystems behave like components: they may be in sync within themselves, yet out of sync with each other.

Component Effectiveness

Since the components of a system are as effective as their interactions, the effectiveness of any individual component is both enabled and constrained by its interactions.

Effectiveness is enabled by synchronized interactions.
Effectiveness is constrained by frictional interactions.

When a component's interactions are predominantly frictional, the component is rendered ineffective unless it's intended to be an abrasive component.

Why is any of that important?

Think about what the above means for piloting changes in parts of your system.
You may not achieve what you intend.

Thursday, April 1, 2021

Improving Code Reviews

A "code review" activity is part of many organizations' development process - and oftentimes, it sucks. It frustrates people, wastes their time and the value in improving quality is questionable at best. If you feel that's the case, here are some things you can try.

What's a Code Review?

"Code review" is a feedback process, fostering growth and learning. It should not be confused or conflated with a QA sign-off process. While finding problems in the code may be part of doing the review, that's not the key point.

So-called one-way "gate reviews" without feedback on defect-free code are a waste. A major portion of their value is missed. The best reviews won't merely help people learn where they messed up - they help people find new, better ways of doing things!

Now, let us explore five common antipatterns and what we could do about them.

Five Code Review Antipatterns and how to deal with them

Review Hierarchy

In many organizations, the Code Review process "puts people in their place": A more senior person reviews the code of more junior persons, and annotates everything that these did wrong. Yes - this sounds exactly like a teacher grading a student's term paper, and the psychological implications are very similar.

While this does indeed foster some form of learning, it creates an anhedonic mindset: the key objective of the coding developer is to avoid the pain of criticism and rework. There is little joy in a job well done. Deming's 12th point comes to mind.

Suggestion 1: Reverse the review process. Let the junior review the senior's code, and see what happens.

Suggestion 2: Do review round robins. Everyone gets to review everyone else's code.

Suggestion 3: Have an open conversation, "How do Code Reviews affect our view of each other's professionalism?"

Huge Chunk Reviews

I'll admit that I've been both on the giving and receiving end here: Committing huge chunks of code at once and sending thousands of lines of code for review in one big bulk, without any comments. And the review outcome was, "This is garbage, don't work like that!" Rightly so. Nobody in their right mind has time to ponder such a huge amount of changes in detail. The review feedback will take a long time and probably not consider all the important points - simply because there are too many.

Code Reviews shouldn't create a huge burden, and they should have a clear focus.

Suggestion 1: State the review objective: What would you like feedback on?

Suggestion 2: Send increments into Code Review that can be thoroughly reviewed in no more than 15 minutes.

Suggestion 3: Reduce feedback intervals. For example: no more than 2 days should pass between writing a line of code and getting it reviewed.

"LGTM" or whimsical feedback

Poor reviews start with the premise that "the only purpose of a review is to find problems." On the positive side of the spectrum, this leads to a lot of standard "lgtm" (Looks Good To Me) annotations as code is simply waved through. On the opposite side of the spectrum, some individuals feel an almost sadistic need to let others know that there are always problems, today stating "this is good", and tomorrow stating "this is bad."

Behind this antipattern is the "controller mindset": someone in the organization believes that a review is intended to tell others, "you did this wrong, you did that wrong."

You can improve this by moving away from checking the code towards positive reinforcement, creating virtuous learning cycles.

Suggestion 1: Change the guiding question from, "What is wrong with this code?" towards, "What could I learn from this code?"

Suggestion 2: Create Working Agreements on how you want to deal with extreme ends of the review spectrum.

Suggestion 3: Collect the learnings from Code Reviews and look at them in the Retrospective.

Ping-pong or Ghosting

Small story: One of my teams had just fixed a production problem that was causing a revenue loss of roughly €15k per day. Someone from a different team did the code review, demanded some semantic fixes, these were made - next round: lather, rinse, repeat. After 2 weeks, the reviewer went on vacation without notice. The fix got stuck in the pipeline for 5 days without response. This funny little event cost the company over €250k - roughly three years' worth of developer salary!

Things like that happen because the expectations and priorities in the team aren't aligned with business objectives and also because of a phenomenon I call "ticket talk."

Suggestion 1: Use TameFlow Kanban to make the Wait Time and Flowback caused by Code Reviews visible.

Suggestion 2: Create a Working Agreement to talk face-to-face as soon as there's flowback.

Suggestion 3: Replace Code Reviews with Pair Programming.

Preferences, emotions and opinions

Let's return to the "whimsical feedback" antipattern briefly. Many times, I see feedback over "use CamelCase instead of Snake Case", "use Tabs indentation instead of spaces" or whether a "brace should open behind the method name rather than in a new line". 

None of these make the product any better or worse. The debate over such matters can get quite heated, and potentially even escalate into a full-blown religious war. These are entirely up to personal preference, and as such, not worth a review annotation: They are red herrings. 

Suggestion 1: Formalize coding conventions and put them into your Lint / SCA config.

Suggestion 2: If you're really bothered, use a pre-commit hook to prevent checking in code that violates static code rules.

Suggestion 3: If you think a rule is missing or unproductive, bring it up in the Retrospective.
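To make Suggestion 2 more tangible, here is a minimal sketch of such a hook, written in Python. It rests on a few assumptions that are not part of the suggestions above: the code base is Python, the agreed linter is flake8 and is installed, and the function names are made up for this example. What it relies on from Git is only that a pre-commit hook exiting with a non-zero code aborts the commit.

```python
#!/usr/bin/env python3
"""Sketch of a Git pre-commit hook: block the commit if the linter complains."""
import subprocess

LINTER = ["flake8"]  # assumption: swap in whatever linter your team agreed on


def lintable(paths):
    """Keep only the staged files the linter should look at (here: Python sources)."""
    return [p for p in paths if p.endswith(".py")]


def staged_files():
    """Ask Git for files that are added, copied or modified in the index."""
    result = subprocess.run(
        ["git", "diff", "--cached", "--name-only", "--diff-filter=ACM"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout.splitlines()


def main(list_staged=staged_files, run=subprocess.run):
    """Return the exit code for the hook: 0 lets the commit through."""
    files = lintable(list_staged())
    if not files:
        return 0
    # The linter's own exit code decides: non-zero means rule violations.
    return run(LINTER + files).returncode
```

To use it, save the script as the executable file .git/hooks/pre-commit and end it with a call such as `raise SystemExit(main())`. The coding conventions themselves stay in the linter's configuration, so the debate about them happens once, in the team, and not in every review.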

Alternative perspective

Code Reviews are just one way to improve coding within a team and/or organization. Mandatory code reviews - by default - create interrupts in the flow, reducing overall performance by a significant amount. Better options include:

  • Code Review upon request
    (e.g., "I want to talk with you about this code")
  • Code Dojos, where the entire team assembles to learn from one another.
    (SAFe's IP Iterations are great for dojos.)
  • Pair programming - making the discussion part of the process.
    (Reviews should be obsolete if you do Pairing right)

Still, if your organization mandates code reviews, try to make the best of them.

Summary (tl;dr)

Code Review is more about fast feedback learning than about "catching errors".
A positive "what can I learn" attitude makes reviews much more enjoyable and beneficial than a negative "what did they do wrong" attitude.

When reviews expose pressing problems, don't just annotate them. Engage in the discussion about "how can we work differently?"

Saturday, March 27, 2021

Tests without code knowledge are waste

I start with an outrageous claim: "Software tests made without knowledge of the code are waste."

Now, let me back that claim up with an example - a calculator applet which you can actually use:


Have fun, play around with this little applet ... it works if your JS is enabled!

Now let us create a test strategy for this applet:

Black Box test strategy

Let's discuss how you would test this, by going into the list of features this applet offers:
  • Basic maths: Addition, Subtraction, Multiplication, Division
  • Parentheses and nested parentheses
  • Comparisons: Less, Equal, Greater ...
  • Advanced maths: Exponents, Square Root, Logarithm, Maximum, Minimum, Rounding (prefixed by "Math.")
  • Trigonometry: Sine, Cosine, Tangent, Arc-Functions (also prefixed)
  • Variables: defining, referencing, modifying
  • Nested functions: combining any set of aforementioned functionality
  • And a few more ...
Wow - I'm sure you can already see where our test case catalogue is going with minimal coverage, and we haven't even considered edge cases or negative tests yet!

How many things could go wrong in this thing? Twenty, thirty? Fifty? A thousand?
How many tests would provide an appropriate test coverage? Five, ten, a hundred?
How much effort is required for all this testing? 

Would it shock you if I said ...

All those tests are a waste of time!

Frankly speaking, I would test this entire thing with a single test case, because everything beyond that is just waste. 
I am confident that I can do that because I know the code:

  <div class="row">
    <div class="card"
         style="margin-right: 2rem; width: 18rem;
                box-shadow: 4px 4px 4px 1px rgba(0, 0, 0, 0.2);">
      <b class="card-header text-center bg-primary text-light"></b>
      <div class="card-body">
        <textarea class="form-control"
                  rows="1" cols="20"
                  style="text-align: right;"
                  oninput="output.innerHTML = eval(this.value)"></textarea>
      </div>
      <div class="card-footer text-right">
        <b id="output">0</b>
      </div>
    </div>
  </div>
Yes, you're seeing this right. The entire code is just look-and-feel. There is only a single executable statement: "eval(this.value)", so we don't have a truckload of branch, path, line and statement coverage, or whatever else, that we need to cover:
All relevant failure opportunities are already covered by JavaScript's own tests for its own eval() function, so why would we want to test it again? 

The actual failure scenarios

Seriously, this code has only the following opportunities for failure:
  • JavaScript not working (in this case, no test case will run anyway)
  • Accidentally renaming the "output" field (in this case, all tests will fail anyway)
  • User Input error (not processed)
  • Users not knowing how to use certain functions
    (which is a UX issue ... but how relevant?)
Without knowing the code, I need to test an entire catalogue of test cases.
Knowing and understanding the code, I would reduce the entire test to a single Gherkin spec:

The only relevant test

Given: I see the calculator
When: I type "1+0".
Then: I see "1" as computation result.

So why wouldn't we want to test anything else?
Because: the way the code is written, if this test passes, then all other tests we could think of would also pass.

But why not write further tests?

A classical BDD/TDD approach would mandate us to incrementally define all application behaviours, create a failing test for each, then add passing code, then refactor the application. 
If we did this poorly, we would really end up creating hundreds of tests and writing explicit code to pass each of them - and that's actually a problem with incremental design unaware of the big picture!

The point is that we wrote the minimum required code to meet the specified functionality right from the start: code that has only a single failure opportunity (not being executed) - and after having this code in place, there's no way we can write another failing test that meets the feature specifications!

And that's why a close discussion between testers and developers is essential in figuring out which tests are worthwhile and which aren't.

Wednesday, March 24, 2021

The only constant is change

A classic response towards change failures in traditional organizations, most notably, Software Release failures, is "Then make changes less often." This gives a good feeling that for a prolonged period of time, things will be predictable and stable. Unfortunately, it's a local optimization issue that actually makes things worse in the long term. Why do we think like that, and why is that thought a problem?

The gut reaction

Let's say that a change just failed. This is an uncomfortable event. Naturally, we don't like discomfort. The easiest way to deal with discomfort is to postpone the re-occurrence of the uncomfortable event into the future. In software development, we do this by postponing the next release as far as possible.

This provides an immediate short-term effect on our psyche: We have just achieved certainty that until the next release happens, there will be no further discomfort, and thus we have reduced immediate stress by postponing it.

Unfortunately, this has done nothing to reduce the probability of that next change failing.

Now, let's see what happens as a consequence of this decision:

The immediate after-effect

Because there is now a prolonged period of time until our next release happens, and the amount of development work is not decreasing, the batch size for the next release increases by exactly the amount of delay in the release. So, if, for example, we used to release once per month and reduce this to once per three months, the batch size just went up 200% with a single decision.

Of course, this means that the scope of the next change is also going to increase, and the next release will become more complex to deliver.

The knock-on effect

A bigger, more complex change is more difficult to conduct and has a higher potential for failure. In consequence, we're going to be failing more often and harder. If we have 3 small changes, and 1 of them fails, that's a failure rate of 33%. If we now combine all 3 changes into 1 big change, that means we end up with a 100% failure rate!

You see - reducing change frequency without reducing change scope automatically increases likelihood and impact of failure.
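
To put rough numbers on the likelihood part, here is a minimal sketch in Python. It rests on two simplifying assumptions that are not part of the argument above: every small change fails independently with the same probability, and a bundled release fails as soon as any one of its changes fails. The function name batch_failure_probability is made up for this illustration.

```python
def batch_failure_probability(p_single: float, batch_size: int) -> float:
    """Probability that a release bundling `batch_size` changes fails.

    Assumption (illustrative only): each change fails independently with
    probability `p_single`, and one failing change fails the whole release.
    """
    return 1.0 - (1.0 - p_single) ** batch_size


# Compare a single small change with ever larger bundles.
for size in (1, 3, 6, 12):
    print(size, round(batch_failure_probability(0.10, size), 3))
```

Under these assumptions, a 10% failure chance per change grows to roughly 27% once three changes ship together, and it keeps climbing with every change added to the batch.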

If now we decide to follow our gut instinct again, postponing the next failure event, we end up in a vicious circle where change becomes a rare, unwanted, highly painful event: We have set the foundation for a static organization that is no longer able to adapt and meet customer demands.

The outcome

The long-term consequence of reducing change frequency is that we can only poorly correlate effort and outcome. It becomes indistinguishable what works and what doesn't - and thereby, the quality of our work, our product, our processes and our metrics deteriorates. We lose our reason to exist on the market: providing high quality and value to our customers on demand.

"If it hurts, do it more often."

Let's just follow the previous computation:
If 1 big change fails 100% of the time, maybe you can slice it into 3 smaller changes, of which 2 will succeed, reducing your failure rate by two thirds?

So, instead of deciding to reduce change frequency, you decide to increase it?

The immediate after-effect

Because there is now a shorter period of time until the next release, there will be less time between when something is developed and when you see the consequences. We close the feedback loop faster, we learn quicker what works and what doesn't. And since we tend to be wired to not do things that become painful, we do more of the things that work, and less of the things that don't.

The knock-on effect

Managing small changes is easier than managing complex change. Thereby, it becomes less risky, less work and less painful to make such small changes. Likewise, since we get faster (and more frequent) feedback on what worked, we can optimize faster for doing more things that provide actual value.

The outcome

By making rapid, small changes, we can quickly correlate whether we improved or worsened something, and we can respond much more flexibly towards changing circumstances. This allows us to deliver better quality and feel more confident about what we do.


The same vicious circle created by the attitude, "If we change less often, we will have fewer (but significantly more painful) events" can become a virtuous cycle if we change our attitude towards, "If we do it more often, it'll become easier and less painful each time."

Your call.

Monday, March 15, 2021

Why WSJF is Nonsense

There's a common backlog prioritization technique, suggested as standard practice in SAFe, but also used elsewhere, "WSJF", "Weighted Shortest Job First." - also called "HCDF", "Highest Cost of Delay First" by Don Reinertsen.

Now, let me explain this one in (slightly oversimplified) terms:

The idea behind WSJF

It's better to gain $5000 in 2 days than to gain $10000 for a year's work. 
You can still go for those 10 Grand once you have 5 Grand in your pocket, but if you do the 10 Grand job first, you'll have to see how you can survive a year penniless.

Always do the thing first that delivers the highest value and blocks your development pipeline for the shortest time. This allows you to deliver value as fast and high as possible. 

How to do WSJF?

WSJF is a simple four-step process:

To find out what the optimal backlog position for a given item is, you estimate the impact of doing the item ("value") and divide that by the investment into said item ("size") and then put the items in relation towards each other.

It's often suggested to use the "Agile Fibonacci" scale for estimates, so "1, 2, 3, 5, 8, 13, 20, 40, 80..."
The idea is that every subsequent number is "a little more, but not quite twice as much" as the previous one, so a "13" is "a little more than 8, but not quite 20".
Since there are no in-between numbers, when you're not sure whether an item is 8 or 13, you can choose either, because these two numbers are adjacent and their difference is considered minuscule.

Step 1: Calculate "Value" Score for your backlog items.

Value (in SAFe) is actually three variables: User and/or Business Value, Time Criticality, Enablement and/or risk reduction. But let's not turn it into a science. It's presumed value.

Regardless of how you calculate "Value", either as one score or a sum or difference of multiple scores, you end up with a number. It becomes the numerator in your equation.

Step 2: Calculate "Size" Score for your backlog items.

"Size" is typically measured in the rubber-unit called Story Points, and regardless of what a Story Point means in your organization or how it's produced, you'll get another number - the denominator in your equation.

Step 3: Calculate "WSJF" Score for your backlog items.

"WSJF" score, in SAFe, is computed by dividing Value by Size.

For example, a Value of 20 divided by a size of 5 would give you a WSJF score of 4.

Step 4: Sort the backlog by "WSJF" Score.

As you add items, you just put them into the position where the WSJF sort order suggests, with the highest score at the top and the lowest score at the bottom of the backlog.
For example, if you get a WSJF of 3 and your topmost backlog item has a WSJF score of 2.5, the new item would go on top - it's assumed to be the most valuable item to deliver!
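
The four steps can be sketched in a few lines of Python. This is only an illustration of the mechanics described above: the class BacklogItem, the function sort_by_wsjf and the three sample items are invented for the example, and "Value" is treated as a single number, however it was produced.

```python
from dataclasses import dataclass


@dataclass
class BacklogItem:
    name: str
    value: float  # Step 1: estimated "Value" score
    size: float   # Step 2: estimated "Size" score, e.g. Story Points

    @property
    def wsjf(self) -> float:
        # Step 3: WSJF score = Value / Size
        return self.value / self.size


def sort_by_wsjf(items):
    # Step 4: highest WSJF score first
    return sorted(items, key=lambda item: item.wsjf, reverse=True)


# Hypothetical backlog: the former top item scores 2.5, the new item scores 3.
backlog = [
    BacklogItem("former top item", value=5, size=2),
    BacklogItem("low item", value=8, size=8),
    BacklogItem("new item", value=3, size=1),
]

for item in sort_by_wsjf(backlog):
    print(item.name, item.wsjf)
```

The new item, with a score of 3, lands above the former top item with its 2.5, just as in the example above.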

And now ... let me dismantle the entire concept of WSJF.

Disclaimer: After reading the subsequent portion, you may feel like a dunce if you've been using WSJF in the real world.

WSJF vs. Maths

WSJF assumes estimates to be accurate. They aren't. They're guesswork, based on incomplete and biased information: Neither do we know how much money we will make in the future (if you do, why are you working in Development, and not on the stock market?) nor do we actually know how much work something takes until we've done it. Our estimates are inaccurate.

Two terms with error

Let's keep the math simple, and just state that every estimate has an error term associated. We can ignore an estimator's bias, assuming that it will affect all items equally, although that, too, is often untrue. Anyway.

The actual numbers for an item can be written as:
Value = A(V) + E(V)  [Actual Value + Error on the Value]
Size  = A(S) + E(S)  [Actual Size + Error on the Size]

Why is this important?
Because we divide two numbers, which both contain an error term. The error term propagates.

For the following section, it's important to know that we're on a Fibonacci scale, where two adjacent numbers are always at least 50% apart, and typically around 60%.

Slight estimation Error

If we over-estimate value by a single notch, our estimate is at least 50% higher than the actual value, even if the difference between fact and assumption is minuscule. Likewise, if we under-estimate value by a single notch, our estimate is at least 30% lower than the actual value.

To take a specific example:
When an item is estimated at 8 (based on whatever benchmark), but turns out to actually be 5, we overestimated it by 60%. Likewise, if it turns out to actually be 13, we underestimated it by 38.5%.
So even if we're off by just a single notch, the actual value could be anywhere between 5 and 13 - a spread of a factor of 2.6!

The same holds true for Size. I don't want to repeat the calculation.

Larger estimation error

Remember - we're on a Fibonacci scale, and we only permitted a deviation by a single notch. If now, we permit our estimates to be off by two notches, we get significantly worse numbers: All of a sudden, we could be off by a factor of more than 6!
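
The spread for any estimate can be checked with a small Python sketch. It assumes the scale exactly as quoted earlier (ending in 40, 80), and the helper name spread is invented for this illustration.

```python
SCALE = [1, 2, 3, 5, 8, 13, 20, 40, 80]  # the scale as quoted in this article


def spread(estimate: int, notches: int) -> float:
    """Ratio between the largest and the smallest actual value that lie
    within `notches` steps of `estimate` on SCALE."""
    i = SCALE.index(estimate)
    low = SCALE[max(i - notches, 0)]
    high = SCALE[min(i + notches, len(SCALE) - 1)]
    return high / low


print(spread(8, 1))  # one notch around 8: between 5 and 13
print(spread(8, 2))  # two notches around 8: between 3 and 20
```

For an estimate of 8, one notch of tolerance gives 13 / 5 = 2.6, and two notches give 20 / 3, which is roughly 6.7.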

Now, the real problem happens when we divide those two.

Square error terms

Imagine that we divide a number 6 times larger than it should be by a number 6 times smaller than it should be: the two errors multiply, and we get a squared error term.

Let's talk in a specific example again:
Item A was estimated as 5 value, but it was actually a 2 value. It was estimated as 5 size, but it was actually a 13 size. As such, it had an error of 3 in value, and an error of -8 in size.
Estimated WSJF = (2 + 3) / (13 - 8) = 1
However, the Actual WSJF = 2 / 13 = 0.15
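
The same example, written out in Python, shows how the two errors compound. The variable names are chosen for this illustration only.

```python
def wsjf(value: float, size: float) -> float:
    return value / size


est_value, est_size = 5, 5    # what we estimated for Item A
act_value, act_size = 2, 13   # what it turned out to be

estimated = wsjf(est_value, est_size)  # 1.0
actual = wsjf(act_value, act_size)     # about 0.15

value_factor = est_value / act_value   # value overstated by a factor of 2.5
size_factor = act_size / est_size      # size understated by a factor of 2.6
overstatement = estimated / actual     # the two factors multiplied

print(value_factor, size_factor, overstatement)
```

Each estimate on its own is off by a factor of about 2.5, but the resulting WSJF score is overstated by a factor of about 6.5, because a numerator that is too large is divided by a denominator that is too small.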

Now, I hear you arguing, "The actual numbers don't matter... it's their relationship towards one another!"

Errors aren't equal

There's a problem with estimation errors: we don't know where we make errors, otherwise we wouldn't make them, and we also make different errors, otherwise, they wouldn't affect the scale at all. Errors are errors, and they are random.

So, let me draw a small table of estimates produced for your backlog:

Item  Est. WSJF  Est. Value  Est. Size  Act. Value  Act. Size  Act. WSJF
A     1.6        8           5          5           5          1
B     1          8           8          3           20         0.15
C     0.6        3           5          8           2          4
D     0.4        5           13         13          2          6.5

Feel free to sort by "Act. WSJF" to see how you should have ordered your backlog, had you had a better crystal ball.
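
If you'd rather let a script do the sorting, here is a minimal Python sketch. The numbers are copied from the table above; the helper name ranking is invented for this illustration.

```python
# (item, est. value, est. size, act. value, act. size), copied from the table
ROWS = [
    ("A", 8, 5, 5, 5),
    ("B", 8, 8, 3, 20),
    ("C", 3, 5, 8, 2),
    ("D", 5, 13, 13, 2),
]


def ranking(rows, value_col, size_col):
    """Item names ordered by value / size, highest score first."""
    ordered = sorted(rows, key=lambda r: r[value_col] / r[size_col], reverse=True)
    return [r[0] for r in ordered]


estimated_order = ranking(ROWS, 1, 2)  # order suggested by the estimates
actual_order = ranking(ROWS, 3, 4)     # order the actual numbers would give

print(estimated_order)
print(actual_order)
```

The estimates put the items in the order A, B, C, D, while the actual numbers give D, C, A, B: the two items ranked last should have come first.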

And that's the problem with WSJF

We turn haphazard guesswork into a science, and think we're making sound business decisions because we "have done the numbers", when in reality, we are the victim of an error that is explicitly built into our process. We make entirely pointless prioritization decisions, thinking them to be economically sound.

WSJF is merely a process to start a conversation about what we think should be priority, when our main problem is indecision.
It is a terrible process for making reliable business decisions, because it doesn't rely on facts. It relies on error-prone assumptions, and it exacerbates any error we make in the process.

Don't rely on WSJF to make sound decisions for you. 
It's a red herring.

The discussion about where and what the value is provides much more benefit than anything you can read from a WSJF table. Do the discussion. Forget the numbers.