Fail Fast, Move On: April 2020

Saturday, April 25, 2020

The defect funnel - systematically working towards high quality

Take a look at this diagram: Which of these images best describes your quality strategy?

The four stages - from left to right - are:

Automated testing - issues detected by automated test execution.
Manual testing - issues detected by manual testing efforts.
System monitoring - issues detected by monitoring capability.
User reports - issues encountered by users on the running system.

The bars indicate:

Red: Too many issues to deal with
Yellow: A bearable but large number of issues that must be prioritized rather than all resolved.
Big green: A larger number of issues, all of which get handled.
Small green: A negligible number of issues that are dealt with as soon as they pop up.
No bar: "we don't do this, or it doesn't achieve much."

The defect funnel

Although the images don't look much like a funnel, this "defect funnel" is similar to a sales funnel. In an ideal world, you'd find the largest number and the most critical defects early, and as a delivery progresses through the process, both the number and the criticality of defects decrease. Let's take a look at that ideal world (which never happens in reality):

Automated testing should cover all the issues that we know can happen - and when we have a good understanding and high control of our system, that should be the bulk of all issues. If we rigorously apply Test Driven Design, our automated tests should always run red when we start creating a new feature, so red tests are exactly what we want to see at this stage.

Manual testing - in theory - should not find "known" problems. Instead, it should focus on gray and unexplored areas: manual testing should only find problems where we don't know ahead of time what happens. That's normal in complex systems. Still, the number of such issues should be significantly lower than the number of issues we already know about.

Monitoring is typically built to maintain technical stability - and in more progressive organizations, also to generate business insights. If we find unexpected things in monitoring, it basically means that we don't know how our product works. And the number of known problems showing up in monitoring should be low, because anything else is just a sign of shoddy craftsmanship.

User reports are quirks we learn about from our users. Since we're the designers, creators and maintainers of our product, no user should know more about it than we do. Still, it can occasionally happen that we either choose to expose our users to a trial, or that a scenario is too far out of the norm to predict before it happens. The better our control of our system, the fewer issues our users see before we do.

In the real world, the funnel often doesn't remotely resemble a funnel. This is a clear sign that your process may be working neither as intended nor as designed.
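To make the question "does our funnel actually look like a funnel?" checkable rather than a matter of gut feeling, you could tag every tracked issue with the stage that detected it and test whether the counts shrink from stage to stage. Here's a minimal Python sketch of that idea; the stage names, the function `is_funnel` and the example counts are all hypothetical, not data from any real system.

```python
from typing import Mapping

# Detection stages, ordered from earliest to latest (left to right in the diagram).
STAGES = ("automated_testing", "manual_testing", "monitoring", "user_reports")


def is_funnel(issue_counts: Mapping[str, int]) -> bool:
    """Return True if issue counts never increase from one stage to the next.

    A stage missing from `issue_counts` counts as 0 ("we don't do this").
    """
    counts = [issue_counts.get(stage, 0) for stage in STAGES]
    return all(earlier >= later for earlier, later in zip(counts, counts[1:]))


# Hypothetical issue counts per stage over one release cycle.
ideal = {"automated_testing": 120, "manual_testing": 15, "monitoring": 4, "user_reports": 1}
no_system = {"manual_testing": 6, "monitoring": 2, "user_reports": 40}

print(is_funnel(ideal))      # True
print(is_funnel(no_system))  # False: nothing at the top, most issues found by users
```

This simple "non-increasing" check only looks at counts; since the text also cares about criticality, weighting each issue by severity would be a natural extension.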

No systematic quality approach

If you don't have a coherent approach to quality at all, this is most likely what things look like: if you encounter a problem, it's either by chance during testing, or because users complain.
You can't really tell whether an issue was caused by the latest deployment, or has been around for a while and simply never showed up before.
If there's any test automation, it's most likely just regression tests, focusing on critical scenarios that have existed for a long time. Since these tend to be stable, test automation hardly finds any issues.
System monitoring will only detect the most glaring issues - like "server down" or "tablespace full".

In such a situation, developers are fighting a losing battle: not only do they not really know what caused a problem, or how many problems there actually are - every deployment invites new ones. You never know how much effort anything takes, because of the constant interruptions to solve production issues. Reliability is low, quality is low, predictability is low - the only things that tend to be high are effort and frustration.

Hence, most larger organizations adopt systematic, quality-gated processes:

Waterfall: Quality Gates

Adding systematic quality control processes, with formal test cases, systematic test execution and rigorous bug tracking allows IT to discover most of the critical issues before a deployment hits the user. If this is your only quality measure, though, you're not reducing defect rates at all.
Delays in deliveries to the test environment cut down test time, so test cases get prioritized to meet time constraints.

New components are tested manually ("no time for automation") and everyone sighs with relief when the package leaves development - there's no time, money or mental capacity left to mind Operations.
The time available to fix discovered issues is never enough, so defects merely get prioritized: the most critical ones are fixed, and the rest are simply released along with the new features. The long-term quality of the system degrades.

In such an environment, testers continually stumble upon legacy problems and simply learn to stop reporting known issues. Quality is a mess, and every new user stumbles upon the same things.
The fortunate thing for developers is that they're no longer the only ones who get blamed and interrupted - they have the QA team to shift blame to as well.

Introduction of Agile Testing

The most notable thing about agile testing is that developers and testers are now in the same boat. With a Definition of Done that declares no feature "Done" before its tests have been executed, developers no longer benefit from pushing effort onto the next desk, and test automation - especially of new components - becomes mandatory to keep cycle times low.

What's scary is that the increased focus on quality and the introduction of agile testing techniques seem to reduce quality - the number of issues suddenly discovered becomes immense! The truth is that the discovered issues were always there, inherent both to the product and to the process. They were just invisible.

Many teams stop at this point, because they don't get enough time to fix all known problems, and stakeholders lose patience with the apparent drop in performance. Everyone knows testing is the bottleneck, and instead of pushing forward and resolving the issue once and for all, they settle for "just enough" testing.
Hence, they never reach the wonderful point where the number of issues discovered by users starts to decline to a bearable level. But that's exactly where the true payoff of higher degrees of test automation, user-centric testing and closer collaboration with development becomes visible.

Shift-Left Testing

It's not enough to "do Agile Testing"; we have to change the quality approach. By having every team member - and users - agree on quality and acceptance criteria prior to deployment, by moving to test driven design, by formulating quality in terms of true/false verifiable scenarios prior to implementation - and finally, by automating these scenarios prior to development, we stop finding issues after the fact, that is, when the code is already wrong.
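As an illustration of "true/false verifiable scenarios prior to implementation", here's a minimal Python sketch. The shipping-fee rule, the threshold, the fee and every name in it are hypothetical; the point is that each scenario is a plain pass/fail check that can exist, and run red, before `shipping_fee` is written.

```python
from decimal import Decimal

# Hypothetical acceptance criterion, agreed with users before implementation:
# "Orders of 100.00 or more ship for free; smaller orders pay a flat 4.95."
FREE_SHIPPING_THRESHOLD = Decimal("100.00")
FLAT_SHIPPING_FEE = Decimal("4.95")

# Each scenario is (description, order total, expected fee): true or false, nothing in between.
SCENARIOS = [
    ("order exactly at threshold ships free", Decimal("100.00"), Decimal("0.00")),
    ("order above threshold ships free", Decimal("250.00"), Decimal("0.00")),
    ("order below threshold pays flat fee", Decimal("99.99"), Decimal("4.95")),
]


def shipping_fee(order_total: Decimal) -> Decimal:
    """Implementation written after the scenarios above existed (and ran red)."""
    if order_total >= FREE_SHIPPING_THRESHOLD:
        return Decimal("0.00")
    return FLAT_SHIPPING_FEE


def run_scenarios() -> list[str]:
    """Return the descriptions of failing scenarios; an empty list means all pass."""
    return [name for name, total, expected in SCENARIOS if shipping_fee(total) != expected]


if __name__ == "__main__":
    failures = run_scenarios()
    print("all scenarios pass" if not failures else f"failing: {failures}")
```

Using `Decimal` rather than floats keeps the boundary case (exactly 100.00) unambiguous, which is precisely the kind of edge a scenario agreed in advance should pin down.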

When we first move to Shift-Left Testing, we will typically encounter a lot of situations where we discover that the system never did what it was supposed to do, and the newly designed test scenarios fail due to legacy issues. At this point, effort may explode once more, because a lot of discussions will be required to make the system consistent. The reduction in speed and the increase in problems are a sign that you're moving in the right direction.

In the context of shift-left testing, teams often add extra capabilities to the system which mainly serve testing purposes, but which are also great hookpoints for extending system monitoring to catch certain business scenarios, such as processing or procedural failures.
None of the problems caught earlier this way will hit the user any more, and this is the first point where users start to notice what's going on - and begin to gain confidence in the team's efforts.

Moving to DevOps

Once you've got the quality of new feature creation under control, it's time to extend your sphere of control and ensure users also have a good experience of your system. You can't do that without Ops on board, and you need to start giving the issues Ops encounter a higher priority.

Investing in monitoring for new components becomes an integral part of your quality strategy, for two reasons: first, you need ways to test your value hypotheses against real-world data, and second, since you're designing for quality, you need to ensure this design doesn't break.

You'll still be hitting legacy issues left and right - because you never had the time to clean them up. But you become more aware of them as they arise, and by systematically adding monitoring hookpoints for known issues, you learn to quantify them, so that you can systematically work them off.
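A monitoring hookpoint for a known issue can be as simple as a named counter that the code increments whenever the known condition occurs. The following Python sketch assumes a hypothetical order-processing function and two hypothetical legacy issues; in a real system, the counts would be sent to your monitoring tool rather than kept in an in-memory `Counter`.

```python
from collections import Counter


class KnownIssueMonitor:
    """Counts how often each known issue is observed at its monitoring hookpoint."""

    def __init__(self) -> None:
        self._counts = Counter()  # issue id -> number of occurrences

    def hit(self, issue_id: str) -> None:
        """Record one occurrence of a known issue."""
        self._counts[issue_id] += 1

    def worst_first(self) -> list[tuple[str, int]]:
        """Known issues ordered by frequency, most frequent first."""
        return self._counts.most_common()


monitor = KnownIssueMonitor()


def process_order(order: dict) -> None:
    # Hypothetical hookpoints: legacy quirks we know about but haven't fixed yet.
    if not order.get("postcode"):
        monitor.hit("LEGACY-missing-postcode")
    if order.get("quantity", 0) <= 0:
        monitor.hit("LEGACY-non-positive-quantity")
    # ... regular order processing would continue here ...


for order in [{"postcode": "", "quantity": 1}, {"postcode": "12345", "quantity": 0}, {"quantity": 2}]:
    process_order(order)

print(monitor.worst_first())
# [('LEGACY-missing-postcode', 2), ('LEGACY-non-positive-quantity', 1)]
```

The ranking from `worst_first()` gives a data-backed order in which to work off legacy issues, instead of whichever one happens to be loudest.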

The "Accelerate" Stage

In their book "Accelerate", Gene Kim, Nicole Forsgren and Jez Humble describe four key metrics of high-performing organizations:

Lead time
Deployment frequency
Mean time to recover
Change Fail Percentage
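If your deployment pipeline records when each change was committed, when it reached production, whether it caused a failure and when service was restored, all four metrics can be computed directly from that log. This Python sketch assumes such a record (the `Deployment` class and `key_metrics` function are hypothetical) and simplifies recovery time by measuring from the deployment itself rather than from when the incident was detected; it is not the exact measurement methodology used in the book.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import median
from typing import Optional


@dataclass
class Deployment:
    committed_at: datetime                  # change committed to version control
    deployed_at: datetime                   # change running in production
    failed: bool = False                    # did this change cause a production failure?
    restored_at: Optional[datetime] = None  # service restored (failed changes only)


def key_metrics(deploys: list[Deployment], period_days: float) -> dict:
    """Compute the four key metrics over a window of `period_days` days."""
    if not deploys:
        raise ValueError("need at least one deployment")
    lead_times = [d.deployed_at - d.committed_at for d in deploys]
    failures = [d for d in deploys if d.failed]
    # Simplification: recovery time runs from deployment to restoration.
    recoveries = [d.restored_at - d.deployed_at for d in failures if d.restored_at]
    return {
        "lead_time_median": median(lead_times),
        "deployments_per_day": len(deploys) / period_days,
        "mean_time_to_recover": (sum(recoveries, timedelta()) / len(recoveries)
                                 if recoveries else None),
        "change_fail_percentage": 100 * len(failures) / len(deploys),
    }


# Hypothetical two-day log with four deployments, one of which failed.
start = datetime(2020, 4, 20, 9, 0)
day = timedelta(days=1)
log = [
    Deployment(start, start + timedelta(hours=2)),
    Deployment(start, start + timedelta(hours=4), failed=True,
               restored_at=start + timedelta(hours=4, minutes=30)),
    Deployment(start + day, start + day + timedelta(hours=1)),
    Deployment(start + day, start + day + timedelta(hours=3)),
]
metrics = key_metrics(log, period_days=2)
print(metrics)
# lead_time_median 2:30:00, deployments_per_day 2.0,
# mean_time_to_recover 0:30:00, change_fail_percentage 25.0
```

Using the median for lead time keeps a single stuck change from distorting the picture; the other three are simple ratios and averages.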

Being world-class on these metrics is only possible with stringent quality control in every aspect of your process, and it's only possible if your system has high quality to begin with.

What may come as a surprise: we're not even aiming to eliminate all known issues in design: That would be too expensive, and too slow. Instead, we're making informed optimization decisions: Does it cost more to automate a test, or to establish a monitoring ruleset that will ensure we're not running into problems? Do we try to get it right the first time, or are we willing to let our users determine whether our choice was good?

An Accelerated organization, oddly enough, will typically feature a lower degree of test automation and less manual testing than a shift-left organization, because it no longer gains as much value from these activities. For example, shooting a record of data through the system landscape and validating the critical monitoring hookpoints tends to take significantly less effort than designing, automating, executing and maintaining a complex test scenario. Plus, it speeds up the process.

Friday, April 24, 2020

CONWIP Kanban - implementing Covid regulations with ease

The Covid social distancing regulations force stores to adopt new strategies to ensure that distance and hygiene are maintained while people go shopping.

Today, I discovered an application of CONWIP boards in daily life - and people may not even recognize that they're using one, because there's no board.

Let's look at a supermarket, and visualize it as a board:

Stores have instituted a fairly robust process which ensures that - given a normal, self-balancing distribution of shoppers - social distance can be maintained without much supervision.
They have merely reduced the number of shopping carts so that carts become the Constraint on store capacity, and have set up a few extremely simple rules:

No shopping without a shopping cart
Don't get too close to other people in the shop
Keep within the distance markers at the cashier

There are a few implicit rules that go without saying:

If there's no shopping cart, you have to wait until one becomes available, or leave.
Bring back your shopping cart after packing up.

The system self-balances and exercises full WIP control:

If there are too many people in the store, there will be no carts left, hence no more people coming in.
If a queue is forming anywhere, no carts will be released, hence no more people coming in.
Once a queue is dissolved, carts will be released, allowing new people to enter the store.

I could immediately spot what's going on here: the store has adopted a type of CONWIP Kanban:

the shoppers are our Kanbans (WIP),
the carts our Replenishment tokens,
the amount of Replenishment tokens is our CONWIP limit
the Constraint is defined by the store's size, and modeled by controlling demand through the CONWIP limit
the Replenishment buffer is the cart pickup.
The space between carts at the cashiers functions like a "Constraint buffer."
It even warns us ahead of time when a cashier is operating at or near its capacity limit, so we can open another cashier.
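The cart mechanism is small enough to sketch in code. This Python model is a hypothetical illustration of the mapping above, not a real store system: carts are the replenishment tokens, shoppers holding a cart are the WIP, and the number of carts is the CONWIP limit. The cashier's constraint buffer is left out to keep it short.

```python
from collections import deque


class ConwipStore:
    """A supermarket modeled as a CONWIP system with carts as replenishment tokens."""

    def __init__(self, carts: int) -> None:
        self.limit = carts        # CONWIP limit: total number of carts
        self.cart_pickup = carts  # replenishment buffer: free carts at the entrance
        self.in_store = set()     # WIP: shoppers currently holding a cart
        self.waiting = deque()    # shoppers queueing for a cart

    def arrive(self, shopper: str) -> None:
        """A shopper arrives and queues; they enter only once a cart is free."""
        self.waiting.append(shopper)
        self._admit()

    def leave(self, shopper: str) -> None:
        """A shopper has packed up and returns the cart, releasing one token."""
        self.in_store.remove(shopper)
        self.cart_pickup += 1
        self._admit()

    def _admit(self) -> None:
        # Each free cart pulls exactly one waiting shopper into the store.
        while self.cart_pickup > 0 and self.waiting:
            self.cart_pickup -= 1
            self.in_store.add(self.waiting.popleft())

    @property
    def wip(self) -> int:
        return len(self.in_store)


store = ConwipStore(carts=3)
for shopper in ["Ann", "Ben", "Cat", "Dan", "Eve"]:
    store.arrive(shopper)
print(store.wip, list(store.waiting))  # 3 ['Dan', 'Eve']

store.leave("Ann")                     # returning a cart lets the next shopper in
print(store.wip, list(store.waiting))  # 3 ['Eve']
```

Whatever the arrival pattern, `store.wip + store.cart_pickup` always equals the number of carts, which is exactly the WIP control described above: nobody has to count heads at the door.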

You gain high control over the system and free real-time risk management on top - and you need neither a significant amount of time nor money to implement this type of change!

Tuesday, April 21, 2020

The dictatorship of Relativism

Unfortunately, the relativism which permeates modern society is also invading the "Agile" sphere - and thus, organizations. This is especially detrimental because these organizations build software systems which have a significant impact on potentially millions of other people.

It's all about Perception

Perception is extremely important in how we interact with the world around us. It's the combination of our sensory organs, our experiences and our neural processing which determines how we perceive a situation. Therefore, people with different backgrounds will have widely different perceptions of the same subject.
Yet, to build up anything sustainable, we need to be as accurate and precise in our perception as possible.

There are still facts

Without trying to harp too much on the Covid pandemic - a virus doesn't care what we would like the situation to be, or whether we believe that it's a significant threat. There's nothing we can discuss or negotiate with the virus, and we can't tell it anything, either.
We can't bargain with it, we have to face reality and work our way from there.
The same goes for business figures. And IT. You can't argue with the bank account that it would be nice if it were just a bit more positive. You can't tell a crashing stock market that developers feel bad about it. Your server doesn't care which of its 0's you would prefer to be 1's. What's there - is there. You have to submit and deal with it.

How sustainable is the willful ignorance and denial of the facts that reality confronts us with?

Thoughts and opinions

Is it okay to have a clear opinion on a matter? Yes.
I would even go so far as to state that many people who claim to "have no opinion" are either deceiving themselves or (trying to) deceive others. In some cases, I would go as far as attributing malice. This becomes most obvious in cases where people who profess to have no opinion become militant against someone who voices theirs. If you're really un-opinionated either way, why is that specific opinion so much of a problem?
The scientific approach would be to examine a claim based on the evidence, and if it holds, to support it - and if it doesn't hold, to dismiss it. There is no, and I repeat, absolutely zero reason to attack the person who proclaims an opinion simply for having it. And still, that is what we see. Logic dictates that we must discredit the idea, not the speaker!

Is an open, transparent workplace consistent with censorship and thought crime?

Predictive capability

A general rule of science is that "models with predictive capabilities are better than those without." The quest to increase the predictive capabilities of our models has brought us running water, heat and electricity for our homes, it has given us cars, computers, the Internet - and enough food on our plates.
While no reputable scientist would claim that any scientific model is perfect or beyond scrutiny, we first need to find a case where scientifically validated models and methods do not yield the predicted outcome before we discard them - especially where they have proven time and again to produce significant benefits.

What are these "better ways of working", compared to the effectiveness of verifiable methods which have been proven to achieve significant improvements?

Patterns

Evolution has deeply ingrained in us the ability to recognize patterns. Our brains are wired to seek patterns everywhere and match them to the most probable ones. When we look at the clouds in the sky, we see flowers, sheep - and many other things. That is our mind playing tricks on us. But it doesn't discredit patterns as a whole.

For example: Five people ran in front of a train. They all died. See a pattern there? Do you really need to run in front of a train to figure out what will happen?

Should we dismiss the idea of patterns, and in the same breath apply patterns that have nothing more than anecdotal evidence as support?

Leadership

Especially in times of change, we need orientation. And in almost every case, even a sub-optimal point of reference is more beneficial than a complete loss of support. Few leaders think of themselves as beyond scrutiny, and oddly, those who do tend to attract the largest following in times of turmoil.
Is it better to lead or to not lead? When people need direction and are unable to find theirs, it's usually the most ethical choice to set a direction first, and then offer the opportunity for change.

Would we prefer that everyone struggle by themselves, denying them the benefits of rallying around an idea they can all agree on?

End the Relativism

Not everything is relative.

There are facts. We can misinterpret, misunderstand or misrepresent them - but they are still there. Instead of soft-cooking the facts, we need to get better at interpreting, understanding and representing them.

Everyone has an opinion. Facts are not equal to opinions, nor are people who have a clear opinion automatically wrong. We have logic and science to tell us which is which. By celebrating the freedom to have even a wrong opinion, we learn to be respectful towards one another. Reasoning teaches us to separate the wheat from the chaff.

We need predictability. We can't predict everything, but we can predict more than nothing. The more predictability we have, the more likely we will still be alive tomorrow. Instead of mushing up everything with the terms "complex" and "unknown", we need to simplify as far as possible (but no further) and learn as much as we can.

We rely on patterns. We're heavily biased in the patterns we observe and how we interpret them. At the same time, there are repeatable and consistent (scientific) patterns, and there are (esoteric) phantasms. The distinction between the two is what brought us to where we are today, for a good reason.

Society is based on leadership. Strong leaders can be a great boon to others. Beneficial leadership can propel hundreds of thousands of people to a better future. If we want to truly help people, we help those who have the potential to lead to do so for the better.

Stop the trash coaching

If you are an "Agile Coach" who:

institutes a culture of relative interpretations until it becomes impossible to discern what's right or wrong - you're destroying people's ability to make critical, timely decisions.
hushes up people who boldly go forward with their opinion - you're instituting a totalitarian system where creativity and courage are impossible.
constantly harps on everything being unknown - you're removing the very basis of what makes a company successful: understanding.
rejects well-established patterns and methods because allegedly, those things don't exist "in the Complex" - you're not reducing complexity, you're pulling in chaos!
denies the value of proper leadership - you're opening the door towards anarchy and decay, not towards teamwork or growth!

None of these things are helping your client grow. These destroy people's ability to do the right thing.

Do the right thing

Forget the labels. It doesn't matter whether we're called coach, consultant, advisor or whatever.
The client has a problem to solve, and they need help. Guidance. Support. Whatever. You're there to make a positive difference.

When the client needs to:

Figure out what's going on - establish what we know and what we don't know. Don't pull that which is known into chaos.
Get the facts straight - help them get the facts straight. Institute metrics, create transparency. Collect data. Gather evidence. Minimize bias instead of dwelling on it.
Have reliable methods or techniques - start with the ones that have proven to be most reliable, then inspect and adapt from there. We don't need to re-invent the Wheel, and we most certainly don't need placebos or magical thinking.
Get out of a mess quickly - lead and teach others how to do that. Don't leave people stranded or disoriented when every minute counts. There's time for talk, and time for action.
Move forward - show the way. It doesn't matter whether you "help them find theirs" or you just bluntly tell them what your opinion is. Break the "analysis-paralysis". Companies have business to do. It's better to revise a wrong decision than to remain indecisive or lost.

By doing these, you will be a tremendous help to the people around you, and a good investment for your clients.

Conclusion

While it's good to check the adequacy of our mental models: people to whom everything is relative, or who promote the idea that everything needs to be discussed and strong decision-making is off-limits, do not belong in business. Especially not in IT.

When you identify problematic stances and behaviours in your "Agile Coaches", get rid of them. Quickly. They will do more harm than good.

And if you now conclude that I don't have a "proper Agile Coaching mindset", that's totally up to you. I don't see "Agile" as an esoteric space where everything goes and nothing is true - I see it as a company's ability to do the best possible thing, swiftly and reliably. And that requires knowing what "the best possible thing" is. Where that conflicts with the label "Proper Agile" - so be it.