Tuesday, March 15, 2022

Bugfixing team - good or bad idea?

In many organizations struggling with quality, a gut reaction is to set up a "bug fixing team" with a single purpose: cleaning up the myriad known bugs in the software so that other teams can focus on new features.

There's a concept that "developers can go faster if they don't have to worry about the bugs in their code." This concept is a delusion, puts unrealistic expectations on the bugfix team, and will ultimately lead to disappointment.

Why bugfix teams are bad

Bugfix teams are a counterproductive workaround that stops the organization from solving its actual problems:

Developers don't own their work. They can fire-and-forget, because it's no longer their problem after delivery.
Developers don't fully understand how their code actually works, hence there's nobody who has deep knowledge of the inner workings of the product.
Developers don't learn how to write better code, because they don't have to work through bad code.
Technical debt continues to grow, because the Bugfix team will only work on known problems while others introduce more problems.
Building a new feature on a buggy baseline doesn't add to low quality - it multiplies it.
The perceived progress of the delivery teams is an illusion, because a delivery that looks finished may still contain a time bomb.
The Continuous Improvement process is subordinated to quantity of output, which is the very reason why the problem occurred to begin with.

I have been part of Bugfix teams. It's a frustrating, ungrateful job and an uphill battle unless all other developers also take responsibility for the quality of their own work.

How bugfix teams can succeed

There is a way to set up a successful Bugfix team, however:

The "Bugfix" team is a team like any other, but they can fully commit to - and focus on - improving quality and have no commitment on new features. And they get a spotlight in stakeholder reviews to demonstrate their improvements.
All other teams practice a rigorous Stop-the-Line process and do everything in their power to not let low quality pass downstream. The policy is, "Bugfix team improves fundamentals, others stop deterioration and improve within the context of delivery."
The Bugfix team responsibility rotates.

Technically, that will make the Bugfix team a "legacy rewrite" team rather than a "locate, isolate and fix the bug" team. Their mission is shaping the future of the product, not merely dealing with its past.

Conclusion

My stance on the topic is, "no Bugfix team unless you can clearly explain how you're actually going to solve the problem that leads you to needing it."

You can't solve the problem of low quality simply by throwing money at it.

Friday, March 11, 2022

Test Automation Specialists - what a waste!

"Test Automation Engineers require someone who will write the test cases for them, and they can start automating right after the software is deployed onto the test environment."

It's 2022, and I just had this conversation yesterday. Seriously?

Overspecialization leads to Waste

If we work like that, we needlessly create bottlenecks, we test too late, our tests will be of low quality, and we will spend a lot of time on identifying, tracking and reworking defective deliveries. We produce a lot of work, but not quality.

In a well-functioning software organization, there can be no such role as a "test automation engineer" who is fed work by "test designers."

Although it sounds "resource efficient," that approach generates massive amounts of waste in all categories of Lean Muda:

Defects

When automated tests aren't written by developers, and only available once a software package is released, we end up with three kinds of defects:

Product defects, i.e. things the product should (or shouldn't) do, but that weren't correctly implemented, which require another software update.
Automation defects, i.e. things the test automation should (or shouldn't) do, but that weren't correctly implemented. Until clarified, these lead to significant communication overhead.
False negatives, i.e. inadequate automation not catching existing defects. This is significantly more likely when automation isn't written close to the code, and might omit dangerous cliffs in the product implementation.

Handover

Whenever actors in a process change, there's a handover. At the point of handover, we induce delay, lose information and risk the introduction of defects. This makes our process slow and error-prone. When test itself becomes a source of error, we should stop and think.

Motion

Handovers also mean that we're needlessly moving artifacts: test descriptions, build versions and defects. We also need to coordinate this motion, which means adding meetings to our process, which slows us down further and adds non-value-adding overhead. Motion also leads to inventory waste.

Inventory

As previously mentioned, un-automated tests will pile up as inventory, as will un-tested deliveries. When tests fail, the defect notifications are made available at a point in time when developers are working on other things. Hence, we are adding three queues to our process. Each queue contains work waiting to be done, and the combined length of all these queues is value delay for our customers.

Waiting

As mentioned in my article on CI/CD industry benchmarks, the time between a developer making a change and that developer receiving feedback on whether everything is correct shouldn't exceed a few minutes, preferably just a few seconds. If automation begins only upon delivery, this feedback time is impossible to guarantee. The delivered product has to wait a significant amount of time until it can produce customer value, and developers will turn their attention elsewhere.

Overprocessing

Having different people automate tests and write production code leads to "overprocessing," often called "gold-plating." As mentioned in my article about the test pyramid, we should try to automate as close to the code as possible. When we're doing high-level tests which could instead be low-level tests, we make our test suite more complex, slower and less reliable than necessary. Unit tests are extremely quick both to write and execute. Test automation engineers who automate black box/gray box tests will spend significantly more effort to write tests with poorer outcomes.
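To illustrate what "close to the code" can look like, here is a minimal sketch of a unit test living right next to the logic it checks. The function apply_discount and its rules are hypothetical and invented purely for illustration; they are not taken from any real product.

```python
import unittest


def apply_discount(price_cents: int, percent: int) -> int:
    """Hypothetical product code: reduce a price by a whole-number percentage."""
    if not 0 <= percent <= 100:
        raise ValueError("percent must be between 0 and 100")
    return price_cents - price_cents * percent // 100


class ApplyDiscountTest(unittest.TestCase):
    """Low-level tests: no deployment and no test environment needed."""

    def test_regular_discount(self):
        # 25% off 20.00 is 15.00
        self.assertEqual(apply_discount(2000, 25), 1500)

    def test_rejects_invalid_percent(self):
        # An edge in the implementation that a black box test could easily miss
        with self.assertRaises(ValueError):
            apply_discount(2000, 101)
```

Saved as a module, tests like these can be run with `python -m unittest` on every change, well before anything reaches a test environment.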

Overproduction

Since the amounts of labor needed to create a product increment in server code and in test automation are extremely hard to balance, the strategy of automating after the fact usually leads to developed product features piling up, waiting to be automated. As a consequence, the decision is often to release these without proper test automation coverage to flush the waiting queue. This builds up "technical debt" which deteriorates product sustainability.

Human potential

When developing software with multiple people, we act as a team. We can learn from each other, and we can align: Someone who can automate tests can also write features. Someone who can write features can also automate. As a team, we can align during planning on "who does what," and fill our information gaps so that those doing the work have full information and don't need anyone else to fill in for them. Techniques like pair programming or code reviews ensure that even if one person makes a mistake, someone else will catch it before a defect even enters the system. It's a limiting belief to think that somehow people can't learn - we all can, if the conditions are favorable. Maybe not everyone will be the greatest developer, but we don't need to be: we work within our ability, and grow our capacity a little every day.

To confine a test automation engineer to automating defined test specifications, and to confine coders to writing code without understanding how to do that without defects, is an insult to our human intellect, and a very expensive one, at that: All the above seven wastes cost a lot of money that we could save if we just improved our collaboration, communication and skillset.

Lean Software Test Automation

Please, let me clarify. I don't believe that test automation expertise is a waste - the waste is actually hidden in the process, and how we use our talent.

A much leaner approach which avoids all of the above eight wastes is to educate and empower: use QA to help developers learn how to define the right tests, how to apply processes like Test-Driven Development and Behaviour-Driven Development, and guide them on their journey to deliver Zero Defect Quality. In reverse, have developers upskill our test automation experts so that they, too, can deliver integrated pieces of value including working software.

The purpose of test automation in such an organization isn't to find defects - it's to maintain the high frequency of delivery of high quality products, and to serve as a living documentation for all behaviours of the product which is maintained as part of the development process without overhead.

Decision-making in an agile organization

A challenging question in Agile organizations is: "Which decisions should be made where?" - we quickly end up in a heated debate of "team autonomy versus command and control." A red herring - and a false dichotomy that misses the big picture. There is a better way.

Making decisions

Let us first briefly discuss the three ways that decisions can be made in companies:

Top-Down

Traditional organizations may be most familiar with this type of decision. When something needs to be decided, a manager needs to be informed about what is to be decided, preferably also with some information on the consequences of various decision types. At a convenient moment in time, the manager will decide, and the affected parties will act based on the decision.

Depending on how information is presented to the manager, the manager may decide based on an unknown bias. Managers are often unable to fully understand the consequences of their choices. In turn, teams have to deal with consequences that may not even make sense to them, but they lack the power to change things. The system encourages currying favor over full transparency, and "making things look good" is worth more than "doing what is right."

A lot of time is lost in presenting information to management in order to prepare the decision, even when the decision is seemingly obvious. Revising the decision requires significant evidence as well, even after the negative repercussions are already visible. Hence, top-down decisions on the work are rarely efficient, effective or even helpful. And that's why they are avoided in leaner organizations wherever possible.

Autonomy

Agile organizations, especially those centered around Scrum, emphasize team autonomy - and thereby, autonomous decision making. Many developers don't like managers interfering with their work, and prefer this way of making decisions. Some get very religious on the "autonomy" part, and will insist on teams getting their way.

Autonomous decisions are both feasible and efficient when the team operates in isolation: As long as it works for the team, we're good. When the team isn't happy with a choice, they can quickly revise it and improve.

Team autonomy sounds great, but isn't a feasible approach when multiple teams interact.

Let's say that team A would like to define a customer as firstName and lastName referenced by customerID, and team B would like to define the customer as fullName and title referenced by customerRefID: Who is right?

If A and B decide autonomously, the system won't integrate, so everyone is "wrong."
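A minimal sketch of that mismatch, using the field names from the example above. The class names, the string types and the helper shared_fields are assumptions made purely for illustration:

```python
from dataclasses import dataclass, fields


@dataclass
class CustomerA:
    """Team A's customer record (field types are assumed)."""
    customerID: str
    firstName: str
    lastName: str


@dataclass
class CustomerB:
    """Team B's customer record (field types are assumed)."""
    customerRefID: str
    fullName: str
    title: str


def shared_fields(left, right):
    """Return the field names that both record types have in common."""
    return {f.name for f in fields(left)} & {f.name for f in fields(right)}


# The two definitions do not share a single field, not even the key.
print(shared_fields(CustomerA, CustomerB))  # prints set()
```

Each definition is perfectly reasonable on its own. The problem only becomes visible at the point where the two teams' systems have to exchange customers.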

Should management decide?

That brings us to the third option, which requires willingness to compromise:

Federation

Let us redefine team boundaries to objective boundaries. People contributing to the same goal, working in the same context, or on the same products, need to collaborate, lest they get stuck dealing with friction rather than making progress.

In larger organizations, we have multiple teams. They might be planning, working and delivering independently - yet, at some level of abstraction, they share the same goal.

As an example, if we're working on an Online Shop, we benefit little if our checkout process is optimized at the expense of user registration: our total revenue might go down if we force people to provide payment data upon registration already. Neither the "Checkout," the "Payment," nor the "Registration" teams can autonomously decide which approach is the best - they need to collaborate!

Thus, a "federation" is born: People whose work overlaps for one reason or another can federate. A federation compromises on local autonomy to optimize globally. We have some centralization, and some decentralization, and we apply a higher-ordered decision framework to determine both how, and what, we need to centralize.

A federation offers us fast and efficient team-level decisions, as well as sustainable and effective overarching decisions.

Guardrails of federated decision-making

A number of common pitfalls can render a federation ineffective:

Not involving people who are affected by outcomes.
Involving people in the discussion who aren't affected by the outcome.
Forming decision queues.

These pitfalls arise when we violate the general rule of thumb that any federation should decide as little as possible, and as much as necessary.

The following decision-making flow can assist us in determining which decisions we would like to keep autonomous, and which are better to federate.

In sequence, we can ask three questions to determine whether we require a central federated decision session:

Will the decision affect others?

We only need to involve those who are affected.
If we affect nobody outside the team, we don't need others to agree.

Will the decision have long-term impact?

Easily reversible choices can be settled by experimenting and inspecting the outcomes.
Decisions that have a high cost of change require deeper discussions.

Is the decision highly time-critical?

If there is a high cost of delay, "it's easier to ask for forgiveness than get permission."
If cost of delay is lower than cost of change, the decision should be made centrally.
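One possible reading of these three questions, sketched as a small function. The function name, the parameters, the returned labels and the idea of expressing both costs as comparable numbers are all assumptions made for illustration, not a definitive rule set:

```python
def decide_where(affects_others: bool,
                 long_term_impact: bool,
                 cost_of_delay: float,
                 cost_of_change: float) -> str:
    """Suggest where a decision could be made (illustrative sketch only)."""
    # Question 1: if nobody outside the team is affected, nobody else must agree.
    if not affects_others:
        return "autonomous"
    # Question 2: easily reversible choices can be tried out and inspected.
    if not long_term_impact:
        return "experiment"
    # Question 3: when waiting costs more than changing course later, act first.
    if cost_of_delay > cost_of_change:
        return "act now, ask forgiveness"
    # Otherwise, involve those affected in a federated decision.
    # Equal costs are not covered by the text; this sketch treats them as federated.
    return "federated"


print(decide_where(False, True, 1.0, 10.0))   # prints autonomous
print(decide_where(True, False, 1.0, 10.0))   # prints experiment
print(decide_where(True, True, 10.0, 1.0))    # prints act now, ask forgiveness
print(decide_where(True, True, 1.0, 10.0))    # prints federated
```

The order of the checks matters: only a decision that passes all three questions ends up in a federated session, which keeps the federation deciding as little as possible.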

As we know from governments, federation has a cost attached and easily leads to bureaucracy. Hence, with every centralized decision, we should inspect and adapt on, "What are the costs and benefits of centralizing this?" - and, "Are there things we could change so that we could do this autonomously?"

Fail Fast, Move On
