Sunday, January 24, 2021

The importance of testability

In the article on Black Box Testing, we took a look at the testing nightmare caused by a product that was not designed for testing. This led to a follow-up point: "Testability", which is also a quality attribute in ISO 25010. Let us examine more closely what the value of testability for product development actually is.






What is even a "test"?

In science, every hypothesis needs to be falsifiable, i.e. it must be logically and practically possible to find counter-examples. 

Why are counter-examples so important? Let's use a real-world scenario, and start with a joke.

A sociologist, a physicist and a mathematician ride on a train across Europe. As they cross the border into Switzerland, they see a black sheep. The sociologist exclaims, "Interesting. Swiss sheep are black!". The physicist corrects, "Almost. In Switzerland, there are black sheep." The mathematician shakes her head, "We only know that in Switzerland, there exists one sheep that is black on at least one side."

  • The first statement is very easy to disprove: they just need to encounter a Swiss sheep that isn't black.
  • The second statement is very hard to disprove: even if the mathematician were right and another angle revealed the sheep to not be fully black, the statement wouldn't automatically be untrue, because there could be other sheep somewhere in Switzerland that are black.
  • Finally, the third statement holds true, because the opposite claim ("There is no sheep in Switzerland that is black on at least one side") has already been disproven by the evidence in front of them.

This leads to a few follow-up questions we need to ask about test design.


Test Precision

Imagine that our test setup to verify the above statements looked like this:

  1. Go to the travel agency.
  2. Book a flight to Zurich.
  3. Fly to Zurich.
  4. Take a taxi to the countryside.
  5. Get out at a meadow.
  6. Walk to the nearest sheep.
  7. Inspect the sheep's fur.
  8. If the sheep's fur color is black, then Swiss sheep are black.

Aside from the fact that, after running this test, you might be stranded in the wintery Alps at midnight wearing nothing but your pyjamas, a couple hundred Euros poorer, this test could go wrong in so many ways.

For example, your travel agency might be closed. You could have insufficient funds to book a flight. You could have forgotten your passport and not be allowed to leave the airport. You could end up at a meadow with cows rather than sheep, and the inspection of the sheep's fur might yield fleas instead of a color. Which of these has anything to do with whether Swiss sheep are black?

We see this very often in "black box tests" as well:

  • it's very unclear what we're actually trying to test
  • just because the test failed, that doesn't mean that the thing we wanted to know is untrue
  • there's a huge overhead cost associated with validating our hypothesis
  • we don't return to a "clean slate" after the test
  • success doesn't provide sufficient evidence to verify the test hypothesis

Occam's Razor

In the 14th century, the Franciscan friar William of Ockham gave us what's known today as Occam's Razor: "entities should not be multiplied without necessity". Applied to our test above, that means every test step that has nothing to do with sheep or fur color should be removed from the design.

The probability of running this test successfully increases when we isolate the test object (the sheep's fur) as far as possible and eliminate all variability that isn't directly associated with the hypothesis itself.
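In software terms, this is the difference between reaching the thing you want to check through a long chain of unrelated steps, and handing it to the check directly. Here is a minimal sketch of the isolated version; the Sheep type and the isBlack function are illustrative names for this sketch, not taken from any real system.

    // Illustrative types and helpers - names are made up for this sketch.
    type FurColor = "black" | "white" | "brown";
    interface Sheep { furColor: FurColor }

    // The property under test, and nothing else: no travel agency, no taxi, no meadow.
    function isBlack(sheep: Sheep): boolean {
      return sheep.furColor === "black";
    }

    // The test object is handed to the check directly, so only the hypothesis itself can fail.
    const sampleSheep: Sheep = { furColor: "white" };
    console.assert(!isBlack(sampleSheep), "a white sheep falsifies 'Swiss sheep are black'");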


Verifiability

We support a hypothesis by trying to falsify its logical opposite: in statistical terms, we try to reject the "null hypothesis" in order to accept our "alternative hypothesis". Unfortunately, this means we have to think in reverse.

In our example: to prove that all sheep are black, we would have to sample every single sheep. That's difficult. It's much easier to look for one non-black sheep, which would falsify the claim that all sheep are black. As long as we fail to produce even one counter-example, we keep accepting that all sheep are black.
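The same idea carries over to testing: rather than trying to confirm every possible case, a test can actively search for a counter-example. A small sketch of that follows; sampleSwissSheep is a hypothetical stand-in for however we would actually draw samples.

    // Sketch of falsification as a search for a counter-example.
    type FurColor = "black" | "white" | "brown";
    interface Sheep { furColor: FurColor }

    // Hypothetical sampling function - stands in for observing real sheep in the field.
    function sampleSwissSheep(): Sheep {
      const colors: FurColor[] = ["black", "white", "brown"];
      return { furColor: colors[Math.floor(Math.random() * colors.length)] };
    }

    function findCounterExample(samples: number): Sheep | undefined {
      for (let i = 0; i < samples; i++) {
        const sheep = sampleSwissSheep();
        if (sheep.furColor !== "black") return sheep; // one non-black sheep falsifies the claim
      }
      return undefined; // no counter-example found - the hypothesis survives, for now
    }

    console.log(findCounterExample(1000) ?? "no counter-example in 1000 samples");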


Repeatability and Reproducibility

A proper test is a systematic, preferably repeatable and reproducible, way of verifying our hypothesis. That means it should be highly predictable in its outcome, and we should be able to run it as often as we want.

Again, going back to our example, if we design our test like this:

  1. Take a look at a Swiss sheep.
  2. If it is white, then sheep are not black.
  3. If sheep are not "not black", then sheep are black.

This is terrible test design, because of some obvious flaws in each step: 

  1. The setup is an uncontrolled random sample. Since sheep can be white or black, running this test on an uncontrolled sample means we haven't ruled out anything if we happen to pick a black sheep.
  2. The falsification condition is incomplete: a brown sheep is not white either, so step 2 wouldn't trigger even though the sheep is not black.
  3. Assuming that step 2 didn't trigger, we would conclude that brown = black.

Since the "take a look at a Swiss sheep" is already part of the test design, each time we repeat this test, we get a different outcome, and we can't reproduce anything either, because if I run this test, my outcome will be different from yours.


Reproducibility

A reproducibility problem occurs when the same test, run by different people, can generate different results. In our example, "take a look at" could be fixed, taking the mathematician's advice, by re-phrasing step 1 to: "Look at a Swiss sheep from all angles." This would lead everyone to reach the same conclusion.

We might also have to define what we call "white" and "black", or whether we would classify "brown" as a kind of white.

We increase reproducibility by being precise about how we examine the object under test, and about which statements we want to make about it.


Repeatability

Depending on the purpose of our test, we do well to remove the variation in our object under test. So, if our test objective is to prove or falsify the claim that all Swiss sheep are black, we can set up a highly repeatable, highly reproducible test like this:

  1. Get a white Swiss sheep.
  2. Identify the color of the sheep.
  3. If it is not black, then the statement that Swiss sheep are black is false.

This experiment setup is going to produce the same outcome for everyone conducting the test anywhere across the globe, at any point in time.

While there is a risk that we fail in step 1 (if we can't get hold of a Swiss sheep), we could substitute the test object with a picture of a Swiss sheep without affecting the validity of the test itself.
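In code, the equivalent of "get a white Swiss sheep" is a fixed test fixture: because the test object is pinned down, every run of the test, on every machine, produces the same verdict. A tiny sketch, again with made-up names:

    // A deterministic test: the test object is a fixed fixture, so the outcome never varies.
    type FurColor = "black" | "white" | "brown";
    interface Sheep { furColor: FurColor }

    // The software equivalent of "get a white Swiss sheep" - could just as well be a stored photo.
    const knownWhiteSheep: Sheep = { furColor: "white" };

    function contradictsAllSheepAreBlack(sample: Sheep): boolean {
      return sample.furColor !== "black";
    }

    console.assert(
      contradictsAllSheepAreBlack(knownWhiteSheep),
      "the fixed white sheep must falsify 'Swiss sheep are black' on every run"
    );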


What is testability?

A good test setup has:
  • a verifiable test hypothesis
  • a well-defined test object
  • a precise set of test instructions
  • absolutely minimized test complexity
  • high repeatability and reproducibility
When all these are given, we can say that we have good testability. The more compromises we need to make in any direction, the worse our testability gets. 

A product that has high testability allows us to formulate and verify any relevant test hypothesis with minimal effort.

A product with poor testability makes it difficult to formulate or verify a test hypothesis. This difficulty might translate into any or all of the following:
  • higher complexity
  • more effort
  • higher cost
  • longer duration
  • reduced validity
  • greater uncertainty


In conclusion

The more often you want to test a hypothesis, the more valuable high testability becomes.
With increasing change frequency, the need to re-verify a formerly true hypothesis also increases. 

Design your product from day 1 to be highly testable.
By the time you discover that a product's testability is unsustainably low, it's often extremely expensive to notch it up to the level where you need it.

Tuesday, January 19, 2021

The problem of Black Box Tests

One of the most fundamental enablers of agile ways of working is the ability to swiftly and reliably detect problems in your product - that is, to test it efficiently and effectively. Unfortunately, the "traditional" approach of black-box testing a running application is hardly useful for this purpose. 

I have created an executable use case to illustrate the problem. 


Let's take a look at this little service, and assume that it is your responsibility to test it in a way that lets you reliably tell whether it works correctly, or which problems it has:

You can call this service in the mini applet included on this page if you have JavaScript enabled.

Alea iacta est


Yes, just try it out - roll the dice!
That's about as simple as an application can get.
Can you imagine how much effort it takes to properly test this simple application?

It's not enough to roll the dice once and get a number between 1 and 6 - how do you know that there isn't a possibility that the application might generate results outside that range?

And how would you know that you have fair dice? Call the service a thousand times and assume that you would get an approximately even distribution of values? What would be your thresholds for calling the dice "fair"? What if one number received only 5% of the results, or 25% of them - is that a defect in the dice, or just chance?
You see the difficulty already.
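To make the problem concrete, here is a minimal sketch of what such a black-box fairness check might look like. The rollDice client and its URL are assumptions - the real endpoint isn't shown here - and note that the hardest part, deciding where "fair" ends, is still left entirely to your judgment.

    // Sketch of a black-box fairness check; the endpoint URL is a placeholder.
    async function rollDice(): Promise<number> {
      const response = await fetch("https://example.org/dice"); // hypothetical URL
      return Number(await response.text());
    }

    async function checkFairness(rolls: number): Promise<void> {
      const counts = new Map<number, number>();
      for (let i = 0; i < rolls; i++) {
        const value = await rollDice();
        if (value < 1 || value > 6) throw new Error(`result out of range: ${value}`);
        counts.set(value, (counts.get(value) ?? 0) + 1);
      }
      // The hard part: how far from one sixth per face is still "fair"?
      // Any threshold chosen here is a statistical judgment call, not a certainty.
      for (const [face, count] of counts) {
        console.log(`${face}: ${((100 * count) / rolls).toFixed(1)}%`);
      }
    }

    checkFairness(1000).catch(console.error);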

But let's make this more difficult:

Hello


What if I told you that this is a call to the same service?
Yes, exactly - you didn't know everything the service does when you created your test concept.
There's a different feature hidden in the service: if you pass a user name to the request, it will greet you!

This adds a whole new dimension to the test complexity: you have to test with - and without - a user name. And would you want to try different user names?

  But that's not everything:

You lose!


Did you even catch that this one behaves differently?
What if I told you that this is another call to the same service?
Yes, exactly - you still didn't know everything the service does when you created your test concept.
There's another feature hidden in the service: you can load the dice and cheat!

If you tell the service to cheat you, you will get unfair dice.

So now you need to run your entire test set from above all over again, twice - with and without cheating.

And we haven't even looked into whether there are multiple ways of cheating, or whether the cheating function always triggers correctly when the variable is set (hint: it doesn't). Good luck finding that malfunction without knowing the application's code.

But we're not done yet:

I win  


Did you catch the difference here?
What if I told you that this is yet another call to the very same service?

There's yet another feature hidden in the service: if I use my name to cheat, I will get dice loaded in my favor!

By now, you're probably doubting whether you understood the application at all when you started testing it.

The code

Now - let me blow your mind and tell you how little source code was required to send your test complexity and effort spiralling completely out of proportion:
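(The original snippet isn't reproduced here. The following is an illustrative reconstruction based purely on the behaviour described above - the parameter names, the special-cased user name and the exact outputs are assumptions, not the real code.)

    // Illustrative reconstruction - not the original snippet.
    const AUTHORS_NAME = "Michael"; // placeholder; the real special-cased name isn't shown
    function rollService(user?: string, cheat?: boolean): string {
      const greeting = user ? `Hello ${user}! ` : "";
      let roll = Math.floor(Math.random() * 6) + 1;
      if (cheat) {
        // Loaded dice: the caller loses - unless the caller happens to be the author.
        roll = user === AUTHORS_NAME ? 6 : 1;
      }
      return `${greeting}You rolled a ${roll}.`;
    }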


That's it. A snippet of code this small is entirely sufficient to keep a Black Box tester busy for hours, potentially days, and still leave them unable to make a reliable statement on whether they missed anything, or which problems the product may or may not have.

Depending on how your application is designed, a few minutes of development effort can generate a humongous mountain of effort in testing.
 
And that's why you can't possibly hope to ever achieve a decent test coverage on an application without knowing the code.

Testability

There's another problem: this code wasn't written with testing in mind (or rather: it was purposely written with poor testability in mind - hee hee), so you have no way of coming up with an easier way to test this service until it's rewritten.
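To illustrate what such a rewrite could look like - and this is just one possible sketch, not a prescription - the source of randomness and the cheating rule can be separated from the HTTP plumbing and injected, so that each rule becomes a deterministic unit test instead of a statistical guessing game:

    // Sketch: the same behaviour, restructured for testability (names are illustrative).
    type Rng = () => number; // returns a float in [0, 1), like Math.random

    function rollDice(rng: Rng, cheatFor?: "caller" | "house"): number {
      if (cheatFor === "caller") return 6;
      if (cheatFor === "house") return 1;
      return Math.floor(rng() * 6) + 1;
    }

    // Deterministic unit tests - no HTTP calls and no statistics needed for the core logic.
    console.assert(rollDice(() => 0.0) === 1, "lowest random value maps to 1");
    console.assert(rollDice(() => 0.999) === 6, "highest random value maps to 6");
    console.assert(rollDice(Math.random, "house") === 1, "dice loaded against the caller always lose");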

And that's why you can't maintain sustainable high quality unless developers and testers actively collaborate to build highly testable software that is easy to work with, both for software changes and testing. 

Think of minimizing sustainable lead time - consider the total effort from request to release, and consider it both for initial creation and for future modification. There's no point in optimizing for development speed if you slow down testing by more than you gain, and likewise, there's no point in delivering minimal code if the consequence is a totally bloated test scope.

Otherwise, you'll not be very agile.

Friday, January 1, 2021

Low-Code, No-Code, Full-Code - The Testing Challenge

In the Enterprise world, there's a huge market for so-called "Low Code" and "No Code" solutions, and they do have a certain appeal - you need to do less coding, and as such need fewer developers, to achieve your business objectives, because these solutions bring a lot of "out-of-the-box" functionality.

So why is it even something to talk about - and how does that relate to "Agile" ways of working?


Let's explore this one from a quality perspective.


The Paradigms

No-Code: Configure and Go

"No Code" solutions are especially appealing to organizations that have no IT department and are looking for something that someone without IT knowledge can configure in a way that's immediately useful.

An early implementation of a no-code platform will typically not even include a staging environment where people can try things out. Many times, changes are immediately live on a production system. That's great for small organizations that know exactly what they're doing, because it absolutely minimizes effort and maximizes speed.
It turns into a nightmare when someone, somehow, by pure accident, manages to delete the "Order" object, and now you're hunting for a couple thousand unprocessed orders that your angry customers are complaining about - with no way to remedy the system.

And it turns into an even worse nightmare when the system doesn't do what it's supposed to do, and your chance of figuring out why the black box does what it actually does, instead of what it's supposed to do, is smaller than that of hell freezing over.

When introducing Quality Assurance on a No-Code platform, organizations are often stuck using third-party testing software that uses slow, flaky, difficult-to-maintain, expensive UI-based tests which will eventually get in the way of high speed adaptability. Clean Code practices applied to testing are usually a rare find in such an environment.


Low-Code: Configure and Customize

"Low Code" solutions are especially appealing to managers who are out to deliver standardized software to their organization fast. Many of these systems bring huge chunks of standard capability out-of-the box and "only need customization where your organization doesn't do what everyone else does."

That sounds appealing and is a common route in many organizations, which often find out only years after the initial introduction that "you can't sustain your market position by doing what everyone else does" - your business does require a lot of customization to stand out in the market, and the platform often doesn't accommodate that.

Most vendor solutions don't provide a suite of functional tests for your organization to validate the standard behaviour, which means you often end up creating duplicate or highly similar code in your customization efforts - or use standard functions that don't do what you think they would. Worse yet, many use proprietary languages that make it very difficult to test close to the code. In combination, that makes it extremely hard to test the customization you're building, and even harder to sustainably keep the platform flexible.



Full-Code: Design, Build, Optimize

"Full Code" solutions sound like the most effort and the slowest way of achieving things. But looks can be deceptive, especially to a non-expert, because a modern stack of standard frameworks like Spring, Vue and Bootstrap, can literally make it a matter of minutes for a developer to produce the same kind of results that a low-code or no-code platform configuration would provide, without any of the quality drawbacks of Low-Code or No-Code.

Your organization has full control over the quality and sustainability of a full-code solution. It depends entirely upon what kind of engineering practices you apply, which technologies you use and which standards for quality you set for yourself.


Quality Control

To sustain high quality at a rapid pace, you need full quality control:
  • You must be able to quickly validate that a component does what it's supposed to do.
  • You must be able to quickly figure out when something breaks, what it was, why and how.
  • When something breaks, you must be able to control blast radius.
  • You need a systematic way of isolating causes, effects and impact.
The most common approach to maintaining this control is a CI/CD pipeline that runs robust test automation as part of the delivery process. To make it feasible to exercise this control upon every single change that anyone makes at any point in time, it should not take longer than a few minutes, lest people be tempted to skip it when in a hurry.

The problem with both No-Code and Low-Code solutions is: In many cases, such platforms aren't even built for testability, and that becomes a nightmare for agile development. Instead of running a test where and how it is most efficient to run, you invest a lot of brainpower and time into figuring out how to run the test in a way that fits your technology: You have subjected quality to the technology, instead of the other way around!

In a low-code environment, this can become even more problematic, when custom components start to interfere with standard components in a way that is unknown and uncontrollable in a huge black box.


Non-functional Quality

Although I would not outright suggest opting for a full-code solution (which may not be in the best interests of your organization, and is entirely implausible without skilled developers), I would like to share a list of non-functional quality attributes that are often overlooked when selecting a new system, platform or service.

In order to remain agile - that is, to be able to quickly, effectively and easily implement changes in a sustainable manner - your platform should also accommodate the following non-functional quality requirements:

For each factor, consider the decisions it implies:
Testability
How much effort is it to test your business logic?
This must go far beyond having a human check briefly whether something works as intended. It needs to include ongoing execution, maintenance and control of any important tests whenever any change is made. And remember: any function you can't test may cause problems - even when you're not intentionally using it!
Traceability
How closely are cause and effect related?
You don't want a change to X also to affect Y and Z if that wasn't your intent! Are you able to isolate changes you're making - and are you able to isolate the impact of these changes?
This should go for the initial setup as well as for the entire lifecycle of the product.
Extensibility
How much effort does it take to add, change or remove business logic?
Adding a form field to a user interface is a start, not the end. Most of your data has a business purpose, and it may need to be sent to business partners, reported in finance, analyzed in marketing etc. How much effort does it take to verify everything turns out as intended?
Flexibility
How often will you be making changes?
If you're expecting one change a year, you can permit higher test efforts per change, but when you're looking at new changes every week, you could be overwhelmed by high test or change efforts, and cutting corners will become almost inevitable.
Security
Can you really trust your system?
Every system can have vulnerabilities, and standard software tends to have fewer - but how can you test for zero-days unless you can fully test the intricate inner workings?
Also, legislation like the GDPR forces you to disclose certain data processing activities, and you may need to provide evidence of what your system actually does in order to comply. This is extremely difficult when the behaviour of certain aspects is hidden in a black box.
Mutability
How much effort would it take to migrate to a new platform and decommission the current platform?
When you introduce a system without understanding how much time, effort and risk is involved in a migration or decommissioning initiative, it might be easier to kill your current company and start another business than to get rid of the current technology. That means you could find yourself in a hostage situation when the day comes that your platform is no longer the best choice for your business, and you have no choice except to keep throwing good money after bad.

As a general rule of thumb, low-code and no-code platforms tend not to emphasize these qualities - so the more value your organization places on these non-functional requirements, the less plausible such an approach becomes.

Conclusion

With all that said: if you're in the comfortable situation of introducing a new technology, ensure that you check the non-functional requirements and don't get blinded by the cute bucket of functionality a low-code or no-code solution may offer. If your platform does poorly on traceability, testability or mutability in particular, you're going to trade your agility for some extremely painful workarounds that could increase the Cost of Ownership of your solution beyond feasible limits.

It wouldn't be the first time that I'd advise a client to "trash everything and start with a blank folder. Within a few years, you'll be faster, have saved money and made better business."

Culture Conversion

Many times, I hear that "SAFe doesn't work" both from Agile Coaches and companies who've tried it, and the reasons behind the complaint tend to boil down to a single pattern that is missing in the SAFe implementation - culture conversion. Let's explore why this pattern is so important, what it is, and how to establish it.



The Culture Clash

Many enterprises are built upon classical management principles: workers are seen as lazy, selfish and disposable "resources". Decisions are made at the top, execution is delegated. There is a constant tug-of-war between "The Business" and "Development". All problems are caused by "Them" (irrespective of whom you ask) - and the key objective is always to pass the next milestone, lest heads roll. There is little space for an exchange of ideas at eye level, mutual problem solving, growth and learning.

If you try to use an agile approach, which is built upon an entirely different set of principles, practices and beliefs, you'll get a clash. Either workers care, or they don't. Either people are valuable, or they aren't. Either they can think, or they can't. You get the idea. Behind that is a thing called "Theory X/Y." 

Self-fulfilling prophecy

When you treat people like trash, they'll stop caring about their work. When you don't listen to your developers, they fall silent. When you punish mistakes, workers become passive. And so on. This lose-lose proposition turns into a death spiral and becomes a self-fulfilling prophecy.

Likewise, when you create an environment built upon mutuality, trust and respect, people will behave differently. Except - you can't just declare it to be so and then continue sending signals that the new values are "just theoretical buzzwords that don't match our reality." Because if you do that, this will again be a self-fulfilling prophecy.


Breaking the vicious circle

You can't change everything overnight, especially not an entire organization. Some people "get it" immediately, others take longer. Some may never get it. Even when you desire and announce a new culture, it can't be taken for granted. You have to work towards it, which can be a lot of effort when dealing with people who have built their entire careers on the ideas of the old culture.  

Resilience over robustness

A lot of this doesn't happen in the realm of processes, org charts and facts - what's truly going on happens mostly in the realm of beliefs, hopes, fears. As such, problems are often difficult to identify or pinpoint until a dangerous symptom becomes manifest. Hence, you can't simply re-design an organization to "implement" this new culture. The best you can do is institute checks and balances, early warning mechanisms, buffer zones and intentional breaking points.

Buffer Zone

Often, you may need time to collect striking evidence that would convince others to let go of certain unhelpful practices. These might include, for example, HR policies, project management or accounting practices. When you can't quite yet eliminate these things, it's quite important for the culture conversion to also include a conversion of such activities, so that they don't affect the teams. At the same time, you need a strategy laid out with clear targets for abolishing these things, lest they become "the new normal" and culture converters start believing them to be right or even essential.


The Culture Conversion Pattern

When you operate in an environment where cultural elements that conflict with the intended future culture exist and will likely interfere with the sustainability of the change, you need mechanisms that let you:

  • Establish the desirable culture
  • Minimize undesirable culture infringement
  • Mitigate damage from culture infringement
  • Provide breaking points for when the undesirable culture gets too strong
  • Identify culture clash

Specific people must take on this responsibility; it's not sufficient to say "We should do this." Someone must be in control of these activities, and the entire organization must rigorously apply the above mechanisms, inspecting and adapting relentlessly upon failure.

Failure on any of these will provide a backdoor for the existing, undesirable culture to quickly usurp the new culture, and the culture change will fail.

The SAFe Zone

A healthy SAFe organization would institute the "Program Level" to provide exactly this resilience for culture conversion. The Product Management function would protect the agile organization against low value work and overburden, the RTE function would safeguard against Command and Control, and the architect would be the bulwark against unsustainable engineering. Product Owners and Scrum Masters would provide an additional safety cushion to protect the teams.

These roles must unite to drive the need for transparent, un-political value optimization, mutual collaboration and quality-focused development practice both towards the teams and the non-agile surrounding organization.


Failing Culture Conversion

Let's say your Program Level is being pressured to introduce cultural dysfunctions from the previously existing surrounding organization into the Agile Release Train, and they can't push back. In their function as a culture converter, they are now converting the new culture back into the old culture, and as such, working against the Agile Transformation. If you do not identify and deal with this issue swiftly and strongly, you're setting the fox to keep the geese: The fledgling new culture will be steamrolled by the existing culture in no time.




Summary

When you are using SAFe, ensure that the ART Roles are both willing and able to act as culture converters, and give them the support they need to function properly as such, mostly by relieving them of any and all responsibilities that relate to the "old" culture you want to abolish.

By overriding, short-circuiting or ignoring the culture conversion function, you're dooming the culture transformation - and since the new ways of working rely on the new culture, you're headed for a train wreck.

SAFe sucks when you mess up the culture conversion.