Sunday, January 24, 2021

The importance of testability

In the article on Black Box Testing, we took a look at the testing nightmare caused by a product that was not designed for testing. This led to a follow-up point: "testability", which is also a quality attribute in ISO/IEC 25010. Let us examine more closely what value testability actually has for product development.






What is even a "test"?

In science, every hypothesis needs to be falsifiable, i.e. it must be logically and practically possible to find counter-examples. 

Why are counter-examples so important? Let's use a real-world scenario, and start with a joke.

A sociologist, a physicist and a mathematician ride on a train across Europe. As they cross the border into Switzerland, they see a black sheep. The sociologist exclaims, "Interesting. Swiss sheep are black!". The physicist corrects, "Almost. In Switzerland, there are black sheep." The mathematician shakes her head, "We only know that in Switzerland, there exists one sheep that is black on at least one side."

  • The first statement is very easy to disprove: they just need to encounter a sheep that isn't black.
  • The second statement is very hard to disprove: even if the mathematician were right and another angle revealed that the sheep is not fully black, the statement wouldn't automatically be untrue, because there could be other black sheep somewhere in Switzerland.
  • Finally, the third statement holds true, because the reverse claim ("There is no sheep that is black on one side in Switzerland") has already been disproven by evidence.
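
The difference between the first and the third statement can be sketched in a few lines of Python. This is only an illustration: the list of observations and the two function names are made up for this example, and each entry stands for the color of the one side of a sheep that was actually seen.

```python
# Hypothetical model: one entry per sheep, holding the color of the side we saw.
observations = ["black"]  # the single sheep seen from the train


def all_black(seen):
    """Sociologist's universal claim: one contradicting observation refutes it."""
    return all(color == "black" for color in seen)


def exists_black_side(seen):
    """Mathematician's existential claim: one confirming observation settles it."""
    return any(color == "black" for color in seen)


# A white sheep shows up later on the trip.
later = observations + ["white"]

print(all_black(observations), exists_black_side(observations))  # True True
print(all_black(later), exists_black_side(later))  # False True
```

The universal claim flips as soon as one white sheep is observed, while the existential claim, once confirmed, stays true no matter what is observed afterwards.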

This leads to a few follow-up questions we need to ask about test design.


Test Precision

Imagine that our test setup to verify the above statements looked like this:

  1. Go to the travel agency.
  2. Book a flight to Zurich.
  3. Fly to Zurich.
  4. Take a taxi to the countryside.
  5. Get out at a meadow.
  6. Walk to the nearest sheep.
  7. Inspect the sheep's fur.
  8. If the sheep's fur color is black, then Swiss sheep are black.

Aside from the fact that after running this test you might be stranded in the wintry Alps at midnight, wearing nothing but your pyjamas and a couple hundred euros poorer, this test could go wrong in so many ways.

For example, your travel agency might be closed. Or you could have insufficient funds to book a flight. You could have forgotten your passport and not be allowed to leave the airport, you could end up at a meadow that has cows and not sheep, and the inspection of the sheep's fur might yield fleas. Which of these has anything to do with whether Swiss sheep are black?

We see this very often in "black box tests" as well:

  • it's very unclear what we're actually trying to test
  • just because the test failed, that doesn't mean that the thing we wanted to know is untrue
  • there's a huge overhead cost associated with validating our hypothesis
  • we don't return to a "clean slate" after the test
  • success doesn't provide sufficient evidence to verify the test hypothesis

Occam's Razor

In the 14th century, a Franciscan friar by the name of William of Ockham came up with what's known today as Occam's Razor, commonly phrased as "entities should not be multiplied without necessity". Applied to our test above, that would mean that every test step that has nothing to do with sheep or fur color should be removed from the design.

We increase the probability of running this test successfully by isolating the test object (the sheep's fur) as far as possible and eliminating all variability that isn't directly associated with the hypothesis itself.
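
As a rough sketch of what this isolation looks like in code, compare a journey with several unrelated failure modes to a test that hands the test object straight to the step under test. All names here (`trip_to_meadow`, `inspect_fur`, the ticket price) are invented for illustration.

```python
def inspect_fur(sheep):
    """The only step that has anything to do with the hypothesis."""
    return sheep["fur"]


def trip_to_meadow(agency_open, funds_eur, has_passport):
    """Steps 1 to 6 of the trip: every check can fail for reasons unrelated to fur color."""
    if not agency_open:
        raise RuntimeError("travel agency closed")
    if funds_eur < 300:  # hypothetical ticket price
        raise RuntimeError("insufficient funds")
    if not has_passport:
        raise RuntimeError("no passport")
    return {"fur": "black"}  # the sheep we finally reach


# Isolated test: no agency, no flight, no taxi. Construct the test object directly.
assert inspect_fur({"fur": "black"}) == "black"
assert inspect_fur({"fur": "white"}) == "white"
```

If the long version fails with "travel agency closed", we have learned nothing about sheep. The isolated version can only fail for a reason that concerns the fur inspection itself.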


Verifiability

We verify a hypothesis by finding a counter-example to its logical opposite, called the "alternate hypothesis" here: if the alternate hypothesis is untrue, then the hypothesis we care about, called the "null hypothesis" here, is true. Unfortunately, this means we have to think in reverse.

In our example: To prove that all sheep are black, we have to sample all sheep. That's difficult. It's much easier to find one non-black sheep and thereby falsify the claim that all sheep are black. If we fail to produce even one despite a thorough search, we accept that all sheep are black until a counter-example turns up.
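
A minimal sketch of such a counter-example search, assuming a hypothetical helper `find_counterexample` and a flock represented as a list of fur colors:

```python
from typing import Callable, Iterable, Optional, TypeVar

T = TypeVar("T")


def find_counterexample(items: Iterable[T], claim: Callable[[T], bool]) -> Optional[T]:
    """Return the first item for which the claim fails, or None if none is found."""
    for item in items:
        if not claim(item):
            return item  # one counter-example is enough, so stop searching
    return None


def is_black(color: str) -> bool:
    return color == "black"


flock = ["black", "black", "white", "black"]  # made-up sample
print(find_counterexample(flock, is_black))  # white
print(find_counterexample(["black", "black"], is_black))  # None
```

Note that `None` only means that this particular sample did not refute the claim. It amounts to proof only if the sample covers every sheep.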


Repeatability and Reproducibility

A proper test is a systematic, preferably repeatable and reproducible, way of verifying our hypothesis. That means it should be highly predictable in its outcome, and we should be able to run it as often as we want.

Again, going back to our example, if we design our test like this:

  1. Take a look at a Swiss sheep.
  2. If it is white, then sheep are not black.
  3. If sheep are not "not black", then sheep are black.

This is terrible test design, because of some obvious flaws in each step: 

  1. The setup is an uncontrolled random sample. Since sheep are white or black, running this test on an unknown setup means we haven't ruled out anything if we happen to pick a black sheep.
  2. The alternate hypothesis is incomplete: a brown sheep is not black either, but because it is also not white, step 2 never triggers.
  3. Since step 2 didn't trigger for the brown sheep, we would conclude that brown = black.

Since "take a look at a Swiss sheep" is part of the test design itself, we can get a different outcome each time we repeat the test, and we can't reproduce anything either: if I run this test, my outcome may differ from yours.
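
The flaws become visible when the three steps are written down as code. In this sketch, `flawed_verdict` is a made-up name, and the sheep that happens to be sampled in step 1 is passed in as a parameter so that every possible draw can be shown side by side.

```python
def flawed_verdict(sampled_color):
    """Steps 2 and 3 of the flawed test, applied to whatever sheep step 1 produced."""
    if sampled_color == "white":  # step 2: only white counts as "not black"
        return "sheep are not black"
    return "sheep are black"  # step 3: everything else is treated as black


# Step 1 is uncontrolled, so any of these sheep might be the one we look at.
possible_samples = ["white", "black", "brown"]
verdicts = {color: flawed_verdict(color) for color in possible_samples}

for color, verdict in verdicts.items():
    print(color, "->", verdict)
# white -> sheep are not black
# black -> sheep are black
# brown -> sheep are black
```

The verdict depends entirely on which sheep was drawn, and a brown sheep is silently counted as evidence for "sheep are black".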


Reproducibility

A reproducibility problem occurs when different people running the same setup can reach different results. In our example, "take a look at" could be fixed, assuming we take the mathematician's advice, by rephrasing step 1 to: "Look at a Swiss sheep from all angles." This would lead everyone to reach the same conclusion.

We might also have to define what we call "white" and "black", or whether we would classify "brown" as a kind of white.

We increase reproducibility by being precise about how we examine the object under test, and about which statements we want to make about it.
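
One way to pin these definitions down is to write them into the test itself. The following sketch assumes a simplified sheep with just two sides and a convention, chosen here purely for illustration, that only "black" counts as black.

```python
from dataclasses import dataclass

# Assumed convention for this example: brown is explicitly NOT a kind of black.
BLACK_COLORS = {"black"}


@dataclass(frozen=True)
class Sheep:
    left_side: str
    right_side: str


def is_black(sheep: Sheep) -> bool:
    """Look at the sheep from all angles: every side must be black."""
    return all(side in BLACK_COLORS for side in (sheep.left_side, sheep.right_side))


print(is_black(Sheep("black", "black")))  # True
print(is_black(Sheep("black", "white")))  # False
print(is_black(Sheep("brown", "brown")))  # False
```

With the inspection procedure and the color classification spelled out, two people examining the same sheep can no longer disagree about the result.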


Repeatability

Depending on the purpose of our test, we do well to remove the variation in our object under test. So, if our test objective is to prove or falsify that all sheep are black, we can set up a highly repeatable, highly reproducible test like this:

  1. Get a white Swiss sheep.
  2. Identify the color of the sheep.
  3. If it is not black, then the statement that Swiss sheep are black is false.

This experiment setup is going to produce the same outcome for everyone conducting the test anywhere across the globe, at any point in time.

While there is a risk that we fail in step 1 (if we can't get hold of a Swiss sheep), we could substitute the test object with a picture of a Swiss sheep without affecting the validity of the test itself.
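
In software terms, the picture of the sheep is a fixed test fixture. A minimal sketch, with a made-up fixture and function name:

```python
# Fixed, known test object (hypothetical fixture). It plays the role of the picture.
WHITE_SWISS_SHEEP = {"origin": "Switzerland", "fur": "white"}


def refutes_all_swiss_sheep_are_black(sheep: dict) -> bool:
    """Steps 2 and 3: a Swiss sheep that is not black refutes the claim."""
    return sheep["origin"] == "Switzerland" and sheep["fur"] != "black"


# Repeating the test never changes the verdict, because nothing in it varies.
verdicts = [refutes_all_swiss_sheep_are_black(WHITE_SWISS_SHEEP) for _ in range(5)]
print(verdicts)  # [True, True, True, True, True]
```

Because the input is known in advance, the expected outcome is known in advance too, which is what makes the test both repeatable and reproducible.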


What is testability?

A good test setup has:
  • a verifiable test hypothesis
  • a well-defined test object
  • a precise set of test instructions
  • absolutely minimized test complexity
  • high repeatability and reproducibility
When all these are given, we can say that we have good testability. The more compromises we need to make in any direction, the worse our testability gets. 

A product that has high testability allows us to formulate and verify any relevant test hypothesis with minimal effort.

A product with poor testability makes it difficult to formulate or verify a test hypothesis. This difficulty might show up as any or all of the following:
  • higher complexity
  • higher effort
  • higher cost
  • longer duration
  • lower validity
  • higher uncertainty


In conclusion

The more often you want to test a hypothesis, the more valuable high testability becomes.
With increasing change frequency, the need to re-verify a formerly true hypothesis also increases. 

Design your product from day 1 to be highly testable.
By the time you discover that a product's testability is unsustainably low, it's often extremely expensive to notch it up to the level where you need it.

5 comments:

  1. If I am not completely misunderstanding your reasoning, in the paragraph "Repeatability", step 1 should read "Get a Swiss sheep" and not "Get a white Swiss sheep", correct?

    1. That's a good question.
      We choose a white sheep on purpose, so as to have a 100% falsified alternate hypothesis.
      If we selected a random Swiss sheep, we could by accident select a black sheep, and we still wouldn't know whether Swiss sheep are always black.

      There's the old math joke, "All odd numbers are prime. Proof: 3,5,7,11,13,17 q.e.d."
      If we set up the test like this:
      1. Pick the number 9.
      2. If it isn't prime, then the claim "all odd numbers are prime" is false.

      9 is "a white Swiss sheep".

    2. What if you cannot get white sheep? It still doesn't mean there is no white sheep in Switzerland.

  2. What if you cannot get white sheep in Switzerland? It still doesn't mean that there is no white sheep in Switzerland.

  3. Technically yes, although context matters a lot.
    If we made this analogous to a software test:
    If the statement "SELECT sheep FROM Switzerland WHERE fur = 'White' LIMIT 1;" returns nothing, then that might have three obvious reasons:
    1. The database connection failed.
    2. The database is faulty.
    3. There are no white sheep.

    Of course, we still need to rule out 1 and 2, which is a straightforward process.
    But it's different if we have a frontend application with server-side rendering, in which case there are additional reasons why we don't see white sheep:

    1. The backend didn't process the response.
    2. The backend processed the response wrongly.
    3. The backend didn't display the result correctly.
    4. The backend didn't convey the result correctly to the frontend.
    5. The frontend doesn't display correctly.
    6. There might be a filter setting somewhere that hides the result.
    ... and these are just the obvious things off the top of my head.
