Sunday, June 17, 2018

Test Pyramid Explained - part 1

Let's take a deeper look at what the Test Pyramid is and how it can help us achieve sustainable, high quality. In this part, we will look at the left side of the picture only, as understanding this portion is essential to making sense of the right side.


Yet another model of the "Test Pyramid" - there's more to it than meets the eye!

The five levels

Before we get into the "How", we will examine the "What" - the five different levels, starting from top to bottom. Why top-down? Because this is how the business looks at software.

Process Chain Tests

A process chain is a description of a user-centric feature (oftentimes, a user story), irrespective of where it is implemented. From a high level, a customer might phrase it as something like "I want my order shipped home."
Such a process chain may consist of many technical features realized across a number of subsystems, some of which may not even be software. In our example, the process chain might look like this:

  1.  User presses "Purchase" (online shop)
  2.  User makes payment (payment provider)
  3.  Order gets sent to warehouse for picking (warehouse system)
  4.  Order is picked (picker's device + warehouse system)
  5.  Package is sent for shipment (logistics + logistics payment system)
  6.  Package is shipped (logistics + logistics tracking system)
  7.  Package arrives (logistics tracking system)
  8.  Order is closed (online shop)


As we can see from this example, testing a process chain is incredibly complex, as each system and activity has a chance to fail. The number of potential failure scenarios is nearly infinite - regardless of how many we cover, there might still be another.

The good news is that if a process chain works, it's a guarantee that all subsystems and steps worked.
At the same time, the bad news is that if the process chain doesn't work, we may need to do a lot of backtracking to discover where the failure was introduced into the system.

Regardless of how much we test elsewhere - it might just be a good idea to do at least one supervised process chain test before "going live" with a complex system. That is, if we can afford it. Many organizations might simply resort to monitoring a live system's process chain in a "friendly user pilot phase".
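
If we do monitor a live chain like this, the supervision itself can be partly automated. Here's a minimal sketch in Groovy, assuming a hypothetical OrderTrackingClient and made-up stage names - not a real API:

import java.util.concurrent.TimeUnit

// Hypothetical sketch: supervise one real order as it passes the chain's checkpoints.
// OrderTrackingClient and the stage names are illustrative assumptions.
def client = new OrderTrackingClient("https://tracking.example.com")
def orderId = "4711"
def stages = ["PURCHASED", "PAID", "PICKED", "SHIPPED", "ARRIVED", "CLOSED"]

stages.each { stage ->
    // Process chains are slow by nature - allow up to a day per checkpoint.
    def reached = client.waitForStage(orderId, stage, 24, TimeUnit.HOURS)
    assert reached : "Order ${orderId} never reached stage ${stage}"
    println "Checkpoint passed: ${stage}"
}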

Duration
A process chain test might take anywhere from a few minutes to many weeks to complete. As a rule of thumb, lacking any further information, an hour to a day might be a solid guess for the execution time of such a test. This explains why we don't want hundreds of them.

System Tests

Slightly simpler than process chain tests are the commonplace system tests: the system is considered an inseparable unit - oftentimes, a "black box".

A system test is concerned with the activities and data transfers from the time data enters a system until the sub-process within that system is closed. Returning to our example above, a system test of the Online Shop might look like this:

  1.  User presses "Purchase" (Webshop)
  2.  User's order data is persisted as "Payment Pending" (Database)
  3.  User is redirected to payment section (External Payment service)
  4.  Payment is authorized (External Payment service)
  5.  Payment authorization ID is persisted (Database)
  6.  Order Status is set to "Payment Complete" (Database)
  7.  User is redirected to "Thank you" page (Webshop)
  8.  Order is forwarded to Warehouse system
  9.  Warehouse System sends Order Acknowledged message
  10.  Order Status is set to "In Process" (Database)
Here we see that system tests, despite having a much smaller scope than a process chain, are still nearly as difficult to automate and stabilize.

Oddly enough, many so-called "test factories" test at this level, creating complex automation scripts - oftentimes based on tools such as Selenium IDE - as this is seen as a feasible way to automate tests with little effort.
The downside of automating system tests is that even a minor change in the test setup will invalidate the test - in our example, if the "Thank you" page is replaced with a modal stating "Your order has been completed.", we might have to scrap the entire test (depending on how poorly it was written).
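
To illustrate the fragility, here's a minimal sketch of such a UI-driven system test using Selenium's Java API from Groovy - the URL and element ID are made up for illustration:

import org.openqa.selenium.By
import org.openqa.selenium.chrome.ChromeDriver

// Fragile system test sketch - the URL and element ID are illustrative assumptions.
def driver = new ChromeDriver()
try {
    driver.get("https://shop.example.com/checkout")
    driver.findElement(By.id("purchase-button")).click()

    // This assertion dies the moment the "Thank you" page becomes a modal:
    assert driver.pageSource.contains("Thank you")
} finally {
    driver.quit()
}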

I have seen entire teams spend major portions of their time figuring out why system tests failed - and keeping up with all those feature changes invalidating the tests.

Duration
System tests shouldn't take all too long, but 5-15 minutes for a single automated test case isn't unheard of. Fast system tests might finish in as little as ten seconds.

Integration Tests

Integration tests are intended to check the I/O of a system's components, usually ignoring both the larger-scope process chain and the lower-level technical details.

An integration test assumes that the preceding steps in the source system worked - the focus is on the system's entry and exit points, considering the internal logic as a black box.

In our webshop payment example, we might consider the following autonomous integration tests:

  1. When a user presses "Purchase", all items from the basket are stored in the database (UI -> Backend)
  2. When a user is forwarded to the Payment Website, the total purchase price is correctly transferred to the payment service (Backend -> payment system)
  3. When a payment is successfully completed, the payment data is correctly stored (payment system -> Backend)
  4. When an order is correctly paid, it is forwarded to the warehouse system (Backend -> warehouse system)
  5. The Warehouse system's order acknowledgement is correctly processed (warehouse system -> Backend)
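
To make this more tangible, here's a sketch of what integration test #2 might look like in Spock - OrderBackend and PaymentSandboxClient are hypothetical stand-ins for the real backend and the provider's sandbox:

import spock.lang.Specification

// Sketch of integration test #2 above; all component names are assumptions.
class PaymentHandoffIntegrationSpec extends Specification {

    def backend = new OrderBackend()
    def paymentSandbox = new PaymentSandboxClient("https://sandbox.payment.example.com")

    def "total purchase price is transferred to the payment service"() {
        given: "an order worth 42.50 in the backend"
        def orderId = backend.createOrder(42.50)

        when: "the backend hands the user over to the payment website"
        backend.startPayment(orderId)

        then: "the sandbox received a payment request over the correct amount"
        paymentSandbox.lastRequestFor(orderId).amount == 42.50
    }
}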

Integration tests are much smaller than system tests, and the root cause of failure is much easier to isolate.
The biggest downside of integration tests is that they rely on the availability and responsiveness of the partner system. If a partner system happens to be unavailable for any reason, integration tests cannot be run.
I've seen this break the back of one webshop's test suite, which relied on a global payment provider's sandbox that failed to answer during business hours because it was constantly bombarded by thousands of clients.

Duration
Integration tests don't do all that much by themselves; their bottleneck is the response time of the two systems involved. Good integration tests shouldn't take more than about 50ms, while poor integration tests might take a few seconds.

A good way to speed up integration tests massively is to mock slow or unreliable partner systems - though this adds complexity to the component's test suite.
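
As a sketch of what that might look like with Spock's built-in mocking - PaymentService, AuthResult and OrderBackend are hypothetical stand-ins:

import spock.lang.Specification

// Sketch: replacing the unreliable payment sandbox with a Spock mock.
// PaymentService, AuthResult and OrderBackend are illustrative assumptions.
class PaymentMockSpec extends Specification {

    def "order is marked paid when the payment provider authorizes"() {
        given: "a backend wired against a mocked payment service"
        def paymentService = Mock(PaymentService)
        def backend = new OrderBackend(paymentService)

        when:
        backend.startPayment("order-4711")

        then: "exactly one authorization call, answered instantly and reliably"
        1 * paymentService.authorize("order-4711") >> new AuthResult(approved: true)
        backend.statusOf("order-4711") == "Payment Complete"
    }
}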

Feature & Contract Tests

This group contains two types of testing at once, as they go hand in hand: feature tests verify the internal logic of how a system processes data, while contract tests validate how data enters and exits the system.

Here's an example of a feature test:

import spock.lang.Specification

// BasketPojo is the (hypothetical) class under test.
class BasketValidationResponseSpec extends Specification {

    def "information given to customer"() {
        expect:
        basket.statusMessage() == message
        basket.checkState() == status

        where:
        basket                                       || message           | status
        new BasketPojo(bread: 1, butter: 1, book: 1) || "Valid"           | true
        new BasketPojo()                             || "Empty basket"    | false
        new BasketPojo(bread: 199)                   || "Too much Bread"  | false
        new BasketPojo(bread: 1, butter: 199)        || "Too much Butter" | false
    }
}


Feature tests don't rely on any external interfaces being available, making them both reliable and fast to execute. Unlike unit tests (below), they don't test the details of individual functional units, but focus on the interaction of multiple functional units.

Contract tests are the flip side of the coin here, as a feature test assumes that the data is both provided in the right way and returned in a way that the interfacing component can correctly process. In an ever-changing software world, these assumptions are often untrue - contracts help create some reliability here. I don't want to go into this topic too deeply, as contracts are an entire field of their own.
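
Still, to give a flavor, here is a minimal, hand-rolled contract check - dedicated tools such as Pact do this far more thoroughly, and the message fields here are illustrative assumptions:

import groovy.json.JsonSlurper
import spock.lang.Specification

// Minimal hand-rolled contract check; the field names are assumptions.
class WarehouseAckContractSpec extends Specification {

    def "warehouse acknowledgement still matches the agreed contract"() {
        given: "a recorded sample acknowledgement message"
        def ack = new JsonSlurper().parseText(
                '{"orderId": "4711", "status": "ACKNOWLEDGED", "eta": "2018-06-20"}')

        expect: "every field our backend depends on is present and shaped as expected"
        ack.orderId instanceof String
        ack.status in ["ACKNOWLEDGED", "REJECTED"]
        ack.eta ==~ /\d{4}-\d{2}-\d{2}/
    }
}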

Duration
The good news is that good feature and contract tests execute in as little as 20ms, making them both incredibly fast and reliable.


Unit tests

The bread and butter of software development are unit tests. They test single functional units in isolation, and there should be no relevant system functionality that isn't covered by a test explaining how that functionality works.
The purpose of unit tests isn't so much to create user-comprehensible test feedback as to ensure that the code is comprehensible and workable - even when refactored.

Unit tests will ensure your code is loosely coupled, that each method doesn't do too many things (ideally: one purpose per method), and that inadvertent design errors are caught quickly - among many other things that help developers.
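
For contrast with the feature test above, a minimal unit test sketch - PriceCalculator is a hypothetical class with no collaborators and no I/O:

import spock.lang.Specification

// Unit test sketch: one functional unit in complete isolation.
// PriceCalculator is an illustrative assumption, not a real class.
class PriceCalculatorSpec extends Specification {

    def "gross price includes VAT, rounded to full cents"() {
        given:
        def calculator = new PriceCalculator(vatRate: 0.19)

        expect:
        calculator.gross(10.00) == 11.90
        calculator.gross(0.99) == 1.18   // 0.99 * 1.19 = 1.1781, rounded
    }
}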

While well-designed feature tests answer the question of "why" a piece of code exists, unit tests define "how" it is implemented. Separating the two doesn't always make sense - the boundary can be fluid. The main difference is that a unit test never relies on anything external to the object under test, whereas a feature test might rely on full object instantiation.
Their main "downside" is that their lifetime is coupled to the functionality they test - whenever the functionality gets adjusted, the unit test either has to stay valid or needs to be modified.

Duration
Unit tests are extremely fast. There are even tools that execute the unit tests of modified code in the background while the developer is still typing. The limiting factors here are pretty much CPU speed and RAM: executing an entire project's unit test suite shouldn't take more than a minute (excluding the IDE's ramp-up time); otherwise, you're probably doing something wrong.



Given these definitions, let's do a brief...


Summary


Test Type            Duration   Accuracy   Durability
Process Chain        1h+        Very Low   Very Low
System               1-15min    Very Low   Very Low
Integration          50ms+      Low        Low
Feature & Contract   10-20ms    High       High
Unit                 < 10ms     High       N/A

If you ask me whether there's any sane reason to test at the higher levels of the pyramid, I'd answer: "Hardly - it's too slow, too expensive and too unreliable." At the same time, there are reasons to test high in the pyramid, including coarse-grained business feasibility testing, a lack of lower-level automation and/or a lack of developer skills.

In the next article of the series, I will explain the right side of the image - the testing metrics - in more detail.


