Thursday, December 10, 2020

Test Cocooning

 "How do you deal with Legacy code that lacks test coverage?" - even miniscule small changes are hazardous, and often, a necessary rewrite is postponed forever because it's such a nightmare to work with. Even if you have invested time into your test coverage after taking over the system, chances are there are still parts of the system you need to deal with that aren't covered at all. So this is what I propose in this situation:


Test Cocooning is a reversed TDD cycle, and it should be common sense.


The Cocooning process

Test Cocooning is a pretty straightforward exercise (a minimal code sketch follows the steps):
  1. Based on what you think the code does, you create a cocooning test.
    1. If the test fails, you didn't understand the code correctly and you have to improve your test.
    2. If the test passes, you have covered a section of the code with a test that ensures you don't accidentally break the tested aspect of the code.
  2. Based on what you think the code does, you make a breaking change.
    1. If the test fails in the way you thought it would, you have a correct understanding of that piece of code.
    2. If the test passes, you didn't understand the code correctly and you have to improve your test (back to step 1)
    3. Intermediate activity: Of course, you revert the change to restore the behaviour that you have covered with the test.
  3. Within the scope of your passing test, you begin to improve:
    1. Create lower-level tests that deal with more specifics of the tested code (e.g. unit tests.)
    2. Refactor based on the continuous and recurrent execution of all the relevant tests.
    3. Refactor your tests as well.
  4. Re-run the original cocooning test to ensure you didn't mess up anywhere!
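
To make step 1 concrete, here is a minimal sketch in Python (pytest). The module legacy_pricing and the function calculate_rebate are hypothetical stand-ins for whatever legacy segment you're cocooning - the point is that the test asserts what the code currently does, not what it should do:

    # test_cocoon_rebate.py - a cocooning (characterization) test
    from legacy_pricing import calculate_rebate  # hypothetical legacy module

    def test_cocoon_rebate_for_bulk_orders():
        # Pins down the behaviour we believe the code has (step 1).
        assert calculate_rebate(quantity=100, unit_price=10.0) == 50.0

    def test_cocoon_rebate_for_small_orders():
        # As far as we understand, small orders currently get no rebate.
        assert calculate_rebate(quantity=1, unit_price=10.0) == 0.0

For step 2, a breaking change can be as crude as temporarily hard-coding a different rebate rate inside calculate_rebate and confirming that exactly these tests fail - then reverting.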

Once a cocooning cycle is completed, you should have reworked a small section of your Legacy code into Clean(er) Code that is easier to change.


Iterating

You may need to complete multiple cocooning cycles until you have sufficient confidence that you can work on the code reliably.


Backtracking

The important secret of successful Test Cocooning is that you need to backtrack both on the code and your tests - after completing all relevant cocooning cycles, you'll need to re-run:

  • your cocooning tests against the original legacy code. 
  • your unrefactored original cocooning tests against the new code.
  • your unrefactored original cocooning tests against the original legacy code.
Together with the run you're doing anyway - the refactored tests against the new code - these three runs complete all four combinations of {original, refactored} tests against {original, new} code. Yes, that's painful and a lot of overhead, but it's your best bet in the face of dangerous, unworkable code, and believe me - it's a lot less painful than the nasty bugs that slip through when you skip any of these runs.


Working Cocooned code

Once you have your test cocoon, you can work the cocooned code - only within the scope of your cocoon - to fix bugs and to build new features.

Bugfixes

Fixing bugs relies on making a controlled breach in your cocoon.
Metaphorically speaking, you need to be like a spider that caught a bug and sucks it dry before discarding the woven husk. A code sketch follows the numbered steps below.
  1. Create a cocooning test for the current behaviour which passes under the current, faulty(!) conditions of the code segment - i.e. a test that exactly reproduces the bug as though it were desired behaviour.
  2. Create a test which fails due to the bug, i.e. add a second test that exactly reverses the cocooned behaviour.
  3. Write the code that meets the requirement of the failing test.
    1. As a consequence, the cocooned passing test for the bug should now fail.
    2. Ensure that no other tests have failed.
    3. If another test has failed, ensure that this is intentional.
  4. Eliminate the broken cocoon test that reproduces the bug's behaviour.
    1. If there were other tests that failed, now is the time to modify these tests one by one.
  5. Backtrack as described above to ensure that nothing slipped.
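
To illustrate steps 1-3, a minimal sketch in Python (pytest). legacy_parser and parse_amount are hypothetical names; assume the bug is that negative amounts lose their sign:

    from legacy_parser import parse_amount  # hypothetical legacy module

    # Step 1: cocoon the faulty behaviour - this passes on the buggy code.
    def test_cocoon_negative_amounts_lose_their_sign():
        # Documents the bug as though it were desired behaviour.
        assert parse_amount("-12.50") == 12.50

    # Step 2: the reversing test - this fails until the bug is fixed.
    def test_negative_amounts_keep_their_sign():
        assert parse_amount("-12.50") == -12.50

    # Step 3: fix parse_amount() so the reversing test passes. The cocoon
    # test above now fails as expected and is deleted in step 4.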

Modifying features

Modifying existing behaviour should be treated exactly like a bugfix.

New functionality

If you plan to add new functionality to Legacy Code, your best bet is to develop this code in isolation from the cocooned legacy and only communicate via interfaces, ensuring that the cocoon doesn't break.
When you really need to invoke new code from the Legacy, treat the modification like a bugfix.
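
As a sketch of that isolation (all names hypothetical): let the new code implement a small interface, and let the legacy depend only on that interface:

    from abc import ABC, abstractmethod

    class RecommendationProvider(ABC):
        """Seam between legacy and new code - the legacy sees only this."""
        @abstractmethod
        def recommend(self, user_id: str) -> list:
            ...

    class NewRecommendationEngine(RecommendationProvider):
        # Developed and tested in isolation from the cocooned legacy.
        def recommend(self, user_id: str) -> list:
            return []  # new functionality goes here

    # The single call site added inside the legacy is the only cocoon
    # breach - treat that one modification like a bugfix.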

Rewrite

A rewrite should keep the cocoon intact. Don't cling to any of the Legacy code and consider your cocooning efforts "sunk cost" - otherwise, you risk reproducing the same mess with new statements.



Closing remarks

  1. I believe that test cocooning requires both strong test and development expertise, so if you have different specialists on your team, I would highly recommend building the cocoon in a pairing session.
  2. Cocoon tests are often inefficient and perform poorly. You do not need to add these tests to your CI/CD pipeline. What you must add to your pipeline are the lower-level tests that replicate the unit behaviour of the cocoon. It's totally sufficient to rerun the cocoon tests when you work on the cocooned Legacy segment (a marker-based setup is sketched after this list).
  3. Cocooning is a workaround for low-quality code. When time permits, rewrite it with Clean Code and you can discard the cocoon along with the deleted code.
  4. Do not work on Legacy Code without a solid Cocoon. The risk outweighs the effort.
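
Regarding remark 2: one lightweight way to keep the slow cocoon tests out of the pipeline is a test marker. A sketch with pytest - the marker name "cocoon" is my own convention and needs to be registered in pytest.ini:

    import pytest

    @pytest.mark.cocoon  # register via pytest.ini: markers = cocoon: ...
    def test_cocoon_full_rebate_calculation():
        ...  # slow, broad characterization test

    # The CI/CD pipeline runs only the fast, lower-level tests:
    #     pytest -m "not cocoon"
    # When working on the cocooned Legacy segment, run everything:
    #     pytest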

Friday, December 4, 2020

Test Coverage Matrix

Whether you're transitioning towards agile ways of working on a Legacy platform or intend to step up your testing game for a larger system developed in an agile fashion, at some point it's worth setting up a Coverage Matrix to see where it pays to invest effort - and where it doesn't.



Before we start

First things first: the purpose of an agile Coverage Matrix isn't the same as a traditional project-style coverage matrix that's mostly concerned with getting the next release shipped. I don't intend to introduce a mechanism that adds significant overhead with little value, but to give you a means of starting the right discussions at the right time and to help you think in a specific direction. Caveat emptor: It's up to you to figure out how far you want to go down each of the rabbit holes. "Start really simple and incrementally improve" is good advice here!

What I'm proposing in this article will sound familiar to the apt Six Sigma practitioner as a simplified modification of the method, "Quality Function Deployment." And that's no coincidence.


Coverage Characteristics

Based on ISO/IEC 9126, quality characteristics can be grouped into Functionality, Reliability, Usability, Efficiency, Maintainability and Portability. These definitely provide good guidance.

To simplify matters, I like to start the initial discussion by labelling the columns of the matrix:

  • Functionality ("Happy Cases")
  • Reliability ("Unhappy Cases")
  • Integration
  • Performance
  • Compliance
  • UX
Of course, we can clarify a lot more on what each of these areas means, but let's provide some leeway for the first round of discussion here. The most important thing is that everyone in the room has an aligned understanding of what these are supposed to mean.
If you are in the mood for some over-engineering, add subcategories for each coverage characteristic, such as splitting Performance into "efficiency", "speed", "scalability", "stress resilience" etc. That will bloat up the matrix and may make it more appropriate to flip its rows and columns.

Test Areas

Test areas fall into multiple categories, which correlate with the "Automation Test Pyramid".

  • User journeys
  • Data flow
  • Architectural structure
  • Code behaviour
There are other kinds of test areas, such as validation of learning hypotheses around value and human behaviour, but let's ignore these here. Let's make a strong assumption that we know what "the right thing" is, and we just want to test that "we have things right." Otherwise, we'd open a can of worms here. You're free to also cover these, adding the respective complexity.


Functional areas

In each test area, you will find different functional areas, which strongly depend on what your product looks like.

User journeys

There are different user journeys with different touchpoints through which your user interacts with your product.

For example, a simple video player app might have one user flow for free-to-play, another for registration, another for premium top-up, and another for GDPR-compliant deregistration, as well as various flows such as "continue to watch my last video" or "download for offline viewing". These flows don't care what happens technically.


Data flow

Take a look at how the data flows through your system as certain processes get executed. Every technical flow should be consistent end-to-end.

For example, when you buy a product online, the user just presses "Purchase", and a few milliseconds later, they get a message like "Thank you for your order." The magic that happens in between is make-or-break for your product, but entirely irrelevant for the user. In our example, that might mean that the system needs to make a purchase reservation, validate the user's identity and their payment information, conduct a payment transaction, turn the reservation into an order, ensure the order gets fulfilled etc. If a single step in this flow breaks, the outcome could be an economic disaster. Such tests can become a nightmare in microservice environments where the flows were never mapped out.
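
Such an end-to-end consistency check can be written as a single test over the intermediate states. A sketch in Python, where shop, payments and fulfilment are assumed test doubles (e.g. pytest fixtures) for the services involved:

    # Hypothetical data-flow test for the purchase example above.
    def test_purchase_data_flow(shop, payments, fulfilment):
        order_id = shop.purchase(user="alice", product="premium", amount=9.99)

        # Each intermediate step of the flow must be consistent end-to-end:
        assert shop.reservation_exists(order_id)
        assert payments.transaction_settled(order_id, amount=9.99)
        assert shop.order_confirmed(order_id)
        assert fulfilment.is_scheduled(order_id)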


Architectural structure

Similar to technical flow, there are multiple ways in which a transaction can occur: it can happen inside one component (e.g. frontend rendering), it can span a group of components (e.g. frontend / backend / database) or even a cluster (e.g. billing service, payment service, fulfilment service) and in the worst case, multiple ecosystems consisting of multiple services spanning multiple enterprises (e.g. Google Account, Amazon Fulfilment, Salesforce CRM, Tableau Analytics).

In architectural flow, you could list the components and their key partner interfaces. For example:

  • User Management
    • CRM API
    • DWH API
  • Payment 
    • Order API
    • Billing API

Architectural flow is important in the sense that you need to ensure that all relevant product components and their interactions are covered.

You can simplify this by first listing the relevant architectural components, and only drilling down further if you have identified a relevant hotspot.


Code behaviour

At the lowest level is always the unit test, and different components tend to have different levels of coverage - are you testing class coverage, line coverage, statement coverage, branch coverage - and what else? Clean Code? Suit yourself.
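
If you'd rather measure than guess, branch coverage can be collected directly. A sketch using the coverage.py API - in practice, the pytest-cov plugin achieves the same via pytest --cov --cov-branch; legacy_pricing is again a hypothetical module name:

    import coverage
    import pytest

    cov = coverage.Coverage(branch=True, source=["legacy_pricing"])
    cov.start()
    pytest.main(["tests/"])        # run the suite under measurement
    cov.stop()
    cov.save()
    cov.report(show_missing=True)  # lists uncovered lines and branch gaps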

Since you can't list every single behaviour of the code that you'd want to test for without turning a Coverage Matrix into a copy of your source code, you'll want to focus on stuff that really matters: do we think there's a need to do something?


Bringing the areas together

There are dependencies between the areas - you can't have a user journey without data flow, you won't have data flow without architectural structure, and you won't have architectural structure without code behaviour. Preferably, you don't need to test certain user journeys at all, because the data-flow and architecture tests already cover everything.

If you can relate the different areas with each other, you may learn that you're duplicating effort or missing key factors.


Section Weight

For each cell - row by row, column by column - assign a value for how important that topic is.

For example, take the user journey "Register new account." How important do you think it is to have the happy path automated? Do you think the negative case is also important? Does this have impact on other components, i.e. would the user get SSO capability across multiple products? Can you deal with 1000 simultaneous registrations? Is the process secure and GDPR-compliant? Are users happy with their experience?

You will quickly discover that certain rows and columns are "mission critical", so mark them in red. Others will turn out to be "entirely out of scope", such as testing UX on a backend service, so mark them gray. Others will be "basically relevant" (green) or "important" (yellow).

As a result, you end up with a color-coded matrix.

The key discussion that should happen here is whether the colors are appropriate. An entirely red matrix is as unfeasible as an entirely gray matrix.


A sample row: Mission critical, important, relevant and irrelevant



Reliability Level

As the fourth activity, focus on the red and yellow cells, take a look at how well you're doing in each area, and assign a number on a sliding scale from 0 to 10 with this rough guidance:

  • 0 - We're doing nothing, but know we must.
  • 3 - We know that we should do more here.
  • 5 - We've got this covered, but with gaps.
  • 7 - We're doing okay here.
  • 10 - We're using an optimized, aligned, standardized, sustainable approach here.

As a consequence, the red and yellow cells should look like this:

A sample matrix with four weighted fields.

As you would probably guess by now, the next step for discussion would be to look at the big picture and ask, "What do we do with that now?"


The Matrix

Row Aggregate

For each row, figure out the majority color in that row, and use that as the color of the row. Next, add up all the numbers. This will give you a total number for the row.

This gives you an indicator of which rows are most important to address - the ones in red, with the lowest numbers.

The Row Aggregate


Column Aggregate

You can use the same approach for the columns, and you will discover which test type is covered best. I would be amazed if Unhappy Path or Compliance didn't turn out to have poor coverage when you first do this exercise, but the real question is again: Which of the red columns has the lowest number?


The Column aggregate
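
The bookkeeping behind both aggregates is trivial to script. A sketch in Python, assuming each cell holds a (color, score) pair; the sample data is made up:

    from collections import Counter

    matrix = {
        "Register new account": {"Happy Cases": ("red", 7), "Unhappy Cases": ("red", 3),
                                 "Performance": ("yellow", 5)},
        "Premium top-up":       {"Happy Cases": ("red", 5), "Unhappy Cases": ("yellow", 2),
                                 "Performance": ("green", 6)},
    }

    def aggregate(cells):
        """Majority color plus score total for one row (or column)."""
        cells = list(cells)
        colors = Counter(color for color, _ in cells)
        return colors.most_common(1)[0][0], sum(score for _, score in cells)

    for name, row in matrix.items():
        print(name, aggregate(row.values()))
    # Columns work the same way - aggregate over the transposed cells.
    # The most urgent candidates are red rows/columns with the lowest totals.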



After conducting all the above activities, you should end up with a matrix that looks similar to this one:


A coverage matrix

Working with the Matrix

There is no "The right approach" to whether to work on improving coverage for test objects or test types - the intent is to start a discussion about "the next sensible thing to do," which totally depends on your specific context.  

As per our example, the question of "Should we discuss the badly covered topic Performance which isn't the most important thing, or should we cover the topic of architectural flow?" has no correct answer - you could end up with different groups of people working hand in hand to improve both of these, or you could focus on either one.



How-To Use

You can facilitate discussions with this matrix by inviting different groups of interest - business people, product people, architects, developers, testers - and starting a discussion on "Are we testing the right things right, and where or how could we improve most effectively?"

Modifications 

You can modify this matrix in whatever way you see fit: different categories for rows or columns, drill-in, drill-across - all are valid.

For example, you could have a look at only functional tests on user journeys and start listing the different journeys, or you could explicitly look at different types of approaching happy path tests (e.g., focusing on covering various suppliers, inputs, processing, outputs or consumers).

KISS 

This method looks super complicated if you list out all potential scenarios and all potential test types - you'd take months to set up the board without even having a conversation. Don't. First, identify the 3-4 most critical rows and columns, and take the conversation from there. Drill in only when necessary and only where it makes sense.




Tuesday, December 1, 2020

Refactor, Rewrite or Redesign?

Have you ever heard the term "Refactoring Sprint?" There is a lot of confusion around what Refactoring, Rewrite and Redesign are, what they entail, as well as how and when to use them - so here's a small infographic:



Refactoring

Refactoring is all about small changes to the way code has been written to reduce technical debt as the code and our understanding thereof grows. It's a continuous exercise and should always be an integral part of the work. Safe refactoring is a low-risk exercise that happens within the scope of existing unit tests, which should verify that indeed we haven't made any unwanted changes.

Examples of Refactoring are: Extracting a variable, renaming a function, consolidating statements or moving code from one place to another.
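
For illustration, here is an "extract variable" refactoring in Python - the behaviour is identical before and after, which the existing unit tests should confirm (the function and rates are made up):

    # Before:
    def gross_price(net, country):
        return net + net * (0.19 if country == "DE" else 0.20)

    # After - extracting variables to make the VAT logic explicit:
    def gross_price(net, country):
        vat_rate = 0.19 if country == "DE" else 0.20
        vat = net * vat_rate
        return net + vat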

It should go without saying that in professional development, Refactoring should never be a separate task and should never be done in big batches. Your Definition of Done should include that no essential refactoring is left to do.


Rewrite

We rewrite code when we discover that a current code segment is no longer the best way to achieve a specific purpose. Depending on your test coverage, rewrites are a high-risk exercise that lays the groundwork for future risk reduction.

Examples of Rewrites are new and better means (e.g. a new technology) to do the same thing, or stumbling upon a legacy that is difficult to work with (e.g. complex methods without tests).

Smaller rewrites (e.g. method rewrites) should be done autonomously as a regular part of the work when the need is discovered and when developers have sufficient confidence that they can do this safely.
Larger rewrites should be planned with the team, as this exercise could consume significant portions of time and may need additional attention to failure potential.


Redesign

A redesign must happen when the purpose of an existing component has changed in ways that make the current solution no longer the best possible one. Redesign is a high-risk and highly time-consuming exercise that should be done to get out of a corner. If you have highly malleable code that was well refactored on a continuous basis, redesign should hardly ever be necessary unless there are major changes to the business context in which a system is operated.

Examples of Redesign might include moving from batch processing to data streams, changing the database paradigm, or acquiring a new line of business that requires processing different data.

Redesign should always be a planned whole-team exercise that might even break down into a hierarchy of mid- and short-term goals. Redesign is a mix of technical and business decisions, so it should be aligned with Product.