Thursday, December 10, 2020

Test Cocooning

 "How do you deal with Legacy code that lacks test coverage?" - even miniscule small changes are hazardous, and often, a necessary rewrite is postponed forever because it's such a nightmare to work with. Even if you have invested time into your test coverage after taking over the system, chances are there are still parts of the system you need to deal with that aren't covered at all. So this is what I propose in this situation:


Test Cocooning is a reversed TDD cycle, and it should be common sense.


The Cocooning process

Test Cocooning is a pretty straightforward exercise: 
  1. Based on what you think the code does, you create a cocooning test.
    1. If the test fails, you didn't understand the code correctly and you have to improve your test.
    2. If the test passes, you have covered a section of the code with a test that ensures you don't accidentally break the tested aspect of the code.
  2. Based on what you think the code does, you make a breaking change.
    1. If the test fails in the way you thought it would, you have a correct understanding of that piece of code.
    2. If the test passes, you didn't understand the code correctly and you have to improve your test (back to step 1).
    3. Intermediate activity: Of course, you revert the change to restore the behaviour that you have covered with a test.
  3. Within the scope of your passing test, you begin to improve:
    1. Create lower-level tests that deal with more specifics of the tested code (e.g., unit tests).
    2. Refactor based on the continuous and recurrent execution of all the relevant tests.
    3. Refactor your tests as well.
  4. Re-run the original cocooning test to ensure you didn't mess up anywhere!

Once a cocooning cycle is completed, you should have reworked a small section of your Legacy code to be Clean(er) Code that is more workable for change.
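To make this concrete, here is a minimal sketch of what such a cocooning (characterization) test could look like, assuming a hypothetical legacy function calculate_shipping_fee - the expected values pin down what the code currently does, not what we think it should do:

  # Minimal cocooning (characterization) test - all names are hypothetical.
  # If this test fails, our understanding was wrong and the test, not the
  # code, gets improved.
  from legacy_shop import calculate_shipping_fee  # hypothetical legacy module

  def test_cocoon_shipping_fee_current_behaviour():
      # Observed behaviour of the legacy code, captured as-is:
      assert calculate_shipping_fee(order_total=50.0, country="DE") == 4.95
      assert calculate_shipping_fee(order_total=150.0, country="DE") == 0.0   # free above some threshold
      assert calculate_shipping_fee(order_total=100.0, country="DE") == 4.95  # boundary: currently not free

Step 2 of the cycle is then a deliberate breaking change (e.g. shifting the free-shipping threshold) to verify that this test fails for exactly the reason you expect - before reverting the change.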


Iterating

You may need to complete multiple cocooning cycles until you are sufficiently certain that you have workable code.


Backtracking

The important secret of successful Test Cocooning is that you need to backtrack both on the code and your tests - after completing all relevant cocooning cycles, you'll need to re-run:

  • your cocooning tests against the original legacy code. 
  • your unrefactored original cocooning tests against the new code.
  • your unrefactored original cocooning tests against the original legacy code.
Yes, that's painful and a lot of overhead, but it's your best bet in the face of dangerous, unworkable code - and believe me, it's a lot less painful than what you'll experience when nasty bugs slip through because you skipped any of these.


Working Cocooned code

Once you have your test cocoon, you can work the cocooned code - only within the scope of your cocoon - to fix bugs and to build new features.

Bugfixes

Fixing bugs relies on making a controlled breach in your cocoon - a sketch follows the steps below.
Metaphorically speaking, you need to be like a spider that has caught a bug and sucks it dry before discarding the woven husk.
  1. Create a cocooning test for the current behaviour: one that passes under the current, faulty(!) conditions of the code segment and exactly reproduces the bug as though it were desired behaviour.
  2. Create a test which fails due to the bug, i.e. add a second test that exactly reverses the cocooned behaviour.
  3. Write the code that meets the requirement of the failing test.
    1. As a consequence, the cocooned passing test for the bug should now fail.
    2. Ensure that no other tests have failed.
    3. If another test has failed, ensure that this is intentional.
  4. Eliminate the broken cocoon test that reproduces the bug's behaviour.
    1. If there were other tests that failed, now is the time to modify these tests one by one.
  5. Backtrack as described above to ensure that nothing slipped.
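A minimal sketch of steps 1 and 2, again assuming the hypothetical calculate_shipping_fee from above and an invented boundary bug (orders of exactly 100.00 are charged a fee although they should ship free):

  from legacy_shop import calculate_shipping_fee  # hypothetical legacy module

  def test_cocoon_boundary_order_charged_fee():
      # Step 1: passes today - it pins the current, faulty behaviour as-is.
      assert calculate_shipping_fee(order_total=100.0, country="DE") == 4.95

  def test_boundary_order_ships_free():
      # Step 2: fails today - it states the desired behaviour. Writing the fix
      # (step 3) makes this pass and the cocoon test above fail, which is then
      # deleted (step 4).
      assert calculate_shipping_fee(order_total=100.0, country="DE") == 0.0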

Modifying features

Modifying existing behaviour should be treated exactly like a bugfix.

New functionality

If you plan to add new functionality to Legacy Code, your best bet is to develop this code in isolation from the cocooned legacy and only communicate via interfaces, ensuring that the cocoon doesn't break.
When you really need to invoke new code from the Legacy, treat the modification like a bugfix.
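As a rough sketch of what "communicating via interfaces" can mean in code - all names here are invented for illustration:

  from dataclasses import dataclass

  @dataclass
  class LoyaltyResult:
      points_awarded: int

  class LoyaltyPoints:
      """New, fully unit-tested functionality - it imports nothing from the Legacy."""

      def award(self, order_total: float) -> LoyaltyResult:
          return LoyaltyResult(points_awarded=int(order_total))

  def legacy_checkout(order_total: float, loyalty: LoyaltyPoints) -> None:
      # ... existing legacy steps stay untouched ...
      # The single touch point: one call through the interface. This small
      # modification of the Legacy is treated like a bugfix (cocoon first).
      loyalty.award(order_total)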

Rewrite

A rewrite should keep the cocoon intact. Don't cling to any of the Legacy code, and consider your cocooning efforts a "sunk cost" - otherwise, you risk reproducing the same mess with new statements.



Closing remarks

  1. I believe that test cocooning requires both strong test and development expertise, so if you have different specialists on your team, I would highly recommend building the cocoon in a pair.
  2. Cocoon tests are often inefficient and have poor performance. You do not need to add these tests to your CI/CD pipeline. What you must add to your pipeline are the lower-level tests that replicate the unit behaviour of the cocoon. It's totally sufficient to rerun the cocoon tests when you work on the cocooned Legacy segment.
  3. Cocooning is a workaround for low-quality code. When time permits, rewrite it with Clean Code and you can discard the cocoon along with the deleted code.
  4. Do not work on Legacy Code without a solid Cocoon. The risk outweighs the effort.

Friday, December 4, 2020

Test Coverage Matrix

Whether you're transitioning towards agile ways of working on a Legacy platform or intend to step up your testing game for a larger system developed in an agile fashion, at some point, it pays to set up a Coverage Matrix to see where it pays to invest effort - and where it doesn't.



Before we start

First things first: the purpose of an agile Coverage Matrix isn't the same as a traditional project-style coverage matrix that's mostly concerned with getting the next release shipped. I don't intend to introduce a mechanism that adds significant overhead with little value, but to give you a means of starting the right discussions at the right time and to help you think in a specific direction. Caveat emptor: It's up to you to figure out how far you want to go down each of the rabbit holes. "Start really simple and incrementally improve" is good advice here!

What I'm proposing in this article will sound familiar to the adept Six Sigma practitioner as a simplified modification of the method "Quality Function Deployment." And that's no coincidence.


Coverage Characteristics

Based on ISO/IEC 9126, quality characteristics can be grouped into Functionality, Reliability, Usability, Efficiency, Maintainability and Portability. These definitely provide good guidance.

To simplify matters, I like to start the initial discussion by labelling the columns of the matrix:

  • Functionality ("Happy Cases")
  • Reliability ("Unhappy Cases")
  • Integration
  • Performance
  • Compliance
  • UX
Of course, we can clarify a lot more on what each of these areas means, but let's provide some leeway for the first round of discussion here. The most important thing is that everyone in the room has an aligned understanding of what these are supposed to mean.
If you are in the mood for some over-engineering, add subcategories for each coverage characteristic, such as splitting Performance into "efficiency", "speed", "scalability", "stress resilience" etc. That will bloat up the matrix and may make it more appropriate to flip rows and columns on the matrix.

Test Areas

Defining test areas falls into multiple categories, which correlate to the "Automation Test Pyramid". 

  • User journeys
  • Data flow
  • Architectural structure
  • Code behaviour
There are other kinds of test areas, such as validation of learning hypotheses around value and human behaviour, but let's ignore these here. Let's make a strong assumption that we know what "the right thing" is, and we just want to test that "we have things right." Otherwise, we'd open a can of worms here. You're free to also cover these, adding the respective complexity.


Functional areas

In each test area, you will find different functional areas, which strongly depend on what your product looks like.

User journeys

There are different user journeys with different touchpoints where your user interacts with your product.

For example, a simple video player app might have one user flow for free-to-play, another for registration, another for premium top-up, and another for GDPR-compliant deregistration, as well as various flows such as "continue to watch my last video" or "download for offline viewing". These flows don't care what happens technically.


Data flow

Take a look at how the data flows through your system as certain processes get executed. Every technical flow should be consistent end-to-end.

For example, when you buy a product online, the user just presses "Purchase", and a few milliseconds later, they get a message like "Thank you for your order." The magic that happens in between is make or break for your product, but entirely irrelevant for the user. In our example, that might mean that the system needs to make a purchase reservation, validate the user's identity and their payment information, conduct a payment transaction, turn the reservation into an order, ensure the order gets fulfilled, etc. If a single step in this flow breaks, the outcome could be an economic disaster. Such tests can become a nightmare in microservice environments where they were never mapped out.


Architectural structure

Similar to technical flow, there are multiple ways in which a transaction can occur: it can happen inside one component (e.g. frontend rendering), it can span a group of components (e.g. frontend / backend / database) or even a cluster (e.g. billing service, payment service, fulfilment service) and in the worst case, multiple ecosystems consisting of multiple services spanning multiple enterprises (e.g. Google Account, Amazon Fulfilment, Salesforce CRM, Tableau Analytics).

In architectural flow, you could list the components and their key partner interfaces. For example:

  • User Management
    • CRM API
    • DWH API
  • Payment 
    • Order API
    • Billing API

Architectural flow is important in the sense that you need to ensure that all relevant product components and their interactions are covered.

You can simplify this by first listing the relevant architectural components, and only drilling down further if you have identified a relevant hotspot.


Code behaviour

At the lowest level is always the unit test, and different components tend to have different levels of coverage - are you testing class coverage, line coverage, statement coverage, branch coverage - and what else? Clean Code? Suit yourself.
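As a tiny, invented illustration of why the chosen metric matters - a single test can reach full line coverage while leaving a branch untested:

  def apply_discount(total: float, is_member: bool) -> float:
      discount = 0.0
      if is_member:
          discount = total * 0.1
      return total - discount

  def test_member_discount():
      # Every line runs (100% line coverage), but branch coverage is
      # incomplete: the is_member=False path is never asserted.
      assert apply_discount(100.0, is_member=True) == 90.0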

Since you can't list every single behaviour of the code that you'd want to test for without turning a Coverage Matrix into a copy of your source code, you'll want to focus on stuff that really matters: do we think there's a need to do something?


Bringing the areas together

There are dependencies between the areas - you can't have a user flow without technical flow, you won't have technical flow without architectural flow, and you won't have architectural flow without code behaviour. Preferably, you don't need to test for certain user flows at all, because the technical and architectural flows already cover everything. 

If you can relate the different areas with each other, you may learn that you're duplicating or missing on key factors.


Section Weight

For each cell - that is, for each row and column combination - assign a value for how important this topic is.

For example, you have the user journey "Register new account." How important do you think it is to have the happy path automated? Do you think the negative case is also important? Does this have an impact on other components, i.e. would the user get SSO capability across multiple products? Can you deal with 1000 simultaneous registrations? Is the process secure and GDPR compliant? Are users happy with their experience?

You will quickly discover that certain rows and columns are "mission critical", so mark them in red. Others will turn out to be "entirely out-of-scope", such as testing UX on a backend service, so mark them gray. Others will be "basically relevant" (green) or "Important" (yellow).

As a result, you end up with a color-coded matrix.

The key discussion that should happen here is whether the colors are appropriate. An entirely red matrix is as unfeasible as an entirely gray matrix.


A sample row: Mission critical, important, relevant and irrelevant



Reliability Level

As the fourth activity, focus on the red and yellow cells, take a look at a sliding scale of how well you're doing in each area, and assign a number from 0 to 10 with this rough guidance:

  • 0 - We're doing nothing, but know we must.
  • 3 - We know that we should do more here.
  • 5 - We've got this covered, but with gaps.
  • 7 - We're doing okay here.
  • 10 - We're using an optimized, aligned, standardized, sustainable approach here.

As a consequence, the red and yellow cells should look like this:

A sample matrix with four weighted fields.

As you would probably guess by now, the next step for discussion would be to look at the big picture and ask, "What do we do with that now?"


The Matrix

Row Aggregate

For each row, figure out what the majority of colors in that row is, and use that as the color of the row. Next, add up all the numbers. This will give you a total number for the row. 

This will give you an indicator of which row is most important to address - the ones in red, with the lowest number.
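A minimal sketch of that aggregation, with invented colors and scores (only red and yellow cells were scored in the previous step):

  from collections import Counter

  # One invented row of the matrix: (color, reliability score) per column.
  register_new_account = [("red", 3), ("red", 5), ("yellow", 7), ("green", None), ("gray", None)]

  row_color = Counter(color for color, _ in register_new_account).most_common(1)[0][0]
  row_total = sum(score for _, score in register_new_account if score is not None)

  print(row_color, row_total)  # "red" 15 - a red row with a low total gets addressed first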

The Row Aggregate


Column Aggregate

You can use the same approach for the columns, and you will discover which test type is covered best.  I would be amazed if Unhappy Path or Compliance turn out to have poor coverage when you first do this exercise, but the real question is again: Which of the red columns has the lowest number?


The Column aggregate



After conducting all the above activities, you should end up with a matrix that looks similar to this one:


A coverage matrix

Working with the Matrix

There is no "The right approach" to whether to work on improving coverage for test objects or test types - the intent is to start a discussion about "the next sensible thing to do," which totally depends on your specific context.  

As per our example, the question of "Should we discuss the badly covered topic Performance which isn't the most important thing, or should we cover the topic of architectural flow?" has no correct answer - you could end up with different groups of people working hand in hand to improve both of these, or you could focus on either one.



How-To Use

You can facilitate discussions with this matrix by inviting different groups of interest - business people, product people, architects, developers, testers - and start a discussion on "Are we testing the right things right, and where or how could we improve most effectively?"

Modifications 

You can modify this matrix in whatever way you see fit: different categories for rows or columns, drill-in, drill-across - all are valid.

For example, you could have a look at only functional tests on user journeys and start listing the different journeys, or you could explicitly look at different types of approaching happy path tests (e.g., focusing on covering various suppliers, inputs, processing, outputs or consumers)

KISS 

This method looks super complicated if you list out all potential scenarios and all potential test types - you'd take months to set up the board without even having a conversation. Don't. First, identify the 3-4 most critical rows and columns, and take the conversation from there. Drill in only when necessary and only where it makes sense.




Tuesday, December 1, 2020

Refactor, Rewrite or Redesign?

Have you ever heard the term "Refactoring Sprint?" There is a lot of confusion around what Refactoring, Rewrite and Redesign are, what they entail, as well as how and when to use them - so here's a small infographic:



Refactoring

Refactoring is all about small changes to the way code has been written to reduce technical debt as the code and our understanding thereof grows. It's a continuous exercise and should always be an integral part of the work. Safe refactoring is a low-risk exercise that happens within the scope of existing unit tests, which should verify that indeed we haven't made any unwanted changes.

Examples of Refactoring are: Extracting a variable, renaming a function, consolidating statements or moving code from one place to another.
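To make the first example concrete, here is an invented before/after of an "extract variable" refactoring - behaviour stays identical, so the existing unit tests keep passing unchanged:

  def gross_price(net: float) -> float:            # before
      return net + net * 0.19

  VAT_RATE = 0.19                                  # after: intent-revealing name

  def gross_price_refactored(net: float) -> float:
      vat = net * VAT_RATE
      return net + vat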

It should go without saying that in professional development, Refactoring should never be a separate task and should never be done in big batches. Your Definition of Done should include that no essential refactoring is left to do.


Rewrite

We rewrite code when we discover that a current code segment is no longer the best way to achieve a specific purpose. Depending on your test coverage, rewrites are a high-risk exercise that are the basis for future risk reduction.

Examples of Rewrites are new and better means (e.g. a new technology) to do the same thing, or stumbling upon a legacy that is difficult to work with (e.g. complex methods without tests).

Smaller rewrites (e.g. method rewrites) should be done autonomously as a regular part of the work when the need is discovered and when developers have sufficient confidence that they can do this safely.
Larger rewrites should be planned with the team, as this exercise could consume significant portions of time and may need additional attention to failure potential.


Redesign

A redesign must happen when the purpose of an existing component has changed in ways that mean the current solution is no longer the best possible one. Redesign is a high-risk and highly time-consuming exercise that should be done to get out of a corner. If you have highly malleable code that was well refactored on a continuous basis, redesign should hardly ever be necessary unless there are major changes to the business context in which a system is operated.

Examples of Redesign might include moving from batch processing to data streams, changing the database paradigm, or acquiring a new line of business that requires processing different data.

Redesign should always be a planned whole-team exercise that might even break down into a hierarchy of mid- and short-term goals. Redesign is a mix of technical and business decisions, so it should be aligned with Product.




Wednesday, November 18, 2020

16 misconceptions about Waterfall

Ok, Agilists. It's 2021, and people are still using Waterfall in corporate environments. With this article, I would like to dismantle the baloney strawman "Waterfall" that's always proclaimed as the archenemy of all that is good and would encourage you to think about how exactly your suggested "Agile" is going to do better than the examples I have taken from real-world, professional Waterfall projects.

Here are some things that many agilists may have never experienced in Waterfall projects. I did.


What you think Waterfall is, but isn't

There are numerous standard claims about what's wrong with Waterfall, which I would generously call "statements made from ignorance," although there could be more nefarious reasons why people make these claims. Point is: many of the common claims are not generally true.


Big Bang vs. Incremental

Waterfall doesn't mean that until the determined end date of the project, there will be nothing to show. I remember when I stated that I worked in a 5-year Waterfall project, people from the Agile community called that insane. It's not. We had a release every 3 months. That means that the project had a total of 20(!) Increments, each with its own scope and objectives: Yes - Waterfall can be used to build products incrementally! In corporations, that's actually normal.


Upfront Design vs. Iterative Design

With each delivery, project managers, analysts and business people sit together and discuss the roadmap: which requirements to add or remove, and which priorities to shift. I have once worked in a product that was created in pure Waterfall for almost 20 years, and nobody could have anticipated the use cases delivered in 2010 when the product's first version hit the market back in 1992. Even Waterfall projects can iterate. Especially for enterprise systems.


Death March vs. Adaptivity

When you think that someone sits in a closet and produces the Master Plan, which must be slavishly adhered to by the delivery teams, you're not thinking of a properly managed Waterfall project. Yes, of course, there is a general plan, but a Waterfall plan gets adapted on the fly as new information arises. Timelines, staffing, scope, requirements, objectives - all are subject to change, potentially even on a weekly basis if your project manager is worth their salt.


Fixed Scope vs. Backlog

If you've ever done Project Management, you know pretty well that scope is very malleable in a project. When an organization determines that meeting a fixed timeline is paramount, Waterfall fixed time projects can be pretty similar to Sprints in managing scope. While of course, you get problems if you don't manage the Critical Path properly, that's not a Waterfall problem - it's carelessness. 


Fixed Time vs. Quality

Probably one of the main complaints about Waterfall is that a team delivering on a fixed schedule will push garbage downstream to meet the timeline. Again, that's not a Waterfall issue - it's a "fixed time" issue. If you flex the time, and fix the work package, there's nothing inherent to Waterfall that implies a willful sacrifice of quality.

(And, as a witty side note - if you believe that fixed time is the root cause for low quality: how exactly would Scrum's Sprint timebox solve that problem?)


Assumptions vs. Feedback Learning

Complex systems serving a multitude of stakeholders are incredibly hard to optimize, especially when these stakeholders have conflicting interests. The complexity in Waterfall requirement analysis is usually less about getting a requirement right than about identifying and resolving conflicting or wrong demands. The time spent upfront to clarify the non-developmental interferences pays off in "doing the right thing." Good analysts won't be making wild assumptions about things that could potentially happen years down the line. When a release is launched, good Waterfall projects use real user feedback to validate and update the current assumptions.


Handovers vs. Collaboration

Yes. There's something like stage-gates in most Waterfall projects. I myself have helped Waterfall organizations implement Quality Gates long before Scrum was a thing. But it's not inherent to Waterfall - otherwise it wouldn't have been a thing in the early 2000's. Also: don't misunderstand gates. They don't mean that an Unknown Stranger hands you a Work Package which you will hand over to another Unknown Stranger at the next Gate. What typically happens: As soon as analysts have a workable design document, they'll share it with developers and testers, who take a look, make comments and then meet together to discuss intent and changes. Good Waterfall organizations have collaboration between the different specialists whenever they need to.


Documentation vs. Value Creation

A huge misconception is that "Waterfall relies on heavy documentation" - it doesn't, depending on how you operate. Heavy documents are oftentimes the result of misfired governance rather than caused by the Waterfall approach itself. It's entirely feasible to operate Waterfall with lightweight documentation that clarifies purpose and intent rather than implementation details, if that's what your organization is comfortable with. Problems start when development is done by people who are separated from those who use, need, specify or test the product - especially when there's money and reputation at stake. 


Process vs. Relationships

As organizations grow large, you may no longer have the right people to talk with, so you rely on proxies who play a kind of Telephone Game. This has nothing to do with Waterfall. A good Waterfall Business Analyst would always try to reach out to actual users, preferably power users who really know what's going on, and build personal relationships. As mutual understanding grows, process and formality become less and less important, both towards requesters and within the development organization - even in a Waterfall environment.


Resource Efficiency vs. Stable Teams

There's a wild claim that allegedly, Waterfall doesn't operate with stable teams. Many Waterfall organizations have teams that are stable for many years, in some cases, even decades. Some of the better ones will even "bring work to the team" rather than assigning work to individuals or re-allocating people when something else is urgent. The "Resource efficiency mindset" is a separate issue, unrelated to Waterfall.


Big Batch vs. Flow

Kanban and Waterfall can coexist quite well. Indeed, long before I first heard of Scrum, I used Kanban in a Waterfall setting where requirements flowed through three specialist functions, and we had an average cycle time of less than one week from demand intake to delivery. Waterfall with Small Batches is possible, and can perform exceptionally well.


Top-Down vs. Self-Organized

I've worked with corporations and medium-sized companies using Waterfall, and have met a lot of Project Managers and Team Leads who have worked in a fashion very similar to a Product Owner: taking a request, discussing it with the team, letting the team figure out what to do how and when, only then feeding back the outcome of this discussion into the Project Plan. Waterfall can have properly self-organized teams.


Push vs. Pull

Whereas in theory, Waterfall is a pure "Push"-based process, the field reality is different. If you have a decent Waterfall team lead, it will basically go like this: we see what work is coming in, we take what we can, and we escalate the rest as "not realistic in time" and get it (de-)prioritized or the timeline adjusted. De facto, many Waterfall teams are working pull-based.


Overburden vs. Sustainable Pace

Yes, we've had busy weekends and all-nighters in Waterfall, but they were never a surprise. We could anticipate them weeks in advance. And after these always came a relaxation phase. Many people working in a well-built, long-term Waterfall project call the approach quite sustainable. They feel significantly more comfortable than they would be under the pressure to produce measurable outcomes on a fortnightly basis! Well-managed Waterfall is significantly more sustainable for a developer than ill-managed Scrum, so: Caveat emptor!


Resources vs. Respect

Treating developers as interchangeable and disposable "resources" is an endemic disease in many large organisations, but it has nothing to do with Waterfall. It's a management mindset, very often combined with the cost accounting paradigm. The "human workplace" doesn't coincide well with such a mindset. And still, the more human Waterfall organizations treat people as people. It entirely depends on leadership.


Last Minute Boom vs. Transparency

Imagine, for a second, that you would do proper Behaviour Driven Development and Test Driven Development in a Waterfall setting. I did this in one major program, delivering Working Software that would have been ready for deployment every single week. If you do this, and properly respond to feedback, Waterfall doesn't need to produce any nasty surprise effects. The Last Minute Boom happens when your development methodology is inappropriate and your work packages are too big, not because of Waterfall.


All said - what then is, "Waterfall?"

"Waterfall" is nothing more and nothing less than an organized, sequential product development workflow where each activity depends on the output of the previous activity.

There are really good uses for Waterfall development, and cases where it brilliantly succeeds. It's incorrect to paint a black-and-white image where "Waterfall is bad and Agile is good", especially when equating "Agile" with a certain framework.

Proper Waterfall

A proper Waterfall would operate under the following conditions:
  1. A clear, compelling and relateable purpose.
  2. A human workplace.
  3. A united team of teams.
  4. People who know the ropes.
  5. A "facts are friendly" attitude.
  6. Focus on Outcomes.
  7. Continuous learning and adaptation.
  8. Reasonable boundaries for work packages.
  9. Managing the system instead of the people.

All these given, a Waterfall project can have a pretty decent chance to generate useful, valuable results.

And when all the above points are given, I would like to see how or why your certain flavor of "Agile" is doing better.


My claim


I challenge you to disprove my claim: "Fixing the deeper mindset and organizational issues while keeping the Waterfall is significantly more likely to yield a positive outcome than adopting an Agile Framework which inherits the underlying issues."





Tuesday, November 17, 2020

Is all development work innovation? No.

In the Enterprise world, a huge portion of development work isn't all that innovative. A lot of it is merely putting existing knowledge into code. So what does that mean for our approach?

In my Six Sigma days, we used a method called "ICRA" to design high quality solutions.


Technically, this process was a funnel, reducing degrees of freedom as time progressed. We can argue at length about whether such a funnel is (always) appropriate in software development, or whether it's a better mental model to consider that all of these run in parallel at varying degrees (but that's a red herring). I would like to salvage the acronym to discriminate between four different types of development activity:

  • Innovate - Fundamental changes or the creation of new knowledge to determine which problem to solve in what way, potentially generating a range of feasible possibilities. Example: creating a new capability, such as "automated user profiling", to learn about target audiences.
  • Configure - Choosing solutions to well-defined problems from a range of known options; could include cherry-picking and combining known solutions. Example: using a cookie-cutter template to design the new company website.
  • Realize - Both problem and solution are known; the rest is "just work", potentially lots of it. Example: including a 3rd-party payment API in an online shop.
  • Attenuate - Minor tweaks and adjustments to optimize a known solution or process; the key paradigm is "Reduce and simplify". Example: adding a validation rule or removing redundant information.

Why this is important

Think about how you're developing: moving through the four activities, the probability of failure - and hence the predictable amount of scrap and rework - decreases. And as such, the way you manage each activity differs. A predictable, strict, regulated, failsafe procedure would be problematic during innovation and highly useful during attenuation - you don't want everything to explode when you add a single line of code to an otherwise stable system, whereas destabilizing the status quo to create a new, better future might actually be a desirable outcome of innovation.

I am not writing this to tell you "This is how you must work in this or that activity." Instead, I would invite you to ponder which patterns are helpful and efficient - and which are misleading or wasteful in context. 

By reflecting on which of the four activities you are engaged in and which patterns are most appropriate for each of them, you may find significant change potential both for your team and for your organization, to "discover better ways of working by doing it and helping others do it."


Thursday, November 12, 2020

PI-Planning: Factors of the Confidence Vote

 The "Confidence Vote" is a SAFe mechanism that is intended to ensure both that the created PI Plan is feasible, and also to ensure that people understand the intent behind creating the common plan - what it means, and what it doesn't. Implied in SAFe are two different kinds of confidence vote with slightly different focus.







Train Confidence Vote

The "Train Confidence Vote" is taken on the Program Board - i.e. the aligned, integrated PI plan across all teams. All participants of the PI-Planning are asked to simultaneously vote on the entire plan. Here are the key considerations, all of which should be taken into account:

Objectives: Feasibility and Viability

First, we should look at the ART's PI objectives: are they realistic, and does it make sense to pursue them? Do we have our priorities straight, and are we focused on delivering maximum value to our customer?

High Confidence on PI objectives would imply that these objectives are SMART (Specific, Measurable, Ambitious, Realistic, Timebound) within the duration of the PI.

Features: Content and Scope

Do we have the right features? Do all of them provide significant progress towards our objectives? Did we pick a feasible amount, did we arrange them in a plausible order, and are the right people working on them? Is the critical path clearly laid out, and is the load on the bottleneck manageable?

High Confidence on Features would imply that everyone is behind the planned feature arrangement.

Dependencies: Amount and Complexity

If we have too many dependencies, the amount of alignment effort throughout the PI will be staggering, and productivity is going to be abysmal. You also need to manage external dependencies, where the Train needs something from people who aren't part of the Train, and you need to pay extra attention when these people didn't even attend the PI-Planning.

High Confidence on Dependencies would imply that active efforts were made to eliminate as many dependencies as possible, and that teams have already aligned on how they deal with the inevitable ones. When people either mark a high number of dependencies without talking about them, or you feel that some weren't mentioned, that should reduce your confidence drastically.


Risks: Quantity, Probability and Impact

Risks are a normal part of life, but knowingly running into disaster isn't smart. Were all the relevant risks brought up? Have they been ROAM'ed properly? How likely are you to be thrown off-track, and how far?

When you consider risks well under control, that can give you high confidence in this area - when you feel like you're facing an army of gremlins, vote low.


Big Picture: Outcomes and approach

After looking at all the detailed aspects, take one step back: Are we doing lots of piecemeal work, or do we produce an integrated, valuable product increment? Do we have many solitary teams running in individual directions, or do we move in the same direction? Do you have the impression that others know what they're doing?

When you see everyone pulling on the same string and in the same direction, in a feasible way, that could give you high confidence. When you see even one team moving in a different direction, that should raise concerns.


Team Confidence Vote



During your breakout sessions, the Scrum Master should frequently check the pulse of team confidence. The key guiding question should be: "What do you need so that you can vote at least a 4, preferably a 5, on our team's plan?"

Your team plan is only successful when every single member of the team votes at least a 3 on it, so do what it takes to get there. It's entirely unacceptable for a team member to lean back comfortably, wait for the team confidence vote and then vote 2 - they should speak up immediately when they have concerns. Likewise, it's essential that teams have clarified all the issues that would lead them to vote low on their team's plan before going into the PI confidence vote.

When your team cannot reach confidence, do not hesitate - involve Product Management and the RTE immediately to find a solution!

Here are the factors you should consider in your team confidence vote:

Objectives

Does your team have meaningful objectives, are you generating significant value?

Understanding

Do you really understand what's expected from you, how you're contributing to the whole, what makes or breaks success for you - what the features mean, what your stories are, what they mean, and what's required to achieve them?

Capacity and Load

Do you understand, including predictable and probable absences, how much capacity your team has? How likely is it that you can manage the workload? Have you accounted for Scrum and SAFe events? Would unplanned work break your plan?

Dependency Schedule

Can you manage all inbound dependencies appropriately, do you trust the outbound dependencies to be managed in a robust way? What's your contingency plan on fragile dependencies?

Risks

Are you comfortable with the known risks? Do you know your Bus Count, and have you planned accordingly? Do you trust that larger-scaled risks will be resolved or mitigated in time?

Readiness

Right after the PI-Planning, you will jump into execution. Do you have everything to get on the road?



Closing remarks

This list isn't intended to check each factor individually, and it isn't intended to be comprehensive, either. It is merely intended to give you some guidance on what to ponder. If you have considered all these, you probably haven't overlooked anything significant. If you still feel, for any reason, that you can't be confident in your plan, by all means, cast the vote you feel appropriate, and start the conversation that you feel is required.
It's better to spend a few minutes extra and clarify the concerns than to find out too late that the entire PI plan is garbage.

Monday, November 2, 2020

Delivered, Deployed, Done?

While an agile organization should avoid over-engineering a formal status model, it's necessary to provide standards of what "Done" means so that people communicate at an even level. The highest level of confusion arises in large organizations where teams provide piecemeal components into a larger architecture, because teams might define a "Done" that implies both future work and business risk until the delivery is actually in use.

In such a case, discriminating between "Deployed" and "Done" may be useful.


What's Done?

At the risk of sounding like a broken record, "Done means Done," "It's not Done when it's not done" and "You're not Done when you're not done."

That is, when you're sitting on a pile of future work, regardless of whether that pile is big or small, you're not done. This is actually quite important: While softening your DoD gives you a cozy feeling of accomplishment, it reduces your transparency and will eventually result in negative feelings when the undone work comes back to bite you.

As such, your enterprise DoD should encompass all the work that's waiting. Unfortunately, in an Enterprise setting, especially when integrating with Waterfall projects or external Vendors, you may have work waiting for you a year or more down the line. The compromise is that teams put work items into an intermediate, "Deployed" status when the feature is live in Production, and set it to "Done" at the appropriate time in the future.

What's Deployed?

In situations where a team has to move on before taking care of all the necessary work to move to "Done", because there are factors outside their own control, it may be appropriate to introduce an intermediate status, "Deployed." This allows teams to move on rather than idly waiting to do nothing or wasting their energy getting nowhere.

In large enterprise situations, teams often deliver their increments while the following haven't been taken care of yet:
  • Some related open tickets
  • User training
  • Incidents
  • Feature Toggles
  • Business Configuration
  • E2E Integration
  • Tracking of Business Metrics
  • Evidence of Business Value
This status is not an excuse - it creates transparency on where the throughput in the value stream actually is blocked, so that appropriate management action can be taken.


Interpreting "Done" vs. "Deployed."

Let's take a look at this simple illustration:


Team 1

If they softened their DoD to equate "Deployed" and "Done", the business risk would be hidden, and it would become impossible to identify why the team isn't generating value even though they're delivering. They lose transparency with such a conflation.
A strict discrimination between "Deployed" and "Done" surfaces the organizational impediment in this team and makes the problem easy to pinpoint.

Team 2

It wouldn't make sense to discriminate between "Done" and "Deployed", because the process is under control and there is no growing business risk. This team could just leave items "in Progress" until "Done" is reached and doesn't benefit from "micromanaging" the status.

Wednesday, October 21, 2020

Where to put the Business Analysts?

A common question that needs to be answered during agile transitions is, "Where do we put the Business Analysts?"

In a traditional project organization, it's quite common that they receive orders from Product / Project Management, create solution designs, and hand these over to development teams. This is a poor approach for agile organizations.



Avoid: BA working for PM

We often see that the BA is a go-between for users and Agile Teams, or even for Product Management and the Agile Team, both of which are done in the name of efficiency at the expense of quality.

There are numerous highly dysfunctional antipatterns associated with this approach, i.e. things that cause more problems than they solve, including, without limitation:

  • Works as Requested - When users ask for something suboptimal, that's what they'll get, because developers are unaware of the user's real need, and the Product Owner also lacks the necessary information to acknowledge alternate solution proposals.
  • Works as Designed - When Business Analysts make invalid assumptions about the technical solution, developers will struggle to implement based on their design, since developers are not in a position to challenge the underlying assumptions.
  • Dysfunctional PO - When a PO gets prioritized, "must-do", fully analyzed designs that need to be implemented, their role becomes dysfunctional. All they can do is "push tickets" and fill in templates of work. The PO's main function is invalidated. Product Owners struggle to find purpose and meaning in their work, and in such a setup, it's no loss to eliminate them entirely.
  • Telephone Game - The amount of information lost when users talk to analysts who talk to product owners who talk to developers is staggering. The amount of communication overhead and productivity loss caused by this setup potentially outweighs the benefits of doing business analysis outside the team.
  • Bottleneck - Separating the BA out as a special function typically makes them a bottleneck. When push comes to shove, incomplete designs are handed to development in a hurry, which often causes more trouble later than the amount of work the BA wasn't able to complete.

Try: BA is part of the Agile Team

An alternative, significantly more agile, approach is to make the BA part of the agile team they're doing analysis for. 
In this setup, the BA is a dedicated member of the Agile Team they're working with - figuring out both the customer needs in the solution and the developer needs in the design. Their accountability is being a member of the Development Team, contributing towards the Team Goals.

In this setup, their job isn't "Done" when a design document is written, but when the user's need is successfully met.

From this position, the Business Analyst supports the refinement and elaboration of business value, interacting with users, not as a go-between, but as a facilitator for developers and users.

Business Analysts also support the decisions of the Product Owner, ensuring that backlog items are "Ready" both quantitatively and qualitatively when there is development capacity to pull these items.

This approach to Business Analysis in an agile setup makes BA expertise equally, and potentially even more, important to the success of development as in a traditional setup, without creating any of the above-mentioned antipatterns.


The HR issue

The main challenge that has to be addressed when talking about the new role of the BA is an HR issue:
From a practical perspective, the BA gets "degraded" from being pretty high in the hierarchy of the organization all the way "down to" team member. This often causes resistance, making it look like the path of least resistance is to opt for the prior choice - which creates irreconcilable conflict within the development organization.

As such, there are multiple items to clarify before we can make the BA as valuable as possible in an agile setting:

  • HR - Address potential HR impediments that make it within a BA's own best interests not to be part of an agile team, but rather outside it. Such impediments include salary, career progression and other incentives set by the organization.
  • Line Organization - In organizations where BA itself is a separate silo, work with the BAs' manager to assure them that making the BAs part of the Agile Team does not diminish their importance or influence. The main thing that needs to change is that BAs now receive their work from the team.
  • BA Individuals - Work with the BAs themselves to assure them that being part of an Agile Team is, in fact, not a degradation, and to discover and resolve the personal issues they have with the new, different ways of working.


Wednesday, October 14, 2020

How to resolve the Planning Conflict

There's a seeming conflict here: on the one hand, "delivering early and often" is an Agile principle - and on the other hand, "deferred commitment" is a Lean principle. This might create a planning conflict. How do you resolve it?



Planning purpose

First, we must realize that there are different reasons for planning. 

Within the team / development organization, the purpose of planning is to make sure that we have a realistic goal and we all understand what we need to do.

Towards our customers, the purpose of planning is different. They don't care who does what, and when. They care when they'll get what.

Towards other stakeholders in our organization, the purpose of planning is again different. They need to know when they're expected to contribute, and when they can get a contribution from us.


Defer commitment?

The first thing to realize here is: "Who are we committing towards?" Are we committing inside the teams to maximize value - or are we committing a certain date or scope to our customers or stakeholders?

Customers and stakeholders plan their actions based on our commitment, so in this regard, we shouldn't commit to anything that we can't keep, because otherwise we may be creating significant, non-value-adding re-planning and organizational overhead. Broken customer commitments will damage our trust. If you can deliver without having to give a commitment, that's better, and even when you need to commit so that others can plan, try to commit as late as possible.

The trust issue

"Deferred commitment" requires trust. 
  • Trust in the team, that they do the best they possibly can. 
  • Trust in the organization, that they enable the team to succeed.
  • Trust in the customers and stakeholders, that they want the team to succeed.
Asking for early commitment hints at a lack of trust. The solution is not to enforce strict commitment, but to build trust. In a trustful relationship, deferred commitment shouldn't be an issue for anyone.


Deliver early?

Inside our team, we plan to deliver as much value as early as possible, because "you got what you got". To minimize risk and to avoid falling for Parkinson's Law, we should avoid keeping activity buffers that allow us to "do extra work", and we should remember that early delivery is our feedback and learning trigger.


Resolving the conflict

There is no conflict.
We work towards two separate events: 
The team's first point of feedback, and the point of business completion.
  • The first date is the earliest point in time when we can get feedback. It allows us to validate our assumptions and to verify our product. There is no guarantee of completion or finality. For internal planning, we look for earliest possible dates, so that we can reduce risk and deliver value quickly.
  • The second date is the latest point in time when we can complete a topic. We communicate this date as late as possible and try to avoid having to lock it in if we can. This minimizes the danger of expectation mismatch. For external communication, we look for latest feasible dates, so that other people's decisions don't rely on our unvalidated assumptions.

Addendum

Based on the feedback that "deferred commitment" in a Lean context is referring to decisions:
The statement "Scope X will be completed at date Y" consists of two decisions made today: a decision about what, as well as a decision about when. If there is no need to decide this today, we should not.
We try to avoid locking in a decision that has a significant risk of being wrong.
That is not the same as "we won't deliver any value until some undefined date in the future." It means, "we can't guarantee you the value until we know more."

Thursday, October 8, 2020

Why you shouldn't set Predictability targets

While predictability is pretty important for management, and having an agile team or organization deliver in a predictable manner is certainly worth aspiring to, setting targets for predictability is a terrible idea. And here's why:


Blue lines are reinforcing, red lines negative reinforcement, orange items are under the teams' control.


As soon as we set Predictability as a target, we create a reinforcement loop that rewards teams for spending more time planning and less time actually developing. The same reinforcement loop also destroys the very thing called "agility", i.e. the flexibility of "responding to change over following a plan."

As a consequence of both reinforcement loops initiated by setting a predictability target, we reduce the ability to actually deliver business value. As such:

Developers who work towards a Predictability objective do so at the expense of Business Objectives.

If that's not yet clear enough, let me put it bluntly:

Predictability targets hurt your company's bottom line.

 

Hence, I strongly advise resisting the urge to use predictability as a KPI and to set predictability targets for software development.


Tuesday, October 6, 2020

Dependencies aren't created equal

We often hear the proposal that dependencies are bad, and we should work to remove them.
Is that true? Well - it ... depends. Pun intended.

First and foremost, "dependency" can mean many things, and some of these are good, others bad, and some conditionally good or bad - and yet others, simultaneously good and bad.

Before I get into that, let me define what I mean by "good" and "bad":

Economic perspective

(You can skip this section if you can simply live with the terms "Net Benefit", "ROI" and "TCO".)

Looking from an economic perspective, anything we do or decide results in an outcome.
This outcome has a benefit - the Return on Investment (ROI).
The way to reach this outcome, its maintenance, and failures along the way all have a cost - the Total Cost of Ownership (TCO).

From that, we can determine the Net Benefit: ROI - TCO.

Whether we measure that in Currency (Dollars/Euros) or in whatever unit we see fit (e.g. time, opportunities, satisfaction etc.) is irrelevant. Every outcome we get has such a net benefit.

As such, we can always compare two alternatives based on their net benefit. The outcome with the "better" net benefit is the preferable option.

For example:

Either we can:

  1. do the yard work this Saturday, and have a clean yard.
    Net benefit = clean yard.

  2. hire a gardener at $500 to do the yard work, and go to see the Playoffs.
    Net benefit = clean yard + seen playoffs - $500. 

Now, whether you prefer option 1 or 2 depends on whether you value attending the playoffs more than the $500, so often, the Net Benefit may have some subjective component that's hard to quantify in money. Regardless, a rational mind would choose whatever option has the highest subjective net benefit.

Why do I bring up this example?
Because option 1 has no dependencies, and option 2 has a hard dependency on the gardener and on the money. If you prefer option 2, you deliberately choose to have a dependency in order to increase your benefit.


Good and bad dependencies

With the concept of net benefit at hand, let us consider two generic alternatives:
Option A, which has dependencies, with Net Benefit A = "ROI A" - "TCO A".
Option B, which has no dependencies, with Net Benefit B = "ROI B" - "TCO B".
To at least keep it somewhat simple, we'll assume "risk" (of whatever type) is part of the TCO.
This gives us a proper basis for discussion.

  • Bad dependency - Net Benefit A < Net Benefit B, TCO A > TCO B and ROI A < ROI B. Example: being unable to meet market demands because a component vendor can't keep up.
  • Good dependency - Net Benefit A > Net Benefit B and TCO A < TCO B. Example: a software company buying servers instead of building chipsets from raw silicon.
  • Potentially good dependency - Net Benefit A > Net Benefit B but ROI A < ROI B. Example: letting a business partner serve a market segment you're not an expert in.
  • Potentially bad dependency - TCO A < TCO B but Net Benefit A < Net Benefit B. Example: a Preferred Supplier process that simplifies procurement but means you can't get everything you need.
  • Mixed dependency - Net Benefit A > Net Benefit B and TCO A > TCO B. Example: outsourcing sales to an agency that takes a commission.

That clarified, we definitely want to avoid or eliminate "bad dependencies", but we may actively look for "good dependencies".

What happens in practice, though, is bias: we inflate ROI and discount TCO to make dependencies look rosier than they are. We do this by making overly optimistic assumptions about potential payoffs (ROI) and dismissing negative factors that don't fit our concept. Of course, that is a trap we easily fall victim to, and we should make sure we draw a realistic picture of both ROI and TCO, preferably erring on the side of caution.

Now, let's take a look and interpret those dependencies:

Bad Dependencies

A bad dependency is definitely something you want to eliminate. You win by removing it, hands down.

Good Dependencies

Don't be fooled, good dependencies are both everywhere and at the same time rarer than you think!

Our specialized service society is full of them. They help you make the best value of your own scarce resources, first and foremost, time. We could hardly live if we wanted to have no such good dependencies. You depend on a farmer to produce food, you depend on cars or public transportation which you haven't built, and so on. The modern world wouldn't be able to exist without them.

To eliminate such good dependencies would throw us back into the Stone Age.

After this, let's take off the rosy glasses and face reality: willfully induced good dependencies can turn sour any time. To use an illustrative example, let's say you're a non-tech company that decided to outsource its IT to an IT Service provider. Then the market turns and your most profitable segment becomes online sales - all of a sudden, your ability to meet market demands depends on an external agent who now dictates the rate at which you're earning money!

Potentially Good Dependencies

The world isn't simply black and white, and TANSTAAFL. Partnerships are a great example of a potentially, though not universally, good dependency. In our above example, the dependency is good if you partner with someone who relieves you of some burden to allow you to achieve more. The dependency is bad if you partner with someone who allows you to achieve more, but at a price you can't really afford.

(An extreme example here might be a model who becomes rich and famous through a casting show, but is forced into a contract that ultimately makes them sacrifice family relationships and health.)

Potentially Bad Dependencies

When you can have something simple cheaper than something better, that's good if you are content with the outcome. It's bad if you aren't. Since most of the time, people want the best possible outcome, these types of dependency are usually bad.

Mixed Dependencies

These increase your business risk, and are a gambit. The bet is that by taking the dependency, you will get a better outcome. If the bet wins, this dependency is good. On the other hand, if the bet loses, this dependency is bad. Sometimes, it's both good and bad at the same time.

Taking our example of sales outsourcing: you earn less money from your core business that now runs via the agency, and you earn more money from business you otherwise couldn't have acquired. So it's a good dependency as long as the extra business exceeds the commissions, and a bad dependency otherwise.


How does all of that map to "Agile"?

Great question. All of these are business decisions. Oftentimes, it's business people who bring dependencies into a process in an attempt to optimize something. Take, for example, customer service introducing ZenDesk, or Marketing deciding to run Salesforce. Or a manager who decides to offshore the development for some of the systems integrated into a complex IT landscape.

In any of these scenarios, all of a sudden, you end up with dependencies outside your sphere of control. The question, "how do we best create the business outcome" becomes "how do we deal with the technical dependencies?"

If we take off the local-optimization goggles of pure software development, there may be tangible business benefits that make introducing these dependencies not just plausible, but a genuine, positive business case.
For argument's sake, let's ignore the fact that most business cases look better than reality, and deal with the fact that all of a sudden, there's a dependency.

While certain Agile framework proponents religiously advocate the removal of dependencies, the case isn't as clear-cut as it may seem.

Simply exposing a dependency doesn't mean we can, or even should, remove it.

We have to make a clear case that a discovered dependency is bad.
When we can provide transparent evidence of a bad dependency, removal should be the only logical conclusion.

If, in our investigations, we discover that, from a systemic perspective, a dependency is actually good, then removing it would be a local optimization. Managing it becomes inevitable.

And that's where tools like SAFe's dependency map are more helpful than the Agile dogma of "dependencies are bad."



Tuesday, September 29, 2020

Tool Tip - The Change Compass

 If you're looking for a simple, yet efficient tool to help you create transparency on how your team is developing and growing, you may want to try the "Change Compass" in one of your Retrospectives:


The x-dimension is labelled "Calm" vs. "Stormy", and the y-dimension "Improving" vs. "Worsening". This turns it into a 2D mood meter of where your team thinks they're heading.


How to use

Give people one minute to put a marker on the compass in whatever place they feel is most appropriate. Any place on the compass is fine, because we're mostly interested in a direction.

  • "Calm" would mean "Nothing is really happening, there's no speed."
  • "Stormy" would mean "We're in for a rough ride, faster than we can control."
  • "Improving" would mean "Things are getting better"
  • "Worsening" would mean "Things are getting worse"

Directional Patterns

When looking at the results, you don't need to interpret every single note. Instead, try to look for some key patterns.
The patterns listed here are the most common ones you can spot, although there may be others.

Making Progress

Our "True North" is positive change at a sustainable pace.

A healthy team undergoing healthy change would place their markers on the top third of the compass, preferably close to the center.

Too far on the left means we're not moving as fast as we could, while too far on the right means we may be overburdening the team, potentially making change an impediment in and of itself.

Comfort zone

When everyone feels that we're neither getting worse nor better, and there's no motion, people are operating in their comfort zone.

This is good after a long, bumpy ride, but shouldn't be a permanent condition: there's no improvement without change, so this might trigger a discussion about where we can improve.

Change Fatigue

Sometimes, people have the impression that we're moving fast, but getting nowhere.

In this case, it's pretty important to have a conversation about what we're trying to accomplish.

We're losing 'em

When you see this, the team is torn. Some people feel the change is moving too slowly, others feel things are moving too fast and/or in the wrong direction.

This would be a great time to have a conversation about what causes the different perceptions. Maybe the people who are uncomfortable see something the others don't - or vice versa?

No clear direction

With this pattern, it's almost irrelevant whether the points are on the left, center or right of the compass.

The key realization here is that people are disoriented - they don't know where they're headed, or what the outcome will be.

Maybe we lack transparency about what's going on?

Business as usual

People are in motion, at a comfortable pace, but nothing is really getting better.

Are we not even trying to change, or are we trying and seeing nothing come of it?
What do we need to change, so that things actually improve?

Change Theater

We're moving, but not where we want to go.

The further people move their markers to the bottom, the more likely you should stop and entirely reconsider whatever you're doing.
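
If you run the Change Compass on a digital board, a rough heuristic like the one below can help you spot the directional patterns described above. The coordinate convention and thresholds are purely my own illustration - the conversation matters more than the math:

    # Rough heuristic for reading Change Compass markers from a digital board (illustrative only).
    # Convention: x in [-1, 1] from "Calm" to "Stormy", y in [-1, 1] from "Worsening" to "Improving".
    from statistics import mean, pstdev

    def read_compass(markers):
        xs, ys = zip(*markers)
        cx, cy = mean(xs), mean(ys)
        spread = pstdev(xs) + pstdev(ys)

        if spread > 1.0:
            return "We're losing 'em / no clear direction - discuss the differing perceptions"
        if cy > 0.3:
            return "Making progress - check the pace (too calm? too stormy?)"
        if cy < -0.3:
            return "Change Theater or worse - stop and reconsider"
        if cx < -0.2:
            return "Comfort zone - nothing moving, nothing changing"
        return "Business as usual / Change Fatigue - motion without improvement"

    print(read_compass([(0.1, 0.5), (0.2, 0.4), (-0.1, 0.6)]))  # -> Making progress ...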

Dealing with the results

As long as you're "making progress", there's not much to discuss, and the exercise could be over within minutes.
In any other case, you need to start a conversation about what the results mean for your team.

The outcome of a Change Compass session should be that you:
  • pick up speed when you're in a calm
  • slow down when you're in stormy conditions
  • correct course when you're worsening
  • reconsider when you're not improving
In any case, it creates transparency on how your team thinks about your improvement efforts.

Alternative uses

The Change Compass can also be used in stakeholder environments, e.g.:
  • to conclude a Sprint Review (to see how customers think about the team's progress)
  • in management sessions, to get an "outside-in" perspective on the team
  • to provide management feedback when the team feels things are blocked or going wrong

Tuesday, September 8, 2020

Agile Risk Management

 "How do we manage risks in an agile setting?" - Agile risk management differs widely from classic project risk management, because we have a different sphere of concerns. Whereas classical projects are mostly concerned with risks related to delivering within TQB (Time, Quality, Budget), an agile environment forces us to consider a much broader sphere of risks:

Agile risk management

There are some general notes on agile risk management that may be unfamiliar or in contrast to the expectations of classic project organizations:

Risk overview

Teams (and in SAFe, ARTs) should have insight into their most relevant unresolved risks at any time. The assumption is that "if there is no risk overview, there are no relevant risks." Scrum Masters ensure this assumption actually holds - that is, relevant risks always show up in an overview.

At a minimum, risks are identified as such so that they are visible. Some organizations prefer to add further information, such as severity, occurrence and detection (the FMEA approach) as well as countermeasures - which is only relevant if you have no means of addressing risks swiftly.

Risks are treated like regular work items, and move into the backlog as "potential work to do". The teams decide whether new risks are added to the Sprint Backlog or to the Product Backlog - or to the Program Backlog, in a SAFe context.
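
As an illustration of what such a risk work item could look like - the field names are just an example, and the severity/occurrence/detection scales follow the classic FMEA convention (RPN = severity x occurrence x detection):

    # Illustrative shape of a risk treated as a regular backlog item (field names are examples).
    from dataclasses import dataclass
    from typing import Optional

    @dataclass
    class RiskItem:
        title: str
        backlog: str                      # "sprint", "product" or "program"
        severity: Optional[int] = None    # FMEA scale: 1 (negligible) .. 10 (catastrophic)
        occurrence: Optional[int] = None  # FMEA scale: 1 (unlikely)   .. 10 (almost certain)
        detection: Optional[int] = None   # FMEA scale: 1 (caught early) .. 10 (caught too late)
        countermeasure: Optional[str] = None

        def rpn(self) -> Optional[int]:
            # Risk Priority Number, only meaningful if all three FMEA ratings are present.
            if None in (self.severity, self.occurrence, self.detection):
                return None
            return self.severity * self.occurrence * self.detection

    risk = RiskItem("Key supplier deprecates their API", backlog="product",
                    severity=7, occurrence=4, detection=3,
                    countermeasure="Hide the integration behind an adapter")
    print(risk.rpn())  # 84 -> helps prioritize the risk against other backlog items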

Live updates

Agile risk management is a constant exercise of evaluating available information and anticipating probable events that should be avoided, then inspecting and adapting the determined course of action. Risk management in an agile setting happens during every event (and during daily work, as well):

  • Lean-Agile Budgets identify financial risks
  • Refinement identifies product risks
  • Planning and Dailies identify process, delivery and organizational risks
  • Reviews and Demos identify delivery and product risks
  • Retrospectives and I+A workshops identify all kinds of risks
  • PMPO Sync identifies product and delivery risks
  • Scrum-of-Scrums (SOS) identifies organizational and process risks

As such, the risk overview is a more volatile and shifting artifact than even the teams' plan, and potentially more ephemeral than the product backlog itself.

Avoid Single Points of Failure

Organizations are most resilient when there are no single points of failure, so risk management becomes a collaborative exercise. It's better to work with focus areas than to rely on a clearly delineated role-responsibility mapping. Everyone is expected to contribute to naming and resolving relevant risks, from the most junior developer to the most senior manager.

Scrum Masters facilitate team risk resolution and create transparency in the surrounding organization on those outside the team's control. Ideally, the team would be able to deal with their own risks even without requiring the Scrum Master to take action.

Risk resolution

Just as every day is an opportunity to identify risks, we should deal with them before they materialize - ideally right when they are exposed. It's the team's decision how to prioritize risks against other work.

Risks outside the team's sphere of control should be addressed via the proper channels. In a SAFe setting, the first channel for a team is usually from PO to PM, or from SM to RTE, who would involve management if required.



Focus Areas

Teams succeed by collaborating and helping each other out, so let's not fall into the "sorry, not my desk" antipattern. Still, people do different things and therefore pay more attention to different aspects. In this context, let us examine the different focus areas of the common agile roles.

Product People risk focus

First and foremost, product people must take care that we build the right thing, and have the resources (both time and money) to do so. Hence, they must be aware of and deal with:

Financial risks

Financial risks are cash-flow related. We must secure an initial investment that allows us to develop something, and in order to continue, we need ongoing funding. Within an enterprise, that's typically a budget. On the free market, that's revenue, usually generated through sales or subscriptions. So product people need the means both to understand the current financial situation and to forecast the future, and thereby extrapolate where the risks lie.

Common financial risks include exploding license fees, stakeholders withdrawing their support, customers leaving or price wars on the market, but also pretty mundane stuff like equipment breaking down or the need for a bigger office as the team grows.

To manage financial risks, the product owner must understand their cash flow.
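
A trivial illustration of what "understanding the cash flow" can boil down to is a simple runway check - the figures below are, of course, invented:

    # Hypothetical runway check: how many months of funding remain at the current burn rate?
    cash_on_hand = 300_000     # current budget or bank balance
    monthly_inflow = 40_000    # revenue, subscriptions or committed budget
    monthly_outflow = 70_000   # salaries, licenses, office, equipment

    monthly_burn = monthly_outflow - monthly_inflow
    runway_months = cash_on_hand / monthly_burn if monthly_burn > 0 else float("inf")

    print(f"Runway: {runway_months:.1f} months")  # 10.0 months -> time to secure funding or cut cost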

Since financial risks are entirely out of scope for classic Kanban, XP and Scrum, there tend to be no standard team-level mechanisms for dealing with them.

Lean-Agile Budgets are one of many SAFe mechanisms to keep the predictable financial risks away from the team.


Product risks

Product risks relate to the success of the entire endeavour. We need to build the right thing right, at the right time, and ensure we adapt to changing circumstances as rapidly as possible. Hence, "release fast, release often" is essential to minimize product risk.

Common product risks range from building the wrong product, via building it in a way people don't like, all the way to the product becoming obsolete or unmaintainable. Product risks can thus originate in the past, with consequences ranging far into the future. This requires constant attention both to the inner workings of the team and to the outside environment.

To manage product risks, it's essential to look beyond the backlog, into the product itself and the product's market. Metrics can serve both as lagging and leading indicators to discover and track their manifestation.

Refinement workshops, Reviews (System Demos) and Planning events should reveal product risks, both within the team and at scale.


Team risk focus

Autonomous teams have control over both their process and their delivery. Hence, the risks associated with these must be borne by them:

Delivery risks

Delivery risks range from not delivering anything all the way to delivering the wrong thing or something that doesn't work, hence they include the huge topic of quality-related risks. Since delivery risks have a price tag attached, called the "cost of failure", these risks consist of more than pure impact - they also have a huge element of choice: we take calculated delivery risks when the benefit outweighs the cost.
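
That "calculated risk" can be made explicit with a simple expected-value comparison (the numbers here are invented for illustration):

    # Invented example: take a calculated delivery risk only when the expected
    # benefit outweighs the expected cost of failure.
    benefit_of_shipping_now = 50_000   # e.g. revenue from hitting a market window
    probability_of_failure = 0.2       # chance the shortcut blows up
    cost_of_failure = 120_000          # incidents, rework, reputation damage

    expected_cost_of_failure = probability_of_failure * cost_of_failure  # 24,000
    take_the_risk = benefit_of_shipping_now > expected_cost_of_failure   # True -> calculated risk

    print(take_the_risk, expected_cost_of_failure)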

Common delivery risks include defects, incidents and problems (in ITIL terms), not being in control over the product's technical quality, not testing right or enough as well as releasing something immature, but also failure to gather fast and reliable feedback that could expose and thereby prevent other risks.

Delivery risks must be managed, but often become visible in real time. They are hard to pre-plan.
If we can see a delivery risk on the horizon, we should devise a strategy to start minimizing it today. Retrospectives address how we dealt with past delivery risks. Team dailies should reveal current delivery risks, and teams should actively collaborate to deal with them.
If they can't be dealt with immediately, they should be made transparent on the Team Board.


Process risks

Usually, a process risk manifests as an impediment to doing the right thing swiftly. In larger organizations with strict regulations and massive dependencies, process risks are often outside the teams' sphere of control, which, in the worst case, reduces the idea of self-organization and team-level agility to absurdity.

Common process risks include handovers, bottlenecks, delays, but also technical aptitude.

Teams are expected to manage process risks within their own sphere of control. Where they lack this control, the Scrum Master must often intervene to drive risk resolution. 

Team Dailies often reveal immediate process risks.
Retrospectives are often the best point in time to deal with long-term risks.

In SAFe, we use the Scrum-of-Scrums and the I+A workshop to address cross-cutting process risks. Additionally, we can resort to Communities of Practice to deal with practice-related risks.


Scrum Master risk focus

One of the Scrum Master's core responsibilities is revealing the things nobody else sees - and that includes risks of all forms and types. Sometimes, the Scrum Master actively has to examine risks from the other roles' focus areas to identify the need for change. Additionally, there's a group of risks that will often require action on behalf of the Scrum Master:

Organizational risks

Organizational risks, in this context, are risks induced by the way the team and its environment are organized. Such risks occur within the team, at the interaction points between the teams and the surrounding enterprise, and are also imposed from outside the teams' immediate horizon. Most of them occur at friction points, that is, where incompatible parts of an organization collide.

Typical organizational risks include asynchronicity, miscommunication, bottlenecks, communication gaps or unavailability of individuals, as well as mismatched goals or priority conflicts. There is usually a positive correlation between organization size and organizational risk.

Two core activities where organizational risks are identified are Planning and Retrospectives. In SAFe, that would include PI Planning and I+A workshops, where the SM should both provide input and track relevant action items.