Sunday, December 19, 2021

The Agile Tragedy of Commons

There's a socioeconomic dilemma called the "Tragedy of the Commons," which effectively means: "In unregulated systems, local optimization may become so extreme that the system becomes unsustainable."
And here's why you need to understand it in order to understand "agility at scale."

Before we get started, though, let me explain the dilemma:


The Tragedy of Commons

Imagine there's a huge, pristine green pasture. The lord of the land has decreed that everyone is free to use this pasture for shepherding their flock.

The first shepherd who arrives finds a vast green pasture, and the flock is happy to graze on the fields.

Soon afterwards, another shepherd arrives, and their sheep graze happily on the lush pasture as well. Both shepherds are happy, as their sheep have ample food to eat.

In the excellent conditions, the flocks multiply.

As the flocks grow, there is no longer an overabundance - the sheep of the two shepherds begin competing for food. The first shepherd's sheep had more time to multiply, and the second shepherd's sheep lack the required conditions to multiply freely.


Both flocks compete over increasingly scarce food: the sheep lack nutrition and are threatened by starvation. The first shepherd feels that the second shepherd's presence has caused this threat to their flock. The second shepherd considers the first to be using the unfair advantage of a bigger flock to drive them off the pasture. Quarrels arise.

The feudal lord settles the dispute by dividing the once lush green pasture into an allotted segment for each shepherd, based on the size of their flock. Both shepherds now have access to less land than they could access before - but each now has full control over their flock.


Should the lord have settled the dispute in this way? Should the lord have found another solution? Take some time to think. What would have happened - had the lord not intervened?


Tragedy of Agile Commons

There are many, and massive, applications of the Tragedy of the Commons in the realm of software systems - even more so in scaled environments. In the land of Agile, they're much more visible than in the land of Waterfall, where the lots are divided before they even exist: Agile teams incrementally and empirically discover their next best move based on where they are today - perfect preconditions for the Tragedy of the Commons.

Common Code

Teams who haven't learnt discipline in code will find it highly irritating to see one developer interfering with another developer's code: "my" coding style is always better than "yours."

Quickly, quality problems arise on the Common Codebase: quarrels over style, functionality placement, inconsistencies and many others.

As more and more developers enter the scene, the tendency for code silos built around personal preference, coding style, technologies, or even domains increases.

Whereas one team can often be left to roam the green pastures of service implementation freely, the field quickly turns into a brown quagmire when multiple teams all play to their own preferences, until the code base becomes entirely unworkable.

Once teams build fences around their codebase, Collective Code Ownership becomes a thing of the past, and Component Teams find themselves entertaining a dreadful nightmare of coordination meetings.

Better approaches could be:

  • code conventions
  • linting rules (see the sketch below)
  • cross-team Pairing sessions
  • Code dojos
  • Continuous Integration / Continuous Delivery

- but these things are all topics large organizations struggle with after the code has been divided into silos.
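
To make the linting point concrete, here's a minimal sketch of how a team convention can be turned into an automated check instead of a recurring quarrel. It's purely illustrative - the convention chosen here (public Python functions need docstrings) and the script itself are assumptions, not a prescription:

    import ast
    import sys

    def check_docstrings(path):
        """Flag public functions without docstrings - one example of
        encoding a team convention as an automated rule, not a debate."""
        with open(path, encoding="utf-8") as f:
            tree = ast.parse(f.read(), filename=path)
        problems = []
        for node in ast.walk(tree):
            if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
                if not node.name.startswith("_") and ast.get_docstring(node) is None:
                    problems.append(f"{path}:{node.lineno}: public function "
                                    f"'{node.name}' lacks a docstring")
        return problems

    if __name__ == "__main__":
        issues = [p for f in sys.argv[1:] for p in check_docstrings(f)]
        print("\n".join(issues))
        sys.exit(1 if issues else 0)  # non-zero exit lets CI enforce the rule

Hooked into the Continuous Integration pipeline, a check like this settles style quarrels before they ever reach a human reviewer.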

Common Innovation

A greenfield project is often a great way to introduce new, more powerful technologies that allow teams to do more with less.

As the development organization matures, and standards begin to shape the landscape, new ideas become exotic and marginal, struggling to overcome inertia.

Imagine - just for example - introducing agent-based logic into an architecture driven by RPCs and SOAP requests: there will be few takers for such an innovation.

The Tragedy of Common Innovation is that new ideas have to find a really small niche when the field is already taken by old ideas. Many good ideas go extinct before they can take hold.

With a constant decline of innovative ideas, organizations eventually find themselves investing massive effort into servicing outdated ways of working and technologies, crippling their ability to deliver high value to the customer the way others do.

Better approaches might be:

  • innovation allotments
  • hackathons
  • innovation sessions
  • innovation champions
  • cross-team collaboration on innovation
  • intrapreneurship

Common Meetings

Have you ever been in a 2-hour meeting with 40 people? Did you ever pay attention to how many people actually speak? Hint: it's most likely not an even distribution.

Small organizations find their meetings very effective, but as more and more people appear on the scene, meeting effectiveness quickly declines. And there's a reason.

In an effective 3-people, 1-hour meeting, every person gets to speak roughly 20 minutes. That's a lot of time to voice ideas, offer feedback and draw conclusions. That's a 33% activity ratio. And everyone has pretty much the same understanding afterwards.

When we contrast this with a 30-people, 2-hour meeting, simply dividing clock time shows that every person gets to speak an average of 4 minutes while being forced to listen for an average of 116 minutes. The ratio of ideas contributed versus passivity is staggering for each individual - the activity ratio has dropped to a mere 3%! In such a scenario, the tragedy of common meetings is that some of the more experienced people take the stage, and everyone else becomes decoration.
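
The arithmetic is simple enough to sketch in a few lines of Python - the even split of speaking time is an assumption, and real meetings are far more skewed, which only makes things worse:

    def activity_ratio(people, duration_minutes):
        """Average speaking and listening time per person, assuming
        talk time is split evenly across all attendees."""
        speak = duration_minutes / people
        listen = duration_minutes - speak
        return speak, listen, speak / duration_minutes

    for people, minutes in [(3, 60), (30, 120)]:
        speak, listen, ratio = activity_ratio(people, minutes)
        print(f"{people:>2} people, {minutes} min: speak {speak:.0f} min, "
              f"listen {listen:.0f} min, activity ratio {ratio:.0%}")
    # ->  3 people, 60 min: speak 20 min, listen 40 min, activity ratio 33%
    # -> 30 people, 120 min: speak 4 min, listen 116 min, activity ratio 3%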

Solution approaches might be:

  • focus sessions
  • using a need-to-know principle
  • Law of Two Feet
  • breakout sessions
  • topic ownership

Specialisation also removes the need for everyone to participate in all discussions.

The tradeoff is mainly between not everyone getting firsthand information and people suffering through hours of only marginally relevant meetings. Every solution has its downside.

Common Work

A single developer can work on any code at any time, and there will be no unpredicted side effects like merge conflicts caused by others' work. Small teams will usually learn quickly how to coordinate so that they minimize mutual interference.

Without good engineering practice, delivering a larger, integrated piece of software means lots of simultaneous changes in many places. Teams will either get into a Riverdance of constantly stepping on each other's toes, or they will require so much coordination that things get really messy. Of course, the "solution" is - once again - code silos and dependency hell: productivity tanks as risks and delays rise skyward.

Every developer joining an organization that hasn't adequately dealt with the Tragedy of Common Work will make every other developer's productivity decline - up to the point where the net productivity gain of hiring additional developers may be negative, i.e. with each new hire, the organization gets less productive overall!
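
To illustrate that claim, here's a toy model in the spirit of Brooks's Law - the numbers are assumptions chosen for illustration, not measurements: individual output grows linearly with headcount, while coordination cost grows with the number of pairs.

    def net_productivity(devs, output_per_dev=1.0, coordination_tax=0.05):
        """Toy model: linear individual output minus a pairwise
        coordination cost. Both parameters are assumed values."""
        pairs = devs * (devs - 1) / 2
        return devs * output_per_dev - pairs * coordination_tax

    for n in (5, 10, 20, 30, 40):
        print(n, round(net_productivity(n), 2))
    # -> 4.5, 7.75, 10.5, 8.25, 1.0
    # With these numbers, output peaks around 20 developers; beyond that,
    # every additional hire makes the organization less productive overall.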

Potential solutions could be:

  • visual dependency management
  • domain separation
  • decoupling
  • joint roadmap planning
  • cyclical synchronisation points
  • communication by code

Now what?

These are just four examples of how the Tragedy of Commons matters a lot in a Scaled Agile setting, and there are a vast number of potential commons.

Regardless of whether an Enterprise is new to agile ways of working or has been practicing them for a while: you need to establish overarching rules that mitigate these conflicts, lest you run afoul of the Tragedy of the Commons.

The "Tragedy of Commons" is entirely evitable in a holistic system where every participant sees themselves as an integral part of the whole. The solution is coexistence and collaborative conflict resolution rather than competition.

Ground rules address unregulated, harmful growth, a lack of discipline, and myopic actions, but each rule comes with a drawback: it reduces flexibility. While Team Autonomy needs boundaries where these serve the Greater Good, it's important to set these boundaries wisely, and to revisit them where they aren't useful. That can't be done by one group alone - in a common system, it has to involve all those whom it concerns.

Which boundaries will you set to prevent your organization from suffering the Tragedy of the Commons, and what is their cost?

Monday, November 29, 2021

The "Planning Tetris" Antipattern

 "We need to utilize all of our Story Points" - that's a common dysfunction in many Scrum teams, and especially in SAFe's Agile Release Trains where teams operate on a Planning horizon of 3-5 Sprints. It results in an antipattern often called "Planning Tetris." It's extremely harmful, and here's why.


Although the above feature plan appears to be perfectly optimized, reality often looks different: all items generate value later than they potentially could - at a higher cost, over a longer time, and with lower efficiency!


Accumulating Work in Process

Planning Tetris often leads to people starting work on multiple topics in one Sprint and then finishing it in a later Sprint. It is resource-efficient (i.e., maximizing the utilization of available time), not throughput-efficient (i.e., maximizing the rate at which value is generated).

That leads to increased Work in Process, which is a problem for multiple reasons:

Value Denial

Just like in the sample diagram above, "Feature 1" and "Feature 2" could each be finished in a single Sprint. And still, Feature 1 doesn't provide any value in Sprint 1, and Feature 2 has no value in Sprint 2. So we lose one Sprint of marketability on Feature 1 (our highest priority) - and on Feature 2 as well:
A perfect example of how maximizing team utilization delays value!
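
A quick sketch of what that delay costs, with made-up Sprint numbers and per-Sprint feature values:

    def value_denied(features):
        """Sum of the value forfeited by shipping features later than
        possible. All numbers below are assumptions for illustration."""
        return sum((shipped - possible) * value_per_sprint
                   for possible, shipped, value_per_sprint in features)

    # (Sprint it could ship in, Sprint it actually ships in, value per Sprint)
    plan = [(1, 3, 10_000),  # Feature 1: finishable in Sprint 1, done in Sprint 3
            (2, 3, 8_000)]   # Feature 2: finishable in Sprint 2, done in Sprint 3
    print(value_denied(plan))  # -> 28000 of value denied by interleaving work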

Loss of money

Imagine now that every feature costs less than it's worth (which it should - otherwise it wouldn't be worth developing), and you see that the "saved" efficiency of having worked on features 3 and 4 before finishing feature 1 costs the company more money than it adds in benefit.

Efficiency loss

You may argue, "different people are working on the features, so there's no multitasking."
Yes - and no. What is happening?
Sprint Planning for Sprint 1 has to discuss three features: 1, 3 and 4. This means the whole team is discussing three different topics, none of which will be delivered in that Sprint. The same happens in Dailies and Reviews - and potentially at the source code level as well. The feature interference may also bloat the complexity of technical configuration, deployment processes and the like.
The team becomes slower, hence less efficient.

Adding needless risk

In statistics, there's a phenomenon called "the high probability of low-probability events." Let me explain briefly: there's a near-endless number of highly unlikely events, and while each is improbable on its own, the chance that at least one of them occurs approaches certainty. Something will happen. You just don't know what, or when, so you can't prepare or mitigate. Since you don't know which aspect of your plan will be affected when a risk hits, you'll always be caught by surprise.
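
The arithmetic behind "something will happen," sketched briefly - each individual risk is tiny, but the chance that at least one of many independent risks fires grows fast:

    def p_any(p_each, n_events):
        """Probability that at least one of n independent events occurs."""
        return 1 - (1 - p_each) ** n_events

    for n in (10, 100, 1000):
        print(n, round(p_any(0.01, n), 3))
    # -> 0.096, 0.634, 1.0 (rounded): with enough 1%-risks in play,
    #    *some* surprise is near certain.
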
How is that a bigger problem in Planning Tetris than in sequentialized delivery?

Massive ripple effect

When you're working on one topic, and an event hits that affects your entire team, you have one problem to communicate. When the same happens as you're working on multiple topics, all of them are impacted, and you're generating a much stronger ripple effect.

Complex mitigation

As multiple topics are in process, you suddenly find yourself mitigating multiple topics. And that means multiplicative mitigation effort - less time to work, and at the same time a higher risk that not all mitigations are successful. You end up with a higher probability of not being able to get back on track!

Chaotic consequences

Both the ripple effect into the organization and the mitigating actions can lead to unpredicted consequences which are even harder to foresee than the triggering event. In many cases, the only feasible option is to surrender, mark all started topics as delayed, and try to clean up the pieces from there.



Prepare to Fail

There's Parkinson's Law - "work always extends to fill the amount of time available." That's often used as an argument to start another topic, because it stops gold-plating and keeps people focused.
But there's also the (F)Law of Averages: "Plans based on averages fail half the time."
The latter makes Planning Tetris a suicidal approach from a business perspective: it starts a vicious circle.
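
A minimal Monte Carlo sketch of the (F)Law of Averages - the task distribution here is an assumption (right-skewed, averaging two days per task), but the effect is robust:

    import random

    def overrun_rate(tasks=5, avg_days=2.0, runs=10_000):
        """Plan capacity = tasks x average duration, i.e. zero slack.
        Durations are drawn from an assumed right-skewed lognormal
        distribution whose mean is avg_days."""
        sigma = 0.6
        mu = 0.513  # ln(2) - sigma**2 / 2, so the mean is ~2.0 days
        capacity = tasks * avg_days
        late = sum(
            sum(random.lognormvariate(mu, sigma) for _ in range(tasks)) > capacity
            for _ in range(runs)
        )
        return late / runs

    print(f"{overrun_rate():.0%} of simulated Sprints overrun the plan")
    # Close to half of all Sprints fail - just as the (F)Law predicts.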

Predictable failure

Because there's no slack built into Planning Tetris, the mid-term plan will automatically fail as soon as a single feature turns out to be more complex than planned. The more features are part of our Tetris stack, the more likely it is that at least one of them will fail. And the team will usually get blamed for it. Because of that, we end up with ...

Conservative estimates

Teams bake slack buffers into their feature estimates to reduce the probability of failure. But when a Tetris plan spans multiple Sprints, some feature content may not be "Ready" for implementation during the Sprint when slack would be available - so Parkinson's Law kicks in, and the buffered estimates don't reduce the probability of failure.

Declining throughput

At this point, Parkinson's Law tag-teams with the Flaw of Averages to KO the team: Regardless of how conservative the estimates, the team will still end up failing half the time. The consequence is that business throughput continues to decline (there's an interesting bottom: when a Sprint only contains one feature!) 


Strangulating the team

Let's take a look at the psychological impact of Planning Tetris now as well:

No space for Creativity

I have never seen an organization where Product Management was happy about developers adding "creative spaces" to a Tetris plan. It's all about churning out feature after feature after feature, without a pause, without a break. When one feature is done, another is already in progress. There is no room to be creative.

No space for Growth

The only relevant business outcome in Tetris plans is usually business value delivered. This ignores that developers are the human capital of the organization, and that growing them grows the organization's ability to deliver value. Especially in the rapidly changing tech industry, not growing means falling behind until, eventually, the team is no longer competitive.

No space for Improvement

I often advise that developers take some time to look at "Done" work, reflect on how it could have been done better, and turn that better way into action. With Planning Tetris, that opportunity doesn't exist - another feature is always waiting, and improving something that exists is always less important than delivering the next big thing. That often ends in terrible products which are no joy to deal with - for developers and customers alike!



Now ... what then?

The point that Planning Tetris is a terrible idea should be blatantly obvious.
"Now what's the better way then?" - you may ask.

It sounds incredibly simplistic, because it is actually that simple.
  1.  Reduce the number of features the team works on in parallel to an absolute minimum. This minimizes the blast radius.
  2.  Instead of having people parallelize multiple topics, let "inefficient", "not-skilled" people take easier parts of the work to step up their game. That reduces the impact of low-probability events and gives everyone air to breathe.
  3.  Put slack into the Sprints. The gained resilience can absorb impact. It also reduces the need for buffered estimates, countering Parkinson's Law and the Flaw of Averages. It also gives people air to breathe.
  4.  Agree on Pull-Forward. When the team feels idle, they can always pull future topics into unused idle time. Nobody complains when a topic is finished ahead of time; everyone complains when something turns out late. Pull-Forward has no ripple effects or chaotic consequences.

Ok, too many words, so TL;DR:
  1. Sequentialize.
  2. Slack.
  3. Pull.
All problems mentioned in this article = solved.

Monday, November 15, 2021

From Standards to Baselines

Many - especially large - organizations are looking for standards that they want everyone to adhere to. The idea is that "standards reduce complexity." Yes and no. Unfortunately, there's a risk that standards create more complexity than they are intended to reduce. 

Let's take a look at the issue by using Scrum as a showcase. Whatever I say about Scrum will also apply to Kanban, DevOps, CI/CD - and many other topics.



The Standard

There's no argument that Scrum is a de-facto standard in the industry for many teams. Many organizations mandate that development teams must use Scrum, and rigorously enforce adherence to a company standard of Scrum. While it's entirely possible to use Scrum this way, it misses the point of Scrum: as the mechanics of Scrum are honed to perfection, the core ideas of flexibility and continuous improvement are lost. Teams lose ownership as their way of working is externally imposed.

Teams using Scrum as a standard lose the ability to evolve beyond Scrum. Scrum becomes their mental shutter - they become unable to think in a different way.


Broken Scrum

Teams confined by Standard Scrum often feel that it is far too restrictive. Especially inexperienced teams often suffer from poorly implemented practices, which seem to have no value and just generate overhead for the team. Not being aware of the actual intent, and unable to discern intent from practice, they proverbially throw out the baby with the bathwater: "Scrum is broken," and Scrum is discarded.

Such teams fall below the baseline of Scrum, and they think that Scrum is the problem.


The Baseline

Instead of considering Scrum as the confines within which development must be organized, a team can also perceive Scrum as their baseline. Understanding Scrum as a Baseline means there's no prescription of what you must do or how to do it - it doesn't even tell you that you need to use Scrum. What it tells you is what you must have in order to be at least as good as a Scrum team could be.

For example - everyone should be able to tell at any point in time what the team's highest priority is.
And there should be closed feedback loops both for decisions and execution.
And the team should apply double-loop learning at least in monthly cycles.
And so on.

Now: what's the difference?


From Standard to Baseline

What may sound like a game of semantics makes a massive difference in practice:

Standards create restrictions. Baselines foster growth.

Standard Scrum teams often find themselves rendered ineffective, not because of Scrum, but because the standard stops Continuous Improvement as soon as it touches the rules of Scrum. Baseline Scrum teams aren't concerned with the rules of Scrum - they're concerned with being "at least as good" as Scrum would have them be. A team on a Baseline of Scrum can do whatever they want. There is no rule that the team must use Scrum. Instead, Scrum becomes a list of things that help the team inspect and adapt.

For example, there is no rule such as "we must have Retrospectives." But there is a benchmark - the team should frequently re-examine their ways of working and actively work on improving their effectiveness. There are other means than a Sprint-end Retrospective to achieve this: for example, extended coffee breaks with deep discussion.

Standards can be measured and assessed based on adherence to a fixed set of rules and practices: it's all about checking the boxes.

Baselines can't be measured in this way. They must be measured based on outcomes: What is the value we're trying to get out of a standard?  And: are we getting at least that?

Measuring adherence to a standard leads to improvements primarily focused on compliance. At full compliance, the journey of change ends. Measuring baseline performance is much more complicated: there is no "true/false" answer as there is for compliance, and the scale is open-ended - it knows no "perfect," so there's always room for improvement.

 

Now what?

I would advise everyone to look at everything in the Agile domain - Values, Principles, Practices, Frameworks and even tools - as Baselines: "How far do we go beyond that?"

If the answer is "Not even there," then have a discussion of how you can up your game. Maybe adopting that thing is a simple, easy way to improve?

However, if the answer is, "Already far beyond," then compliance is off your list of worries. Even if you don't have that thing, you most likely won't need it.

Monday, November 1, 2021

Four key metrics for transformation effectiveness

What matters for a company in an "Agile Transformation?" Well, let me give you my perspective. Here are four key metrics which I would advise to track and improve:


Customer satisfaction

Customer Satisfaction can be both a leading and a lagging indicator. It's leading, because it informs us how likely it is that our next move will grow our business - and it's lagging, because it tells us how well we did in the past.
We can measure it asynchronously with tools like the Net Promoter Score or by observing Google ratings. Softer methods include customer interviews. Modern, technological means include A/B tests, conversion rates and resubscription rates.
Regardless of which of these you measure: if you see this indicator going down, you're most likely doing something wrong.

A proper change initiative relies on being able to track user satisfaction in some way and use that to inspect and adapt accordingly.

Employee happiness

Employee happiness is a pretty strong leading indicator for potential success: happy staff tend to do everything in their power to help the company succeed, because they like being part of this success. In reverse, unhappy staff are a leading indicator for many other problems that only become visible once they hit.

I'm not a huge fan of employee morale surveys: depending on the organizational culture, they get gamed or weaponized, so people give scores of 100% and bolt for the door at the next opportunity.
The minimum you can do is measure staff attrition - if your attrition rates are above the industry average, you're doing something wrong. If you're significantly below, you're probably doing something right.
At a more detailed level, it does indeed help to look at factors like psychological flow, change fatigue, diversity and compensation fairness, although we need to be careful that these are used as indicators for inspection and adaptation, not as the next management KPI to be gamed.
 

Throughput rate

Throughput rate is a leading indicator for capability and capacity: when you know your throughput rate and the amount of work ahead, you can predict fairly well what will happen when.

The effectiveness of an organization can be tracked by looking at end-to-end throughput rates of the core value streams. Some examples are the duration from lead to conversion, from demand to delivery, or from order to cash.
Take a look at queues, lack of priority, overburdened employees and unavailable resources. By tweaking these levers, throughput rate can often be doubled or more, without any additional effort or expense.
Although it may seem counter-intuitive, the key is not to get people to "work harder" - it is to eliminate wait time, sometimes by having some people do less work. For that, we must understand where wait time accumulates and why it does so.
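
The basic forecasting arithmetic follows directly from Little's Law - a sketch with assumed numbers:

    def forecast_weeks(items_ahead, throughput_per_week):
        """Little's Law in its simplest form: expected wait =
        work in the pipeline / throughput rate."""
        return items_ahead / throughput_per_week

    # Assumed numbers: 40 items queued, 5 items finished per week.
    print(forecast_weeks(40, 5))  # -> 8.0 weeks until the queue drains
    # Halving the queue helps exactly as much as doubling throughput -
    # which is why attacking queues beats telling people to work harder.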


Financial Throughput

The proof of the pudding is financial throughput, a lagging indicator. Be wary - it can't be measured by department, and it can't be measured by unit. It also can't be measured exclusively by looking at the cost sheet and the amount of work done.
Financial throughput is the one key metric that determines business success: it's the rate at which we're earning money!
We have two significant levers for financial throughput: speeding up the rate at which we turn investment into returns, and reducing the size of the investments so as to tie up less capital. Ultimately, combining both is the sweet spot.


How to improve

Customer and Employee satisfaction

These metrics depend mainly on the managerial system of the company: how goals are set, how people are treated. Usually, there's a strong correlation: happy employees make customers happy, and happy customers give positive feedback to employees.
Deming noted in his 14 points that management must work to "remove barriers that rob people of their right to pride of workmanship."

Transparency is one factor, getting out of the way is another. Removing policies that reduce people's ability to do the right thing is yet another. A management committed to quality, that is, fixing things that lead to poor outcomes, is vital here.
And, of course, the ultimate key here is letting people own their process.

Throughput rate

This third metric depends on flow efficiency. 
Note that "flow efficiency" is not "resource efficiency:" A process where people and/or resources operate without slack is usually dysfunctional and will falter at the slightest hiccup. Process flow requires resilience. 
Queues are a universal killer of throughput rate, so avoid queues wherever and whenever possible.

In software engineering, process efficiency is mainly determined by engineering practice: Software Craftsmanship and Continuous Delivery (CD). The former requires people to know how to develop software using the most appropriate techniques, such as Clean Code practice. The latter requires some tooling, a product architected for CD, a high commitment to quality, as well as policies and practices consistent with the intent of CD.


Financial Throughput

The final metric depends on how aligned decision-makers are with their customer base and their organization.

While financial throughput relies on the organization's operative throughput rate, we have to look at which things affect our financial throughput and enable our organization to do more of them - and quicker. In many cases, that means doing less of the things that have sub-optimal financial throughput - for example, eliminating "failure demand" (work that's only required because something else went wrong) or "null objectives" (targets which do not affect these four metrics).



And how about Agile Frameworks?

Nowhere does this article mention a specific "Agile Framework." This is not an oversight - frameworks are irrelevant, or potentially even harmful, in a discussion of business-relevant metrics. They can be a tool in the solution space, depending on where we come from and which challenges we face.

For example, if we're challenged on engineering practice - we can't solve that with Scrum or SAFe.  Likewise, if we have customer satisfaction issues: Kanban doesn't even consider the topic.

Not even "Agile" is either a relevant means, nor a relevant outcome. Working "Agile" is merely one possible approach that tends to be consistent with these metrics. Where agile values, principles, practices and mindset help us improve on these metrics, they are valuable. But when that is not the case, they aren't worth pursuing.


Sunday, October 3, 2021

Five reasons for having a Definition of Done

"Do we really need to waste our time to come up with a Definition of Done?"  - well: of course, you're free to do that or not. Just give me your attention for a few minutes to consider the multiple benefits you gain from having one.



Before we start
Many people use the term "Done" in reference to their share of the work, as in, "I am done." This leads to multiple, often inconsistent, definitions: "development done," "testing done," "deployment done." While this may be a choice the team makes, customers don't care who specifically is done with their own share of work. They care when they receive a product - hence:
The Definition of Done is not set for specific phases of the working process
- it refers to product increments that have passed all stages of the team's process.
As such, it encompasses all the usual activities related to an organization's development process and applies to every single product backlog item.

Now - what does a Definition of Done offer?

#1 - A Quality Standard

"But I thought you had performance tested it?" - "No, a performance test was not part of the acceptance criteria!" - "It's obvious that with that kind of performance, the feature is entirely unusable!" - "Then you should have requested us to performance test it!"

Well - in such a conversation, nobody is a winner: the customer is dissatisfied because they don't get what they need, developers are dissatisfied because they just got an extra load of work, and the Product Owner is dissatisfied because they ended up getting no deliverable value.

The Definition of Done aligns stakeholder expectations on product quality.

Should the customer be bothered to re-confirm their quality expectations on every single user story, and should they be required to re-state that this expectation applies for every release? No.

Once everyone is clear that something is a universal quality requirement, adding a simple statement like "all pages load in less than a second" makes it clear that the team's work isn't done until the customer's demand is satisfied.



#2 - Common understanding

"I'm done." - "When can I have it?" - "Well, I still need to commit it, then we'll produce a build package, then it goes to testing, let's see where it goes from there ..." - "But didn't you say you were done?" - "I'm done with development."

When different people have different mental models of what "done" means, everyone uses the term in the way that is most convenient for them.

The Definition of Done defines how the term, "Done" is used in the organization.

So much organizational waste - from extended meetings, to people trying to do things that can't possibly succeed, all the way to massive management escalations - is attributable to the misaligned use of this short, simple word: "Done."



#3 - Simpler communication

"I'm done." - "Did you do any necessary refactorings?" - "Will do later." - "If it's not refactored, you can't commit it! And did you get your code reviewed?" - "Do I really need to?" - "That's part of our policy. Where are your unit tests?" - "I don't think that module needs tests." - "Dude, without tests, it's un-maintainable! And did you check with the testers yet?" - "Why should I?" - "What if they found any problems?" --- "Sorry to disturb: I saw that the feature is marked as Done. Can I use it yet?" - "Almost." - "No!" - "Okay, I'm confused now, I'll set up a meeting later so you can explain the status to me."

When everyone understands what needs to be done to be "Done" (pun intended), communication is much simpler - the long, deep probing to discover the actual condition of an individual work item becomes unnecessary.

The Definition of Done simplifies conversations.

Everyone should be clear what it means when someone uses the term "Done" - what must be understood, and what can be understood.

Everyone must understand, "When it's not covered by the DoD - you can't expect that someone did it, especially when it wasn't explicitly agreed beforehand."

Likewise, everyone can understand, "When it is covered by the DoD - you don't need to ask whether someone did it when people say it was done."



#4 - Providing clarity

"I need an online shop." - "No problem." - "Can you store the orders in the database?" - "We know how to do our job!" - "Can you make sure that baskets don't get lost when users close the browser?" - "That makes it much more complicated. How about we do a basic version now, and we'll add basket session persistence later on once you have a solid user base?" - "Can you make sure the shop has good performance?" - "What do you mean, 'good performance?'"

Stakeholders are often unaware of what the team normally does or doesn't do, and of what they can definitely expect from the product. Hence, they may communicate incomplete or overly specific requirements to the team - both of which are problematic.

Incomplete requirements lead to low-value products that lack essential features, oftentimes leading both parties to conclude that the other party is an idiot.

Overly specific requirements, on the other hand, usually lead to sub-optimal implementations and waste effort when there are easier, better ways to meet user expectations than specified by the customer.

The Definition of Done avoids over-specification on items covered in the DoD.
It likewise avoids under-specification for points not covered in the DoD.

Within the confines of the Definition of Done, the team gains the freedom to decide what to do and how to do it, as long as all aspects of the DoD are met. It allows customers to stay out of details that the team handles within the standards of their professional ethics.



#5 - Preventing spillover work

"We're done on this feature." - "Splendid. Did you do functional testing?" - "Yup." - "And?" - "3 defects." - "Are they fixed yet?" - "If we'd do that, we'd not meet the timeline, so we deferred them until after the Release." - "But ... doesn't that mean you still have work to do?" - "Not on the feature. Only on the defects." - "But don't defects mean the Acceptance Criteria aren't met?" - "The defects are so minor that they can be fixed later ..."

We see this happening in many organizations.  Unfortunately, there are two insidious problems here:

1. Based on the Pareto principle, the costs of the undone work could massively outweigh the cost of the done work, potentially toppling the product's entire business case. And nobody knows.

2. Forecasting future work is a challenge when capacity is drained in an ill-defined manner. The resulting loss of transparency decreases customer trust and generates stress within the team.

The Definition of Done ensures that there is no future work induced by past work.

The Definition of Done is a protection for the team, in that they will not accumulate a constantly rising pile of undone work which will eventually incapacitate them.

Likewise, a solid DoD protects the business, because there is a much lower risk that one day, developers will have to state that, "We can't deliver any further value until we invest a massive amount of time and money to clear our debt."



Summary

The reasons for having a Definition of Done may vary from team to team, and each person might find a different reason compelling. While it's definitely within the realm of possibility that none of the benefits outlined in this article are meaningful in your context, at least ponder whether the hour or two it takes to align on the key points of a Definition of Done is worth the amount of stress you might incur by not having one.

A Definition of Done is not cast in stone - it's up to negotiation, and points can be added and removed during Inspection and Adaptation events, such as team Retrospectives. As long as everyone can agree to a change, that change is legitimate.

If you don't have a DoD yet, try with a very simple one and take it from there.

As a conclusion of this article, I'll even throw in my favorite minified DoD:

No work remaining.

Sunday, August 8, 2021

The Product Owner Role

What concerns me about the Product Owner role: it's so horribly diluted by many organizations that practitioners sometimes ask me, "What's the meaning of my work, and what should I do in order to do my job well?"

There's so much garbage on the Internet regarding the Product Owner role that it's very difficult for someone without significant experience to discern a proper definition - one that would help companies write proper job descriptions and give PO practitioners guidance on how to improve. So - let me give you my perspective.


Great Product Ownership

I would classify great Product Ownership into four key domains:


While the Product Backlog is one of the Product Owner's core responsibilities, it should be nothing more than the expression of the Product Owner's intent - and this intent should be defined by acting on these four domains.

Product Leadership

The Product Owner should be providing vision and clarity to the team, the product's stakeholders, customers and users alike.

Product Vision

Who is better than the Product Owner at understanding what the product is, which problem it solves, why it's needed and where it's going? Great Product Owners build this vision, own it and inspire others to follow them in their pursuit of making it happen.

This vision then needs to be made specific by elaborating long-term, mid-term and short-term objectives - a guiding visionary goal, an actionable product goal and the immediate sprint goal.

Clarity of Purpose

While the Vision is often a bit lofty, developers need substantial clarity on "what happens now, what happens next?" - and customers will want to know "what's in it - for me?" The Product Owner must be crystal clear on where the product currently is, and where it's going next. They must be able to clearly articulate what the next steps are - and what the next steps are not. They need to be able to state at any given point in time what the highest priority is, and why that is so.

The Product Backlog is then the place where the PO maintains and communicates the order of upcoming objectives and content.

Communication

The Product Owner must communicate their product with both internal and external stakeholders. Life is never easy, so they must rally supporters, build rapport with sponsors, and resolve the inevitable conflicts amongst the various groups of interest.

Networking

The product can only be as successful as the support it receives. As such, the Product Owner must build a broad network of supporters, and continuously maintain and grow their influence in their organization - and for the product's market. Keeping a close eye on stakeholder satisfaction and interest, continuously re-kindling the fire of attention drawn to the product is essential in sustaining and fostering the product.

Diplomacy

As soon as multiple people are involved, there tend to be conflicts of interest. Even if there is only one single stakeholder, that stakeholder has choices to make, and may need to resolve between conflicting priorities themselves.

In peace times, the Product Owner builds common ground with stakeholders, so that they are more likely to speak positively of the product.
In times of crisis, the Product Owner understands the sources of conflict, calms the waves, reconciles differences, brings people together to work out positive solutions, and mends wounds.

Insight

The Product Owner is the go-to source for both the team and the product's stakeholders when they want to know something about the product's purpose or intent. The Product Owner has both factual knowledge and inspiring stories to share.

Product Knowledge

Caution - the Product Owner isn't a personified Product Instruction Manual, and they don't need to be. Rather, they should be the person who can explain why the product currently is the way it is, and why it's going to be the way it's going to be. They must fully understand the product's capabilities and purpose - and they must be able to convey why these are good choices.
From a more critical angle, the Product Owner must understand the weaknesses of the current product and have ideas for how to leverage or compensate for them.
And for all of this, the Product Owner should have the domain expertise, market information and hard data to back up their statements.

Storytelling

"Facts tell, stories sell." - the Product Owner's role is to sell the product, both to the team, and to the customers. They should be able to tell a relatable, realistic story of what users want to and/or are doing with the product, what their current pains are, and what their future benefits will be.
"Speaking to pain and pleasure" is the game - touch hearts and minds alike. The Product Owner should be NEAR their users, and bring developers NEAR as well.


Business Acumen

The Product Owner's primary responsibility is to maximize the value of the product, by prioritizing the highest value first, and by making economically sensible choices both in terms of obtaining funding and spending.

Value Decisions

There are three key value decisions a Product Owner faces every day:
  1. What is our value proposal - and what isn't?
  2. What value will we deliver now, and what later?
  3. What is not part of our value proposal, and will therefore not be delivered at all?
The question oftentimes isn't whether the customer needs something, but whether they need it so urgently that other things have to be deferred or be ditched.

When anyone, customer or developer alike, asks the Product Owner what is on the agenda today, this week, or this month - the Product Owner must be able to answer in a way that the underlying value statements are clear to all.

Economics

With infinite money and infinite time, you could build everything - but since we don't have that luxury, the Product Owner must make investment decisions - what is a positive business case, what is a negative business case, what can we afford to do - and what can we afford to not do?

The Product Owner should understand the economic impact of any choice they make: more people can do more work, but burn the budget faster. Every feature has an opportunity cost - all the other features that get deferred because of it. Fines could be cheaper than implementations, so not everything "mandatory" must be done. These are just a few examples.
There is often no straightforward answer to "What should we spend our money on this month?" - and considering all the trade-offs from every potential economic angle before bringing product-related decisions to the team or to stakeholders is quite a complex endeavour.

Economic decisions then need to be communicated transparently to the relevant organizational stakeholders - to team members, who may not understand where priorities come from; to customers, who may not understand why their request isn't being served; to managers, who may not understand why yesterday's plan is already invalid.


Given all of these Product Owner responsibilities above, it should be quite clear that the Product Owner must focus and has little time to take care of things that are ...

Not the Product Owner's concern

Three domains are often treated as expectations of the Product Owner which are actually a distraction from their responsibilities. Putting them on the PO's shoulders steals the time they need to do the things that make them a good Product Owner:


Project Management

The Product Owner is not responsible for creating a project plan, tracking its progress or reporting status.

Let's briefly describe how this is supposed to happen:

Planning is a collaborative whole-team exercise, and while the Product Owner participates and provides context, a goal and a sorted backlog as input, they are merely contributing as the team creates their plan.

Developers are autonomous in their work, and the Product Owner should rely on being requested for feedback whenever there's visible progress or any impediments hinder the planned outcomes. If the team can't bear the responsibility of their autonomy properly, that would be a problem for the Scrum Master to tackle. The PO should entirely keep out of the work.

Since Sprint Reviews are the perfect opportunity to inspect and adapt both outcomes and progress, no status reporting should be required. A "gemba mindset" would indicate that if stakeholders are concerned about progress, they need to attend the Reviews, and should not rely on hearsay, that is, report documents. 


Team Organization

The Product Owner is not responsible for how the team works, when they have meetings or who does what.

When Scrum is desired as a way of working, the team should have a Scrum Master. The worst thing a Product Owner can do with their time is bother with introducing, maintaining or optimizing Scrum - they should be able to rely on having proper Scrum in place.

Team events, such as Plannings or Reviews, help the team do their work, and as such, should be organized by the developers themselves, because only they know when and how they need these. The Scrum Master can support, and the Product Owner should attend - but the PO shouldn't be bothered with setting these up, and most definitely shouldn't run the entire show.

If anyone assigns tasks on a Scrum team, it's the team members self-organizing to do this. Having the Product Owner (or Scrum Master) do this job is an antipattern that will hurt the team's performance. The Product Owner should not even need to know who does what, or when.


Development Work

The Product Owner develops the product's position, not a technical solution. They have a team of experts to do this, and these experts (should) know better than the PO how to do this. That means the PO should be able to keep entirely out of design, implementation and testing.

Product Owners actively designing solutions often fall into the "premature optimization" trap, resulting in poor solutions. The best approach is to have the Product Owner collaborate with developers as needed to get sufficient clarity on how developers would proceed, but to focus fully on the "What" and keep entirely out of the "How."

When Product Owners have time for implementation, the product is most likely going to fail: while they're paying attention to the development, they aren't focusing on what's happening to the product and its customers out in the market.

Product Owners have a team of professionals around them who are supposed to deliver a high quality "Done" Increment. If their team has no quality assurance, the solution is to bring testers in, not to delegate testing to the Product Owner.

Thursday, August 5, 2021

Continuous Integration Benchmark Metrics

 While Continuous Integration should be a professional software development standard by now, many organizations struggle to set it up in a way that actually works properly.

I've created a small infographic based on data taken from the CircleCI blog - to provide an overview of the key metrics you may want to control, and some figures on what the numbers should look like when benchmarked against industry performance:



The underlying data is from 2019, as I could not find data from 2021.

Key Metrics

First things first - if you're successfully validating your build on every single changed line of code and it just takes a few seconds to get feedback, tracking the individual steps would be overkill. The metrics described in this article are intended to help you locate improvement potential when you're not there yet.


Build Frequency

Build frequency is concerned with how often you integrate code from your local environment. That's important because the assumption that your local version of the code is actually correct and consistent with the work of the remaining team is just that - an assumption, which becomes less and less tenable as time passes.

By committing and creating a verified, valid build, you reset the timer on that assumption, thereby reducing the risk of future failure and rework.

A good rule of thumb is to build at least daily per team member - the elite would validate their changes every couple of minutes! If you're not doing all of the following, you may have serious issues:

  • Commit isolated changes
  • Commit small changes
  • Validate the build on every single change instead of bulking up

Build Time

The amount of time it takes from a committed change until the pipeline has successfully completed - indicating that the build is valid and ready for deployment into production.

Some organizations go insanely fast, with the top projects averaging at 2 seconds from commit all the way into production - and it seems to work for them. I have no insights whether there's much testing in the process - but hey, if their Mean Time to Restore (MTTR) on productive failures is also just a couple minutes, they have little to lose.

Well, let's talk about normal organizations - if you can go from Commit to Pass in about 3 and a half minutes, you're in the median range: half the organizations will still outperform you, half won't.

If you take longer than 28 minutes, you definitely have to improve - 95% of organizations can do better!
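
If you can export your own build durations, benchmarking against these figures takes a few lines - a sketch with made-up sample data:

    import statistics

    def benchmark(durations_minutes):
        """Median and rough 95th percentile of build durations, to compare
        against the article's figures (~3.5 min median, 28 min at p95)."""
        durations = sorted(durations_minutes)
        p95_index = max(0, int(len(durations) * 0.95) - 1)
        return statistics.median(durations), durations[p95_index]

    # Made-up sample of commit-to-pass times in minutes:
    builds = [2.1, 3.0, 3.4, 3.8, 4.5, 6.0, 9.5, 12.0, 25.0, 31.0]
    median, p95 = benchmark(builds)
    print(f"median {median} min (benchmark: 3.5), p95 {p95} min (benchmark: 28)")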


Build Failure Rate 

The percentage of committed changes causing a failure.

The specific root cause of the failure could be anything - build verification, compilation errors or test automation. No matter what: I'm amazed to learn that 30% of projects seem to have their engineering practice and IDE tooling so well under control that they don't have this problem at all - great to hear.

If that happens to you about a fifth of the time, you'd still pass as average; but if a third or more of your changes cause problems, you should look to improve quickly and drastically!


Pipeline Restoration Time

How long it takes to fix a problem in the pipeline.

Okay, failure happens. Not to everyone (see above), but to most. And when it does, you have failure demand - work only required because something failed. The top 10% organizations can recover from such a failure within 10 minutes or less, so they don't sweat much when something goes awry. If you can recover within the hour, you're still on average.

From there, the distribution spreads hugely - further down the scale, recovery takes between 3 and 18 hours, and the bottom 5% take multiple days. The massive variation between 3 and 18 hours is easily explained: if you can't fix it before end of business, there's an entire night between issue and resolution.

Nightly builds, which were a pretty decent practice just a decade ago, would immediately drop you to or below the median - not working between 6pm and 8am automatically pushes you above 12 hours, which already puts you near the bottom.


First-time Fix Rate

Assuming you do have problems in the pipeline - which many don't even have - you occasionally need to provide a fix to return your pipeline to a Green condition.
If you do CI well, your only potential problem should be your latest commit. If you follow the rules on build frequency properly, the worst-case scenario is reverting your change - and if you're not certain that your fix will work, reverting is the best thing you can do to return to a valid build state.

Half the organizations seem to have this under control, while the bottom quartile still seems to enjoy a little bit of tinkering - with fixes being ineffective or leading to additional failures. 
If that's you, you have homework to do.


Deployment Frequency

The proof of the pudding: How often you successfully put an update into production.

Although Deployment Frequency is clearly located outside the core CI process: if you can't reliably and frequently deploy, you might have issues you shouldn't have.

If you want to be great, aim to move from valid build to installed build many times a day. If you're content with average, once a day is probably still fine. If you can't get at least one deployment a week, your deployment process is definitely at the bottom of the barrel, and you have definite room for improvement.

There are many root causes for low deployment frequency, though: technical issues, organizational issues, or just plain process issues. Depending on what they are, you're looking at an entirely different solution space: for example, improving technically won't help as long as your problem is an approval orgy involving 17 different committees.


Conclusion

Continuous Integration is much more than having a pipeline.

Doing it well means:

  1. Integrating multiple times a day, preferably multiple times an hour
  2. Having such high quality that you can be pretty confident there are no failures in the process
  3. And even when a failure happens, you don't break a sweat having to fix it
And finally, your builds should always be in a deployable condition - and the deployment itself should be so safe and effortless that you can do it multiple times a day.

Thousands of companies world-wide can do that already. What's stopping you?




Wednesday, July 21, 2021

Definition of Done vs. Acceptance Criteria

Let's clear up a very common confusion: What's the difference between "Acceptance Criteria" and the Definition of Done, and which is which?


Sometimes, people ask "what is the Definition of Done for this User Story?"
And then, they proceed to document a feature's Acceptance Criteria like this:
  1. Feature works
  2. All tests passed
  3. Documentation written
To make it short: in this example, ACs and DoD are backwards. Now, let's clarify.



Acceptance Criteria

A backlog item's Acceptance Criteria are stated from the perspective of the item's consumer: what defines, for them, that the item is successful?
I hesitate to say "user acceptance," because sometimes backlog items are not intended for the end user. If we ignore that case, though, Acceptance Criteria can be written like simplified User Acceptance Tests: verifiable statements that can be answered with "yes" or "no" and tested by examining the product itself.

Good Acceptance Criteria would be, for example:
  • I get informed about my current balance
  • I get a warning when my balance is negative
  • I get a notification when my balance has changed
As you see, Acceptance Criteria are things that the product does when the item is completed. They are not things that the development team does to the product.
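
To make "simplified User Acceptance Tests" tangible, here's what the three criteria above could look like as automated tests. The Account class is a stub invented for illustration - in reality, the tests would exercise the actual product:

    class Account:
        """Minimal stub standing in for the real product, invented here
        only so that the acceptance tests below are runnable."""
        def __init__(self, balance=0):
            self.balance = balance
            self.warnings = []
            self.notifications = []

        def deposit(self, amount):
            self._change(amount)

        def withdraw(self, amount):
            self._change(-amount)

        def _change(self, delta):
            self.balance += delta
            self.notifications.append(f"balance changed to {self.balance}")
            if self.balance < 0:
                self.warnings.append("balance is negative")

    # One test per Acceptance Criterion, in pytest style:

    def test_informs_about_current_balance():
        assert Account(balance=50).balance == 50

    def test_warns_when_balance_is_negative():
        account = Account(balance=10)
        account.withdraw(20)
        assert account.warnings

    def test_notifies_when_balance_has_changed():
        account = Account(balance=10)
        account.deposit(5)
        assert account.notifications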


Definition of Done

The Definition of Done is, as the term says, a definition of what everyone in the development organization - the Product Owner as well as stakeholders - should understand when someone uses the term "Done."
The Definition of Done is a set of mandatory activities that apply to every single backlog item. It's a checklist-style itemization of things that the team does to the product in order to call a single backlog item "Done."

An example of a Definition of Done is,
  • Code written
  • All tests Passed
  • Successful deployment to Test environment
  • Mandatory documentation completed
When you have such a definition, it does not make sense to add these points again to every single backlog item.


But ... the DoD doesn't apply to this item!

Well, it's entirely feasible that the effort to complete a specific DoD activity on an item is zero. In that case, the item is "Done" on that point by default, because all the required work is completed with zero work.
Another case is when you specifically want to make an exception from your DoD for a specific item. As long as it's an exception, you understand and accept the consequences, and everyone - including the customers - agrees to take this route, that's okay, because transparency is there.
Although I would caution: if that happens too often, you may want to check whether your DoD makes sense.

But ... it's a specific task for this item!

That's okay. Then add that task to the item. It's still not an Acceptance Criterion, and it doesn't affect the applicability of the overall Definition of Done.



Discerning: AC or DoD?

                 Acceptance Criteria              Definition of Done
Describes ...    What the product does            What the team does
Quality of ...   Product from user perspective    Work from process perspective
Applies to ...   Specifically this backlog item   Generally all backlog items
Met when ...     Verified by a test               Agreed inside team

Thursday, July 8, 2021

The wrong professionals

Sometimes, I struggle with teams failing to understand the engineers' responsibility for quality: "I asked the Product Owner whether we should apply Clean Code practices, and she said she doesn't need it."

This is already not a conversation that should happen.

Here's a little metaphor I like to use:

When my electrician asks me whether they should insulate the wiring, I have the wrong electrician. 

And that has a number of consequences:

  • Professionals do not negotiate the standards of professionalism with their customer. 
  • Customers expect that the professional brings and adheres to professional standards, which is why they get hired. 
  • Customers are not in a position to judge whether a professional standard is applicable in their context, and asking them to do so shifts responsibility to someone who can't bear it. That itself is un-professional.

So, what does that mean for your Agile team?
  • Be as professional as you can, and continuously improve.
  • Do not ask the customer to tell you what "professional" is.
  • Instead, ask them whether your standards of professionalism satisfy their needs.
  • You can't delegate the responsibility for the quality of your work to anyone else.
    The attempt is already un-professional.


Sunday, June 27, 2021

The Code Review Department

Let me tell you the story of a company that had problems with their code quality, and solved it the Corporate way ... well, ehum ... "solved."


Poor Code Quality

When I met that organization, their development process basically looked like this:

Developers took their specifications, completed their coding and passed the code for testing. Simple as that. 

After a major initiative failed, the Lessons Learned revealed that the root cause of the failure was poor code quality, which made the code unmaintainable, hard to read, and difficult to fix without introducing new defects elsewhere.

A problem that would require rules and regulation to fix. So, it was decided that Code Reviews should be done. And - of course - people complained that code reviews took time, and that if developers did the Reviews themselves, they would be even slower in getting their work done.


Stage 1: Introducing: the Code Review Department

What would the Corporate way be, other than introducing a new department of specialists dedicated to this specific task? So, they opened the Code Review Department. Let's call them CRD.

To keep Code Reviews as neutral and objective as possible, the Review Department was strictly separated from the others: physically, functionally and logically. They were not involved in the actual development process, and only reviewed the code that was actually delivered.

With the backing of senior management, the CRD adopted a very strict policy to enforce better code quality and defined their departmental KPI's, most notably:

  • Remarks made per reviewer: How many review remarks each reviewer had found, per commit and per month. This allowed the CRD management to see which reviewers were most keen and could best spot problems.
Now, how could this possibly go wrong?

Stage 2: Code Quality improves

Since it's easy to objectively measure this data, the reviewers got busy and lots of review comments were made. Proudly, CRD management presented statistics to IT leadership, including their own KPI's as well as the obvious metric they could offer to measure the software delivered by the Software Development Department (SDD):

  • Code Quality per developer: How many review remarks were made, per commit and month, per developer. This allowed the SDD to put a KPI on the quality of code provided by developers. And, conveniently, it would also justify the existence of the CRD.

With the blessings of IT leadership, SDD Management thus adopted the KPI.

So, things were going well ...

Stage 3: Code Review has a problem

Now, developers aren't dumb people. They adopted Linting and got their Review remarks down to Zero within a pretty short time. Now, the CRD should be happy, shouldn't they?

Turns out, they weren't. And here was the problem: the Remarks per reviewer metric tanked. Not only did reviewers suddenly miss their quota - CRD management figured they probably weren't looking in the right places.

So what did the CRD reviewers - not being stupid, either - do? Whenever they looked at code and spotted a pattern they considered problematic, they introduced a new rule.

The numbers for Review remarks per reviewer rose again, and management was doubly happy: not only were the numbers of the CRD fine again, reviewers were continuously improving their own work.

Great! Things are even getting better!

Stage 4: Developers have a problem

Developers started to get frustrated. Their Lint rules no longer got them 100% passed reviews. Worse, they found that whenever the review rules got updated, code was rejected again, and they needed to figure out the new rule and add it to their Lint configuration. Not only did this process consume a lot of time, it distracted from actual development work.

Well, developers caught on that meeting the rules wasn't going to get them 100% passed reviews any more, so they began introducing honeypot infringements: they made obvious mistakes in their code so that reviewers would remark on them - and they'd already have the fix in place right when the review came in.

Everyone happy: CRD met their KPI's and developers were no longer forced to constantly adopt new rules. Except ...

Stage 5: Management catches on

CRD reviewers were content, because they had plenty of review comments again - until CRD management started to measure policy violations by type, and figured out that developers had stopped improving and were making beginner mistakes again. Of course, they reported their findings to higher management. And thus, a new KPI was born:

  • Obvious mistakes per developer: How many obvious review infringements were made, per developer, with a target of Zero - published transparently throughout the SDD.

Well, again, developers aren't stupid people. So, obviously, they would meet their KPI. How?

You might have guessed it: they hid their, ehum, "mistakes" in the code so they were no longer obvious, and then placed bets on who could get the most of them past Review without being caught.

Guess who won the game?

Stage 6: Code quality deteriorates

The CRD reviewers got stuck in a game of whack-a-mole with developers, who constantly improved their tricks of hiding more and more insidious coding errors, while updating their Linting rules right away when reviewers added a new rule.

Until the day when a developer hit the jackpot by slipping a Zero-Day exploit past Review.

The CRD management no longer trusted their own Reviewers, so they added reviews of the reviews - and another KPI:

  • Issues slipped past Inspection: Reviews were now a staged process where after review by a Junior Reviewer, a Senior Reviewer would review again to figure out what the first reviewer had missed. Every Reviewer would get a Second-Review Score, and that score would need to be Zero. So, they started looking deeper.

You can guess where this is leading, right?

Stage 7: Code quality goes to hell

Now, with four-eye reviews and a whole set of KPI's, surely nothing could go wrong any more?

Code Reviewers were doing splendidly. They always had remarks and the department's numbers truly validated that a separate, neutral Code Review Department was absolutely essential. 

So the problem was fixed now.

Well, except one small fly in the ointment. Let's summarize the process from a development perspective:

  1. When developers make mistakes, they are reprimanded for doing a poor job.
  2. When developers make no mistakes, new rules are introduced, returning to (1).

Developers now felt like they were on a sinking ship. It was easier to simply keep making mistakes against the existing rules than to adapt to new ones. They came to accept that they couldn't meet their KPI anyway.

Since they could no longer win, they stopped caring. Eventually, the review department got relabeled "the complaints department", and nobody took their remarks seriously any more. Developers now simply added buffer time to their estimates and called it the "Review buffer".

By now, the CRD was firmly established - and fighting a losing battle: try as they might, they got more and more overloaded, because genuinely necessary review remarks, and the ever more horrible code behind them, became more and more common. They needed to add staff, and eventually outnumbered the developers.

The Code Review Department became the last bastion against poor code quality. A bulwark defying the storming seas of bad code. 


So, what's your take ... 

is a Code Review Department a good idea?

How would you do it?

Wednesday, June 16, 2021

A day in the life of an Enterprise Coach

 "Michael, what does an Enterprise Coach do?" - it's a good question that people sometimes ask me, and frankly, I can't say I have "the" answer. So, I will give you a small peek into my journal. 

ECDE* (* = European Company Developing Everything) is a fictional client.
Like the company, the day is fictional. The described events are real. 

Disclaimer: This day is not representative of what all enterprise coaches do, nor of all the things an enterprise coach does. There is no routine. Every day is different. Every client is different. Every situation is different. Connecting the dots between all the events is much more important than all of the activities combined.


Before we start

"Enterprise Coaching" isn't simple or straightforward. There's often more than one coaching objective to pursue simultaneously, and progress requires diplomacy, patience, tons of compromises and long-term strategic thinking. Some topics can be solved in a single sessions, while others may take a long time to change. It may take years until people understand the things they were told on day 1 of their Agile training.

Whereas I typically coach for effectiveness, sustainability and quality, there's a potentially infinite number of enterprise coaching objectives, including - without limitation - the introduction of frameworks, methods, practices, cultures, mindset and so on. I see the latter as means to an end, not as viable objectives to pursue.

My definition of coaching success is therefore not "X teams doing Scrum", "Y Scrum Masters certified" or "Z Agile Release Trains launched." I define success as, "The client has the means (attitude, knowledge, learning, innovation) to achieve what they want."

On an average day, I jump a lot between all levels and functions of the organization - from team member all the way to senior management, from IT over business to administrative areas - and simultaneously work on short-term as well as long-term topics.

While I'm trying to "work myself out of a job", it's usually the lack of knowledge and experience regarding certain concepts or practices that requires me to stay involved longer and deeper than initially bargained for.


A coach's day

8:00 am - Getting started

I take some time to reflect. Yesterday was an eventful day at ECDE. A critical member in one of the teams just announced they would be leaving, we had a major production incident - and management announced they want to launch a new Agile Release Train. Business expressed dissatisfaction with one of the Release Trains and there are quarrels about funding. Okay, that's too much: I have to pick my battles.

So I choose to completely ignore the head-monopoly issue, the incidents and the business dissatisfaction. I trust that the teams can handle these: I am aware, but I wasn't asked for support.

There are no priorities for coaching in ECDE. My trains self-manage their Improvement Backlogs. I haven't gotten senior management to adopt a company-wide "ECDE improvements" backlog yet, which would create much more transparency about what's actually important.

The Tyranny of the Urgent is ever-present. I have to make time for strategy, otherwise I'd just run after the latest fires. Most of the stuff I came for are long-term topics anyways, but there are always some quick wins. 

So, what are the big roadblocks my client needs to overcome?

Ineffective organization, low adaptivity, lack of experience, and last but not least levels of technical debt that might exceed ECDE's net present value.

I check my calendar: Personal coaching, a strategy session and a Community workshop. Fair enough.


9:00 am - Personal Coaching / RTE

In a SAFe organization, Release Train Engineers (RTE) are multipliers of agile ways of working, practice and mindset within the organization, which is why I spend as much time with them as I can. They often serve as culture converters, constantly struggling to protect their Agile Release Train from the pervasive, continuously encroaching Tayloristic Command+Control mindset in the areas of management not yet participating in the transformation efforts.

With this come hundreds of small challenges, such as rethinking phase gates and reporting structures, decentralization, meeting information needs of developers and management alike, and driving changes to the organizational system to foster self-organization, growth and learning.

Some topics go straight into my backlog because they're over-arching and I need to address these with higher management. For others, we determine whether the RTE can handle these, needs methodology support (tools, methods, frameworks, canvases etc.) or self-learning resources (web articles, books etc.) I clarify the follow-ups and send some links with information.

The RTE role is challenging to do well, and oftentimes pretty thankless. It's essential that the RTE has those precious moments where the seeds of change come to fruition.


10:00 am - Sourcing Strategy

ECDE has outsourced to many different vendors scattered across the globe. And of course, every piece of work goes to the lowest bidder, so senior managers are looking at a development value stream as fragmented as it could possibly be. The results are underwhelming... "but hey, we're saving costs!"

I'm not a fan of cost accounting, but here I am, discussing cost of delay, opportunity costs, hidden costs, sunk costs and all of the costs associated with the Seven Wastes of Lean - re-writing the business case for the current vendor selection strategy and making the Obvious visible. We can't change long-term contracts on a whim, so we need a strategy. We schedule a follow-up with Legal and Procurement to explore intermediate options.

When you know what you need to do, and can't do it.


12:00 pm - Lunch time

The business dissatisfaction explodes into a spontaneous escalation. The line manager insists the teams must do overtime to meet the deadline for the expected fixed scope. I politely invite him to an Agile Leadership training. He declines. He demands that we must, quote, "get a proper Project Manager by the end of the month" and ends the call.

One step forward, two steps back. Happens all the time.


1:00 pm - Finally. Lunch.

A Product Owner pings me, because she's unclear about team priorities. Turns out the team wants to apply Clean Code principles, but the PO is concerned about Velocity. While I have my meal, we're having a conversation about the impact of quality upon speed and quantity. She decides to give the team room for improving their engineering practices. We agree to follow up in a month.

I shake my head. ECDE developers still need permission to do a proper job.


2:00 pm - Product Workshop

I'm joining a Product People Community workshop to introduce the concept of a Demand Kanban. I gather some materials, prepare a Mural board and grab a cup of tea. During the workshop, I explain some basic concepts, and we spend most of our time design-thinking some changes to the current process. We create a small backlog of experiments they would like to try.

The "knowledge" these POs got from their certification training is a laughing stock. I do what I can, although this will take a lot more than a handful of workshops.

 

5:00 pm - Let's call it a day.

A Scrum Master spontaneously calls. I'm really happy to have 1:1 conversations with people close to the teams, so I pick up despite my working day being over. Her question is inconspicuous. Instead of giving a quick answer, I'm curious what her team tried and learned. I smell a systemic issue of which she has only barely scraped the surface.

I suggest that she could run a Topic Retro with her team. She's stumped. For her, a Retro was always a 30-minute, "Good/Bad/Improve" session focused on the last Sprint, so she asks: "How do I do a Topic Retro?" This turns into a two-hour call.

ECDE provides abysmal support for new Scrum Masters. I decide to let it go, because there's a dedicated team in charge of Scrum Mastery. I feel bad for a moment, but my energy is limited.


7:00 pm - Finally, done.

Almost. Now, all I need to do is organize my coaching Kanban, then do the administrative stuff.

I take a look at the board and scratch my head: "Solved two problems today, found five additional ones." I take a few stickies from the "Open" column and move them straight into the trash bin.

It's almost 9pm when I turn off the computer. I reflect, and once again realize that while I emphasize "Sustainable Pace" to my clients all the time, I can't continue these long days forever. I should spend more time exercising.

Tomorrow, I'll do better.




Wednesday, May 5, 2021

Guess why nothing gets done?

There are great lessons to be learned from monitoring systems that directly translate to people. 

Take a look at this example graph:


When a CPU is running multiple tasks, it will optimize its performance to give - based on priority - adequate shares of time to each of its tasks. Some capacity is required to run the machine itself (System load), and some capacity is required to swap the state of the different parallel tasks in and out (Context switching).

When the "System" Load is high, we typically state that either the hardware is not meant for the operating system or that the Operating System is ill-configured, usually the latter. Every operation invested into the System, and every operation invested into Context is an operation not available to complete any task.


People and machines

People do exactly the same. But how would such a diagram relate to people in organizations? Let's translate.

Every person must divide their working hours into their different assignments, plus everything that accompanies it:


Tasks and Projects

Whether something is a machine's task or a person's task - it's a task. From a different level of abstraction, being assigned to a project is a major task. Working on 2 or 3 parallel projects is considered the norm in many organizations, so there are often multiple high-level tasks going on.


Context

As people switch between projects, they have the same kind of context switching going on: close the thoughts on project 1, and pick up the thoughts for project 2. 

Additionally, when they pick up project 2, other people will have made progress in the meantime, so they need to remember where they left off and then learn what has happened since. And by the time they are actually ready to work on project 2, the situation will have shifted yet again.

For example, if I work on project 1 exclusively on Mondays and on project 2 exclusively on Tuesdays, then each time I pick up one of these projects, I need to catch up on the events and changes of four working days. Let's just say I can do this in a single hour - one hour of an eight-hour day still means my effective progression time has been reduced by 12.5%!


System

Just like a machine running Windows or Linux, our organization has an Operating Model. At this level, it doesn't even matter whether that's Scrum, Waterfall or anything else. The Operating Model requires employee capacity in some form or fashion: routine meetings, the creation of reports, documentation etc., all of these take time away from actually conducting the task at hand.
Let's just take Scrum. A well-run Scrum initiative will take roughly 15% of a full-time, dedicated team member's time for Planning, Review, Retrospectives, Dailies and artifact generation. Other operating models are significantly less effective, with the bad ones easily taking 30-40% of a person's time.
Let's stick to the most effective one: Scrum, at 15%.

While an operating system's load should be mostly detached from the number of a machine's activities, a Scrum team member assigned to multiple projects cannot function properly without attending each team's events. As such, 3 parallel projects already burn 45% of a full-time employee's capacity. And remember: it doesn't get better if the organization's operations are less effective!



Adding it all up

Let's say I work on 3 parallel initiatives, and 45% of my capacity is usurped by the Operating Model.

Another 12.5% is taken by Context Switching - realistically more, since three projects mean more switches than the two-project example above.

That leaves roughly a third of my capacity to do actual work on the projects.

Given 3 initiatives of equal priority, only about 10% of my capacity is dedicated to each single project!
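Here's that arithmetic as a quick back-of-the-envelope script - the percentages are this article's assumptions, not measured values:

    # Assumptions from the text: 15% operating-model overhead per team joined,
    # plus at least 12.5% lost to context switching.
    projects = 3
    operating_model = 0.15 * projects   # Scrum events for each of the 3 teams: 45%
    context_switching = 0.125           # lower bound; grows with more projects

    real_work = 1.0 - operating_model - context_switching
    print(f"capacity left for real work: {real_work:.1%}")            # 42.5% - roughly a third
    print(f"per project (equal split):   {real_work / projects:.1%}") # ~14% - call it 10% with the extra friction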


Stopping the multitasking

In a parallel universe, Alternate Me has decided to only pick up a new initiative when an old initiative is completed. Alternate Me simply does not multi-task.

Alternate Me has 0% Context Switching.
Alternate Me spends 15% on the Operating System Scrum.

Alternate Me is free to spend 85% capacity to complete project 1.
When finished, Alternate Me then proceeds to spend 85% capacity to complete project 2, and so on...

Alternate Me, who doesn't multitask, completes each project 8.5 times faster!


If project 1 takes Parallel Working Me 2 months, it will take Alternate Me 1 week.

If all three projects take Parallel Working Me 2 months, I have no results until 2 months have passed, then I have 3 projects to show.

Sequential Working Me will have 1 project to show after 1 week - and 3 projects to show after 3 weeks!

Sequential Me can take an entire month of vacation and still has capacity to complete a 4th project by the time Parallel Working Me has completed 3 projects, working full-time.
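To see the difference on a timeline, here's a toy comparison under the same assumptions - three equal projects, roughly 10% capacity each in parallel versus 85% one at a time:

    WORK = 1.0  # real work per project, in arbitrary units

    # Parallel: ~10% capacity per project; all three finish at the same time.
    parallel_done = WORK / 0.10
    print(f"parallel:   all 3 projects done at t = {parallel_done:.1f}")

    # Sequential: 85% capacity on one project at a time.
    sequential_done = [(i + 1) * WORK / 0.85 for i in range(3)]
    print("sequential: done at t = " + ", ".join(f"{t:.1f}" for t in sequential_done))
    # -> first result after ~1.2, all three after ~3.5: about a third of the
    #    parallel timeline, with results arriving from the start.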


Why do you run parallel projects?