Friday, July 22, 2022

U-Curve Optimization doesn't apply to deployments!

Maybe you have seen this model as a suggestion for how to determine the optimum batch size for deployments in software development? It's being propagated, among other places, on the official SAFe website - unfortunately, it sets people off on the wrong foot and nudges them toward doing the wrong thing. Hence, I'd like to correct this model -


In essence, it states that "if you have high transaction costs for your deployments, you shouldn't deploy too often - wait for the point where the cost of delay is higher than the cost of a deployment." That makes sense, doesn't it?
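To make the claim concrete, here is a minimal sketch of the economics that graph describes, with made-up numbers: a fixed transaction cost per deployment and a holding cost that grows with the amount of undeployed work. The figures and the code are purely illustrative, not taken from SAFe.

```python
# A minimal sketch of the classic U-Curve batch-size model (illustrative numbers only).
# Assumption: the transaction cost per deployment is fixed, and the holding cost grows
# linearly with the amount of undeployed work - which is what the graph implies.

TRANSACTION_COST = 50_000        # assumed cost of one deployment, treated as constant
HOLDING_COST_PER_ITEM = 1_000    # assumed value lost per undeployed item per period

def cost_per_item(batch_size: int) -> float:
    """Average total cost per item for a given batch size under the classic model."""
    transaction = TRANSACTION_COST / batch_size             # amortized over the batch
    holding = HOLDING_COST_PER_ITEM * (batch_size + 1) / 2  # average wait grows with batch size
    return transaction + holding

# The resulting curve is U-shaped, and the model's "optimum" is simply its minimum.
optimum = min(range(1, 500), key=cost_per_item)
print(optimum, cost_per_item(optimum))   # -> 10 10500.0
```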

The cause of Big Batches

Well - what's wrong with the model is the curve. Let's take a look at what it really looks like:


The difference

It's true that holding costs increase over time, but so do transaction costs. And they increase non-linearly. Anyone who has ever worked in IT will confirm that making a huge, massive change isn't faster, easier or cheaper than making a small change.

The effort of making a deployment is usually unrelated to the number of new features included in the deployment - the effort is determined by the amount of quality control, governance and operational activity required to put a package into production. Again, experience tells us that bigger batches don't mean less effort for QC, documentation or operations. If anything, this effort is required less often, but bigger batches typically require more tests, more documentation and more operational activity each time - and the probability of incidents rises astronomically, which we can't exclude from the cost of change if we're halfway honest.
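If we feed that observation back into the sketch above, the U-shape disappears. The growth exponent below is an assumption purely for illustration - the argument only requires that deployment effort grows faster than linearly with batch size:

```python
# A sketch of the corrected picture: the transaction cost itself grows super-linearly
# with batch size, because bigger releases need more testing, more documentation,
# more operational work and carry a higher incident risk.

HOLDING_COST_PER_ITEM = 1_000    # same illustrative figure as before
BASE_TRANSACTION_COST = 2_000    # assumed cost of deploying a single small change
GROWTH_EXPONENT = 1.5            # assumed super-linear growth of deployment effort

def cost_per_item(batch_size: int) -> float:
    """Average total cost per item when deployment effort grows with batch size."""
    transaction = BASE_TRANSACTION_COST * batch_size ** GROWTH_EXPONENT / batch_size
    holding = HOLDING_COST_PER_ITEM * (batch_size + 1) / 2
    return transaction + holding

# No U-shape any more: both terms rise with batch size,
# so the cheapest batch is the smallest one you can manage.
for b in (1, 5, 10, 50):
    print(b, round(cost_per_item(b)))   # 1: 3000, 5: 7472, 10: 11825, 50: 39642
```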

Metaphorically, the U-Curve graph could be interpreted as: "If exercise is tiresome, exercise less often - then you won't get tired so often. If the only exercise you get is walking to the door to receive your pizza order, and that trip is too exhausting, don't go more often - order half a dozen pizzas at once and just eat cold pizza for a few days."

Turning back from metaphors to the world of software deployment: it's true that for some organizations, the cost of transaction exceeds the cost of holding. This means that the value produced but unavailable to users is lower than the cost of making that value available. And that means that the company is losing money while IT sits on undeployed, "finished" software. The solution, of course, can't be to delay deployment even longer and lose even more money - even if that's what many IT departments do.

As shown in the model, the optimum batch size isn't achieved when the company is stuck between a rock and a hard place - finding the point where the amount of money lost by not deploying is so big that it's worth spending a ton of money on making a deployment.


The mess

Let's look at some real world numbers from clients I have worked with. 

As I hinted, some companies have complex, cumbersome deployment processes that require dozens of person-weeks of work, easily costing $50,000+ for a single new version. It's obvious that due to the sheer amount of time and money involved, this process happens as rarely as possible. Usually, these companies celebrate it as a success when they manage to go from quarterly releases to semiannual releases. But what happens to the value of the software in the meantime?

Assuming that the software produced is worth at least its cost of production (because if it weren't, why build it to begin with?) - if the monthly cost of development is $100k, then a quarterly release frequency means that the holding cost is already at $300k, and it goes up to over half a million for semiannual releases.

Given that calculation, the U-Curve would suggest the optimal deployment frequency is the point where the holding cost reaches $50k - which would be two deployments per month. That doesn't make sense, however: with two deployments per month at $50k each, 100% of the development budget would flow into deployment - of nothing.
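Here is that arithmetic spelled out, using only the figures quoted above (the working is mine, the numbers are from the example):

```python
# The numbers from the example above, worked through.
# Figures are the ones quoted in the text; the rest is straightforward arithmetic.

DEPLOYMENT_COST = 50_000       # transaction cost of a single release
MONTHLY_DEV_COST = 100_000     # assumed to equal the value produced per month

# Holding cost: value sitting undeployed at release time.
print(3 * MONTHLY_DEV_COST)    # quarterly releases  -> 300_000
print(6 * MONTHLY_DEV_COST)    # semiannual releases -> 600_000 ("over half a million")

# What the U-Curve logic would recommend: deploy once the holding cost
# reaches the deployment cost, i.e. every half month.
months_between_releases = DEPLOYMENT_COST / MONTHLY_DEV_COST          # 0.5
deployments_per_month = 1 / months_between_releases                   # 2.0
monthly_deployment_spend = deployments_per_month * DEPLOYMENT_COST    # 100_000.0

# ... which is 100% of the monthly development budget, leaving nothing to deploy.
print(monthly_deployment_spend / MONTHLY_DEV_COST)   # -> 1.0
```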

Thus, the downward spiral begins: fewer deployments, more value lost, declining business case, pressure to deliver more, more defects, higher cost of failure, more governance, higher cost of deployments, fewer deployments ... race to the bottom!


The solution

So, how do we break free from this death spiral?

Simple: when you're playing a losing game, change the rules.

The mental model that deployments are costly and that we should therefore optimize our batch size, deploying only once the holding cost outweighs the cost of deployment, is flawed. We are in that situation because we have the wrong processes to begin with. We can't keep these processes. We need to find processes that significantly reduce our deployment costs:


The cost of Continuous Deployment

Again, using real world data from a different client of mine: 

This development organization had a KPI on deployment costs, and they were constantly working on making deployments more reliable, easier and faster. 

Can you guess what their figures were? Given that I have anchored you at $50k before, you might think that they had optimized the process down to maybe $5,000 or $3,000.
No! If you think so, you're off by so many orders of magnitude that it's almost funny.

I attended one of their feedback events, where they reported that they had brought down the average deployment cost from $0.09 to $0.073. Yes - less than a dime!

This company made over 1000 deployments per day, so they were spending around $73 a day, or roughly $1460 a month, on deployments. Even if we add up the accumulated cost of deployments over a whole quarter, they were still spending only a few thousand dollars for three months' worth of software development. And the transaction cost for each single deployment is ridiculously low.
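For completeness, here is the arithmetic behind those figures. The 20 working days per month are my assumption - they are what makes the quoted $73 a day and roughly $1460 a month line up:

```python
# The continuous-deployment figures from the text, worked through.
# Assumption: ~20 working days per month (that is what matches the ~$1460/month figure).

COST_PER_DEPLOYMENT = 0.073    # dollars per deployment
DEPLOYMENTS_PER_DAY = 1_000
WORKING_DAYS_PER_MONTH = 20    # assumed

daily = COST_PER_DEPLOYMENT * DEPLOYMENTS_PER_DAY   # ~$73 a day
monthly = daily * WORKING_DAYS_PER_MONTH            # ~$1460 a month
quarterly = monthly * 3                             # a few thousand dollars per quarter

print(round(daily), round(monthly), round(quarterly))   # -> 73 1460 4380

# Compare with the $50,000 a single release costs in the earlier example:
# that one release would pay for almost three years of deployments here.
print(round(50_000 / monthly))   # -> 34 (months)
```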

Name anything in software where the holding cost is lower than 7 cents - and then tell me why we are building that thing at all. Literally: 7 cents is mere seconds of developer time!

With a Continuous Deployment process like this, anything that's worth enough for a developer to reach for their keyboard is worth deploying without delay!

And that's the key reason why the U-Curve optimization model is flawed:

Anything worth developing is worth deploying immediately.

When the cost of a single deployment is so high that anything developed isn't worth deploying immediately, you need to improve your CI/CD processes, not figure out how big you should make that batch.

If your processes, architecture, infrastructure or practices don't allow for Continuous Deployment, the correct solution is to figure out which changes you need to make so that you can deploy continuously.







1 comment:

  1. While I appreciate all of the analysis here, you completely missed the point SAFe is making... or should I say, you've come to the same conclusion as the researchers and methodologists at SAFe did. SAFe is arguing AGAINST big batches; they are showing the math involved in finding the SMALLEST batch size. It's right there in the name U-Curve Optimization! You are finding the minimum batch size that optimizes the holding costs and the transaction costs. The math is the math: single piece flow can be the most optimal, but only if the transaction costs are zero. Even releasing a single piece of software has some transaction cost, so single piece is unlikely to be the most optimal. It is completely unrealistic for most companies to lower transaction costs to zero... remember, there is a huge cost associated with lowering the transaction cost to that level that is not needed for most companies. Thanks for the article.
