Fail Fast, Move On: May 2014

Friday, May 30, 2014

Wrong assumptions - Take nothing for granted!

As a Product Owner, my primary responsibility is for the "What" of the team, not for the "How".

This week, we had a completely new product and so we formed a new team and did what we always did in projects ...

Communicate the product vision, set up the backlog, define Working Agreements - and get the action on!

We're doing 1-week sprints, because they work very well for our company and those weekly Retrospectives are really valuable.

Anyway. So, we had our Review and the team had done a good job, completed 14 Story Points and done a lot of work that will help them reach a higher velocity later on.

Or, so I thought.

After the Review was over, I casually asked "So, how many unit tests did you write this week?"

... deafening silence ...

Guess what?

The team hadn't had an explicit Working Agreement to write unit tests - and so they didn't!
It's a good thing we have weekly sprints: The Retro will take care of this.

Lesson Learned

Never, ever take anything for granted - you never know who interprets things how!
It is better to be explicit about the Engineering Practices that will be employed in the project than to accumulate technical debt.

Wednesday, May 28, 2014

Continuous Integration that isn't

I have seen people who run Jenkins and claim that they have realized CI.

Actually, one of those people was me, a couple of years back.
Before you ask, no, it wasn't in an agile team.

Configuring Jenkins is easy, and so is getting it to pull a repo, create a build and deploy it on a server. But that is not Continuous Integration!

So, here is my story:
We were in a customer project and pretty much nobody had heard of CI. Only one guy had an idea: "Why should we manually deliver software to the testers? There are tools out there, let's do CI."

And so we did.
Jenkins was up and running. Whenever the team manager pressed the button, the software got deployed to the Test Environment, and the downtime for a new deployment was reduced from a couple of hours to less than 5 minutes. As the testers knew when the "Deploy" button would be pressed, usually Friday EOB, testing was not affected by any downtime.
A big benefit!

However, the first thing that usually happened after a deployment was that something didn't work.
Like, for instance, the web server. Or the Messaging Queue. Or the database. Or the business processes. Or anything else that used to work in the past.

Lesson learned

Continuous Integration is so much more than automating the build/deployment chain and reducing outages to a couple minutes.

CI shouldn't result in outages in the first place. You can use techniques like Parallel Deployment to attain Zero Downtime for patches.
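One common form of parallel deployment is a blue/green switch. The following is only a minimal sketch of the idea, not how our project did it: the `Router` class, the slot names and the `health_check` callback are all hypothetical stand-ins for a real load balancer and real smoke tests.

```python
from dataclasses import dataclass, field


@dataclass
class Router:
    """Hypothetical traffic switch: points at whichever slot is live."""
    live: str = "blue"
    versions: dict = field(default_factory=lambda: {"blue": "1.0", "green": None})

    def idle(self) -> str:
        # The slot that is currently NOT receiving traffic.
        return "green" if self.live == "blue" else "blue"


def deploy(router: Router, version: str, health_check) -> bool:
    """Install on the idle slot and switch traffic only if it is healthy."""
    slot = router.idle()
    router.versions[slot] = version   # the live slot keeps serving meanwhile
    if not health_check(slot, version):
        return False                  # traffic never left the old version
    router.live = slot                # flipping the pointer is the whole cut-over
    return True


router = Router()
first = deploy(router, "1.1", lambda slot, version: True)    # healthy build
second = deploy(router, "1.2", lambda slot, version: False)  # broken build
```

In this sketch the broken build never receives traffic: it lands on the idle slot, fails its check, and the router keeps pointing at the last healthy version.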

Also, you haven't understood CI until you have tons of other techniques in place.

CI that works on a button press is missing the point: CI should be continuous, not scheduled and manual.

If you are delivering dozens of new features with each build, your CI has a very slim chance of locating the error source. Make sure your CI is integrated in a way that each feature has at least 1 build of its own.
If you don't have unit test coverage, CI isn't even worth being called such. Move towards high unit test coverage before bothering with CI.
If you don't have automated regression and smoke tests, CI is more likely to cause harm than help. Invest in test coverage and link the automated tests to the CI server.
If you don't have a rapid feedback cycle into your development process, CI has no benefit. Make sure the developer who committed the failing build gets informed and acts within minutes.
If you aren't acting immediately on failed builds or errors in the deployment, that's not CI, it's a mess!
STOP and fix if the tests fail. Don't proceed to code further based on harmful code!
If you are spending a full week on manual integration tests, you may have a CI tool, but you don't have CI!
Create automated integration tests that can be run as part of the CI process. If you can't eliminate manual components, rethink your approach!
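The "stop and fix" and "rapid feedback" points can be sketched as a tiny pipeline runner. This is an illustration under assumptions, not a Jenkins configuration: the stage names, the `notify` callback and the e-mail address are made up for the example.

```python
from typing import Callable, List, Tuple

Stage = Tuple[str, Callable[[], bool]]


def run_pipeline(stages: List[Stage], committer: str,
                 notify: Callable[[str, str], None]) -> bool:
    """Run stages in order; stop at the first failure and tell the committer."""
    for name, check in stages:
        if not check():
            notify(committer, f"Build stopped: stage '{name}' failed")
            return False   # STOP and fix: later stages never run
    return True


messages = []   # what the (hypothetical) notification channel received
executed = []   # which stages actually ran


def stage(name: str, ok: bool) -> Stage:
    """Build a fake stage that records that it ran and returns a fixed result."""
    def run() -> bool:
        executed.append(name)
        return ok
    return (name, run)


result = run_pipeline(
    [stage("unit tests", True), stage("smoke tests", False), stage("deploy", True)],
    committer="dev@example.com",
    notify=lambda who, text: messages.append((who, text)),
)
```

The failing smoke tests end the run, the deploy stage is never reached, and the committer is the one who hears about it.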

CI isn't about having a tool - it's about having the processes and engineering practices that allow you to deliver rapidly and often!
Real CI comes with a mindset "Let's rather have one too many than one too few deployments".

Monday, May 26, 2014

Done undone

As a Product Owner, my key responsibility is ensuring that the customer is satisfied with the product.

As the SCRUM Team, it is our key responsibility that we get the story "Done" in a way that the customer will also accept.

Recently, I had a bad surprise when running a new team.

We all work for the same company, but we usually don't work together in the same constellation.

So we dove in head first: at the beginning of the sprint, we defined the backlog.

As PO, I defined the stories and priorities. Then, my team did the Work Breakdown and defined the tasks required for each story.

During the Sprint Review, I couldn't accept a single story as "Done", despite the fact that the team assumed the story was done.

What had happened?

The tasks all got done, but nobody paid attention to the story itself! After all tasks were executed, the story was so complicated that even the developers had to ask each other how to use it - UX was terrible!

A customer was present in the Review and he simply asked "How do you expect me to do this?"

Sorry for the team, I couldn't accept it as "Done", because I personally understood "Done" as "We are not going to touch this again. We can tear up the story card because everything is finished".

The failure?

I assumed that the team's Definition of Done was the same as mine, but the team had a DoD for themselves which considered a story "Done" if all tasks were completed - not when the results are usable by the customer!

Lesson Learned

Make sure that the Definition of Done is not subjective.

Take your time in the first sprint. Remove all subjectivity and unspoken expectation from the DoD.

Everybody must be in the same boat. The team, the PO and the Customer should all have the same understanding of the team's DoD.

Make certain that before the first story gets implemented, everyone knows and understands the team's DoD in the same way.

Wednesday, May 21, 2014

The worst possible Performance metric

Developer performance is not easy to measure.
Why is this? Because a developer's primary objective should be to find the overall simplest feasible solution towards unsolved problems (or at least, "un-implemented").
However, time and again, there are non-technical project managers who try to do it.

There is an infamous tale of a project where, allegedly, one million lines of code had to be written within one month, but the developers overperformed - producing as many as one and a half million lines!
Wow, what a great result!*

Refactoring is a technique primarily focused on eliminating code complexity, therefore increasing readability, maintainability and improving overall design.

Story time:

One of these days, my team was challenged with automating a business process.
Occasionally, the customer would ask how we were doing. So, one glorious day, they asked very specifically, "How many lines of code did you write today?"
It was probably the worst possible day to ask this question.
The entire team had written maybe 10 additional lines of code - but deleted roughly 200!
So, at the end of the day, the "lines of code" metric was 190 in the negative!

It actually took a while to explain to our customer why they should still be paying for this ...

The refactored code eliminated a performance problem.
It also implemented 2 different user stories from our backlog.
And in all that, we increased the flexibility of the current code base way beyond the customer's need - with no extra effort!
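The arithmetic behind that day's number is trivial, which is exactly the problem. A small sketch, using the rough figures from the story (the dictionary keys are just illustrative names):

```python
def net_lines(added: int, deleted: int) -> int:
    """The naive 'lines of code written' metric: additions minus deletions."""
    return added - deleted


# Rough figures from the day the customer asked.
day = {
    "added": 10,
    "deleted": 200,
    "stories_done": 2,
    "performance_problem_fixed": True,
}

loc_metric = net_lines(day["added"], day["deleted"])
```

Here `loc_metric` comes out at -190, while the same day delivered two stories and removed a performance problem. The metric and the outcome point in opposite directions.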

Lesson Learned

Never, ever let anyone measure developer performance in "lines of code". It is not a success metric.
Don't even go for "tasks done" or "number of user stories completed". These are all deceptive!

The only metric that should be applied to software development is "outcome".
And that one is incredibly tough to quantify.
In the end, all it means is "How much better is the software fit for its intended purpose now?"

Tuesday, May 20, 2014

Mocked loophole: Failure to test unit integration!

We recently had a project where we had to experiment with Unit Testing in a Procedural Environment.
Being familiar only with tests in an Object Oriented environment, it was quite tough to figure out how to properly conduct unit tests.

For testing a function, we did what we usually do: mock every external function call.

So, our code effectively looked like this:

function X
{
    if Y($1) is "true" then echo "Yes"
    else echo "No"
}

X.test
    mock function Y { return "true" }
    assertEquals "X in good case" "Yes" X(1)
    mock function Y { return "false" }
    assertEquals "X in bad case" "No" X(2)

Y.test
    assertEquals "Y with good result" "true" Y(1)
    assertEquals "Y with bad result" "false" Y(2)

Extra credit to those who already sit back laughing, "You fools, this obviously had to go wrong!" ...

Guessed what happened?

We had done some refactoring to Y in the meantime, and in the end, the unit tests for Y looked like this:

Y.test
assertEquals "Y with good result" "Yes" Y(1)
assertEquals "Y with bad result" "No" Y(2)

Yes, we had changed "Y" from returning "true"/"false" to returning "Yes"/"No"!
Of course, the refactoring and TDD made sure that Y was doing what it should be, and we simply assumed that regression tests would catch the error on X - guess what: they didn't!
Because we had always mocked the behaviour of Y in X, there was no such test "Does X do what it's supposed to do in real circumstances?"

Lesson Learned:
If the function works in context, it does what it's supposed to do - but if the function works in isolation, there is no guarantee that it works in context!

We changed the way of writing unit tests as follows: "Rather than use the most isolated scope to test a function, prefer to use the most global scope possible without relying on external resources".
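The loophole can be reproduced in a few lines. The sketch below is in Python rather than the procedural language we actually used, and the lowercase `x` and `y` simply mirror X and Y from the pseudocode above. The mock is done with `unittest.mock.patch.dict` on the module globals. That is one way to do it, chosen here only because it keeps the example self-contained.

```python
from unittest.mock import patch


def y(n):
    """After the refactoring, y returns "Yes"/"No" instead of "true"/"false"."""
    return "Yes" if n == 1 else "No"


def x(n):
    """x was never updated and still compares against the old "true"."""
    return "Yes" if y(n) == "true" else "No"


def isolated_test() -> bool:
    # The mock freezes y's OLD contract, so this test keeps passing
    # even though the real y no longer behaves that way.
    with patch.dict(globals(), {"y": lambda n: "true"}):
        return x(1) == "Yes"


def wider_scope_test() -> bool:
    # No mock: x is exercised together with the real y.
    return x(1) == "Yes"


isolated_passes = isolated_test()
wider_passes = wider_scope_test()
```

The isolated test stays green while the wider-scope test fails, which is the failure we only noticed much later. Testing `x` against the real `y` costs nothing here, because `y` does not touch any external resource.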

Saturday, May 10, 2014

Work Done doesn't matter

It was a small company which just decided to transition towards SCRUM.

The team I coached was highly competent, they actually did a good job. I was serving them as a SCRUM Master and I actively engaged in daily business, conducting story-related admin activity as well.

SCRUM was really good for the team: Impediments surfaced left and right, we started resolving year-old stuff and really tuned up the velocity quickly.

In the first Review, I took the liberty of inviting all the relevant stakeholders.

Here is how the Review went:
Everyone just gathered in front of the SCRUM board and reported which tasks were "Done".
Nobody sat at a computer, and within 5 minutes, the first attendees were already fiddling with their watches and phones.

The team was not capable of producing "visible results", and even if the results were visible, they were only talking about them rather than demonstrating them.

My lesson:
A team that is still focused on Tasks and Work may be applying SCRUM, but it is focused on the wrong deliverable.
In traditional management, reporting the "Work Done" is very important. In SCRUM, we neither report how hard or how much we worked, nor do we deliver "work".

Our result is working stuff. For developers, that's the new software product. For server admins, it may be a piece of hardware where the developers can now install the product. For a marketing team, it may be the new product's homepage.
But for nobody, it's a bunch of completed task cards.

Friday, May 9, 2014

Versioning Failure

It was many years ago, when I first was introduced to the marvels of a Version Control System when working as a developer for an Enterprise Support Platform.
My customer was using PVCS for release management - I had never heard of automated versioning before.

I love automation, and I loved the things PVCS could do for me.
However, I quickly grew weary of having to do the following after pretty much every couple of lines of code:

Add modified files to the repo
Do a diff to verify the changes
Commit the changes
Publish to baseline

Whenever I run a manual activity multiple times, my first thought is "Automate this". So this is what I did.
It was very easy to automate. Always the same commands, always in the same sequence - so I just scripted it!

Then came this glorious Friday. It was my last day of work before vacation.
Everyone else had already left the office.
I wanted to complete this one last task. It was trivial, one single line of code.
So I implemented the change, did my tests, ran my script and took off.

On Monday morning, I got a phone call: "What did you do to our baseline? EVERYTHING is gone!"
What?

It took a while to figure out that my "push script" had run rampant and committed every single item in the project as a zero-byte file.
While I was on vacation, it took the rest of the team half a day's work to clean out the entire mess I had unwittingly created.

I probably took this one harder than the team did, but here's
My lesson:
I now understand why version control software does not provide "one-step push" for changes. No automation can understand what the intention of your change was.

Not everything that can be automated should be automated.
Keeping the "brain-in-the-loop" is often the only way to eliminate accidents.
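One way to keep the brain in the loop without giving up the script entirely is to let it stop and ask whenever a change looks implausible. The following is a hedged sketch in Python, not my original script and not anything PVCS provides: the function names, the file names and the `confirm` callback are all hypothetical, and the actual commit and publish step is deliberately left out.

```python
from typing import Callable, Dict, List


def review_changes(changes: Dict[str, int], expected_files: int) -> List[str]:
    """Return reasons to stop; an empty list means the push looks plausible.

    `changes` maps each file about to be committed to its size in bytes.
    """
    problems = []
    empty = sorted(name for name, size in changes.items() if size == 0)
    if empty:
        problems.append(f"zero-byte files: {', '.join(empty)}")
    if len(changes) > expected_files:
        problems.append(
            f"{len(changes)} files changed, expected at most {expected_files}"
        )
    return problems


def push(changes: Dict[str, int], expected_files: int,
         confirm: Callable[[List[str]], bool]) -> bool:
    """Go ahead only if nothing looks odd or a human explicitly confirms."""
    problems = review_changes(changes, expected_files)
    if problems and not confirm(problems):
        return False   # a human saw the warning and said no
    return True        # hypothetical: the real commit/publish would go here


one_liner = {"billing.c": 4200}
runaway = {"billing.c": 0, "main.c": 0, "util.c": 0}

ok_push = push(one_liner, expected_files=1, confirm=lambda problems: False)
blocked_push = push(runaway, expected_files=1, confirm=lambda problems: False)
```

The one-line change goes through without a question, while the runaway commit of zero-byte files is held back until a person has looked at it. The decision stays with a human. The script only makes sure the human gets asked.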

And this is why I no longer believe in "full automation".

Thursday, May 8, 2014

Are you smart or stupid?

Everyone wants to appear "smart", nobody wants to appear "stupid".
In this blog, the author describes why we should actually dare to be stupid.

Creativity sometimes requires doing things which have a high risk of failure.
When working in a domain which is planned to the very last detail, where every process is stiffly defined and formalized, there is no more room for creativity.

But only by breaking out of known habits do you have the chance to make marvelous discoveries.

Did you know that Penicillin was only discovered because Alexander Fleming was so stupid as to forget to close his Petri dish full of bacteria samples?

An accident saved millions of lives!

Proud to Fail

Whether you are working in an Agile environment or not, there are small and big failures every day.
As long as we are humans, we are not omniscient, and therefore, will fail.

I remember a story I read somewhere, many years ago:
In a company, there was a young manager who was in charge of a $2m project. He made a severe mistake in the planning, and the project failed. Others asked the CEO "Why don't you fire him?" - to which he replied "Why should I fire someone into whose education I have just invested $2m?"

As Agile practitioners, we should live in an environment devoid of coverups and blame-games. We should have the courage to be open and honest about our shortcomings, without fear of reprisal.

The difference between a wise person and a fool is not that the wise person never failed.
Wisdom means learning from their failures - and improving.
Even better, we can use our own failure in order to help others improve!

I am glad to have worked in such an environment for years.
In this blog, I want to share stories about failure and the lessons I have learned.
