
Friday, October 30, 2015

SysOps and Development: An Antipattern

Developers develop, SysOps operate. Obvious.
Well, that's classical thinking.

The consequence is that problems do not get resolved in time: users suffer, value is lost and the reputation of the IT department is tarnished.


Obvious problems often don't get resolved in a timely manner, and the loser is everyone!

What you see above is real data taken from an organization where some Scrum teams are responsible for creating features and another team is responsible for operating the platform.

In this example, the biggest current issue is an encoding problem: something that happens in the "real world", but not in a controlled development environment. Continuous Integration was successful, the feature hit the market, and boom!

Separate Responsibilities

In the example above, developers are busy churning out new features while SysOps are busy with fire-fighting.
According to the chart, the problem started 3 days ago: 3 days in which SysOps have been under constant stress while developers kept stacking more features on top of existing problems.

XP proclaims "A Red Master is developers' Priority 1". Well, whose Priority 1 should a Red Production be? Only the SysOps', or everyone's?
In a classic Kanban system, a problem anywhere on the production line stops the entire line until the issue is resolved. The same holds true for agile teams: if there is a problem anywhere in the organization, stop. Fix it.
Don't go on producing more problems.

Perverse incentives

Developers are appreciated for delivering new features, while SysOps are appreciated for keeping the platform stable. Consequently, developers prefer working on new things over fixing existing ones, and the organization as a whole suffers. People who are not rowing in the same direction can't yield better results.

The very fact that people have different incentives here points to a problem with autonomy and self-organization!
You must align the incentives of everyone working on the product, not only those of the "development team".

Definition of "Done"

Valuing developers more for delivering a new feature than for stabilizing an old one leads to the problem above: "Just add a new story to the Product Backlog and we'll deliver it with the next sprint." This follows a classic line of thinking in agile software development ("There are no bugs; missing Acceptance Criteria are new stories"), combined with Scrum's "The Sprint is closed to modification".
This is a severe misunderstanding: can you actually consider a story "Done" when it causes problems for the customer?

When a new feature causes an operational problem, your next retrospective should put your Definition of Done (DoD) under scrutiny.


DevOps: A solution

As the agile proverb goes: "You build it, you run it."
As long as developers do not feel the operational pain, they have no interest in removing it. Likewise, as long as SysOps are unable to fix the root cause of a problem in the source code, they become apathetic towards code-related problems.
By moving both the operational problem and the solution space into the team's sphere of responsibility, you get developers who take an interest not only in the code they write and how it meets the acceptance criteria, but also in how that code actually performs in the real world.

Organizationally, this can be as simple as moving a SysOp into the development team and then holding the team accountable for events on the production platform.
The developers become DevOps who come up with innovative ways to detect, analyse and prevent problems. A DevOps can solve issues in a fashion that would be completely unthinkable for a SysOp confronted with "build-ready" black-box software.

Side note
The screenshot above was taken from Medullar, towards which I am, personally, completely biased: I am one of the core developers of this easy-to-use, open-source, cloud-based, universal monitoring and problem-solving solution, which was designed to require minimal intrusion and resources while offering maximum flexibility.
Medullar allows you not only to detect and analyse problems in any kind of environment, but also to remedy them in real time through its API and UI. It even comes bundled with its own test framework, so you don't need anything else to improve your operational performance.
