Tuesday, February 24, 2015

Becoming great at answering the wrong questions

"The right answer depends on the question"

This proverb has long guided me and I always strive to ask better questions.

This week, I evaluated some DevOps tools and was oddly reminded of how similar the effectiveness of DevOps is to a product owner's story writing.

The wrong user story doesn't deliver the business value you're looking for.
Here are some highlights of the poorly written user stories that I encountered while evaluating tools. Marketing makes these stories look like exactly what you'd want in your DevOps stack. But let's take a closer look.

As a DevOps, I want to see way back in time when certain error messages occurred so that I know how long it has been going on.

Yikes! If there are unknown error messages in the logs which nobody has taken care of for months on end, and nobody realizes - or cares - then either they're not important or your organization is in a mess. You don't need a tool that lets you discover errors from a year ago. You need a strategy to deal with problems effectively as soon as they occur!
Let me rephrase that story for you: "As a DevOps, I want to be notified immediately when something abnormal happens so I can analyze and resolve the problem before it hits the customer!"
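To make the rephrased story a bit more concrete, here is a minimal sketch of what "notified immediately" could look like: count errors in a sliding time window and call a notification hook as soon as a threshold is exceeded. The class name `ErrorRateAlert`, its parameters and the `notify` callback are assumptions made up for illustration. They don't come from any of the tools I evaluated, and a real setup would hook into your own log pipeline and paging system.

```python
from collections import deque


class ErrorRateAlert:
    """Hypothetical helper: notify once errors in a sliding window exceed a threshold."""

    def __init__(self, threshold, window_seconds, notify):
        self.threshold = threshold            # errors tolerated inside one window
        self.window_seconds = window_seconds  # length of the sliding window
        self.notify = notify                  # callback, e.g. a pager or chat hook (assumed)
        self._timestamps = deque()

    def record_error(self, timestamp, message):
        # Remember when the error happened, then drop entries that left the window.
        self._timestamps.append(timestamp)
        cutoff = timestamp - self.window_seconds
        while self._timestamps and self._timestamps[0] <= cutoff:
            self._timestamps.popleft()
        # Notify right away instead of waiting for someone to dig through old logs.
        if len(self._timestamps) > self.threshold:
            count = len(self._timestamps)
            self.notify(f"{count} errors in the last {self.window_seconds}s, latest: {message}")
```

You would feed `record_error` from whatever reads your logs and pass a `notify` function that reaches a human. The point is the direction of the flow: the system tells you about the problem, you don't go looking for it months later.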

As a DevOps, I want to have a one-click solution which connects stack traces in log files with the corresponding source code segment so that I can analyze the effect on the user easily.

Good luck with that. I personally prefer to have robust, well-tested software. If your software is throwing significant amounts of stack traces, then the effect on the user isn't your primary problem.
Let me slice this one for you: "As a Sysop, I don't want any stack traces in production logs, because I don't want to operate a system that doesn't do what it's supposed to do." - and - "As a Developer, I want a software test that has sufficient path coverage so that I won't run afoul of undefined behaviours during refactoring." - and - "As an application user, I expect the software to behave as intended in each and every circumstance so that my business outcome is predictable."

As a Security expert, I want to be notified in real time when user data is compromised.

Sounds great. But what are we really talking about here? Why do you even care to know that in real time? There are predictable, controllable ways in which user data can be compromised. In this case, your strategy shouldn't be to introduce real-time notification, but prevention. Every minute you invest in data theft detection would be significantly better invested in hardening your systems.
Let me rephrase that one: "As a data security officer, I want all known security loopholes closed so that we don't even have data security incidents!"

As a DevOps, I need data for every possible failure scenario so that in case of incidents, I have enough data.

Tools will give you a false sense of control when you ask the wrong question. The right question is not how to have data for everything, but how to eliminate root causes - so that you won't need the data.

Your job as a DevOps is not to hoard a boatload of data that can't be humanly understood. Your job is to eliminate operational risks within the software so that the essential monitoring data can be reduced to a manageable and comprehensible level.


Tools are not solutions unless you first define the problem.
Your DevOps strategy will fail unless you first learn to ask the right questions.
