Wednesday, October 31, 2018

My discovery journey to FaaS

Moving from monolithic systems to autonomous services is more of a mindset challenge than a technical challenge, although depending on the architecture of a system, the latter can already be tremendous. In this article, I want to describe my own discovery journey which changed how I perceived software systems forever.


Introduction

I was working for a client on an Information Discovery project that gathered and evaluated data from a range of other platforms. These platforms were essentially a Stovepipe Architecture, so we had brittle, incomplete data objects from a wide range of independent systems, all describing the same business object. There were two goals: first, link the data into a common view - and second, highlight inconsistencies for data purification. In effect, it was a fledgling Master Data Management system.



Stage 1: The Big Monolith

Although I knew about services, I never utilized them to their full potential. In stage one, my architecture looked something like this:




The central application ran ETL processes, wrote the data to its own local database, then produced a static HTML report which was sent to users via mail.
The good news is that this monolithic application was already built on Clean Code principles, so although I had a monolith, working with the code was easy and the solution flexible.
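
To make this stage a bit more tangible, here's a minimal sketch of what such an ETL-plus-report monolith could look like - all names, tables and records are hypothetical stand-ins, not the client's actual code:

    import sqlite3

    def extract():
        # Hypothetical stand-in for pulling records out of the source systems
        return [{"id": 1, "name": "ACME Corp", "source": "CRM"},
                {"id": 1, "name": "ACME Corporation", "source": "ERP"}]

    def load(records, db="local.db"):
        con = sqlite3.connect(db)
        con.execute("CREATE TABLE IF NOT EXISTS objects (id INTEGER, name TEXT, source TEXT)")
        con.executemany("INSERT INTO objects VALUES (:id, :name, :source)", records)
        con.commit()
        return con

    def render_report(con):
        rows = con.execute("SELECT id, name, source FROM objects ORDER BY id").fetchall()
        items = "".join(f"<li>{i}: {n} ({s})</li>" for i, n, s in rows)
        return "<html><body><ul>" + items + "</ul></body></html>"

    if __name__ == "__main__":
        html = render_report(load(extract()))
        # In the real monolith this report went out via mail; here it just goes to disk.
        with open("report.html", "w") as f:
            f.write(html)

Extraction, storage, rendering and distribution all live in one process and one deployment - exactly the setup the next stages pulled apart.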

The solution was well received - indeed, too well. The HTML reports were sent to various managers, who in turn sent them to their employees, who in turn sent them to their colleagues who collaborated on the relevant cleanup tasks. This was both a data protection and consistency nightmare - I couldn't trace where the data went and who was working with which version. So I moved from an HTML report to server-side rendering within the application.


Stage 2: Large Components

My application became two - an ETL component and a rendering component, both built on the same database. This solved the problems of inconsistent versions and access control:
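
For illustration, here's a minimal sketch of the new rendering half, assuming the same hypothetical SQLite database as in the sketch above and nothing beyond the standard library - it serves the report over HTTP on demand instead of mailing it out:

    import sqlite3
    from http.server import BaseHTTPRequestHandler, HTTPServer

    DB = "local.db"  # the database shared with the ETL component

    class ReportHandler(BaseHTTPRequestHandler):
        def do_GET(self):
            # Render the report on the server, per request, from the shared database.
            con = sqlite3.connect(DB)
            rows = con.execute("SELECT id, name, source FROM objects ORDER BY id").fetchall()
            items = "".join(f"<li>{i}: {n} ({s})</li>" for i, n, s in rows)
            body = ("<html><body><ul>" + items + "</ul></body></html>").encode()
            self.send_response(200)
            self.send_header("Content-Type", "text/html")
            self.end_headers()
            self.wfile.write(body)

    if __name__ == "__main__":
        # Users now fetch the report themselves instead of receiving it by mail.
        HTTPServer(("localhost", 8080), ReportHandler).serve_forever()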


But there was another issue: Not everyone needed the same data, and not everyone should see everything. There were different users with different needs. The Simplicity principle led me to client-side rendering: I would pass the data to the frontend, then let the user decide what they wanted to see.


Stage 3: The first REST API

I cut in a REST layer that fetched the requested data from the backend based on URL parameters; the rendering then happened in the user's browser:


This also solved another problem: I no longer needed to re-create the report in the backend whenever I made changes to layout and arrangement. It enabled true Continuous Delivery - I could implement new presentation-related user requests multiple times a day without touching any data on the server. I was shipping three to five relevant features per day, and so the report grew.
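
A sketch of what such a REST layer might look like, again with hypothetical names and only the standard library - the endpoint returns plain JSON filtered by a URL parameter, and the browser-side client decides how to render it:

    import json
    import sqlite3
    from http.server import BaseHTTPRequestHandler, HTTPServer
    from urllib.parse import parse_qs, urlparse

    DB = "local.db"

    class ApiHandler(BaseHTTPRequestHandler):
        def do_GET(self):
            # e.g. GET /objects?source=CRM - the URL parameter decides what comes back.
            query = parse_qs(urlparse(self.path).query)
            source = query.get("source", [None])[0]
            con = sqlite3.connect(DB)
            if source:
                rows = con.execute(
                    "SELECT id, name, source FROM objects WHERE source = ?", (source,)
                ).fetchall()
            else:
                rows = con.execute("SELECT id, name, source FROM objects").fetchall()
            body = json.dumps(
                [{"id": i, "name": n, "source": s} for i, n, s in rows]
            ).encode()
            self.send_response(200)
            self.send_header("Content-Type", "application/json")
            self.end_headers()
            self.wfile.write(body)

    if __name__ == "__main__":
        HTTPServer(("localhost", 8081), ApiHandler).serve_forever()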

Guess what happened next? People wanted to know what the actual data in the source system was. And that meant - there was more than one view.


Stage 4: Multiple Endpoints

Early attention to separable design and SOLID principles once again saved the day. I moved the Extraction Process out of the ETL application and into autonomous getter services: 



This meant my local database could now lag behind the data users saw, but that was a price everyone was willing to pay - a daily update of the report was enough, as long as people had real-time access to the un-purified raw data. Another side benefit was that I pulled in pagination - increasing both the performance and the separability of the ETL process as a whole.

Loose coupling not only gave me separate endpoints for each getter, it let me decouple each getter from any central point entirely: I could scale!
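
Here's a hedged sketch of how the ETL side can consume such a paginated getter - the endpoint, page size and parameter names are assumptions, not the real interface:

    import json
    from urllib.request import urlopen

    def fetch_all(getter_url, page_size=100):
        """Pull everything from a paginated getter service, one page at a time."""
        page = 1
        while True:
            with urlopen(f"{getter_url}?page={page}&size={page_size}") as response:
                batch = json.loads(response.read())
            if not batch:          # an empty page signals the end of the data
                break
            yield from batch
            page += 1

    if __name__ == "__main__":
        # Hypothetical endpoint of one autonomous getter service.
        for record in fetch_all("http://localhost:8082/objects"):
            print(record)

The ETL no longer cares where or how a getter runs - it only needs the contract of the paginated endpoint.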

As the application grew in business impact, more data sources were added and additional use cases came in. (To keep the picture simple, I'm not going to add these additional sources - just imagine there are many more.)

Stage 5: Separate Use Cases

This modification was almost free - when the requirement came, I did it on a whim without even realizing what I had done until I saw the outcome:



Different user groups had different clients for different purposes. Each client needed only a subset of the data, drawn from what was already available - I could add new clients without modifying anything in the core architecture!
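
As a sketch of the idea - client names, endpoints and field lists below are purely hypothetical - each client simply declares the slice of data it cares about and never touches the core:

    import json
    from urllib.request import urlopen

    # Hypothetical client definitions: each client only knows the endpoints
    # and fields it actually needs - the core architecture stays untouched.
    CLIENTS = {
        "data-stewards": {"endpoint": "http://localhost:8081/objects?source=CRM",
                          "fields": ["id", "name", "source"]},
        "management":    {"endpoint": "http://localhost:8081/objects",
                          "fields": ["id", "name"]},
    }

    def view_for(client_name):
        cfg = CLIENTS[client_name]
        with urlopen(cfg["endpoint"]) as response:
            records = json.loads(response.read())
        # Reduce every record to the subset this particular client cares about.
        return [{field: record[field] for field in cfg["fields"]} for record in records]

    if __name__ == "__main__":
        print(view_for("management"))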

And finally, we had network security concerns, so we had to pull everything apart. This was a freebie from a software perspective, but not from an infrastructure perspective: I separated the repositories and the deployment processes.


Stage 6: Decentralization

In this final stage, it was impossible to still call it "one application": I ended up with a distributed system with multiple servers, each of them hosting only a small set of functions which were fully encapsulated and could be created, provisioned, updated and maintained separately. In effect, each of my functions was a service running independently of its underlying infrastructure:


And yes, this representation omits a few of the infrastructure troubles I had, most notably availability and discovery. The good news is that once I had this pattern, adding new services became a matter of minutes: copy and paste the infrastructure provisioning, then write the few lines of code that did what I wanted to achieve.
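
To show what "a small set of functions, fully encapsulated" can look like in code, here is a minimal sketch in the style of an AWS Lambda handler - the event shape and the data access are my assumptions, and any FaaS platform with a comparable entry-point convention would do just as well:

    import json

    def handler(event, context):
        """One self-contained function: 'get objects, optionally filtered by source'.

        The platform provisions, scales and retires the underlying infrastructure;
        the function only knows its own small job.
        """
        params = event.get("queryStringParameters") or {}
        records = fetch_records(params.get("source"))
        return {
            "statusCode": 200,
            "headers": {"Content-Type": "application/json"},
            "body": json.dumps(records),
        }

    def fetch_records(source):
        # Stand-in for the real data access - in practice a managed store of some kind.
        data = [{"id": 1, "name": "ACME Corp", "source": "CRM"},
                {"id": 1, "name": "ACME Corporation", "source": "ERP"}]
        return [r for r in data if source is None or r["source"] == source]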



Conclusion

As I like to say, "The monolith in people's heads dies last". Although I knew of microservices, I used to build them in a very monolithic way: Single source repository, single database and single deployment process. This monolithic approach kept infrastructure maintenance efforts low and reduced the complexity of maintaining system integrity.

At the same time, this monolithic approach made it difficult to separate concerns and to pull in the necessary layers of abstraction to separate use cases, and it resulted in downtimes and a security nightmare: when you have access to the monolith, everything is fair game - both availability and confidentiality are fairly easy to compromise. The FaaS approach I discovered for myself allows me to maximize these two core goals of data security with no additional software complexity.

Thanks to tools like Gitlab, Puppet, Docker and Kubernetes, the infrastructure complexity of migrating away from a monolith is manageable, and the benefits are worth it. And then, of course, there are clouds like AWS and Azure which make the FaaS transition almost effortless - once the code is ready.


Today, I would rather build 20 autonomous services with a handful of separate clients that I can maintain and update in seconds than a single monolith that causes me headaches on every level, from infrastructure through testing all the way to fitness for user purpose.


This was my journey. 
Your mileage may vary.








Tuesday, October 30, 2018

How not to run an agile project

This is an example of how an agile approach can fail. It's a rather painful lesson in messing up, and I write it from my own perspective, without any intent to blame anyone - just as a note of caution for those who might find themselves in a similar situation.

Introduction

I was working with a client who needed an important, business-critical system.
Their IT department was organized around traditional project management and ran on 1-2 month cycles. Because this product would be the core of the business for years to come, I suggested building it with internal staff, who would later maintain and enhance the product. The suggestion included putting up a cross-functional team of six: one process manager, two analysts who would also produce test scenarios, two coders and a sysop.
I wanted to deliver incrementally, using fast feedback cycles both to offer maximum value to the business quickly and to close any potential gaps in understanding on the go.
And so it came to pass that I was to be the Project Manager of this team. Due to staffing reasons, I ended up with an external and an internal analyst as well as an external and an internal developer. No problem - or so I thought.


Mishap #1: Autonomy

The first battle my team fought was one of opinions about technology. Microservices were to be used, and the data was mostly complex and weakly structured. The team's choice fell on a document-oriented database, MongoDB. This choice was met with massive resistance from the developers outside the team who weren't familiar with this technology. There was over a month of debate about whether the technology was appropriate, and nothing got done because we didn't even have a way to store data. We finally reached a point where senior management had to intervene. Even after that, no developer outside the team would bother to learn about this new technology. Strike #1 against the team: they didn't have the support of the other developers.

Problem: The team wasn't empowered to make autonomous decisions to begin with.


Mishap #2: Continuous Delivery

Continuous delivery was the plan, and we had two-week iterations to begin with. Unfortunately, the company was a bit short on infrastructure, so we didn't get an environment for five months. I let this pass, because it was a problem outside my sphere of control. In retrospect, this is probably what killed the project to begin with, because we couldn't deliver anything of value for a long time - and when we did, it looked staggeringly minuscule. As a team, we were already on the defensive.

Problem: No Continuous Delivery, no feedback learning.


Mishap #3: Mixed Responsibility

No bragging, just describing the role. My role included responsibilities from the PO and SM functions, lead business analyst and tester, as well as those of a traditional Project Manager. As you can guess, this cocktail turned really sour. I couldn't be a neutral SM when I was pushing a technology choice; I couldn't be the PO focused on user needs while tracking tasks and making timelines; and I couldn't be the team's confidant while pressing for results.

Problem: Scrum roles and traditional project roles don't blend - especially not in one person.


Mishap #4: Culture

From the beginning, we faced resistance from the remaining IT staff, culminating in seemingly witty remarks such as, "Do developers pick their own work here?" or "Do you seriously want to tell me developers test their own code?" The culture of an agile team made the team an intruder, an outsider in the eyes of the others. It attacked their very belief system. Combined with #2, we really didn't have anything to show, so the criticism grew louder with no results to counter it.

Problem: Simply working in the same room with people who don't understand what you're doing won't have the conversion effect you expect.


Mishap #5: Clean Code

Getting a small microservice out of the door shouldn't be much of an issue with five people and a couple of weeks of time. Enter: a lack of understanding of the "Simplicity" principle. You won't be agile when people turn a simple SQL query into a battleship of complexity.
We lost months on unmaintainable legacy code because one developer took private code ownership, hard-coding tons of magic numbers and nesting over twenty loops - only to discover that the code didn't work properly and nobody could figure out why.

Problem: Before trying to work in an agile manner, developers need at least a rudimentary understanding of software craftsmanship.
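
Purely to illustrate the "Simplicity" point above - this is a hypothetical contrast, not the project's actual code - here is the difference between the small query the task needed and the "battleship" direction it took:

    import sqlite3

    OPEN_STATUSES = (3, 7, 12)  # named once, documented once

    # What the task actually needed: one small, parameterized query.
    def open_orders(con: sqlite3.Connection, customer_id: int):
        placeholders = ",".join("?" for _ in OPEN_STATUSES)
        return con.execute(
            f"SELECT id, amount FROM orders "
            f"WHERE customer_id = ? AND status IN ({placeholders})",
            (customer_id, *OPEN_STATUSES),
        ).fetchall()

    # The "battleship" direction: fetch everything, then nest loops over
    # hard-coded magic numbers and positional columns nobody can decipher.
    def open_orders_battleship(con, customer_id):
        result = []
        for row in con.execute("SELECT * FROM orders").fetchall():
            for status_code in (3, 7, 12):     # magic numbers, copied around
                if row[4] == status_code:      # which column was 4 again?
                    if row[1] == customer_id:
                        result.append((row[0], row[2]))
        return result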


Mishap #6: Management

After the line manager saw that I wasn't properly commanding and controlling the team, he started pushing his superiors to "fix this problem". The result was that we ended up with two project managers - one to take care of the project, and the line manager, who wouldn't take any responsibility except ensuring that people reported to him (mostly things which neither they nor he even comprehended) and met pre-appointed deadlines that were set in complete disregard of the time required to do the work.
This second manager caused developers to skip writing tests, check in quick-and-dirty code and drop the agreed DoD to avoid punishment. Basically, this second manager invalidated our entire agile approach within weeks.

Problem: Team autonomy is a farce when someone outside the team dictates what gets done by when. A PO who must accept such externally imposed constraints is a toothless tiger.

Mishap #7: Trust

The absolute elephant in the room was a complete absence of trust, which I didn't address in a timely fashion. The more trust there was inside the team, the bigger the distrust of those outside became. The line manager grew wary that people started to have opinions contradicting his own and could no longer be commanded and controlled so easily. Trust became an even rarer commodity as defined responsibilities weren't met, professional boundaries were violated and snide comments about other people's actions proliferated.
I'm not going to exempt myself here; such a mindset is truly contagious and definitely not helpful.

Problem: As long as there are trust issues, doing work is irrelevant. When the trust can't be mended, it's better to end fast than drag on.




Summary

Yes. I messed up. I took on too large a portion. I will avoid taking on such an "omni-responsibility" in the future, simply because I won't have either the time or the energy to fix all the fatal problems. The Scrum value of "Focus" can't be maintained when meddling in too many issues at the same time. The biggest problem was trust. It is the root cause of all the other problems - and letting trust issues go unchecked is a surefire recipe for disaster.


Lessons for the future


  1. The team must be empowered to make decisions about their work autonomously. While input from the outside organization needs to be considered appropriately, the differing opinions from those not even doing the work should never paralyze the team.
  2. Forget working in an agile fashion without the necessary support infrastructure. If you can't get it quickly, you might as well scrap the entire project. When it takes four months to get a build environment, your team is toast. For me personally, I will put an exit clause into future agile contracts: I will quit if my teams don't have build infrastructure within a week.
  3. When one person thinks of a traditional project manager and another thinks of an agile Product Owner, conflict is pre-programmed - and it can't be solved by compromise. A clear role definition upfront goes a long way. As long as the client doesn't understand the difference, it doesn't make sense to start working. If the client is willing to learn, a solution can be found. Otherwise, continuous conflict is inevitable.
  4. "Culture eats strategy for breakfast". I know this, yet it's so easy to forget in the heat of the battle. Taking on too many roles simultaneously makes it too easy to forget leaning back and see how culture is preparing its next meal. A pure coach can observe and have the right conversations at the right time. A PO/PM caught in delivery mode goes blind. I did. Against better knowledge, so I can better understand those who have the same problem.
  5. You will only be as agile as your team, and "Continuous attention to technical excellence" is part of the agile principles for a reason. Working with developers who haven't kept up to date in years is another surefire way of producing an unsustainable product. In future PO roles, I will make sure there's a training budget and developers get proper training, so that the best they can do is top-notch.
  6. Two managers don't work, for the same reason that we don't have two Product Owners or two POTUSes. One must give the direction. There can't be two winning strategies that aren't the same. The next time someone tries to push me into sharing a single management/PO role with someone else, I will resign. Immediately.
  7. Trust is paramount. I thought trust would come over time as we delivered. I was wrong. On so many levels. No Trust - no Done. In future endeavours, I will invest much more into trust building, and when I discover that's impossible, it's better to end fast than to drag on.

"A bad system beats a good person every single time". I don't even claim to be a good person, but I do say that I learned a lot about spotting systems which are set up to fail.

And I hope this article is helpful for you, too.