DevOps is everywhere nowadays. The toolset is exploding, and companies offer a large variety of products to suit their customers’ needs. According to Google Trends, interest in the keyword “devops” has doubled over the last two years.
As an Agile coach, I’ve been working closely with DevOps teams for the past year, and I’ve seen some downsides of being DevOps. Searching through other blog articles, I didn’t find much content on two downsides I’ve witnessed with my teams, so I thought I would share this experience to give readers another perspective on DevOps.
What is DevOps for us?
That’s a good question, and the answer varies a lot depending on what you read. If only the people who coined the term had captured their definition in a manifesto (wink to Agile). For the rest of this article, DevOps means that our teams follow the mantra “You build it. You run it.” They support the system they code and are responsible for the entire lifecycle of that code. From the moment they start writing it, they know they will also debug it when it breaks in production.
It is thrilling to know that thousands of people are using your system; it is far less thrilling to be woken up in the middle of the night to handle a system failure. You could argue that we should build more stability into our system to avoid this problem, but that isn’t an overnight fix in a thriving startup that has onboarded more than 30 new hires in recent months. Developers make mistakes like anybody else, and those mistakes cause bugs in production.
In the meantime, our DevOps teams share an on-call rotation, with a backup in case the first responder runs into trouble. We’ve had people head out for a bike ride on a sunny Sunday morning, only to turn back to their laptops to debug code that had failed in production. During interviews, we’ve even had candidates ask how we handle the on-call rotation once we described our teams as DevOps.
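To make the primary-plus-backup idea concrete, here is a minimal sketch of one possible weekly rotation. It assumes a simple round-robin where each person is primary for one week and the next person in the list is their backup; the function name, team names, and start date are hypothetical and not taken from our actual setup.

```python
from __future__ import annotations

from datetime import date, timedelta


def on_call_pair(team: list[str], start: date, day: date) -> tuple[str, str]:
    """Return (primary, backup) for the week containing `day`.

    Hypothetical round-robin policy: each member is primary for one week,
    and the next member in the list is backup during that same week.
    """
    if len(team) < 2:
        raise ValueError("a primary/backup rotation needs at least two people")
    if day < start:
        raise ValueError("day must not precede the rotation start date")
    week = (day - start).days // 7  # whole weeks elapsed since the rotation began
    primary = team[week % len(team)]
    backup = team[(week + 1) % len(team)]  # next in line covers for the primary
    return primary, backup


if __name__ == "__main__":
    team = ["alice", "bob", "carol"]  # hypothetical team members
    start = date(2019, 1, 7)          # hypothetical rotation start (a Monday)
    for offset in range(0, 28, 7):
        day = start + timedelta(days=offset)
        print(day, on_call_pair(team, start, day))
```

Pairing each primary with the next person in line means the backup is always the one about to take over, so they are already warmed up for the following week. Real teams often layer on swaps, holidays, and time zones, which this sketch ignores.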
It is great to have a continuous deployment train that automatically pushes new features to production. It is not so great to read customer feedback telling us they are exhausted from learning new features every other month. Who would have thought! We were so happy to push features and automate our deployment process to speed things up that we completely forgot about the customer at the end of the production line.
I define user fatigue as the state of customers who can’t keep up with new features. Features roll out too frequently for users to discover them and adapt them to their current ways of working with the software. This might be hard for software developers to understand, since we are always in the thick of it. Think about field technicians or customer service representatives, who have a different relationship with software. Their main focus is not the piece of software we are passionately working on, so their interest in new features isn’t as high as our desire to deploy those features as soon as they are built.
I have experienced this user fatigue first-hand through the numerous releases of Visual Studio Code (VS Code). Since I only write a bit of code now and then, I open the tool a few times per month. In the first few months, back in early 2017, I was excited whenever an indicator told me a new version was available. I would read the long, descriptive release notes to discover how the tool had improved. Two years later, I don’t really bother reading them. There’s always a ton of new features. Animated GIFs are there to speed up my learning, but I’m just too busy to take the time.
While I cannot deny the tremendous value that DevOps tooling brings to our profession, I would say that you need to make a well-informed decision about how you want to operate your DevOps teams.
Before jumping on the DevOps bandwagon, I would advise that your team ask itself the following questions to tackle some of the downsides of DevOps:
- How will we respond to system failures in production when they occur outside office hours?
- How quickly will we acknowledge a system failure? Do we let our customers know?
- Do we have a procedure to find the root cause of a system failure and prevent the problem from happening again in production? Do we set an ETA on those post-mortems to force ourselves to learn from system failures?
- Do we have beta users who are more willing than others to opt in to frequent releases of new features?
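One way to act on the last question is to keep deploying continuously while controlling who sees what. The sketch below assumes a hypothetical policy: users who opted in to a beta see a feature as soon as it is deployed, while everyone else receives all features released in a given month together on the first day of the following month. The `User` and `Feature` types and the monthly batching rule are illustrative assumptions, not an existing feature-flag product’s API.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class User:
    name: str
    beta_opt_in: bool  # hypothetical flag: the user chose to receive features early


@dataclass(frozen=True)
class Feature:
    name: str
    released_on: date  # day the feature was deployed to production


def general_availability(released_on: date) -> date:
    """Hypothetical batching rule: first day of the month after the release month."""
    if released_on.month == 12:
        return date(released_on.year + 1, 1, 1)
    return date(released_on.year, released_on.month + 1, 1)


def is_visible(user: User, feature: Feature, today: date) -> bool:
    """Beta users see a feature on deploy; others see it in the next monthly batch."""
    if today < feature.released_on:
        return False  # not deployed yet, nobody sees it
    if user.beta_opt_in:
        return True
    return today >= general_availability(feature.released_on)


if __name__ == "__main__":
    beta = User("early-adopter", beta_opt_in=True)       # hypothetical users
    regular = User("field-technician", beta_opt_in=False)
    feature = Feature("new-dashboard", date(2019, 3, 10))  # hypothetical feature
    for today in (date(2019, 3, 10), date(2019, 4, 1)):
        print(today, is_visible(beta, feature, today), is_visible(regular, feature, today))
```

Batching by calendar month means regular users absorb changes once per cycle instead of every time the deployment train runs, while the team still gets early feedback from the beta group.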