What Really Causes Technical Debt?
Advances in Technology, Maintenance & Updates are not Technical Debt.
Although I am going to use the term technical debt in this discussion, I am actively forming the opinion that it is the wrong term, both in how various assessments test understanding and in how practitioners in the agile community use it.
The Scrum Guide does not contain the term technical debt. It is terminology widely used within the community of scrum and agile practitioners concerning poor choices made by the scrum team or agile team about delivery of a backlog item. This poor choice then leads to a fragility within the solution. This fragility causes the actual releasable state of the product to become unreliable for the product owner, with an unknown amount of cost to put things right.
A single definition of technical debt that is agreeable to all within the agile practitioner community does not exist. However, for the Scrum certification assessments run by Scrum.org, the preceding explanation would cover the required understanding. It is even better when backed by an example showing that the work passed the team's definition of done at the time of delivery and was not just a single developer's error.
As an example, perhaps a Scrum Team was delivering a complex solution requiring single sign-on and used a library from a solution that was implemented by another company.
While this library may work and pass the definition of done, it is not clear that there are reuse rights on the code. If, in addition, the library had been hacked with lots of direct access to variables within the code, then this would be considered technical debt. The right to release any solution using this third party's code is unclear, and the work to unpick and replace it could be unknown and extensive.
So how does this relate to the heading, which names Advances in Technology, Maintenance & Updates? The connection is the widening of the term technical debt to include any technological fragility within a solution, even when the decisions made by the team at the time of delivery were the best they possibly could have been. In that case the term does not describe the actual issue: there is a potential fragility, but it is not a debt that the delivery team caused at the time.
Lean Principles Work Types
To understand this, I suggest we consider the four types of work identified by Lean, as they apply to a solution being delivered by a scrum or agile team that owns the product and the platform on which it resides.
1. Planned Features or Functionality that serve the customer or organizational needs within the solution
2. Planned development of the infrastructure that the application solution relies on for it to be delivered
3. Planned Maintenance and Updates to items 1 & 2
4. Unplanned work to remediate defects, fragility, and errors made in items 1, 2 & 3
If we take the earlier example of moving to single sign-on, or of using Google or Facebook as an identity provider in an IDP / IDA relationship, then this could easily be an example of work type 1: a planned feature that makes access easier for the organization's customers.
If you consider that many Java applications were previously served to users via a WAR file on Tomcat servers in virtual machines, and have now moved to a container-based serverless architecture, then this could be described as work type 2.
Following the Java Example
If you follow through with the Java example, then Oracle, as current owner of Java and the Java Virtual Machine (JVM), regularly releases updates of the JVM. Ensuring a smooth and controlled update process that keeps the solution on the current version would amount to work of type 3. If your solution and organization also use a tool such as Jenkins to manage the builds of your solution, that tool also requires updates to stay current and to keep your builds reliable and timely; this would also fall into work type 3.
Where, in the delivery of work types 1 through 3, the team makes poor choices, possibly in the name of hacking it together, workarounds, or just getting it done (aka JFDI), they run the risk of building an inherent fragility into the solution. That fragility will result in an unknown amount of work type 4 that later catches them out when things start to fail.
The result is that all other work may have to be stopped to remediate problems and bring the solution back to a stable state that the Product Owner knows they can release. Work type 4 is generally the area where, if poor choices are continually accepted, the scrum team will accumulate unwanted technical debt.
What they all have in common
All work from types 1 through 4 should be made visible and transparent to all through the Product Backlog. By doing this, the Product Backlog represents the vision for the solution and makes transparent all work it will take to make this vision a reality.
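As a rough illustration of keeping every work type on one backlog, the sketch below tags each item with its work type and totals the effort per type. The names `WorkType`, `BacklogItem`, `effort_by_work_type`, and `unplanned_share`, as well as the item titles and estimates, are assumptions made up for this example and do not come from Scrum or Lean.

```python
from collections import Counter
from dataclasses import dataclass
from enum import Enum


class WorkType(Enum):
    """The four work types described above (names are illustrative)."""
    FEATURE = 1          # planned features or functionality
    INFRASTRUCTURE = 2   # planned infrastructure development
    MAINTENANCE = 3      # planned maintenance and updates to types 1 and 2
    UNPLANNED = 4        # unplanned remediation of defects and fragility


@dataclass
class BacklogItem:
    title: str
    work_type: WorkType
    estimate: int  # hypothetical relative size, e.g. story points


def effort_by_work_type(backlog):
    """Total the estimates on a single backlog, grouped by work type."""
    totals = Counter()
    for item in backlog:
        totals[item.work_type] += item.estimate
    return totals


def unplanned_share(backlog):
    """Fraction of all estimated effort that is unplanned (type 4) work."""
    total = sum(item.estimate for item in backlog)
    if total == 0:
        return 0.0
    return effort_by_work_type(backlog)[WorkType.UNPLANNED] / total


# One backlog holding all four work types (invented sample data).
product_backlog = [
    BacklogItem("Single sign-on via external identity provider", WorkType.FEATURE, 8),
    BacklogItem("Move WAR deployment to containers", WorkType.INFRASTRUCTURE, 5),
    BacklogItem("Update JVM to current release", WorkType.MAINTENANCE, 3),
    BacklogItem("Update Jenkins build server", WorkType.MAINTENANCE, 2),
    BacklogItem("Replace library with unclear reuse rights", WorkType.UNPLANNED, 2),
]

for work_type, effort in sorted(effort_by_work_type(product_backlog).items(),
                                key=lambda pair: pair[0].value):
    print(f"{work_type.name}: {effort}")
print(f"Unplanned share: {unplanned_share(product_backlog):.0%}")
```

A growing unplanned share from sprint to sprint would be one visible signal that fragility is accumulating, though the thresholds that matter are for each team and Product Owner to judge.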
It is the Product Owner’s responsibility to own and manage the product backlog in such a manner that it delivers the value from all work types and does not accumulate intended or unintended fragilities that will result in unplanned work to remediate the problems introduced.
Therefore, it is essential to coach product owners to understand both work types 2 & 3, especially when they face demands from stakeholders to relentlessly deliver functionality and new features to a solution. Not prioritizing work related to underlying infrastructure, or maintenance of the already released product, can lead to a fragility within the solution.
If you have done much reading or practical implementation on Lean principles, you will understand that it is work type 4 that can kill a solution and/or an organization. When all efforts are focussed on solving problems, it is impossible to deliver further value until you sort out the issues already created.
Accountability to own the Product
It is not within the remit of the Development Team to keep a secondary list of work related to work types 2 through 4, so that in each sprint it accepts a certain amount of new work of type 1, as identified and prioritized by the Product Owner, and then fills the balance from work types 2 through 4 on the view that the Product Owner doesn’t understand technical topics. It is the team's responsibility to engage with the Product Owner, giving advice where required, so that the Product Owner does understand the need to take account of this work.
Product delivery has elements of technical & underlying items that need to be delivered, and this should be expected. The same debate occurs over direct and indirect value derived from infrastructure work items and understanding how the indirect value from this adds to the whole.
These are just items that need to be managed on the Backlog of work the same as any other, including the potential they have for causing dependencies. Where a dependency does arise, if the product owner is not actively participating in the prioritization and ordering of this work, it runs the risk of causing problems, not solving them.
In and of themselves, they do not form technical debt, as they do not derive from poor choices, and teams should avoid seeing technical work as a credit card type debt. However, when these items are not actively managed on the backlog as being equally valuable to functional features, the product owner does run the risk of introducing fragility through the poor choice of not owning them. The technical debt of this poor choice could mean the product ends up in a state where they cannot reliably consider it releasable, with an unknown cost to correct that situation.
In Scrum, the preference is not to introduce, manage, or maintain technical debt but to avoid adding it in the first place. It is the accountability of the Scrum Master role to coach the Scrum Team on making the right choices not to cause the accumulation of poor decisions as technical debt.
Understanding Test Driven Development will help teams build quality in from the beginning.