Incident estimation and tracking
I have a question about how you track incidents and requests that are received sprint on sprint.
1. Do you estimate them during planning, considering the history of incidents received by the team?
2. If you estimate, how do you predict complexity and size?
3. If not, which metric do you use to project incident effort, and how do you handle burndown and velocity in such cases?
Incidents and other requests are very different.
Most requests can and should go on the Product Backlog. There, the Product Owner can discuss the request with stakeholders to gain an understanding of it, ensure it aligns with the overall product vision, and order it among the other items on the Product Backlog. The team will be able to refine it and, when it's appropriate, pull the Product Backlog Items into Sprints.
However, incidents, at least as the term is often used, are different. Incidents are quality reductions and service interruptions. Often, incidents can't wait to go through analysis, refinement, and planning. The team will likely need to take immediate action to restore services and return quality to acceptable levels.
Generally, I'm not a fan of estimating. That said, if the team feels that estimating helps them plan and execute their Sprints, then it's a practice they can use. Estimation should be part of the refinement process. If there's an incident that cannot go through refinement and planning, though, I don't see the value in estimating it. Stopping to estimate only slows down the restoration of the service.
If you are estimating and using burndown and velocity, there are a few ways to handle interruptions, like incidents. One would be to not reflect the incident or other unplanned work on the burndown chart or in velocity. Its effect would still show up indirectly: the burndown of planned work would slow and the team's velocity would drop for that Sprint.
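To make that concrete, here is a rough sketch of what "leave incidents out of velocity" could look like. The `Sprint` record, the `velocity` helper, and all of the numbers are made up for illustration; they don't come from any particular tool. The idea is simply that only planned, Done work is counted, while incidents are counted but never estimated.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Sprint:
    name: str
    planned_points_done: int  # points for planned Product Backlog Items that were Done
    incident_count: int       # unplanned incidents handled; counted, never estimated


def velocity(sprints, window=3):
    """Average planned points completed over the last `window` sprints."""
    recent = sprints[-window:]
    return mean(s.planned_points_done for s in recent)


# Hypothetical history: the sprint with the most incidents finished the least planned work.
history = [
    Sprint("Sprint 1", planned_points_done=30, incident_count=2),
    Sprint("Sprint 2", planned_points_done=22, incident_count=6),
    Sprint("Sprint 3", planned_points_done=26, incident_count=4),
]

forecast = velocity(history)
for s in history:
    print(f"{s.name}: {s.planned_points_done} planned points done, {s.incident_count} incidents")
print(f"Velocity for planning the next Sprint: {forecast}")
```

Because the incident-heavy sprints already pull the average down, a forecast based on this velocity leaves room for a similar level of interruption without anyone having to size individual incidents.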
Presumably the incidents you describe are quality related, and show that work is not in the state people thought it was. Burndowns and velocity measures therefore cannot be trusted at all, since the work is not Done.
In other words, you have technical debt. To deal with it:
- Improve the Definition of Done so defects cannot recur.
- Quantify the technical debt incurred so far on the Product Backlog.
- Come up with a policy of paying the debt off Sprint by Sprint.
Hi Ian,
Thanks for your response. However, incidents mostly occur either right after a production deployment of a new feature or long after an existing feature was deployed. As it's not caught pre-deployment, there's no way we can mark the relevant feature as 'Not Done'. As incidents are unpredictable, we need clarification on how to estimate and track them as part of our sprints, especially when the team gets a considerable number of incidents every sprint. The count of incidents is predictable; however, the issues causing the incidents are unpredictable.
"As it's not caught pre-deployment, there's no way we can mark the relevant feature as 'Not Done'."
The work isn't Done, regardless of how you currently mark it. That's what causes the unpredictability you are experiencing in the first place.
Work can appear to be Done and yet you can still incur technical debt, because the Definition of Done subsequently proves to be inadequate. It's best to establish transparency over the matter. Fix the Definition of Done, meet that continuously improving standard, and account for and resolve any newly discovered undone work.