
Estimating incidents

Last post 03:36 pm February 12, 2016 by Andrzej Zińczuk
7 replies
03:27 am August 11, 2015

I am the Product Owner of a DevOps team that is responsible for a fairly complex Oracle application. One of our tasks is to resolve incidents. For the DevOps team it is hard to estimate the amount of work needed to resolve these incidents. The problem is that before you start, not much is known about what causes the incident. Once it has been analyzed, the resolution is usually no more than changing a few lines of code. Thinking about it, I see essentially two solutions:

- Do all the analysis during backlog refinement and accept that this work is not estimated. I think this is not a very good idea.
- Estimate anyway, and if you really have no idea, accept a very inaccurate estimate based on previous experience with incidents.

Can you please share your experience with this problem? What works for you, and what might be a direction for my team to choose?

05:13 am August 11, 2015

Hi Rudolf,

Let me restate your question as follows:
There is a Scrum Team in your organization.
You are the Product Owner of the team.
The team is responsible for building and maintaining a complex application built on Oracle.
Some runtime errors are being raised in the production environment (maybe technical debt).

Are you a Product Owner or a project manager?
Is there a Scrum Master on your team? What does s/he think about these issues?
What kind of incidents are these? System domain? Application domain?
If they are accumulated technical debt, the real problem to address is how to tackle that technical debt.

Roughly right is much better than precisely wrong.
Don't worry too much about precisely estimating the cost of tackling these incidents.

I have experience maintaining/upgrading a legacy system. My strategies are:
1. Conduct a series of unit tests and exploratory tests to identify potential technical debt.
2. Avoid accumulating new technical debt.
3. Pay off some of the debt in each Sprint.
4. For ad hoc issues and incidents, the Development Team (DT) just adds them to the to-do list in the Sprint Backlog. The DT always reserves some capacity for this kind of ad hoc task.
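One possible way to size the reserved capacity in point 4 is to base it on how much time incidents actually consumed in recent Sprints. The sketch below assumes the team records the hours spent on incidents per Sprint; the function name `plan_capacity`, the "reserve the average" rule, and the numbers are hypothetical illustrations, not data from this thread.

```python
from statistics import mean


def plan_capacity(total_hours: float, past_incident_hours: list[float]) -> tuple[float, float]:
    """Return (reserved_hours, plannable_hours) for the next Sprint.

    Hypothetical rule: reserve the average number of hours spent on
    ad hoc incidents over the recorded past Sprints, capped at the
    Sprint's total capacity.
    """
    if not past_incident_hours:
        raise ValueError("need at least one past Sprint")
    reserved = min(mean(past_incident_hours), total_hours)
    return reserved, total_hours - reserved


# Illustrative numbers only: a 200-hour Sprint, incident hours from four past Sprints.
reserved, plannable = plan_capacity(200, [30, 45, 25, 40])
print(f"reserve {reserved} h for incidents, plan {plannable} h of backlog work")
```

A plain average is the simplest choice; a team that is often caught short could reserve a higher figure (for example, the maximum of recent Sprints) instead.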

05:31 am August 11, 2015

Hi Rudolf, how volatile is the queue of incidents (in terms of priority)? This may help the team decide whether Scrum is the right framework.

You may want to have a look at Kanban, which is more appropriate in environments with a high degree of variability in priority. At the moment the team is unable to batch work into time-boxed Sprints that can be left alone, which is the situation Scrum is better suited to.

In Kanban, team members simply pull the next unit of work from the backlog and proceed with implementing it (it is unit-based rather than batch-based). Estimation is optional; however, some teams still choose to estimate in order to gain more predictability. The focus here is on cycle time rather than velocity. Cycle time is the amount of time it takes for a unit of work to travel through the team's workflow (including any analysis work), from the moment work starts to the moment it ships. By optimizing cycle time, the team can confidently forecast the delivery of future work.
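To make the cycle-time idea concrete, here is a small sketch that computes cycle time (start to ship, in calendar days) for a few completed work items and uses a percentile of those values as a forecast. The item dates are made up, and both the nearest-rank percentile and the choice of the 85th percentile are illustrative assumptions rather than part of the advice above.

```python
import math
from datetime import date

# Hypothetical completed work items: (work started, shipped).
completed = [
    (date(2015, 7, 1), date(2015, 7, 3)),
    (date(2015, 7, 2), date(2015, 7, 9)),
    (date(2015, 7, 6), date(2015, 7, 8)),
    (date(2015, 7, 7), date(2015, 7, 11)),
    (date(2015, 7, 10), date(2015, 7, 13)),
]


def cycle_times(items):
    """Cycle time in calendar days, from the start of work to shipping."""
    return [(shipped - started).days for started, shipped in items]


def percentile(values, pct):
    """Nearest-rank percentile: smallest value with at least pct% of values <= it."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


times = cycle_times(completed)
forecast = percentile(times, 85)
print(f"cycle times: {times}; 85th percentile cycle time: {forecast} days")
```

With only five items the 85th percentile is simply the slowest item; the forecast becomes more meaningful as the history grows.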


05:52 am August 11, 2015

I agree with Jitesh.

If most of your tasks are maintenance/support, Kanban should be the best choice.

If the system is still under active development by your team, Scrum plus Kanban's WIP limits should work fine.
As mentioned above, the DT adds the ad hoc tasks to the Sprint Backlog. When team members pull units of work from the backlog, this should be governed by the WIP limit.
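A minimal sketch of how a WIP limit governs pulling work: a team member may pull the next item from the Sprint Backlog only while the number of items in progress is below the limit. The class name `SprintBoard`, its methods, the backlog items, and the limit of 2 are all hypothetical, chosen only for illustration.

```python
from collections import deque


class SprintBoard:
    """Hypothetical board where pulling new work is gated by a WIP limit."""

    def __init__(self, backlog, wip_limit):
        self.todo = deque(backlog)  # ordered Sprint Backlog, including ad hoc incidents
        self.in_progress = []
        self.done = []
        self.wip_limit = wip_limit

    def pull(self):
        """Pull the next item if the WIP limit allows; return it, or None."""
        if not self.todo or len(self.in_progress) >= self.wip_limit:
            return None
        item = self.todo.popleft()
        self.in_progress.append(item)
        return item

    def finish(self, item):
        """Move a finished item to done, freeing a WIP slot."""
        self.in_progress.remove(item)
        self.done.append(item)


board = SprintBoard(["story A", "incident 101", "story B"], wip_limit=2)
print(board.pull(), board.pull(), board.pull())  # third pull is blocked by the limit
board.finish("story A")
print(board.pull())  # a slot is free again, so "story B" can be pulled
```

The point of the limit is that an incoming incident does not simply pile on top of work already in progress; someone has to finish something before new work starts.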

09:15 pm August 11, 2015

Is a single, coherent Sprint Goal being negotiated each Sprint, and does the goal provide value? Or is each work item essentially independent of the others in terms of the releasable value it provides?

07:58 am August 13, 2015

Thank you all for your replies. My team is mainly involved in creating new or changed functionality (so I think Kanban is not a good solution). Scrum is fairly new to my company and to my team, including me and the Scrum Master. In the past, incidents did not have a high priority in the organization, for both valid and not-so-valid reasons, so some incidents remained unsolved for a long period (using workarounds).

For most of my team members the product is a fairly unfamiliar area of business. It is extremely complex because of regulatory requirements, the nature of the product, and the company's culture.

So as Product Owner I must find the right balance between creating nice things for the business, increasing team capabilities, and resolving incidents. All three have high business value. People advising the team on estimation have different opinions on the best way to handle this. That's why I am interested in experience from other companies. At the moment the DevOps team is experimenting with a way of working in which issues are pulled one by one into the Sprint Backlog, with a time-boxed period of at most 2 days to find out what's wrong. If they cannot locate the root cause of the problem within this period, we discuss how to proceed. The advantage is that the work we do is transparent to our stakeholders. The disadvantage is that the DevOps team is working on issues that are not refined well enough. My gut feeling tells me the best way to proceed is to keep the transparency and accept the uncertainty.

@Ian, yes, in most cases the incidents are not related to a specific Sprint Goal. We are still working on defining proper Sprint Goals. In most Sprints so far we have worked on two or three larger issues and some small ones.

09:03 am August 13, 2015

Scrumban is sometimes used as a compromise, in an attempt to balance incident handling with substantial pieces of new work.

In the past I have also addressed this class of problem by splitting into two teams: a Kanban team for incident response and a small-change Scrum team.

03:36 pm February 12, 2016

Why do you need to estimate incidents? In this case, historical data may be more valuable. From an incident-solving perspective, it's better to provide a cycle time for a particular type/size of incident rather than focusing on predicting how much effort is involved.
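As a sketch of using history instead of up-front estimates, the snippet below groups past incidents by a type/size label and reports the median historical cycle time for each group. The labels, the sample data, and the choice of the median are assumptions for illustration only.

```python
from collections import defaultdict
from statistics import median

# Hypothetical history: (incident type, cycle time in working days).
history = [
    ("data fix", 1), ("data fix", 2), ("data fix", 1),
    ("batch job failure", 3), ("batch job failure", 5),
    ("unknown cause", 4), ("unknown cause", 9), ("unknown cause", 6),
]


def cycle_time_by_type(records):
    """Median historical cycle time per incident type."""
    grouped = defaultdict(list)
    for kind, days in records:
        grouped[kind].append(days)
    return {kind: median(days) for kind, days in grouped.items()}


for kind, days in cycle_time_by_type(history).items():
    print(f"{kind}: typically resolved in about {days} days")
```

A new incident can then be communicated as "incidents of this type have usually taken about N days" without anyone estimating effort up front.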

Bugs whose cause is hard to track down might call for a deeper dive (5 Whys?) to check which practices could speed up analysis: additional metrics, checkpoints, architectural changes, etc. can be added to the system to make it faster to find out what is happening.
