Create Faster and More Accurate Forecasts using Probabilities

January 15, 2020

The most important question to stakeholders is often “When can it be done?” There is a lot that goes into answering that seemingly simple question. What is involved in the work? What else are we doing? What risks are there? The questions go on and on. Traditional estimation processes are so time-consuming, and often inaccurate, that an entire #noestimates movement has been started. However, these questions still need to be answered and we need a way to answer them that doesn’t get in the way of getting things done. That way is probabilistic forecasting.

What is probabilistic forecasting?

A probabilistic forecast is one that acknowledges a wide array of possible outcomes and assigns a probability, or likelihood of happening, to each. Every probabilistic forecast should have 2 components: a range and a probability.

[Image: weather report]

Every time you look at a weather report to check the chance of rain, you are looking at a probabilistic forecast!

Release Planning

If a product owner wants to know how long it will take to finish a specific set of items in the product backlog, a traditional approach would be to examine the work, break it down into stories, and provide an estimated duration for each. Then, you add up the durations, round up a bit for good measure, and add some padding for the potential risks or unknowns. Now you can finally produce a forecast that sounds something like “This effort will take approximately 13 weeks to complete.” Even if you use velocity instead of time, you end up with a similar forecast that sounds like “This effort will take about 7 sprints to complete.” This type of forecast, one that is focused on a single possible outcome, is known as a deterministic forecast.

A probabilistic forecast, with its 2 components, is presented as: “there’s an 85% chance we can get this done in 7 sprints or less.” The specific date something will be completed depends on when you start! So, once you factor in a start date you can give forecasts like: “There’s an 85% chance we’ll finish by May 11th or earlier if we start on February 3rd.”
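
As a rough sketch of that last step, here is how a sprint count could be turned into a calendar date. The two-week sprint length and the year 2020 (taken from the date of this post) are assumptions for illustration, and the function name forecast_date is hypothetical.

```python
from datetime import date, timedelta

def forecast_date(start: date, sprints: int, sprint_length_days: int = 14) -> date:
    """Return the date on which the given number of back-to-back sprints has elapsed."""
    return start + timedelta(days=sprints * sprint_length_days)

# Assumed inputs: a February 3rd, 2020 start and the 7-sprint, 85% forecast from the text.
finish = forecast_date(date(2020, 2, 3), sprints=7)
print(f"There's an 85% chance we'll finish by {finish:%B %d, %Y} or earlier.")
```

The probability itself does not come from this arithmetic. It belongs to the "7 sprints or less" forecast, and the start date only anchors it to the calendar.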

[Image: calendar view of delivery dates and probabilities]

Sprint Planning

The question “When will it be done?” is useful for release planning, but what if you’re trying to do simple sprint planning? That takes a slightly different question: “How many can we get done?” Even though the question is different, the approach is the same. We want to be able to say something like “there’s an 80% chance that we’ll finish 27 or more items in this 2-week sprint.”

[Image: Monte Carlo histogram for “How many?”]

Calculating the range and probability

Sounds great, right? But, the big question is how do we create these spectacular probabilistic forecasts? How are we supposed to know the range of possible outcomes, much less the likelihood of each? It’s not something we can just accurately estimate on our own. Well, it’s not magic! You can use a tool called a Monte Carlo simulation to easily provide that information.

A Monte Carlo simulation uses data you provide (estimates or historical data) to run thousands of simulations, taking advantage of the variety found within the supplied data sample. According to Wikipedia, the analysis from a Monte Carlo simulation has a narrower range (less unnecessary padding) than normal analysis because it is difficult for people to effectively consider all of the various permutations of each variable. Because each simulated run draws from the data you supplied, results that rarely appear in that data are drawn rarely, so outliers are given less weight than things that happen more often.
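
To make that less abstract, here is a minimal sketch of one common way to run such a simulation for the “How many?” question: resample past daily throughput with replacement. This is an illustration under assumptions, not the method used by any particular tool. The history list is made-up data, and the function name and the choice of 10 working days per sprint are hypothetical.

```python
import random

def simulate_sprint_totals(daily_throughput, days_in_sprint=10, trials=10_000, seed=None):
    """Simulate many sprints by resampling historical daily throughput with replacement."""
    rng = random.Random(seed)
    totals = []
    for _ in range(trials):
        # One simulated sprint: pick a random historical day for each working day.
        totals.append(sum(rng.choice(daily_throughput) for _ in range(days_in_sprint)))
    return totals

# Hypothetical history: items finished on each of the last 20 working days.
history = [3, 0, 2, 5, 1, 4, 2, 3, 0, 6, 2, 3, 1, 4, 2, 0, 3, 5, 2, 3]
totals = simulate_sprint_totals(history, seed=42)
print(f"{len(totals)} simulated sprints, from {min(totals)} to {max(totals)} items")
```

Notice that the one day with 6 finished items is only 1 of 20 entries, so it gets picked about 5% of the time. That is how rare events end up with less weight than common ones.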

The outcome of each individual simulation can then be plotted on a histogram, allowing you to see the spread of possible outcomes and how many times each of those outcomes occurred.

[Image: Monte Carlo histogram with percentile lines]

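The number of times any single outcome occurred, divided by the total number of trials, gives us the probability of that outcome. Adding up those probabilities across a range of outcomes gives the likelihood of landing anywhere in that range. You can see reference lines for specific probabilities (50%, 70%, 85%, 95%) in the image above.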
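
Here is a small sketch of that counting, using a deliberately tiny set of 20 made-up simulation results so the numbers are easy to check by hand. The function names are hypothetical, and a real simulation would use thousands of trials.

```python
import math
from collections import Counter

def outcome_probabilities(outcomes):
    """Map each outcome to the share of trials in which it occurred."""
    counts = Counter(outcomes)
    return {value: count / len(outcomes) for value, count in sorted(counts.items())}

def at_least(outcomes, confidence_pct):
    """Largest value that at least confidence_pct percent of trials met or exceeded."""
    ordered = sorted(outcomes, reverse=True)
    # Position of the trial that closes out the requested share of the results.
    index = math.ceil(confidence_pct * len(ordered) / 100) - 1
    return ordered[index]

# Hypothetical results: items finished in each of 20 simulated sprints.
outcomes = [24, 31, 27, 29, 22, 30, 27, 33, 26, 28,
            25, 29, 27, 31, 23, 28, 30, 26, 27, 29]

print(outcome_probabilities(outcomes)[27])  # share of trials that finished exactly 27 items
for pct in (50, 70, 85, 95):
    print(f"{pct}% chance of finishing {at_least(outcomes, pct)} or more items")
```

For a “When will it be done?” forecast the direction flips: you would sort the simulated durations in ascending order and read off the value that the chosen share of trials finished within.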

Why is probabilistic forecasting faster and more accurate?

You can run a Monte Carlo simulation in mere minutes. It is much faster than spending hours, days, or even weeks, of expensive expert time just for the sake of answering the question “When will it be done?” or “How many can we do?” Time spent estimating is time we aren’t spending on creating the desired output. Providing cheaper forecasts that are at least as accurate should be a big part of your definition of working smarter, not harder.

Let’s talk about accuracy. By most accounts, estimation, even by experts, is usually wrong - often by a large margin. Why? Well, our domains involve a lot of uncertainty, much like the German Tank Problem. It is very difficult to anticipate everything that will need to be done for something to be a success. There is a lot of what I call “dark matter” that can’t be accounted for in a task breakdown or project plan. A lot of what takes time is often not even recognized as work - meetings, asking for feedback, checking in with colleagues to run ideas by them, etc. For these reasons and more, we rarely get estimates right and so we add padding to compensate.

In fact, the words “right” and “wrong” when it comes to forecasts should be re-evaluated. Using the language of probabilities reminds us that something unexpected can happen and disrupt our desired timelines. In truth, the only things we can be certain about are the things that have already been delivered. Outside of that, there is no 100% in a probabilistic forecast. If we said there was an 85% chance we would deliver work in 13 weeks or less and it took us 15, it doesn’t mean that we were wrong. We stated upfront there was a 15% chance that work would take longer. 

Just as a meteorologist continually updates the forecast for a storm as it progresses, teams can continually update their forecast as work is finished to minimize surprises. If you or your stakeholders don’t like the forecasts that are generated, you can reiterate that the forecast was driven by actual past capabilities. You can ask if anything has changed that would result in a different outcome. No? Then discuss what would need to change to have the desired outcome be more probable - better delivery pipeline automation, better testing tools, decreased scope, etc. When experts give estimates, it is easy to focus displeasure on the expert and their opinion. When forecasts are statistically generated by the data, it is harder to (metaphorically) shoot the messenger and easier to focus on the situation at hand.

Will this work if my work isn’t all the same size?

Many people think that probabilistic forecasting won’t work for them if their work is varied in type or size. Your work doesn’t have to be the same size at all for this to work. Obviously, variation will cause the spread of possible outcomes to be wider. However, there is a truth that feels very counter-intuitive: if the data that goes into the Monte Carlo simulation reflects the variety of your work, the generated forecasts will reflect that variety as well. 
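
If you want to see this for yourself, here is a minimal sketch that compares two made-up throughput histories with the same average but very different day-to-day variety. All names and numbers are hypothetical, and resampling daily throughput is just one assumed way of running the simulation.

```python
import random

def simulate_totals(daily_throughput, days=10, trials=5_000, seed=1):
    """Return sorted totals for many simulated sprints, resampling past days with replacement."""
    rng = random.Random(seed)
    return sorted(sum(rng.choice(daily_throughput) for _ in range(days)) for _ in range(trials))

def spread(sorted_totals, low_pct=5, high_pct=95):
    """Width of the band between two percentiles of the simulated totals."""
    n = len(sorted_totals)
    return sorted_totals[high_pct * n // 100 - 1] - sorted_totals[low_pct * n // 100]

# Two hypothetical teams, both averaging 2.5 items per day.
steady = [2, 3, 2, 3, 2, 3, 2, 3, 2, 3]
varied = [0, 7, 1, 5, 0, 4, 2, 6, 0, 0]

steady_spread = spread(simulate_totals(steady))
varied_spread = spread(simulate_totals(varied))
print(f"steady team: 5th to 95th percentile spans {steady_spread} items")
print(f"varied team: 5th to 95th percentile spans {varied_spread} items")
```

Both forecasts are usable. The varied history simply produces a wider range, which is the simulation honestly reporting the variety it was given.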

Essentially, if your future conditions are similar to your past conditions, you’re all good. You will need to take the forecasts with a grain of salt when your team or its work materially changes, at least until you get some new data. The good thing is that you don’t have to take my word for it! Try it and see for yourself. 

Am I really ready for this? Are there tools available?

Any team, even new teams, can use this type of forecasting. This feels like an advanced concept but it really isn’t. It’s just very different than what we’re used to. Don’t have historical data? Use estimates until you finish some work and then switch to using historical data.
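
As one possible sketch of starting from estimates, suppose the team's only input is a guess that it finishes somewhere between a low and a high number of items per sprint. The code below draws each simulated sprint uniformly from that range to answer “When will it be done?” The uniform draw, the backlog size, the range, and the function names are all assumptions made for illustration.

```python
import math
import random

def sprints_to_finish(backlog_size, low, high, rng):
    """Count sprints until the backlog is empty, drawing each sprint's throughput from the range."""
    remaining, sprints = backlog_size, 0
    while remaining > 0:
        remaining -= rng.randint(low, high)
        sprints += 1
    return sprints

def forecast_sprints(backlog_size, low, high, confidence_pct=85, trials=10_000, seed=7):
    """Smallest sprint count that at least confidence_pct percent of trials finished within."""
    if not 1 <= low <= high:
        raise ValueError("expected 1 <= low <= high so every simulated sprint makes progress")
    rng = random.Random(seed)
    results = sorted(sprints_to_finish(backlog_size, low, high, rng) for _ in range(trials))
    return results[math.ceil(confidence_pct * trials / 100) - 1]

# Hypothetical inputs: 150 backlog items, estimated 15 to 30 items finished per sprint.
sprints = forecast_sprints(150, low=15, high=30)
print(f"There's an 85% chance we can get this done in {sprints} sprints or less.")
```

Once real sprints are finished, the estimated range can be swapped for the actual throughput numbers.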

Also, you don’t have to love math! If you can do simple math (add, subtract, multiply, divide) then you know all the math you need to know to do probabilistic forecasting. The statistical formulas behind the simulation can be handled by tools. In the spirit of transparency, you should know that my business, 55 Degrees, partners with ActionableAgile to offer ActionableAgile for Jira - Agile Metrics. It is one of the only online tools for flow metrics and probabilistic forecasting. You can use ActionableAgile with Jira, Azure DevOps, or on its own by uploading your data manually into the standalone SaaS tool. Can’t use that? Then I highly recommend the free, downloadable Excel spreadsheets from Troy Magennis at FocusedObjective. They also have some great learning exercises you can download and facilitate with your teams to explain some of these concepts.

Where can I find training on this topic?

If complementary practices like probabilistic forecasting and Monte Carlo simulations are interesting to you, then you’ll love the Scrum.org class Professional Scrum with Kanban with trainers like me. Sign up for one now to learn how to make delivery more predictable by adding flow metrics to your Scrum practice.

If you want training on Advanced Agile Metrics, Forecasting & Predictability which goes much further into these topics than the PSK class, contact me or fellow PST, Dan Vacanti!

 

 

