How do you document technical tasks like data loading?

20 replies

08:07 am June 13, 2013

We have only recently taken on Scrum, so we are still learning as we go through the project.

Previously we have been using spikes for tasks that are not part of the story, but we now know that they should only be for investigation.

One major example we have had is loading of data. We are loading historical data from 7 other organisations into ours (at different times) and adding to this. We will then run searches against this data. This data load needs to be manually handled as it is in different formats and structures and we are putting them into one format for us to use. This has previously been spike work, but as we are actually 'doing' work, it shouldn't be a spike.

How should data loading be done? Should we create a story saying 'as a user I want to view xxx data'?

Thoughts would be appreciated.

Charles Bradley

02:51 pm June 13, 2013

PSmith83,

I'm a bit of a User Story expert, so I'll try to help here.

Is the data going to eventually be in "production" or delivered with the actual product to end users?
What kind of product functionality does the data support?

(i.e. you mention "We will then run searches against this data." Do you mean the end users will run searches against the data?)

Ian Mitchell

04:28 pm June 13, 2013

Hi PSmith83

From what you say, it looks like you should have multiple user stories. There are 7 sources being referenced at different times, so on that basis I'd expect at least 7 stories...more if each source is loaded over several runs. Each story should be small and independent enough to be progressed quickly and without impediment.

The value seems to be in the normalisation of the data and the reduction of technical debt, so I'd recommend phrasing the user stories in those terms.

Paul Smith

03:35 am June 14, 2013

I was thinking it should be multiple stories.

What we are doing (without going into detail of the exact project), is going live with seven different organisations at different stages and our own staff will be running searches against their data and adding to that data. We receive a set of data from each organisation to look at it and figure out what we need to do with it to get it into the format we want. We will then get a refresh of data the day before we go live, where we will need to run a job to convert the data and put it into live.

As this means we will eventually have 14 sets of data (2 for each organisation), how would I write these stories? The first set of data that we receive will be purely for our developers to get into the state we want and to set up an automatic job to convert the second version at a later date. The second set of data will be for our own staff (the end users) to carry out their activities.

Sorry if this seems a silly question or I am not giving enough detail, but this is all new to our company so I have no one else to go to in-house to get advice.

Thanks

Ian Mitchell

05:55 am June 14, 2013

For the first set of data, phrase the stories in terms of the value being provided to the developers.

e.g. As a developer I need data migrated from source and reformatted so that an automatic job can convert the data

For the second set of data, the value is being provided to the users. You don't specify what their activities are so I'm not really in a position to make a guess at what their stories would look like.

However, I don't think you should get too hung up on how the stories are written. A user story is a placeholder for a future conversation. If it is phrased well enough to facilitate such conversation, then it has done its job. Don't sweat the small stuff over this. It's more important to concentrate on the acceptance criteria. In your case there is likely to be a great deal that needs specifying in the a/c of the first set of stories, since they have to support automated processing at a later date.

Paul Smith

06:05 am June 14, 2013

Ian,

Thanks for the comments, it is becoming clearer to me.

The developer example story you have given makes sense. As for the second load of data, I am guessing the story will be along the lines of 'as an end user, I want to have access to x's data so that I can run searches against the data'.

I'm not too concerned about the wording of this, but was more concerned with what we should be doing as currently these tasks are down as spikes as opposed to stories. I know they shouldn't be spikes but wasn't completely sure if they should be stories or not.

Thanks

Paul Smith

07:04 am June 14, 2013

Ian,

Sorry but I have a further question.

Your example story, "As a developer I need data migrated from source and reformatted so that an automatic job can convert the data", references a developer. As they are not a user of the system, should we really be writing stories for them?

Ian Mitchell

08:49 am June 14, 2013

If that is where the value is, then yes, write it from the perspective of a developer. Developer stories can be the most direct way to address the payoff of housekeeping work or accumulated technical debt.

Paul Smith

08:52 am June 14, 2013

OK, that is where the value is, but what direct business value is there? The end users aren't worried about how the end data is made available, just that it is available.

The final story of 'as an end user, I want to have access to x's data so that I can run searches against the data' does give business value but the developer story doesn't.

Ian Mitchell

09:08 am June 14, 2013

There isn't any direct business value in a developer story. If there is any direct business value to the story then write it as a user story, not a developer story. As a user story it can then be placed on the Product Backlog and negotiated into a sprint accordingly, where it will count towards the burn-down.

A developer story will be introduced into the Sprint Backlog by the team itself. The Development Team wholly own their Sprint Backlog. If they determine that such a story is necessary for the successful delivery of the project then they have a responsibility to accommodate it. Note that even though a developer story can be estimated in terms of points, these points shouldn't count as product burn-down, since the story does not come from the Product Backlog.

Paul Smith

09:12 am June 14, 2013

That completely makes sense now. I thought you were suggesting putting this on the Product Backlog rather than on the Sprint Backlog (me being stupid!)

Thanks

Charles Bradley

12:03 pm June 14, 2013

Disclaimer: User Stories are not a part of Scrum. They are one technique of many for representing PBIs.

IMO, and according to the creators of the user story practice, there is no such thing as a "developer story". A user story, by definition, delivers functionality into the system (product) under development and also delivers value to the customer (end user and/or purchaser of the system).
More here: http://www.scrumcrazy.com/A+User+Story+Checklist

It is also hard to give user story advice over the internet. See: http://scrumcrazy.wordpress.com/2013/06/04/its-difficult-to-give-user-s…

It sounds very much like you are doing horizontal slicing rather than vertical slicing, or like maybe you are not really doing software product development. It's still unclear to me how you are delivering value to your end customer. What is the business benefit of said queries being run by your staff? (Be sure and see my article above for advice on how to get user story advice over the internet)

Besides writing the job that converts the data, what other part of your system is actually new software?

Paul Smith

02:58 am June 17, 2013

Our end user is our internal staff that will run these searches. This project is a prototype and if taken forward, will then be made available to people outside of our organisation.

The whole system is new. We have built something that will add to the other organisations' data and allows us to run searches against this data through a web interface. The process for converting data is not to manipulate it, just to store it within tables of our own structure.

Charles Bradley

04:20 pm June 17, 2013

Based on that extra context, I wish to suggest two things to further your learning of how to solve this problem.

1. You mention "prototype", but in Agile/Scrum, we prefer working towards a "minimum viable product (MVP)" rather than a prototype. There are similarities, but there are also very important subtle differences worth learning.

This is a good start:
http://www.romanpichler.com/blog/agile-product-innovation/the-vision-th…

But really you should read the relevant chapters in his book about the MVP and "visioning sprints"

2. It sounds like you are currently doing "horizontal" slicing instead of vertical slicing. You need to "begin with the end in mind," then work your way back to the data and data loads, and figure out a way to slice and split so that these stories will fit in your sprint. Typically, when you're doing "visioning sprints" to get early feedback on a new product, you tend to prefer shorter sprints (1-2 weeks), to help increase your chances of "pivoting to value" quickly.

So, if you were to design one end to end feature (all the way to the web interface), would that feature a) fit into your sprint? and b) be something that could be shown in order to convey business value and get feedback from potential customers and those guiding the product vision?

If the answer to both is yes, then you simply need to include the data load as part of a story that delivers potential business value (the data load will be a task, not a story). If the answer is no, then you need to work on ways to split the stories so that they will fit into the sprint. For instance, maybe one org's data at a time, or maybe a subset of the data at a time, or a subset of the queries at a time, etc. The data is not what's important though -- what *is* important is creating features that deliver value. Start with a story that delivers value, then if it doesn't pass the two tests in the previous paragraph, look for ways to split it. Without knowing your feature domain and product domain, it's very hard for me to give guidance here -- which just goes back to how it's hard to give this kind of advice over the internet.

Have you considered hiring a User Story coach? Is it possible?

Randy Ho

07:44 pm June 18, 2013

As others have mentioned you are doing horizontal slicing instead of vertical slicing.

The way I would start with the story:

"As an analyst, I would like to perform searches against historical data from systems (the 7 systems you listed), so that I can create reports for forecasting of blah blah"

Disclaimer: I used analyst as the person performing the search to be more specific than just "user". Adjust to your business domain if needed.

You would groom this story with the team. Talk about what the data needs to look like in the search results (data structure, not so much user interface design) and the various formats of the 7 data stores you need to query from.

During planning, the team would figure out "how" to deliver this value. Doing a conversion is one option but perhaps you can simply create a gateway between the systems and query the systems directly for each query without an import.

If importing seems like the best thing to do, then a conversion and import would be one of the tasks needed to be performed to satisfy the story.

If the story effort estimates are "too large", you could slice it by either functionality or data.

e.g.
split the story into two stories:

"As an analyst, I would like to perform searches against historical data from systems A, B, and C so that I can create reports for forecasting of blah blah"

"As an analyst, I would like to perform searches against historical data from systems D, E, F, and G so that I can create reports for forecasting of blah blah"

Paul Smith

04:33 am June 19, 2013

I agree that the scenario given is horizontal but all of our stories are vertical so we are doing something right!

The scenario I have described is probably not clear and is a difficult one to describe. So far our MVP (the functionality) was built with 10 sprints and we are currently in sprint 11 with a few more planned. Forgetting about the data from the organisations, the functionality delivered allows the user to do everything they need to do without the data (so they can create new data and search against this). This has all been delivered using vertical stories.

The issue I am having is around how we use stories to document the data, which comes in piece by piece from external companies and doesn't fit in with a story within the sprint. For example:

Sprint 1
Sprint 2
Sprint 3 - received organisation A's sample data
Sprint 4
Sprint 5 - received organisation B's sample data
Sprint 6 - received organisation C's sample data
Sprint 7
Sprint 8
Sprint 9 - received organisation B's live data
Sprint 10
Sprint 11
Sprint 12 - receive organisations A & C's live data
Sprint 13 - receive organisations D, E & F's sample data
Sprint 14
Sprint 15 - receive organisations D, E & F's live data

For the live data we can put in a story saying the end user can access organisation A's data, but the issue is with the sample data. Each organisation's data is in a different format and structure, and we have received or will receive the data sets from different third-party suppliers.

This sample data is purely for our developers to play around with, manipulate into the format they require and set up an automated job that can be run when the live data is received so the process takes less than an hour to get the data into the live environment. There is no business value to this.

How should this be documented in a story? We are currently doing it as a spike which we know is incorrect but we are not sure how it should be done.

Hopefully this explains things a bit better. In reply to Charles's post, a user story coach would be a great idea but is not viable for our organisation at this time. The only help we have had with stories is the internet and a couple of copies of Mike Cohn's book, User Stories Applied. Agile is something we have only started this year, but I have made my point known that some sort of training on user stories is required.

Thanks

Charles Bradley

10:50 am June 19, 2013

Psmith,

Can you give me an idea of roughly (on average) how many man days it takes for the "sample data" piece vs. the "live data" piece?

Randy Ho

11:43 am June 19, 2013

"This sample data is purely for our developers to play around with, manipulate into the format they require and set up an automated job that can be run when the live data is received so the process takes less than an hour to get the data into the live environment. There is no business value to this."

There is certainly business value here. You must remember that the result of a sprint is a potentially shippable increment of software.

It sounds like some of the development to support the import of data from an external system will result in additions to your increment of software. The additions will be in the form of scripts and job definitions in your increment of software. These additions provide value by allowing the users to eventually have access to the data through your existing search system.

From what you describe, your story should still be written in the perspective of the user needing to run searches against these additional data sets.

There may be another component to this, which is work that does not result in additions to your increment of software. If the scripts to set up the automated jobs or the scripts to do the data conversion are not to be included in your increment of software, then they probably shouldn't be part of your Scrum.

For example, the developers of this forum software would have a story for "as a user I would like to reply to posts".

The developers however, would not have a story for "go to a forum user's house and hold their hand while they use the software to make a forum post".

These are tasks that the team would set aside to perform outside of "scrum development". The team should acknowledge the impact on capacity because of this and adjust accordingly.

Paul Smith

04:02 am June 20, 2013

Charles, the sample data is taking anywhere up to 7 - 10 man days to sort out; the live data takes a matter of hours.

Randyh, the sample data does not produce any increment in software. The software for adding to the data and searching it had already been developed prior to receiving the sample data. This sample data will not be made visible to the end users; it is purely to eliminate errors/questions when the live data has been received.

Charles Bradley

06:21 pm June 23, 2013

PSmith,

The data load stuff sounds like stories because you are creating software (data loading jobs written in SQL or some other software technology -- you mentioned "automatic job to convert the data"), though it might not be software in the product you're delivering. (Whether you think about the data load software as a separate system or part of the current (product) system -- I don't think it matters much -- up to you.) Sounds like you could easily slice those into small chunks using vertical slicing and/or slicing via acceptance test.

The actual data migration, in my current understanding/view of your context, is *not* software development. It is software *operations*. In that instance, Scrum is not designed for software operations, but there is also probably very little wrong with just adding some tasks to the Sprint Backlog to represent the data migration work.
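To make "one org's data at a time" a bit more concrete, here is a minimal Python sketch of how a conversion job could be sliced per organisation. Everything in it is hypothetical and only for illustration: the `Record` common format, the field names assumed for organisations A and B, and the `load` and `search` helpers are not taken from the actual project. Each converter is small enough to be built and checked on its own, with a search over the loaded data acting as the acceptance test.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    """Hypothetical common format that every organisation's data is stored in."""
    source_org: str
    record_id: str
    name: str


def convert_org_a(row: dict) -> Record:
    # Assumed layout for organisation A: keys "id" and "full_name".
    return Record("A", str(row["id"]), row["full_name"].strip())


def convert_org_b(row: dict) -> Record:
    # Assumed layout for organisation B: keys "ref", "first" and "last".
    return Record("B", str(row["ref"]), f'{row["first"]} {row["last"]}'.strip())


# One converter per organisation; adding a new organisation is one new slice.
CONVERTERS = {"A": convert_org_a, "B": convert_org_b}


def load(org: str, rows: list) -> list:
    """Convert one organisation's rows into the common format."""
    convert = CONVERTERS[org]
    return [convert(row) for row in rows]


def search(records: list, term: str) -> list:
    """Case-insensitive name search across everything loaded so far."""
    needle = term.lower()
    return [r for r in records if needle in r.name.lower()]
```

Under these assumptions, the same converter could be exercised first against the sample data and then re-run unchanged against the live refresh.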

Does that help get us further in solving your quandary?

[Btw, for the next week or two, I will be in a quite busy period, so I may not be as responsive...hard to predict at this point. p2a]

Lisa Fair

04:02 pm August 17, 2015

I'm not sure that you will be able to simply assign data migration to software operations. Based on your description, it seems that you are building a new system that will replace multiple systems. So perhaps the user story is that the User needs to be able to access existing records.
The data load is part of the product that must be delivered and validated prior to delivery.
Hope this helps!
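
As a rough illustration of what "validated prior to delivery" could mean for a data load, here is a minimal Python sketch. The `validate_load` function, its parameters and the idea of checking row counts and required fields are assumptions made for the example, not something described by the original poster; a real check would depend on the actual data.

```python
def validate_load(source_rows, loaded_rows, required_fields):
    """Return a list of problems found; an empty list means the load passed."""
    problems = []

    # Reconcile counts: every source row should have ended up in the target.
    if len(source_rows) != len(loaded_rows):
        problems.append(
            f"row count mismatch: source={len(source_rows)} loaded={len(loaded_rows)}"
        )

    # Check that each loaded row carries a value for every required field.
    for index, row in enumerate(loaded_rows):
        for field in required_fields:
            if row.get(field) in (None, ""):
                problems.append(f"row {index}: missing {field}")

    return problems
```

A check like this could serve as part of the acceptance criteria for a "user can access organisation X's records" story.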