This post is part of our “in-depth” series. Each post discusses scientific research that is relevant to our work with Scrum and Agile teams. With this series, we hope to contribute to more evidence-based conversations in our community and a stronger reliance on robust research over (only) personal opinions.
Why is code quality so often an issue? Why do software teams — despite their best initial intentions — often end up fighting a codebase that is hard to test, resistant to change, and prone to strange bugs?
I have many intuitions about this. But I’ve learned the hard way that intuitions are often wrong. So I was pleasantly surprised when Carsten Grønbejrg Lützen pointed at a peer-reviewed academic paper by Michele Tufano and his colleagues (2015), called “When and Why Your Code Starts To Smell Bad”. So down the rabbit hole I went, and into a field of research that was unfamiliar to me. But also a field that has much to say about code quality and how we can be better developers.
In this post, I summarize the insights from the study by Tufano and draw in more recent research. This post is ideal for developers with some experience, and for people who support developers in the development of their technical skills. It also ties well into a recent post where we discussed a scientific study that showed how critical the socio-technical skills of developers are to the success of Agile transformations.
Technical Debt and Code Smells
Before I dive into the scientific research, let us first get clear on some definitions. Technical debt consists of everything that makes your code brittle to future changes that make sense in light of how your understanding of the code and the domain grows over time. This could be a hastily applied hack to solve a recurring bug. Or the addition of one more method to an already bloated class. It could also be a copy-paste of an existing class, but with a different name. Whatever the cause, technical debt inevitably causes more work in the future by not implementing a better solution now. This is why Ward Cunningham coined the term “debt”. The higher it gets, the more interest you pay. This interest takes the form of more bugs, more time spent fixing weird issues, and more time required to comprehend code. And in that analogy, refactoring is the process by which that debt is repaid.
You know that your codebase has technical debt when it gets smelly. The notion of “code smells” was popularized by Martin Fowler. There are many potential smells. You’re dealing with a “Blob Class” when a class is very long. “Duplication” happens when the same piece of code appears in multiple areas, and any change needs to be replicated there too. “Spaghetti Code” happens when you have long and messy methods without parameters.
Scientific research on code smells
That technical debt impacts quality does not seem to be a matter of taste or opinion. Scientific studies have also shown that code smells increase the time it takes for developers to change code and the chance that bugs are introduced in that code (Khomh et al. 2012, Li & Shatnawi, 2007). This effect is particularly apparent for code smells that lead to longer classes and classes with multiple smells (Abbes et al. 2011, Politowski, 2020). Politowski and his colleagues (2021) also established through experiments that code smells that lead to longer classes substantially increase the time it takes for developers to comprehend the code and find a good solution. An interesting study by Søberg et al. (2013) actually suggests that code smells themselves are not necessarily the issue, but rather the length of the class being edited and the number of changes made to it over time. Either way, code smells are persistent and remain in the code for a long time after their initial introduction (Chatzigeorgiou & Manakos, 2010).
A single smell may not necessarily be a cause for concern, especially when that smell doesn’t make the class longer. More troubling is that surveys suggest that a third of developers have no, or very limited, knowledge of code smells (Yamashita & Moonen, 2013). This is not helped by the vague definitions of many smells and the fact that “smell detection” is highly subjective and dependent on the experience and preferences of individuals (Santos et al., 2018). One would think that “refactoring” is often aimed at the removal of such code smells. But this doesn’t seem to be the case. Bavota et al. (2015) studied over 10,000 refactoring-focused commits on three open-source systems and found that they rarely removed code smells.
When And Why Are Smells Introduced?
So it seems clear that technical debt leads to more bugs and hampers the potential productivity of developers. But when and why are code smells introduced?
My intuition tells me that code smells are often introduced by inexperienced developers. I suspect that time pressure and pressure from management have a substantial influence. It seems more likely to me that code smells are introduced during ongoing maintenance of code rather than at the very start. But I’ve learned that my intuitions are often biased. The work by Michele Tufano and his colleagues (2015) is particularly revealing because it tries to answer the question of when and why code smells are introduced.
Tufano and his colleagues analyzed over 500,000 commits from 200 open-source projects belonging to Android, Apache, and Eclipse. They focused their attention on five code smells that represent violations of principles of object-oriented programming:
- Blob Class: A class that is large and serves many different responsibilities, which are actively used throughout the codebase. This smell is also sometimes called a “God Class”.
- Class Data Should Be Private: A class that exposes implementation details on how it stores information internally. This usually happens through exposing attributes.
- Complex Class: A class that contains many (nested) branching statements, like different levels of if-then-else and for-each statements.
- Functional Decomposition: A class that implements only a few methods from an inherited class while also declaring many private fields and methods.
- Spaghetti Code: A class without a clear structure and long methods without parameters.
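To make one of these smells concrete, here is a minimal sketch of “Class Data Should Be Private” in Python. The class names `ExposedCart` and `Cart` are hypothetical and not taken from the study. Python also has no enforced private members, so the leading underscore is only a convention; the point is the contrast in what callers are allowed to depend on.

```python
class ExposedCart:
    """Smelly: callers reach into `items` and depend on it being a list of tuples."""

    def __init__(self) -> None:
        self.items: list[tuple[str, float]] = []


class Cart:
    """Storage is hidden behind methods, so it can change without breaking callers."""

    def __init__(self) -> None:
        # Underscore prefix marks this as an internal detail (a convention only).
        self._prices: dict[str, float] = {}

    def add(self, name: str, price: float) -> None:
        self._prices[name] = price

    def total(self) -> float:
        return sum(self._prices.values())
```

With `ExposedCart`, every caller that appends to or loops over `items` has to change when the storage format changes. With `Cart`, only `add` and `total` need to be touched.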
To identify code smells, the researchers measured seven code metrics for each class that was added or modified by commits, including lines of code, the number of methods and attributes, the weighted methods per class, the response for a class, and the number of couplings with other classes. They also analyzed the context surrounding each commit. This included the type of commit (bugfix, new feature, refactor, enhancement). It also included the owner of the commit, and whether or not that person was a newcomer. The researchers also cleverly used issue tracking data to estimate developer workload and where the commit was on a timeline from the conception of the project to the present day.
I will summarize the key findings from the study below. Take a look at the full paper if you’re aching for more details.
Finding #1: Most code smells are introduced from the start
I would intuitively expect that code smells are introduced into classes gradually as they are changed and maintained. However, this study shows that most code smells are introduced from the start of a new class and then stick around for a long time. Furthermore, the first signs of “Blob Class” and “Complex Class” are often introduced at the start and tend to become stronger as developers add more code over time.
This finding doesn’t match (at least my) conventional wisdom, and that’s great. It shows us how important it is to spend a bit more time on the initial design of a class. Classes that are badly designed tend to become smelly from the start and remain so for a long time.
Finding #2: Smelly classes can be identified early on
The researchers also found that problematic classes can be identified early on. Classes that become smelly tend to see quick jumps in the various code metrics that were studied compared to clean classes. So when a class quickly grows in the lines of code over several commits or the number of methods it has, it is likely to be (or become) smelly.
What I think is most important about this finding is that even trends in simple metrics, like lines of code and the number of methods, go a long way in identifying problematic classes early on. This is a great way for teams to prioritize where to direct their refactoring efforts.
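As an illustration of how a team might watch for such jumps, here is a small sketch that flags classes whose size grew sharply over the last few commits. The function name `flag_fast_growing`, the three-commit window, and the 50% growth threshold are all assumptions I picked for the example. The study does not prescribe specific thresholds, so tune them to your own codebase.

```python
def flag_fast_growing(
    history: dict[str, list[int]],
    window: int = 3,
    max_growth: float = 0.5,
) -> list[str]:
    """Return classes whose size grew by more than `max_growth` over `window` commits.

    `history` maps a class name to its lines of code after each commit,
    oldest first. The same check works for any other metric, such as
    the number of methods.
    """
    flagged = []
    for name, sizes in history.items():
        if len(sizes) <= window:
            continue  # not enough history to compare against
        before, now = sizes[-window - 1], sizes[-1]
        if before > 0 and (now - before) / before > max_growth:
            flagged.append(name)
    return flagged
```

For example, a class that went from 40 to 130 lines in three commits would be flagged, while one that crept from 30 to 34 lines would not. A flagged class is not necessarily smelly, but it is a reasonable candidate for a closer look during the next code review.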
Finding #3: Experienced developers are more likely to introduce code smells than newcomers
Who introduces code smells? My intuition is that they are introduced more often by newcomers or developers that are maintaining a class they didn’t create. This study disproves that intuition. Code smells are significantly more likely to be introduced by the owner of a class — and the person most familiar with it — rather than newcomers or other developers. In fact, newcomers rarely seem to introduce code smells. The smell that newcomers tend to introduce the most is “Spaghetti Code”.
Although this finding disproves my intuition, it does make sense. Newcomers are probably more hesitant to make big changes, whereas experienced developers are more inclined to do so. Experienced developers are also more likely to take on complex coding challenges. What I think is most important is that no developer is immune to code smells, regardless of their experience and familiarity with a class. This should be a humbling conclusion for any developer.
Finding #4: Most code smells are introduced just before a release
The study also provides empirical support for a common experience among developers: that bugs and code smells are introduced in the period before a release. Tufano and his colleagues calculated that the majority of code smells (89%-98%) were introduced in the month before a major release. When we take into account the loss of developer productivity and the increased chance of errors, this underscores how costly a “final rush to release” is.
Finding #5: Most code smells are introduced by busy developers
Over 55% of the developers that introduced new smells also had a high workload. The researchers measured this by calculating the total number of tasks assigned to a developer at the moment of a smell-introducing commit. This is only a rough approximation of the actual workload. It doesn’t take into account other work that developers have on their plate, because that work was not included in the analysis. It is also possible that busy developers simply contribute more code, and thus more code smells. However, the pattern was the same across all open-source projects that were studied by the researchers. So this finding provides at least some empirical evidence for a common intuition that busy developers are also messy developers.
Limitations of the study
A strength of the study by Michele Tufano and his colleagues is the massive size of its sample. A nice characteristic of large datasets is that patterns are more robust, and less likely to be the result of the “noise” caused by differences between projects, commits, and individuals.
However, it is important to note that the analysis was performed on data from publicly available repositories on GitHub and associated issue trackers. Although the large size of the sample evens out administrative mistakes in how work is tracked, the measures remain rough approximations.
Finally, the researchers focused on five code smells. The selection was based on prior research and was intended as representative of violations of key principles of object-oriented programming. The patterns may be quite different for other smells.
Recommendation #1: Learn about code smells with your team
The first recommendation is to develop a nose for code smells. A good starting point is this page on Wikipedia. We highly recommend doing this as a team. You can do this through weekly tech talks where you explore one code smell at a time, agree on a shared definition, and decide how much of a priority it is to prevent that smell.
Keep in mind that not every code smell is a cause for alarm. Some developers feel that even if-then-else statements are a smell. And there is some point to that. But as the studies in this post show, not all types of code smell necessarily hamper productivity or comprehensibility. Some are more a matter of taste and their prevention can lead to over-designed code that is hard to understand for less experienced developers. The most worrying smells are those that indicate that your classes are too long (e.g. Spaghetti Code, Blob Class) and the presence of multiple smells in a single class. So start with those.
Recommendation #2: Use Lines of Code to identify long classes early, and augment with other metrics
Many code smells are detectable with code metrics that can be run during builds, either locally or on a build server. Some useful metrics are cyclomatic complexity, lines of code, class coupling, and depth of inheritance. Professional IDEs like Visual Studio 2019 offer these out of the box, or you can use dedicated tools like SonarQube.
Recommendation #3: Identify code smells during code reviews and pair programming
Code reviews and pair programming greatly benefit from increased awareness of code smells among developers. It gives a language to what most developers intuitively know isn’t good. The research we discussed in this post suggests that it is most beneficial to look for smells that are related to class length, like “Blob Class” and “Spaghetti Code”. By reviewing each other’s code and helping to find better designs, developers also learn how to recognize the smells early and avoid them in future work.
Recommendation #4: A flexible initial design makes your code less brittle to future changes
The research that we report on in this post shows that code smells are often introduced in the earliest versions of a class, and then remain there throughout the lifecycle. So if you want to keep your classes short and tidy, you have to consider strategies to do so before and during the coding of the initial versions. This can also lead to “analysis paralysis” where you keep thinking about the ideal design rather than writing it. I’ve found these steps helpful in many cases:
- Write the skeleton of the class by adding empty methods and attributes that you expect to need to fulfill the responsibility of the class. Name the methods, the attributes, and the class so that they capture what they do in the terminology of the people using the software (e.g. MarkOrderAsPaid or RefundOrder). This is a good example of Domain-Driven Design too.
- Evaluate the skeleton and look for where you can break it down into smaller classes with fewer methods. So if your class handles orders and manages them in a database, you can create one class to handle the orders and one to manage them in the database. This is a good example of the Single Responsibility Principle (SRP) at work and naturally results in small and concise classes.
- Use Dependency Injection to inject classes (or responsibilities) into other classes. So you can inject the class that manages orders in the database into the class that handles orders. This is a good example of Composition Over Inheritance.
- Start with one class. You can apply Test-Driven Development by writing your tests before the code, or do it alongside. Either way, implement the code for the class so that all tests pass.
- Evaluate the class you just created. Does it have many lines of code, many methods, many unit tests, many parameters for the methods, or many injected dependencies? These are code smells to take seriously as they indicate your class is too large, and you have to try to break down the class further.
- Repeat until you’re satisfied or don’t have the time left. Don’t worry if the code isn’t optimal. You can — and will — refactor this code later as you learn more about the domain and the code. But the steps you’ve taken here greatly increase the odds of ending up with an initial code design that is flexible and easy to comprehend — which is the point.
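The steps above can be sketched with the order example. This is one possible shape under my own assumptions, not a prescribed design: the names `Order`, `OrderRepository`, `InMemoryOrderRepository`, and `OrderHandler` are hypothetical, and the in-memory repository stands in for a real database so the example stays self-contained.

```python
from dataclasses import dataclass
from typing import Protocol


@dataclass
class Order:
    order_id: str
    paid: bool = False


class OrderRepository(Protocol):
    """Responsibility 1: storing and retrieving orders."""

    def get(self, order_id: str) -> Order: ...

    def save(self, order: Order) -> None: ...


class InMemoryOrderRepository:
    """Stand-in for a database-backed repository, handy in tests."""

    def __init__(self) -> None:
        self._orders: dict[str, Order] = {}

    def get(self, order_id: str) -> Order:
        return self._orders[order_id]

    def save(self, order: Order) -> None:
        self._orders[order.order_id] = order


class OrderHandler:
    """Responsibility 2: the business rules for orders."""

    def __init__(self, repository: OrderRepository) -> None:
        # The repository is injected, so the handler never builds its own storage.
        self._repository = repository

    def mark_order_as_paid(self, order_id: str) -> Order:
        order = self._repository.get(order_id)
        order.paid = True
        self._repository.save(order)
        return order
```

Because `OrderHandler` only depends on the `OrderRepository` interface, you can write its tests against `InMemoryOrderRepository` first and swap in a database-backed class later. Each class stays short, which is exactly what the research suggests you should aim for.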
Code smells are signs of low-quality code. In this post, we explored scientific insights on code smells. One clear pattern from the studies we discussed is that long classes are particularly problematic. They make it harder for developers to comprehend what is going on and to make changes, they are more prone to errors and require more effort to change. Code smells like the “Blob Class” and “Spaghetti Code” clearly tie into this. The effect of other code smells on productivity, comprehension, and error-proneness is less clear. These may be more a matter of personal taste or preference.
Many code smells are introduced in the earliest versions of a class, rather than somewhere during maintenance. Surprisingly, code smells are also more often introduced by experienced developers and those who create the first version of a class, rather than newcomers or other developers maintaining the code. Unfortunately, code smells seem to stick around for a long time. So it is a good idea to invest in the early detection of critical code smells through code reviews, pair programming, and simple code metrics. The studies we shared suggest that this investment is worthwhile because it boosts productivity and decreases the potential for bugs and other errors.