The DRY (Don’t Repeat Yourself) principle states: “Every piece of knowledge must have a single, unambiguous, authoritative representation within a system.”
It is always tempting to copy and paste some code when you need it elsewhere in your application, especially when it is a hotfix, you are under pressure, and it has to be done as quickly as possible. You tell yourself that it is OK for now and that you will come back and refactor it later. And we know that in most cases this never happens, until something breaks again, this time because of the duplicated piece of code.
Every piece of duplicated logic is a potential time bomb. Duplication often leads to a maintenance nightmare. When the same logic lives in multiple locations and needs to change, chances are high that the change won’t be applied correctly to every location where that logic occurs. When these locations fall out of sync with one another, a heap of bugs appears. So, to avoid this type of bug, we need to make sure that the relevant piece of code exists in only one location:
- Give this source of truth a clear name.
- Choose the right location for it.
- Reuse it, don’t duplicate it.
The next time we need to change it, we know exactly where to look. Avoiding duplication also improves the readability of the code: a small, simple function or method is much easier to read and understand than a huge, complex one.
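As an illustration of the three steps above, here is a minimal Python sketch. The free-shipping rule, the threshold, the flat fee, and all function names are hypothetical; they only show one piece of knowledge living in one named place and being reused from two callers.

```python
from decimal import Decimal

# Hypothetical business rule: the single source of truth for free shipping.
FREE_SHIPPING_THRESHOLD = Decimal("100.00")
FLAT_SHIPPING_FEE = Decimal("4.99")  # assumed value, for illustration only


def qualifies_for_free_shipping(order_total: Decimal) -> bool:
    """Return True when the order total reaches the free-shipping threshold."""
    return order_total >= FREE_SHIPPING_THRESHOLD


def shipping_cost(order_total: Decimal) -> Decimal:
    # Checkout reuses the rule instead of repeating the comparison.
    if qualifies_for_free_shipping(order_total):
        return Decimal("0.00")
    return FLAT_SHIPPING_FEE


def cart_banner(order_total: Decimal) -> str:
    # The cart page reuses the same rule, so both places stay in sync.
    if qualifies_for_free_shipping(order_total):
        return "You get free shipping!"
    remaining = FREE_SHIPPING_THRESHOLD - order_total
    return f"Add {remaining} more for free shipping"
```

If the threshold changes, only `FREE_SHIPPING_THRESHOLD` has to be touched, and both the checkout and the banner follow.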
Think before you DRY
We often think of DRY as “duplicate code is bad, code reuse is good”, right? But that is the wrong approach. The principle is not about code, it is about knowledge. The next time you see duplicated code, do not immediately extract it into a method or an abstraction. Maybe in the current context the duplication is not evil but perfectly normal.
Consider this example, with code duplication but without knowledge duplication.
We have two classes, Cart and Category. Both of them can add a product. Furthermore, the implementations of adding a product are identical. Should we extract? Let’s try and see what happens. There are two variants here: a common parent class and a trait.
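To make the starting point concrete, here is a minimal Python sketch of the two classes with identical code. The `Product` class, the `products` attribute, and the method name `add_product` are assumptions made for illustration.

```python
class Product:
    def __init__(self, name: str) -> None:
        self.name = name


class Cart:
    def __init__(self) -> None:
        self.products: list[Product] = []

    def add_product(self, product: Product) -> None:
        # Skip a product that has already been added.
        if product not in self.products:
            self.products.append(product)


class Category:
    def __init__(self) -> None:
        self.products: list[Product] = []

    def add_product(self, product: Product) -> None:
        # Same code as in Cart, but it describes a different concept.
        if product not in self.products:
            self.products.append(product)
```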
Common parent class:
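A minimal sketch of this variant in Python, assuming a hypothetical base class named `ProductHolder` and the same illustrative `Product` class and `add_product` method:

```python
class Product:
    def __init__(self, name: str) -> None:
        self.name = name


class ProductHolder:
    """Hypothetical parent that exists only to share add_product."""

    def __init__(self) -> None:
        self.products: list[Product] = []

    def add_product(self, product: Product) -> None:
        if product not in self.products:
            self.products.append(product)


class Cart(ProductHolder):
    """A shopping cart; inherits add_product from the parent."""


class Category(ProductHolder):
    """A catalog category; inherits the very same add_product."""
```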
Code duplication has been successfully removed, but now we have the wrong hierarchy. The Category and Cart classes share one parent, but they are not the same data type. We have created a hierarchy only to achieve code reuse. This should always be avoided. Category and cart are completely different objects.
With a trait, the duplication has been extracted again, but at what cost? We have increased complexity by using the trait. Imagine that we need to change this logic slightly for the Cart class. For example, if we add a product that already exists in the cart, we simply increase its quantity. To implement this, we need to completely override the trait’s method.
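The override can be sketched as follows. Python has no traits, so a mixin class stands in for the trait here; the mixin name `AddsProducts`, the `quantities` dictionary, and the `Product` class are all hypothetical.

```python
class Product:
    def __init__(self, name: str) -> None:
        self.name = name


class AddsProducts:
    """Mixin standing in for the trait (hypothetical name)."""

    def add_product(self, product: Product) -> None:
        if product not in self.products:
            self.products.append(product)


class Category(AddsProducts):
    def __init__(self) -> None:
        self.products: list[Product] = []


class Cart(AddsProducts):
    def __init__(self) -> None:
        self.products: list[Product] = []
        # Assumed structure: product name -> quantity in the cart.
        self.quantities: dict[str, int] = {}

    def add_product(self, product: Product) -> None:
        # Complete override: nothing from the mixin is reused here.
        if product in self.products:
            self.quantities[product.name] += 1
        else:
            self.products.append(product)
            self.quantities[product.name] = 1
```

After this change, Category is the only class that still uses the shared method.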
It looks like our trait has become useless. We should keep the duplication in these classes. In this case, the duplication is not about knowledge, only about code. The code represents different knowledge, even if it looks much the same. When we make changes to the cart, it is unlikely that the same changes will be made to the category.
Another case where we don’t need to remove duplication is when the application will never change. When you are creating a prototype or a script that will be run once and then thrown away, it is OK to copy and paste some pieces of code. Code duplication costs a lot, but only if the code changes.
Rule Of Three
This rule can help us decide when to extract duplicated code. It says: extract duplication only when you see it for the third time.
- The first time you do something, you just write the code.
- The second time you do a similar thing, you duplicate your code.
- The third time you do something similar, you can extract it and refactor.
But why? Why should we duplicate our code when we have always been told that it is wrong?
These simple steps are designed to prevent premature and wrong refactoring while the problem we are trying to solve is not yet clear. Refactoring a piece of code prematurely, when we don’t clearly see the problem behind it, can cause tremendous problems in the future. With only two examples of duplication, it can be dangerous to abstract, because details of one specific scenario can bleed into the abstraction. But when we have at least three examples of duplicated code, we can define patterns and abstractions for it more precisely. These rules also keep you from wasting time making something reusable when, at the end of the day, you end up with only one use of it.
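A small Python sketch of how the third occurrence can reveal what actually varies. The export functions and the helper `rows_to_text` are hypothetical. The first two exports were once written inline with identical code; the third needed a different separator, which showed where the abstraction needs a parameter.

```python
def rows_to_text(rows, separator=","):
    """Helper extracted on the third occurrence (hypothetical name)."""
    return "\n".join(separator.join(str(value) for value in row) for row in rows)


def export_users_csv(users):
    # First occurrence: originally written inline.
    return rows_to_text(users)


def export_orders_csv(orders):
    # Second occurrence: originally a copy of the first.
    return rows_to_text(orders)


def export_products_tsv(products):
    # Third occurrence: almost the same, but tab-separated. Only now is it
    # clear that the separator is the part that varies.
    return rows_to_text(products, separator="\t")
```

Had the helper been extracted after the second occurrence, it would most likely have hard-coded the comma.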
Consider two classes, one of which is Teacher. Both have the same method with exactly the same logic. Should we extract a base Person class? Are we sure that the logic in these methods represents the same knowledge? Or did this happen by accident?
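The situation can be sketched in Python like this. The second class is assumed to be called `Student` purely for illustration, the attribute names are hypothetical, and `get_full_name` is the Python spelling of getFullName().

```python
class Student:  # assumed name for the second class
    def __init__(self, first_name: str, last_name: str) -> None:
        self.first_name = first_name
        self.last_name = last_name

    def get_full_name(self) -> str:
        return f"{self.first_name} {self.last_name}"


class Teacher:
    def __init__(self, first_name: str, last_name: str) -> None:
        self.first_name = first_name
        self.last_name = last_name

    def get_full_name(self) -> str:
        # Identical to Student.get_full_name, at least for now.
        return f"{self.first_name} {self.last_name}"
```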
We can extract a parent class Person, put the getFullName() method in it, and then extend our classes from it.
It really looks very nice, but a few days later we are told that a teacher’s full name should contain the middle name. Now what? We need to go and override the parent’s implementation. As a result of our extraction, we have achieved nothing but complexity. We have a reusable thing that is used only once. That is not very clever.
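A Python sketch of where this ends up. The `Student` class, the attribute names, and the `middle_name` parameter are assumptions; `get_full_name` corresponds to getFullName().

```python
class Person:
    def __init__(self, first_name: str, last_name: str) -> None:
        self.first_name = first_name
        self.last_name = last_name

    def get_full_name(self) -> str:
        return f"{self.first_name} {self.last_name}"


class Student(Person):
    """Hypothetical second class: the only real user of the shared method."""


class Teacher(Person):
    def __init__(self, first_name: str, middle_name: str, last_name: str) -> None:
        super().__init__(first_name, last_name)
        self.middle_name = middle_name

    def get_full_name(self) -> str:
        # The new requirement forces a full override of the parent's method.
        return f"{self.first_name} {self.middle_name} {self.last_name}"
```

The parent’s method now serves a single subclass, while every reader still has to follow the hierarchy to find out which implementation applies.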