The DRY Principle
Why every software engineer should care about repetition: solutions that actually work.
Hi Friends,
Welcome to the 137th issue of the Polymathic Engineer newsletter.
Imagine this situation. While looking over a pull request, you see a flaw in the input validation logic. As you look closely, you see that the same validation code appears in 12 distinct files in the codebase. Some implementations deal with edge cases, whereas others don't. Some have already been fixed for security holes, but some are still open.
What started as a simple bug fix has now become a long hunt through your codebase to find every place where the logic is duplicated and make sure everything is consistent.
If all this sounds familiar to you, that is exactly what the DRY (Don't Repeat Yourself) principle was meant to stop. Even though DRY is one of the most fundamental ideas in software development, it is not always understood and used properly.
This week, we're going to talk about what DRY really means, why it's more important than you think, and how to use it without falling into common traps. Whether you're a junior developer writing your first functions or a senior engineer planning out a large system, DRY will help you work faster and make your code easier to manage.
The outline will be as follows:
What DRY Really Means
The Hidden Costs of Repetition
Common Types of Duplication
DRY in Practice: Solutions That Actually Work
When NOT to Apply DRY
What DRY Really Means
When most developers hear "Don't Repeat Yourself," they immediately think of copy-paste code, but DRY is about a lot more than that.
The formal definition was written by Andy Hunt and Dave Thomas in The Pragmatic Programmer, and says: "Every piece of knowledge must have a single, unambiguous, authoritative representation within a system."
The most important word here is knowledge, not code. DRY isn't just about getting rid of duplicate lines of code, but about eliminating duplication of intent, business rules, and understanding.
Consider this example that illustrates the difference:
    def validate_age(age):
        return validate_positive_integer(age, min_value=0)

    def validate_quantity(quantity):
        return validate_positive_integer(quantity, min_value=0)
A developer might flag this as a DRY violation because both functions call the same validation with identical parameters. They would be wrong, though. They both have the same code, but they mean different things.
Age validation and quantity validation serve different business purposes and may change independently. For now, both follow the same rules, but in the future the business may need each of them to follow different ones.
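As a rough sketch of what that divergence could look like, suppose the business later decides to cap ages. The body of validate_positive_integer, the max_value parameter, and the limit of 150 are all assumptions made for illustration; the original only shows how the helper is called.

```python
def validate_positive_integer(value, min_value=0, max_value=None):
    # Hypothetical shared helper: only its call signature appears in the text.
    if not isinstance(value, int) or isinstance(value, bool):
        return False
    if value < min_value:
        return False
    return max_value is None or value <= max_value


def validate_age(age):
    # Assumed new business rule: ages above 150 are rejected.
    return validate_positive_integer(age, min_value=0, max_value=150)


def validate_quantity(quantity):
    # The quantity rule did not change, so this function stays untouched.
    return validate_positive_integer(quantity, min_value=0)
```

Because the two functions were kept separate, only validate_age had to change. Had they been merged into one function, the new age rule would have silently applied to quantities too.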
True knowledge duplication looks more like this:
    import re

    # In user registration module
    def create_user(email, password):
        if len(password) < 8:
            raise ValueError("Password too short")
        if not re.match(r'^[^@]+@[^@]+\.[^@]+$', email):
            raise ValueError("Invalid email")
        # ... rest of user creation

    # In password reset module
    def reset_password(email, new_password):
        if len(new_password) < 8:
            raise ValueError("Password too short")
        if not re.match(r'^[^@]+@[^@]+\.[^@]+$', email):
            raise ValueError("Invalid email")
        # ... rest of password reset
In this case, the rules for password strength and email format are used in more than one section. If the business decides that the minimum length of a password must be 12, you'll need to remember to change these two places, as well as all the other places where these rules are applied. The solution isn't just to extract the common code, but to centralize the knowledge about what makes valid user credentials.
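One possible way to centralize that knowledge is sketched below. The names MIN_PASSWORD_LENGTH, EMAIL_PATTERN, validate_password, and validate_email are hypothetical and chosen only for this example. The rules themselves are the ones from the two functions above.

```python
import re

# Single authoritative home for the credential rules (hypothetical names).
MIN_PASSWORD_LENGTH = 8
EMAIL_PATTERN = re.compile(r'^[^@]+@[^@]+\.[^@]+$')


def validate_password(password):
    # The minimum length is defined once, in MIN_PASSWORD_LENGTH.
    if len(password) < MIN_PASSWORD_LENGTH:
        raise ValueError("Password too short")


def validate_email(email):
    # The email format is defined once, in EMAIL_PATTERN.
    if not EMAIL_PATTERN.match(email):
        raise ValueError("Invalid email")


# In user registration module
def create_user(email, password):
    validate_email(email)
    validate_password(password)
    # ... rest of user creation


# In password reset module
def reset_password(email, new_password):
    validate_email(email)
    validate_password(new_password)
    # ... rest of password reset
```

With this layout, raising the minimum password length to 12 means editing one constant, and every caller picks up the new rule.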
How you design software will change once you understand this difference. You'll start seeing duplication not just as repeated code, but as repeated decisions, repeated business logic, and repeated understanding scattered throughout your system.
The Hidden Costs of Repetition
When we duplicate knowledge across our codebase, we're not only making additional work for ourselves now, but we are also building up technical debt that will cost us a lot more tomorrow.
The most obvious cost is maintenance overhead. Every time you need to modify the business logic, you have to find all of its copies. If you miss one, you've made things inconsistent, which might cause bugs or, even worse, security holes. But the hidden costs run much deeper.
First, there's the mental strain on developers. When logic is duplicated, you constantly have to ask yourself, "Is this the real version, or am I looking at a copy?" This extra mental work slows down development and makes mistakes more likely.
Second, testing is much harder when there is duplicated code. You now have to make sure that all copies behave the same way instead of only verifying one implementation exhaustively. Even worse, you might test one version a lot without testing others very much.
Consider this real-world scenario. You are making an e-commerce app, and the logic for processing payments shows up in three places: checkout, renewing a membership, and processing a refund.
When a security patch is needed for card processing, developers may change the checkout logic but forget about membership renewals and refunds. If the company decides to standardize how it deals with edge cases, you have to coordinate changes across multiple files and teams. Changes that should be easy become dangerous, and new team members have a hard time figuring out which implementation is the "true" business logic. The overall result is that development slows down over time.
Common Types of Duplication
If you want to use DRY well, you need to know what to look for. Being aware of all the different ways that duplication can happen will help you handle it correctly.
Code Duplication is the most obvious type. It happens when programmers copy and paste big chunks of code and then make small changes to them. It seems faster at first, but every copy adds maintenance work later. I have seen whole validation methods copied from one controller to another, with each copy handling a different set of edge cases. When a core validation rule changes, developers have to find all the copies, and most of the time they miss at least one.
Logic Duplication is more harmful, even if less obvious. It happens when the same business rule is implemented in multiple places, often with code that looks very different in each one. For example, the API layer, the service layer, and the database layer could each use a different method to check whether a person has the right permissions. Still, all of them encode the same business knowledge about who can access what.
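A minimal sketch of how two layers could share one permission rule follows. The roles, actions, status codes, and function names are all invented for illustration and do not come from a specific framework.

```python
# Hypothetical policy: which roles may perform which actions.
ROLE_PERMISSIONS = {
    "admin": {"read", "write", "delete"},
    "editor": {"read", "write"},
    "viewer": {"read"},
}


def can_access(role, action):
    """Single authoritative answer to 'who can do what'."""
    return action in ROLE_PERMISSIONS.get(role, set())


def service_delete_document(role, doc_id):
    # Service layer: asks the shared policy instead of re-encoding the rule.
    if not can_access(role, "delete"):
        raise PermissionError("delete not allowed")
    return 204


def api_delete_document(role, doc_id):
    # API layer: same policy function, different way of reporting failure.
    if not can_access(role, "delete"):
        return 403
    return service_delete_document(role, doc_id)
```

Each layer still decides how to react to a denied request, but the knowledge of who may delete lives only in ROLE_PERMISSIONS.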
Process Duplication goes beyond code and affects how you work. If you manually test the same scenarios before each release, deploy each service with a different procedure, or write similar configuration files for each service, you're breaking DRY at the process level. This type of duplication can be especially expensive because it consumes developer time directly and makes mistakes more likely in critical workflows.
Data Duplication is when you store the same information in multiple places that you then need to keep in sync. Consider the following piece of code: