Types of Duplication in Code

By John Sonmez

One of the biggest reasons to refactor code is to eliminate duplication.  It is pretty easy to introduce duplication in our code either unintentionally or because we don’t know how to prevent or get rid of it.

The three types of duplication

I’ve found that there are three basic types of duplication that we can eliminate from our code that successfully build on each other.

  • Data
  • Type
  • Algorithm

Most developers tend to get stuck at the data level, but in this post, I will show you how to recognize type and algorithm duplication and refactor it out of your code.

duplicate content thumb Types of Duplication in Code

Data duplication

The most basic type of duplication is that of data.  It is also very easily recognizable.

Take a look at these methods:

public Position WalkNorth()
{
   var player = GetPlayer();
   player.Move("N");
   return player.NewPosition;
}

public Position WalkSouth()
{
   var player = GetPlayer();
   player.Move("S");
   return player.NewPosition;
}

public Position WalkEast()
{
   var player = GetPlayer();
   player.Move("E");
   return player.NewPosition;
}

public Position WalkWest()
{
   var player = GetPlayer();
   player.Move("W");
   return player.NewPosition;
}

 

Pretty easy to see here what needs to be refactored.

Most developers don’t need any help to realize that you should probably refactor this code to a method like the following:

public Position Walk(string direction)
{
   var player = GetPlayer();
   player.Move(direction);
   return player.NewPosition;
} 

 

In this example data is duplicated.  To be specific the string data of the direction passed into move is duplicated.  We can eliminate that duplication by creating a method that parameterizes the differences represented by that data.

Type duplication

Now, data duplication is where a majority of developers stop, but we can go much farther than that.  In many cases the difference between two methods is only the type in which they operate on.

With the use of generics in C#, we can refactor out type and parameterize this concept as well.

Look at this example:

public int FindIntMatch(int i)
{
   var match = (int)container.Get(i);
   return match;
}

public string FindStringMatch(string s)
{
   var match = (string)container.Get(s);
   return match;
}

 

Here we have two method that do pretty much the same thing, but they just differ on the type they operate on.  Generics gives us the ability to actually refactor out that type information just like we would with data.

public T FindMatch(T t)
{
   var match = (T)container.Get(t);
   return match;
}

 

By refactoring to the above method we have eliminated duplication. We have achieved this by refactoring out type.

Algorithm duplication

Without a good understanding of delegates and functional programming, few developers ever even consider refactoring out algorithm duplication, but it can be done fairly easily.

Take a look at this example:

public void GoForRun()
{
   GetDressed();
   Run();
   Shower();
}

public void LiftWeights()
{
   GetDressed();
   Lift();
   Shower();
}

 

It is a pretty basic example, but it highlights the kind of duplication that I often see left in many code bases.  Delegates in C# allow us to treat functions like data.  With this ability we can easily refactor out the commonality in these two method to get something like this:

public void DoFitnessActivity(Action activity)
{
   GetDressed();
   activity();
   Shower();
}

 

We could have also refactored out this duplication by using an abstract base class and having the inherited classes definite their own fitness activity, but using delegates creates a much simpler approach and casts the problem in the same light as refactoring any other type of data.

Combining them together

Often I find that several different types of duplication are present across several methods in a class.

When this is the case, it is often possible to apply data, type and algorithm duplication refactoring techniques to find the most simple and elegant solutions.

I’ve also found this is a skill that must be practiced.  When I first really started using generics and delegates in C#, I had a hard time finding uses for them, because I could not easily recognize the patterns of duplication that called for them.  But, I found over time that it became easier and easier to recognize where these techniques could be applied to reduce duplication in my methods.

I’ve also found the key to eliminating duplication is sometimes to first exaggerate it.  Often I will purposely take two methods that I know have some duplication and make them look even more duplicated in order to be able to clearly see where the duplication lies.  I may do several small refactoring steps to get to the point where it is easy to identify what data, type or algorithm is being repeated.