Types of Duplication in Code

One of the biggest reasons to refactor code is to eliminate duplication.  It is pretty easy to introduce duplication in our code either unintentionally or because we don’t know how to prevent or get rid of it.

The three types of duplication

I’ve found that there are three basic types of duplication we can eliminate from our code, each one building on the last:

  • Data
  • Type
  • Algorithm

Most developers tend to get stuck at the data level, but in this post, I will show you how to recognize type and algorithm duplication and refactor it out of your code.


Data duplication

The most basic type of duplication is that of data.  It is also the easiest to recognize.

Take a look at these methods:

[sourcecode language='csharp'  padlinenumbers='true']
public Position WalkNorth()
{
   var player = GetPlayer();
   player.Move("N");
   return player.NewPosition;
}

public Position WalkSouth()
{
   var player = GetPlayer();
   player.Move("S");
   return player.NewPosition;
}

public Position WalkEast()
{
   var player = GetPlayer();
   player.Move("E");
   return player.NewPosition;
}

public Position WalkWest()
{
   var player = GetPlayer();
   player.Move("W");
   return player.NewPosition;
}
[/sourcecode]

Pretty easy to see here what needs to be refactored.

Most developers don’t need any help to realize that this code should probably be refactored into a single method like the following:

[sourcecode language='csharp' ]
public Position Walk(string direction)
{
   var player = GetPlayer();
   player.Move(direction);
   return player.NewPosition;
} 
[/sourcecode]

In this example, data is duplicated.  To be specific, the four methods are identical except for the direction string passed into Move.  We can eliminate that duplication by creating a method that parameterizes the differences represented by that data.
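Here is a minimal, runnable sketch of that refactoring. The post doesn’t show the player class, so this version uses a hypothetical stand-in: the player is reduced to an (X, Y) tuple, and moving is assumed to shift it one step per compass letter. The only point being illustrated is that the direction string, the one piece of data that varied, becomes a parameter.

```csharp
using System;

// Hypothetical stand-in for the post's player: just an (X, Y) position.
var position = (X: 0, Y: 0);

// One parameterized method replaces WalkNorth, WalkSouth, WalkEast and WalkWest.
// The direction string is the only thing that differed between them.
(int X, int Y) Walk(string direction)
{
    // Assumed movement rule: one step along the compass direction.
    var (dx, dy) = direction switch
    {
        "N" => (0, 1),
        "S" => (0, -1),
        "E" => (1, 0),
        "W" => (-1, 0),
        _ => throw new ArgumentException($"Unknown direction: {direction}")
    };
    position = (position.X + dx, position.Y + dy);
    return position;
}

Walk("N");
Walk("N");
var result = Walk("E");
Console.WriteLine(result); // prints (1, 2)
```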

Type duplication

Now, data duplication is where a majority of developers stop, but we can go much further than that.  In many cases, the only difference between two methods is the type they operate on.

With generics in C#, we can refactor out the type and parameterize that concept as well.

Look at this example:

[sourcecode language='csharp' ]
public int FindIntMatch(int i)
{
   var match = (int)container.Get(i);
   return match;
}

public string FindStringMatch(string s)
{
   var match = (string)container.Get(s);
   return match;
}
[/sourcecode]

Here we have two methods that do pretty much the same thing; they differ only in the type they operate on.  Generics give us the ability to refactor out that type information just like we would with data.

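[sourcecode language='csharp' ]
// <T> declares the type parameter on the method itself
public T FindMatch<T>(T t)
{
   // container.Get appears to return object (note the casts in the previous example), so cast back to T
   var match = (T)container.Get(t);
   return match;
}
[/sourcecode]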

By refactoring to the above method, we have eliminated the duplication, and we achieved this by refactoring out the type.
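To run the generic version, `container` needs a concrete shape. The sketch below assumes, purely for illustration, a `Dictionary<object, object>` behind a hypothetical `Get` that returns the stored value as `object`; the post never says what `container` actually is. One generic method then covers both the `int` and `string` cases.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical container: the post does not show its real type.
var container = new Dictionary<object, object>
{
    [1] = 42,
    ["apple"] = "pie"
};

// Hypothetical stand-in for container.Get: returns the stored value as object.
object Get(object key) => container[key];

// One generic method replaces FindIntMatch and FindStringMatch.
T FindMatch<T>(T t)
{
    // Unchecked at compile time: throws InvalidCastException if the value is not a T.
    var match = (T)Get(t);
    return match;
}

int number = FindMatch(1);          // T inferred as int
string word = FindMatch("apple");   // T inferred as string
Console.WriteLine($"{number} {word}"); // prints "42 pie"
```

The cast is still only checked at run time: if the value stored under a key is not actually a `T`, the `(T)` cast throws `InvalidCastException`, so real code may want a safer lookup.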

Algorithm duplication

Without a good understanding of delegates and functional programming, few developers even consider refactoring out algorithm duplication, but it can be done fairly easily.

Take a look at this example:

[sourcecode language='csharp' ]
public void GoForRun()
{
   GetDressed();
   Run();
   Shower();
}

public void LiftWeights()
{
   GetDressed();
   Lift();
   Shower();
}
[/sourcecode]

It is a pretty basic example, but it highlights the kind of duplication that I often see left in many code bases.  Delegates in C# allow us to treat functions like data.  With this ability, we can easily refactor out the commonality in these two methods to get something like this:

[sourcecode language='csharp' ]
public void DoFitnessActivity(Action activity)
{
   GetDressed();
   activity();
   Shower();
}
[/sourcecode]

We could also have refactored out this duplication by using an abstract base class and having the inherited classes define their own fitness activity, but using delegates is a much simpler approach and casts the problem in the same light as refactoring any other type of data.
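A small sketch of the delegate approach follows. `GetDressed`, `Run`, `Lift` and `Shower` get hypothetical bodies that only record a step in a log, so the shared order of operations is visible; their real implementations are not shown in the post.

```csharp
using System;
using System.Collections.Generic;

// Records each step so the shared sequence is easy to see.
var log = new List<string>();

// Hypothetical bodies for the methods named in the post.
void GetDressed() => log.Add("dress");
void Shower() => log.Add("shower");
void Run() => log.Add("run");
void Lift() => log.Add("lift");

// The shared algorithm: only the middle step varies, so it is passed in.
void DoFitnessActivity(Action activity)
{
    GetDressed();
    activity();
    Shower();
}

// Method group conversion turns Run and Lift into Action delegates.
DoFitnessActivity(Run);
DoFitnessActivity(Lift);

Console.WriteLine(string.Join(", ", log));
// prints "dress, run, shower, dress, lift, shower"
```

Because `Run` and `Lift` already match `Action`'s signature (no parameters, no return value), they can be passed by name without wrapping them in lambdas.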

Combining them together

Often I find that several different types of duplication are present across several methods in a class.

When this is the case, it is often possible to combine the data, type and algorithm refactoring techniques to find the simplest and most elegant solution.
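Here is a made-up illustration of combining all three. Suppose two methods wrap their work in the same open/work/close steps but differ in a label (data), a return type (type) and the work itself (algorithm). A generic method that takes a `string`, a type parameter `T` and a `Func<T>` absorbs all three differences. `RunStep` and the log messages are hypothetical names chosen for this sketch.

```csharp
using System;
using System.Collections.Generic;

var log = new List<string>();

// Before: two hypothetical near-duplicates, e.g. one returning int with label
// "count" and one returning string with label "title", each doing open/work/close.

// After: data (label), type (T) and algorithm (work) are all parameters.
T RunStep<T>(string label, Func<T> work)
{
    log.Add($"open {label}");
    T result = work();
    log.Add($"close {label}");
    return result;
}

int count = RunStep("count", () => 3 + 4);
string title = RunStep("title", () => "Weekly Report".ToUpperInvariant());

Console.WriteLine($"{count} {title}");      // prints "7 WEEKLY REPORT"
Console.WriteLine(string.Join(", ", log));  // prints "open count, close count, open title, close title"
```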

I’ve also found this is a skill that must be practiced.  When I first really started using generics and delegates in C#, I had a hard time finding uses for them, because I could not easily recognize the patterns of duplication that called for them.  But, I found over time that it became easier and easier to recognize where these techniques could be applied to reduce duplication in my methods.

I’ve also found the key to eliminating duplication is sometimes to first exaggerate it.  Often I will purposely take two methods that I know have some duplication and make them look even more duplicated in order to be able to clearly see where the duplication lies.  I may do several small refactoring steps to get to the point where it is easy to identify what data, type or algorithm is being repeated.
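Here is one way that exaggeration step might look on an invented example. Two summing methods start out written differently. Rewriting both into the same shape, with the only difference isolated in a local `Func<int, int>`, makes the repeated algorithm obvious, and the final extraction becomes mechanical.

```csharp
using System;
using System.Linq;

// Step 0: two methods that don't obviously look alike.
int SumOfSquares(int[] xs) => xs.Select(x => x * x).Sum();

int SumOfCubes(int[] xs)
{
    var result = 0;
    for (var i = 0; i < xs.Length; i++) result += xs[i] * xs[i] * xs[i];
    return result;
}

// Step 1: exaggerate the duplication by giving both the same shape,
// isolating the only difference in a local Func<int, int>.
int SumOfSquaresV2(int[] xs)
{
    Func<int, int> term = x => x * x;
    var total = 0;
    foreach (var x in xs) total += term(x);
    return total;
}

int SumOfCubesV2(int[] xs)
{
    Func<int, int> term = x => x * x * x;
    var total = 0;
    foreach (var x in xs) total += term(x);
    return total;
}

// Step 2: the shared algorithm is now plain to see and can be extracted.
int SumOf(int[] xs, Func<int, int> term)
{
    var total = 0;
    foreach (var x in xs) total += term(x);
    return total;
}

var numbers = new[] { 1, 2, 3 };
Console.WriteLine(SumOf(numbers, x => x * x));     // prints 14
Console.WriteLine(SumOf(numbers, x => x * x * x)); // prints 36
```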

  • franjobrekalo

    For me the hardest part is when only some of the logic is repeated, not all of it. I have several functions, and some of them repeat some logic, but not all of the functions repeat all of it. I’ve found it easier to just leave it the way it is, but the problem came when I had to make changes: I was changing each function, of course. If my code had been in one place, only that one function would have needed to change.

    Very well written – I’ve never thought before about algorithm duplication and – of course – mostly I don’t use delegates in my programming.

  • http://gravatar.com/feldmanseancanada Sean Feldman

    John,
    Good post. I used to think that duplication is evil. Certainly duplication is code. Yet there are cases when you want to break down your logic into simple chunks, and duplication is intended for the purpose of breaking down a system into components. Services are one example. I’d rather have independent services that can be managed separately than no duplicated code at all and the need to touch every service that shares code when one of them is updated.

    • http://simpleprogrammer.com jsonmez

      I agree with you on that, although there are often ways to do both by moving your common methods into a framework.

  • http://penumarthy.com Shashi

    John,

    The type refactoring example you’ve shown needs to be made safer by using constraints. The problem with the way the method looks now is that it assumes container.Get() can support any type T. This means we’ve changed the specification of the method and introduced bugs.

    • http://simpleprogrammer.com jsonmez

      Yes, you are correct. I was just trying to keep it super simple, but in a real-world application, you are absolutely right.

  • http://twitter.com/rstackhouse Robert Stackhouse (@rstackhouse)

    Thanks for sharing. I appreciate the fact that with delegates in C# we can make functions (almost) first class citizens.

    To me, the “Algorithm Duplication” term is the most clear. It says, “You are doing these things in this sequence in multiple places with a slight variation. That variation being what method gets called in the middle.”

    In all the cases, what is duplicated is the generic rather than the specific.

    The terms “Type Duplication” and “Data Duplication” seem misleading.

    Type Duplication would indicate to me that you’ve got two classes that do the same thing. “Overload Duplication” might be a better-suited term here.

    Data Duplication, to me, would indicate that you have the exact same data (possibly in the exact same format) in different stores. I’ve primarily been a web developer in my career. Therefore, I naturally think of data in the same terms the pointy-haired bosses do: the stuff that lives in the RDBMS. To me, “Method Call Duplication” would be clearer here.

    My $0.02.

  • http://www.devexpress.com/mark Mark Miller

    Hi John, great article. I’m wary of the naming chosen for the different types of duplication. In this article, the first part of the name identifies the piece that *differs* and the second part of the name is “duplication”. So when the data differs, it’s called “data duplication”, two words together that mean the *opposite* of what is really happening. I’m hopeful we can come up with better ways to identify what is changing while still keeping the names for these three kinds of duplication easy to use and meaningful.