Posted by: jsonmez | March 24, 2010

Naming Database Columns, When to Violate DRY

DRY = Don’t Repeat Yourself.

For the most part this is a good principle to follow.  But, there is one particular instance where I feel repeating myself makes things more clear.

It is very important for us to remember that principles are important, but ultimately they are very strong guidelines, not absolutes.  (It is hard to imagine that I just said that, since I am usually arguing on the merit of principles, but I digress.)

Traditions

In particular, when creating database tables, most people will create a table structure that looks like:

Looks perfectly normal and good.  But, I don’t like it.  Imagine you have to join these tables together to create a query.  (A pretty common occurence.)

SELECT * FROM Book
JOIN Author ON Book.author_id = Author.id
JOIN Publisher ON Book.id = Publisher.publisher_id

Did you catch my mistake above?  Probably not.  Most people would overlook it.  With the ER diagram above, and not very many columns on the tables, it is a little bit easier to see.

It is very easy to mix up which table is the parent and which is the child.  It is very easy to put the .id on the wrong side of the join clause.  The correct SQL should have been:

SELECT * FROM Book
JOIN Author ON Book.author_id = Author.id
JOIN Publisher ON Book.publisher_id = Publisher.id

Bucking the trend

Now take a look at this ER diagram:

The corresponding SQL for selecting from this database is:

SELECT * FROM Book
JOIN Author ON Book.author_id = Author.author_id
JOIN Publisher ON Book.publisher_id = Publisher.publisher_id

It is a small change, but it makes a huge difference.  Writing SQL to join the tables is much easier and much more difficult to get wrong.  Yes, I am repeating myself here, but I am doing it for a very good reason.  I am making the SQL much more clear and I am making it much easier to talk about these tables.

Think about the last time you had a conversation about database tables.  Did you say id?  Did you then have to qualify which id you were talking about?  If you never have “id”, you eliminate a large amount of ambiguity.

The major argument against what I am suggesting is that you don’t need to name the id for the book, book_id because it is a column on book, so you know that id refers to the id of the book.  I do not disagree with this.  For the most part that principle is sound.  The question though is “which one is easier to read, write and talk about?”


Responses

  1. Your article has a reasonably put argument, but then you spoil it in the last paragraph by claiming it would be acceptable to leave the ID of the Book table simply left as ID.

    It seems to me you’ve fallen into a very well used trap of failing to consider the future scope of the application.

    What if some future requirement necessitates the introduction of another table that has a foreign key to the Book table. That table is then going to suffer from having to join on Book.ID in the same manner as your Author and Publisher did initially.

    If you argument is to use fully qualified column names for foriegn keys, it should be applied consistently.

    • You are correct. I didn’t mean my last paragraph to say that we should do that. I was intending to indicate that some people may argue against what I had said, by saying “you don’t need to name the id for the book, book_id because it is a column on book, so you know that id refers to the id of the book”

      I don’t disagree with that logic, but overall I think the option I present here is better.

      Thanks for pointing that out though. I wasn’t very clear in what I was trying to say.

  2. I agree, prefix your id columns with the name of the table for clarity… it’s so easy to your JOINs mixed up otherwise.

    BTW I’m not sure this is against the spirit of DRY, which is mostly about not writing the same code over and over. Using slighter longer column names for clarity would probably be encouraged.

    • It could be considered a violation of DRY because you are repeating the name of the table in the column, but I agree with you DRY is mostly applied to writing the same code over again.

  3. I agree. Clarity in queries can almost eliminate a lot of thought regarding looking at code. Especially if you are new to a schema, that little prefix helps to identify the behavior the original dev went for.


Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Connecting to %s

Categories

Follow

Get every new post delivered to your Inbox.

Join 603 other followers