A Programmer’s Guide To Effective Debugging
As a software developer, I can guarantee you one thing for sure: you are going to spend a great deal of time debugging code.
There are certain constants in life which are unavoidable: death, taxes, and programmers creating bugs.
Since so much of your time will be spent debugging code, it’s probably a good idea to be good at debugging, don’t ya think?
Unfortunately, many developers—even highly experienced ones—tend to, well… suck.
There are plenty of developers who can whip through new features and sling code around like nobody’s business, but who cleans up the mess of bugs they leave behind?
It’s one thing to know how to write good code; it’s another thing to know how to debug the ugliest code you’ve ever seen in your life that was written by the legendary Bob who built the whole first version of the application in 48 hours in his basement, but was kind of an “odd” fellow.
Fortunately, debugging, like any other skill, is something that can be learned.
If you apply the right techniques and practice, you can become great at it.
Who knows? You might even enjoy it.
Good debugging is about taking a systematic approach to the problem: not rushing it, and not expecting that you can just find the problem, get in, and get out.
It’s about staying calm and collected: attacking the problem from a logical and analytical perspective instead of an emotional one.
In this chapter, I’m going to lay out a systematic approach to debugging, which will help you to avoid that dreaded debugger mindset and take your debugging skills to the next level.
What Is Debugging?
Before we dive deep, let’s go shallow.
What exactly is debugging?
It seems pretty obvious, right?
You open up the debugger and you “debug” problems with the code.
Ah, but that is where you are wrong.
Debugging has nothing to do with the debugger.
Debugging has everything to do with finding the source of a problem in a code base, identifying the possible causes, testing out hypotheses until the ultimate root cause is found, and then eventually eliminating that cause and ensuring that it will never happen again.
Ok, I suppose we could call that fixing bugs. Semantics.
The point is, debugging is more than fiddling around in a debugger and changing code until it seems to work.
First Rule of Debugging: Don’t Use the Debugger
Ah, what’s this you say?
A new bug for me to fix?
Oh, this is a hairy one?
Have no fear, sir. I will unleash the full power of my mental arsenal on this unholy terror.
With that mindset, you the programmer sit down at your desk.
You fire up the debugger.
Carefully you step through the code.
Time seems to blur, minutes turn into hours, hours into weeks.
You are an old man sitting at the keyboard, still in the same debugging session, but somehow you are “closer.”
Your children have all grown. Your wife has left you.
The only thing that remains is… the bug.
The first thing most programmers do when they want to debug an issue in the code is to fire up the good old debugger and start looking around.
Don’t do this.
The debugger should be your last resort.
When you fire up the debugger immediately, you are effectively saying, “I don’t have any idea what is causing the issue, but I’m just going to look around and see.”
It’s like when your car breaks down and you don’t know jack shit about cars, so you open up the hood and look for something wrong.
What are you looking for?
You don’t even know.
Don’t get me wrong.
The debugger is a wonderful and powerful tool.
Used properly, the debugger can help you solve all kinds of problems and can help you see what happens when your code is running.
It’s not, however, the place to start, and many bugs can be solved without ever touching the debugger.
You see, just like Facebook or funny YouTube cat videos, the debugger has a way of sucking you in.
Reproduce the Error
So, if you don’t simply fire up the debugger to debug a problem, what do you do?
Well, I’m glad you asked.
The first thing any sane person should do is to reproduce the bug to make sure that it is actually a bug and that you will be able to debug it.
One hundred percent of problems that can’t be reproduced can’t be debugged.
So, if you can’t reproduce the problem, there ain’t no point in debugging it. Ya hear me?
Not only can you not debug a problem that can’t be reproduced, but also, even if you did fix it, you can’t verify it was fixed.
So, the very first thing you should do when you are trying to debug a bug is to make sure you can reproduce the bug yourself.
If you can’t, go and get help.
If a tester filed the bug, get them to reproduce it for you.
If the bug is intermittent and can’t be reliably reproduced, this means that you do not know the circumstances required to reproduce the problem.
There is no such thing as an intermittent problem.
If it is a problem, it can be reproduced; you just have to know how.
Side note on intermittent problems:
Ok, so your boss is demanding that you fix the problem.
They’ve seen it in production. Customers have seen it. It’s definitely a problem.
The “I can’t reproduce it” pushback is not working—they aren’t buying it.
What do you do?
You still can’t debug a problem that you can’t reproduce.
But what you can do is gather more evidence.
Insert logging statements in the code.
Gather as many details as possible about when the problem happens and under what conditions it happens.
Artificially recreate the environment and circumstances if you can.
Do not be tempted to throw in “fixes” for the problem that you can’t recreate.
If you don’t understand the problem enough to recreate it, you have a very, very low chance of accidentally fixing it by a guess, and you will have an extremely difficult time knowing if your fix even worked.
Find a way to reproduce the problem, even if it is only reproducible in production.
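To make the logging advice concrete, here’s a minimal sketch in Python using the standard `logging` module. `process_order` and its parameters are hypothetical stand-ins for whatever code path is misbehaving. The idea is to record the inputs on every call and the full traceback when a call fails, so each failure can be matched to the exact conditions that produced it.

```python
import logging

# Hypothetical example: an order-processing function that fails only now and
# then in production. The function and parameter names are illustrative.
logger = logging.getLogger("orders")


def process_order(order_id, items, currency):
    """Return the order total; `items` is a list of (unit_price, quantity) pairs."""
    # Log the inputs up front so a later failure can be matched to them.
    logger.info(
        "process_order start order_id=%s item_count=%d currency=%s",
        order_id, len(items), currency,
    )
    try:
        total = sum(price * qty for price, qty in items)
    except Exception:
        # logger.exception logs at ERROR level and appends the traceback.
        logger.exception("process_order failed order_id=%s items=%r", order_id, items)
        raise
    logger.info("process_order done order_id=%s total=%s", order_id, total)
    return total
```

Nothing shows up until logging is configured somewhere at startup, for example with `logging.basicConfig(level=logging.INFO)`. Which details to log is a judgment call: anything you suspect might differ between the runs that fail and the runs that don’t.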
Sit and Think
After you can reproduce your issue, the next step is a step most software developers skip because they are so eager to solve the problem—but this step is crucial.
It’s a really simple step.
Just sit and think.
Yes, that’s right.
Think about the problem and what the possible causes could be.
Think about how the system works and what might bring about the odd behavior you’re seeing.
You are going to be in a rush to jump into the code and into the debugger and start “looking at things,” but before you start looking at things, it’s important to know what you are looking for and what things to look at.
You’ll likely come up with a few ideas or hypotheses about what might be causing the issues.
If you don’t, be patient. Keep sitting and thinking.
Stand and walk around if it helps, but before you move on, you should at least have a few ideas that you want to test.
If you absolutely can’t come up with anything, continue to resist firing up the debugger, and instead take a browse through the source code and see if you can gather a few more clues about how the system is supposed to work.
You should have at least two or three good hypotheses you can test before you move on from this step.
Test Your Hypotheses
Ok, so you’ve got some good hypotheses, right?
The flux-capacitor is connected to the thingamabob, so if the voltage coming out of the whozitswatt is below grade… THE THINGAMABOB MUST BE CONFIGURED INCORRECTLY!
Err… something like that.
Ok, let’s fire up the debugger and test our hypotheses! Yeah, man, let’s do it!
Hold up there, young buck.
We don’t need the debugger just yet.
Wait, what? How am I going to test my hypotheses if I can’t use the debugger, you ask?
Yes, that’s right: unit tests.
Try to write a unit test to test your hypotheses.
If you think some part of the system isn’t working correctly, write a unit test that you think will exploit the issue.
If you are right and you’ve found the problem, you can fix it right then and there, and now you will have a unit test in place to verify the fix and ensure it never happens again.
(Still make sure you try and reproduce the actual bug, though, before you call it fixed.)
If you are wrong and the unit test you write passes as expected, you’ve just made the system a little more robust by adding another unit test to the project, and you’ve disproved one of your hypotheses.
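Here’s roughly what a hypothesis-testing unit test might look like, sketched in Python with the standard `unittest` module. `parse_price` and the thousands-separator hypothesis are made up for illustration. In this sketch the test passes, which means the hypothesis is ruled out and the test stays in the suite as one more closed door.

```python
import unittest


def parse_price(text):
    # Hypothetical function under suspicion: converts "1,234.50" to 1234.5.
    return float(text.replace(",", ""))


class ParsePriceHypothesisTest(unittest.TestCase):
    # Hypothesis: prices with a thousands separator are parsed incorrectly.
    # A failure confirms the hypothesis; a pass rules it out.
    def test_thousands_separator_is_ignored(self):
        self.assertEqual(parse_price("1,234.50"), 1234.50)

    def test_plain_price_still_works(self):
        self.assertEqual(parse_price("99.25"), 99.25)
```

You’d run it like any other test, for example with `python -m unittest` pointed at the module that holds it.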
Think of it as ratcheting down the problem space.
Every time you write a unit test and it passes, you are eliminating possibilities. You are making your way through your debugging journey by closing and locking doors behind you as soon as you find out they are dead ends.
If you’ve ever been lost for hours or days in a debugging session, you should immediately realize how valuable this is.
One reason the debugger can be so counterproductive is that it encourages us to revisit the same wrong corridors over and over again as we check and recheck our assumptions, either forgetting what we already looked at or not trusting that we looked hard enough.
A unit test is like climbing a mountain and putting an anchor in place that makes sure you can’t fall too far backwards.
Writing unit tests to test your hypotheses will also ensure that you aren’t haphazardly trying things and looking around.
You have to have a specific assumption you are testing when you write a unit test in order to help debug a problem.
Now, I’m a realist.
I know that sometimes it will be extremely difficult or impossible to write a unit test to test a hypothesis.
In this case, it’s ok to fire up the debugger, but only if you obey this one rule:
Have a specific purpose for doing it.
Know exactly what you are looking for and what you are checking when you use the debugger.
Don’t merely go in there to look around.
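As an example of using the debugger with a specific purpose, here’s a minimal Python sketch. `apply_discount` is hypothetical; the point is that the breakpoint is tied to one precise question (does the price ever go negative?) instead of being a place to wander around.

```python
def apply_discount(price, percent):
    """Hypothetical function under suspicion: returns the discounted price."""
    discounted = price * (1 - percent / 100)
    # The specific question: "Does a discount ever push the price below zero?"
    # The debugger opens only when that assumption is violated, with
    # `price`, `percent`, and `discounted` available for inspection.
    if discounted < 0:
        breakpoint()  # built in since Python 3.7; starts pdb by default
    return discounted
```

Setting the environment variable `PYTHONBREAKPOINT=0` turns `breakpoint()` calls into no-ops, but the check should come out once the question is answered. You can ask the same question without editing the code by setting a conditional breakpoint in pdb, for example `break pricing.py:LINE, discounted < 0`, where the file name and line are placeholders for a line just after `discounted` is assigned.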
I know it may seem like I’m being a bit anal and pedantic about this whole thing, but trust me, there is a reason for it.
I want you to become a skilled debugger, and you are only going to get that way by being deliberate about how you debug.
Check Your Assumptions
Most of the time, your hypotheses are not going to pan out.
That’s just life.
If that’s the case, the next best thing you can do is to check your assumptions about how things are working.
We typically assume that code is working a certain way or that some input or output must be some value.
Often we think, “Well, this can’t possibly be happening. I’m looking at the code right here. There is no way it could be producing this output.”
Often, we are wrong.
It happens to the best of us.
The best thing you can do with these assumptions is to check them.
And what’s the best way to check them?
Yes, that’s right. More unit tests.
Write some unit tests that check obvious things which “have to be working” along the workflow of the problem you are trying to debug.
Most of these tests should easily pass, and you’ll say, “Duh.”
But every once in a while, you’ll write a unit test to test some obvious assumption and the results will shock you.
Remember, if the answer to your problem was obvious, it wouldn’t be a problem at all.
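Here’s a minimal Python sketch of what assumption-checking tests can look like. `line_total` and `to_whole_dollars` are hypothetical helpers along some billing workflow. The first two checks are the “Duh” kind. The third is the kind that catches people: Python 3’s built-in `round()` rounds exact halves to the nearest even integer, so 250 cents becomes 2 dollars, not the 3 most people would assume.

```python
import unittest


def line_total(unit_price_cents, quantity):
    # Hypothetical helper from the workflow being debugged.
    return unit_price_cents * quantity


def to_whole_dollars(cents):
    # Hypothetical helper: rounds a cent amount to whole dollars.
    return round(cents / 100)


class WorkflowAssumptionTests(unittest.TestCase):
    # Each test pins down something that "has to be working".
    def test_line_total_multiplies(self):
        self.assertEqual(line_total(250, 4), 1000)

    def test_zero_quantity_gives_zero(self):
        self.assertEqual(line_total(250, 0), 0)

    def test_half_dollar_rounding(self):
        # The "obvious" expectation was 3. Python 3's round() rounds exact
        # halves to the nearest even integer, so 2.5 becomes 2.
        self.assertEqual(to_whole_dollars(250), 2)
```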
Once again, the pragmatist side of me has to tell you that, yes, it’s ok to open up the debugger to check your assumptions as well.
But only after you’ve tried to check the assumptions using unit tests first.
Again, it’s like climbing that mountain and putting in anchors along the way.
Avoid the debugger if you can, use it if you must, but, once again, only to validate or invalidate specific assumptions you have already formed.
Divide and Conquer
I remember working on a really hairy bug with a printer incorrectly interpreting a print file written in the PostScript printing language.
I tried everything I could think of to debug the problem.
I tested all kinds of hypotheses.
Nothing panned out.
It seemed like the bug was some kind of combination of multiple commands in the print file, and I had no idea which ones it was.
So, what did I do?
Well, I cut the print file in half.
The bug was still there.
So, I cut it in half again.
It disappeared this time.
I tried the other half. Back again.
I kept hacking away at big chunks of the print file until I got the entire file down from several thousand lines of code to just five.
The five lines of code that, in that order, produced the bug.
Sometimes when you get stuck debugging, what you need to do is figure out a way to cut the problem in half—or take as big of a chunk out of it as possible.
Depending on the problem, this could look very different, but try and think of ways you can eliminate a large amount of code or remove a large amount of the system or variables and still reproduce the bug.
See if you can come up with tests that completely rule out parts of the system as the source of the error.
Then do it again… and again.
If you keep hacking away, you’ll likely find the critical components required to create the error, and then the problem can become relatively easy to solve.
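When the input is a list of lines, like that print file, the cutting can be automated. Below is a minimal Python sketch of the idea, a simplified version of the technique usually called delta debugging. `reproduces` is a function you’d have to supply: it runs whatever triggers the bug and returns `True` if the bug still shows up. The result isn’t guaranteed to be the smallest possible input, but removing any single remaining line makes the bug disappear.

```python
from typing import Callable, List, Sequence


def minimize(lines: Sequence[str], reproduces: Callable[[List[str]], bool]) -> List[str]:
    """Shrink `lines` as far as possible while `reproduces` still returns True.

    Removes the biggest chunks first (halves), then smaller and smaller
    chunks, down to single lines.
    """
    current = list(lines)
    if not reproduces(current):
        raise ValueError("start from an input that reproduces the bug")
    chunk = max(len(current) // 2, 1)
    while chunk >= 1:
        removed_any = False
        start = 0
        while start < len(current):
            candidate = current[:start] + current[start + chunk:]
            if candidate and reproduces(candidate):
                # Bug is still there without this chunk: drop it for good.
                current = candidate
                removed_any = True
            else:
                # This chunk is needed (or removing it empties the input).
                start += chunk
        if not removed_any:
            # Nothing of this size could go, so try smaller pieces.
            chunk //= 2
    return current
```

For the printer case that might be `minimize(open("job.ps").read().splitlines(), printer_fails)`, where `job.ps` and `printer_fails` are hypothetical. Because every removal is checked against `reproduces`, a slow or flaky reproduction makes this slow or misleading, so it pays to make the reproduction reliable first.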
If You Fix It, Understand Why
I’m going to give you one final piece of advice about debugging—although I’m sure I could write a whole book on the subject.
If you fix a problem, understand why what you did fixed it.
If you don’t understand why what you did fixed the problem, you are not done yet.
You may have inadvertently caused a different problem, or—very likely—you haven’t fixed your original problem.
Problems don’t go away on their own.
If you didn’t fix the problem, I can guarantee you it’s not fixed. It’s just hiding.
But if you did fix the problem, don’t stop there. Explore a little deeper, and make sure you understand exactly what was going on that caused the problem in the first place and how what you did fixed it.
Too many software developers debug a problem by twiddling bits until the code apparently starts working, and then assume it is fixed without even knowing why.
This is a dangerous habit for many reasons.
As I mentioned above, when you randomly tweak things in the system and change bits of code here and there, you could be causing all kinds of other problems which you aren’t aware of.
But, perhaps more than that, you are training yourself to be a shitty debugger.
You are developing the habit of messing with things until it works. No technique, no rigor.
You may get lucky sometimes, but you won’t have a repeatable process or reliable skillset for debugging.
Not only should you understand what broke, why, and how you fixed it, but also you should verify the fix.
I know it seems like common knowledge, but I can’t tell you how much time is wasted by programmers “fixing a problem,” assuming the fix worked, and passing the code to QA only for QA to reproduce the problem and have it go back to the developer who has to start over at square one again.
It’s a huge waste of time that can be prevented by taking an extra five minutes to verify that what you fixed is actually fixed.
In fact, don’t just verify the fix; write a regression test for the problem so that it never happens again.
If you truly understand the problem you fixed, you should be able to write a unit test that exploits the issue, and then your fix should make that unit test pass.
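Here’s a minimal Python sketch of such a regression test. The bug is invented for illustration: a hypothetical `page_count` originally used `total_items // page_size`, which silently dropped a partially filled last page. The first test exploits exactly that mistake, so it would fail against the old code and passes against the fix; the other two pin down the edges the fix must not break.

```python
import unittest


def page_count(total_items, page_size):
    # Fixed version. The (hypothetical) original bug was
    # `total_items // page_size`, which gave 10 pages for 101 items
    # at 10 per page. Integer ceiling division counts the partial page.
    return (total_items + page_size - 1) // page_size


class PageCountRegressionTest(unittest.TestCase):
    def test_partial_last_page_is_counted(self):
        # Exploits the original bug: fails with `//` alone, passes with the fix.
        self.assertEqual(page_count(101, 10), 11)

    def test_exact_multiple_has_no_extra_page(self):
        self.assertEqual(page_count(100, 10), 10)

    def test_no_items_means_no_pages(self):
        self.assertEqual(page_count(0, 10), 0)
```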
Finally, look for other instances of this same class of bug.
Bugs tend to hang out together.
If you found something wrong with one assumption you made about the system or some incorrectly coded component, it’s very likely that there are other issues which are also caused by that same problem.
Again, this is why it is so critical that you understand what the real problem was and why your solution fixed it.
If you know what happened and why, you can quickly figure out if there are likely to be other issues caused by the same underlying problem.
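Once you know the root cause, you can often search for it mechanically. As a sketch, suppose the root cause turned out to be integer floor division (`//`) used where rounding up was needed. The hypothetical Python helper below uses the standard `ast` module to list every floor division in a piece of source code, so each one can be reviewed for the same mistake. It only finds candidates; deciding which ones are actually bugs is still up to you.

```python
import ast
from typing import List, Tuple


def find_floor_divisions(source: str) -> List[Tuple[int, int, str]]:
    """Return (line, column, code) for every `//` or `//=` in `source`.

    Each hit is only a candidate for the same class of bug; a human still
    has to decide whether floor division is correct there.
    Requires Python 3.9+ for ast.unparse.
    """
    tree = ast.parse(source)
    hits = []
    for node in ast.walk(tree):
        is_binop = isinstance(node, ast.BinOp) and isinstance(node.op, ast.FloorDiv)
        is_augassign = isinstance(node, ast.AugAssign) and isinstance(node.op, ast.FloorDiv)
        if is_binop or is_augassign:
            hits.append((node.lineno, node.col_offset, ast.unparse(node)))
    return sorted(hits)
```

You might call it as `find_floor_divisions(pathlib.Path("billing.py").read_text())`, where `billing.py` is a placeholder for each module you want to review.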
Art and Science
Remember, debugging—like software development—is part art and part science.
You can only get good at debugging by practicing.
But practicing is not enough. You have to specifically, systematically debug, not just play around in the debugger.
Hopefully, I’ve given you a good overview of how to do that; now the rest is up to you.