The art of debugging

How to fix a bug: A structured approach

“No plan of operations extends with any certainty beyond the first contact with the main hostile force.” - Field Marshal Helmuth Karl Bernhard Graf von Moltke

Even with the best of intentions and most disciplined of development habits, bugs will surface in your code when other people start using it. The process of fixing bugs requires some methodical detective work, some imagination, and a healthy dose of scepticism. Approach bugfixing as an intellectual challenge, a dungeon master’s puzzle. Sheer logical brute force can get to the bottom of all but the most obscure bugs.

1. Replicate the bug

Although it is tempting to dive straight into the code and go bughunting, you can quickly get lost without a clear picture of the problem. Make sure you can replicate the bug consistently: the bug report might be inaccurate, or it might rest on an assumption that needs to be checked. The bug may also occur only in a small number of circumstances.

Replicate it first in the environment where it was reported; that confirms the bug report is accurate. Then replicate it again in your development environment, to make sure your environment exhibits the same bug.

A typical mistake is to assume you know what the issue is, dive straight into the code, and make a change. Without a reliable way to replicate the bug, you cannot tell whether that change worked.

Replicating the bug builds a more accurate understanding of the issue, and can help identify which factors play a part in the bug and which do not.

Spending time replicating the bug checks that you have the right information available: first that the bug report is accurate, then that you understand the nature of the bug. This gives you a good starting context for tracking down the bug.

Try some variations of the replication steps, and see if you can simplify them. Spending time here helps eliminate potential problem areas, and helps identify characteristics that may play a part in the bug.
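Varying the replication steps can be as simple as running the suspect code over a grid of inputs and noting which combinations fail. The sketch below assumes a made-up format_price function with a deliberate bug, and a made-up rule from the bug report (a price must show exactly two digits after the point); neither comes from a real codebase.

```python
import re
from itertools import product

SYMBOLS = {"USD": "$", "EUR": "€"}


def format_price(cents, currency):
    """Hypothetical function under investigation, with a deliberate bug."""
    # Bug: the cents part is not zero-padded, so 105 renders as "1.5".
    return f"{SYMBOLS[currency]}{cents // 100}.{cents % 100}"


def looks_right(text):
    """Hypothetical rule from the bug report: two digits after the point."""
    return re.fullmatch(r"\D\d+\.\d{2}", text) is not None


def failing_cases(amounts, currencies):
    """Return every (cents, currency) combination that reproduces the bug."""
    return [
        (cents, currency)
        for cents, currency in product(amounts, currencies)
        if not looks_right(format_price(cents, currency))
    ]


if __name__ == "__main__":
    for case in failing_cases([5, 50, 105, 199, 1000], ["USD", "EUR"]):
        print("reproduces:", case)
```

With these inputs, only amounts whose cents part is below ten fail, and they fail for both currencies. That tells you the currency is probably not a factor, and the replication steps can be reduced to a single amount.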

At the end of this step, you should be able to consistently replicate the bug in your development environment. That means, when you make a code change to fix this bug, you will be able to check whether the code change has made a difference or not, and whether that difference is an improvement or not.

One good idea, if your team uses functional testing tools like Selenium, is to script the steps necessary to replicate the issue. Writing a test for each bug should be a team habit. This not only automates the repetitive retesting, but also ensures that if the bug does recur, you have a test that will spot it.
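As a rough illustration of "a test for each bug", here is a sketch using Python's standard unittest module rather than Selenium; a browser-driven script would follow the same shape of performing the reported steps and asserting the expected outcome. The apply_discount function and the bug it once had are invented for this example, and the function is shown in its fixed form.

```python
import unittest


def apply_discount(total, percent):
    """Hypothetical function, shown after the fix has been made."""
    if not 0 <= percent <= 100:
        raise ValueError("percent must be between 0 and 100")
    return round(total * (100 - percent) / 100, 2)


class DiscountRegressionTest(unittest.TestCase):
    """One test per reported bug, named after the behaviour it protects."""

    def test_full_discount_gives_zero_total(self):
        # Steps from the (hypothetical) bug report: a 100% discount
        # used to leave a non-zero total on the order.
        self.assertEqual(apply_discount(19.99, 100), 0)

    def test_out_of_range_percent_is_rejected(self):
        # A second (hypothetical) report: 150% produced a negative total.
        with self.assertRaises(ValueError):
            apply_discount(19.99, 150)
```

Saved in a test module, this runs with python -m unittest. Before the fix is written the test should fail, which is what proves it really replicates the bug.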

2. Target the code that presents the bug

Now we can open our code editor and explore the code. The obvious starting point is the code that renders the incorrect information or behaviour.

The first step is to understand what this particular portion of code is doing. The code only does what it is written to do, nothing more, nothing less.

The main question to answer is: how does the behaviour of the code you’re looking at differ from what is expected? Is it because the code does not cover the use-case the bug is about? Is it because a variable has an unexpected value? Is it because the code path skips the code that should have produced the right result?

3. Isolate the bug

Now we isolate the code that causes the bug. The goal of this step is to establish where the code starts to go wrong. We start at the place where we know there is definitely a problem, and then backtrack through the code that affects the end result.

Finding the real source of the bug takes discipline. Don’t make assumptions about whether the code is doing the right thing or not. Use output debugging or breakpoint debugging tools to confirm the exact state of the code at each step. Assumptions are black holes where bugs remain hidden until you question and verify them. Get each section of the code to prove to you that it is functioning as you expect.

You should quickly establish the code path from entry to exit. Start at the highest appropriate level in the code and look for the point where the actual state of the application differs from what you expected. Also confirm that each critical piece of code is actually being executed (and executed at the right time).
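Output debugging of this kind might look like the sketch below, which uses Python's standard logging module to make the code path visible. The shipping_cost function, its rules, and the "checkout" logger name are all made up for illustration.

```python
import logging

logger = logging.getLogger("checkout")  # hypothetical logger name


def shipping_cost(order):
    """Hypothetical function; each debug line checks one assumption."""
    logger.debug("shipping_cost entered: order=%r", order)
    if order.get("express"):
        logger.debug("express branch taken")
        cost = 15.0
    else:
        logger.debug("standard branch taken")
        cost = 5.0
    if order["total"] >= 50:
        # Is this block running when we did not expect it to?
        logger.debug("free-shipping rule applied: total=%r", order["total"])
        cost = 0.0
    logger.debug("shipping_cost returning %r", cost)
    return cost


if __name__ == "__main__":
    logging.basicConfig(level=logging.DEBUG, format="%(name)s: %(message)s")
    shipping_cost({"total": 60, "express": True})
```

Reading the log from top to bottom gives you the path actually taken from entry to exit. In this example it would show the express branch running and the free-shipping rule then overriding it, which may or may not be what the requirements intended.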

Variables are incorrect either because they were not set at all, set incorrectly, or set correctly but clobbered later. Isolating a bug is determining which of these cases is the root cause.
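To tell these three cases apart, it helps to record every write to the suspect variable together with the function that made it. The sketch below is one possible way to do that in Python, using a small wrapper class; WriteTracker and the three configuration functions are hypothetical names invented for this example.

```python
import inspect


class WriteTracker:
    """Records every attribute assignment and the function that made it."""

    def __init__(self):
        # Bypass our own __setattr__ so the history list itself is not logged.
        object.__setattr__(self, "history", [])

    def __setattr__(self, name, value):
        # stack()[0] is this method; stack()[1] is the code doing the assignment.
        caller = inspect.stack(context=0)[1]
        self.history.append((name, value, caller.function, caller.lineno))
        object.__setattr__(self, name, value)


def load_defaults(cfg):
    cfg.timeout = 30


def apply_user_settings(cfg):
    cfg.timeout = 5


def legacy_init(cfg):
    cfg.timeout = 30  # deliberate bug: clobbers the user's setting


def demo():
    cfg = WriteTracker()
    load_defaults(cfg)
    apply_user_settings(cfg)
    legacy_init(cfg)
    return cfg


if __name__ == "__main__":
    for name, value, func, line in demo().history:
        print(f"{name} = {value!r} set by {func} (line {line})")
```

An empty history means the variable was never set. A single wrong entry means it was set incorrectly. A correct entry followed by a later one, as here, means it was clobbered, and the history names the function responsible.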

When you’ve isolated which high-level step introduces the bug, then you can dig into that particular method and figure out at which step in that method the state of the application goes wrong. Check at the start and the end of the method that the application state is what you expected. If the state at the end isn’t what you expected, and the state at the start is, then somewhere in this method (and the code it calls) the bug is lurking.
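One lightweight way to check the state at the start and end of a method, sketched here in Python, is a temporary decorator that prints a snapshot on entry and on exit. The check_state decorator and the remove_item function (with its deliberate bug) are assumptions made up for this example, not part of any library.

```python
import functools


def check_state(describe):
    """Print a snapshot of the state before and after the wrapped call.

    `describe` receives the same arguments as the wrapped function and
    returns whatever is worth looking at.
    """
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            print(f"enter {func.__name__}: {describe(*args, **kwargs)}")
            result = func(*args, **kwargs)
            print(f"exit  {func.__name__}: {describe(*args, **kwargs)} -> {result!r}")
            return result
        return wrapper
    return decorator


@check_state(lambda cart, sku: dict(cart))
def remove_item(cart, sku):
    """Hypothetical method with a deliberate bug: it drops the whole
    line instead of reducing the quantity by one."""
    cart.pop(sku, None)
    return len(cart)


if __name__ == "__main__":
    remove_item({"apple": 2, "pear": 1}, "apple")
```

If the snapshot on entry matches your expectation and the one on exit does not, the bug is inside this method or the code it calls. Remove the decorator once it has answered the question.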

Get the code to tell you which statement / expression is causing the application state to go wrong. Debug as much as you can at each level of code until you are confident which statement is causing the problem. Then before you dig deeper into that statement, remove the debug statements/breakpoints you no longer need.

This is a divide and conquer strategy for isolating a bug. You have a beginning debug statement that confirms the application state is right (by outputting the most relevant variable in the current method/scope), and an ending debug statement that confirms that the application state is no longer correct. That way you know that somewhere in the code between these two debug statements is the code containing the error.

As you progress you are getting closer and closer to the actual source of the bug. You’ll have reached that bug when your start and end debug statements cover just a handful of lines that cannot be broken down any further.
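When the code under suspicion is a sequence of steps, the same narrowing can be automated as a binary search. The following Python sketch makes several assumptions: each step is a function that returns a new state without modifying its input, the state is valid before the first step and invalid after the last, and once it goes wrong it stays wrong. All the names, including the order-total steps, are invented for illustration.

```python
def first_bad_step(steps, initial_state, is_valid):
    """Return the first step after which is_valid(state) becomes False."""

    def state_after(count):
        # Re-run the first `count` steps from the initial state.
        state = initial_state
        for step in steps[:count]:
            state = step(state)
        return state

    good, bad = 0, len(steps)  # valid after `good` steps, invalid after `bad`
    while bad - good > 1:
        mid = (good + bad) // 2
        if is_valid(state_after(mid)):
            good = mid
        else:
            bad = mid
    return steps[bad - 1]


# Hypothetical processing steps on order totals, in cents.
def add_tax(totals):
    return [t + t // 10 for t in totals]


def apply_coupon(totals):
    return [t - 2000 for t in totals]  # deliberate bug: can go below zero


def add_shipping(totals):
    return [t + 500 for t in totals]


def round_down_to_dollar(totals):
    return [t - t % 100 for t in totals]


def no_negative_totals(totals):
    return all(t >= 0 for t in totals)


if __name__ == "__main__":
    steps = [add_tax, apply_coupon, add_shipping, round_down_to_dollar]
    culprit = first_bad_step(steps, [1000, 10000], no_negative_totals)
    print("state first goes wrong in:", culprit.__name__)
```

The good and bad markers play the role of the beginning and ending debug statements: each pass halves the distance between them until only one step is left.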

Your debugging should confirm that the smallest chunk of code possible is what is causing the bug.

4. Understand the bug

With just a small portion of code that introduces the bug, you need to answer the question why. Why does the code as written introduce this bug?

  • Is it an unfulfilled requirement?
  • Is the code incorrect?
  • Is there an edge case the code isn’t supporting?
  • Is the code path bypassing an important step?

You know what the bug is, and the code that’s directly participating in that bug. So now comes the time to resolve why there is a difference. When you have a satisfactory answer to that question, you’re primed to understand what a bug fix looks like.

And, in a good quality code base, you should be able to write a unit test that replicates the bug in an appropriate context of that component.

5. Make the cleanest possible fix

Now that you understand why the code is incorrect, hopefully the steps needed to fix the bug are obvious. If you are lucky, a small change in the code currently under investigation will fix your bug.

More typically, the bug is more subtle than that, so it might take several tiny fixes in different places.

In the most difficult cases, the fix requires refactoring the existing code to separate tightly coupled parts, because you need to change something right in the middle.

Because you have isolated the chunk of code that exhibits the error, in a decently architected codebase the fix should be entirely contained within that component or its immediate vicinity.

6. Confirm the bug is fixed

A bug fix should then be confirmed by the unit test on the component or module that exhibits the bug. When the actual output matches the expected output, this particular bug has been squashed.

Now check whether that has fixed the bug that was initially reported.

If it has, test for side effects. Fixing one bug often uncovers another. If it does, start the debugging process again from the beginning: take a step back and understand the nature of the newly surfaced bug.

When the bug and any bugs it was hiding are fixed, run your unit tests and functional tests. Then check your fixes in and resolve the bug ticket.

Debugging tips

Debugging is the science of determining why code doesn’t do what it was supposed to do. A disciplined question and answer approach can help narrow down and unlock the source of the bug.

  • Approach bugfixing systematically
  • Identify and validate assumptions
  • Test every hypothesis
  • Verify that each piece of code is doing exactly what you expect
  • Complement each fix with a test that confirms the bug doesn’t recur
  • Treat debugging as a game of 20 questions: asking the right question will unlock the issue a little more
  • When you get stuck, step away from the keyboard. Give your brain time to think.