How to Choose Good Names in Code
Choosing a good name for a piece of code matters a great deal. If your code is going to be read at least once, if only by yourself, then names will play a major part in your ability to work with it.
Variable names, function names, class names, names in an interface: all are priceless ways to let a reader know what a piece of code is doing. During code review at work, I'm quite picky with my team members concerning good naming—sorry about that, lads!—but I believe this can make or break the quality of our code.
Even if there are other valuable means to know what a piece of code is doing, like documentation for instance, good names are an extremely efficient channel to convey information about your code for at least two reasons:
- Very good names instantly tell what is going on in your code, as opposed to looking up the documentation and finding your way around code by following it.
- Naming can be improved quickly. You can make a quick fix that updates some names in the code, manually or with a tool such as the popular clang-tidy, and if your code still builds, you're nearly certain it will pass the tests.
Because it’s so important, I’m going to provide you with guidelines on how to choose good names. I've taken some of these guidelines from Steve McConnell’s reference book, Code Complete (which is part of the must-reads John advises in his post “How to Get Started in Software Development”).
Some of the tips come from discussions, suggestions, and code reviews I’ve had with my peers at work. A couple of them I've worked out on my own by experimenting with code over the years.
We'll start by explaining how to avoid bad names and then focus on how to pick good ones.
Don't Use Anything Illegal
Let's get this out of the way: there are names that you are just not allowed to use in a given language.
Besides keywords of the language (like ‘int’), which will halt compilation if you try to use them as names, each language has its specific rules for legal names. For example, in C++, some combinations of underscores (_) in a name will compile while not being legal, because they are reserved for the compiler or standard library implementers. Using them may conflict with objects or routines declared by those implementers, leading to subtle bugs and unexpected behaviour.
Here are the names that are reserved for the compiler and standard library implementers in C++:
- Any name with two consecutive underscores in it (__)
- Any name starting with one underscore immediately followed by a capital letter (_IsNotOk is reserved, whereas isOk_too is fine, and _isOk is fine as long as it is not in the global namespace)
- Any name starting with one underscore in the global namespace
So don't consider using such names, as they could get you into trouble.
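As a small illustration of these rules, here is a sketch in which the reserved forms appear only in comments, next to a member name that is fine to use. The `payroll` namespace, the `Employee` class and the trailing-underscore convention are assumptions made for the example, not something the rules above impose.

```cpp
#include <string>
#include <utility>

// Reserved forms, shown only in comments because user code must not declare them:
//   int total__count;   // contains two consecutive underscores
//   int _Total;         // one underscore followed by a capital letter
//   int _total;         // leading underscore in the global namespace

namespace payroll {

class Employee {
public:
    explicit Employee(std::string name) : name_(std::move(name)) {}

    const std::string& name() const { return name_; }

private:
    std::string name_;  // a trailing underscore is not one of the reserved forms
};

}  // namespace payroll
```

A trailing underscore is only one possible convention for data members; the point is simply that it stays clear of the three reserved patterns.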
Don't Waste Information
When you think about it, your code knows perfectly well what it is doing. In fact, it is the one that knows best: it executes what’s in it as faithfully as it possibly can!
Giving good names is really about retaining as much of the information that the code holds as you can. Said differently, it is about not wasting information by obfuscating the code. It’s interesting to note that hiding information is often encouraged, via encapsulation. But in this context it is rather information disclosure that you want to aim for.
For this reason, limit the use of abbreviations. Abbreviations and acronyms are convenient to write but difficult to read. As the saying goes, code is written once but read many times.
However, you don't have to systematically spell out all acronyms to make code clearer, and some repeated unabbreviated code can even harm readability.
For instance, it seems reasonable to use “VAT” in your code instead of writing valueAddedTax every time you use it, because everyone knows what VAT is.
How do you choose whether or not to use an acronym in code? Rule of thumb: if the end user of your application would understand a particular abbreviation or acronym, then it is OK to use it in code because it shows that everyone in your domain area knows what it means.
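To make the rule of thumb concrete, here is a minimal sketch assuming an invoicing application whose users talk about VAT every day. The function and parameter names are hypothetical.

```cpp
// "VAT" stays abbreviated because the (assumed) end users of an invoicing
// application would recognise it at once. An abbreviation known only to the
// author, such as "prcInclTx", would not pass the same test.
double priceIncludingVAT(double priceExcludingVAT, double vatRate)
{
    return priceExcludingVAT * (1.0 + vatRate);
}
```

At the call site, `priceIncludingVAT(100.0, 0.25)` reads about as well as a fully spelled-out name would, without the extra length.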
Moreover, don't try to optimize for the minimum number of characters. On forums, you can see people who argue that their method is superior because it involves less typing. But which is more of a hassle: a couple of keystrokes, or a couple of minutes staring at code, trying to figure it out?
You certainly don’t want to waste time figuring out what a function or method name means, especially when you can make names as long as necessary. Research from the University of Southampton (Rees 1982) suggests that function and method names can reasonably go up to 35 characters. That sounds like a lot, but a longer name can help you understand what a piece of code is and what it does at a quick glance.
However, the length of a function name can also become bloated for bad reasons:
- If a function’s name is too long because the function is doing too many things, the fix is not in the naming, but at the function level itself by breaking it down into several logical parts.
- Function names get artificially bloated when they include superfluous information that is already expressed by their parameter types. For instance:
can be renamed:
This leads to more natural code at call site:
as opposed to:
- Negations are another example of undesirable information in a name, because they force a reader to make the intellectual effort of converting them back to their positive meaning in order to parse the code. The following example:
can be improved by using an affirmative name:
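The last two points can be sketched as follows. All the names here (`Employee`, `save`, `records`, `isValid`) are hypothetical and chosen only to illustrate the idea, and the in-memory vector stands in for whatever storage the real code would use.

```cpp
#include <string>
#include <vector>

struct Employee {
    std::string name;
};

// Bloated alternative (avoid): void saveEmployee(const Employee& employee, ...);
// The parameter type already says what is being saved, so "save" is enough.
void save(const Employee& employee, std::vector<Employee>& records)
{
    records.push_back(employee);
}

// Negated alternative (avoid): bool isNotValid(const std::string& employeeName);
// With an affirmative name, the caller writes !isValid(name) when needed.
bool isValid(const std::string& employeeName)
{
    return !employeeName.empty();
}
```

The call site then reads `save(manager, records);` rather than `saveEmployee(manager, records);`, and a check reads `if (!isValid(manager.name))` rather than `if (isNotValid(manager.name))`.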
Now that we’ve ruled out certain bad naming practices, let’s focus on how to pick good names.
Pick Names Consistent with Abstraction Levels
As described in a previous post, respecting levels of abstraction is at the root of many good practices. And one of these practices is good naming.
A good name is a name that is consistent with the level of abstraction of the surrounding code. As explained in the post on levels of abstraction, a good name expresses what code is doing, not how it is doing it.
To illustrate this, let’s take the example of a function computing the salaries of all the employees in a company. The function returns a collection of results associating keys (employees) to values (salaries).
A bad function name, focused on how the function is implemented, would be:
The problem with such a function name is that it expresses that the function computes its results in the form of a vector of pairs instead of focusing on what it does, that is, computing the salaries of the employees. A quick fix for this would be to replace the name with the following:
This relieves the call site from some implementation details, letting you—as a reader of the code—know what the code is intending to do.
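Here is a sketch of the salary example. The `Employee` fields, the `Salaries` alias and both function names are assumptions made for illustration; the implementation-focused name is kept as a comment so that the two can be compared.

```cpp
#include <string>
#include <utility>
#include <vector>

struct Employee {
    std::string name;
    double hourlyRate;
    double hoursWorked;
};

// Keys (employee names) associated with values (salaries).
using Salaries = std::vector<std::pair<std::string, double>>;

// Name focused on how the result is stored (avoid):
// Salaries computeSalariesPairVector(const std::vector<Employee>& employees);

// Name focused on what the function does:
Salaries computeEmployeeSalaries(const std::vector<Employee>& employees)
{
    Salaries salaries;
    salaries.reserve(employees.size());
    for (const Employee& employee : employees) {
        salaries.emplace_back(employee.name,
                              employee.hourlyRate * employee.hoursWorked);
    }
    return salaries;
}
```

If the container later changes, for instance to a map, the second name stays accurate while the first one would start to lie.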
Respecting levels of abstraction has interesting consequences for variables and object names. Often in code, variables and objects represent something more abstract than what their type implies.
For example, an int often represents more than just an int: it can represent the age of a person or the number of elements in a collection. Or a particular object of type Employee can represent the manager of a team. Or an std::vector<double> can represent the daily average temperatures observed in New York over the last month. (Of course, this doesn't hold in very low-level code, like adding two ints, or in places where you use strong types.)
In such cases, you want to name the variable after what it represents rather than its type. You’d name your int variable ‘age’, rather than ‘i’. You'd name the above Employee ‘manager’ and not just ‘employee’. You'd name the vector ‘temperatures’ rather than ‘doubles’.
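As a short sketch of this guideline, take the vector of temperatures mentioned above. The function name and the choice to return 0.0 for an empty collection are assumptions made for the example.

```cpp
#include <numeric>
#include <vector>

// The parameter is named after what it represents ("temperatures"),
// not after its type ("doubles" or "vec").
double averageTemperature(const std::vector<double>& temperatures)
{
    if (temperatures.empty()) {
        return 0.0;  // simplification for the sketch: no data gives 0.0
    }
    const double sum =
        std::accumulate(temperatures.begin(), temperatures.end(), 0.0);
    return sum / static_cast<double>(temperatures.size());
}
```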
This seems quite obvious, yet there are cases where we generally neglect to apply this guideline. Let’s illustrate this with iterators and C++ templated types.
Let's take a collection of cash flows paid or received from a financial product. Some of these cash flows are positive; some are negative. We want to retrieve the first cash flow that went towards us, so we’ll focus on the first positive one. Here is a first attempt at writing this code:
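A first attempt might look like the sketch below. It assumes a cash flow is modelled as a plain amount; `CashFlow`, `isPositive` and `firstGain` are hypothetical names, and returning 0.0 when no positive flow exists is a simplification.

```cpp
#include <algorithm>
#include <vector>

// Assumption: a cash flow is just a signed amount.
using CashFlow = double;

bool isPositive(CashFlow flow) { return flow > 0.0; }

// Returns the first positive cash flow, or 0.0 if there is none.
CashFlow firstGain(const std::vector<CashFlow>& flows)
{
    auto it = std::find_if(flows.begin(), flows.end(), isPositive);
    return it != flows.end() ? *it : 0.0;
}
```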
This code uses the name ‘it’, reflecting how this variable is implemented (with an iterator), rather than what the variable means. How do you compare this to the following code?
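Here is a sketch of the search with the iterator named after what it means. It assumes a cash flow is modelled as a plain amount; `CashFlow`, `isPositive` and `firstGain` are hypothetical names, and returning 0.0 when no positive flow exists is a simplification.

```cpp
#include <algorithm>
#include <vector>

// Assumption: a cash flow is just a signed amount.
using CashFlow = double;

bool isPositive(CashFlow flow) { return flow > 0.0; }

// Returns the first positive cash flow, or 0.0 if there is none.
CashFlow firstGain(const std::vector<CashFlow>& flows)
{
    // The variable says what it points to, not that it is an iterator.
    auto firstPositiveFlow = std::find_if(flows.begin(), flows.end(), isPositive);
    return firstPositiveFlow != flows.end() ? *firstPositiveFlow : 0.0;
}
```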
Which code saved you the most effort understanding it? Can you imagine the difference when you don’t have to read two lines of code but 10, or 50? Note that this ties in with our previous section about not wasting the precious information code knows about itself.
The same logic applies to template parameters. Especially when learning to use templates, where a lot of the examples we see come out of academic sources, we have a tendency to write the following line of code for all our template classes and functions:
…even though you may know more about T than the mere fact that it is a type.
Using T as a type name is fine in very generic code where you don’t know anything about the type, like in std::is_const:
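To see why T is a reasonable name there, here is a sketch of how such a trait can be written. It lives in a hypothetical `sketch` namespace and is not claimed to be the code of any particular standard library.

```cpp
#include <type_traits>

namespace sketch {

// Primary template: an arbitrary type T is not const.
template <typename T>
struct is_const : std::false_type {};

// Partial specialization: "const T" is const, whatever T is.
template <typename T>
struct is_const<const T> : std::true_type {};

}  // namespace sketch
```

Nothing is known about T here beyond the fact that it is a type, so no more descriptive name is available.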
But any information about what T represents should be worked into your code. Let’s take here the simple example of a function parsing a serialization input:
And by showing more explicitly what T represents:
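The two variants could look like the sketch below. `SerializedInput` is assumed here to be a whitespace-separated text stream and `parse` is a hypothetical name; the version using T is kept as a comment so that the declarations can be compared.

```cpp
#include <sstream>
#include <string>

// Assumption: the serialized input is a whitespace-separated text stream.
using SerializedInput = std::istringstream;

// Version with a template parameter name that says nothing:
// template <typename T>
// T parse(SerializedInput& input);

// Version where the name says what the type is for:
template <typename ParsedType>
ParsedType parse(SerializedInput& input)
{
    ParsedType result{};
    input >> result;  // relies on operator>> being defined for ParsedType
    return result;
}
```

A call such as `parse<int>(input)` looks the same with either declaration; the difference shows up for whoever reads or maintains the template itself.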
Compare the two pieces of code. Which one do you think is easier to work with? I would definitely say the second one, because it tells you that the type is the one to be parsed, whereas the first one only tells you that T is… a type.
You may think naming the type makes a big difference or you may think it doesn’t. But what is certain is that the second piece of code includes more documentation in it, and for free. The time it takes to replace a bad name with a good one in your own code, particularly a name used locally, is close to zero, and the performance cost incurred by a clearer name is exactly equal to zero.
This absence of cost is true for good naming in general: for once there is a free lunch out there, so let’s take advantage of it.