What Makes Readable Code: Not What You Think

You often hear about how important it is to write “readable code.”

Developers have pretty strong opinions about what makes code more readable. The more senior the developer, the stronger the opinion.

But have you ever stopped to think about what really makes code readable?

The standard answer

You would probably agree that the following things, regardless of programming language, contribute to the readability of code:

  • Good variable, method and class names
  • Variables, classes and methods that have a single purpose
  • Consistent indentation and formatting style
  • Reduction of the nesting level in code

There are many more standard answers and widely held beliefs about what makes code readable, and I don’t disagree with any of them.

(By the way, two excellent resources for this kind of information about “good code” are Robert Martin’s book Clean Code and Steve McConnell’s Code Complete, which all developers should read.)

Instead, I want to point you to a deeper insight about readability…

The vocabulary and experience of the reader

I can look at code and, within two seconds, tell you whether it is well written and highly readable. (At least in my opinion.)

At the same time, I can take a sample of my best, most readable code, give it to a novice programmer, and they won’t notice anything that sets it apart from the other code they are looking at.

Even though my code has nice descriptive variable names, short well named methods with few parameters that do one thing and one thing only, and is structured in a way that clearly groups the sections of functionality together, they don’t find it any easier to read than they do code that has had no thought put into its structure whatsoever.

In fact, the complaint I get most often is that my code has too many methods, which makes it hard to follow, and the variable names are too long, which is confusing.
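To make that complaint concrete, here’s a small, made-up Ruby sketch (the cart data, method names and shipping rule are hypothetical, invented only for this illustration). Both versions compute the same total. A beginner may find the first, linear version easier to follow because everything is in one place; an experienced reader can read decomposed_order_summary almost like a sentence and only look inside the helpers when they need the details.

[sourcecode language="ruby"]
# Linear version: one method does the whole job, top to bottom.
def order_summary(items)
  subtotal = 0
  items.each do |item|
    subtotal += item[:price_in_cents] * item[:quantity]
  end
  shipping = subtotal >= 5000 ? 0 : 500
  "Total: #{subtotal + shipping} cents"
end

# Decomposed version: small methods, each named for the one thing it does.
# (Hypothetical business rule: free shipping at 5000 cents, otherwise a flat fee.)
FREE_SHIPPING_THRESHOLD_IN_CENTS = 5000
FLAT_SHIPPING_FEE_IN_CENTS = 500

def subtotal_in_cents(items)
  items.sum { |item| item[:price_in_cents] * item[:quantity] }
end

def shipping_in_cents(subtotal)
  subtotal >= FREE_SHIPPING_THRESHOLD_IN_CENTS ? 0 : FLAT_SHIPPING_FEE_IN_CENTS
end

def decomposed_order_summary(items)
  subtotal = subtotal_in_cents(items)
  "Total: #{subtotal + shipping_in_cents(subtotal)} cents"
end

cart = [{ price_in_cents: 1200, quantity: 2 }, { price_in_cents: 300, quantity: 1 }]
puts order_summary(cart)             # prints Total: 3200 cents
puts decomposed_order_summary(cart)  # prints Total: 3200 cents
[/sourcecode]

Neither version is “the” right one; which reads better depends on who is reading it.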

There is a fundamental difference in the way an experienced coder reads code versus how a beginner does

An experienced developer reading code doesn’t pay attention to the vocabulary of the programming language itself. An experienced developer is more focused on the concept the code expresses: what the code is for, not how it does it.

A beginner or less experienced developer reads code very differently.

When a less experienced developer reads code, they are trying to understand the structure of the code itself. A beginner is more focused on the vocabulary of the language than on what the code written in that language is trying to convey.

To them, a long variable name isn’t descriptive, it’s deceptive: by giving NumberOfCoins a long name that personifies it as something more than an integer, you hide the fact that it is just an integer value. They’d rather see the variable named X or Number, because it’s confusing enough to remember what an integer is.

An experienced developer doesn’t care much about integers versus strings and other variable types. An experienced developer wants to know what the variable represents in the logical context of the method or system, not what type the variable is or how it works.
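Here’s a tiny, hypothetical Ruby illustration of the two preferences (both method names are invented for this example; number_of_coins is just the Ruby spelling of NumberOfCoins). The two methods do exactly the same arithmetic on two integers, but only the second tells you what those integers represent.

[sourcecode language="ruby"]
# Named the way a beginner might prefer: the names only say "these are numbers".
def calc(x, n)
  x * n
end

# Named the way an experienced reader might prefer: the names say what the
# numbers mean in the problem, and the types are left implicit.
def total_value_in_cents(number_of_coins, coin_value_in_cents)
  number_of_coins * coin_value_in_cents
end

puts calc(4, 25)                  # prints 100
puts total_value_in_cents(4, 25)  # prints 100
[/sourcecode]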

Example: learning to read

Think about what it is like to learn to read.

When kids are learning to read, they start off by learning the phonetic sounds of letters.

When young kids are reading books for the first time, they start out by sounding out each word.  When they are reading, they are not focusing on the grammar or the thought being conveyed by the writing, so much as they are focusing on the very structure of the words themselves.

Imagine if this blog post were written in the style of an early reader.

Imagine if I constrained my vocabulary and sentence structure to that of a “See Spot Run” book.

Would you find my blog to be highly “readable?”  Probably not, but kindergarteners would probably find it much more digestible.  (Although they would most likely still snub the content.)

You’d see the same thing with musicians: experienced musicians can read sheet music easily, while beginners would probably much prefer tablature.

An experienced musician would find sheet music much easier to read and understand than a musical description that said what keys on a piano to press or what strings on a guitar to pluck.

Readability constraints

Just as you are limited in how elegantly you can express thoughts and ideas with the vocabulary and sentence structure of an early reader book, you are limited in the same way by the programming language you write in and by the context in which you write it.

This is easier to see with an example, though. Let’s look at some assembly language.

[sourcecode language="cpp" padlinenumbers="true"]
.model small
.stack 100h

.data
msg db 'Hello world!$'   ; DOS string, terminated by '$'

.code
start:
    mov ax, @data        ; DS must point at the data segment before using msg
    mov ds, ax
    mov ah, 09h          ; Display the message
    lea dx, msg
    int 21h
    mov ax, 4C00h        ; Terminate the executable
    int 21h

end start
[/sourcecode]

This assembly code prints “Hello world!” to the screen in DOS.

With x86 assembly language, the vocabulary and grammar are quite limited. It isn’t easy to express complex ideas in the language and keep the code readable.

There is an upper limit on the readability of x86 assembly language, no matter how good a programmer you are.

Now let’s look at Hello World in C#.

[sourcecode language="csharp"]
public class Hello1
{
    public static void Main()
    {
        System.Console.WriteLine("Hello, World!");
    }
}
[/sourcecode]

It’s not a like-for-like comparison, because this version uses the .NET Framework in addition to the C# language, but for the purposes of this post we’ll consider C# to include the base class libraries as well.

The point, though, is that C#’s much larger vocabulary and more complicated grammar give you the ability to express more complex ideas in a more succinct and readable way.

Want to know why Ruby got so popular for a while?  Here is Hello World in Ruby.

[sourcecode language="ruby"]
puts "Hello, world"
[/sourcecode]

That’s it. Pretty small.

I’m not a huge fan of Ruby myself, but if you understand the large vocabulary and grammar structure of the Ruby language, you’ll find that you can express things very clearly in the language.

Now, I realize I am not comparing apples to apples here and that Hello World is hardly a good representation of a programming language’s vocabulary or grammar.

My point is, the larger the vocabulary you have, the more succinctly ideas can be expressed, thus making them more readable, BUT only to those who have a mastery of that vocabulary and grammar.
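The same trade-off shows up inside a single language. This hypothetical Ruby sketch (the numbers are made up) sums the even numbers in a list twice: once with only assignment, comparison, indexing and a while loop, and once with select, even? and sum from Ruby’s larger vocabulary (sum needs Ruby 2.4 or later). The second version is shorter and states its intent directly, but only to a reader who already knows those three methods.

[sourcecode language="ruby"]
# Small vocabulary: only assignment, comparison, indexing, and a while loop.
def sum_of_evens_basic(numbers)
  total = 0
  i = 0
  while i < numbers.length
    total += numbers[i] if numbers[i] % 2 == 0
    i += 1
  end
  total
end

# Larger vocabulary: select, even? and sum name the intent directly.
def sum_of_evens_idiomatic(numbers)
  numbers.select(&:even?).sum
end

numbers = [3, 8, 15, 4, 42]
puts sum_of_evens_basic(numbers)      # prints 54
puts sum_of_evens_idiomatic(numbers)  # prints 54
[/sourcecode]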

What can we draw from all this?

So, you might be thinking “oh ok, that’s interesting… I’m not sure if I totally agree with you, but I kind of get what you’re saying, so what’s the point?”

Fair question.

There is quite a bit we can draw from understanding how vocabulary and experience affect readability.

First of all, we can target our code for our audience.

We have to think about who is going to be reading our code and what their vocabulary and experience level is.

In C#, developers often argue about whether or not the conditional operator should be used.

Should we write code like this:

[sourcecode language="csharp"]
var nextAction = dogIsHungry ? Actions.Feed : Actions.Walk;
[/sourcecode]

Or should we write code like this:

[sourcecode language="csharp"]
var nextAction = Actions.None;
if (dogIsHungry)
{
    nextAction = Actions.Feed;
}
else
{
    nextAction = Actions.Walk;
}
[/sourcecode]

I used to be in the camp that said the second way was better, but now I find myself writing the first way more often.  And if someone asks me which is better, my answer will be “it depends.”

It depends because, if your audience isn’t used to the conditional operator, they’ll probably find code that uses it confusing. (They’ll have to parse the vocabulary rather than focus on the story.) But if your audience is familiar with the conditional operator, the long version with an if statement will seem drawn out and like a complete waste of space.

The other lesson to take from this observation is the value of a programming language having a large vocabulary, and of having a solid understanding of that vocabulary and grammar.

The English language is a large language with a very large vocabulary and a ridiculous number of grammatical rules.  Some people say that it should be easier and have a reduced vocabulary and grammar.

If we made the English language smaller and reduced its complex rules of grammar to a much simpler structure, we’d make it much easier to learn, but we’d make it harder to convey information.

What we’d gain in reduction of time to mastery, we’d lose in its power of expressiveness.

One language to rule them all?

It’s hard to think of programming languages in the same way, because we typically don’t want to invest in a single programming language and framework with the same fervor as we do in a spoken and written language. But, as repugnant as it may be, the larger we make programming languages and the more complex we make their grammars, the more expressive they become and, ultimately, for those who master the vocabulary and grammar, the more readable they become. (At least, the potential for higher readability is greater.)

Don’t worry though, I’m not advocating the creation of a huge complex programming language that we should all learn… at least not yet.

This type of thing has to evolve with the general knowledge of the population.

What we really need to focus on now is programming languages with small vocabularies that can be easily understood and learned, even though they might not be as expressive as more complicated languages.

Eventually, when a larger share of the population understands how to code and understands programming concepts, I do believe there will be a need for a language that is as expressive, to computers and humans alike, as English and the world’s other written languages are.

What do you think?  Should we have more complicated programming languages that take longer to learn and master in order to get the benefit of an increased power of expression, or is it better to keep the language simple and have more complicated and longer code?

  • http://gravatar.com/jasperavisser Jasper a. Visser

    Small detail: it’s a ternary operator, not a tertiary operator. (Tertiary means ‘third’, ternary means ‘composed of three components’.)

    • http://simpleprogrammer.com jsonmez

      Ah, you are right. Thanks. I always get that confused. Thanks for pointing that out. Fixed now.

  • informatimago

    “An experienced developer wants to know what the variable represents in the logical context of the method or system, not what type the variable is or how it works.”

    I read this as an argument for programming languages that don’t force the programmer to annotate the types. (lisp, haskell, and their derivatives).

  • http://csharpindepth.com jonskeet

    It’s *a* ternary operator (and the only one), but its name is the *conditional* operator. It’s a ternary operator because it has three operands, but that doesn’t say anything about what it does.

    More importantly, your usage of it is invalid (at least in C# and Java) because the conditional operator can’t be used as a stand-alone statement, only as an expression within a statement.

    I’d also argue that experienced developers often care very much about the types of variables… and the information that type implies can be used to make the variable name simpler without losing any context.

    • http://simpleprogrammer.com jsonmez

      Thanks Jon! You are right of course.
      I have updated the wording and examples.
      That is what I get for writing code directly into the browser. I wrote the example how I wanted the conditional operator to work. :)

      I agree that experienced developers do care about the types of variables.

      I only mean to place an emphasis on that when they are reading code, they are not focusing on the actual technical details of the language as much as a beginner would be, but their focus is more on the actual concept the code is trying to convey.

      In many cases, you don’t even need to know the variable type. Hence the frequent use of var in C#, which doesn’t seem to affect readability for many experienced developers but seems to hamper readability to some degree for beginners.

      For example, if I write some code like this:

      var coins = sorter.sortQuarters();

      You do not need to know what actual type the variable coins represents. (At least at first glance, to understand the concept the code conveys.)

      Not sure how well I conveyed that point in the original post, so thanks for pointing that out as well.

  • Johan Samyn

    Interesting point of view, John. How do you fit rather small but nonetheless expressive languages, like the new Go, into this reasoning?

    • http://complextosimple.wordpress.com jsonmez

      Go is awesome; it has very high expressiveness with a short syntax. But even though the vocabulary is smaller, the expressiveness comes from some added complexity.

  • informatimago

    Programming languages are a little different from natural languages in that you are more or less constantly redefining the language. This is even more so in higher level programming languages like lisp or haskell, where you can easily use metalinguistic abstraction in addition to data abstraction, functional abstraction and syntactic abstraction (cf. SICP).

    Essentially, we are writing programs as if we started our books with a chapter or two defining the grammar and vocabulary of a new Esperanto or Volapuk, in which the rest of the book is written. For each book!

    And this is a good thing, since it lets us write very nice books, in which the language used is the one best adapted to the ideas expressed in each book.

  • jernfrost

    Great observation. I have observed something similar when writing abstract code. Often code is more reusable and flexible when it is written in an abstract form than when it deals with very concrete instances. The problem is that concrete code is easier to understand than abstract code. Abstract code requires more understanding of software patterns.

    Difficult choice. Lots of boilerplate concrete code, or short abstract code that people have problems understanding.

  • Adam Davis

    Nice article. I do have one quibble, however. I don’t agree with one of your first statements:

    >> Developers have pretty strong opinions about what makes code more readable. The more senior the developer, the stronger the opinion.

    It’s actually the second part that I don’t agree with. As a developer, I definitely have strong opinions regarding readability. But as I acquired more experience (or as I got older) I became more open to different formatting and naming standards, especially language-specific ones. There seems to be a tendency among new developers to latch on to a specific style or language (which they’re successful with) and to think that everything should look like that. I did that when I was less experienced, and I’ve noticed younger developers doing the same thing.

    Now I’m more likely to tailor my code to fit the naming conventions of the language it’s written in. I’m also less likely to whine about reformatting my code to fit into company standards or surrounding code.

    • jsonmez

      I agree. As long as formatting and conventions are standardized, what they are isn’t as important to the readability of the code as things like naming and actual structure are.

  • http://twitter.com/tamasrev Tamas Rev

    It’s interesting to say that lower level languages can express more. I usually hear that higher level, “scripting” languages let us express more.

    I’d resolve it like this: lower level languages can express more in terms of the computer. Scripting languages, on the other hand, express features in a more compact form. As a side effect, they speed up development and slow down the resulting software. (Unless the scripting language is a functional language and the software is designed for scalability.)

    • jsonmez

      That is a good way of looking at it, I agree.

  • tz1

    There is also a difference between reading and understanding, and I’m not sure with the small snippets either is well illustrated.

    The structure, the overall flow in the text of a program should say something about what is going on.

    You have too many methods if you can’t state a reason for splitting some isolated and linear process into them where they are called from exactly one place. If there are duplications there are too few.

    Variables should be named, not just have names. “CharlesPrinceOfWales” is longer than “Joe”, so which do you think is more significant? If an ephemeral loop index is as long as the master state variable, it will be confusing.

  • http://ionrock.org Eric Larson

    One way to bridge the gap between learning and comprehension is to write comments. More specifically, provide commentary on what the code does and why it works the way it does. Some prose describing why a specific technique or tool was used can go a long way toward giving insight to the next developer who looks at the code.

    Working code has an extreme level of confidence that can be a detriment when trying to fix bugs. The code is emotionless and has no means of letting the reader know it is actually a hack or compromise. Comments are a great way (in any language) to present insights into the code that would be impossible to glean otherwise.

  • thesunnyk

    Nice observation. I do want to take issue with your conditional operator example, however, and I suspect it’ll yield something interesting:

    The two statements are actually saying different things. The first statement is saying:

    “I’m assigning my next action to feed if the dog is hungry, or walk otherwise”

    The second statement is saying:

    “I’m assigning my next action to None.

    if the dog is hungry, I’m changing my next action to feeding the dog

    Otherwise, I’m changing my next action to walking the dog”.

    You can see that, despite taking less syntax, the second statement is actually more cumbersome, and has problems. The nextAction is rebound, for example, which is a bad thing(tm). The “if” statement is also completely generic, which means you could stick anything in there, where really your “statement” is about what you’re assigning to the nextAction variable.

    The “problem” with the first statement is that it’s a very specific construct, and only useful sometimes. In something like Rust or Scala you’d write something more like:

    val nextAction = if (dogIsHungry) Action.Feed else Action.Walk

    You can see here that it uses the same syntax as elsewhere in the language, preserves the bound nature of nextAction, and the intent that you’re simply assigning nextAction here. The language itself plays a huge part in the readability!

    • jsonmez

      Yes, you are definitely correct. My example is not great for the conditional operator either.
      Good point about the language playing a huge part in the readability.
      I have noticed this is the case also with Go.

  • Sum Dum Guy

    Hm… Sounds like the principles found in “Code Complete” still apply, even 20 years later.

    • jsonmez

      Yep, it is certainly timeless.

  • disqus_LE5mUPwetP

    The simplicity of a language doesn’t have to limit its expressiveness. It can also be the other way around.

    http://commandcenter.blogspot.jp/2012/06/less-is-exponentially-more.html

  • Spudley

    I think the article misses a vital point.

    A large part of what we do to write good quality code is *not* aimed simply at making it easier to read.

    Well structured code is not immediately easy to read — the novice devs who say your code has too many methods are correct that this isn’t easy to read; it would be a lot easier to read for those novices if you gave them big chunky methods that do a bunch of things in a nice easy linear process. But that wouldn’t be good code.

    Short methods are good, but we don’t do that to make it easy to read. We do that to make it easier to unit test and easier to practice code re-use. Some of the best practices that are aimed at helping us with these things are explicitly harder to read if you’re not used to them.

    Think about it. Stuff like dependency injection makes things a heck of a lot less readable than simply saying “myObj = new myClass()” at the appropriate place in your code. But that doesn’t mean it’s a bad thing.

    That’s one of the main stumbling blocks for newbies when they first get introduced to good coding concepts like unit testing. Some of the things that go with it feel clunky and hard to follow if you don’t understand the underlying reasons for doing them.

    • jsonmez

      I would add to this that many experienced developers don’t understand what the actual value of unit testing or IoC is and as a result follow and prescribe it blindly.

  • RPH

    “An experienced developer wants to know what the variable represents in the logical context of the method or system, not what type the variable is or how it works.”

    I’m a senior developer and I actually do care what the types of variables are. Why? Because I spend the bulk of my time maintaining code, not just reading it. When I’m maintaining code, I need to easily guess the type of a variable so I know what happens to it in different contexts and what methods are available for it. If the use of a variable is isolated to next 5-10 lines, then yeah you can call it x,y or whatever you want because I can read that code in isolation of the surrounding code. Otherwise, make it descriptive, please.

    Also I don’t agree that large grammars are worth their expressiveness. First, I think languages with small grammars, like C, can still be sufficiently expressive. Sometimes a large grammar doesn’t really give you the ability to express more kinds of things; it gives you the ability to express the same thing in multiple ways. I like to contrast Perl and Python in this respect. I can’t stand reading others’ Perl code because it usually uses a style I’m not accustomed to. Python is just as expressive, but there’s usually an obvious way to express an algorithm, which makes it universal. Languages with large grammars, like C++, are extremely hard to read, and this has contributed to developers all adopting their own styles, much like natural languages have different dialects and lingoes. How do programming teams solve this problem? They adopt coding guidelines that restrict developers to a subset of the language’s grammar.

    In summary, maintainability trumps readability in terms of priorities. A novice won’t be a novice for long, and once the overall structure is learned, he or she will spend most of the time maintaining that code. Large grammars hurt readability, and definitely hurt the universal readability of code, and the extra expressiveness they provide isn’t worth it in my opinion.

    • jsonmez

      We are basically saying the same thing.
      Large grammars have the potential to be more expressive, but only if used correctly and if the usage is somewhat universal. (Perl, IMO, is good example of the opposite of this.)

      Consider also the “grammar” of modern JavaScript. Extremely large, extremely divided.

      And yes, of course you need to know the type of the variable. I agree with you there as well. I am just making the point that as a more experienced developer, the type is less important than the actual meaning of what that variable represents in the code, while a beginner would be more focused on the type and the technical details of the language.

  • http://www.daedtech.com/blog Erik Dietrich

    I had never really thought about readability varying with levels of experience, but I probably should have given that my style is similar to the one you describe (small, factored methods) and I too have heard the “too many methods” and “I’d rather it all be in one place I can step through in the debugger” complaints.

    The tablature vs. sight-reading comparison really hit home for me. Tablature is much more approachable for a novice, but someone used to sight-reading will ask “what’s the rhythm of this, how long do I hold these notes and how loudly should I play them?”

    I think I’m going to have to ponder the idea of coding with an audience in mind, and whether my response to newer developers would be “you should learn to be idiomatic in this language” or “I should help you along with some tablature-code.”

  • Jason Doege

    In general I agree with your premise. However, your example using the conditional operator suggests that the only difference between the two is readability. In many cases, the ternary operator will compute a temporary and then store the value of the temporary in (using your example) nextAction, rather than branch to one of two assignments to nextAction.

    Similar arguments can be made about the readability of unary increment and decrement operators versus “a=a+1”, but, again, they may produce different code due to the differing semantics of the expressions.

    This point probably should be mentioned. In fact this could easily be turned into a “refactoring for readability” article.

    • jsonmez

      I agree, the example was not the best. Good points. thanks for the comment.
