Hunting Down The Mythical High Quality Code
Writing high quality code is something that takes years to understand and start doing.
Like every complex thing, the definition of “high quality code” is blurred and filled with vague, non strictly defined words like: “clean”, “stable” and “maintainable”.
For some time I’ve been working on a system which aims to define these words and set relatively strict rules on how they should be implemented and checked. Should this (work in progress) system be applied the code, quality would steadily rise over time.
Before we get to it we must build a strong foundation on how code quality is developed and the usual issues people face.
In a nutshell
“Code reviews are a systematic examination (sometimes referred to as peer review) of computer source code. It is intended to find mistakes overlooked in the initial development phase, improving the overall quality of software. Reviews are done in various forms such as pair programming, informal walkthroughs, and formal inspections.” As always, thank you Wikipedia.
There is definitely more than one opinion regarding the format of code reviews. John suggests implementing pair programming since that way ego doesn’t get in the way.
Other people like Yegor say that they should be independent.
The one thing people agree on is that there should be a form of reviews. So how are these reviews done? There are two sides: the reviewee and the reviewer. The reviewer must review the code and give feedback to the reviewee. The reviewee should apply the feedback to the code and the process should be repeated until the code passes the review.
There are many different tools for code review, but all of them boil down to different ways you can highlight and comment what should be changed. Even if you don’t have a tool you can always leave comments in the source.
Point is, as long as there is some form of review the tools and format aren’t going to make a world changing difference.
Let’s do a small exercise: Think for 1-2 minutes if the reviewee and the reviewer can be the same person and if they should be the same person.
Do you have an answer ? Good! Let’s compare notes.
Enter self code reviews
My answer is “yes, you can,” and you can benefit a lot from it, but it’s a tricky thing to do.
There is a set of challenges we have to solve before there is any point in doing code reviews of your own code.
The main argument against them is that when you’ve written your code you can’t see it’s flaws. If this were true people wouldn’t be able to debug. Still, it does touch on a very sensitive subject: Discipline, or at least lack of discipline. Being critical of your own work is hard. You must be honest with yourself and not ignore certain weak points which will be a hassle to fix. You must take responsibility for the issues you haven’t fixed or introduced and work towards fixing them. If you don’t do that you will just end up wasting your time.
Before starting the self review you must clear your head
Take a 5 minute break and stare at the sky, drink a refreshing glass of water or make yourself a coffee. Anything which will take your mind off the code you’ve just written.
Break up the code into small chunks
You should review the smallest coherent parts of the code. In most cases this means that you should review the code function by function. If you try to review too much code at a time you will surely skip a lot of errors. If you review too few lines you won’t have the context you need to validate certain claims regarding code correctness and maintainability.
You must find errors
No code is perfect and you should be able to find things which aren’t optimal in your code. Unless the code is trivial there is always something to improve. If you can’t find anything clear your head and try again until you find something.
Practice practice practice
It doesn’t really get easier in time since when you comprehend a mistake you are making you usually stop doing it altogether. It does, however, start taking less time so don’t give up until you’ve tried out your review list for at least a month.
Automate with code checkers
Reviewing 15+ different things even on 20 lines of code is rough. You are bound to skip a thing or two each time you review. It will also take a lot of time to review larger chunks of code. Luckily you don’t have to do everything manually. Every language has a variety of different tools which perform different checks. These tools are called static code analyzers. Most of them are configurable and you will be able to find tools for most of the things that you’d want to check. That way you will check manually only the few things which can’t be caught with a tool.
Note that the tools are language specific what works for one language and platform won’t work for others. If you want to have a look at a list with static analyzers check out these on Github.
High Quality Code
Consists of 3 parts:
- Code correctness
Although production code should pass all 3 in the real world we sometimes have to make compromises (usually due to lack of time). If that’s the case, the first 2 points (Code correctness and Stability) must pass. If they don’t the code shouldn’t be in production since it will most likely cause more damage than a feature delay. (It obviously depends on the feature, but it’s worth discussing the possibility to delay the feature with the business and deploy guys )
You must know the acceptance criteria, what the business value and the end result of the data should be in order to check this. Most common mistake is to look for correctness without being sure what is “correct” in the context of this particular code. Take a function that registers an account. Let’s assume that the function must create a new entry in the database, generate an activation link and send it via email.
If any of these 3 steps is missing or not working correctly then the code isn’t correct. This is something specific to the function and project, there isn’t a way to know what the code is supposed to do without seeing the requirements. This is by far the most important part of the code review and it is overlooked way too often. In order to be able to validate it you must understand what the code should do.
Stability in a project consists of how different exceptional and error situations are handled. Will the program instance just die ? Will it corrupt the data beyond repair ? Is it trivial to hack the software ? Or will the software display a meaningful message to the user, log it’s errors and recover ?
- Error checking – All the input parameters of a function, all the parameters gotten from calling functions, services and so on inside the function and the return values must be checked. You must check the business logic as well. If it’s expected that an account registration will generate an activation link you should perform some basic checks that the link is valid, that it really will activate the account and so on.
- Error reporting – Before we handle a reported error we must leave a trace in the projects log. What we should do is give all the context we can: time, function name, PID (process id), module input parameters, call stack and explain what exactly went wrong. When an error has happened the more details you log the better since it will make it easier for the developer to figure out what went wrong and reproduce the issue if need be. These messages should be comprehensive enough so that the developers can debug the code using them alone. That being said, try to provide useful additional information. It’s way too easy to add a lot of extra information which is just noise. Noise isn’t just there, it’s actually raising the complexity of understanding the error since you first must weed out the useless information and only then you can start figuring out why things went wrong.
- Error handling – So we have found an error and reported it. What’s next? We must either recover from it and continue the execution (if that’s the case it must be logged) or kill off the entire execution path. If the latter is the case there several things which must be considered: Firstly, the error must be returned in a proper way in the context of the project. If you are using exceptions returning an error as a return value won’t do the job. Secondly, the information you are returning must be safe for the upper levels of the code. If you are out of disk space and the errors from the module in which the error has occurred are shown directly to the user you should return an error appropriate for the user, for example: “Temporary error. Please try again later” since it’s pretty embarrassing to admit to your end-users that you can’t maintain your servers and are out of storage. The most important thing when handling errors is that every time you mask or change an error you must log the change. If you can’t trace the changes in the error you raise to complexity of solving it.
- Traceability / Logging – In well written functions there is exactly one place where you can put a trace (debug) message and it will give a clear picture of the state of the function and the execution path. These types of messages are different from the ones that occur when an error is reported since they are written on every function call not when an error has occurred. This means that they can generate a lot of spam in the logs, so there should be a way to disable them. For instance while development is done it makes sense to have them turned on, but when you want to push it to production the usual case is to turn them off. They shouldn’t be used for debugging errors on production. Once you get to production the messages form the “Error reporting” part should be comprehensive enough for you to debug it. A tip here is to have a different log for the errors and for the debug messages since that way the errors won’t be flooded with noise.
- Naming consistency – Even though this is more of a maintainability issue, having consistent variable and function names is crucial. In this part of the review I am looking at very basic things, for example if the convention is CamelCase and someone has used under_score then this is an issue and must be fixed.
- Do loops have correct termination conditions? – The reasoning behind this check is simple: an infinite loop will hang up the process and eat a lot of CPU time without doing anything valuable, possibly damaging valuable data. Although I don’t see this mistake often (in fact I can’t recall the last time I’ve seen it in a serious project) it’s better to be on the safe side and always check for this.
Is strict mode enabled ? – If your language offers a strict mode you should use it. It may be a hassle at first but the sole purpose of strict modes are to exclude some corner cases and errors.
The higher the maintainability, the easier it is for developers to add business value, the less time they need to spend refactoring and the easier it is for new people to start working on the project.
- File / Module / Class structure – Is the code you’ve added in the correct file ? Could have you reused something or extended something old?
- Variable and function naming – They should follow the style of the source code. If you are using under_score it must be under_score. If it’s CamelCase, mixedCase, or whatever just follow the already established style. Also they should be clear – anyone who reads the function’s name and knows the class / package it comes from must know without a doubt what it will do. Same logic applies for the variable naming. I know this isn’t a clear explanation but cache invalidation and naming things are supposedly the hardest things in computer science.
- Should the private functions be private – To decide this I ask myself the following questions: Could this function be used in another module / class ? Should it be used in another module or class? If the answer is yes, then it should be public. If it’s no – private.
- Are there comments – There are an absurd amount of views on comments. Some people think that they are the root of all evil and that your code should be self commenting. Others commit to adding 20-30 lines in front if a function for its formal definition. A third group of people uses comments sparsely to break down some complex parts of the code and so on and so on. Point is if your codebase has a common practice you have to follow it. That’s that.
- Are complicated expressions split into parts – When you have different expressions it is a good idea to split them on several parts.
- Is it clever – Code which can’t usually be done in most languages or uses a hacky approach probably shouldn’t exist. If for some reason you really want to keep it sure, go ahead, just be sure to leave enough comments that explain it.
- Unit testing coverage – If you are doing unit tests, the code you are writing must be covered. If you aren’t using unit tests – skip this step.
- Are all included classes / modules / variables used – If they aren’t they must be removed. Dead code rises the complexity exponentially with time.
- Is everything initialized properly – Although this is language dependant and usually there is a default value having uninitialized variables can lead to some weird warnings and error messages and most importantly it can lead to errors
Building Your Review System
Create a checklist
You must define quality code for yourself and create a checklist – it may look like the one above, it may look nothing like it. The checklist should be in a place where you can easily edit it (eg Google Docs). The format is up to you, the rule of thumb is that the easier it is for you to read it the better.
Use the checklist
The second part of the system is to review your code on your own and use the checklist. Although it is exactly what it seems there are several things which it took me a while to start doing:
- Review often with small chunks of code, I usually review 1-3 functions at a time
- Take a short break after you are done and before reviewing the code
- Always try to fix the issues of the code after you are done reviewing. There is a high possibility you will make mistakes since you are essentially doing 2 things at the same time: reviewing code and refactoring
- Automate with code checkers
Adapt and personalise the checklist
The list you’ve created is not written in stone. It should change. In fact if it doesn’t change over time something’s wrong. I suggest doing a revision every 2-3 thousand lines of reviewed code. (The gap is so big – 33% because some languages generate more boilerplate than others.)
The Spoils of Your High Quality Code Hunt
You can raise your code’s quality by doing self code reviews. In order to get the most out of them, you should define your code’s standards and challenge yourself to write code that meets them.
Raising your code’s quality will free up time for you to deliver more business value which can easily be turned into a good raise in your income, regardless of whether you are working for a company or for yourself.