We Can’t Measure Anything in Software Development

Baccarat is an interesting card game that you’ll find at many casinos. The objective of the game is to correctly predict whether the banker or the player will win a hand.

In Baccarat, scoring a hand is very simple: add up all the cards at face value, with aces worth 1 and face cards worth 10, and count only the ones digit of the total.

6 + 7 + J = 23, which scores as 3

A + 4 = 5

The highest possible hand is 9, and whoever has the higher hand wins. If the player and the banker have the same total, it is a tie.
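To make the scoring rule concrete, here is a minimal Python sketch of it. It models only the hand totals and the comparison described above. The drawing rules that decide whether a third card is dealt are left out here too, and the names `hand_value` and `outcome` are my own, chosen for illustration.

```python
# Aces count 1, number cards count their pip value, and 10/J/Q/K count 10
# (which contributes nothing once only the ones digit is kept).
CARD_VALUES = {"A": 1, **{str(n): n for n in range(2, 11)}, "J": 10, "Q": 10, "K": 10}


def hand_value(cards: list[str]) -> int:
    """Sum the cards and keep only the ones digit of the total."""
    return sum(CARD_VALUES[card] for card in cards) % 10


def outcome(player: list[str], banker: list[str]) -> str:
    """Return 'player', 'banker', or 'tie' depending on which hand is higher."""
    p, b = hand_value(player), hand_value(banker)
    if p > b:
        return "player"
    if b > p:
        return "banker"
    return "tie"


print(hand_value(["6", "7", "J"]))           # 3  (23, ones digit only)
print(hand_value(["A", "4"]))                # 5
print(outcome(["6", "7", "J"], ["A", "4"]))  # banker
```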

I won’t go into the details of how the number of cards drawn is determined, but if you are interested, you can find that information on Wikipedia. Basically, it ends up being pretty close to a 50/50 chance whether the player or the banker wins a hand. (Of course, the house edge is still about 1.06% in the best case.)

The interesting thing about Baccarat, though, is that despite the odds, despite common sense, and despite the understanding that the game is completely random, people will still sit there recording every single hand and score, trying to use them to find patterns that predict future results.

These poor deluded souls actually think they are measuring something on these scorecards, as if what happened in the last hand will in any way affect what will happen in the next hand.

After many years of trying to find the secret formula for measuring software development activities, I’ve come to the conclusion that trying to measure just about any aspect of software development is like trying to predict future Baccarat hands from previous ones.

Why we want to measure software development

It’s understandable why we want to measure software development: we want to improve. We want to find out what is wrong and fix it, and we want to know when things go wrong.

After all, who hasn’t heard the famous quote:

“What gets measured gets improved.”

Don’t we all want to improve?

Somehow we get stuck with the awful feeling that the reverse must also be true: that what doesn’t get measured doesn’t get improved.

And of course we feel guilty about it, because we are not doing a good job of measuring our software development practices.

Just like the avid Baccarat gambler, we want to believe there is some quantifiable thing we can track that will give us an edge.

Sometimes the reason for wanting to measure is more, shall we say, practical: we want to evaluate the individuals on our team to see who is the best and who is the worst.

If we could figure out how to measure different aspects of software development, a whole world of opportunities would open up for us:

We can accurately give customers estimates
We can choose the best programming language and technology
We can figure out exactly what kind of person to hire
We can determine what kind of coffee produces the best code

How we try

I’ve been asked by many managers to come up with good metrics to evaluate a software development team.

I’ve tried just about everything you can think of:

Lines of code written
Bugs per developer
Bugs per line of code
Defect turnaround time
Average velocity
Unit test code coverage percentage
Static analysis warnings introduced
Build break frequency

I’ve built systems and devised all kinds of clever ways to measure all of these things.

I’ve spent countless hours breaking down backlogs to the smallest level of detail so that I could accurately estimate how long it would take to develop.

You’ve probably tried to measure certain aspects of software development yourself, or at least tried to figure out what the best thing to measure is.

It’s just too hard

No matter what I measure or how I try to measure it, I find that the actual data is just about as meaningless as a notebook full of Baccarat hands.

One of the biggest problems with measuring something is that as soon as you start measuring it, it does start improving, at least on paper.

What I mean by this is that if I tell you I am going to start looking at some metric, you are going to try to improve that metric. You won’t necessarily improve your overall productivity or quality, but you’ll probably find some way, intentional or not, to “game the system.”

Some managers try to get around this issue by simply not telling the team what they are being measured on. In my opinion, this is not a good idea. Holding someone accountable to a largely arbitrary standard, without even telling them what that standard is, is not very nice, to put it mildly.

But really, the biggest reason it is so hard to measure aspects of software development is that there are just way too many variables.

Each software development project is different
Each feature in a project is different
Software developers and other team members are different
From day to day even the same software developer is different. Did Jack’s wife just tell him she was cheating on him? Did Joe just become obsessed with an online game? Is Mary just sick of writing code this week?
As you add more unit tests, the build time increases
Different team members go on PTO
Bob and Jim become better friends and chat more instead of working

The point is that everything changes every day. Just about every aspect of software development is fluid.

There is no single metric, or even set of metrics, you can pick that will accurately tell you anything useful about a software development project. (At least, I have never seen one at any software development shop I’ve worked at or consulted for.)

If you were building widgets in a factory, you could measure many qualities of the widget-making process, because much of it would stay the same from day to day. With software development, though, you are always exploring new territory, with a thousand different variables in how you develop the software all changing at the same time.

Measuring without measuring

So am I basically saying that metrics in software development are completely worthless and we shouldn’t bother to track anything?

No, not exactly.

What I am saying is that trying to use metrics the way we measure a city’s average rainfall, or track improvement in running pace by looking at its average over time, doesn’t really work in software development.

We can track the numbers, but we can’t draw any good conclusions from them.

For example, say you track defects per line of code and that number goes up one week. What does it mean? Any number of things could have caused it, or it could be a totally random fluke. You can’t really know, because there isn’t a single knob you can point to and say, “Ah, I see we turned the coffee bitterness factor up to 3 and it resulted in more bugs.” Instead, there are 500 knobs, and they all changed in random directions.
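To see how easily a number can wander with nothing real behind it, here is a small Python simulation. It assumes a made-up model: a weekly defects-per-KLOC metric whose true baseline of 5.0 never changes, plus 500 independent “knobs” that each nudge it by a small random amount every week. The function name and every parameter value are hypothetical, chosen only for illustration.

```python
import random
import statistics


def simulate_weekly_metric(weeks: int, knobs: int = 500, baseline: float = 5.0,
                           knob_effect: float = 0.02, seed: int = 42) -> list[float]:
    """Simulate a hypothetical weekly 'defects per KLOC' metric.

    The true baseline never changes. Each week, every one of `knobs`
    independent factors (mood, PTO, feature difficulty, ...) nudges the
    number up or down by a uniform random amount in [-knob_effect, knob_effect].
    """
    rng = random.Random(seed)  # seeded so the run is reproducible
    return [
        baseline + sum(rng.uniform(-knob_effect, knob_effect) for _ in range(knobs))
        for _ in range(weeks)
    ]


values = simulate_weekly_metric(weeks=52)
# Week-over-week changes: what a dashboard would show as "up" or "down".
changes = [later - earlier for earlier, later in zip(values, values[1:])]

print(f"mean over the year: {statistics.mean(values):.2f}")
print(f"biggest week-over-week jump: {max(changes, key=abs):+.2f}")
```

Try a few different seeds: every run shows noticeable week-over-week swings, even though nothing about the underlying process ever changed.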

So what I am saying is this: don’t look at how a particular metric moves from day to day or week to week and expect it to mean anything at all. Instead, look for huge deviations, especially sustained ones.

If your average team velocity suddenly dropped from some very high number to almost nothing, you wouldn’t know what caused it. You would know, though, that it was much more likely that one single knob got cranked hard in some direction, and you’d at least have some idea of what to look for.
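One rough way to put “look for huge, sustained deviations” into practice is a simple run rule, similar in spirit to the rules used on statistical process control charts: establish a baseline, then raise a flag only when the metric stays far outside it for several periods in a row. The sketch below assumes a baseline of 8 sprints, a threshold of 3 standard deviations, and a run of 3 sprints. Those values, the function name `sustained_deviations`, and the velocity numbers are all hypothetical.

```python
import statistics


def sustained_deviations(values: list[float], baseline_periods: int = 8,
                         threshold_sd: float = 3.0, run_length: int = 3) -> list[int]:
    """Return the indices at which a sustained large deviation is confirmed.

    The first `baseline_periods` values establish a mean and standard
    deviation. A later period counts as 'extreme' if it lies more than
    `threshold_sd` standard deviations from that mean, and an alert fires
    once `run_length` extreme periods occur in a row.
    """
    baseline = values[:baseline_periods]
    mean = statistics.mean(baseline)
    sd = statistics.stdev(baseline)  # sample standard deviation; needs >= 2 points
    alerts = []
    streak = 0
    for i in range(baseline_periods, len(values)):
        if abs(values[i] - mean) > threshold_sd * sd:
            streak += 1
            if streak == run_length:  # alert once per run, not on every extra period
                alerts.append(i)
        else:
            streak = 0  # a normal period breaks the run
    return alerts


# Hypothetical story points per sprint: a one-off dip at index 9,
# then a sustained collapse starting at index 11.
velocities = [31, 28, 35, 30, 26, 33, 29, 32, 30, 18, 34, 6, 5, 4, 5]
print(sustained_deviations(velocities))  # [13]
```

The one-off dip to 18 at index 9 never triggers anything. Only the sustained collapse does, at index 13, the third sprint in a row far below the baseline.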

You really have to treat the software development process more like a relationship than like a factory.

I don’t have a series of metrics I use to evaluate my relationship with my wife or my friends. I don’t secretly count how many times my wife sighs at me in a day and track it on a calendar to determine our relationship quality factor.

Instead, I talk to her and ask how things are going, and I get a more general sense of the health of the relationship by being more involved in it.

Team retrospectives are a great way to gauge the temperature of the team. Ask the team members how things are going. They will have a pretty good idea of whether things are improving or slowing down, and of how effective the team is.

Measure not, but continuously improve, yes

So kick back, don’t worry so much. I promise I won’t tell Six Sigma that you aren’t using metrics.

Instead, focus on continuously improving by learning and applying what you learn. If you can’t notice enough of a difference without metrics, metrics wouldn’t have helped you anyway, because the difference would just be lost in the variance.