# We Can’t Measure Anything in Software Development

By John Sonmez / February 11, 2013

Baccarat is an interesting card game that you’ll find at many casinos.  The objective of the game is to correctly predict whether the bank or player will win a hand.

In Baccarat the scoring for a hand is very simple: add up all the cards at face value, with face cards being worth 10, and count only the ones column of the total.

6 + 7 + J = 23 = 3

A + 4 = 5

The highest possible hand is 9 and whoever has the highest hand wins.  If the player and banker have the same hand, it is a tie.
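That scoring rule is small enough to sketch in a few lines of code (Python here purely for illustration; the string card encoding is my own):

```python
def baccarat_value(cards):
    # Face cards count as 10, aces as 1, number cards at face value;
    # only the ones digit of the total matters, hence the mod 10.
    face = {"A": 1, "J": 10, "Q": 10, "K": 10}
    total = sum(face[c] if c in face else int(c) for c in cards)
    return total % 10

print(baccarat_value(["6", "7", "J"]))  # 6 + 7 + 10 = 23 -> 3
print(baccarat_value(["A", "4"]))       # 1 + 4 = 5 -> 5
```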

I won’t go into the details of how the number of cards drawn is determined, but if you are interested you can find that information on Wikipedia.  Basically, you end up with pretty close to a 50 / 50 chance of either the player or banker winning a hand.  (Of course, the house edge is still about 1.06% in the best case.)

The interesting thing about Baccarat though, is that despite the odds, despite common sense, despite the understanding that the game is completely random, people will still sit there and record every single hand and score trying to use it to look for patterns to predict future results.

These poor deluded souls actually think they are measuring something on these score cards, as if what happened in the last hand will in any way affect what will happen in the next hand.

After many years of trying to find the secret formula for measuring software development activities, I’ve come to the conclusion that trying to measure just about any aspect of software development is like trying to measure the odds of a future Baccarat hand based on previous Baccarat hands.

## Why we want to measure software development

It’s understandable why we want to measure software development—we want to improve.  We want to find out what is wrong and fix it and we want to know when things go wrong.

After all, who hasn’t heard the famous quote:

“What gets measured gets improved.”

Don’t we all want to improve?

Somehow we get stuck with this awful feeling that the opposite is true—that what doesn’t get measured doesn’t get improved.

And of course we feel guilty about it, because we are not doing a good job of measuring our software development practices.

Just like the avid Baccarat gambler, we want to believe there is some quantifiable thing we can track, which will give us information that can give us the edge.

Sometimes the reason for wanting to measure is more sinister than practical: we want to evaluate the individuals on our team to see who is the best and who is the worst.

If we could figure out how to measure different aspects of software development, a whole world of opportunities would open up for us:

• We can give customers accurate estimates
• We can choose the best programming language and technology
• We can figure out exactly what kind of person to hire
• We can determine what kind of coffee produces the best code

## How we try

I’ve been asked by many managers to come up with good metrics to evaluate a software development team.

I’ve tried just about everything you can think of:

• Lines of code written
• Bugs per developer
• Bugs per line of code
• Defect turnaround time
• Average velocity
• Unit test code coverage percentage
• Static analysis warnings introduced
• Build break frequency

I’ve built systems and devised all kinds of clever ways to measure all of these things.

I’ve spent countless hours breaking down backlogs to the smallest level of detail so that I could accurately estimate how long it would take to develop.

I’m sure you’ve probably tried to measure certain aspects of software development, or even tried to figure out what is the best thing to measure.

## It’s just too hard

No matter what I measure or how I try to measure it, I find that the actual data is just about as meaningless as a notebook full of Baccarat hands.

One of the biggest issues with measuring something is that as soon as you start measuring it, it starts improving.

What I mean by this is that if I tell you that I am going to start looking at some metric, you are going to try and improve that metric.  You won’t necessarily improve your overall productivity or quality, but you’ll probably find some way—intentional or not—to “game the system.”

Some managers try to get around this issue by simply not telling the team what they are being measured on.  But, in my opinion, this is not a good idea.  Holding someone accountable to an essentially arbitrary standard without telling them what it is, is just not very nice, to put it mildly.

But the biggest reason it is so hard to measure aspects of software development is that there are just way too many variables:

• Each software development project is different
• Each feature in a project is different
• Software developers and other team members are different
• From day to day even the same software developer is different.  Did Jack’s wife just tell him she was cheating on him?  Did Joe just become obsessed with an online game?  Is Mary just sick of writing code this week?
• As you add more unit tests the build time increases
• Different team members go on PTO
• Bob and Jim become better friends and chat more instead of work

The point is everything is changing every day.  Just about every aspect of software development is fluid and changing.

There is not one metric, or even a set of metrics, you can pick out that will accurately tell you anything useful about a software development project.  (At least I have never seen one at any software development shop I’ve ever worked at or consulted for.)

If you were building widgets in a factory, you could measure many qualities of the widget-making process, because much of it would be the same from day to day.  But with software development, you are always exploring new territory, and a thousand different variables concerning how you are developing the software are changing at the same time.

## Measuring without measuring

So am I basically saying that metrics in software development are completely worthless and we shouldn’t bother to track anything?

No, not exactly.

What I am saying is that trying to use metrics in the same way that we measure the average rainfall in a city, or a runner’s pace improvement by averaging it over time, doesn’t really work in software development.

We can track the numbers, but we can’t draw any good conclusions from them.

For example, say you track defects per lines of code and that number goes up one week, what does it mean?  Any number of things could have caused that to happen or it could just be a totally random fluke.  You can’t really know because there isn’t a knob you can turn and say “ah, I see we turned up the coffee bitterness factor to 3 and it resulted in more bugs.”  Instead there are 500 knobs and they all changed in random directions.

So, I am saying: don’t look at how the numbers of any particular metric move from day to day or week to week and expect that movement to mean anything at all.  Instead, look for huge deviations, especially if they are sustained.

If all of a sudden your average team velocity dropped down to almost nothing from some very high number, you won’t know what caused it, but you’ll know that it is much more likely that there was one single knob that got cranked in some direction and you’ll at least have some idea what to look for.
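A minimal sketch of that “huge sustained deviation” idea, assuming a series of weekly velocity numbers (the window size, threshold, and run length here are arbitrary values I picked for illustration, not recommendations):

```python
def sustained_drop(series, window=4, threshold=0.5, periods=2):
    """Flag indices where a value stays below `threshold` times the
    trailing `window`-point average for `periods` consecutive measurements.
    Small week-to-week wiggles never trigger; only big, sustained drops do."""
    flagged = []
    run = 0
    for i in range(window, len(series)):
        baseline = sum(series[i - window:i]) / window
        if series[i] < threshold * baseline:
            run += 1
            if run >= periods:
                flagged.append(i)
        else:
            run = 0
    return flagged

# Weeks 6 and 7 crater relative to the trailing average; only week 7
# completes a sustained run, so that is the one that gets flagged.
velocity = [30, 32, 29, 31, 30, 28, 8, 7, 30, 31]
print(sustained_drop(velocity))  # [7]
```

The point of a sketch like this is not the specific numbers but the shape of the question: you aren’t trending the metric, you’re waiting for it to scream.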

You really have to treat the software development process more like a relationship than like a factory.

I don’t have a series of metrics I use to evaluate my relationship with my wife or my friends.  I don’t secretly count how many times my wife sighs at me in a day and track it on a calendar to determine our relationship quality factor.

Instead what I do is talk to her and ask her how things are going, or I get a more general idea of the health of the relationship by being involved in it more.

Team retrospectives are a great way to gauge the temperature of the team. Ask the team members how things are going.  They will have a pretty good idea if things are improving or slowing down and what the effectiveness level is.

## Measure not, but continuously improve, yes

So kick back, don’t worry so much.  I promise I won’t tell Six Sigma that you aren’t using metrics.

Instead, focus on continuously improving by learning and applying what you learn.  If you can’t notice enough of a difference without metrics, metrics wouldn’t have helped you; the difference would just be lost in the variance.

If you like this post don’t forget to Follow @jsonmez or subscribe to my RSS feed.


#### John Sonmez

John Sonmez is the founder of Simple Programmer and a life coach for software developers. He is the best selling author of the book "Soft Skills: The Software Developer's Life Manual."

• http://gravatar.com/franjobrekalo FrenkyB

Good post. It reminds me of one of the companies I worked for some two years ago – when the recession started there was not so much work, so we all started to deal more with ourselves and measure all kinds of things. We had a local joke that each of us spends at least one hour a day describing what he/she was doing that day.

• http://dangerismymiddlename.com/ Paul Danger Kile

I agree 100%.

Here is where “what gets measured gets improved” comes from, and after seeing it done-right, it becomes apparent why it’s not remotely applicable to software development.

An automobile parts foundry was making cylinder heads. Most of the heads were getting scratched somewhere on the assembly line. The line engineer used a strobe light and a video camera to see what was happening in the machinery. He saw a place in the assembly line where the cylinder heads did a little wiggle, and thought that might be where the problem was. He made an adjustment, and noted that the scratches went away. He then set the adjustment back to the previous settings, and noted that the scratches came back. At this point he knew exactly what was causing the scratch, and the problem was now solved for thousands of future parts.

Dr. Deming developed these techniques (http://en.wikipedia.org/wiki/W._Edwards_Deming), went to Japan, and taught those companies how to use his process, which was extremely similar to how science is conducted. Honda and Toyota beat the pants off of the Detroit 3, and all of a sudden, the entire manufacturing industry dropped their success-is-measured-in-volume-BS, and adopted Deming’s science-like process.

The success-is-measured-in-volume-BS? That might also look like, “what gets measured gets improved,” but in that case, the wrong thing is being measured. Your employees might produce shoddy parts, because in any given time-period they can produce more shoddy parts than quality parts.

Planned obsolescence? It’s a myth. Companies didn’t create products that became obsolete on purpose. If any of them had, then their competitors would have kicked their butts. No. Shoddy products are truly the best that you get when you are measuring the wrong things.

Each programming project is unique. We don’t have thousands of future projects that can be improved with a tweak to an assembly-line’s machinery. Unfortunately, some execs thought, if it worked for cars, then it would work for _everything_. Not so.

• http://simpleprogrammer.com jsonmez

Great point. Thanks for the back story.

• Brandon

Interesting post…

1) Good point about the observer effect – the act of observation has an effect on the object being observed – although as you pointed out, it may not always have the desired effect.
2) Like you said, I think some metrics, like velocity are useful within a certain margin of error.
3) I think the most important metric is: Is my customer happy? Where customer = person paying you.
4) Happy wife = Happy life ;)

Good luck @ PluralSite!

• http://tamasrev.wordpress.com tamasrev

There was a citconf session about it. The two pivotal statements to me were:
1. you can’t cheat 50 metrics – you might be able to cheat 1 or 2.
2. use the metrics to improve things, not to assign blame (as management often does)

More on this here: http://www.citconf.com/wiki/index.php?title=Meaningful_metrics
