What Software Developers Should Know About Source Control

I’ve always had somewhat of a love/hate relationship with source control.

I learned fairly early on in my software development career that, love it or hate it, knowing your way around source control is a pretty important part of being a programmer.

I was working on a small project at HP at the time with just one other developer.

We were working on a program called AntEater that automated the testing of HP printers.

One lovely morning, I was happily coding away and decided that I needed to get the latest updates to the code.

I was working on a few files for a new feature I was building, and my teammate, Brian, had just checked in some changes.

Not wanting to be working with outdated code, I pulled down the latest changes to my machine.

I built the application and ran it to make sure everything was working.

The application launched, but something strange was happening on my computer.

The hard drive light just kept flashing.

I could hear the whirr of the mechanical drive working hard.

It was doing something, but what?

Within minutes, an error dialog popped up on my screen followed by the dreaded blue screen of death.

My PC rebooted automatically, and I was greeted with the message “non system disk error.”

Umm, ok. Usually this meant your hard drive had crashed.

I contacted IT.

They took a look at my system and confirmed that something was really wrong. Probably the hard drive had been corrupted.

They reimaged my machine and the next day I had a brand new Windows installation.

I spent that day reinstalling and reconfiguring my development environment.

Finally, I had everything back in order, so I downloaded the latest source code for the application, along with the changes I had made on my branch, and fired up the app.

Once again, my hard drive light started flashing.

I tried to abort, but it was too late.

Seconds later, I was greeted by a reboot and a familiar message… “non system disk error.”

WTF?

What was going on?

I was pissed to say the least.

Finally it occurred to me.

I went over to Brian’s desk and looked at the changes he had committed.

He had changed a variable in a C++ header file to be initialized to the value of “C:\temp.”

He had done this so that a function he wrote, which had the app scan for temporary files and delete them at startup, would work.

I had made a change in the same C++ header file, but I hadn’t merged my change yet.

So, when I pulled down his latest code, I didn’t get the newest header file that had the variable set to “C:\temp,” but I did get the code that scanned through “tempFileLocation” and deleted everything there.

Since my variable wasn’t initialized, it was defaulting to “C:\”, the root directory of my computer.

Every time I launched the app, it was recursively deleting all the files on my computer.

Source control can be so much fun.

What Is Source Control?

Source control, or version control as it’s sometimes called, is a way to keep track of different versions of the files and source code of a software project and to coordinate the efforts of multiple developers who may all be working on the same sets of files.

There are many source control systems and implementations, but they all have the same goal: helping you manage the source code of your software development project.

Why Is It Important?

Back when I first started working as a software developer, there were plenty of teams who didn’t use source control.

I’ve worked on multiple projects where the source code for a multimillion dollar system resided on a shared network folder or a floppy disk that was passed around.

Lord knows how many companies relying on this version of source control (sometimes called the sneakernet) went belly up when someone mistakenly deleted the contents of the disk or shared folder.

One of the main reasons source control is so important is that it mitigates this problem.

A team using a source control system is much less likely to “lose” their code.

Source control gives you a place to check in your code and keep it secure so that it can’t be haphazardly deleted, and it allows you to keep track of changes so that if you accidentally delete some portion of the code or make a huge mistake, you can go back and fix it.

Ever saved multiple copies of a document on your computer with different dates in the title, so that you’d be able to go back to an earlier version if you needed to?

That’s what source control can do for all the code in your application.

But source control is not just about making sure you don’t lose your source code.

You could just back things up regularly to avoid that problem.

Source control also helps you coordinate multiple developers working on the same set of files in a code base.

Without source control helping to manage the different changes developers are making, it’s very easy for developers to overwrite each other’s changes or be forced to wait until someone else is done editing a file before they can edit it.

A good source control system will allow you to work on even the same files simultaneously and then merge the changes together.

Source control also solves the problem of working on multiple versions of a software application’s code base.

Suppose that you have an application that you have released to customers and it has some bugs in it that need to be fixed, but at the same time you are working on some new features for the next version of the application and those new features aren’t quite ready yet.

Wouldn’t it be nice if you could have multiple versions of the code?

For instance, one version could be the current released version where you make bug fixes, and another version could be where you develop your new features.

And wouldn’t it be nice if you could apply the bug fixes to the version of the code that contains the new features, as well?

Source control gives you the power to do just that.

Source Control Basics

There is quite a bit to know about source control, and you certainly aren’t going to become an expert just by reading about it, but you can learn the basics.

In this next section, I’m going to give you a quick rundown of the basics of source control, followed by a few of the most common source control technologies out there, so that you can at least understand how source control generally works.

Repositories

One of the key concepts with just about all source control systems is the idea of a repository. It’s basically the place where all the code is stored.

When you are working with source code, you’ll be getting the code from the repository, working on it, and checking in your changes.

Other developers may also be doing the same.

The repository is the place where all that code comes together and where the code technically “lives.”

Different source control systems have different concepts of what a repository is and might even have local repositories, but ultimately, for any code base, there has to be one central location or repository that acts as the system of record.

Checking Out Code

When you want to get a local version of the code that you can modify, you’ll need to check out code from the repository.

Older source control systems had you actually check out the code and lock the files, so only you could edit them.

Most source control systems today let you “check out” code by letting you pull down a local copy of that code onto your own machine or local repository.

This checked-out code is your local copy, and changes that you make to it are only made on your machine or in your local repository.

It is only when you “check in” or merge your code to the central repository that other developers see your changes.

Normally when you are working with source control, you’ll check out a local copy of the code base, implement new features or make other changes to the code, and then, when you are done, check that code back in and handle any conflicts that arise from multiple developers working on the same sections of code.

Revisions

Source control systems have a concept of revisions, which are the previous versions of a file contained within source control.

So, for example, if we have a file called foo.bar that I first create, you later modify, and I modify again sometime down the road, the source control repository will contain three different versions of foo.bar.

Why is this important?

Well, for a couple of reasons.

First of all, suppose I screwed up foo.bar and you want to revert back to the version that existed before I made my changes.

Since the file is in source control, you can simply revert back to the previous revision or check out the revision and pretend like my changes never even existed.

You could also look at the revision history and compare the changes in the file over time to figure out how a file evolved by seeing what changes happened at each revision and who made them.

(I like to call this finger-pointing.)
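To make the foo.bar example a bit more concrete, here is a minimal Python sketch of a revision history. It is a toy model, not how any real source control system stores files: the `VersionedFile` and `Revision` classes, their method names, and the file contents are all made up for illustration, and I’m assuming that a revert is recorded as a new revision so the history is never thrown away.

```python
from dataclasses import dataclass, field


@dataclass
class Revision:
    number: int
    author: str
    content: str


@dataclass
class VersionedFile:
    """Toy model of one file under source control (hypothetical, not a real VCS API)."""
    name: str
    revisions: list = field(default_factory=list)

    def commit(self, author: str, content: str) -> Revision:
        # Every commit appends a new revision; nothing is overwritten.
        rev = Revision(len(self.revisions) + 1, author, content)
        self.revisions.append(rev)
        return rev

    def checkout(self, number: int) -> str:
        # Fetch the content exactly as it was at a given revision.
        return self.revisions[number - 1].content

    def revert_to(self, number: int, author: str) -> Revision:
        # Assumption of this sketch: reverting stores the old content as a
        # new revision, so the mistaken revision stays visible in the history.
        return self.commit(author, self.checkout(number))

    def log(self) -> list:
        # Who made which revision: the "finger-pointing" view.
        return [(r.number, r.author) for r in self.revisions]


foo = VersionedFile("foo.bar")
foo.commit("me", "v1 text")                  # I create the file
foo.commit("you", "v2 text")                 # you modify it
foo.commit("me", "v3 text, with a mistake")  # I modify it again and screw it up
foo.revert_to(2, "you")                      # you go back to your version
```

After the revert, the latest revision has the same content as revision 2, and `foo.log()` still lists all four revisions along with who made each one.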

Branching

One of the most misunderstood areas of source control is branching, or rather, how to use branching correctly.

The concept, though, is fairly simple.

Most source control systems allow you to create a branch off of an existing code base, in order to create a new code base that can be independently evolved from its parent.

Wait, what? I thought you said this was simple, John.

Ok, think of your code like a tree.

You’ve got the trunk, and at some point you might have multiple branches which come off of that trunk.

What does this look like in reality?

Suppose you have a version of your software that you are working on, and you are ready to ship that version to customers and call it version 1, but… you still want to continue working on new features for version 2.

The problem is that, even though you are an awesome coder, you know there are going to be at least a few bugs you are going to have to fix in version 1, which you are shipping to customers.

However, you don’t want to start shipping them version 2 features when you give them bug fixes for version 1. (You are planning on charging them for an upgrade to version 2 later.)

So, what do you do?

Simple. You branch the code.

Once you are ready to ship version 1, instead of just shipping what is in the trunk, you create a new branch. You call this branch “version 1.”

Then, you can make bug fixes on the version 1 branch and implement your new features on the trunk.

Only one problem…

What if you want to get those bug fixes into the trunk as well?

Merging

Look how beautifully I set that one up.

The solution to your problem is merging.

What is merging, you may ask?

It’s exactly what it sounds like.

You are going to merge the changes from one code line into another.

In our little software example above, we would simply use the merge feature of our source control system to merge our version 1 branch changes to the trunk.

Merging allows us to take all the changes we made on the version 1 branch, after we had branched from the trunk, and merge them right into the trunk.

The merge would only go one direction, so we’d get all the changes from the version 1 branch into the trunk, but none of the new features we were working on in the trunk would go into the version 1 branch.

Just as we intended.

All is well and peaceful in the world, that is until we actually try to do the merge and we find that we have…

Conflicts

F$%^!, d%&$! What is this s&$*?!

(Strangely, I don’t have any problem typing shit in this book by itself, but it seems a little inappropriate to drop three “strong” words, starting with the F-bomb, in one sentence.)

These are the kinds of words frequently uttered when developers try to do the simple, straightforward process of merging just a few simple changes back into the trunk.

Mostly this happens on Friday evening at 5:00 PM, when you only mean to do a quick merge and get the hell out of there.

You kick off the merge, put your coat on, text your friends, and tell them where you are going to meet them for a relaxing evening of drinks and dinner, and quickly glance at your screen to see:

“CONFLICT (content): Merge conflict in simplefile.java

Automatic merge failed; fix conflicts and then commit the result.”

Or some other such garbage.

The hours pass by as you stare at a bunch of “<<<<<” and “>>>>>” symbols in a file and try to make sense of it all.

I’m not going to lie; merge conflicts are… a bitch.

Most of the time, a good source control system will try to automatically merge simple changes made in one part of a file into another file, and it all works magically.

But… every so often, you make a change in one file on a branch, and some stupid idiot developer also makes a change in the same file on the same line (because he’s an idiot) and manual intervention is required.

The computer has no way of knowing which change should override the other one, or if both changes should somehow be included or if there is some other way to resolve the conflict, so it’s up to you.

Your Friday night is ruined.

Resolving merge conflicts and the intricacies of merging could be a whole other book by itself, so I’m not going to delve into the details here.

For now, it’s sufficient to know how merging basically works, that when it doesn’t, there are conflicts which have to be manually resolved, and that you shouldn’t do “simple merges really quickly” on Friday nights right before you are getting ready to leave.
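If you want a feel for why some merges work “magically” and others blow up, here is a rough Python sketch of a line-by-line three-way merge. It is a deliberately simplified toy and not the algorithm any particular source control system uses: the function name `three_way_merge`, the sample lines, and the “ours”/“theirs” labels are mine, and I’m assuming all three versions of the file have the same number of lines. The conflict markers just mimic the style of the ones you would see in a real conflicted file.

```python
def three_way_merge(base, ours, theirs):
    """Merge two edited copies of a file, line by line, against their common ancestor.

    Simplifying assumption: all three versions have the same number of lines
    (edits only change lines, they never add or remove them).
    Returns (merged_lines, conflict_line_numbers).
    """
    if not (len(base) == len(ours) == len(theirs)):
        raise ValueError("this toy merge only handles same-length files")
    merged, conflicts = [], []
    for number, (b, o, t) in enumerate(zip(base, ours, theirs), start=1):
        if o == t:
            merged.append(o)   # both sides agree (or neither changed the line)
        elif o == b:
            merged.append(t)   # only their side changed this line
        elif t == b:
            merged.append(o)   # only our side changed this line
        else:
            # Both sides changed the same line differently: a human must decide.
            conflicts.append(number)
            merged.extend(["<<<<<<< ours", o, "=======", t, ">>>>>>> theirs"])
    return merged, conflicts


base = ["int retries = 3;", "int timeout = 30;", "bool verbose = false;"]
ours = ["int retries = 5;", "int timeout = 30;", "bool verbose = false;"]

# Changes on different lines merge cleanly.
theirs = ["int retries = 3;", "int timeout = 30;", "bool verbose = true;"]
clean, clean_conflicts = three_way_merge(base, ours, theirs)

# Changes to the same line cannot be merged automatically.
theirs_clash = ["int retries = 10;", "int timeout = 30;", "bool verbose = false;"]
messy, messy_conflicts = three_way_merge(base, ours, theirs_clash)
```

In the first merge, both edits survive and `clean_conflicts` is empty. In the second, line 1 was changed on both sides, so `messy_conflicts` is `[1]` and the merged result contains the marker lines that someone has to clean up by hand.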

Technologies

Source control has a pretty long and somewhat interesting history, which we are not going to discuss here, since I lied about the interesting part.

It’s sufficient to say that source control systems evolved from passing around the source code on a USB drive, to strategically copying entire folders of source code and renaming them V1, to the fairly complex systems we have today.

Many wars were fought in source control land, and eventually two major factions emerged victorious: centralized source control and distributed.

Centralized is older. It doesn’t have quite as much “bling,” but it’s a little simpler to understand and it does the job.

CVS and Subversion are two examples of centralized source control.

Distributed is newer. It’s probably a bit shinier in most people’s eyes and it’s a bit more complicated, but more people are using it.

Git and Mercurial are two examples of distributed source control.

Centralized Source Control

With centralized source control, you have one repository, which exists on a central server that all developers working on the code use to get copies of the files they need and to check in changes they’ve made to files.

Each developer has a source control client that manages checking in and checking out code from the central repository.

All of the version history and revisions of the files are stored in the central repository.

The typical workflow for using centralized source control might look something like:

Update my local copy of the code line I’m working on from the repository.

Make my changes.

Commit my changes to the central repository (and deal with any conflicts).
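The three steps above can be sketched in a few lines of Python. This is a toy model under some loud assumptions: the `CentralRepository` class, the `StaleCopyError` exception, and the one-file repository are invented for illustration, and the rule that a commit is rejected whenever your copy is behind the repository is a simplification of how real centralized systems decide that you need to update first.

```python
class StaleCopyError(Exception):
    """Raised when someone else committed since our last update."""


class CentralRepository:
    """Toy central server holding the history of a single file (hypothetical model)."""

    def __init__(self, content):
        self.history = [content]   # revision 1 is history[0]

    @property
    def head(self):
        # Number of the latest revision.
        return len(self.history)

    def update(self):
        # Step 1: hand the client the latest revision number and content.
        return self.head, self.history[-1]

    def commit(self, based_on, content):
        # Step 3: accept the change only if the client's copy was up to date.
        if based_on != self.head:
            raise StaleCopyError(
                f"working copy is at r{based_on}, repository is at r{self.head}"
            )
        self.history.append(content)
        return self.head


repo = CentralRepository("hello")

my_rev, my_copy = repo.update()            # 1. I update my local copy
brian_rev, brian_copy = repo.update()      #    ...and so does my teammate

my_copy += " world"                        # 2. I make my changes
new_rev = repo.commit(my_rev, my_copy)     # 3. I commit them

try:
    repo.commit(brian_rev, brian_copy + " there")
    rejected = False
except StaleCopyError:
    rejected = True                        # he has to update and merge first
```

My commit becomes revision 2. My teammate’s commit is rejected because it was based on revision 1, which is the “deal with any conflicts” part of step three.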


Distributed Source Control (DVCS)

The biggest difference with using distributed source control is that each developer has a full copy of the entire repository on their own machine.

Some really cool hipsters like to say that this means that “there is no central repository, dude. It’s like we just all have our own versions of the software, and no version is better than any others.”

This is just plain wrong.

Yes, theoretically this is possible, but how the hell are you going to ship code and coordinate a project between multiple developers if you don’t have some kind of system of record?

It’s not going to happen.

If you think it will, you should probably start your own utopia or cult or something.

The reality of the situation is that, yes, each developer has their own complete copy of the repository, but you still use some central version of the repository that acts as the system of record or the master repository for the project.

When you work in a distributed source control system, you simply work locally and do everything you would with a central repository system, except it happens locally.

Essentially, this means you don’t have to transfer as many files across the network, and you can work disconnected for a while.

Eventually, though, you’ve got to get changes that other people have made, and you’ve got to send your beautiful, precious changes out into the world to fend for themselves.

You do this by pulling and pushing.

With a DVCS, you can pull down changes to your local repository, and you can push changes you’ve made out to the master repository or any other repository you want, including your hipster, decentralized, every-repository-is-equal friend.
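Here is a small Python sketch of the pull and push idea, again as a toy model rather than a description of how Git or Mercurial work internally. The `Repository` class, the commit ids, and the repository names are hypothetical, and I’m assuming history is a flat list of commit ids, which is a big simplification of what a real DVCS keeps.

```python
class Repository:
    """Toy distributed repository: a full, ordered list of commit ids (hypothetical model)."""

    def __init__(self, name, commits=None):
        self.name = name
        self.commits = list(commits or [])

    def clone(self, name):
        # Cloning copies the entire history, not just the latest files.
        return Repository(name, self.commits)

    def commit(self, commit_id):
        # Commits are local; no network or server is involved.
        self.commits.append(commit_id)

    def pull(self, other):
        # Bring in the commits the other repository has and we lack.
        for commit_id in other.commits:
            if commit_id not in self.commits:
                self.commits.append(commit_id)

    def push(self, other):
        # In this sketch, pushing is just the other repository pulling from us.
        other.pull(self)


master = Repository("master", ["c1"])   # the agreed-upon system of record
mine = master.clone("mine")
yours = master.clone("yours")

mine.commit("c2-mine")       # local work, possibly while disconnected
yours.commit("c3-yours")

mine.push(master)            # publish my change to the system of record
yours.pull(master)           # you pick up what I published
yours.push(master)           # and then publish your own change
```

Nothing in the code makes `master` special; it is the system of record only because the team agrees to push to it and pull from it. Any `Repository` could just as well pull directly from another one.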

A Quick Rundown of the Most Popular Source Control Systems

If you are reading this book in the future, this list will probably have changed.

There is always a new source control hotness.

But, for now, at the time of writing this book, I thought I’d give you a brief introduction to the most common source control systems you are likely to see in the wild.

Caution: it’s brief.

CVS

No, it’s not a drug store. It’s source control.

It’s known as CVS or Concurrent Versions System. (I’ve never called it by the full name; I actually had to look that up.)

What is it?

Well, I know some people will get pissed when I say this, but in my opinion it’s the precursor to Subversion.

CVS is a centralized source control system, and it is fairly robust.

It’s pretty powerful, but a bit slow.

Most organizations that were using CVS eventually switched to Subversion, but CVS still handles some things a little differently and some people prefer those differences.

Tagging and branching, for example, as well as rolling back commits, are handled differently in CVS.

CVS zealots will tell you CVS does it right, and Subversion does it wrong.

I don’t really care all that much, so I just nod my head because I don’t like getting stabbed with a fork.

Subversion

Bias alert.

Subversion is probably the source control system I’m most familiar with.

I’ve taught courses on how to use it in a purely graphical manner, I’ve written blog posts about branching and merging strategies using it, and I’ve managed SVN servers, repositories, and source control strategies for pretty large development teams using the technology.

Does this mean I’m a huge fanboy and think everything else sucks?

No, not really.

As far as centralized source control systems go, I think Subversion is the best, but it definitely has its set of shortcomings.

Overall, though, it gets the job done and is fairly easy to use, so I like it.

Git

Git has basically become synonymous with source control.

Ask an under-25 developer today what source control is and he or she will most likely say, “What, do you mean Git?”

There is a good reason for this.

Git is… well… pretty awesome.

Really, it is.

As far as source control software goes, Git does pretty much everything you want.

It’s extremely powerful.

The basics are fairly simple.

And it’s quick, efficient, and universal.

There is even a pretty large company, GitHub, which supports open source and provides managed hosting for Git projects.

Definitely worth checking out if you haven’t already.

Mercurial

Mercurial is kind of like Git’s evil twin brother.

Some people have said Git is like MacGyver, and Mercurial is like James Bond.

I’m not exactly sure what they are talking about (or what they are smoking), but I sort of get it.

Mercurial could be described as a little more elegant and polished than Git.

Same basic idea: they are both distributed source control.

Same basic functionality and features.

But, in my experience, Mercurial is just a little bit easier to use and figure out, whereas Git is a little more arcane but offers more ways to combine and hack things together.

So, essentially I’ve just described Mercurial by comparing it to Git.

Hmm, well that will have to do.

If you use both, you’ll see why.

It’s sort of like one of those pointless religious war type thingies.

Anything Else?

No, not really.

The main source control systems are pretty much these four, with Git taking a huge (and I mean HUGE) share of the market.

Yes, some people are off using other stuff and merrily humming along, but it’s much rarer.

So, there you go, now you have the basics of source control.

Remember to commit early and to commit often.

Oh, and please use meaningful commit messages.