Deploying software doesn’t have to be that complicated!
I’ve seen and built many software building and deployment solutions over my career, and I have come to find that most software deployment can be boiled down to a simple process.
I’m not trying to give you a solution for your software deployment automation, nor am I trying to perfectly model your exact process.
What I am trying to do in this post is help you simplify your process.
If you can identify the parts of your deployment process that fit into the simple steps I am going to outline below, it should be much easier for you to automate your deployment process.
Even though software build processes, infrastructure and components are unique, I have found that most software deployment processes can be simplified into the following steps.
- Build software without configuration
- Create environment specific configuration.
- Create a set of database changes.
- Bundle software, configuration and database changes.
- Apply new software
- Apply new configuration
- Apply new database changes
- Start it back up
You might read through these steps and think “well duh.”
You might be tempted to say “my process is more complicated than that.”
I’m not going to argue with you. You are right, your process is probably more complicated than that. But, does it need to be?
Can you simplify your process to fit into these steps?
Sure, the implementation of these steps is likely to be fairly complex and vary for each type of software, but if you can distill the process into these steps, you can much more easily automate that process.
Where people go wrong
The big key to my simple version of deployment is
Build software without configuration
You MUST do this! Departing from this step causes all kinds of pain and complexity. Please don’t try to build your software and the configuration for an environment at the same time. These things must be pulled apart from the get-go, or you will have the pain of trying to tease them apart later, or you will have to create separate builds for each environment.
It is also critical that the same bits that were built by your build server are what is deployed to each environment!
I will say that this isn’t the easiest problem to solve. You may need to have a separate build process that builds up the configuration for an environment.
Separating the two will also force you down the path of building a process to apply that configuration to an environment.
But, if you are willing to accept that this is just a must and bite through this pain, you’ll come out on the other side clean (even though you had to crawl through tunnels of crap.)
The whole story
Now that I’ve hopefully convinced you to separate your configuration from the building of your software, let’s go over the big picture of a deployment using the simple process outlined above.
It all starts out when you build your software. Perhaps you have a continuous integration build server set up that automatically builds the software on each check-in; perhaps you are manually kicking off a script.
Once you have built your software, you have some bits that you should be able to apply to any environment. Nothing that you built here should be machine or environment specific in any way.
Now, you kick off another process, or perhaps one was kicked off simultaneously by your continuous integration server. This builds up the configuration for the environment you are going to deploy to.
A similar process, which could also run simultaneously, is kicked off to generate the list of database changes that need to be applied to the target environment.
Now that you have your bits, configuration and database changes, you are ready to deploy.
If you are smart, you’ve even built these ahead of time and they are just waiting for when you need them.
Next, gather up the artifacts and move them to the deployment target where you actually apply them.
First, unpack your bits and put the new bits into place. (You may or may not need to take your application fully offline to do this.)
Then apply the new configuration on top of your newly installed bits for that environment.
Finally, apply database changes for that environment.
Now you should be completely deployed and can start up your application.
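To make the flow above a bit more concrete, here is a minimal sketch of the steps in Python. Everything in it is an assumption for illustration: the function names, the file names, and the use of plain files as stand-ins for real bits, configuration and database changes. A real implementation would call your actual build tool, your configuration generator and your database migration tool.

```python
import json
import shutil
import tempfile
from pathlib import Path


def build_software(workdir: Path) -> Path:
    """Build environment-neutral bits (a stand-in file in this sketch)."""
    bits = workdir / "bits"
    bits.mkdir()
    (bits / "app.txt").write_text("application v2\n")
    return bits


def create_configuration(workdir: Path, environment: str) -> Path:
    """Create the configuration for one target environment."""
    config = workdir / f"config-{environment}.json"
    config.write_text(json.dumps({"environment": environment}))
    return config


def create_database_changes(workdir: Path) -> Path:
    """Create the set of database changes for the target environment."""
    changes = workdir / "changes.sql"
    changes.write_text("ALTER TABLE customer ADD COLUMN nickname TEXT;\n")
    return changes


def bundle(workdir: Path, bits: Path, config: Path, changes: Path) -> Path:
    """Gather software, configuration and database changes into one bundle."""
    package = workdir / "bundle"
    shutil.copytree(bits, package / "bits")
    shutil.copy(config, package / "config.json")
    shutil.copy(changes, package / "changes.sql")
    return package


def deploy(package: Path, target: Path) -> list:
    """Apply software, then configuration, then database changes, then start."""
    log = []
    shutil.copytree(package / "bits", target / "app", dirs_exist_ok=True)
    log.append("applied software")
    shutil.copy(package / "config.json", target / "app" / "config.json")
    log.append("applied configuration")
    # A real deployment would run these statements against the database.
    statements = (package / "changes.sql").read_text().splitlines()
    log.append(f"applied {len(statements)} database change(s)")
    log.append("started application")
    return log


def run_example() -> list:
    """Run every step against a temporary directory and return the log."""
    with tempfile.TemporaryDirectory() as tmp:
        workdir = Path(tmp)
        bits = build_software(workdir)
        config = create_configuration(workdir, "staging")
        changes = create_database_changes(workdir)
        package = bundle(workdir, bits, config, changes)
        return deploy(package, workdir / "target")
```

Notice that build_software never sees the environment name. Only create_configuration does, which is the separation the whole process depends on.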
But how do I do it?
Perhaps you agree with me that the actual process should be what I have outlined and described, but now you are at the point of implementing a solution.
How do you actually automate this stuff?
Good question. If you figure out a simple answer, let me know.
This is the point where you might be writing custom tools and scripts to get all this working. The key is to take it one step at a time.
There are at least two tools out there that I know of that help you do this. I can’t speak for either of these tools, since I haven’t used them myself, but I have heard good things about them.
One other thing to consider is how you are going to get the right stuff to the right server. You will want to think about things like:
- Promoting build products
- Preloading promoted products to servers to make deployment faster
- Getting through firewalls by having the software or some other process PULL the upgrade to your target, rather than you PUSHING it there.
- Rollback, or some kind of mitigation strategy if things go wrong. (My recommendation here is not to get fancy. I have NEVER seen a successful rollback, only a database restore followed by a manual code restore. If you mess up bad, just count on restoring the machine and the database.)
Well, I finally reverted back to making my dev environment non-virtualized.
I spent the last couple of months testing out running different VMs for each function of application development I was working on, and I am pretty convinced that technology has not yet reached the point where this makes sense.
I’m not saying you can’t do it. I’m just saying it isn’t worth the cost.
My initial assumptions or theory
I first thought that this would be a good idea for several reasons:
- I might actually get a speed boost by having dedicated virtual machines for each function of application development, by not having the other cluttering programs installed on the machines that didn’t need them.
- I would be able to only use what I needed at the time, thus reducing the overhead I would incur.
- I would be able to keep each virtual machine clean to its purpose, so it would never get cluttered up and need to be repaved.
- I would be able to transport my machine to any computer just by transferring the VM.
- If I hosed a machine I could restore to a known good state from a backup.
Most of these were decent assumptions. To give you an idea of what I was really aiming for, you have to understand my VM setup.
- Web Development Machine
- Mobile Development Machine
- SQL Server Machine
- Other / Side Project Development Machine
- Test Dangerous Stuff Machine
The basic idea was to isolate in order to make things clean and orderly, and to increase performance on single tasks by only using what I need.
It wasn’t a total disaster
But it wasn’t exactly what I had planned.
Things worked out pretty well, and I could have kept operating in that mode, but there were enough irritations that I decided to bite the bullet this weekend and axe the VMs for normal development work.
The biggest factor that made me end this experiment was performance. I had my VMs running on my SSD drive, but I still felt that the performance was suffering quite a bit.
Compile times and general Visual Studio user experience just were not what I thought was tolerable for every day use. Considering how often I compile in a day, even a small bit of a performance hit is magnified.
It was definitely nice to have development activities separated out to the point that I could just fire up certain VMs to do certain kinds of work, but it was also a bit of a pain when I got to my machine and found the wrong VM was running or that I needed to apply Windows updates to 5 machines.
Having SQL Server on a separate box seemed like one of the best ideas, but when I really think about it, SQL Server running on your machine only really consumes memory when it is not actively being used, and I have 16 gigs of RAM. Also, losing the autocomplete you get when connecting to a local instance of SQL Server was a fairly large price to pay.
Another big issue turned out to be tools and tooling. Turns out many of the tools I use aren’t specific to a certain development task. Things like text editors and R# needed to be installed on each VM. Often I would find that the tool I wanted wasn’t installed on the VM that I was using and it became a headache keeping everything up to date and installed in the right place.
One issue I didn’t expect was the use of monitors. Using VMWare I could expand a VM onto multiple monitors but I found that it was irritating when the VM was covering up a window on the main computer and I needed to switch back and forth between multiple machines. In my configuration, I have 6 monitors hooked up to my PC, so for most people this probably wouldn’t be as big of an issue.
I had some big dreams of using Unity mode to allow me to use applications from different VMs and make it all feel like it was on the main PC, but that technology just isn’t quite what it needs to be to make it worth the cost of efficiency. Right now it seems Unity mode is rather slow, error prone, and hard to use.
For me it comes down to this…
I want my workstation to be as fast as possible when I am doing disk or processor intensive tasks. PCs are not at the point yet where we have so much CPU and disk speeds are so fast that everything is still virtually instantaneous in a VM. If we get to that point, I won’t feel like I am making a sacrifice in a VM.
I want to be able to seamlessly switch tasks and monitors. The reason I have 6 monitors on my computer is because I like being able to drag any window anywhere and keep right on going. Running VMs put a bit of a stutter step into my window flinging.
I don’t want to maintain 5 different operating systems. You don’t really notice the effort when you are just maintaining one, but when you are trying to keep 5 operating systems up to date with patches, software updates and everything else, it becomes a pain.
Sure, I can’t have that nice, clean, my-computer-just-stepped-out-of-a-shower feeling as I open up my VM targeted exactly for the task I am about to do. Yes, I have to put all kinds of junk into my registry and feel “icky” about it. SQL Server is always running in the background chomping up a gig or so of my RAM.
But! I am running about as fast as I can and I have the agility I need to be more efficient.
As always, you can subscribe to this RSS feed to follow my posts on Making the Complex Simple. Feel free to check out ElegantCode.com where I post about the topic of writing elegant code about once a week. Also, you can follow me on twitter here.
When we last left off we had just gotten our BATs as part of the acceptance criteria for any new backlogs that are worked on. This puts us at a point where we could really say that we have successfully implemented BAT testing.
You don’t want to get too comfortable just yet though, because the next hurdle you will most likely face will be the problem of not having enough time to execute all your tests.
You want to think about this ahead of time
Nothing worse than getting everything going and then not being able to execute the entire test suite, because you didn’t plan ahead.
You don’t want to get to the point where confidence in your BAT suite is lost because you are not able to get all the tests executed in a reasonable amount of time.
The more frequently your tests are run the more value they have.
By reducing the cycle time between when a breaking change is made in the system and when it is discovered, you greatly reduce the risk it imposes on your software and you decrease the scope of the code changes in which it could have occurred.
To put it simply, the faster you can find out you broke something, the more likely you can fix it before it does damage, and the more likely you will be to know what you did that caused the breakage.
How can we reduce cycle time?
There are a few different strategies we can employ and we can mix and match some of these strategies.
Straightforward parallelization
The best and most effective thing you can do is to take a given test run, split up the execution of those tests on multiple machines and execute them in parallel.
This approach is going to give you the best bang for your buck. You should really try to get some amount of parallelization going before attempting any other solution, since it is going to make a huge impact on the total execution time for your tests without making any sacrifices.
There are even many ways you can mix and match to do parallelization:
- Use multiple physical machines
- Run many virtual machines on a single host
- Put an executor program on every team member’s machines that will execute tests when that machine is idle or put into test execution mode (perhaps at night)
- Use a cloud based computing platform to execute your tests
- Run multiple browser instances on a single machine
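Whichever of these options you pick, the core of it is splitting one test run into chunks that different executors can work through at the same time. Here is a minimal sketch of that split, assuming tests are identified by name and that a simple round-robin assignment is good enough. The function name is hypothetical, and a smarter splitter might balance by historical run time instead.

```python
def split_into_shards(test_names, worker_count):
    """Assign tests to workers round-robin so each worker gets a similar share."""
    if worker_count < 1:
        raise ValueError("worker_count must be at least 1")
    shards = [[] for _ in range(worker_count)]
    # Sort first so every machine computes the same assignment.
    for index, name in enumerate(sorted(test_names)):
        shards[index % worker_count].append(name)
    return shards
```

Each machine, VM or browser instance would then run only the shard with its own index.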
Another strategy is data preloading. With this approach, you preload some of the test data that your tests would otherwise generate by manually clicking through screens.
This is fairly hard to explain, so let me give you an example:
Suppose you had a set of tests that all involved creating customers, but each customer you create in the system takes about 3 minutes to create by clicking through the screens to get them into the system.
We don’t need to have 500 tests all executing the same exact logic in the system 500 times for 3 minutes just to generate all the customer data that will be used in the tests.
Instead, we can leave a few tests that are exercising the customer creation functionality, and we can execute a SQL script to push all the other customer data into the test database for the other tests to use.
Using this technique we might be able to reduce our total execution time by 3 minutes * each test, or about 25 hours for 500 tests.
This can be a huge savings in time, and it doesn’t come at that high of a cost. The sanctity of our tests is slightly compromised, but we are taking a calculated risk here knowing that we already have covered the area of execution which we are preloading data for.
Consider this technique when you notice certain things in the system taking a very long time to do.
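As a rough sketch of the customer example, the preload step could look something like the following. I am assuming SQLite and a hypothetical customer table purely so the example is self-contained; your real script would target your own test database and schema.

```python
import sqlite3


def preload_customers(connection, count):
    """Insert customer rows directly instead of creating them through the UI."""
    connection.execute(
        "CREATE TABLE IF NOT EXISTS customer ("
        "id INTEGER PRIMARY KEY, name TEXT NOT NULL)"
    )
    rows = [(f"Preloaded Customer {n}",) for n in range(1, count + 1)]
    connection.executemany("INSERT INTO customer (name) VALUES (?)", rows)
    connection.commit()


def estimated_savings_hours(preloaded_tests, minutes_per_customer):
    """Time saved by skipping the UI setup in every preloaded test."""
    return preloaded_tests * minutes_per_customer / 60
```

With the numbers from the example, estimated_savings_hours(500, 3) works out to 25 hours, which matches the estimate above.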
Test runs by area
With this technique, we can reduce the total execution time in a given period by splitting up test areas and running them either at different times or in response to changes in certain areas.
You have to be very careful with this approach, because if you don’t do it correctly, you can start to erode some of the safety your BAT tests are providing you.
I would only do something like this as a last resort, because it is so easy to virtualize today, and hardware is so cheap.
I’d much rather run too many tests than too few.
With test randomization, we take our total desired execution time and divide it by the average time for running a test. That gives us the number of tests we can afford to run. Each time we run the suite, we randomly pick that many tests, so we only run the number of tests that will fit in the desired execution time.
This choice is also a compromise that I typically don’t like to take.
It can be very useful though, combined with other techniques when you still don’t have enough time to execute your entire test suite.
The basic idea here is that you are going to randomly run tests each time to fill up the time you have to run tests.
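Here is a minimal sketch of that calculation and selection. The function and parameter names are hypothetical, and the optional seed is my own addition so that a particular random run can be reproduced when a test fails.

```python
import random


def pick_random_tests(test_names, time_budget_minutes, average_test_minutes, seed=None):
    """Pick as many random tests as fit into the time budget."""
    if average_test_minutes <= 0:
        raise ValueError("average_test_minutes must be positive")
    # Desired execution time divided by the average time per test.
    how_many = int(time_budget_minutes // average_test_minutes)
    how_many = min(how_many, len(test_names))
    rng = random.Random(seed)
    return rng.sample(list(test_names), how_many)
```

For example, with a 60 minute budget and tests that average 2 minutes each, this picks 30 tests per run.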
This one seems fairly obvious, but can be very helpful.
Often I will see teams starting out with automation, trying to write way too many BAT tests for a given feature. Sure, with automated tests it is possible to run a test for every single possible combination of values in your 5 drop downs, but will it really benefit you?
In many cases you have to think about what you are trying to protect against with your BATs. Sometimes running every combination of data selection choices is going to be important, but other times you are only going to need to write tests to test a few of the happy path scenarios.
It is important to find a balance between test coverage and test volume and not just for execution time. There is a logistical overhead to having a large volume of mostly redundant tests.
So even though this technique might seem dangerous and counter-productive, I will almost always employ it to some degree.
Here are some of the things you might want to watch out for as you are scaling out and streamlining your execution of BATs:
- Parallelization issues. If you are using shared state, you can run into big trouble when your tests are executing in parallel. There are many manifestations of this. You could have issues at the database level or at the local machine memory level. The best way to avoid this kind of problem is to use separate virtual machines for each test execution, and not reuse data setup between test cases.
- Ineffective error reporting. If you run a huge volume of tests in parallel, you better have a good way to sort through the results. It is much harder to figure out why things failed when they are run across multiple machines.
- Test order dependencies. Make sure tests don’t rely on being run in a certain order or you will have lots of pain when you disrupt that order.
- Environment setup. Make sure all your test execution environments are exactly the same unless you are specifically testing different environments for execution. You don’t want tests failing on one machine but passing on another.
Once you’ve built some smoke tests with your shiny new automation framework, you are going to want to get those smoke tests up and running as soon as possible…
But! You might want to consider holding off for a second and reading this post!
It is worth taking a bit of time and thinking a bit about the strategy and psychology of adding the smoke tests to your regular build process.
There are several things we are going to want to discuss and think about before adding automated tests to a build.
Why are we adding the smoke tests to the build?
I think it is always important that before we do something we ask ourselves why we are doing it. Doing things, because someone said it is good, or because it seems right is not a very good reason.
The primary benefit we are going to get by adding smoke tests to the build now instead of waiting for a larger portion of tests to be written is that we are going to be able to immediately tell if a basic functionality of the system is broken instead of finding out after we have deployed the build to different environments and used it.
We want to be able to get this value as soon as possible. We don’t want to have to wait until we have a huge amount of tests, because even a few tests will give us some value in this area.
The other reason we are adding the tests to the build is so that we can notify developers when a test fails, so that they know to fix whatever was broken. By having the smoke tests run as part of the build, we are able to reduce the amount of time between when a break is introduced and when it is discovered.
This concept of reducing the delta between introduction and discovery is very important, because it makes it much easier to identify what caused a break.
Perhaps a distant 3rd reason to add the smoke tests at this stage, is to prove out our framework and technologies. Better to start small with a small number of tests and get everything working.
With those ideas in mind, we can move onto some of the more important considerations.
Don’t ever add failing tests to the build
When you add your smoke test suite to the automated build, all of the tests should pass!
I will say it again, because it is really important and ignoring this advice may doom your entire BAT project.
Do not ever add failing tests to the build!
It seems like an innocent enough thing, but it causes great harm, because it is very much like the boy who cried wolf. Consider what happens when we introduce our new BAT system and we add our smoke tests to the build, but we have 3 failing tests.
First thing that will happen is that our tests will run during the next build and 3 will fail. This should cause a build failure notification of some sort to go to the team, at which time you will be in the uncomfortable position of saying something to the effect of “ignore that fail, those are real defects, but they are known issues.”
Next, let’s assume a developer checks in some bad code and it causes a 4th test to fail.
Do you think most people will notice this 4th failing test?
Even if they do, do you think they will just assume it is another known issue?
Worse yet, they may think your BATs are meaningless and just fail randomly.
Do you see where I am going with this? We want to start off with the correct expectations and we want test failures to be meaningful. To ALWAYS be meaningful. The second you have test failures that are not meaningful, you lose credibility for any future meaningful failures.
This is such a critical psychological battle with build systems and automated tests that I have often implemented a policy where no one is allowed to work on anything else but fixing the build or test failure when a build fails. When you have 20 developers either sitting idle or working on one issue for 2 hours, it sends a pretty clear message that we take build failures seriously.
Make sure your tests never undermine this message.
What do I do with failing tests then?
If they are real defects, put them in your defect tracking system or create backlogs for them and make sure the tests are disabled until those defects are fixed.
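How you disable a test depends on your framework. As one illustration, here is a sketch using Python's standard unittest module, where a test can be skipped with a reason that points back at the tracked defect. The test names and the defect ID DEF-123 are made up for the example.

```python
import io
import unittest


class CustomerSmokeTests(unittest.TestCase):
    def test_customer_can_log_in(self):
        # Stand-in for a real smoke test that currently passes.
        self.assertTrue(True)

    @unittest.skip("Known defect DEF-123 (hypothetical ID); re-enable when fixed")
    def test_customer_can_reset_password(self):
        # This would fail today, so it stays disabled until the defect is fixed.
        self.fail("password reset is broken")


def run_smoke_tests():
    """Run the suite quietly and return the unittest result object."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(CustomerSmokeTests)
    runner = unittest.TextTestRunner(stream=io.StringIO(), verbosity=0)
    return runner.run(suite)
```

The build stays green, and the skip reason keeps the disabled test visible so it does not get forgotten once the defect is fixed.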
Your automated tests should be acting as regression tests for things that are already working. It is absurd to think that it is even close to cost effective to write automated tests to actually functionally test new features or code!
It is the process of writing the automated test that should uncover defects. Running of automated tests should uncover new defects where functionality has regressed from working to not working.
Here is a list of some other considerations you will want to think about before embarking on this journey.
- How will you actually run the tests? (Will you use the build script to kick it off, some 3rd party plugin, etc.?)
- Where will you run the tests? (Will you use the build server itself, or another machine? You probably don’t need to worry about scaling out to multiple machines right now, but you should at least keep it in the back of your mind.)
- How long will it take to run the smoke tests? (Fast is better, because feedback cycle is reduced. If you have a really long test suite, you might want to pare down the smoke tests and run some with every build, and all of them every night for now.)
- Will developers be able to run the tests locally? (Make sure you consider how to do this before putting the tests into the build. Nothing more frustrating than being measured against a bar you can’t yourself see.)
- How will you notify and report results from the test failures? (We don’t care about passes, only failures, but you need a good way to report the result that prompts action.)
- What is the policy for test failures? (Make sure there is some agreement on what kind of action will be taken when tests fail. A great place to start, if you can get management to buy in on it, is all work stops until all tests pass. Extreme, but extremely effective.)
- How will you prevent false fails and identify them? (Don’t get stuck with egg on your face. If your framework fails and the problem is not a real defect, you need to be able to quickly identify it and fix the framework problem. There are a few strategies for doing this, but that is for another post.)
- How do new tests get added and commissioned? (You are going to be constantly growing your test suite, so you will need a good method for adding tests to the smoke test suite.)
Remember, you only get one chance at a first impression!
So I’ve had Enterprise Integration Patterns sitting on my bookshelf for quite a while now. I had skimmed it a few times, but never really gave it a read.
It’s a hefty book that you could definitely use to cause some major kidney trauma to an unsuspecting DBA if you sneak up on him from behind and jab the pointy end of the book into his unprotected backside.
I finally got around to reading this because it is one of my last remaining analog books. It is part of my quest to cleanse my life of all possessions that are not digital or are not monitors.
This book is really about messaging. Don’t let the title fool you. Definitely consider picking up this book if you want to use a messaging platform like BizTalk, MSMQ, or JMS to integrate several applications together.
- This book is extremely detailed about each messaging pattern, when to use it, and how to implement it. If you are seriously going to consider implementing a messaging solution, you need this book. Honestly, I have done some messaging without it and now that I have read it, I feel like I really missed the whole point before.
- Multi-language / technology. This book is generalized enough to not push you in a language or technology decision, but has specific examples in Java and C#.
- Simple to understand. I was rushing through this book because I wanted to get through it, and I found that I was picking up pretty much every concept being thrown at me.
- Excellent reference. I can see using this book in the future to go back and solve some sort of problem dealing with messaging.
- Broken down into perfect size pieces. If you read this book, you should have all the tools you need to solve any kind of complex messaging scenario. By thinking of messaging in terms of the patterns or blocks in this book, very complex problems become much simpler.
- It is freaking long. Seriously. This is a long book. It has some diagrams and some code, but it is long. Get ready for an adventure.
- It’s a little dry, probably because it is so long. Some of the code examples are a bit repetitive, and no one ever wants to see XML soap bindings on pages in a book.
What I learned:
I have to say that I really did learn a large amount of information from this book. I really feel like I got a good understanding of how to apply messaging patterns to various sorts of problems.
I feel like this book gave me a really big toolbox of all the possible tools that I would need to solve any messaging problem.
I also learned just how easy it is to use MSMQ and JMS and throw messages on a pipe. It’s really not that bad.
After having used BizTalk a while ago and not really understanding what it was or what it was trying to do besides allow you to change file formats between a bunch of different clients, I feel like this book definitely opened up my eyes to the true value of a solution like that. If I had a BizTalk project now, I am sure I would be much more effective after reading this book.
Overall, I would definitely recommend that every developer that is working with messaging read this book. Even if you are not, I would still recommend reading it so that you can have your eyes opened up to how messaging can solve many of the problems we try to solve in create-ftp-batch-cron-jobish ways.
I spent a good amount of time last night troubleshooting a “works on my machine” problem.
It takes pain to learn something; this pain perhaps was good. It reminded me of a concept that is really important in your software development infrastructure.
I have three golden rules of development environments and deployment:
- I should be able to run a software build locally that is the exact same build that will be run on the continuous integration server.
- Only bits that were built by the continuous integration server are deployed and the same bits are deployed to each environment.
- Differences in configuration in environments only exist in one place and on that machine.
You don’t have to follow these rules, but if you don’t, most likely you will experience some pain at some point.
If you are building a new deployment system, or you are revamping an old one, keeping these 3 rules in mind can save you a large amount of headache down the road.
I think it is worth talking about each rule and why I think it is important.
Rule 1: Local Build = Build Server Build
If you want your continuous integration environment to be successful, it needs to appropriately report failures and successes.
If your build server reports failures that are false, no one will believe the build server and you will be troubleshooting problems that are build configuration related instead of actual software problems. Troubleshooting these kinds of problems provides absolutely no business value. It is just a time sink.
If your build server reports false successes, you will only discover the issue when you deploy the code to another environment. You will waste time deploying broken code, and you will have a long feedback loop for fixing it.
As a developer, I should be able to run the exact same command the build server will run when I check in my code. I would even recommend setting up a local version of the continuous integration server your company is using.
By being able to be confident that a software build will not fail on the build server or during deployment if it doesn’t fail when running it locally, you will prevent yourself from ever troubleshooting a false build failure. (The deployment still could fail, and the application could still exhibit different behavior on different environments, but at least you will know that you are building the exact same bits using the exact same process.)
Rule 2: Software Build Server Bits = Only Deployed Bits
Build it once and deploy those bits everywhere. Why?
Because it is a waste of time to build what should be the exact same bits more than once.
Because the only way to be sure the exact same code gets deployed to each environment (dev, test, staging, production, etc.), is to make sure that the exact same bits are deployed in each environment.
The exact same SVN number is not good enough, because an application is more than source code. You can build the exact same source code on two different machines and get a totally different application. No, I’m not loony. This is a true statement. Different libraries on a machine could produce different binaries. A different compiler could produce different binaries.
Don’t take a risk. If you want to be sure that code you tested in QA will work in production exactly the same way, make sure it is the exact same code.
This means you can’t package your configuration with the deployment package. Yes, I know you always have done it that way. Yes, I know it is painful to figure out another way, but the time it will save you by never having to question the bits of a deployment ever again will be worth it.
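One way to stop questioning the bits is to record a checksum when the build server produces the artifact and check it before every deployment. This is only a sketch of that idea, not something the rule requires: the function names and the app.zip file name are hypothetical, and SHA-256 is simply the hash I chose for the example.

```python
import hashlib
import tempfile
from pathlib import Path


def artifact_checksum(path):
    """Return the SHA-256 hex digest of a build artifact, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def is_same_bits(path, recorded_checksum):
    """True only if the artifact is byte-for-byte what the build server made."""
    return artifact_checksum(path) == recorded_checksum


def demo_promotion():
    """Show that an untouched artifact passes and a rebuilt one does not."""
    with tempfile.TemporaryDirectory() as tmp:
        artifact = Path(tmp) / "app.zip"
        artifact.write_bytes(b"bits from the build server")
        recorded = artifact_checksum(artifact)
        unchanged_ok = is_same_bits(artifact, recorded)
        # Simulate someone rebuilding the "same" source on another machine.
        artifact.write_bytes(b"bits rebuilt somewhere else")
        rebuilt_ok = is_same_bits(artifact, recorded)
        return unchanged_ok, rebuilt_ok
```

If the check fails in any environment, you know the bits you are about to deploy are not the bits you tested.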
Rule 3: Environment Configuration Resides in Environment
Obeying this rule will save you a huge amount of grief.
Think about it.
If the only thing different in each environment is in one place, in one file in that environment, how easy will it be to tell what is different?
I know there are a lot of fancy schemes for adding configuration to the deployment package based on what environment the deployment is going to. I have written at least 3 of those systems myself.
But, they always fail somewhere down the line and you spend hours tracing through them to figure out what went wrong and ask yourself “how the heck did this work again?”
By making the configuration for an environment live in the environment and in one place, you take the responsibility of managing the configuration away from the software build process and put it in one known place.
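As a rough sketch of what "configuration lives in the environment" could look like, the application below reads one file from one known place on the machine it runs on. The `APP_CONFIG_PATH` variable, the default path, and the JSON format are all hypothetical choices for this example; use whatever fits your setup.

```python
import json
import os
from pathlib import Path

# Hypothetical default location; each environment keeps its own copy here.
DEFAULT_CONFIG_PATH = "/etc/myapp/config.json"


def load_environment_config(env=os.environ) -> dict:
    """Load this environment's settings from its single config file.

    The deployment package never contains this file. The environment does.
    """
    path = Path(env.get("APP_CONFIG_PATH", DEFAULT_CONFIG_PATH))
    with path.open(encoding="utf-8") as handle:
        return json.load(handle)
```

The same build can then run on dev, test, and production without change, because the only thing that differs is the file each environment already has.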
I love talking about tools and automating. I’ve written about having a dedicated developer tools team, and what you should automate. This time I want to talk about choosing between what I call vertical difficulty and horizontal difficulty when solving a problem.
Horizontal difficulty is difficulty that is associated with just doing the work as the current structure or tooling exists at that moment.
Consider the problem of moving a washer and dryer. If you have no tools and you just have to lift it, there is some horizontal difficulty involved.
In programming terms horizontal difficulty might look like writing a complicated SQL statement with multiple conditional joins because the data is all over the place. Or writing a web page without using a framework because your application doesn’t have one.
Vertical difficulty is the difficulty associated mainly with building tools or frameworks. It is the kind of difficulty that exists in simplifying a problem by going a layer up to “meta” solve the problem.
If you are familiar with Calculus in mathematics, Calculus is an example of what I would call vertical difficulty. Many mathematical problems are solved through the use of Calculus by taking the level up one higher and solving the problem there.
To keep with the same example of moving a washer and dryer, the vertical difficulty would be building a cart or dolly to move the washer and dryer. An important point here, which I will make again, is that in many cases the amount of raw effort required to build a dolly or cart, or even to figure out a way to procure one, will be equivalent to the effort required to move the washer and dryer.
In terms of code, vertical difficulty might be creating an error handling framework, creating a custom control for your web page, using views to simplify SQL data access, or even to repartition and move data to make a better model.
Where horizontal difficulty represents brute force, vertical difficulty represents mental fatigue.
What about that sawhorse?
If you are familiar with woodworking or construction, you will have no doubt seen a sawhorse. A sawhorse is a platform that can be used to hold something so you can cut it.
Sawhorses are usually constructed on the jobsite before any other work begins.
Why? Well, have you ever tried to hold a piece of wood and cut it straight? How about searching for different objects in your garage that you can prop the wood on so that you can get it high enough above the ground that you can put a saw through it?
Experienced craftsmen build the sawhorse first. They don’t start cutting pieces of wood and then build the sawhorse. An experienced craftsman knows that by building the sawhorse first, he will save time on each cut. His cuts will be more accurate and he might just be able to bring that sawhorse to his next job.
Every time you sit down to solve a programming problem, you should think about whether or not you should be building a sawhorse first.
Are you saying always build the sawhorse?
No, not at all. If you are going to cut one piece of wood, do not build a sawhorse. If you are going to cut two pieces of wood, don’t do it either. I won’t tell you how many pieces of wood it will take to pay off, but I will tell you 3 things:
- It doesn’t take many cuts for a sawhorse to pay back the time it takes to build it.
- The more sawhorses you build, the faster you get at building them.
- You are always wrong about how many cuts you are going to make. When you estimate 3 it might end up being 20.
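If you want to put rough numbers on the first point, the break-even arithmetic is simple: divide the time it takes to build the tool by the time it saves on each use. The sketch below is only an illustration, and the function name and the minute values in the usage note are made up.

```python
import math


def break_even_cuts(build_minutes: float,
                    minutes_per_cut_without: float,
                    minutes_per_cut_with: float) -> int:
    """Number of cuts after which building the tool has paid for itself."""
    saved_per_cut = minutes_per_cut_without - minutes_per_cut_with
    if saved_per_cut <= 0:
        raise ValueError("the tool must make each cut faster to ever pay off")
    return math.ceil(build_minutes / saved_per_cut)
```

With a hypothetical 30 minutes to build the sawhorse, 10 minutes per cut without it and 4 minutes per cut with it, `break_even_cuts(30, 10, 4)` gives 5. Since your estimate of the number of cuts is probably low, the real payoff tends to arrive sooner than you expect.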
Vertical vs horizontal difficulty
It is very important to weigh out the pros and cons of each before making a decision which way to go. I am, of course, going to try and lean you towards choosing vertical difficulty over horizontal most of the time, but ultimately it is up to you.
Let’s look quickly at some pros and cons for each (very generalized.)
Horizontal pros
- Can follow a well-ridden path. Usually there is an example of how to solve the problem already. (Someone has done it before.)
- Less thinking, you just follow the approach and go; after some amount of hours you will be done. (Consider copy and pasting each cell of an html table to a spreadsheet, vs writing a program to parse it.)
- Less risk, you are very likely to get to your destination with minimal problems.
Horizontal cons
- Boring. This is not really going to challenge that programmer-blog-reading brain of yours.
- You or someone else will probably be doing the same thing again. Solving the problem once only helps to beat down the weeds in the trail, but it doesn’t make it shorter.
- You might be building on top of a bad foundation. By adding one-offs as individual solutions to the problem, the general case can become more hidden. (If you want to solve the problem better later on, you make it harder each time you solve it the horizontal way.)
Vertical pros
- Simplified working space. Once you solve a problem a vertical way, you end up building an abstraction that makes the problem seem easier at the lower level. (Think about connectors on your motherboard vs individually connecting each wire.)
- Reuse. Many times when you solve a problem the vertical way, you can reuse that solution to solve future problems in almost no time at all. (Build connector couplings for wires and next time you can just snap them together.)
- Bigger picture understanding of the system. When you take the time to go up a level and solve a problem, you can see the bigger picture better and can understand the system as a whole better. This will lead to better solutions and fewer mistakes later.
- You are developing a skill that is multi-purpose and can be applied more widely than a very specific skill which might be developed in a horizontal solution. (Think about working at McDonald’s vs running several McDonald’s.)
- Clean. Usually you will end up with less code. Less code means fewer bugs. Changes happen in one place instead of 50.
Vertical cons
- It can be hard mentally. It can require a higher level of skill. Not everyone who can solve the problem horizontally can solve it vertically.
- Higher risk. If you mess up along the path of the horizontal solution, you can probably go back a few steps and fix it. If you mess up along the vertical solution, you might have to scrap it and start over. (Building a house vs building a microchip.)
Okay, that’s it.
Wait, what? Did you say I forgot the biggest con of Vertical difficulty?
No, I didn’t. I left it out on purpose.
Vertical difficulty does not always mean it takes more time. Sometimes it is actually faster to do the vertical difficulty path even when “cutting one piece of wood.”
I have seen Perl programmers and gurus parse through text or whip up a meta-solution that can solve a problem faster than I could have done it manually once. And they have a script around to do it again.
I have seen VI wizards edit the heck out of a text file much faster than I could point and click to do the same thing.
Scripting languages and editors like VI are designed for solving vertical problems. When you are using VI and issuing commands to edit text, you are solving a vertical problem. You are operating at a high level to edit a text file.
Many times you will find that the vertical solution is not only faster the first time you implement it, but it also makes the solution almost instant the next time around.
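To make the earlier html table example concrete, here is a minimal sketch of the vertical approach: a small program that parses the table and writes it out as CSV, instead of copying each cell by hand. It uses only Python's standard library. The class and function names are my own, and it assumes a single, simple table with no nesting.

```python
import csv
import io
from html.parser import HTMLParser


class TableParser(HTMLParser):
    """Collect the text of every cell, grouped by table row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None   # cells of the row being read, if any
        self._cell = None  # text fragments of the cell being read, if any

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


def html_table_to_csv(html: str) -> str:
    """Convert the rows of a simple html table into CSV text."""
    parser = TableParser()
    parser.feed(html)
    parser.close()
    out = io.StringIO()
    csv.writer(out, lineterminator="\n").writerows(parser.rows)
    return out.getvalue()
```

Writing this the first time may take about as long as pasting a small table by hand, but the second table costs almost nothing.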
I’ve been seeing it and hearing about it more and more, and every time I do, I cringe.
“Hey, how do I test out our web app?”
“Umm… ok, that sounds great.”
What this is doing
What I am talking about is going to C:\windows\system32\drivers\etc\hosts and adding entries to that file that map ip addresses to host names.
The hosts file is basically a mapping for ip addresses to host names.
So if you put an entry,
in the file, when you type google.com into your browser you will go to bing.com instead. (That is the ip address for bing.com.)
Why would someone do this?
Well, if you have some code, automated scripts, or process that uses your real production domain name, you can modify your hosts file so that the production host name gets resolved to the test ip address.
Yes, I do realize this may seem cool and clever.
Heck, you might have stopped reading this blog and called an emergency meeting so you can tell everyone on your team how they can modify their hosts file to magically make your production scripts work on test without passing in the url.
What could possibly go wrong?
Let me give you a scenario:
“Okay Jeff, I’m gonna run the test scripts on my new workstation to make sure everything is still working right.”
“Joebob, how do those fancy dandy test scripts of yours work again?”
“Well Jeff, they drop all the data in the test database and then create new test data, run the tests, and clean up the data after them again.”
… Joebob runs the test scripts and watches as they all pass, thinking about how awesome and clever he is.
RING… RING… RING…
“Hello, this is Joebob.”
“Hi Joebob, this is the CEO, Mr. Wonky, I am getting reports that there are no products at all being displayed on our production website.”
“Oh shnikies! I must have forgotten to modify my hosts file to point wonkywares.com to test.wonkywares.com. I ran my test scripts against production.”
… Joebob pulls a wonkywares revolver out of his bottom drawer, right next to the single malt scotch he keeps for “planning day.”
I am sure you can guess what happens next.
Let me list for you some of the bad things that can happen if you modify your hosts file to be clever, instead of parameterizing your scripts or code to be able to take a configurable url.
- You run your test scripts against production because you forgot to configure your hosts file on your new machine.
- You think you modified some production data for a data-fix, but you actually only modified test data.
- Someone else uses your machine and doesn’t know that you have a modified hosts file.
- You use someone else’s machine and forget they don’t have a modified hosts file, so you think you are going to the test website, but you are really modifying live production data.
- Your IT guys switch the IP addresses of the staging and production server as part of a rotation so they can do maintenance work on what was the production hardware. You now are pointing to production instead of staging and have no clue.
- Someone else tries to run your script or code and is not aware that they have to modify their hosts file first. They run your scripts against production.
- You change your hosts file and expect the change to immediately take effect, but your browser is smart and has cached the ip address of the page you just visited.
I am sure you can think of plenty more reasons.
So instead of being “clever” and modifying your hosts file, try making whatever code you are writing take in a parameter for the url to use, or read it from a configuration file.
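Here is a minimal sketch of what that could look like for a test script. The `--base-url` flag and the `TEST_BASE_URL` variable are hypothetical names I picked for the example. The refusal to run against the production host is an extra safety net I added on top of the advice above, using the wonkywares.com name from the story.

```python
import argparse
import os
from urllib.parse import urlparse

# Hosts the test scripts must never touch (taken from the story above).
PRODUCTION_HOSTS = {"wonkywares.com", "www.wonkywares.com"}


def resolve_base_url(argv=None, env=os.environ) -> str:
    """Return the url to test against, taken from a flag or the environment."""
    parser = argparse.ArgumentParser(description="Run the test scripts.")
    parser.add_argument("--base-url", default=env.get("TEST_BASE_URL"),
                        help="site to run the tests against")
    args = parser.parse_args(argv)
    if not args.base_url:
        # No silent default: the caller has to say where the tests will run.
        parser.error("pass --base-url or set TEST_BASE_URL")
    if urlparse(args.base_url).hostname in PRODUCTION_HOSTS:
        parser.error("refusing to run test scripts against production")
    return args.base_url
```

Now the target is visible on the command line or in the environment, instead of hiding in a file that nobody remembers editing.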
In my previous post, I talked about the idea of having a simple branching strategy and why I prefer one where everyone works off the same branch.
In this post I will show you how to create what I believe is the most simple and effective branching strategy.
Take a look at this diagram of a sample project’s code lines:
Walking through it
The idea here is very simple. Let’s walk through a development cycle together:
- Development starts. Everyone works off of trunk. Code is frequently checked into trunk, many developers checking in code 3-4 times a day, as they complete small quality sections of development.
- The continuous build server is continuously building and checking the quality of the code every single time code is checked in. Any integration problems are immediately fixed.
- Enough features are done to create a release. Trunk is tagged for release and a release 1 branch is created representing the currently released production code.
- Developers continue to work on trunk not being interrupted by the release.
- A customer finds a high priority issue in Release 1.
- A Rel 1 Hot Fix branch is created, branched off of Release 1 to fix the high priority issue. It turns out that a good fix will take some time. Team decides the best course of action is to apply a temporary fix for now.
- Rel 1 Hot Fix is done and merged back into Release 1 branch. Release 1 is re-deployed to production.
- In the meantime another emergency problem shows up that must be fixed before the next release. Rel 1 Hot Fix 2 branch is created.
- The bug fix for Rel 1 Hot Fix 2 is a good fix which we want in all future releases. Rel 1 Hot Fix 2 branch is merged back to Release 1 branch, and merged back to trunk. Release 1 is redeployed.
- In the meantime work has continued on trunk, and the team is ready for Release 2.
- Release 2 branch is created…
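The walk-through above can be sketched as a toy model. This is not how a source control tool works internally. The `CodeLine` class and the change names are invented purely to show which changes end up on which code line.

```python
from dataclasses import dataclass, field


@dataclass
class CodeLine:
    """A toy code line: an ordered list of named changes."""
    name: str
    changes: list = field(default_factory=list)
    branch_point: int = 0  # how many changes were inherited from the parent

    def commit(self, change: str) -> None:
        self.changes.append(change)

    def branch(self, name: str) -> "CodeLine":
        # A branch starts as a copy of its parent at this moment.
        return CodeLine(name, list(self.changes), len(self.changes))

    def own_changes(self) -> list:
        # Only the changes made on this code line after it was branched.
        return self.changes[self.branch_point:]

    def merge_from(self, other: "CodeLine") -> None:
        for change in other.own_changes():
            if change not in self.changes:
                self.changes.append(change)


def walk_through():
    trunk = CodeLine("trunk")
    trunk.commit("feature A")
    trunk.commit("feature B")
    release_1 = trunk.branch("release-1")      # branch when you release
    trunk.commit("feature C")                  # development is not interrupted

    hot_fix_1 = release_1.branch("rel-1-hot-fix")
    hot_fix_1.commit("temporary fix")
    release_1.merge_from(hot_fix_1)            # release only, never trunk

    hot_fix_2 = release_1.branch("rel-1-hot-fix-2")
    hot_fix_2.commit("permanent fix")
    release_1.merge_from(hot_fix_2)
    trunk.merge_from(hot_fix_2)                # good fix goes to trunk too

    release_2 = trunk.branch("release-2")
    return trunk, release_1, release_2
```

After `walk_through()`, trunk holds features A, B, C and the permanent fix but never sees the temporary fix, while Release 1 holds features A and B plus both fixes.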
Breaking it down
I gave a pretty detailed walk-through for a very simple set of actual steps. But, I hope you can see how simple this process really is.
The basic idea here is that we are trying to decouple releases from development as much as possible. The team is always going to keep chugging along, building new features and enhancing the code base. When we decide we have enough features for a release, we simply branch off of trunk and create the release branch.
We can even do some testing on the release branch before we go to production if we need to without impacting future development.
The release branch code-lines never come back to trunk. They don’t need to, they only exist so that we can have the exact production code and make modifications to it as hot-fixes if we need to.
We branch hot-fixes off of the release branch so that we can work on them independently, because not all hot-fixes go back to the main code-line. We can make a hot-fix just for the current release, or we can merge it back to trunk to make it a permanent fix.
That is all there is to it. This kind of branching strategy almost completely eliminates merges. The only merges you ever do are small merges for hot-fixes.
Your branching strategy does not have to be complicated. A simple strategy like this can fit almost any software development shop.
Frequently disputed points
Almost immediately when I introduce this simple system someone says:
What about half-completed features? I don’t want to release half-completed features. Using this strategy with everyone working off trunk, you will always have half-completed features.
So what? How many times does a half-completed feature cause a potential problem in the system? If the code is quality and incrementally developed, it should not impact the rest of the system. If you are adding a new feature, usually the last thing you do is actually hook-up the UI to it. It won’t hurt anything to have its back-end code released without any way to get to it.
Continuous integration (especially running automated functional tests) trains you to always keep the system releasable with every commit of new code. It really isn’t hard to do this; you just have to think about it a little bit.
If worse comes to worst and you have a half-finished feature that makes the code unreleasable, you can always pull out that code on the release branch. (Although I would highly recommend that you try and find a way to build the feature incrementally instead.)
If you know you’re going to do something that will disrupt everything, like redesigning the UI, or drastically changing the architecture, then go ahead and create a separate branch for that work. That should be a rare event though.
I need to be able to develop the features in isolation. If everyone is working off of trunk, I can’t tell if what I did broke something or if it is someone else’s code. I am impacted by someone else breaking things.
Good, that is some pain you should feel. It hurts a lot less when you’re continuously integrating vs. working on something for a week, merging your feature and finding that everything is broken.
It is like eating a meal. All the food is going to end up in the same place anyway. Don’t worry about mixing your mashed potatoes with your applesauce.
If something someone else is doing is going to break your stuff, better to fail fast than to fail later. Let’s integrate as soon as possible and fix the issue rather than waiting until we both think we are done.
Besides that, it is good to learn to always check in clean code. When you break other people and they shoot you with Nerf guns and make you wear a chicken head, you are taught to test your code locally before you check it in.
How to be successful
How can you be successful at this simple strategy?
- Make sure you have a continuous integration server up and running and doing everything it should be doing.
- When you work on code, find ways to break it up into small incremental steps of development which never break the system. Hook up the UI last.
- Always think that every time you check in code, it should be code you are comfortable to release.
- Check in code at least once a day, preferably as soon as you make any incremental progress.
- Test, test, test. Test locally, unit test, test driven development, automated functional tests. Have ways to be confident the system never moves backward in functionality.
- So important I’ll say it twice. Automated functional tests. If you don’t know how to do this, read this.
- Release frequently instead of hot-fixing. If you never hot-fix you will never have to merge. If you never have to merge, you will live a longer, less-stressed life.
- Don’t go back and clean up code later. Write it right the first time. Check it in right the first time.
Hopefully that helps you to simplify your branching process. Feel free to email me or post here if you have any questions, or are skeptical that this could work.
Source control management has always been one of those sticky topics that raises many questions. Many veteran programmers are baffled by the ins and outs of branching and merging. And for good reason; it is a difficult topic.
I’ve been around in many different organizations. I’ve been the person who was told what the SCM strategy was, and I have been the person who designed it. I’ve seen it done just about every way possible, and after all that, do you know what I think? (I’ll give you a hint, it’s the name of this blog)
Keep it simple. Working directly off the trunk is by far the best approach in my opinion.
It almost sounds like heresy when I actually type it onto my screen, but if you’ll bear with me for a moment, not only will I show you why I think this is essential for an Agile process, but I’ll show you how to make it work.
Continuous integration is the root
As a software development community as a whole, I think most of us can agree that CI is good and provides real value.
What is Continuous Integration?
Continuous Integration is a software development practice where members of a team integrate their work frequently, usually each person integrates at least daily – leading to multiple integrations per day. Each integration is verified by an automated build (including test) to detect integration errors as quickly as possible.
So the idea here is to integrate as soon as possible, the faster the better. If something is going to break, we want it to break sooner rather than later.
The best way for this to happen is that whenever I check in my code it is instantly integrated with everyone else’s code and we build and test it.
This is a change from the old way of doing things, in which I am working on my code in isolation for some period of time and periodically I show up with a bunch of code and merge it all together.
The only practical way to have continuous integration is to have everyone working off the same branch. It doesn’t have to be trunk, but it has to be the same branch. Sure, there are some ways to have people work off different branches and automatically merge them when they commit as a trial integration, but that is a large amount of effort, and it can’t handle conflicts.
One of the “big dogs” in Agile is collaboration. Collaboration is pretty important. Working on separate branches tends to go against this idea. I know you can still collaborate when you have separate branches, but it is more likely that you won’t.
In general I have found that software developers in particular are the kinds of people who will go off and do their own thing if given the chance. It is not a bad thing, but it means that we need to set up developers to collaborate. If your process isn’t explicitly fostering collaboration, it probably won’t happen (at least not as much as you would like it to.)
If we are all working on the same branch we have to communicate. We have to work together to accomplish a goal. I can’t just work in isolation, because what I am doing affects you, and what you’re doing affects me. We have to talk about it.
I’m more likely to watch your code come in and say “hey, where is your unit test, bub?” You’re more likely to watch my code come in and say, “is there a reason you’re not using superWidgetX to solve that problem?”
If I check in some junk and break the build, everyone is impacted. This might seem like a bad thing, but it fosters a team mentality instead of an individual mentality. Think about boot camp (or at least what you have seen on TV.) What happens when one guy screws up? EVERYONE does push-ups. Why? To build a team, a team mentality.
I have found in Agile processes like Scrum, the best quality work is done when developers are paired up working on a backlog item. This isn’t impossible to do with individual developer branches or feature branches, but it is much harder and less likely to happen.
Keeping it simple
One of the central themes of my blog… of my life… is keeping it simple. Simple is good, it makes everything easier.
I am notorious for walking into an organization and ripping out all the ridiculous process that provides no value and putting in very simple and lean process in its place. It is in my nature.
If you have ever spent some time in what I like to call “merge hell”, you’ll understand what I am about to talk about.
I have been working with source control for a very long time. I have been the “merge master,” merging huge numbers of branches together on large projects. I have worked on developer branches, feature branches, release branches and everything in-between, and I still can’t get it right!
Don’t get me wrong. I know the theory behind it, I know how to do it. But merging can be so complicated at times that the chance of making a mistake is very high.
You really have to ask yourself a question here: “Is all the overhead of your complicated branching and merging strategy buying you real value that a simpler strategy would not?”
My general rule of thumb is, never branch unless you have to.
A simple Agile branching strategy
In my next post, I will show you what I think is the most simple and effective branching strategy. A strategy I have effectively used in the past and have developed over time. I’ll sum it up briefly here.
- Everyone works off of trunk.
- Branch when you release code.
- Branch off a release when you need to create a bug fix for already released code.
- Branch for prototypes.
That’s it, stay tuned for next time when I talk about how to make this simple branching strategy practically work in your Agile environment.