Mr. Developer, Prod is Not For You
In my Kanbanand guide I have a rule for infrastructure that says, “Production data never goes to any other environment.”
Someone recently asked what exactly this means.
It means that data in the production database never goes into dev, integration, qual, or whatever other environments you may have set up at your workplace. I think the “why” is more important than the “what”, and while I'm at it, I'll tell you why I won't give you, as a developer, access to production data at all.
Why production data stays in production
Let's start with a simple reason: security. Production data in most cases is going to contain at least some amount of sensitive information. If your company leaks sensitive information about customers or clients, you can almost guarantee that you will have a lawsuit on your hands.
It is much better to keep production data as safe as possible, and the best way to do that is to limit access to it. If you move production data to different environments, you greatly increase the number of places that data lives and, correspondingly, the number of people who have access to it.
Beyond the security concerns, there is the problem of relying on exact data to recreate a problem. Debugging an issue seen in production should not require the actual production data. If it does, there is a problem with the tooling you have created to support your application. You might consider creating some development tools that will allow you to simulate any production issue in a different environment without having to use real production data.
I know it is much easier to just use production data, but it is a bad habit that masks other problems with your tooling and support. Requiring frequent data migration between environments also imposes a large cost on the infrastructure and support of an application.
Why developers stay out of production
The reasoning is very similar. We could talk again about security. We could talk again about the problems with relying on production data to debug a problem in your application.
This really comes down to a discipline and constraint issue. By enforcing this constraint and practicing this discipline, you gain quite a few things out of necessity:
- Logging must be improved to understand what happened in production
- You are forced to consider creating a user click tracking mechanism
- You are forced to consider test data generation
- Testing becomes more complete because your team is forced to recreate issues independently rather than using production data
It might not be immediately apparent to you how these things are connected, but if you enforce this kind of policy, you will quickly find that you need better ways of understanding what the user did.
It may seem counterproductive to make a developer's job harder by preventing them from accessing production. In the long run, though, being forced to create tooling, better logging of how the user is using the system, and more complete testing will save time.
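To make the connection between logging and "what the user did" a little more concrete, here is a minimal sketch in Python. It is only an illustration under stated assumptions: the logger name `app.user_actions`, the helpers `log_user_action` and `digest_sessions`, and the JSON field names are all hypothetical, and it assumes each action is written as one pure JSON log line.

```python
import json
import logging
from collections import defaultdict

# Hypothetical logger name; any application logger would do.
action_log = logging.getLogger("app.user_actions")


def log_user_action(session_id, action, target):
    """Emit one user action as a JSON log line and return that line."""
    line = json.dumps(
        {"session": session_id, "action": action, "target": target},
        sort_keys=True,
    )
    action_log.info(line)
    return line


def digest_sessions(log_lines):
    """Group JSON action lines by session, keeping the order they were logged in."""
    sessions = defaultdict(list)
    for raw in log_lines:
        try:
            event = json.loads(raw)
        except json.JSONDecodeError:
            continue  # skip lines that are not action events
        if not isinstance(event, dict) or "session" not in event:
            continue
        step = f"{event.get('action', '?')} {event.get('target', '?')}"
        sessions[event["session"]].append(step)
    return dict(sessions)


# Usage: two interleaved sessions are separated back into per-user flows.
lines = [
    log_user_action("s1", "click", "/cart"),
    log_user_action("s2", "click", "/login"),
    log_user_action("s1", "submit", "/checkout"),
]
print(digest_sessions(lines))
# {'s1': ['click /cart', 'submit /checkout'], 's2': ['click /login']}
```

With a digest like this, a developer can replay the same steps in a non-production environment instead of asking for the production data itself.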
Some tips on making this a reality
Yes, I know, easier said than done. Let me help you get it done with these tips:
- Build enough logging into the application to make it easy to understand what the system is doing and the flow through the system.
- Build a detailed level of logging that can be turned on for different parts of a live running production system.
- Build or use tools that take data from your web server logs and application logs and translate that into a digest of how the user clicked through the system. (Doing this well will allow you to recreate most scenarios that may have happened in production.)
- Create migration tools that allow you to migrate and cleanse a particular piece of data. The goal here is not to take the whole database and dump it to another environment, but to take specific sets of data, cleanse them (get rid of all personally identifiable information), and put them somewhere else to examine and debug.
- Have ways to generate volumes of clean test data to put into non-production environments. This may look like a test script that runs through the application and clicks through many scenarios to generate test data, or a bunch of SQL scripts that populate test data tables.
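The last two tips can be sketched in a few lines of Python. This is a rough illustration, not a complete tool: the field names in `PII_FIELDS`, the helpers `cleanse_record` and `generate_test_customers`, and the shape of a customer row are all assumptions made for the example. Note also that a salted hash only pseudonymizes a value. Keep the salt secret, and remember that free-text fields can still hide personal information.

```python
import hashlib
import random

# Hypothetical set of column names treated as personally identifiable.
PII_FIELDS = {"name", "email", "phone"}


def cleanse_record(record, salt):
    """Return a copy of record with PII fields replaced by stable pseudonyms."""
    cleaned = {}
    for key, value in record.items():
        if key in PII_FIELDS and value is not None:
            raw = f"{salt}:{key}:{value}".encode("utf-8")
            digest = hashlib.sha256(raw).hexdigest()
            # Same input and salt always give the same pseudonym,
            # so relationships between rows survive the cleansing.
            cleaned[key] = f"{key}_{digest[:12]}"
        else:
            cleaned[key] = value
    return cleaned


def generate_test_customers(count, seed=0):
    """Generate synthetic customer rows; the same seed yields the same rows."""
    rng = random.Random(seed)
    rows = []
    for i in range(1, count + 1):
        rows.append({
            "id": i,
            "name": f"Test User {i}",
            "email": f"user{i}@example.test",
            "order_total": round(rng.uniform(5, 500), 2),
        })
    return rows


# Usage: cleanse one specific record, and build a small synthetic data set.
suspect_row = {"id": 7, "email": "jane@example.com", "plan": "pro"}
print(cleanse_record(suspect_row, salt="keep-this-secret"))
print(generate_test_customers(2, seed=42))
```

The cleansing function works on one record at a time on purpose, in keeping with the goal of moving specific sets of data rather than dumping the whole database.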