Your Automation Test Sucks
Ugh, I am so frustrated after today. The very fact that I am sitting here the night before my holiday break writing this blog post should indicate my level of frustration. Basically, one of my managers asked me to turn off one of my automated checks until it is more stable.
Although I’ve heard such silly comments before, after about 20 minutes of discussion I realized there was no convincing him that turning off an automated check is never a good idea, at any point in time. As a result, I am here, at night, venting the bitter rage of my wounded heart upon the web.
So, what is the Problem?
Our group has a dedicated team that is responsible for making sure our applications stay up 24/7. This is basically a team of 2 people: an Ops person and another Engineer who is rotated weekly. Whenever this team receives production failure notices, they must look into them, figure out if they are critical to our applications and either take action to fix the problem or contact the application’s Subject Matter Expert (SME). Often, this team needs to wake up in the middle of the night and even on weekends to look at some production failures. The job sucks!
I have an automated functional GUI check that runs every single hour to make sure that a user can log in to our application. This automated check pulls up our website, logs in, and then validates that the application successfully loaded. One night, this test failed and the support team needed to wake up at 2am to take a look and make sure that our application was working. When they manually reproduced the steps, the application was fine and there were no problems. Therefore, the core set of issues my manager has with this automated check is that (and I’m paraphrasing):
- It’s unstable
- It fails all the time
- It’s unfair that someone has to wake up in the middle of the night to find out that there is no issue
Hence, the automated check needs to be turned off!
Unstable Automated Check, Really?
As a result of these ludicrous accusations, I went digging into the automated check failure to figure out the root cause of the issues. It was actually pretty simple to figure out the problem. My automated checks run on a sexy technology stack: Selenium WebDriver driving browsers in the BrowserStack cloud. Thanks to BrowserStack, every functional GUI test has a video recording with Dev Tools open, text logs, and screenshot logs. My first step was to look at the video to see why the test actually failed.
Upon further analysis, I saw that there was an iframe that took 26 seconds to load! Obviously, the automated check was waiting for the login fields to show up so that the test could proceed. But because the iframe took so long to load, the test threw an exception.
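To make that failure mode concrete, here is a minimal sketch of what a polling wait does, written in plain Python rather than against the real Selenium API. Everything in it is hypothetical: `wait_until`, the `FakePage` stand-in, and the 25-second limit are assumptions for illustration, not the actual check or its real configuration.

```python
import time
from typing import Callable


def wait_until(condition: Callable[[], bool], timeout_s: float, poll_s: float = 0.5,
               clock: Callable[[], float] = time.monotonic,
               sleep: Callable[[float], None] = time.sleep) -> float:
    """Poll `condition` until it is true.

    Returns the seconds waited, or raises TimeoutError once `timeout_s` has elapsed.
    """
    start = clock()
    while True:
        if condition():
            return clock() - start
        if clock() - start >= timeout_s:
            raise TimeoutError(f"condition not met within {timeout_s} s")
        sleep(poll_s)


class FakePage:
    """Hypothetical stand-in for a page whose iframe is ready after a fixed delay."""

    def __init__(self, iframe_ready_after_s: float) -> None:
        self.now = 0.0
        self.iframe_ready_after_s = iframe_ready_after_s

    def clock(self) -> float:
        return self.now

    def sleep(self, seconds: float) -> None:
        self.now += seconds  # simulated time, so the example runs instantly

    def iframe_loaded(self) -> bool:
        return self.now >= self.iframe_ready_after_s


# An iframe that needs 26 simulated seconds, checked with an assumed 25-second limit.
slow_page = FakePage(iframe_ready_after_s=26.0)
try:
    wait_until(slow_page.iframe_loaded, timeout_s=25.0,
               clock=slow_page.clock, sleep=slow_page.sleep)
except TimeoutError as exc:
    print("check failed:", exc)
```

The wait itself behaves exactly as designed here. The only thing that changed between a passing run and a failing run is how long the page took.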
My question for you:
Is this the fault of the test for being “unstable” or is this the fault of the application for having these intermittent problems?
My argument is that this is not the fault of the check, but rather a flaw in the application. The application does not function in an efficient manner, so the check failed. I don’t care if this automated check fails once in one hundred test runs, it’s still an issue with the application. Just because we cannot reproduce it manually doesn’t mean we should turn off the automated check. In fact, as the images so efficiently captured by my check indicate, I have conclusive evidence showing the problem within the application.
The crucial question is:
Should an application take this much time to load?
If the answer is yes, then our team has no business in Software Development for producing such slow applications and actually thinking that they’re acceptable. I don’t know any application in the world that takes this long to load. Furthermore, the user encountering this slow load time may have been trying the application for the first time. And which users in today’s world are going to sit around and wait for 26 seconds for an iframe to load? Very few.
Even worse, this could have been an actual paying client, in which case, this could have resulted in complaints, frustration, and more lost resources. Either way, this hiccup amounts to a loss of income, which is not what any company wants.
Given all these considerations, it’s pretty evident that this load time is not acceptable. It’s a bug that needs to be fixed!
Does the Automated Check Fail Regularly?
Because of a well-constructed automation framework, it is extremely easy to figure out how “flaky” an automated check is. I just looked in the cloud to see all of the test run sessions for that specific test and then figured out how many of those failed.
Overall, it failed about once every 8 runs. I would definitely prefer that this automated check didn’t fail so often. Considering the evident instability of the application, there’s not much that I could do in this case. Two possible, but ultimately inadequate, solutions come to mind.
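The arithmetic behind that number is trivial, but it is worth showing. The sketch below assumes the session results have already been exported as a simple list of status strings; the list itself is made-up sample data, not the real run history.

```python
from fractions import Fraction


def failure_rate(session_statuses):
    """Return failed/total as an exact fraction for a list of status strings."""
    if not session_statuses:
        raise ValueError("no sessions to analyze")
    failed = sum(1 for status in session_statuses if status == "failed")
    return Fraction(failed, len(session_statuses))


# Hypothetical sample: 24 hourly runs, 3 of which failed.
statuses = ["passed"] * 21 + ["failed"] * 3
rate = failure_rate(statuses)
print(f"{rate.numerator} of {rate.denominator} runs fail ({float(rate):.1%})")
# prints "1 of 8 runs fail (12.5%)"
```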
Increase the Timeout?
Some automation engineers might suggest increasing how long the check waits for the application’s iframe before it throws an exception. Initially, that seems like a valid solution, except that it’s not.
Rather than fixing the functionality of the application to make it more stable, this solution simply covers up its failures by extending the time allowed for the application to load. Is 25 seconds not enough time for a single page element to load? In today’s high-speed internet world, this should be more than enough.
Add a Retry?
Another proposition that I heard was to add a retry before throwing the exception. Again, this does nothing to fix the functionality of the app; it just covers up the failure by creating an unrealistic use case. I would argue that very few real world users would ever reload a page if it is taking too long to come up. They would rather save their 25 seconds and go to another site that isn’t wasting their valuable time. So, why would I create a retry of the check so that it passes more frequently?
But what if a user did try again and reload the page? I know I do that sometimes. Still, this is not the kind of user experience that we want to be creating for our end users. Can you imagine if all your browsing sessions included constantly pressing the Refresh button on your browser? How bad of an experience would that be?
It happened to me recently. I was trying to shop on Amazon on Black Friday and the app was super slow. I knew that they were probably under an insane load, so I refreshed the browser to give them another chance. The app still did not load. So, I left. Instead, I went on Google and started searching for deals on the items. As a result, Amazon lost income.
Anyway, the goal of this automated functional GUI check is to simulate user behavior on our application in an automated fashion. Thus, if the user isn’t going to do a retry on our application, then the automated check shouldn’t retry either.
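A bit of arithmetic shows how well a retry hides the problem. The sketch below assumes each attempt fails independently with the roughly one-in-eight rate mentioned earlier. That independence is a simplifying assumption (slow loads may well cluster in time), and `reported_failure_rate` is a hypothetical helper, not part of any framework.

```python
from fractions import Fraction


def reported_failure_rate(per_attempt_failure, retries):
    """Chance that the first attempt and every retry all fail.

    Assumes attempts fail independently of each other.
    """
    return per_attempt_failure ** (retries + 1)


per_attempt = Fraction(1, 8)  # what a user hits on a single page load
for retries in (0, 1, 2):
    print(retries, "retries ->", reported_failure_rate(per_attempt, retries))
# 0 retries -> 1/8
# 1 retries -> 1/64
# 2 retries -> 1/512
```

Under these assumptions, a single retry makes the alert fire about eight times less often, while a user loading the page once still hits the slow iframe one time in eight.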
In fact, the manager’s misunderstanding of the purpose of automated functional testing is what led to this conversation in the first place. First, the automated check failed and sent out a text alert to the support team. Second, the support team saw the screenshots, opened the app and then tried to recreate the failure manually. Guess what, it passed for them! Thus, we might conclude, with my manager, that the automated check is unstable. However, the video, screenshots and logs recorded during the run tell us a completely different story.
In conclusion, yes, I would love for this check to fail less often, but I cannot come up with a more reasonable solution to this problem other than fixing the damn code base.
Should the Support Team Acknowledge Inconsistent Failures?
Yes, the support team should wake up and deal with these inconsistent alerts in the middle of the night!
Look, I understand that this really sucks. In fact, I have to do it all the time, and I hate it. I’m a huge health freak and it makes me really angry to know that I have to sacrifice my sleep for something like that. I don’t even sacrifice my sleep for my wife, for whom I would sacrifice almost anything. Therefore, I feel the pain maybe even more than most, especially when I need to wake up as a result of instabilities caused by other people’s applications. Ugh, my blood boils.
However, I also understand that it is my job to do so once in a while. And the Ops team needs to do this all the time. That’s their job; it’s what they get paid for.
And as much as it sucks to destroy my health because other teams wrote bad code, it’s what needs to be done. We can be angry about it, but we should channel our anger towards the right source. The automated check is NOT the source of our frustration in this case. The automated check just reveals the unpleasant truth. The check simply shows us that the code is unstable and it must be fixed.
The root cause of the problem is poorly written and poorly tested code. Since I started testing this app, I can confidently say that it’s garbage! It’s tough to hear, but it’s true.
Therefore, rather than complaining about the check, why don’t we complain about the code? Why don’t we wake the developer responsible for this problem, make him find the root cause and fix the instability? Why is everyone suffering except the person who was responsible for the problem?
Why must a QA Engineer defend the validity of his automated checks to his manager? Why must an Ops person wake up in the middle of the night to try and resolve these intermittent problems? This isn’t fair for anyone, and problems begin in a company when people don’t feel like they are treated fairly.
Had I coded or tested this functionality, I would man up, get out of bed, and fix my error. First, because it’s annoying to the end user. And second, because I wouldn’t want to burden my teammates with my mistakes. Would you?
Ultimately, the problem does not lie with a “flaky” check as suggested by management. Rather, the problem is in the faulty code of the application, which could be easily found and fixed. Killing the messenger (aka the automated check) for telling you the truth is not going to make the truth disappear. Instead, it will just delay the truth’s appearance until it comes back as something worse, like a client complaint.
One More Thought
Let’s think about this. Even if all of my assertions are complete nonsense (which happens sometimes, I admit), are we willing to lose the functionality coverage of one of the most important components of our application, the login?
Because of a “flaky” automated check, do we really want to stop monitoring to make sure that a user can always successfully log in and load the application? The user can have issues at many points throughout the flow of this use case:
- The iframe may not load
- They may not be able to log in
- Maybe they logged in, but the application doesn’t load
- Maybe the application loads, but not completely
If the iframe loading can be unstable, imagine the other negative user experience problems that can occur.
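One way to keep those failure points visible is to have the check report which stage broke instead of a bare pass or fail. The sketch below is a hypothetical outline in plain Python: `run_login_check` and the lambda stand-ins are assumptions for illustration, and a real check would replace them with its own browser steps.

```python
from typing import Callable, List, Optional, Tuple

Step = Tuple[str, Callable[[], bool]]


def run_login_check(steps: List[Step]) -> Optional[str]:
    """Run steps in order; return the name of the first failing step, or None."""
    for name, step in steps:
        try:
            ok = step()
        except Exception:
            ok = False  # an exception in a step counts as that step failing
        if not ok:
            return name
    return None


# Stand-in steps; each lambda pretends to be one stage of the real flow.
steps = [
    ("iframe loads", lambda: False),  # simulate the slow iframe never appearing
    ("user logs in", lambda: True),
    ("application loads", lambda: True),
    ("application loads completely", lambda: True),
]
print(run_login_check(steps))  # prints "iframe loads"
```

An alert that names the failing stage gives whoever is woken up at night somewhere to start looking, rather than just a red light.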
To turn off this test, or even to cover up its complaints, would mean losing some of the most critical monitoring for our application. If a user cannot log in to our app, they can’t use it and they can’t pay us. Therefore, it makes no sense to turn off or pad the check just so that it passes and allows our team members to sleep peacefully under a blanket of false comfort.
What have we learned?
So through this entire tirade, there was actually some really good information that I wanted to convey. Regardless of whether I was successful or not, I will re-emphasize those points here:
- Usually, a “flaky” check is not actually flaky. Rather, the code running the application is the unstable culprit. If you have a moderate understanding of automated software testing and follow automation testing best practices like the Page Object Pattern or KISS, then you can be almost sure that your tests are mostly stable. Obviously, if you are doing silly things like implicit waits, waits that are unrealistically short, or other automation anti-patterns, your tests are probably much less stable.
- If you can prove that your automated check isn’t “flaky”, then the only solution to the failures is to fix the code.
- Often you will face tough situations where a manager disagrees with your opinion. In that case, do your best to present all of the facts regarding why the automated check is failing in the first place. Convey to your superior that covering up inconsistent failures does not actually fix the bugs. Rather, people need to be held responsible for the code that they produce.
- Test early, test often. Although I did not specifically talk about this topic, this is one of the widely accepted ways to prevent bad code from happening. By making the code more testable, testing it earlier in the lifecycle, and testing more often, we can have fewer production issues and more sleep. Now that sounds like a win for everyone!
What are your thoughts? Do you agree or disagree with my assertions? Have you faced such issues before and, if so, how did you handle them?