A Survey of Flaky Tests, co-published by researchers at UK and US universities in October 2021, showed that even large firms such as Google and Microsoft are not immune to flaky tests: around 41% and 26% of their tests, respectively, were revealed to be flaky. The same survey found that flaky tests jeopardize CI efficiency: 47% of failed tasks that were manually restarted succeeded on the second attempt.
In this post, we will follow a couple of our clients' development teams as they overcome flaky builds and improve their productivity. First, and most evidently, time spent re-running and troubleshooting flaky builds is time taken away from other tasks.
Second, the time spent waiting has to be made up somewhere. Team members who switch context to work on another task while they wait also lose focus.
Third, delayed feedback on the impact of application changes hampers learning and creates waste.
The problem is more frequent than you may think.
It is not unusual for a build system to reach a point where, when a pipeline fails, the team's first reaction is to check whether it failed for reasons unrelated to the codebase. Perhaps there is an issue with a build agent that is running one of the tasks in the pipeline. The agent, for example, might have run out of storage space or gone offline. Perhaps the issue is caused by a mismatch between tool and library versions on build agents, developers' machines, and the production environment.
What if the failure is caused by non-determinism and interactions between tests inside the build, or, even worse, past builds that have changed the build agent's state?
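To make the idea of test interactions concrete, here is a minimal sketch of an order-dependent failure. Everything in it is hypothetical (the `_cache` dictionary, `lookup_price`, the two checks, and the `run_in_order` helper are illustrative names, not taken from any client's code). It assumes two checks share module-level state, so the result depends on which one runs first.

```python
# Hypothetical module-level state shared by every check in the process.
_cache = {}

def lookup_price(item):
    """Return a cached price, or a default of 10 when nothing is cached."""
    return _cache.get(item, 10)

def discount_check():
    _cache["book"] = 5  # leaves state behind for whichever check runs next
    assert lookup_price("book") == 5

def default_price_check():
    # Passes only if discount_check has not already polluted the cache.
    assert lookup_price("book") == 10

def run_in_order(checks, isolate):
    """Run checks in the given order and return the names of those that failed."""
    failed = []
    for check in checks:
        if isolate:
            _cache.clear()  # reset shared state before each check
        try:
            check()
        except AssertionError:
            failed.append(check.__name__)
    _cache.clear()  # leave a clean slate for the next experiment
    return failed

# Same code, different order, different result:
print(run_in_order([default_price_check, discount_check], isolate=False))
print(run_in_order([discount_check, default_price_check], isolate=False))
# Resetting the shared state before each check removes the order dependence:
print(run_in_order([discount_check, default_price_check], isolate=True))
```

In this sketch the failure disappears once each check starts from a clean state. Controlled build environments apply the same principle at the level of the whole build agent.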
Flaky builds reduce productivity and the quality of your tests.
The CI system at one of our clients was holding them back: "Performance may occasionally be pretty bad and inconsistent," admitted the client's CTO. Team performance suffered from time lost resolving CI performance issues and from a sluggish feedback loop.
Another customer we had the chance to talk to was also affected by faulty builds and tests. As a result, the value of the team's CI setup declined. First, a significant amount of time was wasted retrying builds to determine whether or not a failure was genuine. Second, one of their developers put a lot of time and effort into troubleshooting the fragile builds. Third, and most significantly, random failures began to obscure actual problems. Failing tests were causing further inconsistencies, because if the build fails all the time or at random, you stop looking at the build result.
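The manual retry routine described above can be sketched in a few lines. This is an illustration under stated assumptions, not the customer's tooling: `classify_failure` and its `reruns` parameter are hypothetical names, and `run_test` stands for any callable that returns True on a pass and False on a failure.

```python
from typing import Callable

def classify_failure(run_test: Callable[[], bool], reruns: int = 3) -> str:
    """Re-run a test that has just failed, with no code change, and label the failure."""
    outcomes = [run_test() for _ in range(reruns)]
    if any(outcomes):
        # It failed once and then passed on unchanged code: treat it as flaky.
        return "flaky"
    # It failed on every rerun: more likely a genuine defect or a broken environment.
    return "consistent"

# Simulated outcomes stand in for real test executions.
simulated = iter([False, True, True])
print(classify_failure(lambda: next(simulated)))
print(classify_failure(lambda: False))
```

Each rerun costs build time, which is the waste the customer was experiencing. A label of "consistent" also does not prove a code defect, since a broken build agent fails consistently too.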
Step 1: Commit to fixing the issue.
The first step in resolving any problem is to recognize that you have one. And, because you're most likely working in a team, this means convincing everyone on the team to see the problem and commit to addressing it.
Flaky builds and tests occur in every reasonably complicated codebase at some point. What counts is how you deal with them. If you act, you will solve the problem. If you do not, difficulties will escalate and working on the project will become a horrible experience.
Step 2: Improve stability, isolation, and repeatability by using controlled build environments.
Current CI/CD best practices make extensive use of virtualization to provide the level of consistency required to address the problem of flaky builds. Spinning up a new virtual machine (VM) for each job ensures that the runtime environment for each job is clearly defined and reproducible, that jobs are strictly isolated from one another, and that configuration is consistent across all builds. This eliminates a key source of erratic, difficult-to-reproduce errors.
Another advantage of virtualization is that it saves money. When you outsource CI/CD to a cloud-based solution, the vendor manages VM images on your behalf. Not only does this reduce the workload on your team, but it also ensures that the images remain stable and have up-to-date utilities installed.
Even with rock-solid build machines, situations that require debugging will arise. Configuration problems, for example, can surface during acceptance tests, which are often run in CI rather than on developer machines before a commit is released. Build systems typically gather logs, support debugging of running and completed jobs, and store artifacts.
However, there is another method to provide developers with even more access to recreate and debug issues: Docker.
Step 3: Use containers to create bit-exact development, CI, and production environments.
Docker changes the developer experience when it comes to debugging build failures. Docker containers allow developers to reproduce and troubleshoot difficult build tasks on their own machine in an identical environment. There is no longer a need to reduce build system capacity by taking nodes offline to debug a task, or to log in to build machines and debug where your preferred tools are not available.
Furthermore, executing build tasks in Docker containers gives you more granular control over the runtime environment. Another advantage is that starting a Docker container is typically faster than booting a whole VM.
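As a rough sketch of what reproducing a CI step locally can look like, the helper below assembles a `docker run` command that mounts a local checkout into the container and runs one build step inside it. The image name `example/ci-image:1.4.2`, the path `/home/dev/project`, the `/workspace` mount point, and the `make test` step are placeholders assumed for illustration. Only the standard `docker run` flags `--rm`, `-v`, and `-w` are relied on.

```python
import shlex

def docker_run_command(image, workdir, command):
    """Build a `docker run` invocation that mounts the checkout and runs one build step."""
    return [
        "docker", "run", "--rm",       # remove the container when the step finishes
        "-v", f"{workdir}:/workspace",  # mount the local checkout into the container
        "-w", "/workspace",             # run the step from the mounted checkout
        image,                          # pin the same image tag that CI uses
        *command,
    ]

# Placeholder values; substitute the image and step your pipeline actually runs.
cmd = docker_run_command("example/ci-image:1.4.2", "/home/dev/project", ["make", "test"])
print(shlex.join(cmd))
# To execute it for real (requires Docker): subprocess.run(cmd, check=True)
```

Pinning the image to a specific tag (or a digest) rather than `latest` is what keeps the local run and the CI run on the same environment.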
One client's first decision was to move the test environment to containers and to adopt a test management solution such as TestQuality, which integrates with GitHub and Jira, the client's issue and requirements trackers.
With TestQuality they found a solution that offers a rich set of integrations for a wide variety of test automation and unit testing tools and frameworks such as Selenium, Cucumber, PyUnit, and others. Integrations with CI/CD tools such as Jenkins and CircleCI were also used to automatically upload test results from their DevOps pipeline into their test management workflow.
TestQuality's Test Reliability analysis to detect test flakiness
The main Analyze tab in TestQuality offers several test measurement options:
- Test Growth: Measures how your test suite grows over time: how many tests you have and how they break down into automated and manual tests.
- Test Quality: Shows, based on the execution of your tests, which tests were useful and which were less so. It also highlights tests for quality reasons, such as tests that have not been run yet.
- Test Reliability: This is a special tab for us, since it shows how flaky our tests are. Do they fail and then succeed many times? Are they useful to you, or will they be ignored by your team? Test Reliability helps you identify the tests that are flaky. In the graph, each test's flakiness is displayed as an icon.
- Test Requirements: Analyzes and identifies which of your tests are linked to requirements in your linked repository.
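The question "do they fail and then succeed many times?" can be turned into a simple number. The sketch below is one possible metric, assumed here purely for illustration. It is not TestQuality's actual calculation, and `flip_rate` and the sample histories are hypothetical. It scores each test by how often its result changes between consecutive runs.

```python
def flip_rate(history):
    """Fraction of consecutive run pairs whose outcome changed (pass to fail or back)."""
    if len(history) < 2:
        return 0.0
    flips = sum(1 for prev, cur in zip(history, history[1:]) if prev != cur)
    return flips / (len(history) - 1)

# Hypothetical pass (True) / fail (False) histories, oldest run first.
histories = {
    "login": [True, True, True, True, True],         # stable pass
    "checkout": [True, False, True, False, True],    # alternates: highly flaky
    "export": [False, False, False, False, False],   # stable fail: broken, not flaky
}
scores = {name: flip_rate(runs) for name, runs in histories.items()}
for name, score in sorted(scores.items(), key=lambda item: item[1], reverse=True):
    print(f"{name}: {score:.2f}")
```

A test that always fails scores zero here, which is intentional: it is consistently broken rather than flaky, and needs a different kind of attention.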
Finally, some thoughts
Flaky CI builds can be a difficult issue, and there are always project-specific aspects that we cannot address in this post. However, the guidance above should set you on the right path toward a healthy continuous integration process. A test management tool like TestQuality adds further value: its integration engine lets you pull in automated test results from popular CI/CD, test automation, and unit testing systems.
TestQuality can simplify test case creation and organization. It is competitively priced, and it is free when used with free GitHub repositories. Its rich and flexible reporting helps you visualize and understand where you and your dev or QA team are in your project's quality lifecycle. Also look for analytics that help identify the quality and effectiveness of your test cases and testing efforts, so that you build and execute the most effective tests.
Sign Up for a Free Trial and add TestQuality to your workflow today!