Software testing is an established discipline as old as software development itself. Testing practices have evolved significantly in recent years, driven especially by Continuous Delivery and DevOps, where testing is increasingly integrated with agile development and other software lifecycle practices.
Despite this, we continue to see evidence (from industry surveys and customer conversations) that testing practices, while probably meeting testing goals, fall short of helping enterprises meet their overall DevOps goals, especially velocity and quality.
For example, the latest 2022 State of DevOps (DORA) report indicates that low to medium maturity enterprises (that comprise a whopping 89% of the total survey population) continue to struggle with velocity (Lead Time for Changes) and quality (Change Failure Rate). See figure below.
For medium and low maturity enterprises, Lead Time for Changes (LTC) has not improved over the past several years of DORA surveys.
On the other hand, the Change Failure Rate (CFR) has increased significantly since the 2021 DORA report for low maturity enterprises (from 16-30% to 46-60%), while staying the same for medium maturity enterprises. This means that 89% of enterprises have change failure rates exceeding 16%! A notable exception is financial services organizations, which are understandably very risk averse and tend to have a low CFR across the board. However, we found that they have very high change approval overheads (often multiple days even for a small change) that slow down LTC.
Reducing Lead Time for Changes (LTC) is the most desired DevOps goal for most enterprises as they seek to accelerate innovation and improve customer experience. However, we see that as enterprises try to improve velocity (reduce LTC), quality (CFR) suffers. When enterprises try to improve quality (reduce CFR), velocity (LTC) suffers.
Clearly, the vast majority of enterprises struggle to balance velocity with quality. When we speak to our customers' testers, we see that they generally meet their traditional testing goals, such as test coverage, code coverage, and test automation percentage. However, many of these testers are not aware of their organization's DORA metric goals, or of the impact of their testing activities on LTC and CFR.
What this means is that (a) their testing processes are not optimized to achieve this balance, and (b) there is a lack of alignment between organizational quality goals (the “outcome”) and testing goals (the “activity”). See figure below. The acronym CI refers to Continuous Integration, while CD refers to Continuous Delivery; these are the key constituent phases of the CI/CD pipeline.
What we need is optimization of test processes to better meet the business needs driving DevOps initiatives. In this blog, we will discuss 8 key practices that are necessary to drive this optimization. For testing to be truly “Continuous”, every testing sub-process and testing tool must operate in a continuous manner – hence “Continuous Everything”. Let’s dive in.
So why do enterprises have challenges balancing velocity with quality? There are several reasons, summarized below.
One of the top reasons for high LTC and CFR in enterprises is lack of balanced testing. Most surveys indicate that enterprises have un-optimized test plans: they either test too much (wasting effort and time, which is especially significant during the CD process and increases LTC) or test too little (increasing quality risk and CFR). Some studies have indicated that there may be up to 67% bloat in test cases for bespoke applications, whereas others (especially packaged apps) have up to 56% gaps in desired test coverage.
Combine this with the fact that many enterprises have inverted testing pyramids, which means that most of the time-consuming testing activities are happening in the CD part of the life cycle – thereby increasing LTC. For example, I have heard from customers who are defining test cases and test data for CD phase tests during the CD process itself, hence holding up the pipeline and inflating LTC.
When release deadlines loom, inevitably some of the CD phase tests get dropped, resulting in under-testing and an increased risk of a high CFR.
In addition, many enterprises have a lengthy pre-production end-to-end load testing process (vs doing continuous performance testing) which significantly adds to the LTC.
Most enterprises have three test environments in the CD phase of the lifecycle, as shown in the figure above; some have even more, such as a dedicated performance testing environment. Configuring, provisioning, and correctly setting up these test environments across multiple dependent application pipelines is a major bottleneck that amplifies LTC.
Some enterprises have reduced the time to provision test environments using cloud and infrastructure-as-code capabilities. However, deploying the right application assets (e.g. application binaries, test scripts, test configurations, and test data) into these environments still takes significant time, especially for complex applications with multiple dependencies, where it may take hours or even days.
Also, distributed applications may depend on other application components that run in their own (parallel) pipelines and have to be synchronized to allow tests to run across these dependencies.
Test data is one of the biggest bottlenecks in test environment setup, especially for complex applications. This can be further complicated by factors such as data dependencies, data volume and quality issues, and data versioning challenges. Un-optimized test data results in either under-testing (risking CFR) or waste (increasing LTC due to the time taken to set up data stores). In addition, a significant number of enterprises (29% to 37%) were found to use unprotected copies of production data as test data in their test environments. This risks PII leakage, data breaches, and data privacy non-compliance (e.g. GDPR) issues. Such issues often have consequences more severe than a high CFR alone.
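As a rough illustration of the alternative to using raw production copies, the sketch below masks a few common PII fields before a record is loaded into a test data store. The field names and masking rules are hypothetical; real rules must be driven by your own schema and compliance requirements.

```python
import hashlib
import random

# Hypothetical PII fields and masking rules; real rules must match your own
# schema and compliance (e.g. GDPR) requirements.
PII_MASKERS = {
    "email": lambda v: hashlib.sha256(v.encode()).hexdigest()[:12] + "@example.test",
    "ssn":   lambda v: "***-**-" + v[-4:],
    "name":  lambda v: "User-" + hashlib.sha256(v.encode()).hexdigest()[:8],
    "phone": lambda v: "555-01" + f"{random.randint(0, 99):02d}",
}

def mask_record(record: dict) -> dict:
    """Return a copy of a production record with known PII fields masked."""
    return {
        field: PII_MASKERS[field](value) if field in PII_MASKERS else value
        for field, value in record.items()
    }

if __name__ == "__main__":
    prod_row = {"id": 42, "name": "Jane Doe", "email": "jane@corp.com",
                "ssn": "123-45-6789", "balance": 1044.50}
    print(mask_record(prod_row))  # now safe to load into a test data store
```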
As a corollary of point 2 above, it becomes difficult to keep test assets updated in these environments (and in sync across inter-related pipelines) in the face of rapid change, where multiple builds may happen during the day, triggering frequent re-deployments and re-tests.
This poses significant issues for enterprises where test assets and testing processes are not fully integrated with CI/CD processes and engines, thereby requiring manual (and often error-prone) intervention. This not only leads to delays in testing (impacting LTC), but also to inconsistencies in test results and difficulty in identifying and resolving issues (impacting CFR).
For example, I have encountered customers that manually refresh their test data for each test cycle in each environment, since their test data management system is not integrated with the CI/CD engine.
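By contrast, here is a minimal sketch of what an automated, pipeline-integrated refresh could look like: a step the CI/CD engine runs before the test stage, calling a hypothetical test data management API. The endpoint, payload, and status values are illustrative, not a specific vendor API.

```python
import os
import time
import requests  # assumes the 'requests' package is available in the pipeline image

# Hypothetical test data management (TDM) endpoint.
TDM_API = os.environ.get("TDM_API", "https://tdm.example.internal/api/v1")

def refresh_test_data(environment: str, data_set: str, timeout_s: int = 900) -> None:
    """Trigger a test data refresh and block until it completes (or times out)."""
    job = requests.post(f"{TDM_API}/refresh",
                        json={"environment": environment, "dataSet": data_set},
                        timeout=30).json()
    deadline = time.time() + timeout_s
    while time.time() < deadline:
        status = requests.get(f"{TDM_API}/jobs/{job['id']}", timeout=30).json()["status"]
        if status == "COMPLETED":
            return
        if status == "FAILED":
            raise RuntimeError("Test data refresh failed")
        time.sleep(15)
    raise TimeoutError("Test data refresh did not finish in time")

# Called by the CI/CD engine before the test stage, e.g.:
# refresh_test_data(environment="qa", data_set="regression-masked")
```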
Very few enterprises are able to accurately and efficiently predict quality and risk of change failure before changes are deployed to production. This is onerous to do manually, and algorithmic approaches often do not produce useful information.
Most release/QA leaders look at traditional testing/QA metrics (such as test/code coverage, test pass/fail metrics, and code quality data) to determine release/deployment readiness. As we discussed before, many of these classic testing/QA metrics satisfy their goals/thresholds, yet we still see failures when software changes are deployed to production.
This is often because testing (the activity) goals are not aligned with quality in production (the outcome) goals, and test plans do not (or are unable to) fully address all the myriad failure modes that the software can experience in a production environment.
As I mentioned before, to get around this, risk-averse enterprises, such as financial services organizations, typically invest in elaborate change management and change approval processes that often add several additional days to the LTC.
We will now discuss how testing processes must be adapted to meet the demands of both LTC and CFR. We discuss 3 key principles (in this section) and 8 practices (in the following section). These enable the testing process to be truly “continuous” and help meet the goals of DevOps.
Implementing these principles and practices will help us achieve both Continuous Flow and Continuous Quality in tandem. Continuous Flow is a lean principle that is about more than velocity; it focuses on reducing impedance and minimizing waste. Continuous Quality is about ensuring and validating quality at each step of the DevOps lifecycle.
The three principles are as follows (see Figure below):
The CI process must focus on building quality into the software to ensure low CFR. This means several things:
If the CI process is the place for development of test assets and other creative work, the focus in the CD process must be on minimizing elapsed time in order to minimize LTC. This implies that most of the testing work in the CD process must be automated, highly optimized (e.g. using change impact analysis), and integrated with the CI/CD framework to minimize toil.
The best way to reduce CFR is to predictively determine what it is likely to be, and address it proactively before deployment to production. Processes have to be put in place to calculate failure risk at every stage of the lifecycle, understand the causal factors, and address the potential causes.
In this section we discuss 8 key practices (see figure below) that are required for continuous testing. Let’s discuss each in turn:
A key principle of Continuous Flow is reduction of waste and idle time. To achieve this in a Continuous Testing process, we must ensure that all the right test assets are built out before the start of the process (so there is minimal wait time). It also means that such assets are incrementally enhanced in each step of the lifecycle (versus developed in one go) and support the subsequent steps, hence progressive build-out.
The entire testing process in the DevOps lifecycle is in fact intrinsically progressive. In keeping with the testing pyramid (discussed in practice #5 below), the test scenarios (and accompanying test assets) need to become progressively more “real-world” as we go from left to right in the CI/CD pipeline. While the tests on the left (CI process) are more focused on unit and component tests, the tests on the right (CD/Ops process) are increasingly focused on system integration and real usage scenarios.
I illustrate this concept in my prior blogs on Continuous Test Data Management and Continuous Service Virtualization. For example, in Continuous Service Virtualization (see figure below), virtual services become progressively more realistic as we go from left to right. We start with simple, lightweight, synthetic virtual services for tests during the CI process, where they are used primarily by developers and SDETs. We keep enhancing them incrementally (with additional transactions and test data) as we move to the right into the CD process, where they are used primarily by testers.
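As a rough sketch of the "simple and synthetic first" end of that spectrum, the snippet below stands up a trivial virtual service for a hypothetical payments dependency that CI-stage tests can call; in later stages it would be enriched with recorded transactions and realistic test data, or replaced by a fuller virtual service.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer
import json

# A deliberately simple, synthetic stand-in for a downstream payments service.
# In later (CD) stages this stub would be enriched with recorded transactions
# and more realistic test data, or replaced by a full virtual service.
CANNED_RESPONSES = {
    "/payments/status": {"status": "APPROVED", "latency_ms": 5},
}

class StubHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        body = CANNED_RESPONSES.get(self.path)
        self.send_response(200 if body else 404)
        self.send_header("Content-Type", "application/json")
        self.end_headers()
        self.wfile.write(json.dumps(body or {"error": "not stubbed"}).encode())

if __name__ == "__main__":
    # Unit and component tests in the CI stage point at this local stub.
    HTTPServer(("localhost", 8099), StubHandler).serve_forever()
```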
Note that such progressive build-out not only reduces idle time (and LTC) and improves quality (by focusing on fitness for purpose at the appropriate stage of the lifecycle), but also improves collaboration between different personas (such as Developers, SDETs, Testers, Deployment Engineers, and SREs), which is so vital for enabling continuous flow in DevOps.
Shift-left is a well-understood concept in testing, and won’t be detailed here. I will simply point out a few techniques that help to make shift-left more effective. These are:
Shift-right entails doing more testing in the immediate pre-release and post-release phases (i.e. testing in production) of the application lifecycle. These include practices such as: release validation, destructive/chaos testing, A/B and canary testing, customer experience (CX)-based requirements validation and testing (e.g. correlating user behavior with test requirements), crowd testing, production monitoring, extraction of test insights from production data, etc.
Shift-right plays a key role in closing the loop between testing (the activity) and quality (the outcome). Testers generally have sparse access to production (or operations) data, and do not typically leverage this data for driving quality improvements – other than from defects detected by users. Generating insights from the voluminous data in production is one of the key drivers for improving quality.
We have already discussed the importance of requirements validation (in the previous section) – this can be extended based on CX data such as typical user journeys, user likes/dislikes and other forms of customer sentiment. Also, using shift-right approaches, more realistic tests (and test data) can be generated based on real user interactions in production.
Techniques such as canary and A/B testing provide early user feedback (and remediation) on changes and new feature updates before they are rolled out to a broader user base. Crowd testing also helps to provide real user feedback, especially from the perspectives of usability and value to users.
Understanding failure modes from production also helps us improve our reliability testing. Using analytics and machine learning techniques, it is also possible to correlate failures in production with pre-production data to enable better prediction of CFR, which helps to improve both quality and velocity. We discuss this further in practice 7 below.
For more details on various techniques in shift-right testing, please refer to my prior blog here.
One of the key ways to reduce LTC is to automate as much of the repetitive work as possible during the CD process. From a testing perspective this includes:
It is not sufficient to automate these using test management tools. It also requires that test processes (or test management tools) are fully integrated with CI/CD pipeline automation engines. Test orchestration engines (such as Continuous Delivery Director) that integrate with CI/CD platforms may also be used to support this.
The testing pyramid provides an empirical approach to focusing test effort by type of testing. This also aligns well with how tests should be optimized for the different stages of the CI/CD lifecycle, as shown in the figure below. Running more extensive tests during the CI process (mapped to the first two layers of the pyramid) helps to detect (and eliminate) defects early, thereby improving quality. Reducing the number of tests as we go progressively from left to right of the CD process (mapped to the upper three layers of the pyramid) helps to reduce LTC.
Every test process and tool must be configured to support the testing pyramid. We have already discussed (in Practice 1: Progressive build out) how Continuous Service Virtualization and Continuous Test Data Management support the testing pyramid by providing progressively realistic virtual services and test data aligned to the different stages of the CI/CD lifecycle. The Figure below shows how SV and TDM support the different layers of the test pyramid.
One of the best ways to optimize tests in alignment with the test pyramid is by using model-based testing (MBT). Models can be developed for requirements at different levels of granularity (from individual stories to end-to-end user scenarios) from which appropriate types of tests may be generated.
MBT (using tools like Agile Requirements Designer) also allows us to systematically optimize tests for the different layers of the pyramid. We apply lower optimization (meaning a higher number of tests) for the lower parts of the pyramid, and higher optimization (meaning fewer tests) for the upper parts of the pyramid to meet the appropriate test coverage goals.
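To make "lower versus higher optimization" concrete, here is a small, tool-agnostic sketch (not the algorithm used by Agile Requirements Designer): exhaustive combinations suit the lower pyramid layers, while a much smaller "each value at least once" selection stands in for the higher optimization applied to the upper layers. The model parameters are hypothetical.

```python
from itertools import product

# Hypothetical model of a checkout flow as parameters and values.
MODEL = {
    "payment":  ["card", "paypal", "giftcard"],
    "shipping": ["standard", "express"],
    "customer": ["guest", "registered", "premium"],
}

def exhaustive(model):
    """All combinations: lower pyramid layers (maximum coverage, many tests)."""
    keys = list(model)
    return [dict(zip(keys, combo)) for combo in product(*model.values())]

def each_choice(model):
    """Every value appears at least once: far fewer tests, for upper layers."""
    width = max(len(values) for values in model.values())
    return [{k: v[i % len(v)] for k, v in model.items()} for i in range(width)]

if __name__ == "__main__":
    print(len(exhaustive(MODEL)), "exhaustive tests")    # 18
    print(len(each_choice(MODEL)), "each-choice tests")  # 3
```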
When combined with change impact analysis (see the next practice, #6), this provides the most effective way to optimize tests based on the pyramid, thereby minimizing the impact on LTC. MBT also allows us to automate the generation of test assets (including test scripts, test data, and virtual services) to ensure testing stays responsive to requirements and application changes and does not lag behind, yet another way to reduce impact on LTC. See figure below. For more details on this approach, please see my prior blog on Progressive Modeling.
Change impact driven testing helps to focus attention on tests that are impacted by changes to the application or requirements, thereby helping to both reduce tests (and reduce LTC) and improve quality. Change impact testing also helps to reduce updates to dependent test assets (such as test data and virtual services), thereby helping to reduce testing effort.
Change impact testing (along with test pyramid driven optimization discussed above) is very important for tests in the CD process (to reduce LTC), but is also very useful during the CI process – to help provide rapid feedback to developers/SDETs.
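Before looking at specific forms, here is a minimal sketch of the basic idea: map the files changed in a commit to the test suites that cover them, and run only those. The mapping here is hand-maintained and hypothetical; real implementations typically derive it from code coverage data or model-based traceability.

```python
import subprocess

# Hypothetical mapping from source areas to test suites; real implementations
# usually derive this from code coverage data or model-based traceability.
IMPACT_MAP = {
    "src/payments/": ["tests/payments", "tests/e2e/checkout"],
    "src/catalog/":  ["tests/catalog"],
    "src/ui/":       ["tests/ui_smoke"],
}

def changed_files(base: str = "origin/main") -> list[str]:
    """List files changed relative to the base branch, via git."""
    out = subprocess.run(["git", "diff", "--name-only", base],
                         capture_output=True, text=True, check=True)
    return out.stdout.splitlines()

def impacted_suites(files: list[str]) -> set[str]:
    """Select only the suites whose covered source areas were touched."""
    return {suite
            for path in files
            for prefix, suites in IMPACT_MAP.items() if path.startswith(prefix)
            for suite in suites}

if __name__ == "__main__":
    suites = impacted_suites(changed_files())
    print("Suites to run:", sorted(suites) or ["(none impacted - run a smoke set)"])
```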
There are various forms of change impact driven testing. We will discuss three key forms here:
Analytics and insights provide the basis for automated, data-driven decision making that is critical for driving improvements in both LTC and CFR. The CI/CD pipeline is very data-rich and provides opportunities for a variety of analytics.
A key analytic we will discuss here relates to tracking and predicting the risk of change failure. Since CFR is a production metric, it is a trailing indicator of quality; hence it is important to predict it for every change before it is deployed to production. The approach we recommend is to use machine learning (ML) to correlate past production change failure events with the pre-production data (from the CI/CD pipeline) associated with those changes, and build appropriate ML models. For more details, please review my prior blog on Continuous Observability, where we describe a system of intelligence (SOI) for risk prediction; see figure below.
Data from the current change set is then analyzed using the ML models to predict the risk of failure due to the change, as well as to surface the factors driving that risk. This provides the ability to address these risks proactively so as to reduce the CFR.
For this process to be effective, the risk prediction needs to be done continuously along the CI/CD lifecycle as the change progresses along the pipeline. For example, risks are first determined during pull requests/code commits, then after build, and after each stage of testing in the CD process (see Figure below).
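A minimal sketch of the idea, assuming historical pipeline records have already been joined with production change-failure outcomes; the feature names and file are illustrative, and a real system of intelligence would use far richer pipeline data.

```python
import pandas as pd
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

# Illustrative pre-production features joined with the production outcome
# (1 = the change caused a failure after deployment, 0 = it did not).
history = pd.read_csv("change_history.csv")  # hypothetical extract from the CI/CD pipeline
FEATURES = ["files_changed", "lines_changed", "code_coverage",
            "failed_tests", "static_analysis_findings", "review_comments"]

X_train, X_test, y_train, y_test = train_test_split(
    history[FEATURES], history["caused_failure"], test_size=0.2, random_state=7)

model = GradientBoostingClassifier().fit(X_train, y_train)
print("holdout accuracy:", model.score(X_test, y_test))

# Score the change currently moving through the pipeline.
current_change = pd.DataFrame([{
    "files_changed": 14, "lines_changed": 820, "code_coverage": 61.0,
    "failed_tests": 2, "static_analysis_findings": 9, "review_comments": 1,
}])
risk = model.predict_proba(current_change)[0][1]
print(f"predicted change failure risk: {risk:.0%}")
```

The model would be retrained as new production outcomes arrive, and the scoring step repeated at each pipeline stage as richer data becomes available for the change.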
For enterprises that are not quite ready to use machine learning, it is possible to assess risk using heuristic rules (such as those used in Broadcom Continuous Delivery Director) based on pre-production data in the CI/CD pipeline; see the example in the figure below. Since this approach does not correlate pre-production data with production failure data, it is less effective than the ML approach.
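For teams starting with heuristics, a simple weighted-rule score over the same kind of pipeline data might look like the sketch below (the rules and weights are illustrative, not those used by Continuous Delivery Director).

```python
# Illustrative heuristic rules: each returns a risk contribution between 0 and 1,
# weighted to sum to at most 1.0 overall.
RULES = [
    ("large change size",    lambda c: min(c["lines_changed"] / 1000, 1.0), 0.30),
    ("low test coverage",    lambda c: max(0.0, (80 - c["code_coverage"]) / 80), 0.30),
    ("failing tests",        lambda c: 1.0 if c["failed_tests"] > 0 else 0.0, 0.25),
    ("touches core modules", lambda c: 1.0 if c["touches_core"] else 0.0, 0.15),
]

def risk_score(change: dict) -> tuple[float, list[str]]:
    """Weighted sum of rule contributions, plus the reasons that fired."""
    score, reasons = 0.0, []
    for name, rule, weight in RULES:
        contribution = rule(change) * weight
        if contribution > 0:
            reasons.append(name)
        score += contribution
    return round(score, 2), reasons

change = {"lines_changed": 640, "code_coverage": 58, "failed_tests": 1, "touches_core": True}
print(risk_score(change))  # (0.67, ['large change size', 'low test coverage', ...])
```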
To help reduce LTC, it is also possible to define analytics for schedule delay prediction. This approach is popular in value stream management, where pipeline data from value stream mapping is used to understand bottlenecks. The data is then used to drive continuous improvement by predicting and removing such bottlenecks. For a more detailed discussion, please see my prior blog on using AI/ML for VSM.
Site Reliability Engineering (SRE) is an emerging discipline that helps improve the reliability of systems and provides a prescriptive approach to implementing DevOps.
Site Reliability Engineers (or SREs) use techniques such as Service Level Objectives (SLOs) and Error Budgets (EBs) to quantify the risk tolerance for systems and services, as well as to balance the needs of velocity and system stability and reliability.
As EB consumption increases, SREs throttle delivery velocity in favor of hardening and vice-versa.
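As a simple worked example of how an error budget quantifies this trade-off, assuming a 99.9% availability SLO over a 30-day window:

```python
# Error budget for a 99.9% availability SLO over a 30-day window (illustrative).
slo = 0.999
window_minutes = 30 * 24 * 60                 # 43,200 minutes in the window
error_budget = (1 - slo) * window_minutes     # 43.2 minutes of allowed downtime

downtime_so_far = 30                          # minutes of downtime this window (example)
burn = downtime_so_far / error_budget         # ~69% of the budget consumed

print(f"error budget: {error_budget:.1f} min, consumed: {burn:.0%}")
if burn > 0.5:
    print("Budget burning fast: throttle releases, increase testing and hardening")
```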
Similarly, testers play a key role in balancing the needs of velocity with overall system quality. As CFR risks increase, testers increase testing (and remediation) to throttle velocity (and vice-versa).
With the increased adoption of SRE in enterprises, testing efforts need to be synergized with SRE to jointly assure system reliability – as well as balance CFR with LTC.
In an integrated approach, we combine the use of SLOs and EBs (from the SRE perspective) with release risk insights (discussed above) and other testing metrics to provide a joint approach to balancing velocity with quality. This approach was discussed in detail in my prior blog (“SREs and SDETs: Leveraging Synergies to Boost Velocity and Quality”), and is summarized in the figure below.
In the previous sections we have illustrated the key principles and practices with examples using Broadcom Continuous Testing solutions. In this section, let’s see how all of these principles and practices come together in the context of Performance Testing.
In many enterprises, the classic approach to end-to-end load testing continues to be a bottleneck in the CI/CD process; see figure below. Such testing is often laborious and time consuming and holds up the DevOps pipeline (increasing LTC). In addition, it generally finds performance problems late in the cycle, which increases quality risk (increasing CFR).
This challenge can be readily addressed by practicing Continuous Performance Testing, which leverages the practices discussed in this blog. This is especially applicable to modern component or microservices based applications. It ensures that we remove bottlenecks in the CD process (thereby reducing LTC) as well as significantly improve quality and reliability (reducing CFR).
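As a rough sketch, a component-level performance check that runs on every build in the CI stage (rather than deferring everything to a late end-to-end load test) could look like the following; the endpoint, request count, and latency budget are illustrative.

```python
import time
import requests  # assumes the service under test is reachable from the CI job

SERVICE_URL = "http://localhost:8080/api/quote"   # hypothetical component endpoint
REQUESTS, P95_BUDGET_MS = 200, 250                # illustrative per-build thresholds

def measure_latencies(n: int) -> list[float]:
    """Issue n requests against the component and record each latency in ms."""
    latencies = []
    for _ in range(n):
        start = time.perf_counter()
        requests.get(SERVICE_URL, timeout=5)
        latencies.append((time.perf_counter() - start) * 1000)
    return latencies

if __name__ == "__main__":
    samples = sorted(measure_latencies(REQUESTS))
    p95 = samples[int(0.95 * len(samples)) - 1]
    print(f"p95 latency: {p95:.1f} ms (budget {P95_BUDGET_MS} ms)")
    # Fail the build (non-zero exit) if the component regresses past its budget.
    raise SystemExit(0 if p95 <= P95_BUDGET_MS else 1)
```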
The key changes we make relative to our 8 key practices are as follows (see figure below):
The above describes the key changes we make to the performance testing process using these practices; for a detailed step-by-step description, please refer to my prior blog on Continuous Performance Testing.
This blog has described the key principles and practices that Continuous Testing needs to implement to better support DevOps goals. In an age of DevOps and Continuous Delivery, testing is often viewed as a bottleneck and a necessary evil: a cost that slows down the pipeline and erodes profitability. It is critical to align testing efforts more closely with DevOps outcomes. For this to happen, every testing process and tool must operate in a “continuous” manner.
This blog provides practical guidance to optimize our testing efforts so that they help achieve (and not hinder) enterprise DevOps goals. In the end, this helps make testing an integral, value-added component of DevOps that provides the means to balance velocity with quality.