by: Shamim Ahmed
Continuous testing has emerged as a popular practice within DevOps, assisting teams in their quest to release high-quality software on demand. While test data management (TDM) practices are extremely important to ensure that testing is effective, various surveys have indicated that TDM issues are one of the leading causes of delays in testing and application delivery. That is because traditional TDM has relied on ETL (extract-transform-load) type activities to extract and mask subsets of data from production data stores. Consequently, if teams are to keep pace with the demands of continuous testing, they will need to modernize TDM practices and processes.
In this blog, we will describe some best practices for establishing continuous TDM as part of continuous testing.
“Continuous” TDM is derived from the principle of “continuous everything” in DevOps. Continuous TDM is a key component of continuous testing within the DevOps framework. With this approach, testing and the management of test data happen across the different phases of the CI/CD lifecycle, as opposed to a single “testing phase.” See figure below.
The continuous nature of testing means that TDM activities need to be embedded in all parts of the lifecycle (such as design, development, and all testing and deployment activities) along the Agile CI/CD pipeline. See an example in the figure below.
Because these tests happen within short windows of time, the TDM activities (such as test data creation, provisioning, deployment, and so on) must also happen within those windows. This article provides an overview of continuous TDM in the context of DevOps, and reveals how it is different from traditional TDM.
Traditional TDM has typically relied on ETL (extract-transform-load) type activities to subset and mask data from production data stores (right side of figure below).
This approach relies on understanding the schema of the production data stores as well as the ability to gain direct access to data stores to extract a subset of the data. As the number of applications (and associated data stores) increases, this often becomes a time-consuming process, and contributes to significant delays in testing and release cycles.
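To make the traditional subset-and-mask step concrete, here is a minimal sketch in Python. It works on in-memory rows rather than a real data store, and the field names, the `subset_and_mask` helper, and the hash-based masking are all assumptions made for illustration, not the behavior of any particular TDM tool. Real masking would need to be considerably more robust.

```python
import hashlib


def mask_value(value: str, salt: str = "demo-salt") -> str:
    """Replace a sensitive value with a deterministic token (illustrative only)."""
    digest = hashlib.sha256((salt + value).encode("utf-8")).hexdigest()
    return digest[:12]


def subset_and_mask(rows, keep, sensitive_fields):
    """Keep the rows matching `keep`, then mask the sensitive fields of each kept row."""
    result = []
    for row in rows:
        if not keep(row):
            continue
        masked = dict(row)  # copy, so the source rows are left untouched
        for field in sensitive_fields:
            masked[field] = mask_value(str(row[field]))
        result.append(masked)
    return result


# Hypothetical "production" rows; a real extract would come from a data store.
production_rows = [
    {"id": 1, "region": "EU", "email": "alice@example.com"},
    {"id": 2, "region": "US", "email": "bob@example.com"},
    {"id": 3, "region": "EU", "email": "carol@example.com"},
]

test_rows = subset_and_mask(
    production_rows,
    keep=lambda r: r["region"] == "EU",
    sensitive_fields=["email"],
)
```

Even in this toy form, the code has to know which fields exist and which are sensitive, which is the schema knowledge that makes the approach slow to scale across many data stores.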
This approach does not align with the needs of agile teams, who need fast access to updated data in frequent tests in the context of continuous testing described above. In order to address these needs, we need to establish new practices, including shifting TDM activities left, using synthetic test data, and integrating with model-based testing and service virtualization (the left side of the above figure). In addition, this requires new techniques, such as traffic sensing and data virtualization. These practices are described as follows.
As with most activities in the continuous testing lifecycle, continuous TDM also requires that most of the TDM work (such as test data specification, design, and generation) “shifts left,” so it happens during the CI part of the DevOps lifecycle. This minimizes the delay that TDM processes can cause during the CD part of the lifecycle, helping to speed deployment cycle time. (See figure below.) We will describe what TDM activities happen in each stage of the lifecycle later in this blog.
A key tenet of shifting TDM left is the need to synthesize as much of the test data as possible, as opposed to traditional approaches where we extract and mask a subset of data from production. Synthetic test data supports a more agile, and more developer and tester friendly approach to TDM, since it reduces dependency on production data and operations teams that control that data. It allows for fast, controlled generation of data suited for purpose, and is free from privacy and security concerns associated with personally identifiable information. This is especially important for shift-left tests, such as unit and component tests, in which we are building out new functionality (for which data may not exist in production), and developers and SDETs need access to small amounts of test data with more variety as quickly as possible.
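As a rough illustration of synthetic generation for unit and component tests, the sketch here produces a small, reproducible set of user records using only the Python standard library. The record shape, the `generate_users` helper, and the fixed seed are assumptions chosen for the example.

```python
import random
import string


def synthetic_user(rng: random.Random, user_id: int) -> dict:
    """Build one synthetic user record with no link to production data."""
    name = "".join(rng.choice(string.ascii_lowercase) for _ in range(8))
    return {
        "id": user_id,
        "username": name,
        "email": f"{name}@example.test",
        "age": rng.randint(18, 90),
    }


def generate_users(count: int, seed: int = 42) -> list:
    """Generate `count` users; the same seed always yields the same data."""
    rng = random.Random(seed)
    return [synthetic_user(rng, i) for i in range(1, count + 1)]


users = generate_users(5)
```

Seeding the generator keeps test runs repeatable, while changing the seed or the count gives more variety without touching production.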
For tests in the later stages of the CD pipeline, such as end-to-end tests, it probably makes sense to have more realistic test data, often taking a subset from production. For other tests along the CI/CD pipeline, we recommend the use of hybrid test data with an emphasis on synthetic test data. See the next section on the test pyramid.
The test pyramid is one of the key tenets of shifting testing left. This means that more and more TDM processes need to support tests in the lower half of the pyramid, for example unit, component, and integration tests. (See figure below.) This is a good thing, since test data is easier to create and provision in the lower tiers of the pyramid than it is in the higher tiers.
This also influences the type of TDM approach we use:
We recommend model-based testing (MBT) as a key enabler for continuous testing. It makes sense therefore to integrate TDM with MBT. This requires us to specify test data requirements or constraints as part of the model itself. The figure below offers an example of how we do this using Broadcom Agile Requirements Designer. Other MBT tools provide similar support. This example shows how we set the test data for the “UserName” of a simple login model.
The test data rules embedded in the model could either be static (hardcoded or tied to data in a spreadsheet), formulaic (as in the example above) for synthetic generation, or tied to a back-end TDM system. The test data is generated or refreshed automatically every time tests are generated from the model.
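The three kinds of rules can be sketched in a few lines of Python. This is not how Agile Requirements Designer or any other MBT tool represents its models. The dictionary layout, the `resolve_rule` and `generate_test_data` helpers, and the `backend` callable standing in for a TDM system are all hypothetical.

```python
import random


def resolve_rule(rule: dict, rng: random.Random, backend=None):
    """Turn one test data rule into a concrete value."""
    kind = rule["kind"]
    if kind == "static":
        return rule["value"]  # hardcoded value
    if kind == "formula":
        return rule["generate"](rng)  # synthetic generation
    if kind == "backend":
        if backend is None:
            raise ValueError("backend rule used without a backend")
        return backend(rule["query"])  # delegate to a (hypothetical) TDM system
    raise ValueError(f"unknown rule kind: {kind!r}")


# Hypothetical rules attached to the fields of a simple login model.
login_model = {
    "UserName": {
        "kind": "formula",
        "generate": lambda rng: f"user{rng.randint(1000, 9999)}",
    },
    "Password": {"kind": "static", "value": "Passw0rd!"},
}


def generate_test_data(model: dict, seed: int = 0, backend=None) -> dict:
    """Regenerate test data for every field each time tests are generated."""
    rng = random.Random(seed)
    return {field: resolve_rule(rule, rng, backend) for field, rule in model.items()}


row = generate_test_data(login_model)
```

Because the rules live with the model, regenerating tests also regenerates the data, so the two cannot drift apart.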
This approach extends all of the benefits of MBT to TDM, namely:
Service virtualization is a well-established practice for agile development and testing. By virtualizing the components that the system under test depends on, we also reduce the TDM burden for those components.
In fact, the type of test data used (whether synthetic, hybrid, or production-like) correlates with the extent of service virtualization used. At the bottom of the test pyramid, we aggressively use both synthetic test data and virtual services. Towards the top of the pyramid, we use more realistic test data with real application components. We can use hybrid approaches for the middle tiers.
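One way to keep this correlation explicit is a small lookup table that test tooling can consult. The sketch here is only an illustration: the tier names, the `TierPolicy` type, and in particular the placement of integration tests in the hybrid middle tier are assumptions, not a prescribed mapping.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TierPolicy:
    test_data: str     # "synthetic", "hybrid", or "production-like"
    dependencies: str  # "virtual", "mixed", or "real"


# Assumed mapping from test type to data and virtualization choices.
POLICY = {
    "unit": TierPolicy("synthetic", "virtual"),
    "component": TierPolicy("synthetic", "virtual"),
    "integration": TierPolicy("hybrid", "mixed"),
    "end-to-end": TierPolicy("production-like", "real"),
}


def policy_for(test_type: str) -> TierPolicy:
    """Look up the policy for a test type, failing loudly on unknown types."""
    try:
        return POLICY[test_type]
    except KeyError:
        raise ValueError(f"unknown test type: {test_type!r}") from None
```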
This correlation of progressive service virtualization with progressive TDM along the CI/CD lifecycle is shown below.
When using virtual services for test data, we need to make sure that we maintain consistency between data used to drive the tests, the application database, and the virtual service. The first two are typically taken care of by TDM tools. For the virtual service, we need to either record or synthesize the virtual service with the same test data set used for the test. See figure below.
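A minimal sketch of this idea is to derive all three artifacts from a single data set. Here the "application database" is an in-memory SQLite database and the "virtual service" is a plain Python function. The table layout, the credit service, and the helper names are assumptions for illustration, and a real setup would use TDM and service virtualization tools instead.

```python
import sqlite3

# Single source of truth for the test: hypothetical customer records.
dataset = [
    {"customer_id": 1, "name": "Test User A", "credit_limit": 500},
    {"customer_id": 2, "name": "Test User B", "credit_limit": 1500},
]


def seed_database(rows):
    """Load the data set into a throwaway application database."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, name TEXT)")
    conn.executemany(
        "INSERT INTO customers (customer_id, name) VALUES (?, ?)",
        [(r["customer_id"], r["name"]) for r in rows],
    )
    conn.commit()
    return conn


def build_virtual_service(rows):
    """Synthesize a stub for a dependency (a hypothetical credit service)."""
    responses = {r["customer_id"]: {"credit_limit": r["credit_limit"]} for r in rows}

    def get_credit(customer_id):
        return responses[customer_id]

    return get_credit


def check_consistency(rows, conn, service) -> bool:
    """Verify every test record is known to both the database and the stub."""
    db_ids = {row[0] for row in conn.execute("SELECT customer_id FROM customers")}
    for r in rows:
        if r["customer_id"] not in db_ids:
            return False
        try:
            service(r["customer_id"])
        except KeyError:
            return False
    return True
```

Building the database and the virtual service from the same rows that drive the test means a change to the data set propagates to all three at once.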
In addition to the TDM approaches described above, we should also consider other complementary approaches to generating test data where they are available. Some of these approaches include the following:
To achieve continuous TDM, we need to ensure that test data provisioning and deployment are automated alongside application provisioning and deployment along the CI/CD lifecycle. As discussed in the previous section, this is especially important in the CD part of the lifecycle, where we need to minimize elapsed time to reduce cycle time. This can be achieved by integrating the deployment of test data with deployment automation tools. For applications that are deployed in containers, we may package test assets (including test data) in sidecar containers and deploy them alongside application containers.
The following figure summarizes the different activities in a typical continuous TDM process across the different stages of the CI/CD pipeline.
These activities are summarized below:
Test data management starts with well-defined acceptance criteria for the backlog items. This provides the dev/test team with seed test data that can be used for defining acceptance test cases. In keeping with our recommended approach for model-driven TDM, we recommend that teams capture this information as part of the model, which defines the behavior associated with the backlog item. In this way, test data can be generated from the model along with acceptance tests.
The TDM activities at this stage support the needs of development and subsequent CI/CD phases. This includes:
During development, developers and SDETs execute unit and component tests using the synthetic test data (and virtual services) that were created in the previous step. The availability of test data and virtual services is in fact a big enabler for supporting extensive unit testing.
This is an important stage in which testers and test data engineers design (or even generate/refresh) the test data for impacted test scenarios (based on the backlog items under development) that will be run in subsequent stages of the CI/CD lifecycle. The test data developed here will typically be hybrid (mix of synthetic data and a subset of data from production) based on the testing pyramid discussed above. In addition, the test data will need to be packaged (for example in containers or using virtual data copies) in order to ease and speed provisioning into the appropriate test environment (along with test scripts and other artifacts).
In this step, we typically run automated build verification tests and regression tests using the test data generated in the previous step.
The focus in these stages is to run tests (in the upper layers of the test pyramid) using hybrid test data created during Step 2(b). (See figure below.) The key in these stages is to minimize the elapsed time TDM activities require. For example, the time taken to create, provision, or deploy the required test data must not exceed the time to deploy the application in each stage.
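This timing rule can be turned into a simple pipeline check. The sketch here assumes that per-stage timings (in seconds) are already being collected. The stage names and numbers are made up, and summing the create, provision, and deploy times is a deliberately strict reading of the rule.

```python
def stages_over_budget(timings: dict) -> list:
    """Return the stages whose total TDM time exceeds application deployment time."""
    over = []
    for stage, t in timings.items():
        tdm_total = t["create"] + t["provision"] + t["deploy_data"]
        if tdm_total > t["deploy_app"]:
            over.append(stage)
    return over


# Hypothetical measurements, in seconds, for two CD stages.
timings = {
    "system-test": {"create": 20, "provision": 30, "deploy_data": 10, "deploy_app": 90},
    "pre-prod": {"create": 60, "provision": 120, "deploy_data": 45, "deploy_app": 180},
}

slow_stages = stages_over_budget(timings)
```

With these sample numbers, only the pre-prod stage is flagged, since its test data work (225 seconds) takes longer than deploying the application (180 seconds).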
Continuous TDM is meant to be practiced in conjunction with continuous testing. Various resources offer insights into evolving to continuous testing. If you are already practicing continuous testing and want to move to continuous TDM, our recommendation is to proceed as follows:
This blog has provided an overall approach for continuous TDM practices. As you can probably tell, microservices-based applications are extremely well suited to supporting continuous TDM. This is true because such applications are modular and componentized. I have previously blogged on new approaches to TDM for such applications. In a future blog, I will discuss approaches for continuous TDM for microservices applications.
Until such time, my friends, stay well, and may all your TDM efforts be continuous!