Demands for Increasing Data Masking Scale
For teams responsible for data masking, demands continue to grow. Each month, the tables that need masking seem to get significantly larger. It is now common for teams to receive requests to mask tables with hundreds of millions or even billions of rows. While table sizes keep growing, however, time frames don't: no matter how large the table, teams still typically need to turn around requests within eight to 12 hours.
How do teams scale their masking to accommodate these expanding demands and tight turnaround times? The good news is that Broadcom Test Data Manager is helping customers meet these demands every day. In this post, I’ll offer an introduction to a new feature in Broadcom Test Data Manager called Scalable Masking and outline how you can use the feature most effectively.
Working with Scalable Masking
Starting with release 4.9, Test Data Manager offers Scalable Masking capabilities. Customers that upgrade to the latest release, 4.10, can leverage these capabilities and more. Users can initiate masking jobs through the TDM Portal and centrally run them across a range of tables and data models.
Container Approach Yields Significant Performance Advantages
In these new releases, Test Data Manager is offered as a Docker container. This allows teams to run multiple masking engines, each in its own container. With this container-based approach, teams working with large tables can split jobs across multiple masking engines, which provides significant performance and scalability benefits.
While a lot of variables will affect performance, Scalable Masking can mask between four and 15 million cells per minute. (See the table below for examples of differing data source sizes and configurations.)
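As a rough planning aid, the throughput range above can be turned into a time estimate. The sketch below is illustrative only: the 4 and 15 million cells-per-minute figures are the range quoted above, and the 500M-row, five-column table is a hypothetical workload.

```python
# Rough masking-duration estimate from the throughput range quoted above.
# Purely illustrative; actual throughput depends on hardware, database, and job mix.

def estimate_minutes(rows: int, masked_columns: int,
                     cells_per_minute: int) -> float:
    """Estimate masking time as total cells divided by throughput."""
    total_cells = rows * masked_columns
    return total_cells / cells_per_minute

rows = 500_000_000          # hypothetical 500M-row table
masked_columns = 5          # five columns to mask

best = estimate_minutes(rows, masked_columns, 15_000_000)   # fast end of the range
worst = estimate_minutes(rows, masked_columns, 4_000_000)   # slow end of the range
print(f"Estimated duration: {best:.0f} to {worst:.0f} minutes")
```

Running this for the hypothetical table prints an estimate of roughly 167 to 625 minutes, which is why very large tables usually need to be split across engines to fit an eight-to-12-hour window.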
When requests are made in the portal, they are submitted as RESTful requests to the message bus. Based on the number of engines available and the processing status of the engines, the message bus sends those requests to the appropriate engine.
Masking engines connect to the target table, the table is sent to the engine, and the engine conducts masking. The message bus reports on progress back to the portal. This reporting enables administrators to track status and it provides a documented record that can be retained for auditing purposes.
It is important to note that the Docker masking engine communicates directly with the database instance, and they both reside on the same subnet, which can provide significant benefits in performance and throughput.
Example: How Job Splitting Works
To illustrate how masking jobs can be split, here's a hypothetical example:
- Environment. An organization has the TDM portal, Docker containers, and message bus running, with four containers set up as part of their implementation. By default, each container has four engines.
- Scope. The team wants to run a masking job spanning 10 tables, each averaging 5M rows. Five columns in each table must be masked.
- Split. The TDM portal submits the request to the message bus, which splits the job.
- Each table gets its own Scalable Masking engine, so all 10 tables can be masked in parallel.
- Two containers, which each have four masking engines, will execute eight of the jobs.
- A third container will handle the last two jobs, while its other two engines, and the entire fourth container, remain idle.
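The split above can be sketched as a simple one-table-per-engine assignment. This is a hypothetical model of the scheduling, not Broadcom's actual dispatch logic; the container and engine counts match the example.

```python
# Hypothetical sketch of the split described above: 4 containers x 4 engines,
# one table per engine, filling containers in order. Not TDM's actual scheduler.

def assign_tables(tables, containers=4, engines_per_container=4):
    """Pair each table with the next free (container, engine) slot."""
    engines = [(c, e) for c in range(containers)
                      for e in range(engines_per_container)]
    return {table: engines[i] for i, table in enumerate(tables)}

tables = [f"table_{n}" for n in range(10)]   # the 10 tables from the example
plan = assign_tables(tables)

# Containers 0 and 1 take eight jobs; container 2 takes the last two.
busy_containers = sorted({c for c, _ in plan.values()})
print(busy_containers)   # container 3 remains idle
```

With 10 tables and 16 engine slots, the first two containers are fully busy, the third runs two of its four engines, and the fourth never receives work, matching the walkthrough above.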
Tips and Best Practices
To get the most out of Scalable Masking, here are some key strategies:
- Calculate and plan based on table sizes and masking scope. Upfront, it is important to establish a count of records to be masked, which is calculated by multiplying the number of columns being masked by the number of rows.
- Allocate adequate space and resources. Masking jobs may fail due to server issues or a lack of required memory, so it is important to allocate the resources required. Teams need enough memory to create the necessary tablespace, and enough processors available to complete the job in a timely manner. As a rule, the more processors, the shorter the masking window.
- Validate database instance configuration and performance. Work closely with the DBA to make sure recommended configurations are applied and to ensure masking is working correctly.
- Manage heap size. This setting determines how much RAM is allocated to each instance. If the heap size is too small, teams may see their masking job fail. In general, in both Oracle and SQL Server, about 3 GB of heap size is sufficient to run most jobs properly.
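The planning tips above (record count = masked columns times rows, and roughly 3 GB of heap per instance) can be captured in a small helper. The 3 GB threshold reflects the guidance above; the sample numbers are illustrative.

```python
# Sketch of upfront capacity planning per the tips above.
# The 3 GB heap figure comes from the guidance in this post; sample values are illustrative.

MIN_HEAP_GB = 3  # ~3 GB is typically sufficient for Oracle and SQL Server jobs

def masked_cell_count(rows: int, masked_columns: int) -> int:
    """Count of records to mask: columns being masked times rows."""
    return rows * masked_columns

def heap_ok(configured_heap_gb: float) -> bool:
    """Flag undersized heaps before the job fails mid-run."""
    return configured_heap_gb >= MIN_HEAP_GB

cells = masked_cell_count(rows=5_000_000, masked_columns=5)
print(cells, heap_ok(4))
```

Computing this before submitting a job makes it easy to compare the workload against the 4-15 million cells-per-minute throughput range and catch under-provisioned engines early.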
Recommended Settings
Following are suggested settings for Scalable Masking:
- BATCHSIZE=37500
- BLANKSASNULLS=Y
- COMMIT=37500
- EMPTYASNULL=Y
- FETCHSIZE=75000
- GETTABLEROWCOUNTS=N
- ORDERBY=N
- PARALLEL=<Based on the number of CPU cores available>
- LARGETABLESPLITENABLED=Y
- LARGETABLESPLITSIZE=<Your calculation based on the largest table row count>
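The settings above can be kept in one place and rendered as key=value pairs. Note that the split-size rule below (largest table divided evenly across engines) is a placeholder assumption for illustration; the post leaves the exact LARGETABLESPLITSIZE calculation to you, so substitute your own sizing rule.

```python
import math
import os

# Render the recommended settings above as key=value pairs.
# PARALLEL and LARGETABLESPLITSIZE must be derived per environment; the
# split-size rule here (largest table / engines) is an illustrative assumption.

def build_settings(largest_table_rows: int, engines: int) -> dict:
    return {
        "BATCHSIZE": 37500,
        "BLANKSASNULLS": "Y",
        "COMMIT": 37500,
        "EMPTYASNULL": "Y",
        "FETCHSIZE": 75000,
        "GETTABLEROWCOUNTS": "N",
        "ORDERBY": "N",
        "PARALLEL": os.cpu_count() or 1,   # based on CPU cores available
        "LARGETABLESPLITENABLED": "Y",
        # Placeholder rule: split the largest table evenly across engines.
        "LARGETABLESPLITSIZE": math.ceil(largest_table_rows / engines),
    }

for key, value in build_settings(largest_table_rows=50_000_000, engines=16).items():
    print(f"{key}={value}")
```

For a hypothetical 50M-row largest table and 16 engines, the placeholder rule yields a split size of 3,125,000 rows per chunk.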
Conclusion
For today’s development teams, the ability to scale data masking continues to become more critical. By employing Scalable Masking and properly configuring their environments, teams can dramatically scale their masking capacity. To learn more, be sure to read Masking Performance Optimization in CA TDM Portal.
Abhijit Mugali
Abhijit Mugali has extensive experience in both technical product ownership and strategic product management. He interacts with clients across geographies for requirement gathering, beta participation, and product launch. He also has expertise interacting with the global sales and pre-sales teams to effectively...