Broadcom Software Academy Blog

Streamlining Azure Data Factory Workflows with Enterprise Workload Automation

Written by Richard Kao | Sep 20, 2023 4:57:08 PM
Key Takeaways
  • Employ Automation by Broadcom to efficiently manage complex, multi-phase automation deployments within Azure Data Factory (ADF).
  • Avoid having costly, brittle islands of automation by leveraging enterprise automation.
  • Manage complex pipelines that span a range of platforms and vendors, including cloud vendors and on-premises systems.

More than ever, it’s vital for organizations to operate with maximum agility and to get better at leveraging data. Many organizations are turning to Azure Data Factory (ADF) because it offers significant advantages in both areas. However, the limitations of ADF’s basic scheduling call for an enterprise automation solution like AutoSys Workload Automation or Automic Automation for workload orchestration.

ADF is a fully managed, serverless data integration service. Featuring more than 90 built-in connectors, the service helps organizations simplify and scale their data integration initiatives. It also helps users run their extract, transform, and load (ETL) processes without having to do any coding.

The challenges posed by basic ADF scheduling

As IT organizations continue to expand their use of ADF and other cloud services, the volume of automated workflows continues to grow. In the process, however, they encounter a number of challenges.

Dependencies create obstacles

ADF features a basic, time-based scheduler that operators can use to automatically run jobs at specified times. The problem is that this scheduler can’t intelligently accommodate dependencies—and ADF workflows typically have multiple upstream and downstream dependencies. Dozens of sources may feed data into ADF, and several downstream applications may rely on ADF outputs for ongoing operation.

To coordinate these various processes and their associated dependencies, administrators relying on the ADF scheduler must resort to forced time delays: scheduling each subsequent task to start at a time by which the prior tasks are expected to have completed.
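The pitfall of a forced time delay can be sketched in a few lines of Python. The job names, times, and durations below are purely illustrative assumptions, not taken from any real ADF pipeline:

```python
from datetime import datetime, timedelta

# Hypothetical illustration: a forced time delay assumes the upstream
# task finishes within a fixed window, regardless of what actually happens.
UPSTREAM_START = datetime(2023, 9, 20, 1, 0)
ASSUMED_UPSTREAM_DURATION = timedelta(hours=3)

# Time-based scheduling: the downstream start is fixed in advance.
downstream_start = UPSTREAM_START + ASSUMED_UPSTREAM_DURATION

def downstream_has_fresh_data(actual_upstream_end: datetime) -> bool:
    """True only if the upstream task really finished before the
    downstream task's scheduled start time."""
    return actual_upstream_end <= downstream_start

# If the upstream run overruns its assumed window, the downstream job
# starts anyway and consumes stale or partial data.
late_finish = UPSTREAM_START + timedelta(hours=4)
print(downstream_has_fresh_data(late_finish))  # False
```

A dependency-aware scheduler sidesteps the problem entirely by triggering the downstream task on upstream completion rather than on the clock.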

Data quality issues and high failure rates

Relying on these hard-wired schedules means that if one task runs longer than its forced time delay, the subsequent task kicks off anyway, typically with old, inaccurate, or incomplete data. The ultimate output of the job sequence will then be suboptimal or even unusable.

These issues are magnified in large-scale environments, where dozens of data sources may be in use. If one source doesn’t arrive in time, workflows can suffer cascading failures.

When running ADF, users depend heavily on a range of dispersed, distributed networks. At any given moment, an automation job can fail simply because an API call returns an error. When this type of downtime occurs, scheduled jobs fail, creating issues for subsequent downstream jobs.

Inefficient workload schedules

To prevent these issues, operators can opt to add buffers. For example, imagine that the longest phase-one activities of a given workflow can take up to 10 hours to complete. A user could then add a buffer of two hours, and schedule all phase-two workloads to start a total of 12 hours after phase-one tasks were kicked off.

While this approach can help minimize failures, it means ADF instances must be kept idling for two hours, or longer when jobs complete ahead of schedule. This can be very costly: in many environments, these idling resources can account for thousands of dollars in fees, accrued with every run.
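The cost of that buffer is easy to estimate. The sketch below uses the 10-hour/2-hour figures from the example above, but the hourly rate and run frequency are illustrative assumptions, not ADF pricing:

```python
# Hypothetical cost sketch; the rate and frequency are assumptions.
LONGEST_PHASE_ONE_HOURS = 10
BUFFER_HOURS = 2
HOURLY_INSTANCE_COST = 25.0   # assumed cost of keeping resources warm
RUNS_PER_MONTH = 30           # assumed daily pipeline run

def idle_hours(actual_phase_one_hours: float) -> float:
    """Hours the environment sits idle when phase two starts at the
    fixed offset of maximum phase-one duration plus buffer."""
    scheduled_offset = LONGEST_PHASE_ONE_HOURS + BUFFER_HOURS
    return scheduled_offset - actual_phase_one_hours

# A typical run that finishes phase one in 7 hours idles for 5 hours.
monthly_waste = idle_hours(7) * HOURLY_INSTANCE_COST * RUNS_PER_MONTH
print(f"${monthly_waste:,.0f} per month")  # $3,750 per month
```

Even under modest assumptions, idle time between phases compounds quickly across daily runs.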

Labor-intensive follow-up and mitigation

If a failure is discovered while a workstream is underway, the administrator will have to disable the schedule, potentially in multiple products, and manually troubleshoot and address any issues that have arisen.

Why scripting isn’t the answer

To avoid some of the challenges outlined above, some automation groups have sought to develop shell scripts for creating automated workflows. However, these approaches require significant up-front investment and are difficult to support and run over time. They also fail to scale: inefficiency and costs mount as the environment grows.

How enterprise automation solves the problem

As long as automation has existed, so has the potential for costly, brittle islands of automation. As businesses expand their use of ADF, automation managers don’t want to add yet another siloed automation tool to maintain and support alongside the platforms in which they have already invested time, money, and expertise.

That’s why the use of enterprise automation continues to be so essential. Enterprise automation provides central management of automation workloads across a range of environments and platforms. These solutions enable organizations to adapt to the evolving requirements of cloud-driven workloads, including those running in ADF.

Automation by Broadcom offers robust scheduling that enables users to manage dependencies across pipelines, integrations, applications, and processes. These solutions deliver end-to-end visibility across cloud vendors and on-premises deployments.

Introducing Automation by Broadcom for cloud integrations

Automation by Broadcom offers a wide range of cloud integrations, which are featured in our Automation Marketplace. With the solutions’ broad platform and service coverage, customers can efficiently manage complex, multi-phase automation deployments within ADF—as well as complex pipelines that span platforms and services from a range of vendors, including cloud providers and on-premises systems.

The ADF integration

With Automation by Broadcom for ADF, developers and data scientists can fully leverage the power of ADF in harnessing enterprise data. At the same time, automation teams can continue to employ Automation by Broadcom solutions as their central, unified platform for managing and orchestrating automation workloads across their application landscape.

Automation by Broadcom solutions offer a rich set of capabilities that are invaluable for IT operations organizations. Users can model any process dependencies, establish centralized operational control, and gain 360-degree visibility of all services running in production.

Advantages of the ADF Integration

By implementing the ADF integration with Automation by Broadcom solutions (e.g. Automic, AutoSys, and dSeries), organizations can realize a number of benefits, particularly as their usage of ADF and other cloud solutions continues to expand. Here are a few of the potential upsides:

  • Better data. Automation by Broadcom enables groups to establish multi-phase, multi-job data pipelines that address required interdependencies. By doing so, they can better ensure that downstream jobs only kick off when all upstream processes have been completed. Ultimately, this means complete and accurate data sets are used consistently.
  • Greater efficiency. In the past, operations staff who relied solely on ADF’s scheduling capabilities were saddled with manual, labor-intensive work: tracking job progress and, when issues occurred, investigating them, stopping related jobs, and so on. Automation by Broadcom spares operators all this manual effort. Further, the solutions’ ADF integrations enable organizations to avoid the high cost and effort of developing complex, custom shell scripts and maintaining them over time.
  • Lower costs. By establishing effective dependencies between multiple phases and workloads, operators can ensure jobs are started, run, and stopped with maximum speed and accuracy. This means the costs associated with leaving instances idle while waiting for forced time delays are reduced or eliminated completely.
  • More resilience. Automation by Broadcom enables teams to add safeguards to account for connectivity issues, which is vital for cloud services like ADF. For example, an API connection may go down temporarily. Instead of having this temporary issue lead to a permanent job failure, operators can establish configurable intervals to retry connections and continue processing when services are back online.
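The retry pattern described in the resilience point above can be sketched as follows. `call_api` is a hypothetical stand-in for any flaky remote call; the attempt count and interval are illustrative defaults, not settings from any Broadcom product:

```python
import time

class TransientAPIError(Exception):
    """Stands in for a temporary failure such as an HTTP 503."""

def run_with_retries(call_api, max_attempts=5, interval_seconds=60):
    """Retry a flaky call at a configurable interval instead of
    letting one transient error become a permanent job failure."""
    for attempt in range(1, max_attempts + 1):
        try:
            return call_api()
        except TransientAPIError:
            if attempt == max_attempts:
                raise  # escalate only after all retries are exhausted
            time.sleep(interval_seconds)

# Usage: simulate an API that fails twice, then recovers.
attempts = {"count": 0}

def flaky_trigger():
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise TransientAPIError("503 Service Unavailable")
    return "pipeline triggered"

print(run_with_retries(flaky_trigger, interval_seconds=0))  # pipeline triggered
```

The key design point is that the retry interval is configurable per job, so operators can tune how long a workflow waits out transient outages before escalating.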

Conclusion

For today’s enterprises, extracting maximum value from data is an increasingly critical imperative. To achieve this objective, it is vital to establish seamless automated data pipelines and to have the ability to harness the power of cloud-based data integration services like ADF. With Automation by Broadcom, IT organizations can leverage a unified platform for managing all automation workloads running in ADF and all their other cloud-based services and on-premises platforms.

Visit the Automation Marketplace for details on integration features, links to extended TechDocs, and the option to download the software.