Broadcom Software Academy Blog

The Unreasonable Effectiveness of Simplicity in IT Operations Strategy

Written by Duane Nielsen | Jul 18, 2024 7:35:50 PM
Key Takeaways
  • Embracing simplicity aligns stakeholders and enhances IT operations.
  • Utilizing a three-level framework to assess and prioritize digital services will help you establish, communicate, and commit to a simple mission.
  • Revenue- and productivity-based metrics enable you to assess and categorize services.

A constant challenge in business is aligning stakeholders, customers, and employees behind a single mission. Carefully crafted plans often fail because of complexity: some fail spectacularly, some fail slowly, and some fail before execution even begins.

Especially when collaboration is important, simplicity can make challenging work innately more understandable, measurable, and engaging. When work cuts across multiple functional groups who may have different priorities, measures of success, or perceptions of risk, a simple framework can help everyone understand how they contribute to the larger mission and why the mission matters.

Simplicity, IT operations and digital business services

Simplicity makes it easier to communicate, enlist collaborators, measure progress, and implement plans. As a result, teams can clearly understand how to fulfill a larger mission.

This principle applies to IT operations. Delivering reliable business services is both incredibly complex and massively important to modern businesses, but the overarching strategy for doing so doesn't have to be.

In this article, I outline a simple framework teams can use to assess and prioritize digital services, a critical first step for establishing, communicating, and committing to a simple mission. For related reading, see the blog post by Adeesh Fulay, Head of Engineering for DX Operational Intelligence: “Business Services Become a Viable Organizing Principle.”

This three-level framework is not unique: you will find it and similar approaches under various names, and IT operations teams can use them to assess and prioritize work, allocate resources, and so on. Here, it’s helpful to see how the framework applies to digital services.

Three levels of business services

This framework consists of the following three steps:

  • Identify the business services your IT organization provides
  • Gather two objective business metrics for each service: “impact” and “quantified impact”
  • Classify each service as “business critical,” “productive,” or “best effort”

To illustrate, let’s consider digital business services you might find at a telecommunications company:  

  1. Online Store: Shopping Cart Service
  2. Call Center: Incoming Call Queue Service
  3. Customer Relationship Management: New/Update Record Service
  4. HR: New Employee Onboarding Service
  5. Enterprise Reporting: Dashboarding Service
  6. IT Support: Ticketing Service
  7. Corporate Website: Analytics and Reporting Service

Most of the services in this example are self-explanatory and have direct counterparts in other industries. While each of them is intuitively important, it helps to evaluate them using the three-level framework.

Framework applied to our sample digital services

Business Service                                            | Impact | Quantified Impact
Online Store: Shopping Cart Service                         |        |
Call Center: Incoming Call Queue Service                    |        |
Customer Relationship Management: New/Update Record Service |        |
HR: New Employee Onboarding Service                         |        |
Enterprise Reporting: Dashboarding Service                  |        |
IT Support: Ticketing Service                               |        |
Corporate Website: Analytics and Reporting Service          |        |
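Steps 1 and 2 amount to building a small catalog of services whose impact fields start out empty. As a minimal sketch, assuming you track the catalog in Python, the table above could be captured like this. The `ServiceAssessment` class and `pending_assessment` helper are hypothetical names for illustration, not part of any product.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ServiceAssessment:
    """One row of the assessment table (hypothetical structure)."""
    name: str                                # Step 1: the business service
    impact: Optional[str] = None             # Step 2: e.g. "Lost revenue"
    quantified_impact: Optional[str] = None  # Step 2: e.g. "$50,000 per hour"


# Step 1: the sample telecommunications services from this article.
catalog: List[ServiceAssessment] = [
    ServiceAssessment("Online Store: Shopping Cart Service"),
    ServiceAssessment("Call Center: Incoming Call Queue Service"),
    ServiceAssessment("Customer Relationship Management: New/Update Record Service"),
    ServiceAssessment("HR: New Employee Onboarding Service"),
    ServiceAssessment("Enterprise Reporting: Dashboarding Service"),
    ServiceAssessment("IT Support: Ticketing Service"),
    ServiceAssessment("Corporate Website: Analytics and Reporting Service"),
]


def pending_assessment(rows: List[ServiceAssessment]) -> List[str]:
    """Return the names of services still missing either Step 2 metric."""
    return [r.name for r in rows if r.impact is None or r.quantified_impact is None]


if __name__ == "__main__":
    # Each name printed here still needs a stakeholder conversation.
    for name in pending_assessment(catalog):
        print("Needs impact data:", name)
```

The list of pending services doubles as the agenda for the stakeholder conversations described next.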

Quantify, quantify, quantify

Next, for each service, consider the question: “What is the objective and quantifiable business impact of the service becoming unavailable?”

To help guide our conversations with stakeholders, we start by dividing these services into three buckets:

  • Lost revenue. These are digital services for which an immediate, quantifiable, and demonstrable loss of revenue occurs when the service is down. You can further clarify this with a second criterion, such as the average revenue lost per hour. Business champions of revenue-bearing services should be able to associate these types of financial measures with specific services.
  • Lost productivity. When these digital services are unavailable, employees are unable to perform their duties or are likely to miss deadlines as a result. Additional criteria to consider are how many users the service supports and whether reasonable workarounds are available to them.
  • No immediate impact. Every other service, by default, falls into this category because the impact cannot be conveyed in terms of productivity or revenue, or because the quantified impact is just too small relative to the other services.
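The bucket rules, including their secondary criteria, can be expressed as a small decision function. This is a sketch under stated assumptions: the `Impact` enum and all parameter names are hypothetical, treating users who have a workaround as “inconvenienced” rather than blocked is an interpretation, and the `min_revenue_per_hour` and `min_users_blocked` thresholds stand in for “too small relative to the other services,” with values you would set from your own stakeholder conversations.

```python
from enum import Enum


class Impact(Enum):
    LOST_REVENUE = "Lost revenue"
    LOST_PRODUCTIVITY = "Lost productivity"
    NO_IMMEDIATE_IMPACT = "No immediate impact"


def impact_bucket(
    revenue_lost_per_hour: float = 0.0,
    users_affected: int = 0,
    has_workaround: bool = False,
    min_revenue_per_hour: float = 0.0,  # hypothetical "too small" threshold
    min_users_blocked: int = 0,         # hypothetical "too small" threshold
) -> Impact:
    """Assign a service to one of the three impact buckets."""
    # Lost revenue: a demonstrated, quantifiable loss while the service is down.
    if revenue_lost_per_hour > min_revenue_per_hour:
        return Impact.LOST_REVENUE
    # Lost productivity: users are blocked because no reasonable workaround exists.
    if not has_workaround and users_affected > min_users_blocked:
        return Impact.LOST_PRODUCTIVITY
    # Every other service falls into the default bucket.
    return Impact.NO_IMMEDIATE_IMPACT


if __name__ == "__main__":
    print(impact_bucket(revenue_lost_per_hour=50_000).value)            # Lost revenue
    print(impact_bucket(users_affected=500).value)                      # Lost productivity
    print(impact_bucket(users_affected=20, has_workaround=True).value)  # No immediate impact
```

Checking revenue first means a service that both earns revenue and blocks employees lands in the higher-priority bucket.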

At this point, it’s worth noting that not all organizations are driven by revenue or productivity. In these cases, you would substitute “Cannot fulfill the primary mission of the organization” for “Lost revenue” and “Degraded ability of staff to support the mission” for “Lost productivity.”

Putting it together

Once business impact is determined and quantified, we can classify our services using these simple rules:

  • Lost revenue → “business critical”
  • Lost productivity → “productive”
  • No immediate impact → “best effort”
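Because each bucket maps to exactly one level, the rules fit in a lookup table rather than branching logic. A minimal sketch: the labels mirror this article, including the substitute labels for mission-driven organizations, while `LEVEL_BY_IMPACT`, `MISSION_ALIASES`, and `classify` are illustrative names only.

```python
from typing import Dict

# Each impact bucket maps to exactly one service level.
LEVEL_BY_IMPACT: Dict[str, str] = {
    "Lost revenue": "business critical",
    "Lost productivity": "productive",
    "No immediate impact": "best effort",
}

# Substitute labels for organizations not driven by revenue or productivity.
MISSION_ALIASES: Dict[str, str] = {
    "Cannot fulfill the primary mission of the organization": "Lost revenue",
    "Degraded ability of staff to support the mission": "Lost productivity",
}


def classify(impact: str) -> str:
    """Return the service level for an impact label.

    Unrecognized labels default to "best effort", mirroring the rule that
    every other service falls into the no-immediate-impact bucket.
    """
    impact = MISSION_ALIASES.get(impact, impact)
    return LEVEL_BY_IMPACT.get(impact, "best effort")


if __name__ == "__main__":
    print(classify("Lost revenue"))                                       # business critical
    print(classify("Degraded ability of staff to support the mission"))  # productive
```

Keeping the rules as data makes them easy to publish alongside the table, which suits the goal of one simple, shared mission.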

Here’s the completed table:

Level             | Business Service                                            | Impact              | Quantified Impact
Business Critical | Online Store: Shopping Cart Service                         | Lost revenue        | $50,000 per hour
Business Critical | Call Center: Incoming Call Queue Service                    | Lost revenue        | $30,000 per hour
Productive        | Customer Relationship Management: New/Update Record Service | Lost productivity   | 500 internal users blocked
Productive        | HR: New Employee Onboarding Service                         | Lost productivity   | 100 internal users blocked
Best Effort       | Enterprise Reporting: Dashboarding Service                  | No immediate impact | 20 internal users inconvenienced
Best Effort       | IT Support: Ticketing Service                               |                     |
Best Effort       | Corporate Website: Analytics and Reporting Service          | No immediate impact | Unknown
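Within each level, the table lists services in descending order of quantified impact. If the catalog lives in code, the same ordering can come from a two-part sort key. This sketch assumes numeric impact values in each level’s own unit (dollars per hour for revenue, users for productivity), which are only comparable within a level; `LEVEL_RANK`, the tuple layout, and `priority_key` are hypothetical. Services with unknown impact sort last within their level.

```python
from typing import List, Optional, Tuple

# Lower rank is served first.
LEVEL_RANK = {"business critical": 0, "productive": 1, "best effort": 2}

# (service, level, quantified impact in the level's own unit; None = unknown)
Row = Tuple[str, str, Optional[float]]

rows: List[Row] = [
    ("IT Support: Ticketing Service", "best effort", None),
    ("Corporate Website: Analytics and Reporting Service", "best effort", None),
    ("HR: New Employee Onboarding Service", "productive", 100),
    ("Call Center: Incoming Call Queue Service", "business critical", 30_000),
    ("Enterprise Reporting: Dashboarding Service", "best effort", 20),
    ("Online Store: Shopping Cart Service", "business critical", 50_000),
    ("Customer Relationship Management: New/Update Record Service", "productive", 500),
]


def priority_key(row: Row) -> Tuple[int, bool, float]:
    """Sort by level, then known before unknown impact, then largest impact first."""
    _, level, impact = row
    return (LEVEL_RANK[level], impact is None, -(impact or 0))


if __name__ == "__main__":
    for name, level, impact in sorted(rows, key=priority_key):
        shown = impact if impact is not None else "unknown"
        print(f"{level:<18} {name} ({shown})")
```

Python’s sort is stable, so services that tie (here, the two with unknown impact) keep their input order.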

This table forms the basis for communicating our strategy to the wider organization.

For digital services classified as “business critical,” the service level objective (SLO) should be ambitious. Gapless, state-of-the-art monitoring should be prioritized. Follow-the-sun L1 support should be available for users. Engineers should be on call 24/7 in case of failures. Continual improvement processes should be in place to keep these services at a high level of reliability. There should be an uncompromising emphasis on the quality of the user experience.

For digital services classified as “productive,” the SLO should be reasonable. A good standard of availability and infrastructure monitoring should be implemented. Support should be available during business hours. A more reactive stance may be employed in case of failures, so long as the SLO is maintained.

For digital services classified as “best effort,” the SLO can be significantly more lenient. A basic standard of availability monitoring is sufficient. Ideally, these services should be outsourced to a third party. If they must be kept in-house, set the expectation that resources will be prioritized for “business critical” and “productive” services, and that users may occasionally need to “make do” during exceptional failures or resource constraints. If failure rates increase to the point that they affect revenue, productivity, or other key metrics, a more reliable alternative should be found for these services.
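These commitments can be written down as a small policy table that runbooks or tooling reference. A sketch only: the `TierPolicy` fields and `policy_for` helper are hypothetical and paraphrase the three paragraphs above, and no SLO percentages are given because the article leaves concrete targets to each organization.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class TierPolicy:
    """Operational commitments for one service level (hypothetical fields)."""
    slo: str  # qualitative target; set real numbers per organization
    monitoring: str
    support: str
    failure_response: str


POLICIES: Dict[str, TierPolicy] = {
    "business critical": TierPolicy(
        slo="ambitious",
        monitoring="gapless, state-of-the-art",
        support="follow-the-sun L1 support",
        failure_response="engineers on call 24/7; continual improvement processes",
    ),
    "productive": TierPolicy(
        slo="reasonable",
        monitoring="good availability and infrastructure monitoring",
        support="business hours",
        failure_response="reactive, as long as the SLO is maintained",
    ),
    "best effort": TierPolicy(
        slo="significantly more lenient",
        monitoring="basic availability monitoring",
        support="after business critical and productive services",
        failure_response="outsource if possible; replace if failures affect key metrics",
    ),
}


def policy_for(level: str) -> TierPolicy:
    """Look up the policy for a level; raises KeyError for unknown levels."""
    return POLICIES[level]


if __name__ == "__main__":
    for level, policy in POLICIES.items():
        print(f"{level}: SLO {policy.slo}; support {policy.support}")
```

Raising on an unknown level, rather than silently defaulting, surfaces services that were never classified.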

So that’s it! In a nutshell, this is how the three-level IT operations framework is applied to digital business services. With this simple framework, you can assess and prioritize your digital business services and clearly communicate a simple mission to multiple teams across IT. I hope this inspires you to think about the strengths and weaknesses of your current strategy for delivering IT services to your customers, employees, and partners.