July 18, 2024
The Unreasonable Effectiveness of Simplicity in IT Operations Strategy
Written by: Duane Nielsen
A constant challenge in business is aligning stakeholders, customers, and employees behind a single mission. Carefully crafted plans often fail spectacularly as a result of complexity. Some of these efforts fail slowly; some even before execution begins.
Especially when collaboration is important, simplicity can make challenging work innately more understandable, measurable, and engaging. When work cuts across multiple functional groups who may have different priorities, measures of success, or perceptions of risk, a simple framework can help everyone understand how they contribute to the larger mission and why the mission matters.
Simplicity, IT operations and digital business services
Simplicity makes it easier to communicate, enlist collaborators, measure progress, and implement plans. As a result, teams can clearly understand how to fulfill a larger mission.
This principle applies to IT operations. Delivering reliable business services is both incredibly complex and massively important to modern businesses, but the overarching strategy for doing so doesn't have to be.
In this article, I outline a simple framework teams can use to assess and prioritize digital services, a critical first step for establishing, communicating, and committing to a simple mission. For related reading, see the blog post by Adeesh Fulay, Head of Engineering for DX Operational Intelligence: “Business Services Become a Viable Organizing Principle.”
This “three-level” framework is not unique. You will find this and similar approaches, under various names, used by IT operations teams to assess and prioritize work, allocate resources, and so on. Here, the focus is on how it can be applied to digital services.
Three levels of business services
This framework consists of the following three steps:
- Identify the business services your IT organization provides
- Gather two objective business metrics for each service: “impact” and “quantified impact”
- Classify each service as “business critical,” “productive,” or “best effort”
To illustrate, let’s consider digital business services you might find at a telecommunications company:
- Online Store: Shopping Cart Service
- Call Center: Incoming Call Queue Service
- Customer Relationship Management: New/Update Record Service
- HR: New Employee Onboarding Service
- Enterprise Reporting: Dashboarding Service
- IT Support: Ticketing Service
- Corporate Website: Analytics and Reporting Service
Most of the services in this example are self-explanatory and have direct counterparts in other industries. While each of these is intuitively important, it helps to evaluate them using the three-level framework.
Framework applied to our sample digital services
Business Service | Impact | Quantified Impact |
Online Store: Shopping Cart Service | | |
Call Center: Incoming Call Queue Service | | |
Customer Relationship Management: New/Update Record Service | | |
HR: New Employee Onboarding Service | | |
Enterprise Reporting: Dashboarding Service | | |
IT Support: Ticketing Service | | |
Corporate Website: Analytics and Reporting Service | | |
Quantify, quantify, quantify
Next, for each service, consider the question, “What is the objective and quantifiable business impact of the service becoming unavailable?”
To guide the conversations with stakeholders, we start by dividing these services into three buckets:
- Lost revenue. These are digital services in which an immediate and quantifiable loss of revenue occurs—and is demonstrated—when the service is down. You can further clarify this with a second criterion, such as the average revenue lost per hour. Business champions of revenue-bearing services should be able to associate these types of financial measures with specific services.
- Lost productivity. When these digital services are unavailable, employees are unable to execute their duties or are likely to miss deadlines as a result. Additional criteria to consider are how many users the service supports and whether reasonable workarounds are available to them.
- No immediate impact. Every other service, by default, falls into this category because the impact cannot be conveyed in terms of productivity or revenue, or because the quantified impact is just too small relative to the other services.
At this point, it’s worth noting that not all organizations are driven by revenue or productivity. In these cases, you would substitute “Cannot fulfill the primary mission of the organization,” and “Degraded ability of staff to support the mission” respectively.
Putting it together
Once business impact is determined and quantified, we can classify our services using these simple rules:
- Lost revenue → “critical”
- Lost productivity → “productive”
- No immediate impact → “best effort”
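The rules above amount to a one-to-one lookup from impact bucket to service tier. As a minimal sketch, assuming Python and hypothetical names (`Impact`, `TIER_BY_IMPACT`, and `classify` are illustrative, not part of any tool mentioned in this article), the mapping could be written as:

```python
from enum import Enum


class Impact(Enum):
    """The three impact buckets described in this article."""
    LOST_REVENUE = "Lost revenue"
    LOST_PRODUCTIVITY = "Lost productivity"
    NO_IMMEDIATE_IMPACT = "No immediate impact"


# Classification rules from the list above: one tier per impact bucket.
TIER_BY_IMPACT = {
    Impact.LOST_REVENUE: "critical",
    Impact.LOST_PRODUCTIVITY: "productive",
    Impact.NO_IMMEDIATE_IMPACT: "best effort",
}


def classify(impact: Impact) -> str:
    """Return the service tier for a given impact bucket."""
    return TIER_BY_IMPACT[impact]


# Two of the sample telecom services with their impact buckets.
services = {
    "Online Store: Shopping Cart Service": Impact.LOST_REVENUE,
    "HR: New Employee Onboarding Service": Impact.LOST_PRODUCTIVITY,
}

for name, impact in services.items():
    print(f"{name} -> {classify(impact)}")
```

Keeping the rules in a single lookup table means the hard part, agreeing on the impact bucket with stakeholders, stays a human decision, while the tier follows mechanically from it.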
Here’s the completed table:
Business Service | Impact | Quantified Impact |
Business Critical | | |
Online Store: Shopping Cart Service | Lost revenue | $50,000 per hour |
Call Center: Incoming Call Queue Service | Lost revenue | $30,000 per hour |
Productive | | |
Customer Relationship Management: New/Update Record Service | Lost productivity | 500 internal users blocked |
HR: New Employee Onboarding Service | Lost productivity | 100 internal users blocked |
Best Effort | | |
Enterprise Reporting: Dashboarding Service | No immediate impact | 20 internal users inconvenienced |
IT Support: Ticketing Service | | |
Corporate Website: Analytics and Reporting Service | None | Unknown |
This table forms the basis for communicating our strategy to the wider organization.
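The hourly figures in the table also make it easy to put a rough price on a specific outage when talking to stakeholders. The sketch below assumes Python, a hypothetical helper named `estimated_revenue_loss`, a hypothetical 90-minute outage, and the simplifying assumption that revenue is lost evenly over the outage; real losses will vary with time of day and season.

```python
def estimated_revenue_loss(revenue_per_hour: float, outage_minutes: float) -> float:
    """Estimate lost revenue, assuming loss accrues evenly over the outage."""
    return revenue_per_hour * outage_minutes / 60


# Hourly figures taken from the completed table.
SHOPPING_CART_PER_HOUR = 50_000
CALL_QUEUE_PER_HOUR = 30_000

# A hypothetical 90-minute outage of each service.
print(estimated_revenue_loss(SHOPPING_CART_PER_HOUR, 90))  # 75000.0
print(estimated_revenue_loss(CALL_QUEUE_PER_HOUR, 90))     # 45000.0
```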
For digital services classified as “critical,” the service level objective (SLO) should be ambitious. Gapless, state-of-the-art monitoring should be prioritized. Follow-the-sun L1 support should be available for users. Engineers should be on call 24/7 in case of failures. Continual improvement processes should be in place to ensure these services stay at a high level of reliability. There should be an uncompromising emphasis on the quality of the user experience.
For digital services classified as “productive,” the SLO should be reasonable. A good standard of availability and infrastructure monitoring should be implemented. Support should be available during business hours. A more reactive stance may be employed in case of failures, so long as the SLO is maintained.
For digital services classified as “best effort,” the SLO can be significantly more lenient. A basic standard of availability monitoring is sufficient. Ideally these services should be outsourced to a third party. If these services must be kept in-house, there should be an expectation set that resources will be prioritized to “critical” and “productive” applications, and users may need to occasionally “make do” in the case of exceptional failures and resource constraints. If failure rates increase to the point that they have an impact on revenue, productivity, or other key metrics, a more reliable alternative should be found for these services.
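One way to keep these expectations visible to every team is to record them as a small, shared policy table. The following is only a sketch, assuming Python and hypothetical names (`TierPolicy`, `POLICIES`, `policy_for`). The field values paraphrase the three paragraphs above; in particular, "as resources allow" is my shorthand for the best-effort tier, and no numeric SLO targets are implied.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TierPolicy:
    """Operational expectations for one service tier (qualitative only)."""
    slo: str
    monitoring: str
    support: str
    failure_response: str


# Values paraphrase the article text; they are descriptions, not targets.
POLICIES = {
    "critical": TierPolicy(
        slo="ambitious",
        monitoring="gapless, state-of-the-art",
        support="follow-the-sun L1",
        failure_response="engineers on call 24/7",
    ),
    "productive": TierPolicy(
        slo="reasonable",
        monitoring="good availability and infrastructure monitoring",
        support="business hours",
        failure_response="reactive, so long as the SLO is maintained",
    ),
    "best effort": TierPolicy(
        slo="lenient",
        monitoring="basic availability monitoring",
        support="as resources allow",
        failure_response="users may occasionally make do",
    ),
}


def policy_for(tier: str) -> TierPolicy:
    """Look up the policy for a tier name; raises KeyError if unknown."""
    return POLICIES[tier]


print(policy_for("productive").support)  # business hours
```

Because the table has only three entries, it can double as the one-page summary that is shared with the wider organization.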
So that’s it! In a nutshell, this is how the three-level IT operations framework is applied to digital business services. With this simple framework, you can assess and prioritize your digital business services and clearly communicate a simple mission to multiple teams across IT. I hope this inspires you to think about the strengths and weaknesses of your current strategy for delivering IT services to your customers, employees, and partners.
Duane Nielsen
During his 15 years in IT Operations consulting, Duane has worked with customers from around the world, from startups in Asia to the largest enterprises in Europe and the US. Duane combines his expertise in IT Ops tooling with his experience tackling complex challenges throughout his career to guide customers along...