July 18, 2024

The Unreasonable Effectiveness of Simplicity in IT Operations Strategy

Key Takeaways

Embracing simplicity aligns stakeholders and enhances IT operations.
Utilizing a three-level framework to assess and prioritize digital services will help you establish, communicate, and commit to a simple mission.
Revenue- and productivity-based metrics enable you to assess and categorize services.

A constant challenge in business is aligning stakeholders, customers, and employees behind a single mission. Carefully crafted plans often fail spectacularly as a result of complexity. Some of these efforts fail slowly; some even before execution begins.

Especially when collaboration is important, simplicity can make challenging work innately more understandable, measurable, and engaging. When work cuts across multiple functional groups who may have different priorities, measures of success, or perceptions of risk, a simple framework can help everyone understand how they contribute to the larger mission and why the mission matters.

Simplicity, IT operations and digital business services

Simplicity makes it easier to communicate, enlist collaborators, measure progress, and implement plans. As a result, teams can clearly understand how to fulfill a larger mission.

This principle applies to IT operations. Delivering reliable business services is both incredibly complex and massively important to modern businesses, but the overarching strategy for doing so doesn't have to be.

In this article, I outline a simple framework teams can use to assess and prioritize digital services, which is a critical first step for establishing, communicating, and committing to a simple mission. For a related perspective, see the blog post by Adeesh Fulay, Head of Engineering for DX Operational Intelligence: “Business Services Become a Viable Organizing Principle.”

This “three-level” framework is not unique. IT operations teams use this and similar approaches, under various names, to assess and prioritize work, allocate resources, and so on. Here, I show how it can be applied to digital services.

Three levels of business services

This framework consists of the following three steps:

Identify the business services your IT organization provides
Gather two objective business metrics for each service: “impact” and “quantified impact”
Classify each service as “business critical,” “productive,” or “best effort”

To illustrate, let’s consider digital business services you might find at a telecommunications company:

Online Store: Shopping Cart Service
Call Center: Incoming Call Queue Service
Customer Relationship Management: New/Update Record Service
HR: New Employee Onboarding Service
Enterprise Reporting: Dashboarding Service
IT Support: Ticketing Service
Corporate Website: Analytics and Reporting Service

Most of the services in this example are self-explanatory and have direct counterparts in other industries. While each of these is intuitively important, it helps to evaluate them using the three-level framework.

Framework applied to our sample digital services

Business Service	Impact	Quantified Impact
Online Store: Shopping Cart Service
Call Center: Incoming Call Queue Service
Customer Relationship Management: New/Update Record Service
HR: New Employee Onboarding Service
Enterprise Reporting: Dashboarding Service
IT Support: Ticketing Service
Corporate Website: Analytics and Reporting Service

Quantify, quantify, quantify

Next, for each service, consider the question, “What is the objective and quantifiable business impact of the service becoming unavailable?”

To guide the conversations with our stakeholders, we start by dividing these services into three buckets:

Lost revenue. These are digital services in which an immediate and quantifiable loss of revenue occurs—and is demonstrated—when the service is down. You can further clarify this with a second criterion, such as the average revenue lost per hour. Business champions of revenue-bearing services should be able to associate these types of financial measures with specific services.
Lost productivity. When these digital services are unavailable, employees are unable to execute their duties or are likely to miss deadlines as a result. Additional criteria to consider are how many users the service supports and whether reasonable workarounds are available to them.
No immediate impact. Every other service, by default, falls into this category because the impact cannot be conveyed in terms of productivity or revenue, or because the quantified impact is just too small relative to the other services.

At this point, it’s worth noting that not all organizations are driven by revenue or productivity. In these cases, you would substitute “Cannot fulfill the primary mission of the organization” for lost revenue and “Degraded ability of staff to support the mission” for lost productivity.

Putting it together

Once business impact is determined and quantified, we can classify our services using these simple rules:

Lost revenue → “critical”
Lost productivity → “productive”
No immediate impact → “best effort”
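The three rules above amount to a simple lookup. Here is a minimal Python sketch of that mapping; the names `Impact`, `Tier`, and `classify` are my own illustrative choices, not part of any product or standard.

```python
from enum import Enum


class Impact(Enum):
    """The three impact buckets described earlier (names are illustrative)."""
    LOST_REVENUE = "Lost revenue"
    LOST_PRODUCTIVITY = "Lost productivity"
    NO_IMMEDIATE_IMPACT = "No immediate impact"


class Tier(Enum):
    """The three service classifications (names are illustrative)."""
    CRITICAL = "critical"
    PRODUCTIVE = "productive"
    BEST_EFFORT = "best effort"


# The classification rules, expressed as a lookup table.
IMPACT_TO_TIER = {
    Impact.LOST_REVENUE: Tier.CRITICAL,
    Impact.LOST_PRODUCTIVITY: Tier.PRODUCTIVE,
    Impact.NO_IMMEDIATE_IMPACT: Tier.BEST_EFFORT,
}


def classify(impact: Impact) -> Tier:
    """Return the service tier for a given business impact."""
    return IMPACT_TO_TIER[impact]


print(classify(Impact.LOST_PRODUCTIVITY).value)  # productive
```

Keeping the rules in one table means that an organization substituting mission-based criteria for revenue and productivity only needs to rename the impact buckets; the classification logic stays the same.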

Here’s the completed table:

Business Service	Impact	Quantified Impact
Business Critical
Online Store: Shopping Cart Service	Lost revenue	$50,000 per hour
Call Center: Incoming Call Queue Service	Lost revenue	$30,000 per hour
Productive
Customer Relationship Management: New/Update Record Service	Lost productivity	500 Internal users blocked
HR: New Employee Onboarding Service	Lost productivity	100 Internal users blocked
Best Effort
Enterprise Reporting: Dashboarding Service	No immediate impact	20 Internal users inconvenienced
IT Support: Ticketing Service
Corporate Website: Analytics and Reporting Service	No immediate impact	Unknown

This table forms the basis for communicating our strategy to the wider organization.
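The hourly figures in the table also make it easy to put a rough price on an outage when talking to stakeholders. The sketch below assumes that lost revenue accrues linearly over the outage, which is my simplification rather than something the framework prescribes, and the 90-minute outage is a hypothetical example. The function name `estimated_outage_cost` is illustrative.

```python
# Hourly revenue-loss figures, in dollars, taken from the completed table.
REVENUE_LOSS_PER_HOUR = {
    "Online Store: Shopping Cart Service": 50_000,
    "Call Center: Incoming Call Queue Service": 30_000,
}


def estimated_outage_cost(service: str, outage_minutes: float) -> float:
    """Estimate lost revenue in dollars, assuming loss accrues linearly (an assumption)."""
    return REVENUE_LOSS_PER_HOUR[service] * outage_minutes / 60


# Hypothetical 90-minute outage of the shopping cart service.
cost = estimated_outage_cost("Online Store: Shopping Cart Service", 90)
print(f"${cost:,.0f}")  # $75,000
```

A real estimate would likely account for time of day and seasonality, but even a linear approximation gives business champions a concrete number to react to.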

For digital services classified as “critical,” the service level objective (SLO) should be ambitious. Gapless, state-of-the-art monitoring should be prioritized. Follow-the-sun L1 support should be available for users. Engineers should be on call 24/7 in case of failures. Continual improvement processes should be in place to ensure these services stay at a high level of reliability. There should be an uncompromising emphasis on the quality of the user experience.

For digital services classified as “productive,” the SLO should be reasonable. A good standard of availability and infrastructure monitoring should be implemented. Support should be available during business hours. A more reactive stance may be employed in case of failures, so long as the SLO is maintained.

For digital services classified as “best effort,” the SLO can be significantly more lenient. A basic standard of availability monitoring is sufficient. Ideally these services should be outsourced to a third party. If these services must be kept in-house, there should be an expectation set that resources will be prioritized to “critical” and “productive” applications, and users may need to occasionally “make do” in the case of exceptional failures and resource constraints. If failure rates increase to the point that they have an impact on revenue, productivity, or other key metrics, a more reliable alternative should be found for these services.
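One way to make these expectations easy to communicate is to record them as a small policy table, one entry per tier. The sketch below is illustrative only: the `TierPolicy` fields are my own naming, the values paraphrase the three paragraphs above, and `on_call_24x7=False` for the lower tiers is my reading of the text (it only states that 24/7 on-call applies to “critical” services).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TierPolicy:
    """Operational expectations for one service tier (field names are illustrative)."""
    slo: str
    monitoring: str
    support: str
    on_call_24x7: bool


# Values paraphrase the article's qualitative guidance; no numeric targets are implied.
TIER_POLICIES = {
    "critical": TierPolicy(
        slo="ambitious",
        monitoring="gapless, state-of-the-art",
        support="follow-the-sun L1",
        on_call_24x7=True,
    ),
    "productive": TierPolicy(
        slo="reasonable",
        monitoring="availability and infrastructure",
        support="business hours",
        on_call_24x7=False,  # assumption: a more reactive stance is acceptable
    ),
    "best effort": TierPolicy(
        slo="lenient",
        monitoring="basic availability",
        support="as resources allow",  # assumption: higher tiers take priority
        on_call_24x7=False,
    ),
}

for tier, policy in TIER_POLICIES.items():
    print(f"{tier}: SLO {policy.slo}, support {policy.support}")
```

Publishing a table like this alongside the service classification lets each team see at a glance which commitments apply to the services it owns.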

So that’s it! In a nutshell, this is how the three-level IT operations framework is applied to digital business services. With this simple framework, you can assess and prioritize your digital business services and clearly communicate a simple mission to multiple teams across IT. I hope this inspires you to think about the strengths and weaknesses of your current strategy for delivering IT services to your customers, employees, and partners.

Tag(s): AIOps, DX OI, DX APM

Duane Nielsen

During his 15 years in IT Operations consulting, Duane has worked with customers from around the world, from startups in Asia to the largest enterprises in Europe and the US. Duane combines his expertise in IT Ops tooling with his experience tackling complex challenges throughout his career to guide customers along...
