    July 18, 2024

    The Unreasonable Effectiveness of Simplicity in IT Operations Strategy

    Key Takeaways
    • Embracing simplicity aligns stakeholders and enhances IT operations.
    • Utilizing a three-level framework to assess and prioritize digital services will help you establish, communicate, and commit to a simple mission.
    • Revenue- and productivity-based metrics enable you to assess and categorize services.

    A constant challenge in business is aligning stakeholders, customers, and employees behind a single mission. Carefully crafted plans often fail spectacularly as a result of complexity. Some of these efforts fail slowly; some even before execution begins.

    Especially when collaboration is important, simplicity can make challenging work innately more understandable, measurable, and engaging. When work cuts across multiple functional groups who may have different priorities, measures of success, or perceptions of risk, a simple framework can help everyone understand how they contribute to the larger mission and why the mission matters.

    Simplicity, IT operations and digital business services

    Simplicity makes it easier to communicate, enlist collaborators, measure progress, and implement plans. As a result, teams can clearly understand how to fulfill a larger mission.

    This principle applies to IT operations. Delivering reliable business services is both incredibly complex and massively important to modern businesses, but the overarching strategy for doing so doesn't have to be.

    In this article, I outline a simple framework teams can use to assess and prioritize digital services, a critical first step for establishing, communicating, and committing to a simple mission. For related reading, see the blog by Adeesh Fulay, Head of Engineering for DX Operational Intelligence: “Business Services Become a Viable Organizing Principle.”

    This three-level framework is not unique: you will find this and similar approaches under various names, and IT operations teams use them to assess and prioritize work, allocate resources, and so on. Here, the focus is on how it applies to digital services.

    Three levels of business services

    This framework consists of the following three steps:

    • Identify the business services your IT organization provides
    • Gather two objective business metrics for each service: “impact” and “quantified impact”
    • Classify each service as “business critical,” “productive,” or “best effort”

    To illustrate, let’s consider digital business services you might find at a telecommunications company:  

    1. Online Store: Shopping Cart Service
    2. Call Center: Incoming Call Queue Service
    3. Customer Relationship Management: New/Update Record Service
    4. HR: New Employee Onboarding Service
    5. Enterprise Reporting: Dashboarding Service
    6. IT Support: Ticketing Service
    7. Corporate Website: Analytics and Reporting Service

    Most of the services in this example are self-explanatory and have direct counterparts in other industries. While each of these is intuitively important, it helps to evaluate them using the three-level framework.

    Framework applied to our sample digital services

    Business Service | Impact | Quantified Impact
    Online Store: Shopping Cart Service | |
    Call Center: Incoming Call Queue Service | |
    Customer Relationship Management: New/Update Record Service | |
    HR: New Employee Onboarding Service | |
    Enterprise Reporting: Dashboarding Service | |
    IT Support: Ticketing Service | |
    Corporate Website: Analytics and Reporting Service | |

    Quantify, quantify, quantify

    Next, for each service, consider the question, “What is the objective and quantifiable business impact of the service becoming unavailable?”

    To help guide conversations with our stakeholders, we start by dividing these services into three buckets:

    • Lost revenue. These are digital services in which an immediate and quantifiable loss of revenue occurs—and is demonstrated—when the service is down. You can further clarify this with a second criterion, such as the average revenue lost per hour. Business champions of revenue-bearing services should be able to associate these types of financial measures with specific services.
    • Lost productivity. When these digital services are unavailable, employees are unable to execute their duties or are likely to miss deadlines as a result. Additional criteria to consider are how many users the service supports and whether reasonable workarounds are available to them.
    • No immediate impact. Every other service, by default, falls into this category because the impact cannot be conveyed in terms of productivity or revenue, or because the quantified impact is just too small relative to the other services.

    At this point, it’s worth noting that not all organizations are driven by revenue or productivity. In these cases, you would substitute “Cannot fulfill the primary mission of the organization” and “Degraded ability of staff to support the mission,” respectively.

    Putting it together

    Once business impact is determined and quantified, we can classify our services using these simple rules:

    • Lost revenue → “business critical”
    • Lost productivity → “productive”
    • No immediate impact → “best effort”
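
    The buckets and classification rules above can be sketched in a few lines of Python. This is a minimal illustration, not a prescribed implementation: the `Service` fields, the function names, and the `min_users_blocked` cutoff are hypothetical, since the article does not say how small an impact must be before it counts as "too small."

```python
from dataclasses import dataclass
from enum import Enum


class Impact(Enum):
    LOST_REVENUE = "Lost revenue"
    LOST_PRODUCTIVITY = "Lost productivity"
    NO_IMMEDIATE_IMPACT = "No immediate impact"


# Classification rules from the article: impact bucket -> service level.
LEVEL_BY_IMPACT = {
    Impact.LOST_REVENUE: "business critical",
    Impact.LOST_PRODUCTIVITY: "productive",
    Impact.NO_IMMEDIATE_IMPACT: "best effort",
}


@dataclass
class Service:
    name: str
    revenue_lost_per_hour: float = 0.0  # hypothetical field, in dollars
    users_blocked: int = 0              # hypothetical field


def assess_impact(service: Service, min_users_blocked: int = 50) -> Impact:
    """Pick the impact bucket; revenue loss takes precedence over productivity.

    min_users_blocked is an assumed cutoff below which a productivity
    impact is treated as too small to count.
    """
    if service.revenue_lost_per_hour > 0:
        return Impact.LOST_REVENUE
    if service.users_blocked >= min_users_blocked:
        return Impact.LOST_PRODUCTIVITY
    return Impact.NO_IMMEDIATE_IMPACT


def classify(service: Service) -> str:
    """Map a service to its level using the three rules above."""
    return LEVEL_BY_IMPACT[assess_impact(service)]
```

    In practice, the cutoff would come out of the stakeholder conversations rather than being hard-coded; the point is only that the rules are simple enough to write down unambiguously.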

    Here’s the completed table:

    Business Service | Impact | Quantified Impact
    Business Critical
    Online Store: Shopping Cart Service | Lost revenue | $50,000 per hour
    Call Center: Incoming Call Queue Service | Lost revenue | $30,000 per hour
    Productive
    Customer Relationship Management: New/Update Record Service | Lost productivity | 500 internal users blocked
    HR: New Employee Onboarding Service | Lost productivity | 100 internal users blocked
    Best Effort
    Enterprise Reporting: Dashboarding Service | Best effort | 20 internal users inconvenienced
    IT Support: Ticketing Service | |
    Corporate Website: Analytics and Reporting Service | None | Unknown
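
    One way to put the quantified impact figures to work is to estimate what a given outage would cost. The sketch below multiplies the hourly figures from the completed table by an outage duration. The 90-minute duration and the function name are illustrative assumptions, and real losses rarely scale perfectly linearly with time.

```python
# Hourly revenue impact, taken from the completed table.
REVENUE_LOST_PER_HOUR = {
    "Online Store: Shopping Cart Service": 50_000,
    "Call Center: Incoming Call Queue Service": 30_000,
}


def estimated_outage_cost(service: str, outage_minutes: float) -> float:
    """Linear estimate: hourly revenue loss times outage duration in hours."""
    return REVENUE_LOST_PER_HOUR[service] * outage_minutes / 60


# Assumed scenario: each service is down for 90 minutes.
for name in REVENUE_LOST_PER_HOUR:
    print(f"{name}: ${estimated_outage_cost(name, 90):,.0f}")
```

    Under these assumptions, a 90-minute outage works out to about $75,000 for the shopping cart service and $45,000 for the incoming call queue service.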

    This table forms the basis for communicating our strategy to the wider organization.

    For digital services classified as “critical,” the service level objective (SLO) should be ambitious. Gapless, state-of-the-art monitoring should be prioritized. Follow-the-sun L1 support should be available for users. Engineers should be on call 24/7 in case of failures. Continual improvement processes should be in place to ensure these services stay at a high level of reliability. There should be an uncompromising emphasis on the quality of the user experience.

    For digital services classified as “productive,” the SLO should be reasonable. A good standard of availability and infrastructure monitoring should be implemented. Support should be available during business hours. A more reactive stance may be employed in case of failures, so long as the SLO is maintained.

    For digital services classified as “best effort,” the SLO can be significantly more lenient. A basic standard of availability monitoring is sufficient. Ideally, these services should be outsourced to a third party. If they must be kept in-house, set the expectation that resources will be prioritized for “critical” and “productive” services, and that users may occasionally need to “make do” in the case of exceptional failures and resource constraints. If failure rates increase to the point that they affect revenue, productivity, or other key metrics, a more reliable alternative should be found for these services.
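
    To make “ambitious,” “reasonable,” and “lenient” concrete, it can help to translate an SLO into a downtime budget. The sketch below does that for each level. The monitoring and support entries paraphrase the descriptions above, but the SLO percentages are purely hypothetical placeholders; the article does not prescribe specific targets, and yours should come from your stakeholders.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TierPolicy:
    slo_percent: float  # hypothetical availability target, not from the article
    monitoring: str
    support: str


# Monitoring and support paraphrase the article; SLO numbers are assumptions.
POLICIES = {
    "critical": TierPolicy(99.95, "gapless, state-of-the-art",
                           "follow-the-sun L1, engineers on call 24/7"),
    "productive": TierPolicy(99.5, "availability and infrastructure",
                             "business hours"),
    "best effort": TierPolicy(99.0, "basic availability",
                              "as resources allow"),
}


def downtime_budget_minutes(slo_percent: float, days: int = 30) -> float:
    """Minutes of downtime allowed over `days` while still meeting the SLO."""
    return (100 - slo_percent) / 100 * days * 24 * 60


for tier, policy in POLICIES.items():
    budget = downtime_budget_minutes(policy.slo_percent)
    print(f"{tier}: {policy.slo_percent}% -> {budget:.1f} min per 30 days")
```

    Seeing the budget in minutes tends to make the trade-off tangible: with these placeholder targets, the “critical” level allows roughly 22 minutes of downtime in a 30-day window, while “best effort” allows over seven hours.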

    So that’s it! In a nutshell, this is how the three-level IT operations framework is applied to digital business services. With this simple framework, you can assess and prioritize your digital business services and clearly communicate a simple mission to multiple teams across IT. I hope this inspires you to think about the strengths and weaknesses of your current strategy for delivering IT services to your customers, employees, and partners.

    Tags: AIOps, DX OI, DX APM

    Duane Nielsen

    During his 15 years in IT Operations consulting, Duane has worked with customers from around the world, from startups in Asia to the largest enterprises in Europe and the US. Duane combines his expertise in IT Ops tooling with his experience tackling complex challenges throughout his career to guide customers along...
