April 28, 2021
Understanding the Performance of Your AWS Services: How AIOps Can Help
Written by: Mike Mackrory
The most important part of providing software solutions for your customers is ensuring that you meet their needs. To do this, you must consider two key things: first, your software must perform the tasks that your consumers expect; second, it must perform those tasks quickly, efficiently, and accurately. In this article, we focus on that second component: performance.
In particular, we will examine services that are deployed into the Amazon Web Services (AWS) public cloud. We’ll explain what constitutes a high-performing service and why it’s essential to monitor performance. Then we will explore performance monitoring tools that are available in the AWS environment and compare them to third-party monitoring solutions.
This article aims to help you understand the options that are available to you for monitoring and managing your services’ performance. We also hope to help you get set up with the tools that will enable you to provide your consumers with the best possible user experience.
Factors that Affect Performance
When you devise a strategy for analyzing performance, you can divide what you monitor into two categories: underlying infrastructure and user experience. If the service is deployed on a virtual machine using Amazon Elastic Compute Cloud (EC2), you should be aware of the following metrics:
- Memory utilization: available vs. used memory
- CPU utilization
- Network utilization: incoming and outgoing data rates
- Disk utilization: input/output metrics and available vs. used storage
Each of these metrics will give you a window into whether or not you are using the appropriate infrastructure as well as how it is performing. You can also gain insights and implement changes if the instance is consistently operating at the upper or lower limits of its capacity. You might want to move services that operate at the upper limit to a type of instance that has more resources. In contrast, those that operate at the lower limit could be moved to a smaller instance type, which will reduce your operating costs.
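The rightsizing rule above can be expressed as a small check over utilization samples. This is a minimal sketch, assuming you have already collected CPU and memory utilization as percentages (for example, exported from your monitoring tool). The 80% and 20% limits and the 90% "consistently" cutoff are hypothetical values that you would tune for your own workloads.

```python
# Hypothetical limits; tune them for your own workloads.
UPPER_LIMIT = 80.0        # percent utilization treated as "near capacity"
LOWER_LIMIT = 20.0        # percent utilization treated as "mostly idle"
CONSISTENT_SHARE = 0.9    # fraction of samples that must cross a limit


def share_at_or_above(samples, limit):
    """Fraction of samples at or above limit."""
    return sum(1 for s in samples if s >= limit) / len(samples)


def share_at_or_below(samples, limit):
    """Fraction of samples at or below limit."""
    return sum(1 for s in samples if s <= limit) / len(samples)


def rightsizing_hint(cpu_samples, memory_samples):
    """Suggest a sizing action from CPU and memory utilization samples (percent)."""
    # Either resource running hot most of the time suggests the instance is too small.
    if (share_at_or_above(cpu_samples, UPPER_LIMIT) >= CONSISTENT_SHARE
            or share_at_or_above(memory_samples, UPPER_LIMIT) >= CONSISTENT_SHARE):
        return "larger"
    # Only suggest downsizing when both resources are mostly idle.
    if (share_at_or_below(cpu_samples, LOWER_LIMIT) >= CONSISTENT_SHARE
            and share_at_or_below(memory_samples, LOWER_LIMIT) >= CONSISTENT_SHARE):
        return "smaller"
    return "keep"


print(rightsizing_hint([85, 90, 88, 92], [60, 62, 61, 65]))  # larger
```

Requiring both CPU and memory to be idle before downsizing is a deliberately cautious choice: an instance that is idle on CPU but busy on memory would struggle on a smaller type.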
If your application is containerized and deployed using Amazon Elastic Container Service (ECS) or Amazon Elastic Kubernetes Service (EKS), you will typically have fewer infrastructure metrics to monitor, especially if you run on AWS Fargate, which manages the underlying hosts for you. You still want to be aware of CPU and memory utilization within your nodes and clusters and to adjust your configurations accordingly.
The second part of your monitoring strategy should be monitoring user experience. If your users cannot reach your service, or if a load balancer is throttling their requests, you’ll lose customers even if your infrastructure metrics look fantastic. To monitor consumer experience, you should be aware of the following:
- Request and response times
- Number of requests over time
- Error rates broken down by error type
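To make these three metrics concrete, here is a sketch that computes them from raw request records. It assumes a hypothetical record format of `(timestamp_seconds, latency_ms, status_code)` tuples and treats any HTTP status of 400 or above as an error, grouped by status code. In practice, an APM tool or your load balancer's logs would usually supply this data.

```python
from collections import Counter
from statistics import median, quantiles


def summarize_requests(records, window_seconds=60):
    """Summarize (timestamp_s, latency_ms, status_code) request records.

    The record format is a hypothetical example, not a specific log schema.
    """
    records = list(records)
    if not records:
        raise ValueError("no request records to summarize")
    latencies = [latency for _, latency, _ in records]
    # Request volume: count requests per fixed-size time window.
    per_window = Counter(int(ts // window_seconds) for ts, _, _ in records)
    # Errors: any 4xx/5xx status, counted per status code.
    errors = Counter(status for _, _, status in records if status >= 400)
    total = len(records)
    return {
        "p50_latency_ms": median(latencies),
        # 95th percentile; quantiles() needs at least two data points.
        "p95_latency_ms": (quantiles(latencies, n=20, method="inclusive")[18]
                           if total > 1 else latencies[0]),
        "requests_per_window": dict(sorted(per_window.items())),
        "error_rate_by_status": {code: n / total for code, n in sorted(errors.items())},
    }


records = [(0, 100, 200), (10, 120, 200), (20, 300, 500), (65, 110, 404), (70, 90, 200)]
print(summarize_requests(records))
```

Percentiles such as p95 are often more useful than averages for response times, because a handful of very slow requests can hide behind a healthy-looking mean.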
Each of these metrics will help you understand the volume of calls that your service is handling, how long they are taking, and whether user requests are successful. It’s essential to establish a baseline measurement for each one. Once you have a baseline, you should observe how each metric changes over time and then respond to deviations from that baseline.
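One simple way to "respond to deviations from that baseline" is to summarize a known-good period by its mean and standard deviation, then flag new values that fall too far outside that range. This sketch assumes the metric is roughly stable during the baseline period. The three-standard-deviation cutoff (`k=3.0`) is a common rule of thumb, not a recommendation from any particular tool, and the function names are illustrative.

```python
from statistics import mean, stdev


def build_baseline(samples):
    """Summarize a known-good period as (mean, sample standard deviation).

    Needs at least two samples.
    """
    return mean(samples), stdev(samples)


def find_deviations(baseline, new_samples, k=3.0):
    """Return (index, value) pairs more than k standard deviations from the baseline mean."""
    mu, sigma = baseline
    return [(i, x) for i, x in enumerate(new_samples) if abs(x - mu) > k * sigma]


# Example: response times (ms) from a quiet period, then today's readings.
baseline = build_baseline([100, 102, 98, 101, 99])
print(find_deviations(baseline, [101, 104, 110, 96]))  # [(2, 110)]
```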
AWS-Provided Tools
As a comprehensive cloud service provider, AWS gives all of its customers access to Amazon CloudWatch, a metrics repository for AWS-hosted services. Standard-resolution metrics have one-minute granularity, and you can also publish high-resolution custom metrics with one-second granularity. Note that some basic metrics arrive less often by default: EC2 basic monitoring is free and reports at five-minute intervals, while one-minute detailed monitoring is a paid option. To reduce storage demands, CloudWatch aggregates metrics over time, which means that your performance data becomes less granular the longer it is stored.
Fig. 1: Example of CloudWatch Metrics for an EC2 Instance
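The loss of detail from aggregation is easy to see with a toy rollup. The following is a simplified illustration, not CloudWatch's internal implementation. It groups one-minute `(timestamp_seconds, value)` points into five-minute buckets and keeps both the average and the maximum, which shows how a brief spike nearly disappears from the average but survives in the maximum. For reference, CloudWatch documents that one-minute data points are retained for 15 days, five-minute points for 63 days, and one-hour points for 455 days.

```python
from statistics import mean


def rollup(points, period_s):
    """Aggregate (timestamp_s, value) points into buckets of period_s seconds.

    Returns (bucket_start_s, average, maximum) tuples sorted by time.
    """
    buckets = {}
    for ts, value in points:
        start = ts - ts % period_s  # align each point to its bucket boundary
        buckets.setdefault(start, []).append(value)
    return [(start, mean(vals), max(vals)) for start, vals in sorted(buckets.items())]


# Ten one-minute CPU readings (percent) with a one-minute spike to 95%.
one_minute = list(zip(range(0, 600, 60), [20, 22, 21, 95, 20, 19, 21, 20, 22, 20]))
for start, avg, peak in rollup(one_minute, 300):
    print(f"t={start:>3}s  avg={avg:.1f}  max={peak}")
```

When you query older, coarser data, choosing the Maximum statistic rather than Average is one way to avoid overlooking short spikes.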
CloudWatch also allows you to set thresholds and alarms on your metrics. For example, you can set an alarm to trigger when memory usage on an instance exceeds 85% for more than three minutes. EC2 does not report memory usage to CloudWatch by default, so this particular alarm requires installing the CloudWatch agent on the instance. You can also configure an alarm to send a notification, start a predefined process, or connect to another service through a webhook.
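The memory alarm described above could be defined through the CloudWatch `PutMetricAlarm` API. This is a minimal sketch under stated assumptions. It assumes that the CloudWatch agent publishes memory usage as `mem_used_percent` in the `CWAgent` namespace with a single `InstanceId` dimension. CloudWatch treats each distinct dimension set as a separate metric, so the dimensions must match exactly what your agent configuration publishes. The instance ID, SNS topic ARN, and helper names are hypothetical. The first function only builds the request parameters for boto3's `put_metric_alarm`. The `would_alarm` helper is a simplified local approximation of the "three consecutive breaching minutes" rule, not CloudWatch's full evaluation logic.

```python
def memory_alarm_params(instance_id, sns_topic_arn, threshold=85.0, minutes=3):
    """Build keyword arguments for CloudWatch's PutMetricAlarm API.

    Assumes the CloudWatch agent publishes memory as CWAgent/mem_used_percent
    with a single InstanceId dimension; adjust to match your agent config.
    """
    return {
        "AlarmName": f"high-memory-{instance_id}",
        "Namespace": "CWAgent",
        "MetricName": "mem_used_percent",
        "Dimensions": [{"Name": "InstanceId", "Value": instance_id}],
        "Statistic": "Average",
        "Period": 60,                      # evaluate one-minute data points
        "EvaluationPeriods": minutes,      # look at the last `minutes` periods
        "DatapointsToAlarm": minutes,      # and require every one to breach
        "Threshold": threshold,
        "ComparisonOperator": "GreaterThanThreshold",
        "AlarmActions": [sns_topic_arn],   # notify an SNS topic when the alarm fires
    }


def would_alarm(one_minute_values, threshold=85.0, minutes=3):
    """Simplified local check: do the last `minutes` values all exceed threshold?"""
    recent = one_minute_values[-minutes:]
    return len(recent) == minutes and all(v > threshold for v in recent)


# Hypothetical instance ID and topic ARN, for illustration only.
params = memory_alarm_params(
    "i-0123456789abcdef0", "arn:aws:sns:us-east-1:123456789012:ops-alerts")
# To create the alarm (requires boto3 and AWS credentials):
#   import boto3
#   boto3.client("cloudwatch").put_metric_alarm(**params)
print(would_alarm([70, 86, 90, 88]))  # True
```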
An unfortunate downside of CloudWatch is that, while it does provide access to a wide variety of metrics for each of its services, you need to know which metrics you’re looking for and how to combine them to get actionable insights. At its core, CloudWatch is just a collection and reporting service, so when it comes to detecting anomalies and monitoring intelligently, you’ll need something more.
Leveraging Third-Party APM Experts
Ideally, you’ll want your engineers to devote their time to adding new features and improving your software’s performance. One of the benefits of using standardized services and hosting them in the cloud is that you can leverage the expertise of those whose sole focus is on application performance management (APM). You can add an agent to your instances or a sidecar application to your container environment that will gather essential metrics and transmit them to an APM provider.
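An agent or sidecar of this kind typically records samples locally and ships them to the provider in batches, which keeps network overhead low. This is a toy illustration of that pattern only. The `MetricsAgent` class, its payload format, and the pluggable `send` callable are hypothetical stand-ins for a real provider's agent and ingestion endpoint.

```python
import json
import time


class MetricsAgent:
    """Toy metrics agent that buffers samples and ships them in JSON batches.

    `send` is any callable that accepts a bytes payload; in a real agent it
    would transmit to the APM provider (this interface is hypothetical).
    """

    def __init__(self, send, batch_size=100, clock=time.time):
        self._send = send
        self._batch_size = batch_size
        self._clock = clock
        self._buffer = []

    def record(self, name, value, **tags):
        """Buffer one sample; flush automatically once the batch is full."""
        self._buffer.append(
            {"name": name, "value": value, "ts": self._clock(), "tags": tags})
        if len(self._buffer) >= self._batch_size:
            self.flush()

    def flush(self):
        """Serialize buffered samples and hand them to the sender."""
        if not self._buffer:
            return
        payload = json.dumps({"metrics": self._buffer}).encode("utf-8")
        self._send(payload)
        self._buffer = []


# Usage: collect payloads in a list instead of sending them over the network.
sent = []
agent = MetricsAgent(sent.append, batch_size=2)
agent.record("cpu_percent", 42.0, host="web-1")
agent.record("mem_percent", 61.5, host="web-1")  # second sample triggers a flush
print(len(sent))  # 1
```

A production agent would also flush on a timer so that samples from quiet services are not held indefinitely, and would retry failed sends.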
APM providers typically provide standard dashboards and monitoring as part of their product offerings. In most cases, you can enable these systems quickly and begin monitoring intelligently with just a few hours of work. Some of these providers have recently started offering artificial intelligence for IT operations (AIOps) solutions, an exciting option that can add considerable value to your performance monitoring strategy.
AIOps and Proactive Monitoring with Thresholds and Alerts
AIOps combines machine learning and data science with a performance monitoring solution. AIOps provides automated remediation capabilities, enables you to detect problems sooner, and ultimately improves your consumers’ experience. AIOps can help you improve performance as well as identify new ways to increase your efficiency and responsiveness.
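Commercial AIOps platforms use far more sophisticated models. Still, the core idea of learning what "normal" looks like, rather than relying on a fixed threshold, can be illustrated with an exponentially weighted moving average (EWMA). This toy sketch is not how any particular AIOps product works. It tracks an EWMA of a metric and its variance, then flags values more than `k` standard deviations from the learned mean once a short warm-up has passed. The `alpha`, `k`, and `warmup` defaults are arbitrary illustrative values.

```python
import math


class EwmaDetector:
    """Toy anomaly detector based on an exponentially weighted mean and variance."""

    def __init__(self, alpha=0.3, k=3.0, warmup=5):
        self.alpha = alpha      # weight given to the newest value (0 < alpha <= 1)
        self.k = k              # how many standard deviations count as anomalous
        self.warmup = warmup    # number of values to learn from before flagging
        self.mean = None
        self.var = 0.0
        self.count = 0

    def update(self, x):
        """Return True if x looks anomalous, then fold x into the baseline."""
        self.count += 1
        if self.mean is None:
            self.mean = float(x)
            return False
        diff = x - self.mean
        anomalous = (self.count > self.warmup
                     and abs(diff) > self.k * math.sqrt(self.var))
        # Incremental EWMA updates for the mean and variance.
        incr = self.alpha * diff
        self.mean += incr
        self.var = (1 - self.alpha) * (self.var + diff * incr)
        return anomalous


detector = EwmaDetector()
stream = [100, 101, 99, 100, 102, 98, 100, 101, 150]
print([detector.update(x) for x in stream])
```

Because this detector keeps learning from every value, a sustained shift eventually becomes the new normal. A real system might skip updates for flagged values or retrain on a schedule.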
If you would like to learn more about AIOps, how it works, and the potential benefits of using it, The Definitive Guide to AIOps is an excellent place to start. This white paper defines AIOps in more detail, explores the underlying principles and technologies, and explains how you can apply it to your organization. You can also download the AIOps from Broadcom solution brief, which provides specific details about this AIOps product offering.
Providing a high-performing user experience is essential for meeting your users’ needs. Fortunately, partnering with experts at organizations like Broadcom makes it easy to achieve this goal, ensuring that your software is reliable and that your customers can access it easily.
Mike Mackrory
Mike Mackrory is a global citizen who has settled down in the Pacific Northwest, for now. By day he works as a Lead Engineer on a DevOps team, and by night, he writes and tinkers with other technology projects. When he's not tapping on the keys, he can be found hiking, fishing, and exploring both the urban and rural...