June 5, 2023
Office 365 Monitoring: The Challenges, and What to Do About Them
Written by: Alec Pinkham
Office 365 is used by more than one million companies around the world. Business employees count on these apps constantly to do their jobs, whether they’re writing documents, updating spreadsheets, building slides, or checking email.
While cloud-based apps like Office 365 offer undeniable advantages for enterprises and business users, they also create tough challenges for IT operations and network operations (NetOps) teams.
Overall, Microsoft delivers its cloud-based Office 365 services reliably. It employs globally distributed instances and sophisticated content delivery networks, so users can typically reach an instance running relatively close to them. If an issue arises with that instance, they can reach another one, generally with no discernible performance impact. As a result, it is increasingly rare for cloud-based apps like these to go down.
However, for these and other cloud-based apps, app outages and performance issues can and do happen.
As businesses employ these cloud-based offerings, they grow highly reliant upon not only the cloud services, but all the network and infrastructure points that sit between the user and the cloud.
When users come to IT to complain about Office 365 being slow or not responding, there are a number of potential causes. It could be the application, an issue in the last-mile ISP’s network, an issue with the end user’s system, or any of many other causes.
Further, even in the cloud provider’s environment, there are many infrastructure elements that sit between the user and the actual application server. An array of networking equipment, including load balancers, routers, firewalls, and more may be in play.
The problem: lack of visibility
The core challenge is visibility—more specifically the lack thereof.
The Office 365 user experience is highly reliant upon networks and apps that the IT and NetOps teams have no visibility into, and no control over. Therefore, when problems arise, it can be very tricky for teams to understand what the issue is, where it is occurring, and how to address it.
Requirements
To reduce mean time to repair (MTTR), it is essential to first narrow the scope of potential areas to troubleshoot. Teams need to be able to determine immediately whether the issue is arising in the user’s environment, whether the user is in the office or working from home, and whether the user’s connection relies upon a residential or corporate ISP network.
This visibility is also essential in reducing mean-time-to-innocence, so that teams don’t waste cycles troubleshooting their own environment when the issue arose in a third-party network. Teams must be able to gain visibility into the different domains outlined below.
Network delivery
Teams need to understand how user traffic gets to and from the app, even though they don’t own or control those networks that sit between the user and the cloud-based app. While SD-WAN technologies can provide edge-to-edge visibility, this isn’t sufficient. Teams need full, end-to-end visibility, from the user to the application server.
ISP network visibility
A big requirement is the ability to gain visibility into all the third-party ISP networks that user services are now reliant upon. (For more information, see our recent post on troubleshooting issues in ISP networks.) Teams need to take a trust-but-verify approach to third-party ISP networks.
Particularly within the context of business-critical applications, it is important to leverage capabilities like ISP detection so teams can gain an accurate understanding of the middle mile, or intermediary, networks that user traffic traverses.
Application, end-user experience visibility
While it’s important to find out when network issues arise, that’s not enough. What’s vital is that teams have a way to determine whether the issue affects users.
While cloud providers like Microsoft may provide status pages, those aren’t sufficient when users, networks, and apps are distributed around the world. It is important to be able to understand availability and performance of the Office 365 apps, regardless of where those services may be running. By establishing application-level visibility, teams gain an end-to-end picture. This enables teams to immediately identify not only when issues are occurring, but more importantly, when those issues are making an impact on the user experience.
The solution: employing continuous active monitoring
Teams need to monitor network performance from users’ perspectives, no matter where they are, which network they are using, or which cloud-based apps they access. This is why taking an active monitoring approach is so vital. Through active monitoring, teams can continually test network connections and emulate user behavior. With these capabilities, teams can gain end-to-end visibility across complex application delivery paths.
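As a rough illustration of what an active test can look like, the sketch below times a TCP connection to a service endpoint and summarizes a short run of probes. It is a minimal sketch, not how any particular monitoring product works: the function names (tcp_connect_ms, run_probes, summarize) are hypothetical, and a TCP handshake is a much simpler stand-in for emulating real user behavior.

```python
import socket
import time
from typing import Callable, List, Optional


def tcp_connect_ms(host: str, port: int = 443, timeout: float = 3.0) -> Optional[float]:
    """Return the TCP connect time in milliseconds, or None if the connection fails."""
    start = time.perf_counter()
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return (time.perf_counter() - start) * 1000.0
    except OSError:
        return None


def run_probes(probe: Callable[[], Optional[float]], count: int,
               interval_s: float = 0.0) -> List[Optional[float]]:
    """Call probe() `count` times, pausing `interval_s` seconds between calls."""
    results = []
    for i in range(count):
        results.append(probe())
        if interval_s and i < count - 1:
            time.sleep(interval_s)
    return results


def summarize(samples: List[Optional[float]]) -> dict:
    """Summarize probe results; None entries count as failed probes."""
    ok = [s for s in samples if s is not None]
    failed = len(samples) - len(ok)
    return {
        "sent": len(samples),
        "failed": failed,
        "loss_pct": 100.0 * failed / len(samples) if samples else 0.0,
        "avg_ms": sum(ok) / len(ok) if ok else None,
        "max_ms": max(ok) if ok else None,
    }
```

For example, summarize(run_probes(lambda: tcp_connect_ms("outlook.office365.com"), count=5, interval_s=1.0)) would run five probes one second apart from whatever machine executes it. The hostname here is only an example target, and results reflect that machine's own network path.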
Toward that end, it’s vital to establish monitoring points on user machines, and test Office 365 performance using the same links that users rely upon. In this way, teams can begin to get hop-by-hop visibility and ultimately truly understand performance from the end-user perspective.
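To show how hop-by-hop data can narrow down where a problem sits, here is a minimal sketch that scans per-hop round-trip times (such as those a traceroute-style test might report) and returns the first hop where latency jumps. The Hop type, the owner labels, and the 50 ms threshold are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Hop:
    index: int
    owner: str                # hypothetical label, e.g. "home LAN", "ISP", "cloud provider"
    rtt_ms: Optional[float]   # None when the hop did not answer


def first_degraded_hop(hops: List[Hop], jump_ms: float = 50.0) -> Optional[Hop]:
    """Return the first hop whose RTT exceeds the previous responding hop by jump_ms or more."""
    prev = 0.0
    for hop in hops:
        if hop.rtt_ms is None:
            continue  # skip silent hops; many routers do not answer probes
        if hop.rtt_ms - prev >= jump_ms:
            return hop
        prev = hop.rtt_ms
    return None
```

A single jump is a hint rather than proof, since routers often deprioritize replies to probe traffic. It is more convincing when the added latency persists through every later hop.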
The performance of these complex, multi-provider network environments can change substantially from one minute to the next. Given how dynamic these environments are, teams need to monitor continuously. It is vital to look at where traffic is going and how it is performing over time, and to keep taking measurements rather than relying on one-off tests.
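One simple way to make use of continuous measurements is to compare each new sample against a rolling baseline of recent ones. The sketch below is an assumption-laden illustration: the RollingBaseline class, the window size, and the "twice the median" rule are hypothetical choices, not a recommended detection method.

```python
from collections import deque
from statistics import median


class RollingBaseline:
    """Flag latency samples that are well above the median of recent samples."""

    def __init__(self, window: int = 60, factor: float = 2.0, min_samples: int = 5):
        self.samples = deque(maxlen=window)  # oldest samples drop off automatically
        self.factor = factor
        self.min_samples = min_samples

    def add(self, latency_ms: float) -> bool:
        """Record a sample; return True if it exceeds factor x the median of prior samples."""
        degraded = (
            len(self.samples) >= self.min_samples
            and latency_ms > self.factor * median(self.samples)
        )
        self.samples.append(latency_ms)
        return degraded
```

Because the baseline is built from the path's own recent history, the same check can be applied to a home broadband link and a corporate circuit without hand-tuning a fixed threshold for each.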
By combining traditional monitoring approaches with these active monitoring insights, teams can gain a full, end-to-end picture of the network. They can assess performance across WANs, middle mile, backbone, physical or virtual environments, and more.
In this way, teams can get the data they need to objectively track and enforce the SLAs of their network providers.
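As a sketch of how collected measurements could be checked against a provider commitment, the function below computes the share of samples that met a latency limit and compares it with a target. The function name sla_report, the thresholds, and the sample values are hypothetical. Real SLAs define their own metrics and measurement windows.

```python
from typing import Iterable, Optional


def sla_report(samples_ms: Iterable[Optional[float]], max_latency_ms: float,
               target_pct: float) -> dict:
    """Report what share of samples met the latency limit; None counts as a failed probe."""
    samples = list(samples_ms)
    if not samples:
        raise ValueError("no samples to evaluate")
    met = sum(1 for s in samples if s is not None and s <= max_latency_ms)
    compliance_pct = 100.0 * met / len(samples)
    return {
        "samples": len(samples),
        "met": met,
        "compliance_pct": compliance_pct,
        "sla_met": compliance_pct >= target_pct,
    }
```

For instance, sla_report([20, 35, None, 150, 40], max_latency_ms=100, target_pct=99.0) counts three of five samples as compliant, so the hypothetical 99% target is not met.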
Watch our presentation to learn more
To learn more, be sure to watch our Small Bytes presentation, entitled “How do I troubleshoot Office 365 issues for my end users?” Find out how to gain the visibility needed to more rapidly troubleshoot issues for your Office 365 users, while boosting operational efficiency. See how AppNeta can help improve your ability to identify issues, no matter where they arise.
Alec Pinkham
Alec is a Product Marketing Manager for the AppNeta solution at Broadcom. He spent seven years with AppNeta in the Application and Network Performance Monitoring space before joining Broadcom. Prior to AppNeta his background is in software product management in HMI/SCADA solutions for industrial automation as well as...