Within enterprises, it used to be that applications ran on a single server. Owners could directly monitor that discrete machine, conveniently access all the logs they needed, see all the metrics that mattered, and hit the reboot button, without needing to confer with “everyone.”
Those days are gone.
Modern application architectures stretch the definitions of the words “federated” and “distributed.” We now have distributed applications. We may actually have super-multi-path-virtual-containerized-business-critical-shared-distributed apps. And we love them.
These new apps are powering the digital services we love, including social media, ride sharing, and streaming.
New digital services, supported by these modern architectures, are changing how IT organizations are structured, how we collaborate, and how power structures and accountability are shared across teams.
We do love these apps—at least when they perform well, there are no faults, and they aren’t generating difficult-to-weather alarm storms and hard-to-decipher logs.
It’s when things go wrong unexpectedly that we appreciate OpenTracing, and the advances that adopting Jaeger has brought to app development and IT operations teams.
Distributed tracing, sometimes also called distributed request tracing, is a method used to profile and monitor applications, especially those built using a microservices architecture. It helps pinpoint where failures occur and what causes poor performance.
OpenTracing is a vendor-neutral API for distributed tracing. It allows developers to add tracing to their applications without having to worry about the specifics of the underlying tracing implementation.
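To make the "vendor-neutral" idea concrete, here is a minimal Python sketch. It is not the OpenTracing API itself: the `Tracer` protocol, `NoopTracer`, `RecordingTracer`, and `checkout` are hypothetical names invented for illustration. It only shows that application code can depend on a small interface, so the tracing backend can be swapped without touching that code.

```python
from contextlib import contextmanager
from typing import ContextManager, Iterator, List, Protocol


class Tracer(Protocol):
    """Hypothetical vendor-neutral interface the application codes against."""

    def span(self, operation: str) -> ContextManager[None]:
        ...


class NoopTracer:
    """Backend that discards everything (tracing switched off)."""

    @contextmanager
    def span(self, operation: str) -> Iterator[None]:
        yield


class RecordingTracer:
    """Backend that keeps finished operation names in memory."""

    def __init__(self) -> None:
        self.finished: List[str] = []

    @contextmanager
    def span(self, operation: str) -> Iterator[None]:
        try:
            yield
        finally:
            # A span is recorded when its unit of work ends.
            self.finished.append(operation)


def checkout(tracer: Tracer) -> str:
    """Application code: knows the interface, not the backend."""
    with tracer.span("checkout"):
        with tracer.span("charge-card"):
            pass  # placeholder for real work
        return "ok"
```

Calling `checkout(NoopTracer())` and `checkout(RecordingTracer())` runs the same business logic; only the second keeps a record of the spans, with the inner span finishing before the outer one.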
In modern distributed applications, it can be difficult to debug issues when things go wrong: A single request to the application is likely reliant upon multiple services. When that request is unfulfilled, determining which microservice is (mostly) responsible can feel like trying to solve a Rubik’s Cube puzzle.
To identify the root cause of a problem, we have two common tools: logging and metrics. From experience, we know that logs and metrics fail to give the complete picture of the condition of distributed or super-distributed systems.
We need full, end-to-end observability, either to stay out of trouble, or get out of trouble more quickly. Logging and capturing metrics are not enough. The idea is to incorporate distributed tracing into the application so that we can get:
This helps engineers identify root cause and ownership, and then direct issues to the right team, with the right contextual information on the first attempt. This approach helps answer the questions:
Wikipedia defines observability as follows: “A measure of how well internal states of a system can be inferred from knowledge of its external outputs. It helps bring visibility into systems.”
If something goes wrong during the execution of the flow, debugging is a nightmare. Without observability, you never know which part of the system failed.
We do have logs and metrics for the services, but logs do not give a complete picture because they are scattered across a number of log files: It is difficult to conceptually link information from multiple logs together to understand the context of an issue. Metrics can tell you that response times for a service exceed a certain threshold, but they may not help you easily identify the root cause.
As a result, a lot of time is lost in defect triaging and determining ownership of issues. Plus, since services are owned by different teams, this can result in much higher mean-time-to-resolution (MTTR) metrics when issues affect services.
Distributed tracing, via Jaeger, comes to the rescue. Jaeger is an open-source tracing system that was originally developed at Uber. It is designed to be highly scalable and flexible. Jaeger supports multiple storage backends, including Cassandra, Elasticsearch, and in-memory storage. It is compatible with the OpenTracing API and accepts trace data in multiple formats, including Zipkin and Jaeger’s own native format.
Distributed tracing has two key parts:
Distributed tracing helps tell stories of transactions that cross process or service boundaries. The image below, from the Jaeger interface, shows an example of a full transaction with each individual constituent span, the duration of its execution, whether each span succeeded or failed, and full end-to-end latency. By expanding each row, a practitioner can quickly understand the flow of control.
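The same story can be sketched as data. The Python below is a toy model, not Jaeger's storage schema: the `Span` fields, the service names, and the timings are all made up for illustration. It shows how a trace reduces to a set of spans linked by parent IDs, from which the nested view, the per-span status, and the end-to-end latency all follow.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Span:
    span_id: str
    parent_id: Optional[str]  # None marks the root span
    service: str
    operation: str
    start_ms: int             # offset from the start of the trace
    duration_ms: int
    error: bool = False


def end_to_end_latency_ms(spans: List[Span]) -> int:
    """Time from the earliest span start to the latest span end."""
    start = min(s.start_ms for s in spans)
    end = max(s.start_ms + s.duration_ms for s in spans)
    return end - start


def render(spans: List[Span]) -> List[str]:
    """Return one indented line per span, children nested under parents."""
    children: Dict[Optional[str], List[Span]] = {}
    for s in spans:
        children.setdefault(s.parent_id, []).append(s)
    lines: List[str] = []

    def walk(parent_id: Optional[str], depth: int) -> None:
        for s in sorted(children.get(parent_id, []), key=lambda x: x.start_ms):
            status = "ERROR" if s.error else "ok"
            lines.append(
                f"{'  ' * depth}{s.service}: {s.operation} {s.duration_ms}ms [{status}]"
            )
            walk(s.span_id, depth + 1)

    walk(None, 0)
    return lines


# Hypothetical trace: one request fanning out to three services.
trace = [
    Span("a", None, "frontend", "GET /checkout", 0, 120),
    Span("b", "a", "cart", "load-cart", 5, 30),
    Span("c", "a", "payment", "charge", 40, 75, error=True),
    Span("d", "c", "payment-db", "INSERT", 45, 20),
]

for line in render(trace):
    print(line)
print("end-to-end:", end_to_end_latency_ms(trace), "ms")
```

In this made-up trace, the failed span sits in the `payment` service, which is the kind of ownership hint that helps route an issue to the right team.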
The following sections offer a few tips to help you start tracing with Jaeger.
When deploying Jaeger tracing, you will need to address the following components:
Most distributed and microservice-based applications are deployed in a containerized environment, such as Kubernetes. Given that, it is not surprising that the recommended way of installing and managing Jaeger in a production Kubernetes cluster is via the Jaeger operator. Helm charts are also supported as an alternative deployment mechanism.
Validated traces come in through the pipeline from the collector. Jaeger stores these traces in its data store. Currently, Jaeger supports two primary persistent storage types: Cassandra and Elasticsearch.
Additional backends are described in the Jaeger documentation.
For this example, we will use Elasticsearch.
Below is sample code for running a separate collector with Elasticsearch as a data store.
# Create the default namespace used by the operator's deployment files.
# The commands below deploy into dx instead, so that namespace must exist as well.
kubectl create namespace observability
# Install the Jaeger custom resource definition (CRD).
kubectl create -f https://raw.githubusercontent.com/jaegertracing/jaeger-operator/master/deploy/crds/jaegertracing.io_jaegers_crd.yaml
# Create the service account, role, and role binding in the dx namespace.
kubectl create -n dx -f https://raw.githubusercontent.com/jaegertracing/jaeger-operator/master/deploy/service_account.yaml
kubectl create -n dx -f https://raw.githubusercontent.com/jaegertracing/jaeger-operator/master/deploy/role.yaml
kubectl create -n dx -f https://raw.githubusercontent.com/jaegertracing/jaeger-operator/master/deploy/role_binding.yaml
# Deploy the operator from a local copy of operator.yaml:
# download it, change WATCH_NAMESPACE to your namespace (dx in this case), then create it.
curl -LO https://raw.githubusercontent.com/jaegertracing/jaeger-operator/master/deploy/operator.yaml
kubectl create -n dx -f operator.yaml
# Grant the cluster-wide permissions.
kubectl create -f https://raw.githubusercontent.com/jaegertracing/jaeger-operator/master/deploy/cluster_role.yaml
# cluster_role_binding.yaml also names the namespace: download it, change the namespace to dx, then create it.
curl -LO https://raw.githubusercontent.com/jaegertracing/jaeger-operator/master/deploy/cluster_role_binding.yaml
kubectl create -f cluster_role_binding.yaml
# Confirm that the operator deployment is available.
kubectl get deployment jaeger-operator -n dx
Below is the simplest.yaml file.
# Installing the Jaeger query, collector, and agent
apiVersion: jaegertracing.io/v1
kind: Jaeger
metadata:
  name: simple-prod
spec:
  strategy: production
  collector:
    maxReplicas: 5
    resources:
      limits:
        cpu: 100m
        memory: 128Mi
  storage:
    type: elasticsearch
    options:
      es:
        server-urls: http://es.XXX.nip.io
  ui:
    options:
      dependencies:
        menuEnabled: false
      tracking:
        gaID: UA-000000-2
      menu:
        - label: "About Jaeger"
          items:
            - label: "Documentation"
              url: "https://www.jaegertracing.io/docs/latest"
      linkPatterns:
        - type: "logs"
          key: "customer_id"
          url: /search?limit=20&lookback=1h&service=frontend&tags=%7B%22customer_id%22%3A%22#{customer_id}%22%7D
          text: "Search for other traces for customer_id=#{customer_id}"
Here’s how to apply simplest.yaml.
kubectl apply -n dx -f simplest.yaml
The agent can be injected as a sidecar on the required microservice. (See sample below.)
Following are examples of indexes created by Jaeger.
Once infrastructure monitoring is available, you can push traces to Jaeger.
Traces in your code make your application observable.
You’ll need to add the Jaeger client library to your application. This library provides a simple API for creating and propagating traces through your system. If you are using Maven, you can add it as a dependency in your pom.xml file.
Simple example below:
The below client jar will be used to push traces to the Jaeger collector.
Start pushing traces from the application.
Define process boundaries.
Once your application is configured to send traces to Jaeger, you can start creating trace spans. These spans represent a specific unit of work within your application, such as an HTTP request or a database query.
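As a rough illustration of a span as "a unit of work", the Python sketch below wraps a block of code, times it, attaches tags, and marks failures. It is a toy model rather than the Jaeger client API: `Span`, `traced`, `FINISHED`, and `load_orders` are hypothetical names, and the in-memory list stands in for a reporter that would send spans to the collector.

```python
import time
from contextlib import contextmanager
from dataclasses import dataclass, field
from typing import Dict, Iterator, List, Optional


@dataclass
class Span:
    """Toy span: one named, timed unit of work (not the Jaeger client's class)."""
    operation: str
    tags: Dict[str, object] = field(default_factory=dict)
    start: float = 0.0
    end: Optional[float] = None


# Stand-in for a reporter that would ship finished spans to a collector.
FINISHED: List[Span] = []


@contextmanager
def traced(operation: str, **tags: object) -> Iterator[Span]:
    """Start a span, run the wrapped block, then finish the span."""
    span = Span(operation, dict(tags), start=time.monotonic())
    try:
        yield span
    except Exception:
        span.tags["error"] = True  # flag the failure, then let it propagate
        raise
    finally:
        span.end = time.monotonic()
        FINISHED.append(span)


def load_orders(customer_id: str) -> int:
    """Hypothetical unit of work: a database query wrapped in a span."""
    with traced("db.query", customer_id=customer_id) as span:
        rows = 3  # placeholder for a real database call
        span.tags["rows"] = rows
        return rows
```

The span is finished in a `finally` block so that failed operations are still reported, which matters because failed requests are usually the ones you want to inspect.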
Each span carries a span context, which contains the trace ID and span ID that are used to propagate the trace through your system. For work triggered by another service, you first obtain the caller’s span context, for example by extracting it from the incoming request headers. Next, you create a span object, which represents the unit of work that you’re tracing. You set the span’s parent to that context, and then start and finish the span as needed.
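The propagation step can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the Jaeger client: it assumes Jaeger's documented `uber-trace-id` HTTP header with the layout `{trace-id}:{span-id}:{parent-span-id}:{flags}`, and the `SpanContext`, `new_root`, `child_of`, `inject`, and `extract` names are hypothetical helpers written for this example.

```python
import secrets
from dataclasses import dataclass
from typing import Dict, Optional

HEADER = "uber-trace-id"  # header used by Jaeger's native propagation format


@dataclass(frozen=True)
class SpanContext:
    trace_id: str         # shared by every span in the trace
    span_id: str          # unique to this span
    parent_id: str = "0"  # "0" means there is no parent (root span)
    flags: int = 1        # 1 = sampled


def new_root() -> SpanContext:
    """Context for the first span of a new trace."""
    return SpanContext(secrets.token_hex(8), secrets.token_hex(8))


def child_of(parent: SpanContext) -> SpanContext:
    """Same trace, new span ID, parent set to the caller's span."""
    return SpanContext(parent.trace_id, secrets.token_hex(8),
                       parent.span_id, parent.flags)


def inject(ctx: SpanContext, headers: Dict[str, str]) -> None:
    """Write the context into outgoing request headers."""
    headers[HEADER] = f"{ctx.trace_id}:{ctx.span_id}:{ctx.parent_id}:{ctx.flags:x}"


def extract(headers: Dict[str, str]) -> Optional[SpanContext]:
    """Read the context from incoming headers; None if absent or malformed."""
    raw = headers.get(HEADER)
    if raw is None:
        return None
    parts = raw.split(":")
    if len(parts) != 4:
        return None
    trace_id, span_id, parent_id, flags = parts
    try:
        return SpanContext(trace_id, span_id, parent_id, int(flags, 16))
    except ValueError:
        return None
```

On the calling side, a service would `inject` its current context into the outgoing headers; on the receiving side, the next service would `extract` it and start its own span with `child_of`, so both spans end up in the same trace. A real HTTP stack treats header names case-insensitively, which this plain dictionary lookup does not.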
Jaeger enhances observability through OpenTracing. This helps you detect and discover issues throughout the app lifecycle, including in your deployment and test pipeline, in your operations, and during maintenance. Adding OpenTracing to your toolbox for distributed and super-distributed applications will save you time and provide you with the insights you need for more precise triaging and better team collaboration.
References: https://www.jaegertracing.io/docs/1.26/operator/