
OpenTracing via Jaeger

Written by Varun Saxena | Apr 13, 2023 8:19:32 PM

Within enterprises, it used to be that applications ran on a single server. Owners could directly monitor that discrete machine, conveniently access all the logs they needed, see all the metrics that mattered, and hit the reboot button, without needing to confer with “everyone.”

Those days are gone. 

Modern application architectures stretch the definitions of the words “federated” and “distributed.” We now have distributed applications. We may actually have super-multi-path-virtual-containerized-business-critical-shared-distributed apps. And we love them.

These new apps are powering the digital services we love, including social media, ride sharing, and streaming.

New digital services, supported by these modern architectures, are changing how IT organizations are structured, how we collaborate, and how power structures and accountability are shared across teams.

We do love these apps—at least when they perform well, there are no faults, and they aren’t generating difficult-to-weather alarm storms and hard-to-decipher logs.

It’s when things go “unexpectedly” that we love OpenTracing and the advances that we attribute to the adoption of Jaeger within app development and IT operations teams.

What is OpenTracing?

OpenTracing is a method used to profile and monitor applications, especially those built using a microservices architecture. It is sometimes also called distributed request tracing.

Most importantly, it is a vendor-neutral API that allows developers to easily add tracing to their applications, without having to worry about the specifics of the underlying tracing implementation. Distributed tracing helps pinpoint where failures occur and what causes poor performance.

Why do we need OpenTracing?

In modern distributed applications, it can be difficult to debug issues when things go wrong: A single request to the application is likely reliant upon multiple services. When that request is unfulfilled, determining which microservice is (mostly) responsible can feel like trying to solve a Rubik’s Cube puzzle.

To identify the root cause of a problem, we have two common tools: logging and metrics. From experience, we know that logs and metrics fail to give the complete picture of the condition of distributed or super-distributed systems.

Super-Tracing for Super-Distributed?

We need full, end-to-end observability, either to stay out of trouble or to get out of it more quickly. Logging and capturing metrics are not enough. The idea is to incorporate distributed tracing into the application so that we get:

  • Distributed transaction monitoring
  • Root cause analysis
  • Performance and latency optimization
  • Service dependency analysis

This helps engineers identify root cause and ownership, and then direct issues to the right team, with the right contextual information on the first attempt. This approach helps answer the questions:

  • Which services are affected by this issue?
  • Which issues are affecting which service(s)?

Wikipedia defines observability as “a measure of how well internal states of a system can be inferred from knowledge of its external outputs.” In other words, it is what brings visibility into systems.

If something goes wrong during the execution of the flow, debugging is a nightmare. Without observability, you never know which part of the system failed.

We do have logs and metrics for the services, but logs do not give a complete picture because they are scattered across a number of log files: It is difficult to conceptually link information from multiple logs together to understand the context of an issue. Metrics can tell you that response times for a service exceed a certain threshold, but they may not help you easily identify the root cause.

As a result, a lot of time is lost in defect triaging and determining ownership of issues. Plus, since services are owned by different teams, this can result in much higher mean-time-to-resolution (MTTR) metrics when issues affect services.

Solution Approach

Distributed tracing—via Jaeger—comes to the rescue. Jaeger is an open-source tracing system that was originally developed at Uber. It is designed to be highly scalable and flexible. Jaeger supports multiple storage backends, including Cassandra, Elasticsearch, and in-memory storage. It also ingests traces in multiple formats, including Zipkin’s and its own native format, and exposes the vendor-neutral OpenTracing API to applications.

Distributed tracing has two key parts:

  1. Code instrumentation. This involves either automatic instrumentation via language-specific libraries or manual instrumentation, in which you add tracing code to your application source to produce traces.
  2. Collection and analysis. This involves collecting the trace data and giving it meaning. Jaeger also provides visualization tools that make it easy to understand the lifetime of a request.

Distributed tracing helps tell stories of transactions that cross process or service boundaries. The image below, from the Jaeger interface, shows an example of a full transaction with each individual constituent span, the duration of its execution, whether each span succeeded or failed, and full end-to-end latency. By expanding each row, a practitioner can quickly understand the flow of control. 

Background and Tips for Getting Started

The following sections offer a few tips to help you start tracing with Jaeger.

Jaeger components and architecture

When deploying Jaeger tracing, you will need to address the following components:

  • Agent. The agent is co-located with your application and gathers Jaeger trace data locally. It handles the connection and traffic control to the collector (see below) and performs data enrichment.
  • Collector. This is a centralized hub that collects traces from the various agents in the environment and sends them to backend storage. The collector can run validation and enrichment on the spans.
  • Query. This service retrieves traces from storage and serves them via a packaged UI; third-party UIs can be used as well.

Prerequisites: Installation (on a Kubernetes Cluster)

Most distributed and microservice-based applications are deployed in a containerized environment, such as Kubernetes. Given that, it is not surprising that the recommended way of installing and managing Jaeger in a production Kubernetes cluster is via the Jaeger operator. Helm charts are also supported as an alternative deployment mechanism.

Validated traces come in through the pipeline from the collector. Jaeger stores these traces in its data store. Currently, Jaeger supports two primary persistent storage types:

  1. Elasticsearch
  2. Cassandra

Additional backends are discussed in the Jaeger documentation.

For this example, we will use Elasticsearch.

Jaeger Collector and UI

Below are sample commands for installing the Jaeger operator, which we will then use to deploy a separate collector and query service with Elasticsearch as the data store.

kubectl create namespace observability
kubectl create -f https://raw.githubusercontent.com/jaegertracing/jaeger-operator/master/deploy/crds/jaegertracing.io_jaegers_crd.yaml
kubectl create -n dx -f https://raw.githubusercontent.com/jaegertracing/jaeger-operator/master/deploy/service_account.yaml
kubectl create -n dx -f https://raw.githubusercontent.com/jaegertracing/jaeger-operator/master/deploy/role.yaml
kubectl create -n dx -f https://raw.githubusercontent.com/jaegertracing/jaeger-operator/master/deploy/role_binding.yaml

# Download operator.yaml, change WATCH_NAMESPACE to your namespace (dx in this case), then apply the local copy:
kubectl create -n dx -f operator.yaml

# Change the namespace in cluster_role_binding.yaml the same way, then apply the cluster-scoped RBAC:
kubectl create -f https://raw.githubusercontent.com/jaegertracing/jaeger-operator/master/deploy/cluster_role.yaml
kubectl create -f cluster_role_binding.yaml

# Verify that the operator is running:
kubectl get deployment jaeger-operator -n dx

Below is the simplest.yaml file.

## Installing Jaeger query, collector, and agent ##
apiVersion: jaegertracing.io/v1
kind: Jaeger
metadata:
  name: simple-prod
spec:
  strategy: production
  collector:
    maxReplicas: 5
    resources:
      limits:
        cpu: 100m
        memory: 128Mi
  storage:
    type: elasticsearch
    options:
      es:
        server-urls: http://es.XXX.nip.io
  ui:
    options:
      dependencies:
        menuEnabled: false
      tracking:
        gaID: UA-000000-2
      menu:
        - label: "About Jaeger"
          items:
            - label: "Documentation"
              url: "https://www.jaegertracing.io/docs/latest"
      linkPatterns:
        - type: "logs"
          key: "customer_id"
          url: /search?limit=20&lookback=1h&service=frontend&tags=%7B%22customer_id%22%3A%22#{customer_id}%22%7D
          text: "Search for other traces for customer_id=#{customer_id}"

Here’s how to apply simplest.yaml.

kubectl apply -n dx -f simplest.yaml

(Screenshot: the Jaeger pods running in the cluster.)

(Screenshot: the Jaeger UI.)

The agent can be injected as a sidecar on the required microservice. (See sample below.)
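For illustration, here is a minimal sketch of that injection with the Jaeger operator: annotating a Deployment with sidecar.jaegertracing.io/inject asks the operator to add the jaeger-agent container alongside your service. The Deployment name and image below are hypothetical.

apiVersion: apps/v1
kind: Deployment
metadata:
  name: order-service                          # hypothetical microservice
  annotations:
    "sidecar.jaegertracing.io/inject": "true"  # the operator injects the jaeger-agent sidecar
spec:
  replicas: 1
  selector:
    matchLabels:
      app: order-service
  template:
    metadata:
      labels:
        app: order-service
    spec:
      containers:
        - name: order-service
          image: example/order-service:latest  # hypothetical image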

Elastic Indexes

Jaeger creates per-day indices in Elasticsearch: span data lands in indices named jaeger-span-YYYY-MM-DD, and service metadata in jaeger-service-YYYY-MM-DD.
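To verify that traces are being written, you can list those indices against the same Elasticsearch endpoint configured in simplest.yaml:

# List Jaeger indices (with verbose column headers) via the Elasticsearch cat API
curl "http://es.XXX.nip.io/_cat/indices/jaeger-*?v"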

Once this infrastructure is up and running, you can start pushing traces to Jaeger.

Start Adding Traces To Your Code

Traces in your code make your application observable.

1. Import Jaeger Client Library

You’ll need to add the Jaeger client library to your application. This library provides a simple API for creating and propagating traces through your system. If you are using Maven, you can add it to your pom.xml file.

A simple example is below; this client jar is what the application uses to push traces to the Jaeger collector.
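These are the standard coordinates for the Java Jaeger client; the version shown is illustrative, so pin whatever your build requires.

<!-- Jaeger Java client: creates spans and reports them to the agent/collector -->
<dependency>
    <groupId>io.jaegertracing</groupId>
    <artifactId>jaeger-client</artifactId>
    <version>1.8.1</version>
</dependency>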

2. Inject Tracer in Application (Spring Boot)

Start pushing traces from the application.
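Below is a minimal sketch of wiring a tracer into a Spring Boot application, assuming the jaeger-client dependency above. The service name "inventory-service" and the constant (sample-everything) sampler are illustrative; by default the tracer reports spans over UDP to a jaeger-agent on localhost, which matches the sidecar setup described earlier.

import io.opentracing.Tracer;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

@Configuration
public class TracingConfig {

    @Bean
    public Tracer jaegerTracer() {
        // Sample every request; in production you would likely use a
        // probabilistic or remote sampler instead.
        io.jaegertracing.Configuration.SamplerConfiguration sampler =
                io.jaegertracing.Configuration.SamplerConfiguration.fromEnv()
                        .withType("const")
                        .withParam(1);

        // Log spans locally as they are reported (useful while validating).
        io.jaegertracing.Configuration.ReporterConfiguration reporter =
                io.jaegertracing.Configuration.ReporterConfiguration.fromEnv()
                        .withLogSpans(true);

        // "inventory-service" is an illustrative service name.
        return new io.jaegertracing.Configuration("inventory-service")
                .withSampler(sampler)
                .withReporter(reporter)
                .getTracer();
    }
}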

3. Create New Spans (Parent Contexts)

Define process boundaries.

Once your application is configured to send traces to Jaeger, you can start creating trace spans. These spans represent a specific unit of work within your application, such as an HTTP request or a database query. 
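Here is a minimal sketch of a parent span using the OpenTracing API; the operation name "process-order" and the tag are illustrative.

import io.opentracing.Scope;
import io.opentracing.Span;
import io.opentracing.Tracer;

public class OrderHandler {

    private final Tracer tracer;

    public OrderHandler(Tracer tracer) {
        this.tracer = tracer;
    }

    public void handleOrder(String orderId) {
        // Start a new root span for this unit of work.
        Span span = tracer.buildSpan("process-order").start();
        // Activating the span makes it the implicit parent for child spans.
        try (Scope scope = tracer.activateSpan(span)) {
            span.setTag("order.id", orderId);
            // ... do the actual work here ...
        } finally {
            span.finish(); // unfinished spans never reach Jaeger
        }
    }
}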

4. Create Child Spans and Propagate Parent Context

To create a trace span, you’ll first need to create a span context. This context contains the trace ID and span ID that are used to propagate the trace through your system. Next, you’ll create a span object, which represents the unit of work that you’re tracing. You’ll set the span’s parent context to the context you created earlier, and then start and finish the span as needed.
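Here is a minimal sketch of both halves, using the OpenTracing propagation API with plain maps standing in for HTTP headers; the operation names are illustrative. On the client side, the span context is injected into outgoing headers; on the server side, it is extracted and used as the parent.

import io.opentracing.Span;
import io.opentracing.SpanContext;
import io.opentracing.Tracer;
import io.opentracing.propagation.Format;
import io.opentracing.propagation.TextMapAdapter;

import java.util.HashMap;
import java.util.Map;

public class PropagationExample {

    // Client side: start a child of the active span and inject its context
    // into the outgoing HTTP headers.
    public Map<String, String> outgoingHeaders(Tracer tracer) {
        Span child = tracer.buildSpan("call-inventory")
                .asChildOf(tracer.activeSpan())
                .start();
        Map<String, String> headers = new HashMap<>();
        tracer.inject(child.context(), Format.Builtin.HTTP_HEADERS,
                new TextMapAdapter(headers));
        // ... send the HTTP request with these headers ...
        child.finish();
        return headers;
    }

    // Server side: extract the parent context from the incoming headers and
    // continue the same trace.
    public void handleIncoming(Tracer tracer, Map<String, String> headers) {
        SpanContext parent = tracer.extract(Format.Builtin.HTTP_HEADERS,
                new TextMapAdapter(headers));
        Span span = tracer.buildSpan("inventory-request")
                .asChildOf(parent) // a null parent simply starts a new trace
                .start();
        try {
            // ... handle the request ...
        } finally {
            span.finish();
        }
    }
}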

Summary

Jaeger enhances observability through OpenTracing. This helps you detect and discover issues throughout the app lifecycle, including in your deployment and test pipeline, in your operations, and during maintenance. Adding OpenTracing to your toolbox for distributed and super-distributed applications will save you time and provide you with the insights you need for more precise triaging and better team collaboration. 

References: https://www.jaegertracing.io/docs/1.26/operator/