    April 13, 2023

    OpenTracing via Jaeger

    How to Establish Observability of Modern, Distributed Apps

    Within enterprises, it used to be that applications ran on a single server. Owners could directly monitor that discrete machine, conveniently access all the logs they needed, see all the metrics that mattered, and hit the reboot button, without needing to confer with “everyone.”

    Those days are gone. 

    Modern application architectures stretch the definitions of the words “federated” and “distributed.” We now have distributed applications. We may actually have super-multi-path-virtual-containerized-business-critical-shared-distributed apps. And we love them.

    These new apps are powering the digital services we love, including social media, ride sharing, and streaming.

    New digital services, supported by these modern architectures, are changing how IT organizations are structured, how we collaborate, and how power structures and accountability are shared across teams.

    We do love these apps—at least when they perform well, there are no faults, and they aren’t generating difficult-to-weather alarm storms and hard-to-decipher logs.

    It’s when things go “unexpectedly” that we love OpenTracing and the advances that we attribute to the adoption of Jaeger within app development and IT operations teams.

    What is OpenTracing?

    OpenTracing is a method used to profile and monitor applications, especially those built using a microservices architecture. It is sometimes also called distributed request tracing.

    Most importantly, it is a vendor-neutral API that allows developers to easily add tracing to their applications, without having to worry about the specifics of the underlying tracing implementation. Distributed tracing helps pinpoint where failures occur and what causes poor performance.

    Why do we need OpenTracing?

    In modern distributed applications, it can be difficult to debug issues when things go wrong: A single request to the application is likely reliant upon multiple services. When that request is unfulfilled, determining which microservice is (mostly) responsible can feel like trying to solve a Rubik’s Cube puzzle.

    To identify the root cause of a problem, we have two common tools: logging and metrics. From experience, we know that logs and metrics fail to give the complete picture of the condition of distributed or super-distributed systems.

    Super-Tracing for Super-Distributed?

    We need full, end-to-end observability, either to stay out of trouble, or get out of trouble more quickly. Logging and capturing metrics are not enough. The idea is to incorporate distributed tracing into the application so that we can get:

    • Distributed transaction monitoring
    • Root cause analysis
    • Performance and latency optimization
    • Service dependency analysis

    This helps engineers identify root cause and ownership, and then direct issues to the right team, with the right contextual information on the first attempt. This approach helps answer the questions:

    • Which services are affected by this issue?
    • Which issues are affecting which service(s)?

Wikipedia defines observability as “a measure of how well internal states of a system can be inferred from knowledge of its external outputs.” In other words, it brings visibility into systems.

    If something goes wrong during the execution of the flow, debugging is a nightmare. Without observability, you never know which part of the system failed.

    We do have logs and metrics for the services, but logs do not give a complete picture because they are scattered across a number of log files: It is difficult to conceptually link information from multiple logs together to understand the context of an issue. Metrics can tell you that response times for a service exceed a certain threshold, but they may not help you easily identify the root cause.

    As a result, a lot of time is lost in defect triaging and determining ownership of issues. Plus, since services are owned by different teams, this can result in much higher mean-time-to-resolution (MTTR) metrics when issues affect services.

    Solution Approach

    Distributed tracing—via Jaeger—comes to the rescue. Jaeger is an open-source tracing system that was originally developed at Uber. It is designed to be highly scalable and flexible. Jaeger supports multiple storage backends, including Cassandra, Elasticsearch, and in-memory storage. It also supports multiple tracing protocols, including OpenTracing, Zipkin, and Jaeger’s own native format.

    Distributed tracing has two key parts:

    1. Code instrumentation. This involves either using automatic instrumentation via language-specific implementation libraries or using manual instrumentation. Manual instrumentation requires you to add instrumentation code into your application source code to produce traces.
    2. Collection and analysis. This involves collecting data and providing meaning to it. Jaeger also provides visualization tools to easily understand request lifetime.

    Distributed tracing helps tell stories of transactions that cross process or service boundaries. The image below, from the Jaeger interface, shows an example of a full transaction with each individual constituent span, the duration of its execution, whether each span succeeded or failed, and full end-to-end latency. By expanding each row, a practitioner can quickly understand the flow of control. 

[Figure 1: A full transaction in the Jaeger UI, showing each constituent span, its duration and status, and end-to-end latency]
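The parent/child relationships in such a trace come from context propagation: the caller injects its trace context (trace ID and span ID) into the outgoing request, typically as headers, and the callee extracts it to continue the same trace. Below is a minimal, self-contained sketch of that idea; the class, method, and header names are illustrative, not the Jaeger client API.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.UUID;

// Illustrative sketch of trace-context propagation across a process
// boundary: the caller injects trace/span IDs into carrier headers,
// and the callee extracts them to continue the same trace.
public class ContextPropagationSketch {

    // Inject the current trace context into an outgoing "request".
    static Map<String, String> inject(String traceId, String spanId) {
        Map<String, String> headers = new HashMap<>();
        headers.put("trace-id", traceId);   // hypothetical header names
        headers.put("span-id", spanId);
        return headers;
    }

    // Extract the trace ID on the receiving service and start a child
    // span: same trace, new span ID.
    static String[] extractAndStartChild(Map<String, String> headers) {
        String traceId = headers.get("trace-id");
        String childSpanId = UUID.randomUUID().toString();
        return new String[] { traceId, childSpanId };
    }

    public static void main(String[] args) {
        String traceId = UUID.randomUUID().toString();
        String spanId = UUID.randomUUID().toString();
        Map<String, String> headers = inject(traceId, spanId);
        String[] child = extractAndStartChild(headers);
        // The child span belongs to the same trace as the caller
        System.out.println(child[0].equals(traceId));
    }
}
```

Because every span carries the same trace ID, the backend can stitch spans from different services into the single end-to-end view shown above.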

    Background and Tips for Getting Started

    The following sections offer a few tips to help you start tracing with Jaeger.

    Jaeger components and architecture

    When deploying Jaeger tracing, you will need to address the following components:

    • Agent. The agent is co-located with your application and gathers Jaeger trace data locally. It handles the connection and traffic control to the collector (see below) and performs data enrichment.
    • Collector. This is a centralized hub that collects traces from the agents in the environment and sends them to backend storage. The collector can run validations and enrichment on the spans.
    • Query. The query service retrieves traces from storage and serves them via the packaged Jaeger UI. That said, third-party UIs can be used as well.

[Figure 2: Jaeger components and architecture]

    Prerequisites: Installation (on a Kubernetes Cluster)

    Most distributed and microservice-based applications are deployed in a containerized environment, such as Kubernetes. Given that, it is not surprising that the recommended way of installing and managing Jaeger in a production Kubernetes cluster is via the Jaeger operator. Helm charts are also supported as an alternative deployment mechanism.

    Validated traces come in through the pipeline from the collector. Jaeger stores these traces in its data store. Currently, Jaeger supports two primary persistent storage types:

    1. Elasticsearch
    2. Cassandra

Additional backends are discussed in the Jaeger documentation.

    For this example, we will use Elasticsearch.

    Jaeger Collector and UI

    Below is sample code for running a separate collector with Elasticsearch as a data store.

    # Create the namespace for the operator; this walkthrough uses "dx"
    # (the Jaeger docs use "observability")
    kubectl create namespace dx
    kubectl create -f https://raw.githubusercontent.com/jaegertracing/jaeger-operator/master/deploy/crds/jaegertracing.io_jaegers_crd.yaml
    kubectl create -n dx -f https://raw.githubusercontent.com/jaegertracing/jaeger-operator/master/deploy/service_account.yaml
    kubectl create -n dx -f https://raw.githubusercontent.com/jaegertracing/jaeger-operator/master/deploy/role.yaml
    kubectl create -n dx -f https://raw.githubusercontent.com/jaegertracing/jaeger-operator/master/deploy/role_binding.yaml
    # Download operator.yaml, change WATCH_NAMESPACE to your namespace
    # (here, dx), then apply the local copy
    kubectl create -n dx -f operator.yaml
    kubectl create -f https://raw.githubusercontent.com/jaegertracing/jaeger-operator/master/deploy/cluster_role.yaml
    # Change the namespace in cluster_role_binding.yaml as well, then apply
    kubectl create -f cluster_role_binding.yaml
    # Verify that the operator is running
    kubectl get deployment jaeger-operator -n dx

    Below is the simplest.yaml file.

    ## Install the Jaeger query, collector, and agent ##
    apiVersion: jaegertracing.io/v1
    kind: Jaeger
    metadata:
      name: simple-prod
    spec:
      strategy: production
      collector:
        maxReplicas: 5
        resources:
          limits:
            cpu: 100m
            memory: 128Mi
      storage:
        type: elasticsearch
        options:
          es:
            server-urls: http://es.XXX.nip.io
      ui:
        options:
          dependencies:
            menuEnabled: false
          tracking:
            gaID: UA-000000-2
          menu:
          - label: "About Jaeger"
            items:
            - label: "Documentation"
              url: "https://www.jaegertracing.io/docs/latest"
          linkPatterns:
          - type: "logs"
            key: "customer_id"
            url: "/search?limit=20&lookback=1h&service=frontend&tags=%7B%22customer_id%22%3A%22#{customer_id}%22%7D"
            text: "Search for other traces for customer_id=#{customer_id}"

    Here’s how to apply simplest.yaml.

    kubectl apply -n dx -f simplest.yaml

Jaeger Pods

[Figure 3: Jaeger pods running in the cluster]

    Jaeger UI

    [Figure 4: Jaeger UI]

    [Figure 5: Jaeger UI]

The agent can be injected as a sidecar on the required microservice.
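With the Jaeger operator installed, sidecar injection is typically enabled by annotating the Deployment. A sketch is below; the deployment name, labels, and image are placeholders.

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-service                                # placeholder name
  annotations:
    "sidecar.jaegertracing.io/inject": "true"     # operator injects the agent
spec:
  selector:
    matchLabels:
      app: my-service
  template:
    metadata:
      labels:
        app: my-service
    spec:
      containers:
      - name: my-service
        image: example/my-service:latest          # placeholder image
```

The injected agent container receives spans locally (by default over UDP) and forwards them to the collector, so the application only ever talks to localhost.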

    Elastic Indexes

    Following are examples of indexes created by Jaeger.

[Figure 6: Elasticsearch indexes created by Jaeger]

Once this infrastructure is up and running, you can push traces to Jaeger.

    Start Adding Traces To Your Code

    Traces in your code make your application observable.

    1. Import Jaeger Client Library

You’ll need to add the Jaeger client library to your application. This library provides a simple API for creating and propagating traces through your system. If you are using Maven, you can add it as a dependency in your pom.xml file.

This client library is used to push traces to the Jaeger collector.
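As a sketch, for a Spring Boot application one commonly used client dependency is the OpenTracing Spring Jaeger starter; verify the artifact and version against your own build.

```xml
<!-- Jaeger client via the OpenTracing Spring Jaeger starter -->
<dependency>
    <groupId>io.opentracing.contrib</groupId>
    <artifactId>opentracing-spring-jaeger-cloud-starter</artifactId>
    <version>3.3.1</version> <!-- example version; check for the latest -->
</dependency>
```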

    2. Inject Tracer in Application (Spring Boot)

    Start pushing traces from the application.
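With a starter like the one above on the classpath, a Tracer bean is auto-configured and registered for you; you mainly point it at the Jaeger agent, e.g. in application.properties. The host and port below are placeholders for your environment.

```properties
# Service name under which traces appear in the Jaeger UI
spring.application.name=my-service
# Jaeger agent endpoint (placeholder host; 6831/udp is the default port)
opentracing.jaeger.udp-sender.host=jaeger-agent.dx.svc
opentracing.jaeger.udp-sender.port=6831
# Also log spans locally (useful while validating the setup)
opentracing.jaeger.log-spans=true
```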

3. Create New Spans (Parent Contexts)

    Define process boundaries.

    Once your application is configured to send traces to Jaeger, you can start creating trace spans. These spans represent a specific unit of work within your application, such as an HTTP request or a database query. 
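A sketch using the OpenTracing Java API follows; it assumes the opentracing-api and opentracing-util dependencies are on the classpath and that a tracer has been registered (for example, by the Spring starter). The operation name is illustrative.

```java
import io.opentracing.Span;
import io.opentracing.Tracer;
import io.opentracing.util.GlobalTracer;

Tracer tracer = GlobalTracer.get();

// A span started with no parent begins a new trace (a "parent context")
Span span = tracer.buildSpan("process-order").start();
try {
    // ... the unit of work being traced, e.g. handling an HTTP request ...
} finally {
    span.finish();   // records the end timestamp and reports the span
}
```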

4. Create Child Spans and Propagate Parent Context

    To create a trace span, you’ll first need to create a span context. This context contains the trace ID and span ID that are used to propagate the trace through your system. Next, you’ll create a span object, which represents the unit of work that you’re tracing. You’ll set the span’s parent context to the context you created earlier, and then start and finish the span as needed.
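As a hedged sketch with the OpenTracing Java API (operation names illustrative; requires the opentracing-api and opentracing-util dependencies), asChildOf() links the new span to the parent’s context, so both spans share one trace ID and appear as a single transaction in the Jaeger UI.

```java
import io.opentracing.Span;
import io.opentracing.Tracer;
import io.opentracing.util.GlobalTracer;

Tracer tracer = GlobalTracer.get();

Span parent = tracer.buildSpan("handle-request").start();

// The child span reuses the parent's span context (same trace ID)
Span child = tracer.buildSpan("query-database")
                   .asChildOf(parent.context())
                   .start();
try {
    // ... the database call being traced ...
} finally {
    child.finish();
}
parent.finish();
```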


    Jaeger enhances observability through OpenTracing. This helps you detect and discover issues throughout the app lifecycle, including in your deployment and test pipeline, in your operations, and during maintenance. Adding OpenTracing to your toolbox for distributed and super-distributed applications will save you time and provide you with the insights you need for more precise triaging and better team collaboration. 

    References:  https://www.jaegertracing.io/docs/1.26/operator/

    Tag(s): AIOps , DX OI , DX APM

    Varun Saxena

Varun Saxena is a Staff Software Engineer at Broadcom Software with more than 13 years of experience developing and designing enterprise software products. Prior to Broadcom, he worked for companies such as Oracle and Tech Mahindra.
