<img height="1" width="1" style="display:none;" alt="" src="https://px.ads.linkedin.com/collect/?pid=1110556&amp;fmt=gif">
Skip to content
    April 29, 2021

    What are Zipkin and Jaeger, and How Does Istio Use Them?

    Key Takeaways
    • Leverage Istio to gain a powerful way to establish observability of large systems with minimal development effort.
    • Implement tools like Zipkin and Jaeger for distributed tracing to visualize and analyze application performance issues effectively.
    • Factor in ease of deployability when choosing between Jaeger and Zipkin.

    Service meshes like Istio have changed the world of observability. They provide the fastest path to generating the critical metrics and traces that enable software reliability teams to find bugs and bottlenecks in a system. Zipkin and Jaeger are implementations of distributed tracing, and Istio uses them to provide observability into requests throughout a system of microservices. Let’s explore what distributed tracing is, why you would want it, how Istio uses it, and the differences between Zipkin and Jaeger as backends for your traces.

    Distributed Tracing

    In a monolithic application, when an error occurs, you usually already have a trace to follow: the stack trace. Because the entire lifetime of a request or transaction is owned by the one application, you get a full view of what happened in that transaction. In addition, profiling libraries can tell you how long a particular function or database call took.

    But what about in a distributed system? Let’s say your request has to pass through three microservices, taking the following steps to complete:

    1. You send your request to App A
    2. App A calls App B, App B checks a Redis cache
    3. App B calls App C, App C queries a database

    If you get a 403 status code back, which service generated the error? If your request normally takes 100ms but is now taking 3 seconds, which application or database slowed down? Distributed tracing systems provide the data to answer these kinds of questions.

    How Distributed Tracing Works

    A distributed trace is composed of a parent object, the trace, and child objects, the spans. In our three-service example:

    1. App A generates a trace ID. Internally, it creates child spans of function calls, and finally makes a request to App B. In the headers of that request, it sends the trace ID.
    2. App B takes the trace ID and makes its own spans of its internal calls, including around the call to Redis. It sends a request to App C, also sending the trace ID in the request headers.
    3. App C makes its own spans, and encounters an error. To one of its spans, it adds a key-value pair of “error = 403” and responds with the 403 status code.

    In each application, there is a library for a tracing backend, like Zipkin or Jaeger. All of the trace and span information is sent to the backend, where a developer can analyze it.
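    To make step 3 concrete, here is a minimal sketch, assuming the OpenTelemetry Go SDK (which can export to both Zipkin and Jaeger) has already been configured with an exporter; the tracer name, span name, and runQuery stub are illustrative stand-ins, not code from any particular backend:

```go
package appc

import (
	"context"
	"errors"

	"go.opentelemetry.io/otel"
	"go.opentelemetry.io/otel/attribute"
	"go.opentelemetry.io/otel/codes"
)

// queryDatabase is roughly what App C does in step 3: it starts a child span
// from the incoming context (which carries the trace ID propagated from App B),
// and records the failure on the span before closing it.
func queryDatabase(ctx context.Context) error {
	ctx, span := otel.Tracer("app-c").Start(ctx, "db.query")
	defer span.End()

	if err := runQuery(ctx); err != nil {
		span.SetAttributes(attribute.Int("error", 403)) // the “error = 403” tag
		span.SetStatus(codes.Error, err.Error())
		return err
	}
	return nil
}

// runQuery stands in for the real database call (hypothetical).
func runQuery(ctx context.Context) error {
	return errors.New("permission denied")
}
```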

    Istio and Distributed Tracing

    Istio, like other service meshes, provides convenience around distributed tracing, minimizing the work developers have to put in to get the benefits. Istio can be configured to recognize tracing headers, and automatically generate a span for each service in the mesh, giving you a view of your system at the service entry/exit level. This functionality offers a powerful way to get observability of a large system with minimal effort from developers.

    Limitations

    Since the Envoy sidecars that Istio deploys are unaware of an application’s business logic, the spans that Istio automatically creates are at the entry and exit of a request through that application. Instrumenting database calls or function calls still has to be done by the developer.

    There is still a little bit of code developers have to add: because tracing relies on request headers, the trace context must be forwarded from each inbound request to every outbound request, even with an Envoy sidecar in place. This forwarding can be done in middleware without adding a full tracing library, and could be wrapped in a common library your organization adds as a dependency to all apps, for example (see the sketch after this list). Istio provides a helpful list of the headers that must be forwarded:

    • x-request-id
    • x-b3-traceid
    • x-b3-spanid
    • x-b3-parentspanid
    • x-b3-sampled
    • x-b3-flags
    • b3
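    Here is a minimal sketch of that forwarding in Go; the handler, service names, and upstream URL are hypothetical, and real middleware would apply this to every outbound call rather than a single hardcoded one:

```go
package traceprop

import (
	"io"
	"net/http"
)

// b3Headers are the tracing headers Istio asks each service to propagate.
var b3Headers = []string{
	"x-request-id", "x-b3-traceid", "x-b3-spanid",
	"x-b3-parentspanid", "x-b3-sampled", "x-b3-flags", "b3",
}

// callUpstream calls the next service, copying any tracing headers from the
// inbound request so Envoy can stitch both services into the same trace.
func callUpstream(inbound *http.Request, upstreamURL string) (*http.Response, error) {
	out, err := http.NewRequestWithContext(inbound.Context(), http.MethodGet, upstreamURL, nil)
	if err != nil {
		return nil, err
	}
	for _, h := range b3Headers {
		if v := inbound.Header.Get(h); v != "" {
			out.Header.Set(h, v)
		}
	}
	return http.DefaultClient.Do(out)
}

// Handler shows the forwarding in use: App B receiving a request and
// calling App C (a hypothetical upstream).
func Handler(w http.ResponseWriter, r *http.Request) {
	resp, err := callUpstream(r, "http://app-c:8080/query")
	if err != nil {
		http.Error(w, err.Error(), http.StatusBadGateway)
		return
	}
	defer resp.Body.Close()
	w.WriteHeader(resp.StatusCode)
	io.Copy(w, resp.Body)
}
```

    Because Envoy reads these headers on both sides of the call, this handful of lines is all the application itself needs for Istio to join App B and App C into the same trace.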

    Zipkin vs Jaeger

    Once you have your applications instrumented, either with tracing libraries or simply by placing them in a mesh, you need somewhere to collect and analyze the traces. Zipkin and Jaeger provide backends that collate all the traces and spans and allow users to view them. Examples of the analysis view are available for Zipkin and Jaeger.

    Choosing between these options used to be a harder decision. Zipkin and Jaeger are not just backends; they also define their own formats and protocols for trace data. However, as the distributed tracing ecosystem has matured, Jaeger has adopted compatibility with Zipkin’s protocol. Both are supported by the OpenTelemetry project. This support means teams can send traces of almost any common format to almost any major backend, and have any necessary translations done in the middle. Both projects are open source, and have similar architectures:

    • A collector or multiple collectors that receive traces and spans
    • A storage backend (both support Cassandra and Elasticsearch; Zipkin also supports MySQL)
    • A UI for querying traces
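    As a sketch of that interchangeability, here is how an application using the OpenTelemetry Go SDK might point its traces at a Zipkin collector; the endpoint assumes a local Zipkin instance on its default port, and a Jaeger exporter could be swapped in without touching any instrumentation code:

```go
package main

import (
	"context"
	"log"

	"go.opentelemetry.io/otel"
	"go.opentelemetry.io/otel/exporters/zipkin"
	sdktrace "go.opentelemetry.io/otel/sdk/trace"
)

func main() {
	// Export to a Zipkin collector on its default port; only this exporter
	// line changes if the team later moves to Jaeger.
	exp, err := zipkin.New("http://localhost:9411/api/v2/spans")
	if err != nil {
		log.Fatal(err)
	}
	tp := sdktrace.NewTracerProvider(sdktrace.WithBatcher(exp))
	otel.SetTracerProvider(tp)
	defer tp.Shutdown(context.Background())
	// ... create tracers and spans as usual ...
}
```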

    When a team is choosing which to run as a backend for their organization’s traces, the most important consideration is ease of deployment and maintenance, which will differ from team to team. Zipkin offers easy deployment via Docker Compose, which might suit teams working directly on VM instances, while Jaeger has a Kubernetes Operator for convenient deployment to a Kubernetes cluster.

    When designing your deployment, a key thing to keep in mind is that the query UIs for both Jaeger and Zipkin must be kept inside a VPN-accessible network, as neither has any security on its frontend. They are designed for private networks where any developer can view the data. This might mean multiple deployments into cordoned-off networks, depending on your security posture or organizational structure.

    Whichever you choose, your developers are going to be delighted to go...

    • from a world where they are ssh’ing to a VM and exec’ing into a container to curl the next upstream service to see if the connection is alive,
    • to a world where they are looking at a trace chart and seeing where errors are occurring.

    And at the rate this ecosystem is maturing, you soon won’t have to choose between Jaeger and Zipkin at all, as the OpenTelemetry Collector replaces the current tracing Tower of Babel.

    Tag(s): AIOps

    David Sudia

    Dave Sudia is an educator, turned developer, turned DevOps engineer. He's passionate about supporting other developers in doing their best work by making sure they have the right tools and environments. In his day-to-day, he's responsible for managing Kubernetes clusters, deploying databases, writing utility apps, and...
