Broadcom Software Academy Blog

Observability Data: Ingestion Pipeline Best Practices

Written by Robert Gauthier | Jul 25, 2025 7:00:45 PM
Key Takeaways
  • Find out why AIOps and observability require data from all corners of the IT estate.
  • See how teams can struggle with stitching different data sets into a coherent body of information.
  • Discover the three main options for data normalization, and the pros and cons of each.

Great data is a prerequisite to all things AIOps and observability. Great observability data results in fewer observability gaps, better analysis and insights, and more confidence within teams that rely on the power of modern AIOps and observability technologies. Goals for improved automation, IT efficiencies, and intelligent triage and remediation all become more achievable with better data.

Even with this powerful data, AIOps and observability technologies need to “do work” on the data to extract its value.

To start, great data for AIOps and observability should encompass monitoring data from all corners of the IT estate. Organizations that fail to capture alarms, metrics, topology, events, code, logs, and metadata from a range of environments, including Kubernetes, microservices, mainframe, NetOps, and more, risk blind spots and a loss of confidence within monitoring and IT operations teams.

This wealth of data leads to the next set of challenges: stitching these data sets into a coherent body of information. Let’s clarify the data stitching work required.

Data ingestion, transformation, and normalization in operational observability pipelines

In a typical data pipeline, raw data is collected from various source systems, transformed into a clean and usable form, and then normalized to ensure consistency and interoperability across the organization.
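At a high level, the three stages compose into a simple pipeline. Here is a minimal Python sketch of that shape; every function name and field is hypothetical, and each stage is detailed in the sections below.

```python
def ingest() -> list[dict]:
    """Stage 1: collect raw records from source systems (stubbed for illustration)."""
    return [{"ts": "07/25/2025 19:00:45", "qty": "3", "unit_price": "19.99"}]

def transform(records: list[dict]) -> list[dict]:
    """Stage 2: clean and structure raw records (types fixed here)."""
    return [{**r, "qty": int(r["qty"]), "unit_price": float(r["unit_price"])} for r in records]

def normalize(records: list[dict]) -> list[dict]:
    """Stage 3: enforce consistent formats and scales (pass-through in this stub)."""
    return records

print(normalize(transform(ingest())))
```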

1. Data ingestion

Data ingestion is the process of collecting raw data from diverse sources, such as databases, APIs, file systems, or IoT devices, into a centralized storage platform, such as a data lake, data warehouse, or cloud object store.

Here’s an example, followed by a short code sketch:

  • Ingesting daily transaction data from an e-commerce API into a cloud-based analytics platform.
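To make this concrete, here is a minimal Python sketch of such an ingestion step. It is illustrative only: the endpoint URL, bucket name, and object key are hypothetical, and it assumes the requests and boto3 libraries are installed with cloud credentials already configured.

```python
import json

import boto3
import requests

# Hypothetical e-commerce API endpoint and raw-data bucket.
API_URL = "https://api.example-shop.com/v1/transactions?date=2025-07-25"
BUCKET = "analytics-raw-zone"

def ingest_daily_transactions() -> None:
    """Pull one day of raw transaction data and land it, unmodified, in object storage."""
    response = requests.get(API_URL, timeout=30)
    response.raise_for_status()

    # Store the raw payload as-is; cleaning and normalization happen in later stages.
    boto3.client("s3").put_object(
        Bucket=BUCKET,
        Key="transactions/2025-07-25.json",
        Body=json.dumps(response.json()).encode("utf-8"),
    )

if __name__ == "__main__":
    ingest_daily_transactions()
```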

2. Data transformation

Once ingested, raw data is often messy or inconsistent. Transformation involves cleaning, structuring, and enriching the data to make it usable for analysis or modeling.

Common transformations include the following (a code sketch follows the list):

  • Converting timestamps to a standard format (e.g., YYYY-MM-DD).
  • Splitting full names into first and last names.
  • Calculating derived metrics like total_price = quantity × unit_price.
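Here is a minimal Python sketch of those three transformations applied to one record; the field names, timestamp format, and sample values are hypothetical.

```python
from datetime import datetime

def transform_record(raw: dict) -> dict:
    """Clean and enrich a single raw transaction record."""
    # Convert a timestamp such as "07/25/2025 19:00:45" to the standard YYYY-MM-DD format.
    ts = datetime.strptime(raw["timestamp"], "%m/%d/%Y %H:%M:%S")

    # Naive full-name split for illustration: first token vs. the rest.
    first_name, _, last_name = raw["full_name"].partition(" ")

    return {
        "date": ts.strftime("%Y-%m-%d"),
        "first_name": first_name,
        "last_name": last_name,
        # Derived metric: total_price = quantity × unit_price.
        "total_price": raw["quantity"] * raw["unit_price"],
    }

record = {
    "timestamp": "07/25/2025 19:00:45",
    "full_name": "Ada Lovelace",
    "quantity": 3,
    "unit_price": 19.99,
}
print(transform_record(record))
```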

3. Data normalization

Normalization ensures that data is presented in a consistent structure and scale.

Normalization can refer to two distinct practices (a code sketch of the first follows the list):

  • Value normalization: Scaling numeric values into a standard range, such as 0-1.
  • Database normalization: Structuring data to reduce redundancy and improve relational integrity. Examples include parsing mailing codes into regional and local addresses or separating customer name and address information into distinct tables.
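As a quick illustration of the first practice, here is a minimal min-max scaling sketch; the latency values are invented for the example.

```python
def min_max_normalize(values: list[float]) -> list[float]:
    """Scale numeric values into the 0-1 range (value normalization)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values identical: map everything to 0.0 to avoid division by zero.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

# Illustrative response-time metrics in milliseconds.
latencies_ms = [120.0, 300.0, 45.0, 210.0]
print(min_max_normalize(latencies_ms))  # approx. [0.29, 1.0, 0.0, 0.65]
```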

Where should normalization occur?

There are several options for where data normalization can occur, each with notable pros and cons; a short code sketch after the comparison contrasts the two pipeline-side approaches.

Option 1: Normalize at the source

Apply standardized naming, formatting, and data structures upstream, at the source system level.

Pros:

  • Results in cleaner data throughout the pipeline
  • Enables plug-and-play ingestion
  • Delivers long-term efficiency
  • Allows for faster knowledge transfer and easier onboarding for team members

Cons:

  • Requires policy-level commitment with support from management
  • Requires enforcement of enterprise-wide standards and collaboration
  • Can mean slower initial implementation due to organizational alignment requirements

Option 2: Normalize during ingestion (ingest ➝ map + normalize)

Apply normalization as part of the ingestion step to streamline immediate use.

Pros:

  • Reduces raw data clutter
  • Accelerates downstream processing

Cons:

  • Harder to trace or audit original data
  • Less transparency into source discrepancies

Option 3: Normalize after ingestion (ingest ➝ transform/normalize)

Separate normalization into its own step after data is ingested.

Pros:

  • Offers greater flexibility and modularity
  • Easier to troubleshoot and adjust logic

Cons:

  • Requires more storage, since both raw and processed data are retained
  • Additional processing stages will likely increase latency and impact performance
  • May introduce new governance and consistency issues
  • Ties the pipeline to a specific processing technology
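To make the pipeline-side trade-off concrete, here is a minimal sketch contrasting option 2 (normalize during ingestion) with option 3 (normalize after ingestion). The event shape and function names are hypothetical.

```python
RAW_EVENTS = [
    {"host": "WEB-01", "latency_ms": "120"},
    {"host": "web-02", "latency_ms": "300"},
]

def normalize(event: dict) -> dict:
    """Apply consistent naming and types to one event."""
    return {"host": event["host"].lower(), "latency_ms": float(event["latency_ms"])}

# Option 2: normalize during ingestion. Only normalized data is stored,
# so original values (e.g., "WEB-01") are no longer available to audit.
ingested = [normalize(e) for e in RAW_EVENTS]

# Option 3: normalize after ingestion. Raw data is kept in a raw zone,
# and a separate stage produces the normalized copy: more storage, more traceability.
raw_zone = list(RAW_EVENTS)
normalized_zone = [normalize(e) for e in raw_zone]

print(ingested == normalized_zone)  # True: same result, different audit trail
```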

Choosing the right strategy

The best approach depends on your architecture, governance maturity, and business needs. In simple terms, consider these guidelines when selecting a strategy:

  • For strategic scalability → Normalize at the source
  • For performance and simplicity → Normalize during ingestion
  • For agility and traceability → Normalize after ingestion

Most importantly, select a strategy, but know that many organizations apply a combination of strategies to fit specific needs and to further enrich data after ingestion.

This combination of great observability data, modern AIOps and observability capabilities, and a data strategy that matches your organization’s situation will help you unlock enormous value for your teams and set the stage for automation improvements, new IT efficiencies, and intelligent triage and remediation.

And, with the right strategy, you’ll uncover additional benefits such as:

  • Reducing the cost of integration
  • Making integration tools interchangeable, enabling your organization to realize maximum value
  • Reducing reliance on specialists with domain-specific knowledge