Regular Expressions That I Use Regularly

Written by Günter Grossberger | Nov 22, 2024 8:28:03 PM

Key Takeaways

See how to use regular expressions to filter results when creating a dashboard for metric data.
Discover how to filter results in DX Application Performance Management (DX APM) and DX Operational Observability (DX O2).
Access step-by-step guidance for creating a dashboard in DX O2.

Whether creating dashboards or defining alarms, you will always need regular expressions to select data that you want to use.

This blog provides an overview of regular expressions in the context of creating a dashboard for metric data. I also cover some regular expressions I use frequently to filter the data I work with most often.

What are regular expressions?

A regular expression is “a sequence of characters that specifies a match pattern in text”. While there are many ways to define a pattern to match, the Perl-style syntax implemented by Perl Compatible Regular Expressions (PCRE) has emerged as a de-facto standard. This syntax, or a close variant of it, is used in many programming languages, including Perl, PHP, Java, .NET, Python, and others. Although implementations differ slightly, they share a common set of patterns.

Regular expressions can be used to filter results for many capabilities available in DX Application Performance Management (DX APM) and DX Operational Observability (DX O2). These include:

Metric groups and alarms
All kinds of (attribute) filters
DX Dashboard queries
Universe definitions
Service definitions
SLI definitions

Regular expressions are used almost anywhere in the solution where you can filter data.

Scenario: Creating a dashboard

Let’s create a dashboard for our TixChange application in DX O2. We open the “Dashboards” view, select “New Dashboard”, and then “Add an empty panel”.

We want to display application metrics. To do so, change the default “-- Grafana --” data source to “AIOps_Metrics”. In the query builder, we notice two specifiers: the “Source Name Specifier” and the “Attribute Name Specifier”. Together, they define which data is displayed in our widget. For DX APM metrics, the Source Name Specifier specifies the agent expression; the Attribute Name Specifier specifies the metric expression. For the Source Name Specifier there are four options:

NONE
ALL
EXACT
REGEX

Obviously, “NONE” will not select any data and “ALL” will select all agents as data sources. “EXACT” performs an exact (literal) match. Note: “REGEX” is the only option that allows you to select data from a specific subset of agents. If you want to display data from exactly one agent, you can use “EXACT” and, for example, copy the agent name (including the domain) from the metric browser.

Source Name Specifier

The “Source Name” consists of four parts separated by the pipe character (“|”):

Domain name|agent host name|agent process name|agent name

Domains are a legacy mechanism to filter data. In DX APM, it defaults to “SuperDomain.” For other data it may be “UIM” or “Custom.” We want to match all agents that have an agent name containing “TixChange” or “tixchange.” We don’t care about the host and process names. I may cover the topic of APM agent naming in a subsequent blog.
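To make the four-part structure concrete, here is a minimal Python sketch that splits a Source Name at the pipe character. The class `SourceName`, the function `parse_source_name`, and the sample value are made up for illustration; they are not part of DX APM or DX O2.

```python
from typing import NamedTuple


class SourceName(NamedTuple):
    """The four pipe-separated parts of a Source Name (illustrative only)."""
    domain: str
    host: str
    process: str
    agent: str


def parse_source_name(source: str) -> SourceName:
    # A valid Source Name has exactly four parts, i.e. exactly three pipes.
    parts = source.split("|")
    if len(parts) != 4:
        raise ValueError(f"expected 4 parts, got {len(parts)}: {source!r}")
    return SourceName(*parts)


# Hypothetical example value
parsed = parse_source_name("SuperDomain|host01|Tomcat|TixChange-Web")
print(parsed.agent)  # TixChange-Web
```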

To match “SuperDomain” we can literally write “SuperDomain”. If we want to match any domain, we can write “.*” or better “.+”. The dot ‘.’ matches any character; the quantifier that follows it says how often: “*” means “0-n times” and “+” means “1-n times”. So if we want to make sure that there is at least one character, we should use “.+”, but most often simply “.*” is used.
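The difference between “.*” and “.+” only shows up for empty strings. The following sketch uses Python's `re` module as a stand-in for the regex engine in the product (which may differ in details) and assumes the whole value has to match, hence `fullmatch`. The candidate values are illustrative.

```python
import re

# Illustrative domain values; the empty string stands for a missing domain.
candidates = ["SuperDomain", "UIM", ""]

any_length = re.compile(r".*")  # zero or more characters
non_empty = re.compile(r".+")   # one or more characters

star_matches = [bool(any_length.fullmatch(c)) for c in candidates]
plus_matches = [bool(non_empty.fullmatch(c)) for c in candidates]

print(star_matches)  # [True, True, True]
print(plus_matches)  # [True, True, False]
```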

Next, we want to match the pipe character. Because it is a special character in regular expressions (see below), we have to escape the pipe with a backslash: “\|”. Then, since we want to match any host name, we use “.+”. Because the same holds true for the process name, we have “SuperDomain\|.+\|.+” so far. We add another (escaped) pipe. Finally, we want to match an agent name containing “TixChange” or “tixchange”. This means “TixChange” or “tixchange” can be preceded and followed by any number of other characters (“.*”).

How do we specify “TixChange” or “tixchange”? In regular expressions, the unescaped pipe character means “or”. So “TixChange” or “tixchange” is written as “(TixChange|tixchange)”. If we do not use the brackets (parentheses), the “or” applies to everything in front of the pipe or everything after the pipe. With the brackets, we define the scope of the pipe. We could also add a third expression “(TixChange|tixchange|TIXChange)” and a fourth and so on. The result is our Source Name Specifier:

SuperDomain\|.+\|.+\|.*(tixchange|TixChange).*
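To try the Source Name Specifier outside the product, a small Python sketch like the following can help. The source names are invented, Python's `re` module only approximates the product's regex engine, and the use of `fullmatch` (the whole Source Name must match) is an assumption.

```python
import re

SOURCE_PATTERN = re.compile(r"SuperDomain\|.+\|.+\|.*(tixchange|TixChange).*")

# Invented source names in the form domain|host|process|agent
sources = [
    "SuperDomain|host01|Tomcat|TixChange-Web",
    "SuperDomain|host02|Tomcat|prod-tixchange-auth",
    "SuperDomain|host03|Tomcat|Billing",
    "UIM|host04|Tomcat|TixChange-Web",
]

# Keep only the sources that the pattern matches completely.
selected = [s for s in sources if SOURCE_PATTERN.fullmatch(s)]

for s in selected:
    print(s)
# SuperDomain|host01|Tomcat|TixChange-Web
# SuperDomain|host02|Tomcat|prod-tixchange-auth
```

The third source is dropped because its agent name does not contain either spelling, the fourth because its domain is not “SuperDomain”.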

We need the pipes to “anchor” the pattern and ensure that TixChange is contained in the agent name, after the third pipe. If we wrote “.+\|.*(tixchange|TixChange).*”, it would not be guaranteed that “tixchange” only occurs after the third pipe. Strictly speaking, even the full pattern does not guarantee that there is no fourth or fifth pipe before or after tixchange, because “.+” and “.*” match any character, including the pipe. But such a string would not be a valid Source Name anyway!
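The effect of the anchoring pipes can be checked with a short sketch. Again, the source names are invented and Python's `re.fullmatch` is only an assumed approximation of how the product applies the pattern.

```python
import re

loose = re.compile(r".+\|.*(tixchange|TixChange).*")
full = re.compile(r"SuperDomain\|.+\|.+\|.*(tixchange|TixChange).*")

# "tixchange" appears in the host name, not in the agent name.
wrong_segment = "SuperDomain|tixchange-host|Tomcat|Billing"
loose_hit = loose.fullmatch(wrong_segment) is not None
full_hit = full.fullmatch(wrong_segment) is not None
print(loose_hit, full_hit)  # True False

# Not a valid Source Name (four pipes), yet the full pattern accepts it,
# because ".+" also matches pipe characters.
extra_pipes = "SuperDomain|a|b|c|tixchange"
extra_hit = full.fullmatch(extra_pipes) is not None
print(extra_hit)  # True
```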

We could also write “(tixchange|TixChange)” differently: it is “tixchange” where the first and fourth characters can each be either upper or lower case. With square brackets, we can define character “groups” or “classes” that are matched:

[tT] matches both upper case T and lower-case t
[abc] matches any of the characters ‘a’, ‘b’ or ‘c’
[a-z] matches any of the 26 lower case alphabet characters
[a-zA-Z] matches any of the 26 lower- or upper-case alphabet characters
[0-9] matches any single digit
[0-9]+ matches one or more digits, i.e. any non-negative integer
\-?[0-9]+ matches any integer
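The classes in this list can be verified one by one. The sketch below pairs each pattern with a sample text and records whether the whole text matches; the helper name `full` and the sample texts are illustrative, and Python's `re` module serves as a stand-in engine.

```python
import re


def full(pattern: str, text: str) -> bool:
    """True if the whole text matches the pattern."""
    return re.fullmatch(pattern, text) is not None


cases = [
    (r"[tT]", "T"),
    (r"[abc]", "d"),
    (r"[a-z]", "q"),
    (r"[a-zA-Z]", "Q"),
    (r"[0-9]", "7"),
    (r"[0-9]+", "318"),
    (r"[0-9]+", "0"),
    (r"[0-9]+", ""),
    (r"\-?[0-9]+", "-42"),
    (r"\-?[0-9]+", "4.2"),
]

results = [full(pattern, text) for pattern, text in cases]

for (pattern, text), ok in zip(cases, results):
    print(f"{pattern!r:14} {text!r:8} {ok}")
```

Only “d”, the empty string, and “4.2” fail to match their pattern.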

Although harder to read, the following expression accomplishes nearly the same results as the expression above:

SuperDomain\|.+\|.+\|.*[tT]ix[cC]hange.*

Written this way, the expression also matches “tixChange” and “Tixchange”.
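A quick comparison of the two notations, sketched in Python with a handful of spellings (Python's `re` module standing in for the product's engine):

```python
import re

alternation = re.compile(r"(tixchange|TixChange)")
classes = re.compile(r"[tT]ix[cC]hange")

spellings = ["tixchange", "TixChange", "tixChange", "Tixchange", "TIXChange"]

by_alternation = [s for s in spellings if alternation.fullmatch(s)]
by_classes = [s for s in spellings if classes.fullmatch(s)]

print(by_alternation)  # ['tixchange', 'TixChange']
print(by_classes)      # ['tixchange', 'TixChange', 'tixChange', 'Tixchange']
```

Neither variant matches “TIXChange”, because only the first and fourth characters are allowed to vary in case.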

Attribute Name Specifier

The Attribute Name Specifier has the same options as the Source Name Specifier and additional options that I will not cover here. I want to display the “Responses Per Interval” metrics for all (web) applications deployed in the tixchange application servers. The DX APM metric path usually describes the metrics and is also divided by an arbitrary number of pipe characters. You can think of every string between the pipes as a folder: you get the hierarchical, directory-like structure that is shown in the metric browser. The metric path is separated from the metric name with a colon “:”.

Note: There must be exactly one colon in a metric path. For example:

Frontends|Apps|TIXCHANGE Web:Responses Per Interval
Frontends|Apps|TIXCHANGE Web|URLs|shop/newOrder.shtml:Responses Per Interval
GC Monitor|Memory Pools|PS Eden Space:Amount of Space Used (bytes)
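The folder analogy can be sketched in a few lines of Python: split at the colon first, then split the path at the pipes. The function name `split_metric` is made up for this illustration.

```python
from typing import List, Tuple


def split_metric(metric: str) -> Tuple[List[str], str]:
    """Split a metric into its path segments ("folders") and the metric name."""
    path, sep, name = metric.partition(":")
    # Enforce the rule of exactly one colon.
    if not sep or ":" in name:
        raise ValueError(f"expected exactly one colon: {metric!r}")
    return path.split("|"), name


folders, name = split_metric(
    "Frontends|Apps|TIXCHANGE Web|URLs|shop/newOrder.shtml:Responses Per Interval"
)
print(folders)  # ['Frontends', 'Apps', 'TIXCHANGE Web', 'URLs', 'shop/newOrder.shtml']
print(name)     # Responses Per Interval
```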

If we want to match the first metric and all other applications, not just “TIXCHANGE Web”, we need to generalize that string for our pattern.

Note: Remember to escape the pipe characters, i.e. “\|”. If there are brackets like “(bytes)” or “(ms)”, you have to escape both brackets, too: “\(ms\)”.
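To see why escaping matters, the following Python sketch uses one of the example metrics as a pattern three ways: unescaped, escaped by hand, and escaped with Python's `re.escape`. The latter is a Python convenience and not something the product offers; it also escapes a few more characters (such as spaces) than strictly necessary, which is harmless.

```python
import re

metric = "GC Monitor|Memory Pools|PS Eden Space:Amount of Space Used (bytes)"

# Unescaped: the pipes act as "or" and the brackets form a group.
unescaped_hit = re.fullmatch(metric, metric) is not None

# Escaped by hand, as described in the note above.
hand_written = r"GC Monitor\|Memory Pools\|PS Eden Space:Amount of Space Used \(bytes\)"
hand_written_hit = re.fullmatch(hand_written, metric) is not None

# Escaped automatically by Python.
generated = re.escape(metric)
generated_hit = re.fullmatch(generated, metric) is not None

print(unescaped_hit, hand_written_hit, generated_hit)  # False True True
```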

“Frontends\|Apps\|.+:Responses Per Interval” would not only match the first metric but also the second metric because “.+” also matches “TIXCHANGE Web|URLs|shop/newOrder.shtml”. If we want to match only metrics on the “Apps” level and not the individual URL metrics, we must make sure that there are no pipe characters in the matched string.

Classes can also be used to exclude a group of characters: “[^a]” means “any character but ‘a’”. So, we will use “[^\|]+” instead of “.+” (we keep the pipe escaped here; most regex flavors do not strictly require that inside a class, but it is harmless):

Frontends\|Apps\|[^\|]+:Responses Per Interval
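Running both variants against the three example metrics from above shows the difference. This is a Python sketch; the product's regex engine may behave slightly differently, and matching the whole metric with `fullmatch` is an assumption.

```python
import re

metrics = [
    "Frontends|Apps|TIXCHANGE Web:Responses Per Interval",
    "Frontends|Apps|TIXCHANGE Web|URLs|shop/newOrder.shtml:Responses Per Interval",
    "GC Monitor|Memory Pools|PS Eden Space:Amount of Space Used (bytes)",
]

greedy = re.compile(r"Frontends\|Apps\|.+:Responses Per Interval")
no_pipes = re.compile(r"Frontends\|Apps\|[^\|]+:Responses Per Interval")

greedy_hits = [m for m in metrics if greedy.fullmatch(m)]
no_pipe_hits = [m for m in metrics if no_pipes.fullmatch(m)]

print(len(greedy_hits))  # 2 (application level and URL level)
print(no_pipe_hits)      # ['Frontends|Apps|TIXCHANGE Web:Responses Per Interval']
```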

Now we have defined regular expressions for both the Source Name Specifier and the Attribute Name Specifier.

In the query panel under “Additional Filters”, you can define the series name. “Source | Metric” will show the full Source and the Attribute Name Specifiers. You can omit the Source or use “Metric Path” to cut off at the “:” character. You can also show only one of the four Source segments (domain, host, process, agent name).

If that is not short enough for you, you can open “Legend” under “Panel” on the right. There you find a REGEX to “extract” a part of the metric path as legend. Only what you put in brackets (a capture group) will be used as name: “Frontends\|Apps\|([^\|]+):Responses Per Interval” will extract only the application name.
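The extraction corresponds to what regex engines call a capture group. A minimal Python sketch of the idea, using the first example metric (how the dashboard applies the pattern internally is not shown here and may differ):

```python
import re

LEGEND_PATTERN = re.compile(r"Frontends\|Apps\|([^\|]+):Responses Per Interval")

metric = "Frontends|Apps|TIXCHANGE Web:Responses Per Interval"

match = LEGEND_PATTERN.fullmatch(metric)
# group(1) is the text matched by the first pair of brackets.
legend = match.group(1) if match else None

print(legend)  # TIXCHANGE Web
```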

Helpful hint: The query is only executed when you leave the text box where you typed or pasted your REGEX. To test your query, click anywhere outside the text box.

Final remarks

There are many more regular expressions and many uses that go beyond the examples covered here. You can test your REGEX in a browser on sites like regex101. You will also find tutorials, videos, and other blogs about REGEX to expand your usage.

Common regular expressions

Pattern	Meaning
.	any (single) character
.*	0-n times any character
.+	at least one character, arbitrary non-zero length string
\|	match (special) pipe character ‘|’
\.	match (special) dot character ‘.’
(TixChange|tixchange)	“TixChange” or “tixchange”
[tT]ix[cC]hange	“TixChange” or “tixchange” or “tixChange” or “Tixchange”
.*\|.*\|.* or better .+\|.+\|.+	match any agent triplet
SuperDomain\|.+\|.+\|.+	match any Source Name Specifier in SuperDomain
lin.+p[0-9]+	A string starting with “lin”, followed by any characters, ending in “p” followed by numbers only, e.g. linxy7zp318
[^\|]+	A non-zero length string that does not contain the pipe character
Frontends\|Apps\|[^\|]+:Average Response Time \(ms\)	Only (aggregated) application response times
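Patterns from the table are easy to test in a few lines of Python before pasting them into a query. The sketch below checks the “lin.+p[0-9]+” pattern; apart from “linxy7zp318”, the sample strings are invented, and matching the whole string with `fullmatch` is an assumption about how the pattern is applied.

```python
import re

HOST_PATTERN = re.compile(r"lin.+p[0-9]+")

samples = ["linxy7zp318", "linp318", "winxy7zp318", "linxy7zp318a"]

verdicts = {s: HOST_PATTERN.fullmatch(s) is not None for s in samples}

for sample, ok in verdicts.items():
    print(sample, ok)
# linxy7zp318 True
# linp318 False      (".+" needs at least one character before the "p")
# winxy7zp318 False  (does not start with "lin")
# linxy7zp318a False (does not end in digits)
```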
