
Regular Expressions

Using regular expressions, you can extract information from a string or parse a string into multiple elements. The regex engine is based on java.util.regex, so patterns follow standard Java regular expression syntax.
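As an illustration of parsing a string into multiple elements, the following minimal sketch uses capturing groups with plain java.util.regex. The class name `RegexExtractSketch`, the method `parseDate`, and the date pattern are hypothetical and chosen only for this example; they are not part of the product.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RegexExtractSketch {
    // Hypothetical pattern: three capturing groups for year, month, and day.
    private static final Pattern DATE = Pattern.compile("(\\d{4})-(\\d{2})-(\\d{2})");

    // Returns [year, month, day] for the first date found, or an empty list if there is none.
    static List<String> parseDate(String input) {
        List<String> parts = new ArrayList<>();
        Matcher m = DATE.matcher(input);
        if (m.find()) {
            // Group 0 is the whole match; groups 1..n are the parenthesized parts.
            for (int i = 1; i <= m.groupCount(); i++) {
                parts.add(m.group(i));
            }
        }
        return parts;
    }

    public static void main(String[] args) {
        System.out.println(parseDate("Order shipped on 2024-03-15.")); // prints [2024, 03, 15]
    }
}
```

Each pair of parentheses in the pattern yields one element of the result, so the same approach extends to other delimited formats.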

Metacharacters defined

| Character | Description | Pattern | Sample matches |
| --- | --- | --- | --- |
| `^` | Start of a string. | `^abc` | abc, abcdefg, abc123 |
| `$` | End of a string. | `abc$` | abc, endsinabc, 123abc |
| `.` | Any single character (except the `\n` newline). | `a.c` | abc, aac, acc, adc, aec |
| `\|` | Alternation: matches either the expression on its left or the one on its right. | `bill\|ted` | ted, bill |
| `{…}` | Explicit quantifier notation. | `ab{2}c` | abbc |
| `[…]` | Explicit set of characters to match. | `a[bB]c` | abc, aBc |
| `(…)` | Logical grouping of part of an expression. | `(abc){2}` | abcabc |
| `*` | Zero or more occurrences of the preceding expression. | `ab*c` | ac, abc, abbc, abbbc |
| `+` | One or more occurrences of the preceding expression. | `ab+c` | abc, abbc, abbbc |
| `?` | Zero or one occurrence of the preceding expression. Placed after another quantifier (for example `*?` or `+?`), it forces minimal matching when an expression might match several strings within a search string. | `ab?c` | ac, abc |
| `\` | Placed before one of the metacharacters described above, it makes that character a literal instead of a special character. Placed before certain other characters, it forms a special matching sequence; see Character escapes. | `a\sc` | a c |
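The metacharacters above can be tried out directly with java.util.regex. The sketch below is illustrative only: the class `MetacharacterSketch` and its helpers `found` and `firstMatch` are hypothetical names. It uses `Matcher.find()`, which searches for the pattern anywhere in the input, because that mirrors how sample matches such as abcdefg for `^abc` are meant to be read (`String.matches()` would require the whole string to match).

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class MetacharacterSketch {
    // True if the pattern is found somewhere in the input (anchors still apply).
    static boolean found(String regex, String input) {
        return Pattern.compile(regex).matcher(input).find();
    }

    // Returns the first substring matched by the pattern, or null if nothing matches.
    static String firstMatch(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        System.out.println(found("^abc", "abcdefg"));          // true: input starts with abc
        System.out.println(found("^abc", "123abc"));           // false: abc is not at the start
        System.out.println(found("abc$", "123abc"));           // true: input ends with abc
        System.out.println(found("(abc){2}", "abcabc"));       // true: the group repeats twice
        System.out.println(firstMatch("bill|ted", "ask ted")); // ted
        // Greedy versus minimal matching: '?' after a quantifier picks the shortest match.
        System.out.println(firstMatch("<.+>", "<a><b>"));      // <a><b>
        System.out.println(firstMatch("<.+?>", "<a><b>"));     // <a>
    }
}
```

The last two lines show the second role of `?`: the same input yields the longest match with `.+` and the shortest with `.+?`.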

Character escapes

| Escaped character | Description |
| --- | --- |
| Ordinary characters | Characters other than `. $ ^ { [ ( \| ) ] } * + ? \` match themselves. |
| `\t` | Matches a tab (`\u0009`). |
| `\r` | Matches a carriage return (`\u000D`). |
| `\f` | Matches a form feed (`\u000C`). |
| `\n` | Matches a new line (`\u000A`). |
| `\040` | Matches an ASCII character given as an octal value (up to three digits). For example, `\040` represents a space. Numbers with no leading zero are backreferences if they have only one digit or if they correspond to a capturing group number. |
| `\x20` | Matches an ASCII character using hexadecimal representation (exactly two digits). |
| `\u0020` | Matches a Unicode character using hexadecimal representation (exactly four digits). |
| `\*` | A backslash followed by a non-alphabetic character that is not recognized as an escaped character matches that character. For example, `\*` is the same as `\x2A`. |
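The escapes above can be checked with a few lines of Java. This is a minimal sketch under the assumption that patterns are written as Java string literals, where each backslash in the regex must itself be doubled; the class `EscapeSketch` and its helper `matches` are hypothetical names. `Pattern.quote` is a standard java.util.regex method that escapes a whole literal at once.

```java
import java.util.regex.Pattern;

class EscapeSketch {
    // True if the entire input matches the regex.
    static boolean matches(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        // Three spellings of a space: octal, two-digit hexadecimal, four-digit Unicode.
        System.out.println(matches("a\\040c", "a c"));   // true
        System.out.println(matches("a\\x20c", "a c"));   // true
        System.out.println(matches("a\\u0020c", "a c")); // true
        // A tab escape in the regex matches a real tab character in the input.
        System.out.println(matches("a\\tb", "a\tb"));    // true
        // An escaped asterisk is a literal, equivalent to its hexadecimal code 2A.
        System.out.println(matches("2\\*3", "2*3"));     // true
        System.out.println(matches("2\\x2A3", "2*3"));   // true
        // Unescaped, '+' is a quantifier, so the literal text does not match itself.
        System.out.println(matches("1+1=2", "1+1=2"));                // false
        System.out.println(matches(Pattern.quote("1+1=2"), "1+1=2")); // true
    }
}
```

When a pattern is typed into a configuration field rather than Java source, the doubled backslashes are typically not needed; that depends on where the expression is entered.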

Character classes

| Character class | Description |
| --- | --- |
| `.` | Matches any character except `\n` (newline). |
| `[aeiou]` | Matches any single character included in the specified set of characters. |
| `[^aeiou]` | Matches any single character not in the specified set of characters. |
| `[0-9a-fA-F]` | A hyphen (-) specifies a contiguous character range. This class matches any single hexadecimal digit. |
| `\w` | Matches any word character. It is equivalent to `[a-zA-Z_0-9]`. |
| `\W` | Matches any non-word character. It is equivalent to `[^a-zA-Z_0-9]`. |
| `\s` | Matches any whitespace character. It is equivalent to `[ \t\n\x0B\f\r]`. |
| `\S` | Matches any non-whitespace character. It is equivalent to `[^ \t\n\x0B\f\r]`. |
| `\d` | Matches any decimal digit. It is equivalent to `[0-9]`. |
| `\D` | Matches any non-digit. It is equivalent to `[^0-9]`. |
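A short sketch of how these character classes are commonly combined, using only java.util.regex and `String.replaceAll`. The class `CharClassSketch` and its methods are hypothetical helpers written for this example, and the inputs are made up.

```java
import java.util.regex.Pattern;

class CharClassSketch {
    // Range class: exactly six hexadecimal digits after a '#'.
    static boolean isHexColor(String s) {
        return Pattern.matches("#[0-9a-fA-F]{6}", s);
    }

    // Word class: letters, digits, and underscore only.
    static boolean isIdentifier(String s) {
        return Pattern.matches("\\w+", s);
    }

    // Whitespace class: collapse each run of whitespace into a single space.
    static String normalizeSpaces(String s) {
        return s.trim().replaceAll("\\s+", " ");
    }

    // Negated digit class: drop everything that is not a decimal digit.
    static String digitsOnly(String s) {
        return s.replaceAll("\\D", "");
    }

    public static void main(String[] args) {
        System.out.println(isHexColor("#1a2B3c"));            // true
        System.out.println(isHexColor("#12345G"));            // false: G is outside the range
        System.out.println(isIdentifier("user_01"));          // true
        System.out.println(isIdentifier("user-01"));          // false: '-' is not a word character
        System.out.println(normalizeSpaces("  a \t b\n c ")); // a b c
        System.out.println(digitsOnly("+1 (555) 010-9999"));  // 15550109999
    }
}
```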
