Regular Expressions
Using regular expressions, you can extract information or parse a string into multiple elements.
The regex engine is based on java.util.regex, the standard Java regular expression package, and follows industry-standard regular expression syntax.
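Here is a minimal sketch of the two uses mentioned above, extracting information and splitting a string into elements. It is written in plain Java and assumes the engine behaves like the standard java.util.regex package. The class `RegexIntro`, its helper methods, and the sample inputs are hypothetical. Inside a Java string literal every regex backslash is doubled, so the pattern `\w+` is written `"\\w+"`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexIntro {
    // Hypothetical helper: extracts every key=value pair, using capturing
    // group 1 for the key and group 2 for the value.
    public static List<String[]> extractPairs(String input) {
        Pattern pair = Pattern.compile("(\\w+)=(\\w+)");
        Matcher m = pair.matcher(input);
        List<String[]> pairs = new ArrayList<>();
        while (m.find()) {
            pairs.add(new String[] {m.group(1), m.group(2)});
        }
        return pairs;
    }

    // Hypothetical helper: splits the input at commas, discarding any
    // whitespace around each comma.
    public static String[] splitFields(String input) {
        return input.split("\\s*,\\s*");
    }

    public static void main(String[] args) {
        for (String[] kv : extractPairs("user=alice id=42 role=admin")) {
            System.out.println(kv[0] + " -> " + kv[1]);
        }
        // Prints red|green|blue
        System.out.println(String.join("|", splitFields("red, green ,blue")));
    }
}
```

`Matcher.find` extracts information by scanning for successive matches, and `String.split` parses the string by treating each match as a separator.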
Metacharacters defined
Character | Description | Pattern | Sample matches |
---|---|---|---|
`^` | Start of a string. | `^abc` | abc, abcdefg, abc123 |
`$` | End of a string. | `abc$` | abc, endsinabc, 123abc |
`.` | Any character except a line terminator such as `\n` (newline). | `a.c` | abc, aac, acc, adc, aec |
`\|` | Alternation: matches either the expression on its left or the one on its right. | `bill\|ted` | ted, bill |
`{…}` | Explicit quantifier: `{n}` matches exactly n occurrences of the preceding element, `{n,}` at least n, and `{n,m}` between n and m. | `ab{2}c` | abbc |
`[…]` | Explicit set of characters to match. | `a[bB]c` | abc, aBc |
`(…)` | Logical grouping of part of an expression. | `(abc){2}` | abcabc |
`*` | Zero or more occurrences of the preceding element. | `ab*c` | ac, abc, abbc, abbbc |
`+` | One or more occurrences of the preceding element. | `ab+c` | abc, abbc, abbbc |
`?` | Zero or one occurrence of the preceding element. When placed after another quantifier (as in `*?` or `+?`), it makes that quantifier lazy, so it matches as few characters as possible. | `ab?c` | ac, abc |
`\` | Before one of the metacharacters above, it makes that character a literal instead of a special character. Before certain ordinary characters, it forms an escape sequence or a predefined character class (see Character escapes and Character classes). | `a\sc` | a c |
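
The rows above can be checked with a small Java program. This is a sketch that assumes the engine behaves like java.util.regex, and the class `MetacharacterDemo` and its helpers are hypothetical. The samples for `^` and `$` are strings that *contain* a match, which corresponds to `Matcher.find`. The other samples match the pattern in full, which corresponds to `Matcher.matches`. The last two lines show the lazy use of `?`.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class MetacharacterDemo {
    // True if the regex matches the whole input (Matcher.matches).
    public static boolean fullMatch(String regex, String input) {
        return Pattern.compile(regex).matcher(input).matches();
    }

    // True if the regex matches anywhere in the input (Matcher.find).
    public static boolean found(String regex, String input) {
        return Pattern.compile(regex).matcher(input).find();
    }

    // The first substring the regex matches, or null if there is none.
    public static String firstMatch(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        System.out.println(found("^abc", "abc123"));          // true
        System.out.println(found("^abc", "123abc"));          // false: abc is not at the start
        System.out.println(found("abc$", "123abc"));          // true
        System.out.println(fullMatch("a.c", "adc"));          // true
        System.out.println(fullMatch("bill|ted", "ted"));     // true
        System.out.println(fullMatch("ab{2}c", "abbc"));      // true
        System.out.println(fullMatch("a[bB]c", "aBc"));       // true
        System.out.println(fullMatch("(abc){2}", "abcabc"));  // true
        System.out.println(fullMatch("ab*c", "ac"));          // true
        System.out.println(fullMatch("ab+c", "ac"));          // false: + needs at least one b
        System.out.println(fullMatch("ab?c", "abbc"));        // false: ? allows at most one b
        System.out.println(fullMatch("a\\.c", "abc"));        // false: \. is a literal dot
        System.out.println(fullMatch("a\\sc", "a c"));        // true
        // Greedy .* takes as much as it can; lazy .*? takes as little as it can.
        System.out.println(firstMatch("<.*>", "<b>bold</b>"));   // <b>bold</b>
        System.out.println(firstMatch("<.*?>", "<b>bold</b>"));  // <b>
    }
}
```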
Character escapes
Escaped character | Description |
---|---|
Ordinary characters | Characters other than `. $ ^ { [ ( \| ) * + ? \` match themselves. |
`\t` | Matches a tab. |
`\r` | Matches a carriage return. |
`\f` | Matches a form feed. |
`\n` | Matches a newline. |
`\040` | Matches a character by its octal code: `\0` followed by one to three octal digits. For example, `\040` represents a space. Numbers without a leading zero are backreferences: a single digit always refers to a capturing group, and a longer number does if a capturing group with that number exists. |
`\x20` | Matches a character by its hexadecimal code (exactly two digits). For example, `\x20` represents a space. |
`\u0020` | Matches a Unicode character by its hexadecimal code (exactly four digits). For example, `\u0020` represents a space. |
`\*` | A backslash before any other non-alphanumeric character matches that character literally. For example, `\*` matches an asterisk. In java.util.regex, a backslash before a letter that is not a defined escape is an error. |
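
Here is a short Java sketch of the escapes above, assuming the engine behaves like java.util.regex. The class `EscapeDemo` is hypothetical. Each regex backslash is doubled inside a Java string literal, and `Pattern.quote` (part of the standard `java.util.regex.Pattern` API) is shown as a way to make a whole string literal without escaping each metacharacter.

```java
import java.util.regex.Pattern;

public class EscapeDemo {
    // True if the regex matches the whole input.
    public static boolean fullMatch(String regex, String input) {
        return Pattern.compile(regex).matcher(input).matches();
    }

    public static void main(String[] args) {
        // Each regex backslash is doubled in Java source: the regex \t is written "\\t".
        System.out.println(fullMatch("a\\tb", "a\tb"));      // true: tab
        System.out.println(fullMatch("a\\040b", "a b"));     // true: octal 040 is a space
        System.out.println(fullMatch("a\\x20b", "a b"));     // true: hex 20 is a space
        System.out.println(fullMatch("a\\u0020b", "a b"));   // true: U+0020 is a space
        System.out.println(fullMatch("2\\*3", "2*3"));       // true: \* is a literal asterisk
        System.out.println(fullMatch("2*3", "2*3"));         // false: * is a quantifier here
        // Pattern.quote makes every character of its argument literal.
        System.out.println(fullMatch(Pattern.quote("1+1=2?"), "1+1=2?")); // true
    }
}
```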
Character classes
Character class | Description |
---|---|
`.` | Matches any character except a line terminator such as `\n`. |
`[aeiou]` | Matches any single character included in the specified set of characters. |
`[^aeiou]` | Matches any single character not in the specified set of characters. |
`[0-9a-fA-F]` | A hyphen (`-`) specifies a contiguous range of characters. This example matches any single hexadecimal digit. |
`\w` | Matches any word character. It is equivalent to `[a-zA-Z_0-9]`. |
`\W` | Matches any non-word character. It is equivalent to `[^a-zA-Z_0-9]`. |
`\s` | Matches any whitespace character. It is equivalent to `[ \t\n\x0B\f\r]`. |
`\S` | Matches any non-whitespace character. It is equivalent to `[^ \t\n\x0B\f\r]`. |
`\d` | Matches any decimal digit. |
`\D` | Matches any non-digit. |
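
The following Java sketch applies several of these classes with `Matcher.find` and assumes the engine behaves like java.util.regex. The class `CharacterClassDemo`, its `findAll` helper, and the sample strings are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class CharacterClassDemo {
    // Collects every non-overlapping substring the regex matches, left to right.
    public static List<String> findAll(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        List<String> matches = new ArrayList<>();
        while (m.find()) {
            matches.add(m.group());
        }
        return matches;
    }

    public static void main(String[] args) {
        System.out.println(findAll("\\d+", "Order 17 shipped on 2024-05-03")); // [17, 2024, 05, 03]
        System.out.println(findAll("\\D+", "a1bc22"));                        // [a, bc]
        System.out.println(findAll("\\w+", "snake_case, x2!"));               // [snake_case, x2]
        System.out.println(findAll("\\S+", "  two   words "));                // [two, words]
        System.out.println(findAll("[aeiou]", "regex"));                      // [e, e]
        System.out.println(findAll("[^aeiou\\s]", "to be"));                  // [t, b]
        System.out.println(findAll("0x[0-9a-fA-F]+", "0x1F, 0xBEEF, 0xZZ"));  // [0x1F, 0xBEEF]
    }
}
```

In java.util.regex these predefined classes are ASCII-only by default, so `\w` does not match accented letters unless the `Pattern.UNICODE_CHARACTER_CLASS` flag is set. This page does not say whether that flag is available here.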