Regex is powerful and available in virtually every language. Eventually I’ll write a whole post about why every engineer should know some regex, but this post is all about common regex patterns, flags, and so on, in easy-to-read table form.
First, here are some patterns similar to ones I’ve used in recent memory:
Pattern | Meaning | Example Match |
---|---|---|
\d+ | match 1 or more digits | abc123def |
\w+$ | match 1 or more word chars before end of line | abc-123def |
ab|de | match ab or de | abc123def |
(?<=:)\w{4} | match exactly 4 word characters preceded by a “:” | abc:123def |
(?<=\w+): | match a : preceded by 1 or more word characters | abc: 123 |
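To see the first four patterns in action, here is a minimal sketch using Python's built-in `re` module (Python is my choice for illustration, and the variable names are made up). The last pattern needs a variable-length lookbehind, which Python's `re` rejects at compile time, so the sketch only checks that it fails.

```python
import re

text = "abc123def"

# \d+ : one or more digits
digits = re.search(r"\d+", text).group()

# \w+$ : word characters running up to the end of the line
tail = re.search(r"\w+$", "abc-123def").group()

# ab|de : either alternative, wherever it occurs
alternatives = re.findall(r"ab|de", text)

# (?<=:)\w{4} : exactly four word characters right after a colon
after_colon = re.search(r"(?<=:)\w{4}", "abc:123def").group()

# (?<=\w+): has a variable-length lookbehind, which Python's re does not accept
try:
    re.compile(r"(?<=\w+):")
    variable_lookbehind_ok = True
except re.error:
    variable_lookbehind_ok = False

print(digits, tail, alternatives, after_colon, variable_lookbehind_ok)
```

Engines that do allow variable-length lookbehinds will accept the last pattern as written.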
Now to break down the common Regex elements.
Regex Character Classes
Character | Meaning |
---|---|
\w | any word character (letter, digit, or underscore); does not match accented characters (see note 1) |
\W | any non-word character |
\s | whitespace (includes tabs, line breaks) |
\S | not whitespace |
\d | any digit (same as doing [0-9] ) |
\D | not digits |
. | match any character (except line breaks) |
[ab] | match any character inside the [] (same as a|b ) |
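A quick way to get a feel for these classes is to run each one over the same string. This is a small sketch in Python's `re` module; the sample string is invented for the example.

```python
import re

sample = "Order_7:\t42 apples\n"

words = re.findall(r"\w+", sample)       # runs of letters, digits, underscores
non_word = re.findall(r"\W", sample)     # everything \w does not cover
digits = re.findall(r"\d", sample)       # individual digits
spaces = re.findall(r"\s", sample)       # tab, space, line break
dot = re.findall(r".", "a\nb")           # . skips the line break by default
either = re.findall(r"[ab]", "cabbage")  # any single a or b

print(words, non_word, digits, spaces, dot, either)
```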
Regex Anchors
Anchor | Meaning |
---|---|
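^ | the start of a line (in many engines, only the start of the whole string unless the multiline flag is set) |
$ | the end of a line (in many engines, only the end of the whole string unless the multiline flag is set) |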
\b | a word boundary |
\B | not a word boundary |
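Anchors match positions rather than characters, which is easier to see in code. The sketch below uses Python's `re` module, where `^` and `$` only work line by line when `re.MULTILINE` is passed; other engines may default differently, so treat the flag handling as Python-specific.

```python
import re

text = "cat concat\ncat"

# Without flags, ^ and $ anchor to the whole string
start_default = re.findall(r"^cat", text)
end_default = re.findall(r"cat$", text)

# With re.MULTILINE they anchor to every line
start_multiline = re.findall(r"^cat", text, re.MULTILINE)
end_multiline = re.findall(r"cat$", text, re.MULTILINE)

# \b needs a word boundary, \B needs the absence of one
whole_words = [m.start() for m in re.finditer(r"\bcat\b", text)]
inside_word = [m.start() for m in re.finditer(r"\Bcat\b", text)]

print(start_default, start_multiline, end_default, end_multiline)
print(whole_words, inside_word)
```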
Regex Quantifiers
Quantifier | Meaning |
---|---|
+ | match 1 or more of the preceding token |
* | match 0 or more of the preceding token |
{1,2} | match 1 or 2 of the preceding token (no space after the comma) |
{2} | match exactly 2 of the preceding token |
{2,} | match 2 or more of the preceding token |
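Here is a short sketch of the quantifiers in Python's `re` module, using runs of the letter a between word boundaries so each count is easy to check. It also shows why the space matters: in Python, `{1, 2}` is not read as a quantifier and matches that literal text instead (other engines may behave differently).

```python
import re

text = "a aa aaa aaaa"

one_or_more = re.findall(r"\ba+\b", text)
one_or_two = re.findall(r"\ba{1,2}\b", text)
exactly_two = re.findall(r"\ba{2}\b", text)
two_or_more = re.findall(r"\ba{2,}\b", text)

# * allows zero repetitions, so a bare "b" still matches
zero_or_more = re.findall(r"ba*", "b ba baa")

# With a space inside the braces, Python treats the braces as literal characters
spaced = re.fullmatch(r"a{1, 2}", "a{1, 2}") is not None

print(one_or_more, one_or_two, exactly_two, two_or_more, zero_or_more, spaced)
```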
Grouping, Lookaheads, and Lookbehinds (see note 2)
Syntax | Meaning | Example Match |
---|---|---|
(abc) | match the full group abc | abc123 |
a|b | match a or b (the pipe is the regex “or”) | attribute |
[abc] | match any of a, b, or c | attribute |
[^abc] | match any character except a, b, or c | carbs |
[a-z] | match any character in the range a-z | aBc123 |
(?=abc) | abc must come after your match: hi(?=abc) | hiabc |
(?!abc) | abc cannot follow your match: hi(?!abc) | hi_there |
(?<=abc) | abc must precede your match: (?<=abc)hi | abchi |
(?<!abc) | abc cannot precede your match: (?<!abc)hi | well_hi |
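The lookarounds are the trickiest rows in this table, because they check the surrounding text without including it in the match. The sketch below runs each row's example through Python's `re` module; the variable names are only for illustration.

```python
import re

group = re.search(r"(abc)", "abc123").group(1)
alternation = re.findall(r"a|b", "attribute")
char_set = re.findall(r"[abc]", "attribute")
negated_set = re.findall(r"[^abc]", "carbs")
char_range = re.findall(r"[a-z]", "aBc123")

# Lookahead: "abc" must follow, but is not part of the match
lookahead = re.search(r"hi(?=abc)", "hiabc").group()
# Negative lookahead: fails when "abc" follows
neg_lookahead_hit = re.search(r"hi(?!abc)", "hi_there") is not None
neg_lookahead_miss = re.search(r"hi(?!abc)", "hiabc") is None

# Lookbehind: "abc" must precede, but is not part of the match
lookbehind = re.search(r"(?<=abc)hi", "abchi").span()
# Negative lookbehind: fails when "abc" precedes
neg_lookbehind_hit = re.search(r"(?<!abc)hi", "well_hi") is not None
neg_lookbehind_miss = re.search(r"(?<!abc)hi", "abchi") is None

print(group, alternation, char_set, negated_set, char_range)
print(lookahead, lookbehind)
```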
There is a lot more to regex, but combinations of the above classes, anchors, quantifiers, and groupings will get you very far. I recommend Regexr for a quick regex test bed and reference page. I’ve used it countless times (including while writing this post).
Helpful Links
- An expression to match all characters, including accented ones
Notes
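1. In many engines (JavaScript, for example), \w only catches ASCII word characters. Accented characters, like é, fall outside ASCII, so they are treated as non-word characters and create word boundaries (\b) next to them, so beware! Some engines, such as Python 3's re module on text patterns, make \w Unicode-aware by default. For a starting point on how to handle unicode characters, try the helpful link above.
2. This is one area where you might run into some language-specific limitations. Python’s built-in re module, for instance, only supports fixed-length lookbehinds, and Go’s standard regexp package doesn’t support lookaheads or lookbehinds at all.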
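The accented-character pitfall is easy to reproduce. Python 3's `re` module is Unicode-aware by default for text patterns, so this sketch passes `re.ASCII` to imitate an ASCII-only engine; the sample word is just an illustration.

```python
import re

word = "caf\u00e9"  # "café" with a precomposed é

# Default (Unicode-aware) behaviour: é counts as a word character
unicode_words = re.findall(r"\w+", word)
unicode_boundary = re.findall(r"\bcaf\b", word)

# ASCII-only behaviour: é is a non-word character, so \w+ stops before it
# and a word boundary appears between "f" and "é"
ascii_words = re.findall(r"\w+", word, flags=re.ASCII)
ascii_boundary = re.findall(r"\bcaf\b", word, flags=re.ASCII)

print(unicode_words, unicode_boundary, ascii_words, ascii_boundary)
```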