Regex Cheat Sheet
0. Introduction
Regex (short for “regular expression”) is a notation for describing text mode. It is used as a professional language to match some particular modes.
1. Syntax
1
| /^hello/i --match--> "Hello regex"
|
2. Particles
These little parts will show how a regex composed of and their function.
| sign |
meaning |
| . |
any single character |
| | |
“or”, means select |
| \d |
any number |
| \D |
any non-number |
| \s |
any space character |
| \S |
any non-space character |
| \w |
any word |
| \W |
any non-word |
| \X |
unicode sequence |
| \p{Prop} |
unicode property |
| \P{Prop} |
unicode property |
2.2 Special signs
| sign |
meaning |
| \ |
used to represent escape sequences |
| \f |
formfeed |
| \n |
newline |
| \r |
carriage return |
| \t |
horizontal tab |
| \v |
vertical tab |
| \0 |
space |
| \num |
octal number |
| \xnum |
hexadecimal number |
| \x{num} |
hexadecimal number |
| \unum |
unicode escape sequence |
| \Unum |
unicode escape sequence |
| \cchar |
control character |
2.3 Character sets
| character set |
meaning |
| […] |
single character of character set |
| [^…] |
character set except these characters |
| [a-z] |
characters between ‘a’ and ‘z’ |
| [^a-z] |
characters except ‘a’ to ‘z’ |
| [a-zA-Z] |
characters between ‘a’ and ‘z’ and between ‘A’ and ‘Z’ |
2.4 POSIX character sets
| character set |
meaning |
| [[:alnum:]] |
any alphabet and number |
| [[:alpha:]] |
any alphabet |
| [[:ascii:]] |
ascii codes (0-127) |
| [[:blank:]] |
space or tab |
| [[:cntrl:]] |
control characters |
| [[:digit:]] |
digital numbers |
| [[:graph:]] |
visible characters except space |
| [[:lower:]] |
lower characters |
| [[:print:]] |
printable characters |
| [[:punct:]] |
punctions |
| [[:space:]] |
space characters |
| [[:upper:]] |
upper characters |
| [[:word:]] |
words, characters, numbers and underscore |
| [[:xdigit:]] |
hexadecimal numbers |
| [[:<:]] |
the beginning of word |
| [[:>:]] |
the end of word |
2.5 Anchors
| anchor |
meaning |
| ^ |
the beginning of strings/lines |
| \A |
the beginning of strings/lines, ignore m signs |
| $ |
the end of strings/lines |
| \Z |
the end of strings/lines, ignore m signs |
| \z |
match the end of strings only |
| \b |
match the boundary of words |
| \B |
match not the boundary of words |
| \< |
match the beginning of words |
| \> |
match the end of words |
| \G |
match the begning location |
2.6 Quantitives
| quantitive |
meaning |
| ? |
repeat 0 or 1 time |
| ?? |
repeat 0 or 1 time (not greedy match) |
| * |
repeat 0 or more times |
| *? |
repeat 0 or more times (not greedy match) |
| + |
repeat 1 or more times |
| +? |
repeat 1 or more times (not greedy match) |
| {n} |
repeat n times (n isn’t negative integer) |
| {n,} |
repeat at least n times (n isn’t negative integer) |
| {n, m} |
repeat n times at least m times at most (n, m are both non-negative integers) |
2.7 Groups
| group |
meaning |
| (…) |
capturing group |
| (?:…) |
non-capturing group |
| (?P<name>…) |
capturing group with name |
| (?<name>…) |
capturing group with name |
| (?’name’…) |
capturing group with name |
| (?>…) |
non-capturing group |
| (?|…) |
reset child-mode number |
| (?#…) |
comment the group |
| (?=…) |
posible lookahead |
| (?!…) |
negative lookahead |
| (?<=…) |
positive reverse lookahead |
| (?<!…) |
negative reverse lookahead |
2.8 Replacements
| replacement |
meaning |
| $1 |
capture contents in group 1 |
| $foo |
capture contents in group foo |
2.9 Mode prefix signs
| mode prefix sign |
meaning |
| g |
global mode |
| i |
ignore upper and lower |
| m |
multiple lines |
| s |
single line |
| x |
allow spaces and comments |
| u |
unicode characters |
| U |
non-greedy match |
2.10 Regex engines
- DFA
- POSIX DFA
- Traditional DFA
- DFA/NFA fusion
3. Websites
You can test your regex and understand it in visual way in these websites.
- regex101.com
- debuggex.com
- regexr.com