Regex Cheat Sheet

Getting Started

Introduction

This is a quick reference for getting started with regular expressions (regex).

Regex in Python (chatsheet.org)
Regex in JavaScript (chatsheet.org)
Regex in PHP (chatsheet.org)
Regex in Java (chatsheet.org)
Regex in MySQL (chatsheet.org)
Regex in Vim (chatsheet.org)
Regex in Emacs (chatsheet.org)
Online regex tester (regex101.com)

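Character Classes

Pattern	Description
`[abc]`	A single character: a, b, or c
`[^abc]`	Any character except a, b, or c
`[a-z]`	A character in the range a-z
`[^a-z]`	A character not in the range a-z
`[0-9]`	A digit in the range 0-9
`[a-zA-Z]`	A character in the range a-z or A-Z
`[a-zA-Z0-9]`	A character in the range a-z, A-Z, or 0-9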

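Quantifiers

Pattern	Description
`a?`	Zero or one of a
`a*`	Zero or more of a
`a+`	One or more of a
`[0-9]+`	One or more of 0-9
`a{3}`	Exactly 3 of a
`a{3,}`	3 or more of a
`a{3,6}`	Between 3 and 6 of a
`a*`	Greedy quantifier
`a*?`	Lazy quantifier
`a*+`	Possessive quantifier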

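Meta Sequences

Pattern	Description
`.`	Any single character
`\s`	Any whitespace character
`\S`	Any non-whitespace character
`\d`	Any digit, same as [0-9]
`\D`	Any non-digit, same as [^0-9]
`\w`	Any word character
`\W`	Any non-word character
`\X`	Any Unicode sequence, line breaks included
`\C`	Match one data unit
`\R`	Unicode newlines
`\v`	Vertical whitespace character
`\V`	Negation of \v: anything except newlines and vertical tabs
`\h`	Horizontal whitespace character
`\H`	Negation of \h
`\K`	Reset match
`\n`	Match nth subpattern
`\pX`	Unicode property X
`\p{...}`	Unicode property or script category
`\PX`	Negation of \pX
`\P{...}`	Negation of \p
`\Q...\E`	Quote; treat as literals
`\k<name>`	Match subpattern `name`
`\k'name'`	Match subpattern `name`
`\k{name}`	Match subpattern `name`
`\gn`	Match nth subpattern
`\g{n}`	Match nth subpattern
`\g<n>`	Recurse nth capture group
`\g'n'`	Recurse nth capture group
`\g{-n}`	Match nth relative previous subpattern
`\g<+n>`	Recurse nth relative upcoming subpattern
`\g'+n'`	Recurse nth relative upcoming subpattern
`\g'letter'`	Recurse named capture group `letter`
`\g{letter}`	Match previously named capture group `letter`
`\g<letter>`	Recurse named capture group `letter`
`\xYY`	Hex character YY
`\x{YYYY}`	Hex character YYYY
`\ddd`	Octal character ddd
`\cY`	Control character Y
`[\b]`	Backspace character
`\`	Makes any character literal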

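Anchors

Pattern	Description
`\G`	Start of match
`^`	Start of string
`$`	End of string
`\A`	Start of string
`\Z`	End of string
`\z`	Absolute end of string
`\b`	A word boundary
`\B`	Non-word boundary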

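Substitution

Pattern	Description
`\0`	Complete match contents
`\1`	Contents of capture group 1
`$1`	Contents of capture group 1
`${foo}`	Contents of capture group `foo`
`\x20`	Hexadecimal replacement value
`\x{06fa}`	Hexadecimal replacement value
`\t`	Tab
`\r`	Carriage return
`\n`	Newline
`\f`	Form feed
`\U`	Uppercase transformation
`\L`	Lowercase transformation
`\E`	Terminate any transformation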

Group Constructs

Pattern	Description
`(...)`	Capture everything enclosed
`(a|b)`	Match either a or b
`(?:...)`	Match everything enclosed (non-capturing)
`(?>...)`	Atomic group (non-capturing)
`(?|...)`	Duplicate subpattern group numbers
`(?#...)`	Comment
`(?'name'...)`	Named capturing group
`(?<name>...)`	Named capturing group
`(?P<name>...)`	Named capturing group
`(?imsxXU)`	Inline modifiers
`(?(DEFINE)...)`	Pre-define patterns before using them
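
As a small illustration of capturing versus non-capturing groups, the sketch below uses Python's `re` module, which accepts the `(?P<name>...)` spelling of named groups. The date format, the sample strings, and the group names `year`, `month`, and `day` are assumptions made up for this example.

```python
import re

# Named groups: each part of the match can be fetched by name.
date_re = re.compile(r"(?P<year>\d{4})-(?P<month>\d{2})-(?P<day>\d{2})")

m = date_re.search("Released on 2023-07-14.")
parts = m.groupdict()  # maps each group name to the text it captured

# (?:...) groups the alternation without creating a numbered group,
# so findall returns only the single capturing group (the surname).
surnames = re.findall(r"(?:Mr|Ms)\. (\w+)", "Mr. Smith and Ms. Jones")

print(parts)
print(surnames)
```

Other engines may prefer `(?<name>...)` or `(?'name'...)`; the idea is the same.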

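Assertions

-	-
`(?(1)yes|no)`	Conditional statement
`(?(R)yes|no)`	Conditional statement
`(?(R#)yes|no)`	Recursive conditional statement
`(?(R&name)yes|no)`	Conditional statement
`(?(?=...)yes|no)`	Lookahead conditional
`(?(?<=...)yes|no)`	Lookbehind conditional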

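Lookarounds

-	-
`(?=...)`	Positive lookahead
`(?!...)`	Negative lookahead
`(?<=...)`	Positive lookbehind
`(?<!...)`	Negative lookbehind

Lookaround lets you match a group before (lookbehind) or after (lookahead) the main pattern without including it in the result.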
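
A minimal sketch of the point above, using Python's `re` module for illustration. The sample string and variable names are assumptions; what matters is that the text matched by the lookaround never appears in the result.

```python
import re

text = "price: $42, cost: 17 USD, fee: $8"

# Positive lookbehind: digits preceded by "$"; the "$" is not in the match.
dollars = re.findall(r"(?<=\$)\d+", text)

# Positive lookahead: digits followed by " USD"; " USD" is not in the match.
usd = re.findall(r"\d+(?= USD)", text)

# Negative lookbehind: whole numbers that do not directly follow "$".
not_dollars = re.findall(r"\b(?<!\$)\d+\b", text)

print(dollars, usd, not_dollars)
```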

Flags/Modifiers

Pattern	Description
`g`	Global
`m`	Multiline
`i`	Case insensitive
`x`	Ignore whitespace
`s`	Single line
`u`	Unicode
`X`	eXtended
`U`	Ungreedy
`A`	Anchor
`J`	Duplicate group names

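Recursion

-	-
`(?R)`	Recurse entire pattern
`(?1)`	Recurse first subpattern
`(?+1)`	Recurse first relative subpattern
`(?&name)`	Recurse subpattern `name`
`(?P=name)`	Match subpattern `name`
`(?P>name)`	Recurse subpattern `name`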

POSIX Character Classes

Character Class	Same as	Meaning
`[[:alnum:]]`	`[0-9A-Za-z]`	Letters and digits
`[[:alpha:]]`	`[A-Za-z]`	Letters
`[[:ascii:]]`	`[\x00-\x7F]`	ASCII codes 0-127
`[[:blank:]]`	`[\t ]`	Space or tab only
`[[:cntrl:]]`	`[\x00-\x1F\x7F]`	Control characters
`[[:digit:]]`	`[0-9]`	Decimal digits
`[[:graph:]]`	`[[:alnum:][:punct:]]`	Visible characters (not space)
`[[:lower:]]`	`[a-z]`	Lowercase letters
`[[:print:]]`	`[ -~] == [ [:graph:]]`	Visible characters and space
`[[:punct:]]`	!"#$%&'()*+,-./:;<=>?@[\]^_`{|}~	Visible punctuation characters
`[[:space:]]`	`[ \t\n\v\f\r]`	Whitespace
`[[:upper:]]`	`[A-Z]`	Uppercase letters
`[[:word:]]`	`[0-9A-Za-z_]`	Word characters
`[[:xdigit:]]`	`[0-9A-Fa-f]`	Hexadecimal digits
`[[:<:]]`	`\b(?=\w)`	Start of word
`[[:>:]]`	`\b(?<=\w)`	End of word

Control Verbs

-	-
`(*ACCEPT)`	Control verb
`(*FAIL)`	Control verb
`(*MARK:NAME)`	Control verb
`(*COMMIT)`	Control verb
`(*PRUNE)`	Control verb
`(*SKIP)`	Control verb
`(*THEN)`	Control verb
`(*UTF)`	Pattern modifier
`(*UTF8)`	Pattern modifier
`(*UTF16)`	Pattern modifier
`(*UTF32)`	Pattern modifier
`(*UCP)`	Pattern modifier
`(*CR)`	Line break modifier
`(*LF)`	Line break modifier
`(*CRLF)`	Line break modifier
`(*ANYCRLF)`	Line break modifier
`(*ANY)`	Line break modifier
`\R`	Line break modifier
`(*BSR_ANYCRLF)`	Line break modifier
`(*BSR_UNICODE)`	Line break modifier
`(*LIMIT_MATCH=x)`	Regex engine modifier
`(*LIMIT_RECURSION=d)`	Regex engine modifier
`(*NO_AUTO_POSSESS)`	Regex engine modifier
`(*NO_START_OPT)`	Regex engine modifier

Regex Examples

Characters

Pattern	Matches
`ring`	Match ring, springboard, etc.
`.`	Match a, 9, +, etc.
`h.o`	Match hoo, h2o, h/o, etc.
`ring\?`	Match ring?
`\(quiet\)`	Match (quiet)
`c:\\windows`	Match c:\windows

Use \ to search for these special characters:
[ \ ^ $ . | ? * + ( ) { }

Alternatives

Pattern	Matches
`cat|dog`	Match cat or dog
`id|identity`	Match id (in identity, only the leading id is matched)
`identity|id`	Match id or identity

Order longer to shorter when alternatives overlap
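
The ordering rule can be checked with a short sketch. It uses Python's `re` module, which tries alternatives left to right and keeps the first one that succeeds; the variable names are illustrative only.

```python
import re

text = "identity"

# Shorter alternative first: "id" succeeds immediately, so the longer
# alternative is never tried.
short_first = re.search(r"id|identity", text).group()

# Longer alternative first: the whole word is matched.
long_first = re.search(r"identity|id", text).group()

print(short_first, long_first)
```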

Character classes

Pattern	Matches
`[aeiou]`	Match any vowel
`[^aeiou]`	Match a NON vowel
`r[iau]ng`	Match ring, wrangle, sprung, etc.
`gr[ae]y`	Match gray or grey
`[a-zA-Z0-9]`	Match any letter or digit
`[\u3a00-\ufa99]`	Match any Unicode Hàn (中文)

In [ ] always escape \ and ], and sometimes ^ and - (a dot needs no escaping inside brackets).

Shorthand classes

Pattern	Meaning
`\w`	"Word" character (letter, digit, or underscore)
`\d`	Digit
`\s`	Whitespace (space, tab, vtab, newline)
`\W, \D, or \S`	Not word, digit, or whitespace
`[\D\S]`	Not digit OR not whitespace, so every character matches
`[^\d\s]`	Disallow digit and whitespace
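
The difference between the last two rows is easy to get wrong, so here is a small sketch in Python's `re` module. The sample string is an assumption chosen to contain a letter, a digit, a space, and an underscore.

```python
import re

sample = "a1 _"

# [\D\S] is a union: "non-digit OR non-whitespace". Every character is at
# least one of the two, so the class matches everything.
union = re.findall(r"[\D\S]", sample)

# [^\d\s] is the complement of "digit OR whitespace".
neither = re.findall(r"[^\d\s]", sample)

print(union)
print(neither)
```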

Occurrences

Pattern	Matches
`colou?r`	Match color or colour
`[BW]ill[ieamy's]*`	Match Bill, Willy, William's etc.
`[a-zA-Z]+`	Match 1 or more letters
`\d{3}-\d{2}-\d{4}`	Match a SSN
`[a-z]\w{1,7}`	Match a UW NetID

Greedy versus lazy

Pattern	Meaning
`* + {n,}` greedy	Match as much as possible
`<.+>`	Finds 1 big match in <b>bold</b>
`*? +? {n,}?` lazy	Match as little as possible
`<.+?>`	Finds 2 matches in <b>bold</b>
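
The two `<b>bold</b>` rows can be reproduced with a short sketch in Python's `re` module; the variable names are illustrative.

```python
import re

html = "<b>bold</b>"

# Greedy: .+ runs to the end of the string, then backs up to the last ">".
greedy = re.findall(r"<.+>", html)

# Lazy: .+? stops at the first ">" it can reach, so each tag is separate.
lazy = re.findall(r"<.+?>", html)

print(greedy)
print(lazy)
```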

Scope

Pattern	Meaning
`\b`	"Word" edge (next to non "word" character)
`\bring`	Word starts with "ring", ex ringtone
`ring\b`	Word ends with "ring", ex spring
`\b9\b`	Match single digit 9, not 19, 91, 99, etc.
`\b[a-zA-Z]{6}\b`	Match 6-letter words
`\B`	Not word edge
`\Bring\B`	Match springs and wringer
`^\d*$`	Entire string must be digits
`^[a-zA-Z]{4,20}$`	String must have 4-20 letters
`^[A-Z]`	String must begin with capital letter
`[\.!?"')]$`	String must end with terminal punctuation
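
A rough sketch of how `\b`, `\B`, `^`, and `$` scope a match, using Python's `re` module. The word list and variable names are assumptions for illustration; `\w*` is added around the table's patterns so that whole words are returned.

```python
import re

words = "ring spring ringtone wringer springs"

# \b before "ring": words that start with "ring".
starts = re.findall(r"\bring\w*", words)

# \b after "ring": words that end with "ring".
ends = re.findall(r"\w*ring\b", words)

# \B on both sides: "ring" strictly inside a word.
inside = re.findall(r"\w*\Bring\B\w*", words)

# ^ and $ force the pattern to cover the entire string.
all_digits = bool(re.search(r"^\d*$", "2024"))
not_all_digits = bool(re.search(r"^\d*$", "20a4"))

print(starts, ends, inside, all_digits, not_all_digits)
```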

Modifiers

Pattern	Meaning
`(?i)[a-z]*(?-i)`	Ignore case ON / OFF
`(?s).*(?-s)`	Match multiple lines (causes . to match newline)
`(?m)^.*;$(?-m)`	^ & $ match lines not whole string
`(?x)`	Free-spacing mode ON; whitespace and `#` end-of-line comments are ignored
`(?-x)`	Free-spacing mode OFF
/regex/`ismx`	Modify mode for entire string

Groups

Pattern	Meaning
`(in|out)put`	Match input or output
`\d{5}(-\d{4})?`	US zip code ("+ 4" optional)

If the match fails after the group, the parser goes back and tries each alternative.
This can lead to catastrophic backtracking.

Back references

Pattern	Matches
`(to) (be) or not \1 \2`	Match to be or not to be
`([^\s])\1{2}`	Match a non-space character, then the same character twice more: aaa, ...
`\b(\w+)\s+\1\b`	Match doubled words
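
As an illustration of the doubled-words row, the sketch below finds and then collapses repeated words with Python's `re` module. The sample sentence is an assumption; in the replacement string, `\1` stands for whatever group 1 captured.

```python
import re

text = "This is is a test of the the parser"
doubled_re = re.compile(r"\b(\w+)\s+\1\b")

# findall returns the content of group 1 for each doubled word.
doubled = doubled_re.findall(text)

# Replace each doubled word with a single copy.
fixed = doubled_re.sub(r"\1", text)

print(doubled)
print(fixed)
```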

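Non-capturing groups

Pattern	Meaning
`on(?:click|load)`	Faster than: `on(click|load)`

Use non-capturing or atomic groups when possible.

Atomic groups

Pattern	Meaning
`(?>red|green|blue)`	Faster than non-capturing
`(?>id|identity)\b`	Match id, but not identity

"id" matches, but `\b` fails after the atomic group, and the parser does not backtrack into the group to retry "identity".

If alternatives overlap, order longer to shorter.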

Lookaround

Pattern	Meaning
`(?= )`	Lookahead, if you can find ahead
`(?! )`	Lookahead, if you can NOT find ahead
`(?<= )`	Lookbehind, if you can find behind
`(?<! )`	Lookbehind, if you can NOT find behind
`\b\w+?(?=ing\b)`	Match warbling, string, fishing, ...
`\b(?!\w+ing\b)\w+\b`	Words NOT ending in ing
`(?<=\bpre).*?\b`	Match pretend, present, prefix, ...
`\b\w{3}(?<!pre)\w*?\b`	Words NOT starting with pre
`\b\w+(?<!ing)\b`	Match words NOT ending in ing

If-then-else conditionals

Match "Ms." if the word "her" appears later in the string, otherwise match "Mr.".

M(?(?=.*?\bher\b)s|r)\.

The IF condition requires a lookaround.
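
The pattern above relies on PCRE-style lookaround conditions. Python's `re` module, to my knowledge, only accepts a group number or name as the condition, so the sketch below emulates the same logic with two alternatives that each carry their own lookahead. This is a swapped-in technique rather than the conditional syntax itself, and the helper name `title` and the sample sentences are made up for the example.

```python
import re

# Branch 1: "s" only if "her" appears later in the string.
# Branch 2: "r" only if "her" does NOT appear later.
title_re = re.compile(r"M(?:(?=.*?\bher\b)s|(?!.*?\bher\b)r)\.")

def title(sentence):
    """Return the matched title ("Ms." or "Mr.") or None."""
    m = title_re.search(sentence)
    return m.group() if m else None

print(title("Ms. Lee parked her car"))   # condition true, "s" branch
print(title("Mr. Lee parked his car"))   # condition false, "r" branch
print(title("Ms. Lee parked his car"))   # condition false, but text has "s"
```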

Regex in Python

Getting started

Import the regular expressions module

import re

Examples

re.search()

>>> sentence = 'This is a sample string'
>>> bool(re.search(r'this', sentence, flags=re.I))
True
>>> bool(re.search(r'xyz', sentence))
False

re.findall()

>>> re.findall(r'\bs?pare?\b', 'par spar apparent spare part pare')
['par', 'spar', 'spare', 'pare']
>>> re.findall(r'\b0*[1-9]\d{2,}\b', '0501 035 154 12 26 98234')
['0501', '154', '98234']

re.finditer()

>>> m_iter = re.finditer(r'[0-9]+', '45 349 651 593 4 204')
>>> [m[0] for m in m_iter if int(m[0]) < 350]
['45', '349', '4', '204']

re.split()

>>> re.split(r'\d+', 'Sample123string42with777numbers')
['Sample', 'string', 'with', 'numbers']

re.sub()

>>> ip_lines = "catapults\nconcatenate\ncat"
>>> print(re.sub(r'^', r'* ', ip_lines, flags=re.M))
* catapults
* concatenate
* cat

re.compile()

>>> pet = re.compile(r'dog')
>>> type(pet)
<class 're.Pattern'>
>>> bool(pet.search('They bought a dog'))
True
>>> bool(pet.search('A cat crossed their path'))
False

Functions

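Function	Description
`re.findall`	Returns a list containing all matches
`re.finditer`	Returns an iterable of match objects (one for each match)
`re.search`	Returns a Match object if there is a match anywhere in the string
`re.split`	Returns a list where the string has been split at each match
`re.sub`	Replaces one or many matches with a string
`re.compile`	Compiles a regular expression pattern for later use
`re.escape`	Returns a string with all regex special characters backslash-escaped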
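
A brief sketch of why `re.escape` is useful when the text to find contains metacharacters. The strings here are assumptions for illustration.

```python
import re

needle = "1+1=2 (really?)"
haystack = "They wrote: 1+1=2 (really?) on the board"

# Used as-is, "+", "(", ")" and "?" are treated as regex syntax, so the
# pattern no longer means the literal text.
raw_hit = re.search(needle, haystack)

# re.escape backslash-escapes the metacharacters so the text is matched
# literally.
escaped_hit = re.search(re.escape(needle), haystack)

print(raw_hit)
print(escaped_hit.group())
```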

Flags

-	-	-
`re.I`	`re.IGNORECASE`	Ignore case
`re.M`	`re.MULTILINE`	Multiline
`re.L`	`re.LOCALE`	Make `\w`,`\b`,`\s` locale dependent
`re.S`	`re.DOTALL`	Dot matches all (including newline)
`re.U`	`re.UNICODE`	Make `\w`,`\b`,`\d`,`\s` unicode dependent
`re.X`	`re.VERBOSE`	Readable style
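
To show what the "readable style" of `re.X` looks like in practice, here is a minimal sketch that rewrites the ZIP code pattern from the Groups examples with inline comments. The sample sentence and the name `zip_re` are assumptions for this example.

```python
import re

# With re.VERBOSE, whitespace in the pattern is ignored and "#" starts a
# comment that runs to the end of the line.
zip_re = re.compile(r"""
    \b\d{5}        # five-digit ZIP code
    (?:-\d{4})?    # optional +4 extension
    \b
""", re.VERBOSE)

codes = zip_re.findall("Send to 98105-1234 or 10001.")
print(codes)
```

Flags can be combined with `|`, for example `re.I | re.M`.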

Regex in JavaScript

test()

let textA = "I like APPles very much";
let textB = "I like APPles";
let regex = /apples$/i;

// Output: false
console.log(regex.test(textA));

// Output: true
console.log(regex.test(textB));

search()

let text = "I like APPles very much";
let regexA = /apples/;
let regexB = /apples/i;

// Output: -1
console.log(text.search(regexA));

// Output: 7
console.log(text.search(regexB));

exec()

let text = "Do you like apples?";
let regex = /apples/;

// Output: apples
console.log(regex.exec(text)[0]);

// Output: Do you like apples?
console.log(regex.exec(text).input);

match()

let text = "Here are apples and apPleS";
let regex = /apples/gi;

// Output: [ "apples", "apPleS" ]
console.log(text.match(regex));

split()

let text = "This 593 string will be brok294en at places where d1gits are.";
let regex = /\d+/g;

// Output: [ "This ", " string will be brok", "en at places where d", "gits are." ]
console.log(text.split(regex));

matchAll()

let regex = /t(e)(st(\d?))/g;
let text = "test1test2";
let array = [...text.matchAll(regex)];

// Output: ["test1", "e", "st1", "1"]
console.log(array[0]);

// Output: ["test2", "e", "st2", "2"]
console.log(array[1]);

replace()

let text = "Do you like aPPles?";
let regex = /apples/i;

// Output: Do you like mangoes?
let result = text.replace(regex, "mangoes");
console.log(result);

replaceAll()

let regex = /apples/gi;
let text = "Here are apples and apPleS";

// Output: Here are mangoes and mangoes
let result = text.replaceAll(regex, "mangoes");
console.log(result);

Regex in PHP

Functions

-	-
`preg_match()`	Performs a regex match
`preg_match_all()`	Perform a global regular expression match
`preg_replace_callback()`	Perform a regular expression search and replace using a callback
`preg_replace()`	Perform a regular expression search and replace
`preg_split()`	Splits a string by regex pattern
`preg_grep()`	Returns array entries that match a pattern

preg_replace

$str = "Visit Microsoft!";
$regex = "/microsoft/i";

// Output: Visit CheatSheets!
echo preg_replace($regex, "CheatSheets", $str);

preg_match

$str = "Visit CheatSheets";
$regex = "#cheatsheets#i";

// Output: 1
echo preg_match($regex, $str);

preg_match_all

$regex = "/[a-zA-Z]+ (\d+)/";
$input_str = "June 24, August 13, and December 30";
if (preg_match_all($regex, $input_str, $matches_out)) {

    // Output: 2
    echo count($matches_out);

    // Output: 3
    echo count($matches_out[0]);

    // Output: Array("June 24", "August 13", "December 30")
    print_r($matches_out[0]);

    // Output: Array("24", "13", "30")
    print_r($matches_out[1]);
}

preg_grep

$arr = ["Jane", "jane", "Joan", "JANE"];
$regex = "/Jane/";

// preg_grep() returns an array, so print it with print_r() rather than echo
// Output: Array("Jane")
print_r(preg_grep($regex, $arr));

preg_split

$str = "Jane\tKate\nLucy Marion";
$regex = "@\s@";

// Output: Array("Jane", "Kate", "Lucy", "Marion")
print_r(preg_split($regex, $str));

Regex in Java

Styles

First way

Pattern p = Pattern.compile(".s", Pattern.CASE_INSENSITIVE);
Matcher m = p.matcher("aS");
boolean s1 = m.matches();
System.out.println(s1);   // Output: true

Second way

boolean s2 = Pattern.compile("[0-9]+").matcher("123").matches();
System.out.println(s2);   // Output: true

Third way

boolean s3 = Pattern.matches(".s", "XXXX");
System.out.println(s3);   // Output: false

Pattern Fields

-	-
`CANON_EQ`	Canonical equivalence
`CASE_INSENSITIVE`	Case-insensitive matching
`COMMENTS`	Permits whitespace and comments
`DOTALL`	Dotall mode
`MULTILINE`	Multiline mode
`UNICODE_CASE`	Unicode-aware case folding
`UNIX_LINES`	Unix lines mode

Methods

Pattern

static Pattern compile(String regex, int flags)
static boolean matches(String regex, CharSequence input)
String[] split(CharSequence input, int limit)
static String quote(String s)

Matcher

int start(int group | String name)
int end(int group | String name)
boolean find(int start)
String group(int group | String name)
Matcher reset()

String

boolean matches(String regex)
String replaceAll(String regex, String replacement)
String[] split(String regex, int limit)

There are more methods ...

Examples

Replace sentence:

String regex = "[A-Z\n]{5}$";
String str = "I like APP\nLE";

Pattern p = Pattern.compile(regex, Pattern.MULTILINE);
Matcher m = p.matcher(str);

// Output: I like Apple!
System.out.println(m.replaceAll("pple!"));

Array of all matches:

String str = "She sells seashells by the Seashore";
String regex = "\\w*se\\w*";

Pattern p = Pattern.compile(regex, Pattern.CASE_INSENSITIVE);
Matcher m = p.matcher(str);

List<String> matches = new ArrayList<>();
while (m.find()) {
    matches.add(m.group());
}

// Output: [sells, seashells, Seashore]
System.out.println(matches);

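Regex in MySQL

Functions

Name	Description
`REGEXP`	Whether string matches regular expression
`REGEXP_INSTR()`	Starting index of substring matching regular expression (MySQL 8.0+ only)
`REGEXP_LIKE()`	Whether string matches regular expression (MySQL 8.0+ only)
`REGEXP_REPLACE()`	Replace substrings matching regular expression (MySQL 8.0+ only)
`REGEXP_SUBSTR()`	Return substring matching regular expression (MySQL 8.0+ only)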

REGEXP

expr REGEXP pat

Examples

mysql> SELECT 'abc' REGEXP '^[a-d]';
1
mysql> SELECT name FROM cities WHERE name REGEXP '^A';
mysql> SELECT name FROM cities WHERE name NOT REGEXP '^A';
mysql> SELECT name FROM cities WHERE name REGEXP 'A|B|R';
mysql> SELECT 'a' REGEXP 'A', 'a' REGEXP BINARY 'A';
1   0

REGEXP_REPLACE

REGEXP_REPLACE(expr, pat, repl[, pos[, occurrence[, match_type]]])

Examples

mysql> SELECT REGEXP_REPLACE('a b c', 'b', 'X');
a X c
mysql> SELECT REGEXP_REPLACE('abc ghi', '[a-z]+', 'X', 1, 2);
abc X

REGEXP_SUBSTR

REGEXP_SUBSTR(expr, pat[, pos[, occurrence[, match_type]]])

Examples

mysql> SELECT REGEXP_SUBSTR('abc def ghi', '[a-z]+');
abc
mysql> SELECT REGEXP_SUBSTR('abc def ghi', '[a-z]+', 1, 3);
ghi

REGEXP_LIKE

REGEXP_LIKE(expr, pat[, match_type])

Examples

mysql> SELECT regexp_like('aba', 'b+');
1
mysql> SELECT regexp_like('aba', 'b{2}');
0
mysql> # i: case-insensitive
mysql> SELECT regexp_like('Abba', 'ABBA', 'i');
1
mysql> # m: multi-line
mysql> SELECT regexp_like('a\nb\nc', '^b$', 'm');
1

REGEXP_INSTR

REGEXP_INSTR(expr, pat[, pos[, occurrence[, return_option[, match_type]]]])

Examples

mysql> SELECT regexp_instr('aa aaa aaaa', 'a{3}');
4
mysql> SELECT regexp_instr('abba', 'b{2}', 2);
2
mysql> SELECT regexp_instr('abbabba', 'b{2}', 1, 2);
5
mysql> SELECT regexp_instr('abbabba', 'b{2}', 1, 2, 1);
7