
Regex Cheatsheet

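A quick reference for regular expressions (regex), including symbols, ranges, grouping, assertions, and some sample patterns to get you started.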

Getting started

Introduction

This is a quick reference cheat sheet for getting started with regular expressions.

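Character classes

Pattern        Description
[abc]          A single character of: a, b, or c
[^abc]         Any character except: a, b, or c
[a-z]          A character in the range: a-z
[^a-z]         A character not in the range: a-z
[0-9]          A digit in the range: 0-9
[a-zA-Z]       A character in the range: a-z or A-Z
[a-zA-Z0-9]    A character in the range: a-z, A-Z, or 0-9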
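Below is a minimal Python sketch of the classes above. It uses re.findall to list every match. Python's re stands in for other engines here, and the sample strings are arbitrary:

```python
import re

sample = "Cab 42, bad!"

# [abc]: one character that is a, b, or c (case-sensitive, so "C" is skipped)
in_abc = re.findall(r"[abc]", sample)
# [^abc]: any one character except a, b, or c
not_abc = re.findall(r"[^abc]", "abcxyz")
# [a-zA-Z0-9]+: runs of ASCII letters or digits
words = re.findall(r"[a-zA-Z0-9]+", sample)

print(in_abc)   # ['a', 'b', 'b', 'a']
print(not_abc)  # ['x', 'y', 'z']
print(words)    # ['Cab', '42', 'bad']
```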

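Quantifiers

Pattern    Description
a?         Zero or one of a
a*         Zero or more of a
a+         One or more of a
[0-9]+     One or more of 0-9
a{3}       Exactly 3 of a
a{3,}      3 or more of a
a{3,6}     Between 3 and 6 of a
a*         Greedy quantifier
a*?        Lazy quantifier
a*+        Possessive quantifier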
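You can check the quantifiers with a tiny hypothetical helper, fits(). It wraps re.fullmatch so the whole string has to satisfy the pattern. Python's re only accepts possessive quantifiers such as a*+ from version 3.11 on, so this sketch leaves them out:

```python
import re

def fits(pattern, text):
    """Hypothetical helper: True if the whole of text matches pattern."""
    return re.fullmatch(pattern, text) is not None

print(fits(r"a?", ""), fits(r"a?", "aa"))            # True False
print(fits(r"a+", ""), fits(r"a+", "aaa"))           # False True
print(fits(r"a{3}", "aaa"), fits(r"a{3}", "aaaa"))   # True False
print(fits(r"a{3,}", "aaaaaaa"))                     # True
print(fits(r"a{3,6}", "aaaaaaa"))                    # False: 7 is more than 6
```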

Common metacharacters

  • ^
  • {
  • +
  • <
  • [
  • *
  • )
  • >
  • .
  • (
  • |
  • $
  • \
  • ?

Escape these special characters with \

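Meta sequences

Pattern       Description
.             Any single character
\s            Any whitespace character
\S            Any non-whitespace character
\d            Any digit, same as [0-9]
\D            Any non-digit, same as [^0-9]
\w            Any word character (letter, digit, or underscore)
\W            Any non-word character
\X            Any Unicode sequence (extended grapheme cluster), line breaks included
\C            Match one data unit
\R            Unicode line break
\v            Vertical whitespace character
\V            Negation of \v: anything except newlines and vertical tabs
\h            Horizontal whitespace character
\H            Negation of \h
\K            Reset the start of the reported match
\n            Match the nth subpattern (n is a number, e.g. \1)
\pX           Unicode property X
\p{...}       Unicode property or script category
\PX           Negation of \pX
\P{...}       Negation of \p{...}
\Q...\E       Quote; treat as literals
\k<name>      Match the subpattern named name
\k'name'      Match the subpattern named name
\k{name}      Match the subpattern named name
\gn           Match the nth subpattern
\g{n}         Match the nth subpattern
\g<n>         Recurse the nth capture group
\g'n'         Recurse the nth capture group
\g{-n}        Match the nth relative previous subpattern
\g<+n>        Recurse the nth relative upcoming subpattern
\g'+n'        Recurse the nth relative upcoming subpattern
\g'letter'    Recurse the capture group named letter
\g{letter}    Match the previously captured group named letter
\g<letter>    Recurse the capture group named letter
\xYY          Hex character YY
\x{YYYY}      Hex character YYYY
\ddd          Octal character ddd
\cY           Control character Y
[\b]          Backspace character
\             Makes any character literal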
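Python's re supports only part of this table. \d, \s, \w, \xYY and numeric backreferences work. PCRE-only sequences such as \h, \K, \Q...\E and \p{...} are rejected. Here is a small sketch with an arbitrary sample string:

```python
import re

text = "ID:\tA7 B42"

digits = re.findall(r"\d+", text)      # runs of digits
non_space = re.findall(r"\S+", text)   # runs of non-whitespace
hex_a = re.findall(r"\x41", text)      # \x41 is the hex escape for "A"
# the first whitespace character in the text is the tab
tab_found = re.search(r"\s", text).group() == "\t"

print(digits)     # ['7', '42']
print(non_space)  # ['ID:', 'A7', 'B42']
print(hex_a)      # ['A']
print(tab_found)  # True
```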

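Anchors

Pattern    Description
\G         Start of match (where the previous match ended)
^          Start of string (start of a line in multiline mode)
$          End of string (end of a line in multiline mode)
\A         Start of string
\Z         End of string (before a final newline, if any)
\z         Absolute end of string
\b         A word boundary
\B         Non-word boundary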
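The difference between ^/$ and \A/\Z shows up once multiline mode is on. The sketch below is in Python. Python's \Z means the absolute end of the string, which is PCRE's \z, and Python has no \G:

```python
import re

text = "first line\nsecond line"

no_m = re.findall(r"^\w+", text)              # ^ = start of string only
with_m = re.findall(r"^\w+", text, re.M)      # with re.M, ^ also matches after "\n"
abs_start = re.findall(r"\A\w+", text, re.M)  # \A ignores re.M
# $ may also match just before a final "\n"; Python's \Z never does
dollar = bool(re.search(r"line$", text + "\n"))
big_z = bool(re.search(r"line\Z", text + "\n"))

print(no_m)       # ['first']
print(with_m)     # ['first', 'second']
print(abs_start)  # ['first']
print(dollar, big_z)  # True False
```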

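Substitution

Pattern     Description
\0          Complete match contents
\1          Contents of capture group 1
$1          Contents of capture group 1
${foo}      Contents of the capture group named foo
\x20        Hexadecimal replacement value
\x{06fa}    Hexadecimal replacement value
\t          Tab
\r          Carriage return
\n          Newline
\f          Form feed
\U          Uppercase transformation
\L          Lowercase transformation
\E          Terminate any transformation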
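Replacement syntax varies more between tools than anything else in this table. In Python's re.sub:

  • Groups are written \1 or \g<name>.
  • The whole match is \g<0>.
  • There is no \U/\L case conversion, so you pass a function instead.

Here is a sketch with an arbitrary name string:

```python
import re

name = "Lovelace, Ada"
pattern = r"(?P<last>\w+), (?P<first>\w+)"

by_number = re.sub(pattern, r"\2 \1", name)             # groups by number
by_name = re.sub(pattern, r"\g<first> \g<last>", name)  # Python's form of ${foo}
whole = re.sub(r"\d+", r"[\g<0>]", "room 101")          # Python's form of \0
# No \U or \L in Python replacement strings: a function does the case change
upper = re.sub(r"\w+", lambda m: m.group(0).upper(), "ada lovelace")

print(by_number)  # Ada Lovelace
print(by_name)    # Ada Lovelace
print(whole)      # room [101]
print(upper)      # ADA LOVELACE
```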

Group constructs

Pattern           Description
(...)             Capture everything enclosed
(a|b)             Match either a or b
(?:...)           Match everything enclosed, without capturing
(?>...)           Atomic group (non-capturing)
(?|...)           Duplicate subpattern group number (branch reset)
(?#...)           Comment
(?'name'...)      Named capturing group
(?<name>...)      Named capturing group
(?P<name>...)     Named capturing group
(?imsxXU)         Inline modifiers
(?(DEFINE)...)    Pre-define patterns before using them
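Here is a Python sketch of named, non-capturing, and comment groups. Python spells named groups (?P<name>...) and does not offer (?|...) or (?(DEFINE)...). The date format is only an example:

```python
import re

# (?P<name>...) names a group, (?:...) groups without capturing,
# and (?#...) is a comment that the engine ignores.
date = re.compile(r"(?P<year>\d{4})-(?P<month>\d{2})(?:-(?P<day>\d{2}))?(?#optional day)")

full = date.fullmatch("2024-05-17")
short = date.fullmatch("2024-05")

print(full.group("year"), full.group("month"), full.group("day"))  # 2024 05 17
print(len(full.groups()))  # 3: the (?:...) wrapper adds no group of its own
print(short.group("day"))  # None: the optional part did not take part in the match
```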

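Assertions

Pattern              Description
(?(1)yes|no)         Conditional: yes if group 1 matched, otherwise no
(?(R)yes|no)         Conditional: yes if inside any recursion
(?(R#)yes|no)        Recursive conditional: yes if the most recent recursion is into group #
(?(R&name)yes|no)    Conditional: yes if the most recent recursion is into the group named name
(?(?=...)yes|no)     Lookahead conditional
(?(?<=...)yes|no)    Lookbehind conditional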
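Python's re supports only the group-reference form of conditionals, (?(1)yes|no) or (?(name)yes|no). Lookaround and recursion conditions are PCRE features. The pattern below is the example from the Python re documentation. It accepts an address that has both angle brackets or neither:

```python
import re

# (?(1)>|$): if group 1 (the "<") matched, require ">", otherwise require end of string
email = re.compile(r"(<)?(\w+@\w+(?:\.\w+)+)(?(1)>|$)")

samples = ["<user@host.com>", "user@host.com", "<user@host.com", "user@host.com>"]
results = {s: bool(email.match(s)) for s in samples}

for s, ok in results.items():
    print(s, ok)
# <user@host.com> True
# user@host.com True
# <user@host.com False
# user@host.com> False
```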

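Lookarounds

Pattern     Description
(?=...)     Positive lookahead
(?!...)     Negative lookahead
(?<=...)    Positive lookbehind
(?<!...)    Negative lookbehind

Lookarounds let you match a group before (lookbehind) or after (lookahead) your main pattern without including it in the result.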
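Here is a quick Python check that lookarounds test text without consuming it. The price string is made up. Note that Python requires lookbehind patterns to have a fixed width:

```python
import re

text = "Total: $30 for 3 items, $5 for 1 item"

# (?<=\$)\d+: digits preceded by "$"; the "$" itself is not part of the match
prices = re.findall(r"(?<=\$)\d+", text)
# \d+(?= item): digits followed by " item"; the word is checked but not consumed
counts = re.findall(r"\d+(?= item)", text)

print(prices)  # ['30', '5']
print(counts)  # ['3', '1']
```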

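Flags/modifiers

Flag    Description
g       Global
m       Multiline
i       Case insensitive
x       Ignore whitespace (free-spacing)
s       Single line (dot matches newline)
u       Unicode
X       eXtended
U       Ungreedy (swap greedy and lazy)
A       Anchor (match only at the start)
J       Allow duplicate group names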
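These letters map only loosely onto Python. Its re has no g flag, because findall, finditer and sub already work on every match. Also, re.A means ASCII, not anchored. The sketch below shows the i, m and s behaviours with Python's flag constants:

```python
import re

text = "Alpha\nbeta"

plain = re.findall(r"^[a-z]+$", text)               # no flags: whole string, case-sensitive
lines = re.findall(r"^[a-z]+$", text, re.I | re.M)  # i + m: each line, any case
dot_all = re.search(r"a.b", text, re.S)             # s: "." may match the newline
dot_default = re.search(r"a.b", text)               # without s, "." stops at "\n"

print(plain)            # []
print(lines)            # ['Alpha', 'beta']
print(repr(dot_all.group()))  # 'a\nb'
print(dot_default)      # None
```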

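Recursion

Pattern      Description
(?R)         Recurse the entire pattern
(?1)         Recurse the first subpattern
(?+1)        Recurse the next subpattern (first relative)
(?&name)     Recurse the subpattern named name
(?P=name)    Match the subpattern named name (backreference)
(?P>name)    Recurse the subpattern named name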

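POSIX character classes

Character class    Same as                                   Meaning
[[:alnum:]]        [0-9A-Za-z]                               Letters and digits
[[:alpha:]]        [A-Za-z]                                  Letters
[[:ascii:]]        [\x00-\x7F]                               ASCII codes 0-127
[[:blank:]]        [\t ]                                     Space or tab only
[[:cntrl:]]        [\x00-\x1F\x7F]                           Control characters
[[:digit:]]        [0-9]                                     Decimal digits
[[:graph:]]        [[:alnum:][:punct:]]                      Visible characters (not space)
[[:lower:]]        [a-z]                                     Lowercase letters
[[:print:]]        [ -~] == [ [:graph:]]                     Visible characters (including space)
[[:punct:]]        [!"#$%&'()*+,\-./:;<=>?@[\\\]^_`{|}~]     Visible punctuation characters
[[:space:]]        [ \t\n\v\f\r]                             Whitespace
[[:upper:]]        [A-Z]                                     Uppercase letters
[[:word:]]         [0-9A-Za-z_]                              Word characters
[[:xdigit:]]       [0-9A-Fa-f]                               Hexadecimal digits
[[:<:]]            \b(?=\w)                                  Start of word
[[:>:]]            \b(?<=\w)                                 End of word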
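Python's re does not understand [[:alpha:]] and similar classes. It misreads them as ordinary sets and emits a FutureWarning. One workaround is to substitute a few ASCII equivalents from the table as text before compiling. That is sketched here as a hypothetical helper, posix_to_python(). It is purely textual and does not check that the class name really sits inside brackets:

```python
import re

# Hypothetical mapping: POSIX class token -> ASCII range from the table above.
POSIX_EQUIVALENTS = {
    "[:alpha:]": "A-Za-z",
    "[:digit:]": "0-9",
    "[:alnum:]": "0-9A-Za-z",
    "[:upper:]": "A-Z",
    "[:lower:]": "a-z",
    "[:xdigit:]": "0-9A-Fa-f",
    "[:space:]": r" \t\n\v\f\r",
}

def posix_to_python(pattern):
    """Hypothetical helper: replace [:name:] tokens with their ASCII ranges."""
    for posix, ascii_range in POSIX_EQUIVALENTS.items():
        pattern = pattern.replace(posix, ascii_range)
    return pattern

converted = posix_to_python("[[:alpha:]]+")
hex_runs = re.findall(posix_to_python("[[:xdigit:]]+"), "ff 1g Z9")

print(converted)  # [A-Za-z]+
print(hex_runs)   # ['ff', '1', '9']
```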

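Control verbs

Pattern                 Description
(*ACCEPT)               Control verb: end the match successfully at this point
(*FAIL)                 Control verb: force a matching failure
(*MARK:NAME)            Control verb: record a name for this point in the match
(*COMMIT)               Control verb: on backtracking, fail the whole match
(*PRUNE)                Control verb: on backtracking, fail at the current start position
(*SKIP)                 Control verb: on backtracking, advance the start to the current position
(*THEN)                 Control verb: on backtracking, try the next alternative
(*UTF)                  Pattern modifier
(*UTF8)                 Pattern modifier
(*UTF16)                Pattern modifier
(*UTF32)                Pattern modifier
(*UCP)                  Pattern modifier
(*CR)                   Line break modifier
(*LF)                   Line break modifier
(*CRLF)                 Line break modifier
(*ANYCRLF)              Line break modifier
(*ANY)                  Line break modifier
\R                      Any line break sequence (what counts is set by the BSR options)
(*BSR_ANYCRLF)          Line break modifier
(*BSR_UNICODE)          Line break modifier
(*LIMIT_MATCH=x)        Regex engine modifier
(*LIMIT_RECURSION=d)    Regex engine modifier
(*NO_AUTO_POSSESS)      Regex engine modifier
(*NO_START_OPT)         Regex engine modifier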

Regex examples

Characters

Pattern        Matches
ring           Match ring, springboard, etc.
.              Match a, 9, +, etc.
h.o            Match hoo, h2o, h/o, etc.
ring\?         Match ring?
\(quiet\)      Match (quiet)
c:\\windows    Match c:\windows

Use \ to search for these special characters:
[ \ ^ $ . | ? * + ( ) { }
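In Python, re.escape adds the backslashes for you. This is handy when the text to search for comes from user input. The sketch below reuses examples from the table:

```python
import re

needle = "c:\\windows (x86)?"   # contains \ ( ) ? which are all special in a pattern

# re.escape backslashes the special characters so the text is matched literally
found = re.search(re.escape(needle), r"path: c:\windows (x86)? ok")
# Hand-escaped equivalent of the table's \(quiet\) example
quiet = re.search(r"\(quiet\)", "be (quiet) now").group()

print(found.group())  # c:\windows (x86)?
print(quiet)          # (quiet)
```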

Alternation

Pattern        Matches
cat|dog        Match cat or dog
id|identity    Match id, or the id in identity
identity|id    Match id or identity

Order longer to shorter when alternatives overlap
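Order matters because the engine tries alternatives left to right and keeps the first one that fits at a given position. Here is a Python sketch:

```python
import re

first_wins = re.findall(r"id|identity", "identity id")     # "id" wins inside "identity"
longest_first = re.findall(r"identity|id", "identity id")  # longer alternative tried first

print(first_wins)     # ['id', 'id']
print(longest_first)  # ['identity', 'id']
```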

Character classes

Pattern            Matches
[aeiou]            Match any vowel
[^aeiou]           Match a NON vowel
r[iau]ng           Match ring, wrangle, sprung, etc.
gr[ae]y            Match gray or grey
[a-zA-Z0-9]        Match any letter or digit
[\u3a00-\ufa99]    Match Hàn (Chinese) characters; the range also covers other blocks such as Hangul

Inside [ ], always escape \ and ]; escape ^ when it comes first and - when it is not first or last.
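Here is a Python sketch of escaping inside a class. The test string is arbitrary and is written as a raw string so the backslash stays literal:

```python
import re

text = r"a.b]c\d-e^f"

# Inside [ ]: "." is already literal, "]" and "\" are escaped, "-" is literal at the end
specials = re.findall(r"[.\]\\-]", text)
# "^" needs escaping here only because it is the first character in the class
caret = re.findall(r"[\^x]", text)

print(specials)  # ['.', ']', '\\', '-']
print(caret)     # ['^']
```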

Shorthand classes

Pattern          Meaning
\w               "Word" character (letter, digit, or underscore)
\d               Digit
\s               Whitespace (space, tab, vtab, newline)
\W, \D, or \S    Not word, digit, or whitespace
[\D\S]           Not digit OR not whitespace; together they match any character
[^\d\s]          Neither digit nor whitespace
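The gap between [\D\S] and [^\d\s] is easiest to see side by side. Here is a Python sketch:

```python
import re

text = "a1 b"

# [\D\S] = "not a digit OR not whitespace": every character qualifies
either = re.findall(r"[\D\S]", text)
# [^\d\s] = "neither a digit nor whitespace"
neither = re.findall(r"[^\d\s]", text)

print(either)   # ['a', '1', ' ', 'b']
print(neither)  # ['a', 'b']
```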

Occurrences

Pattern              Matches
colou?r              Match color or colour
[BW]ill[ieamy's]*    Match Bill, Willy, William's, etc.
[a-zA-Z]+            Match 1 or more letters
\d{3}-\d{2}-\d{4}    Match a US Social Security number (SSN)
[a-z]\w{1,7}         Match a UW NetID
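Here is a Python sketch of these patterns. The sample words and numbers are made up:

```python
import re

colours = re.findall(r"colou?r", "color colour colouur")
ssn_ok = bool(re.fullmatch(r"\d{3}-\d{2}-\d{4}", "123-45-6789"))
ssn_bad = bool(re.fullmatch(r"\d{3}-\d{2}-\d{4}", "123-456-789"))
names = re.findall(r"[BW]ill[ieamy's]*", "Bill Willy William's Will")

print(colours)          # ['color', 'colour']
print(ssn_ok, ssn_bad)  # True False
print(names)            # ['Bill', 'Willy', "William's", 'Will']
```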

Greedy versus lazy

Pattern        Meaning
* + {n,}       Greedy: match as much as possible
<.+>           Finds 1 big match in <b>bold</b>
*? +? {n,}?    Lazy: match as little as possible
<.+?>          Finds 2 matches in <b>bold</b>
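Here is the <b>bold</b> example from the table, run through Python's re.findall:

```python
import re

html = "<b>bold</b>"
greedy = re.findall(r"<.+>", html)   # .+ runs to the last ">"
lazy = re.findall(r"<.+?>", html)    # .+? stops at the first ">"

print(greedy)  # ['<b>bold</b>']
print(lazy)    # ['<b>', '</b>']
```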

Scope

Pattern             Meaning
\b                  "Word" edge (next to a non-"word" character)
\bring              Word starts with "ring", e.g. ringtone
ring\b              Word ends with "ring", e.g. spring
\b9\b               Match the single digit 9, not 19, 91, 99, etc.
\b[a-zA-Z]{6}\b     Match 6-letter words
\B                  Not a word edge
\Bring\B            Match springs and wringer
^\d*$               Entire string must be digits
^[a-zA-Z]{4,20}$    String must have 4-20 letters
^[A-Z]              String must begin with a capital letter
[\.!?"')]$          String must end with terminal punctuation
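Here is a Python sketch of the boundary patterns. The \Bring\B pattern is wrapped in \w* so that findall returns the whole word instead of just "ring":

```python
import re

nines = re.findall(r"\b9\b", "9 19 91 99 9")
rings = re.findall(r"\w*\Bring\B\w*", "springs wringer ring")
six = re.findall(r"\b[a-zA-Z]{6}\b", "search pattern tables")

print(nines)  # ['9', '9']
print(rings)  # ['springs', 'wringer']
print(six)    # ['search', 'tables']
```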

Modifiers

Pattern            Meaning
(?i)[a-z]*(?-i)    Ignore case ON / OFF
(?s).*(?-s)        Match multiple lines (causes . to match newline)
(?m)^.*;$(?-m)     ^ & $ match lines, not the whole string
(?x)               Free-spacing mode ON (an end-of-line # comment is ignored)
(?-x)              Free-spacing mode OFF
/regex/ismx        Modifiers for the entire pattern
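Python handles inline modifiers a little differently:

  • A global flag such as (?i) must come at the very start of the pattern. Placing it elsewhere is an error since Python 3.11.
  • To switch a flag on or off for part of a pattern, Python uses a scoped group such as (?i:...) or (?-i:...), not a bare (?-i).

Here is a sketch:

```python
import re

any_case = re.findall(r"(?i)colou?r", "Color COLOUR")     # global flag, at the start
scoped = re.findall(r"(?i:a)b", "ab Ab AB")               # only "a" ignores case
spaced = re.findall(r"(?x) \d+  # digits", "a12b3")       # whitespace and comment ignored

print(any_case)  # ['Color', 'COLOUR']
print(scoped)    # ['ab', 'Ab']
print(spaced)    # ['12', '3']
```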

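Groups

Pattern           Meaning
(in|out)put       Match input or output
\d{5}(-\d{4})?    US zip code ("+ 4" optional)

If the match fails after a group, the parser goes back and tries each remaining alternative in that group.
This can lead to catastrophic backtracking.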
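Here is the zip-code pattern in Python. The optional group reports None when the "+ 4" part is absent:

```python
import re

zip_code = re.compile(r"\d{5}(-\d{4})?")

plain = zip_code.fullmatch("12345")
plus4 = zip_code.fullmatch("12345-6789")
broken = zip_code.fullmatch("12345-67")
# With a capturing group, findall would return only "in"/"out", so (?:...) is used here
in_out = re.findall(r"(?:in|out)put", "input output throughput")

print(plain.group(1))  # None: the optional group did not match
print(plus4.group(1))  # -6789
print(broken)          # None: "-67" is neither absent nor four digits
print(in_out)          # ['input', 'output']
```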

Back references

Pattern                   Matches
(to) (be) or not \1 \2    Match to be or not to be
([^\s])\1{2}              Match a non-space, then the same character twice more: aaa, ...
\b(\w+)\s+\1\b            Match doubled words
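Here is a Python sketch of back references, including a re.sub that collapses doubled words. The sentences are made up:

```python
import re

doubled = re.findall(r"\b(\w+)\s+\1\b", "this is is a test test")
fixed = re.sub(r"\b(\w+)\s+\1\b", r"\1", "this is is a test test")
triples = re.findall(r"([^\s])\1{2}", "aaa bb ccc!")   # findall returns group 1
hamlet = bool(re.search(r"(to) (be) or not \1 \2", "to be or not to be"))

print(doubled)  # ['is', 'test']
print(fixed)    # this is a test
print(triples)  # ['a', 'c']
print(hamlet)   # True
```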

Non-capturing groups

Pattern             Meaning
on(?:click|load)    Faster than on(click|load)

Use non-capturing or atomic groups when possible.
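One visible difference in Python is what re.findall returns. With a capturing group it returns only the group's text. With (?:...) it returns whole matches:

```python
import re

events = "onclick onload onkeyup"
captured = re.findall(r"on(click|load)", events)   # group contents only
whole = re.findall(r"on(?:click|load)", events)    # whole matches

print(captured)  # ['click', 'load']
print(whole)     # ['onclick', 'onload']
```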

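Atomic groups

Pattern               Meaning
(?>red|green|blue)    Faster than non-capturing
(?>id|identity)\b     Match id, but not identity

"id" matches, but \b fails after the atomic group, and the parser won't backtrack into the group to retry "identity".

If alternatives overlap, order them longer to shorter.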
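Python's re only gained real atomic groups in 3.11. For older versions, a common workaround (swapped in here) emulates one with a capturing lookahead followed by a backreference, (?=(...))\1. Python does not re-enter a lookahead once it has succeeded, so the captured alternative stays fixed. Here is a sketch of the table's id/identity case:

```python
import re

# Ordinary group: after \b fails for "id", the engine backtracks and tries "identity".
plain = re.search(r"(?:id|identity)\b", "identity").group()
# Emulated atomic group: once the lookahead has captured "id", it is never retried.
atomic = re.search(r"(?=(id|identity))\1\b", "identity")
short_id = re.search(r"(?=(id|identity))\1\b", "my id").group()

print(plain)     # identity
print(atomic)    # None
print(short_id)  # id
```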

Lookaround

Pattern                  Meaning
(?= )                    Lookahead, if you can find ahead
(?! )                    Lookahead, if you can NOT find ahead
(?<= )                   Lookbehind, if you can find behind
(?<! )                   Lookbehind, if you can NOT find behind
\b\w+?(?=ing\b)          Match the stems of warbling, string, fishing, ... (warbl, str, fish)
\b(?!\w+ing\b)\w+\b      Words NOT ending in "ing"
(?<=\bpre).*?\b          Match the rest of pretend, present, prefix, ... (tend, sent, fix)
\b\w{3}(?<!pre)\w*?\b    Words NOT starting with "pre"
\b\w+(?<!ing)\b          Words NOT ending in "ing"
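Here is a Python sketch of the lookaround examples. The first and last patterns return only part of each word, because the lookaround text is not consumed. The sample words are arbitrary:

```python
import re

words = "warbling string fishing talk present walk"

# The lazy \w+? stops as soon as "ing" plus a word end lies ahead
stems = re.findall(r"\b\w+?(?=ing\b)", words)
# Whole words that do NOT end in "ing", via a negative lookahead at the start
not_ing = re.findall(r"\b(?!\w+ing\b)\w+\b", words)
# The same filter written with a negative lookbehind at the end of the word
not_ing_lb = re.findall(r"\b\w+(?<!ing)\b", words)
# What follows a word-initial "pre" (a fixed-width lookbehind, as Python requires)
rests = re.findall(r"(?<=\bpre).*?\b", "pretend present prefix")

print(stems)       # ['warbl', 'str', 'fish']
print(not_ing)     # ['talk', 'present', 'walk']
print(not_ing_lb)  # ['talk', 'present', 'walk']
print(rests)       # ['tend', 'sent', 'fix']
```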

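If-then-else conditionals

Match "Ms." if the word "her" appears later in the string; otherwise match "Mr.".

M(?(?=.*?\bher\b)s|r)\.

Here the IF condition is a lookahead, so it checks the rest of the string without consuming it.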

Regex in Python

Getting started

Import the regular expressions module

import re

Examples

re.search()

>>> sentence = 'This is a sample string'
>>> bool(re.search(r'this', sentence, flags=re.I))
True
>>> bool(re.search(r'xyz', sentence))
False

re.findall()

>>> re.findall(r'\bs?pare?\b', 'par spar apparent spare part pare')
['par', 'spar', 'spare', 'pare']
>>> re.findall(r'\b0*[1-9]\d{2,}\b', '0501 035 154 12 26 98234')
['0501', '154', '98234']

re.finditer()

>>> m_iter = re.finditer(r'[0-9]+', '45 349 651 593 4 204')
>>> [m[0] for m in m_iter if int(m[0]) < 350]
['45', '349', '4', '204']

re.split()

>>> re.split(r'\d+', 'Sample123string42with777numbers')
['Sample', 'string', 'with', 'numbers']

re.sub()

>>> ip_lines = "catapults\nconcatenate\ncat"
>>> print(re.sub(r'^', r'* ', ip_lines, flags=re.M))
* catapults
* concatenate
* cat

re.compile()

>>> pet = re.compile(r'dog')
>>> type(pet)
<class 're.Pattern'>
>>> bool(pet.search('They bought a dog'))
True
>>> bool(pet.search('A cat crossed their path'))
False

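Functions

Function       Description
re.findall     Returns a list containing all matches
re.finditer    Returns an iterable of match objects (one for each match)
re.search      Returns a Match object if there is a match anywhere in the string, otherwise None
re.split       Returns a list where the string has been split at each match
re.sub         Replaces one or many matches with a string
re.compile     Compiles a regular expression pattern for later reuse
re.escape      Returns the string with its special characters escaped by backslashes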

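Flags

Short    Long             Description
re.I     re.IGNORECASE    Ignore case
re.M     re.MULTILINE     Multiline
re.L     re.LOCALE        Make \w, \W, \b, \B and case-insensitive matching locale dependent (bytes patterns only)
re.S     re.DOTALL        Dot matches all (including newline)
re.U     re.UNICODE       Make \w, \b, \d, \s Unicode dependent (already the default for str patterns)
re.X     re.VERBOSE       Readable style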
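re.X (re.VERBOSE) lets a pattern span several lines and carry comments. Here is a sketch using a 24-hour clock pattern, chosen only as an example:

```python
import re

# With re.X, whitespace in the pattern is ignored and "#" starts a comment.
clock = re.compile(r"""
    ([01]\d|2[0-3])   # hours 00-23
    :
    ([0-5]\d)         # minutes 00-59
""", re.X)

print(bool(clock.fullmatch("23:59")))    # True
print(bool(clock.fullmatch("24:00")))    # False
print(clock.fullmatch("07:30").groups()) # ('07', '30')
```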

Regex in JavaScript

test()

let textA = "I like APPles very much";
let textB = "I like APPles";
let regex = /apples$/i;

// Output: false
console.log(regex.test(textA));

// Output: true
console.log(regex.test(textB));

search()

let text = "I like APPles very much";
let regexA = /apples/;
let regexB = /apples/i;

// Output: -1
console.log(text.search(regexA));

// Output: 7
console.log(text.search(regexB));

exec()

let text = "Do you like apples?";
let regex = /apples/;

// Output: apples
console.log(regex.exec(text)[0]);

// Output: Do you like apples?
console.log(regex.exec(text).input);

match()

let text = "Here are apples and apPleS";
let regex = /apples/gi;

// Output: [ "apples", "apPleS" ]
console.log(text.match(regex));

split()

let text = "This 593 string will be brok294en at places where d1gits are.";
let regex = /\d+/g;

// Output: [ "This ", " string will be brok", "en at places where d", "gits are." ]
console.log(text.split(regex));

matchAll()

let regex = /t(e)(st(\d?))/g;
let text = "test1test2";
let array = [...text.matchAll(regex)];

// Output: ["test1", "e", "st1", "1"]
console.log(array[0]);

// Output: ["test2", "e", "st2", "2"]
console.log(array[1]);

replace()

let text = "Do you like aPPles?";
let regex = /apples/i;

// Output: Do you like mangoes?
let result = text.replace(regex, "mangoes");
console.log(result);

replaceAll()

let regex = /apples/gi;
let text = "Here are apples and apPleS";

// Output: Here are mangoes and mangoes
let result = text.replaceAll(regex, "mangoes");
console.log(result);

Regex in PHP

Functions

Function                   Description
preg_match()               Perform a regex match
preg_match_all()           Perform a global regular expression match
preg_replace_callback()    Perform a regular expression search and replace using a callback
preg_replace()             Perform a regular expression search and replace
preg_split()               Split a string by a regex pattern
preg_grep()                Return array entries that match a pattern
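For comparison only: Python's closest analogue of preg_replace_callback() is passing a function to re.sub. The function receives each match object and returns the replacement text. Here is a sketch with a hypothetical double_number() callback; it is not PHP code:

```python
import re

def double_number(match):
    """Hypothetical callback: called once per match; its return value replaces the match."""
    return str(int(match.group(0)) * 2)

result = re.sub(r"\d+", double_number, "3 apples and 10 pears")
print(result)  # 6 apples and 20 pears
```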

preg_replace

$str = "Visit Microsoft!";
$regex = "/microsoft/i";

// Output: Visit CheatSheets!
echo preg_replace($regex, "CheatSheets", $str);

preg_match

$str = "Visit CheatSheets";
$regex = "#cheatsheets#i";

// Output: 1
echo preg_match($regex, $str);

preg_match_all

$regex = "/[a-zA-Z]+ (\d+)/";
$input_str = "June 24, August 13, and December 30";
if (preg_match_all($regex, $input_str, $matches_out)) {

    // Output: 2
    echo count($matches_out);

    // Output: 3
    echo count($matches_out[0]);

    // Output: Array("June 24", "August 13", "December 30")
    print_r($matches_out[0]);

    // Output: Array("24", "13", "30")
    print_r($matches_out[1]);
}

preg_grep

$arr = ["Jane", "jane", "Joan", "JANE"];
$regex = "/Jane/";

// Output: Array([0] => Jane)
print_r(preg_grep($regex, $arr));

preg_split

$str = "Jane\tKate\nLucy Marion";
$regex = "@\s@";

// Output: Array("Jane", "Kate", "Lucy", "Marion")
print_r(preg_split($regex, $str));

Regex in Java

Styles

First way

Pattern p = Pattern.compile(".s", Pattern.CASE_INSENSITIVE);
Matcher m = p.matcher("aS");
boolean s1 = m.matches();
System.out.println(s1);   // Output: true

Second way

boolean s2 = Pattern.compile("[0-9]+").matcher("123").matches();
System.out.println(s2);   // Output: true

Third way

boolean s3 = Pattern.matches(".s", "XXXX");
System.out.println(s3);   // Output: false

Pattern fields

Field               Description
CANON_EQ            Canonical equivalence
CASE_INSENSITIVE    Case-insensitive matching
COMMENTS            Permits whitespace and comments
DOTALL              Dotall mode
MULTILINE           Multiline mode
UNICODE_CASE        Unicode-aware case folding
UNIX_LINES          Unix lines mode

Methods

Pattern

  • Pattern compile(String regex, int flags)
  • boolean matches(String regex, CharSequence input)
  • String[] split(CharSequence input, int limit)
  • String quote(String s)

Matcher

  • int start(int group | String name)
  • int end(int group | String name)
  • boolean find(int start)
  • String group(int group | String name)
  • Matcher reset()

String

  • boolean matches(String regex)
  • String replaceAll(String regex, String replacement)
  • String[] split(String regex, int limit)

There are more methods ...

Examples

Replace sentence:

String regex = "[A-Z\n]{5}$";
String str = "I like APP\nLE";

Pattern p = Pattern.compile(regex, Pattern.MULTILINE);
Matcher m = p.matcher(str);

// Output: I like Apple!
System.out.println(m.replaceAll("pple!"));

Array of all matches:

String str = "She sells seashells by the Seashore";
String regex = "\\w*se\\w*";

Pattern p = Pattern.compile(regex, Pattern.CASE_INSENSITIVE);
Matcher m = p.matcher(str);

List<String> matches = new ArrayList<>();
while (m.find()) {
    matches.add(m.group());
}

// Output: [sells, seashells, Seashore]
System.out.println(matches);

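Regex in MySQL

Functions

Name                Description
REGEXP              Whether a string matches a regex
REGEXP_INSTR()      Starting index of the substring that matches a regex (MySQL 8.0+ only)
REGEXP_LIKE()       Whether a string matches a regex (MySQL 8.0+ only)
REGEXP_REPLACE()    Replace substrings that match a regex (MySQL 8.0+ only)
REGEXP_SUBSTR()     Return the substring that matches a regex (MySQL 8.0+ only)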

REGEXP

expr REGEXP pat

Examples

mysql> SELECT 'abc' REGEXP '^[a-d]';
1
mysql> SELECT name FROM cities WHERE name REGEXP '^A';
mysql> SELECT name FROM cities WHERE name NOT REGEXP '^A';
mysql> SELECT name FROM cities WHERE name REGEXP 'A|B|R';
mysql> SELECT 'a' REGEXP 'A', 'a' REGEXP BINARY 'A';
1   0

REGEXP_REPLACE

REGEXP_REPLACE(expr, pat, repl[, pos[, occurrence[, match_type]]])

Examples

mysql> SELECT REGEXP_REPLACE('a b c', 'b', 'X');
a X c
mysql> SELECT REGEXP_REPLACE('abc ghi', '[a-z]+', 'X', 1, 2);
abc X

REGEXP_SUBSTR

REGEXP_SUBSTR(expr, pat[, pos[, occurrence[, match_type]]])

Examples

mysql> SELECT REGEXP_SUBSTR('abc def ghi', '[a-z]+');
abc
mysql> SELECT REGEXP_SUBSTR('abc def ghi', '[a-z]+', 1, 3);
ghi

REGEXP_LIKE

REGEXP_LIKE(expr, pat[, match_type])

Examples

mysql> SELECT regexp_like('aba', 'b+');
1
mysql> SELECT regexp_like('aba', 'b{2}');
0
mysql> # i: case-insensitive
mysql> SELECT regexp_like('Abba', 'ABBA', 'i');
1
mysql> # m: multi-line
mysql> SELECT regexp_like('a\nb\nc', '^b$', 'm');
1

REGEXP_INSTR

REGEXP_INSTR(expr, pat[, pos[, occurrence[, return_option[, match_type]]]])

Examples

mysql> SELECT regexp_instr('aa aaa aaaa', 'a{3}');
4
mysql> SELECT regexp_instr('abba', 'b{2}', 2);
2
mysql> SELECT regexp_instr('abbabba', 'b{2}', 1, 2);
5
mysql> SELECT regexp_instr('abbabba', 'b{2}', 1, 2, 1);
7
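To make the pos, occurrence and return_option arguments concrete, here is a hypothetical Python emulation of REGEXP_INSTR(). It is not MySQL code and ignores match_type. It also uses Python's regex dialect rather than MySQL's ICU engine, so treat it as a sketch of the position rules only:

```python
import re

def regexp_instr(expr, pat, pos=1, occurrence=1, return_option=0):
    """Hypothetical emulation of MySQL REGEXP_INSTR() position rules.

    Positions are 1-based. Returns 0 when the requested occurrence does not
    exist. Simplification: the search runs on expr[pos-1:], so anchors and
    lookbehinds cannot see text before pos.
    """
    matches = list(re.finditer(pat, expr[pos - 1:]))
    if occurrence > len(matches):
        return 0
    m = matches[occurrence - 1]
    # return_option 0: first character of the match; 1: position just after it
    offset = m.end() if return_option == 1 else m.start()
    return pos + offset

print(regexp_instr("aa aaa aaaa", "a{3}"))       # 4
print(regexp_instr("abba", "b{2}", 2))           # 2
print(regexp_instr("abbabba", "b{2}", 1, 2))     # 5
print(regexp_instr("abbabba", "b{2}", 1, 2, 1))  # 7
print(regexp_instr("abbabba", "b{2}", 1, 3))     # 0: there is no third "bb"
```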