A question for the experts about regular expression usage

import re
text = 'ABCDE'
f1 = re.compile('(B|C|D)')
f2 = re.compile('(B|C|D)+')
print(f1.findall(text))
print(f2.findall(text))

The output is:
['B', 'C', 'D']
['D']

Could someone explain why the second one returns only a single D? Thanks!

5 replies 2018-01-31 15:24:36 +08:00

TimePPT

2018-01-30 14:40:57 +08:00

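Because + is greedy by default.
So the regex in your f2 = re.compile('(B|C|D)+') actually means "match the longest possible run of consecutive characters that are each 'B', 'C', or 'D', stopping at the first character that is not one of them".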

So if text = 'ABCDEFABCGHBCIJCD',
you will find that f2.findall(text) returns ['D', 'C', 'C', 'D'].
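
To make the greedy behaviour visible, here is a small sketch that prints, for each match, both the whole matched text and what the capturing group ends up holding. The lazy variant '(B|C|D)+?' and the variable names are my own additions for comparison, not something from the original post.

```python
import re

text = 'ABCDEFABCGHBCIJCD'
greedy = re.compile('(B|C|D)+')    # the pattern from the question
lazy = re.compile('(B|C|D)+?')     # assumed comparison: non-greedy repetition

# Pair up the whole match (group 0) with the capturing group (group 1).
greedy_pairs = [(m.group(0), m.group(1)) for m in greedy.finditer(text)]
lazy_pairs = [(m.group(0), m.group(1)) for m in lazy.finditer(text)]

print(greedy_pairs)  # each run is matched as a whole; group 1 is its last character
print(lazy_pairs)    # each match is a single character, so group 0 equals group 1
```

With the greedy pattern, every run such as BCD is one match, and the group holds only the run's last character, which is where ['D', 'C', 'C', 'D'] comes from.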

chenstack

2018-01-30 19:00:04 +08:00

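If the pattern contains groups, findall returns the contents of the groups instead of the whole match: a list of strings when there is one group, and a list of tuples when there is more than one. (B|C|D)+ has only one group, and what it holds is D, the last part of the matched substring BCD (see the explanation in #1). For example, if you change the pattern to ((B|C|D)+), f2.findall(text) returns [('BCD', 'D')].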
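
A quick side-by-side of how the number of capturing groups changes what findall returns. The non-capturing form (?:...) is my addition for contrast; the other two patterns are the ones discussed above.

```python
import re

text = 'ABCDE'

no_group = re.findall('(?:B|C|D)+', text)    # no capturing group: whole matches
one_group = re.findall('(B|C|D)+', text)     # one group: list of that group's text
two_groups = re.findall('((B|C|D)+)', text)  # two groups: list of tuples

print(no_group)
print(one_group)
print(two_groups)
```

If all you want is the full matched text, making the group non-capturing is probably the smallest change to the original pattern.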

Ctry

2018-01-31 13:09:44 +08:00

@chenstack
@TimePPT
Hello to you both. I have one question here: why is it that '(B|C|D)+' matches the substring BCD, yet findall() returns only the last part, 'D'? Why doesn't it return the whole 'BCD'?

chenstack

2018-01-31 14:10:52 +08:00

@Ctry Help on function findall in module re:

findall(pattern, string, flags=0)
Return a list of all non-overlapping matches in the string.

If one or more capturing groups are present in the pattern, return
a list of groups; this will be a list of tuples if the pattern
has more than one group.

Empty matches are included in the result.

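Combined with what I said in #2: if you simply want the matched text itself, you can do this
import re
text = 'ABCDE'
f2 = re.compile('(B|C|D)+')
# group(0) is the whole match, regardless of any capturing groups
print([m.group(0) for m in f2.finditer(text)])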

Ctry

2018-01-31 15:24:36 +08:00

@chenstack I understand that the whole match is 'BCD'. What I don't understand is why the returned list of groups contains D rather than B or C. Is this related to greedy matching?
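
A sketch that may help with this last question. The Python documentation for Match.group states that when a group sits inside a part of the pattern that matched multiple times, the last match is returned. So the group is filled with B, then C, then D as the + repeats, and only the final value survives. The variable names below are illustrative.

```python
import re

m = re.search('(B|C|D)+', 'ABCDE')

whole = m.group(0)      # text matched by the entire pattern
last = m.group(1)       # what group 1 holds after its final repetition
last_span = m.span(1)   # position of that final repetition in the string
print(whole, last, last_span)

# To get every repetition, scan the whole match with the bare alternation.
pieces = re.findall('B|C|D', whole)
print(pieces)
```

Greediness decides how far the whole match extends (BCD rather than just B); the "last repetition wins" rule then decides which piece the group reports.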