Regular Expression

2023.09.22
Incomplete Python re Regular Expression

Regular Expression¹

Python: re

Functions

re.search(pattern, string, flags=0)	Finds the first substring of `string` that matches the regular expression `pattern` and returns it as a Match object. Returns None if there is no match. `flags`: values of class re.RegexFlag (matching options such as ignoring case).
re.findall(pattern, string, flags=0)	Finds all substrings of `string` that match the regular expression `pattern`; the matches do not overlap one another. Returns a list, which is empty if nothing matches.
re.match(pattern, string, flags=0)	Matches the regular expression `pattern` only at the beginning of `string`, so the matched substring is always string[0 : i], i ≥ 0. Returns None if there is no match.
re.fullmatch(pattern, string, flags=0)	Checks whether the whole `string` matches the regular expression `pattern`. Returns None otherwise.
re.compile(pattern, flags=0)	`prog = re.compile(pattern); result = prog.match(string)` ≡ `result = re.match(pattern, string)`
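re.split(pattern, string, maxsplit=0, flags=0)	Splits `string` at each occurrence of `pattern` and returns the pieces as a list. A nonzero `maxsplit` limits the number of splits.
re.sub(pattern, repl, string, count=0, flags=0)	`sub`: substitute; `repl`: replacement. Returns `string` with the leftmost non-overlapping matches of `pattern` replaced by `repl`. A nonzero `count` limits the number of replacements.
re.subn(pattern, repl, string, count=0, flags=0)	Same as re.sub, but returns the tuple (new_string, number_of_substitutions).
re.escape(pattern)	Escapes the special characters in `pattern` so that it is matched as literal text.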
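
The functions in the table can be compared side by side on one input. This is only a minimal sketch: the sample string `text` and the pattern `date` are made up for illustration and do not come from any real data.

```python
import re

text = "2023-09-22 and 2025-04-11"   # made-up sample string
date = r"\d{4}-\d{2}-\d{2}"           # hypothetical pattern for YYYY-MM-DD

# re.search: first match anywhere in the string
first = re.search(date, text).group()            # '2023-09-22'

# re.findall: every non-overlapping match, as a list (empty if none)
every = re.findall(date, text)                   # ['2023-09-22', '2025-04-11']
none_found = re.findall(r"\d{5}", text)          # []

# re.match: the match must start at index 0
at_start = re.match(date, text).group()          # '2023-09-22'
not_at_start = re.match(r"and", text)            # None

# re.fullmatch: the whole string must match
whole = re.fullmatch(date, text)                 # None
whole_ok = re.fullmatch(date, "2023-09-22") is not None   # True

# flags: matching options such as ignoring case
ignore_case = re.search(r"AND", text, flags=re.IGNORECASE).group()  # 'and'

# re.compile: build the pattern object once, then reuse it
prog = re.compile(date)
compiled = prog.findall(text)                    # same result as `every`

# re.split / re.sub / re.subn
pieces = re.split(r"\s+and\s+", text)            # ['2023-09-22', '2025-04-11']
replaced = re.sub(r"-", "/", text, count=2)      # '2023/09/22 and 2025-04-11'
replaced_all, n = re.subn(r"-", "/", text)       # ('2023/09/22 and 2025/04/11', 4)

# re.escape: "." becomes a literal dot instead of "any character"
literal = re.search(re.escape("1.5"), "125 1.5").span()   # (4, 7)
```

Without re.escape, the pattern "1.5" in the last line would match "125" at the start of the string, because an unescaped "." matches any character.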

class re.Match

Match.group()

Returns one or more subgroups of the match.

import re

m = re.match(r"(\w+) (\w+)", "Isaac Newton, physicist")
m.group(0)       # The entire match: 'Isaac Newton'
m.group(1)       # The first parenthesized subgroup: 'Isaac'
m.group(2)       # The second parenthesized subgroup: 'Newton'
                 # The pattern contains only two (\w+) groups,
                 #   so "physicist" is not part of the match.
m.group(1, 2)    # Multiple arguments give us a tuple: ('Isaac', 'Newton')

Use the (?P<name>...) syntax to name subgroups; a named subgroup can then be retrieved by name.

r"(\w+) (\w+)" ⇒ r"(?P<first_name>\w+) (?P<last_name>\w+)"

m = re.match(r"(?P<first_name>\w+) (?P<last_name>\w+)",
             "Isaac Newton, physicist")
m.group('first_name') # 'Isaac'
m.group('last_name')  # 'Newton'

Match.__getitem__(g)

m[g] ≡ m.group(g)
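
As a small sketch of this equivalence, reusing the Isaac Newton example from above (the variable names here are only for illustration), indexing works with both group numbers and group names:

```python
import re

m = re.match(r"(?P<first_name>\w+) (?P<last_name>\w+)",
             "Isaac Newton, physicist")

whole = m[0]             # same as m.group(0): 'Isaac Newton'
first = m[1]             # same as m.group(1): 'Isaac'
last = m["last_name"]    # same as m.group('last_name'): 'Newton'
```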

Match.expand(template)

m = re.match(r"(?P<first_name>\w+) (?P<last_name>\w+)",
             "Isaac Newton, physicist")
print(m.expand(r'His name is \1 \2'))
  # His name is Isaac Newton
print(m.expand(r'His name is \g<1> \g<2>'))
  # His name is Isaac Newton
print(m.expand(r'His name is \g<first_name> \g<last_name>'))
  # His name is Isaac Newton

Match.groups(default=None)

Return a tuple containing all the subgroups of the match, from 1 up to however many groups are in the pattern.

The default argument is used for groups that did not participate in the match; it defaults to None.

m = re.match(r"(\d+)\.(\d+)\s(\d+)", "12.345 ")
print(m)        # None

m = re.match(r"(\d+)\.(\d+)\s?(\d+)?", "12.345")
m.groups()      # ('12', '345', None)
m.groups('678') # ('12', '345', '678')

Match.groupdict(default=None)

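Returns a dictionary containing all the named subgroups of the match, keyed by subgroup name. The default argument is used for groups that did not participate in the match.

See also: (?P<name>...)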
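
A short sketch of groupdict(), assuming the same named-group pattern used earlier. The second pattern and its group names `int` and `frac` are hypothetical, chosen only to show how `default` applies to a group that did not take part in the match.

```python
import re

m = re.match(r"(?P<first_name>\w+) (?P<last_name>\w+)",
             "Isaac Newton, physicist")
names = m.groupdict()
# {'first_name': 'Isaac', 'last_name': 'Newton'}

# The optional fractional part is absent in "12",
# so the group 'frac' does not participate in the match.
m2 = re.match(r"(?P<int>\d+)(?:\.(?P<frac>\d+))?", "12")
without_default = m2.groupdict()     # {'int': '12', 'frac': None}
with_default = m2.groupdict("0")     # {'int': '12', 'frac': '0'}
```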

Tools

https://pythex.org/

Last Updated on 2025/04/11 by A1go

References

1. WIKI: Regular Expression