Regular Expressions Commonly Used in Web Scraping

Expressions

Published: 2021-11-16

Updated: 2021-11-16

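Single characters
- . : any character except a newline
- [] : [aoe] [a-w] matches any one character in the set
- \d : metacharacter matching any one digit from 0 to 9
- \D : any one non-digit character
- \w : a letter, digit, or underscore; in Python 3 this also covers Chinese characters
- \W : any character not matched by \w
- \s : any whitespace character, including space, tab, newline, and form feed; for ASCII text, equivalent to [ \f\n\r\t\v]
- \S : any non-whitespace character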
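A minimal sketch of the single-character classes above, using Python's built-in re module. The sample strings are made up for illustration and do not come from any real page.

```python
import re

# Made-up sample text mixing ASCII, Chinese characters, a space, and a tab
text = "Item_7: 价格 42元\tok"

# . matches any character except a newline, so "a\nc" is skipped
dots = re.findall(r"a.c", "abc a\nc")
# [] matches exactly one character from the set or range
vowels = re.findall(r"[aoe]", "hello world")
ranged = re.findall(r"[a-w]", "xyz wow")
# \d matches one digit, \D one non-digit
digits = re.findall(r"\d", text)
non_digits = re.findall(r"\D", "a1b2")
# \w+ matches runs of letters, digits, underscores (Chinese included in Python 3)
words = re.findall(r"\w+", text)
non_word = re.findall(r"\W", "a-b c")
# \s matches each whitespace character, \S+ runs of non-whitespace
spaces = re.findall(r"\s", text)
non_space = re.findall(r"\S+", "a b\tc")

print(words)   # ['Item_7', '价格', '42元', 'ok']
print(digits)  # ['7', '4', '2']
```

Matching Chinese characters with \w relies on the default Unicode behavior of str patterns in Python 3; passing re.ASCII restricts \w to [a-zA-Z0-9_].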
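Quantifiers
- * : any number of times (>= 0)
- + : at least once (>= 1)
- ? : optional, zero times or once
- {m} : exactly m times, e.g. hello{3} matches hellooo
- {m,} : at least m times
- {m,n} : between m and n times (inclusive)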
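A short sketch of each quantifier with Python's re module; the input strings are invented examples.

```python
import re

sample = "hell hello helloo"
# * : zero or more o's, so plain "hell" also matches
star = re.findall(r"hello*", sample)
# + : at least one o
plus = re.findall(r"hello+", sample)
# ? : the u is optional
optional = re.findall(r"colou?r", "color colour")
# {m} : exactly three o's after "hell"
exact = re.findall(r"hello{3}", "hellooo hello")

numbers = "7 42 170 2021"
# {m,} : at least three digits in a row
at_least = re.findall(r"\d{3,}", numbers)
# {m,n} : two to three digits; greedy, so "2021" yields only "202"
between = re.findall(r"\d{2,3}", numbers)

print(star)     # ['hell', 'hello', 'helloo']
print(between)  # ['42', '170', '202']
```

Quantifiers are greedy by default: {2,3} takes three digits whenever it can, which is why the trailing "1" of "2021" is left unmatched.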
Boundaries
- $ : anchors the match to the end of the string
- ^ : anchors the match to the start of the string
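A minimal sketch of the two anchors; is_all_digits is a hypothetical helper written only for this illustration.

```python
import re

line = "python is fun"
starts = re.findall(r"^python", line)      # matches: "python" is at the start
ends = re.findall(r"fun$", line)           # matches: "fun" is at the end
not_at_start = re.findall(r"^fun", line)   # no match: "fun" is not at the start


def is_all_digits(s: str) -> bool:
    """Hypothetical helper: anchoring both ends checks the whole string."""
    return re.search(r"^\d+$", s) is not None


print(starts, ends, not_at_start)  # ['python'] ['fun'] []
print(is_all_digits("2021"), is_all_digits("2021a"))  # True False
```

One subtlety: $ also matches just before a single trailing newline, so "2021\n" passes this check; re.fullmatch(r"\d+", s) is the stricter option.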
Grouping
- (ab) : treats ab as a single unit and captures the text it matched
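A sketch of how capturing groups behave with re.findall and re.search. The email strings are made up, and the non-capturing form (?:...) is an extra Python feature not listed above.

```python
import re

emails = "am@shao.com bob@test.com"
# Two groups: findall returns one tuple of group texts per match
pairs = re.findall(r"(\w+)@(\w+)\.com", emails)

# search returns a match object; group(0) is the whole match
m = re.search(r"(\w+)@(\w+)\.com", emails)
whole, user, domain = m.group(0), m.group(1), m.group(2)

# (ab)+ repeats "ab" as a unit; group(0) holds all repetitions
repeated = re.search(r"(ab)+", "xxababab").group(0)
# a repeated group keeps only its last repetition, so findall gives ['ab']
last_only = re.findall(r"(ab)+", "xxababab")
# (?:...) groups without capturing, so findall returns the whole match
non_capturing = re.findall(r"(?:ab)+", "xxababab")

print(pairs)  # [('am', 'shao'), ('bob', 'test')]
```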
Greedy mode : .* (matches as much as possible)
Non-greedy (lazy) mode : .*? (matches as little as possible)
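The difference between the two modes shows up as soon as a tag appears more than once. This sketch uses an invented HTML snippet:

```python
import re

html = "<b>one</b><b>two</b>"
# Greedy: .* runs to the LAST </b>, swallowing the tags in between
greedy = re.findall(r"<b>(.*)</b>", html)
# Lazy: .*? stops at the FIRST </b>, giving one result per tag pair
lazy = re.findall(r"<b>(.*?)</b>", html)

print(greedy)  # ['one</b><b>two']
print(lazy)    # ['one', 'two']
```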
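re.I : ignore case
re.M : multiline mode, so ^ and $ match at the start and end of every line
re.S : single-line (dot-all) mode, so . also matches newlines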
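A sketch of the three flags, passed as the third argument of re.findall; all input strings are made up.

```python
import re

# re.I: case-insensitive matching
ci = re.findall(r"python", "Python PYTHON python", re.I)

text = "first line\nsecond line"
# Without re.M, ^ only matches at the start of the whole string
no_m = re.findall(r"^\w+", text)
with_m = re.findall(r"^\w+", text, re.M)

html = "<div>\nhello\n</div>"
# Without re.S, . cannot cross the newlines, so nothing matches
no_s = re.findall(r"<div>(.*)</div>", html)
with_s = re.findall(r"<div>(.*)</div>", html, re.S)

# Flags can be combined with |
both = re.findall(r"<DIV>(.*?)</div>", html, re.I | re.S)

print(with_m)  # ['first', 'second']
print(with_s)  # ['\nhello\n']
```

re.S is the usual companion to .*? when scraping HTML, because page content rarely sits on a single line.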
re.sub : (pattern, replacement, string); returns the string with every match of the pattern replaced
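A few typical re.sub calls; the phone number and other inputs are invented for this sketch.

```python
import re

phone = "tel: 010-12345678"
# Replace every digit with *
masked = re.sub(r"\d", "*", phone)
# Collapse any run of whitespace into a single space
tidy = re.sub(r"\s+", " ", "a  b\t\nc")
# count limits how many replacements are made
first_two = re.sub(r"\d", "*", "a1b2c3", count=2)
# \1 and \2 in the replacement refer to the captured groups
swapped = re.sub(r"(\w+)@(\w+)", r"\2@\1", "am@shao")

print(masked)   # tel: ***-********
print(swapped)  # shao@am
```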

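```python
import re

# findall returns every match as a list; [1] picks the second one
key = "javapython1mysqlpython1"
print(re.findall('python1', key)[1])  # python1

# The capturing group (.*) returns only the text between the tags
key = "<html><h1>hello world</h1></html>"
print(re.findall(r'<h1>(.*)</h1>', key))  # ['hello world']

# \d matches each digit separately
string = "I like 170 girl"
print(re.findall(r'\d', string))  # ['1', '7', '0']

# Literal text: only the https:// prefix matches
key = "http://www.baidu.com and https://www.shaoshaossm.github.io"
print(re.findall('https://', key))  # ['https://']

# Lazy .*? stops at the first literal dot after an "s"
key = 'am@shao.com'
print(re.findall(r's.*?\.', key))  # ['shao.']

# a{1,2} allows one or two a's, so "saaas" is not matched
key = 'saas and asa and saaas'
print(re.findall('sa{1,2}s', key))  # ['saas']
```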