Python's re Module (Regular Expressions) - 哈喽比特

1837 reads | Published 6 years ago

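The re module is mainly used for string and text processing: finding, searching, replacing, and so on.

Let's start with a quick review of basic regular expression syntax.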

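.: matches any single character except a newline

*: matches zero or more repetitions of the element before it, taking as many as possible (commonly called greedy matching)

+: matches one or more repetitions of the element before it

|: matches the expression before the | or the expression after it

^: matches the start of the string (and the start of each line when re.M is set)

$: matches the end of the string (and the end of each line when re.M is set)

?: matches zero or one occurrence of the element before it, never more

\: escapes the character that follows it, or introduces a special sequence

[]: matches any single character listed inside the brackets; [0-9] matches any one digit from 0 to 9

(): treats the content inside the parentheses as a single unit

{}: repeats the preceding element the number of times given inside the braces; 100[0-9]{3} matches 100 followed by any three digits (100000 to 100999)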
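The metacharacters above are easier to remember once you have seen them run. The following is a minimal sketch for Python 3; the sample strings and variable names are made up for illustration and are not from the original article.

```python
import re

# "." matches any single character except a newline.
dot = re.findall(r"h.t", "hat hot h\nt hit")

# "*", "+" and "?" repeat the element just before them.
star = re.fullmatch(r"ab*", "a") is not None   # zero b's is allowed
plus = re.fullmatch(r"ab+", "a") is not None   # at least one b is required
optional = re.findall(r"colou?r", "color colour colouur")

# "|" picks either side, "()" groups, "[]" is a character set.
either = re.findall(r"cat|dog", "cat dog bird")
grouped = re.fullmatch(r"(ab)+", "ababab") is not None
digits = re.findall(r"[0-9]", "a1b22")

# "{3}" repeats the previous element exactly three times.
six_digits = re.fullmatch(r"100[0-9]{3}", "100042") is not None
too_short = re.fullmatch(r"100[0-9]{3}", "1000") is not None

# "^" and "$" anchor the pattern to the start and the end.
anchored = re.search(r"^hello$", "hello") is not None

print(dot)        # ['hat', 'hot', 'hit']
print(optional)   # ['color', 'colour']
print(either, digits)
print(star, plus, grouped, six_digits, too_short, anchored)
```

Note that `100[0-9]{3}` needs three more digits after `100`, so `1000` does not match as a whole.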

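Metacharacters in Python that begin with \:

Special sequence: Meaning

\A: matches only at the start of the string

\Z: matches only at the end of the string

\b: matches the empty string at the start or end of a word

\B: matches the empty string that is not at the start or end of a word

\d: equivalent to [0-9]

\D: equivalent to [^0-9]

\s: matches any whitespace character: [ \t\n\r\f\v]

\S: matches any non-whitespace character: [^ \t\n\r\f\v]

\w: matches any letter, digit, or underscore: [a-zA-Z0-9_]

\W: matches any character that is not a letter, digit, or underscore: [^a-zA-Z0-9_]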
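Here is a small sketch of the special sequences in action, written for Python 3. The sample strings are invented for illustration. One caveat: for Python 3 `str` patterns, `\d`, `\w` and `\s` also match non-ASCII Unicode characters unless the `re.ASCII` flag is passed, so the bracket equivalents listed above hold exactly only for ASCII input.

```python
import re

text = "Order 66 shipped to bay_7 at 09:30"

numbers = re.findall(r"\d+", text)       # runs of digits
words = re.findall(r"\w+", text)         # runs of letters, digits, underscores
letters = re.findall(r"\D+", "a1b22c")   # runs of non-digits
spaces = re.findall(r"\s", "a b\tc\n")   # individual whitespace characters
chunks = re.findall(r"\S+", "a b\tc")    # runs of non-whitespace
symbols = re.findall(r"\W", "a-b c")     # anything outside \w

# \b is a word boundary, \B is a position that is not a word boundary.
whole_word = re.findall(r"\bcat\b", "cat concat cat's")
inside_word = re.findall(r"\Bcat", "cat concat")

# \A matches only at the very start of the string, even in multi-line
# mode, whereas ^ with re.M also matches after each newline.
string_start = re.findall(r"\Aman", "man\nman", re.M)
line_start = re.findall(r"^man", "man\nman", re.M)

print(numbers)      # ['66', '7', '09', '30']
print(words)
print(whole_word)   # ['cat', 'cat']
print(string_start, line_start)
```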

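Regular expression syntax table

Syntax, meaning, notes

"." Any character except a newline.

"^" Start of the string. '^hello' matches 'helloworld' but not 'aaaahellobbb'.

"$" End of the string. Works the same way as "^".

"*" Zero or more repetitions of the preceding element (greedy). '<.*>' matches all of '<title>chinaunix</title>'.

"+" One or more repetitions of the preceding element (greedy). Works the same way as "*".

"?" Zero or one occurrence of the preceding element (greedy). Works the same way as "*".
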
*?, +?, ?? Non-greedy versions of the three quantifiers above; they match as little as possible. '<.*?>' matches only '<title>' in '<title>chinaunix</title>'.

{m,n} Repeats the preceding element m to n times; {m} is also allowed. a{6} matches six a's, and a{2,4} matches two to four a's.

{m,n}? Repeats the preceding element m to n times, taking as few as possible. In 'aaaaaa', a{2,4}? matches only two a's.

"\\" Escapes a special character or introduces a special sequence.

[] A character set: [0-9], [a-z], [A-Z], [^0].

"|" Or: A|B matches either A or B.

(...) Matches the expression inside the parentheses and treats it as a group.

(?#...) A comment; it is ignored.

(?=...) Matches if ... matches next, but doesn't consume the string. 'hello(?=test)' matches hello in hellotest.

(?!...) Matches if ... doesn't match next. 'hello(?!test)' matches hello only when it is not followed by test.

(?<=...) Matches if preceded by ... (must be fixed length). '(?<=hello)test' matches test in hellotest.

(?<!...) Matches if not preceded by ... (must be fixed length). '(?<!hello)test' does not match test in hellotest.

Matching flags and their meanings

Flag, meaning

re.I Ignore case.

re.L Make \w, \W, \b, \B, \s and \S depend on the current locale.

re.M Multi-line mode: ^ and $ also match at the start and end of each line.

re.S Make the "." metacharacter match newlines as well.

re.U Match Unicode characters.

re.X Ignore whitespace in the pattern and allow comments starting with "#".

Sample text (taken from a Linux passwd file)

    man:x:6:12:man:/var/cache/man:/bin/nologin

The re module has three search functions. Each one accepts three arguments: the pattern, the string to search, and optional flags. search() and match() return a match object when they find a match and None when they do not.

findall(): finds every substring that matches the regular expression and returns them as a list

search(): scans the whole string and returns a match object for the first match

match(): matches only at the beginning of the string and does not look any further; returns a match object

    lovelinux@LoveLinux:~/py/boke$ cat text
    man:x:6:12:man:/var/cache/man:/bin/sh
    lovelinux@LoveLinux:~/py/boke$ cat test.py
    #!/usr/bin/env python
    #coding:utf-8

    import re
    with open('text','r') as txt:
        f = txt.read()
    print re.match('bin',f)
    print re.search('bin',f).end()
    lovelinux@LoveLinux:~/py/boke$ python test.py
    None
    34
    lovelinux@LoveLinux:~/py/boke$ vim test.py
    lovelinux@LoveLinux:~/py/boke$ python test.py
    None
    <_sre.SRE_Match object at 0x7f12fc9f9ed0>

The returned match object has two methods worth knowing:

start(): returns the index where the match begins

end(): returns the index just past the last matched character

    lovelinux@LoveLinux:~/py/boke$ python test.py
    None
    31
    34
    lovelinux@LoveLinux:~/py/boke$ cat test.py
    #!/usr/bin/env python
    #coding:utf-8

    import re
    with open('text','r') as txt:
        f = txt.read()
    print re.match('bin',f)
    print re.search('bin',f).start()
    print re.search('bin',f).end()
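The terminal session above uses the Python 2 print statement. As a rough Python 3 equivalent, here is a minimal sketch that puts match(), search(), findall(), start() and end() side by side. It assumes the passwd line is held in a string rather than read from the `text` file, and the variable names are illustrative only.

```python
import re

# The same line the article stores in its "text" file.
line = "man:x:6:12:man:/var/cache/man:/bin/sh"

# match() only tries at position 0, so "bin" is not found there.
at_start = re.match("bin", line)

# search() scans forward and returns the first match object.
anywhere = re.search("bin", line)

# findall() returns every matching substring as a list.
all_hits = re.findall("man", line)

# start() and end() give the slice boundaries of the match.
span = (anywhere.start(), anywhere.end())
matched_text = line[anywhere.start():anywhere.end()]

# The third argument is the flags value, here case-insensitive matching.
ignore_case = re.findall("MAN", line, re.I)

print(at_start)       # None
print(span)           # (31, 34)
print(matched_text)   # bin
print(all_hits)       # ['man', 'man', 'man']
print(ignore_case)
```

Because search() and match() return None when nothing matches, it is safer to check the result before calling start() or end() on it.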