Building Highly Available Linux Servers (3rd Edition): 3.3 Basic Regular Expressions

xiaoxiao, 2023-08-27

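This excerpt comes from Section 3.3 of Chapter 3 of Building Highly Available Linux Servers (3rd Edition) by Yu Hongchun, published by Huazhang Press. More chapters are available through the "Huazhang Computer" official account on the Yunqi Community.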

3.3 Basic Regular Expressions

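The first thing to remember is that regular expressions are not the same as shell wildcards. The same symbols mean different things in each!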
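The difference is easiest to see when the same pattern text is given to both mechanisms. The bash sketch below is an illustration, not taken from the book. It uses an arbitrary word list and checks each word against g*d twice: once as a shell wildcard through case, and once as a regular expression through grep -x, which requires the whole word to match.

```shell
#!/usr/bin/env bash
# Same pattern text, two meanings: shell wildcard vs. regular expression.
# The word list is an arbitrary example chosen for the demo.
words="good gd ggd gold d"

glob_matches=""
regex_matches=""
for w in $words; do
  # Wildcard: * matches any string, so g*d means "starts with g, ends with d".
  case "$w" in
    g*d) glob_matches="$glob_matches $w" ;;
  esac
  # Regex: * repeats the preceding g zero or more times, and -x makes grep
  # match the whole word, so only d, gd, ggd, ... qualify.
  if printf '%s\n' "$w" | grep -qx 'g*d'; then
    regex_matches="$regex_matches $w"
  fi
done

echo "wildcard g*d:$glob_matches"
echo "regex    g*d:$regex_matches"
```

The script prints `wildcard g*d: good gd ggd gold` and `regex    g*d: gd ggd d`. The wildcard accepts good and gold, while the regex rejects both and accepts the bare d.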

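A regular expression is only a notation. Any tool that supports the notation can use it to process strings. Vim, grep, Awk and Sed all support regular expressions, and that support is a large part of what makes them powerful. I learned regular expressions by starting with examples: once the basic examples are mastered, the theory largely follows. This section therefore uses a series of grep examples to show what regular expressions can do. First, a look at grep itself.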

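As introduced earlier, grep takes the following form:

```
grep -[acinv] 'search string' filename
```

where:

-a: treat the file as text, even if it looks binary.

-c: print the number of matching lines instead of the lines themselves.

-i: ignore the case of letters.

-n: also print the line number of each match.

-v: invert the selection, that is, print the lines that do not contain the search string.

The search string can also be a regular expression.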
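A quick way to see these options at work is to run them against a throwaway file. The sketch below builds a hypothetical five-line sample in a temporary directory (the lines are invented for the demo and are not the book's regular_express.txt). It then applies -c, -i and -n.

```shell
#!/usr/bin/env bash
# Hypothetical sample file, created in a temporary directory for the demo.
tmpdir=$(mktemp -d)
trap 'rm -rf "$tmpdir"' EXIT
sample="$tmpdir/sample.txt"
cat > "$sample" <<'EOF'
The quick test
the soup taste good
no match here
THE END
another line with the word
EOF

# -c: print the number of matching lines instead of the lines.
echo "lines with 'the':      $(grep -c 'the' "$sample")"
# -i: ignore case, so The and THE count as well.
echo "lines with 'the' (-i): $(grep -ci 'the' "$sample")"
# -n: prefix each matching line with its line number.
# Note that "another" matches too: grep looks for the substring anywhere.
grep -n 'the' "$sample"
```

The counts are 2 without -i and 4 with it. The -n run lists lines 2 and 5, and line 5 matches through both "another" and "the".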

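Here are some examples.

1) Search for lines containing the, and print their line numbers:

```
# grep -n 'the' regular_express.txt
```

Search for lines that do not contain the, again with line numbers:

```
# grep -nv 'the' regular_express.txt
```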
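Because -v selects exactly the lines that the plain search rejects, the two counts add up to the number of lines in the file. The sketch below checks this on a hypothetical sample: a few lines copied from the output shown in this section, plus one blank line.

```shell
#!/usr/bin/env bash
# Hypothetical sample: a few lines copied from the output in this section,
# plus one blank line.
tmpdir=$(mktemp -d)
trap 'rm -rf "$tmpdir"' EXIT
sample="$tmpdir/sample.txt"
cat > "$sample" <<'EOF'
I can't finish the test.
Oh! the soup taste good!
GNU is free air not free beer.

#I am VBird
EOF

matched=$(grep -c 'the' "$sample")     # lines containing "the"
unmatched=$(grep -vc 'the' "$sample")  # every other line, blank line included
total=$(( $(wc -l < "$sample") ))      # $(( )) strips any padding from wc
echo "match: $matched, no match: $unmatched, total: $total"

# -nv lists the non-matching lines with their numbers.
grep -nv 'the' "$sample"
```

The script reports 2 matching lines, 3 non-matching lines and 5 in total. The -nv listing shows lines 3, 4 (the empty line) and 5.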

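2) Use [] to search for a set of characters.

[] matches any single character listed inside it. For example, [ade] matches a, d or e.

```
# grep -n 't[ae]st' regular_express.txt
8:I can't finish the test.
9:Oh! the soup taste good!
```

Putting ^ first inside the brackets negates the set, so it matches any character except those listed. For example, to find lines where oo is not preceded by g, use '[^g]oo' as the search string:

```
# grep -n '[^g]oo' regular_express.txt
2:apple is my favorite food.
3:Football game is not use feet only.
```

Ranges can also be used inside []. [a-z] means any of the 26 lowercase letters, [0-9] means any digit from 0 to 9, and [A-Z] means any of the 26 uppercase letters, so [a-zA-Z0-9] means any English letter or digit. Ranges can likewise be combined with ^ to exclude characters.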
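The bracket rules above can be tried on a few lines copied from the sample output in this section. This is only a sketch. The file is a hypothetical excerpt, so its line numbers differ from those in the book's regular_express.txt.

```shell
#!/usr/bin/env bash
# Hypothetical excerpt built from lines shown in this section.
tmpdir=$(mktemp -d)
trap 'rm -rf "$tmpdir"' EXIT
sample="$tmpdir/sample.txt"
cat > "$sample" <<'EOF'
apple is my favorite food.
I can't finish the test.
Oh! the soup taste good!
google is the best tools for search keyword.
However ,this dress is about $ 3183 dollars.
EOF

# [ae] is one character, either a or e: matches "test" and "taste".
grep -n 't[ae]st' "$sample"
# [^g]oo: oo preceded by any character except g.
grep -n '[^g]oo' "$sample"
# [0-9]: any single digit.
grep -n '[0-9]' "$sample"
```

Line 4 matches [^g]oo only because of "tools". The oo in "google" is preceded by g, so on its own it would not match.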

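Search for lines that contain a digit:

```
# grep -n '[0-9]' regular_express.txt
5:However ,this dress is about $ 3183 dollars.
15:You are the best is menu you are the no.1.
```

3) The line-start anchor ^ and the line-end anchor $. ^ marks the beginning of a line and $ marks its end. Both match positions, not characters, so '^$' matches an empty line: nothing lies between its start and its end. This ^ differs from the ^ used inside []. Here it means that the string following ^ must appear at the start of the line. For example, to search for lines that begin with the: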

```
# grep -n '^the' regular_express.txt
12:the symbol '*' is represented as star.
```

4) Search for lines that begin with a lowercase letter.

The command is:

```
# grep -n '^[a-z]' regular_express.txt
2:apple is my favorite food.
4:this dress doesn't fit me.
10:motorcycle is cheap than car.
12:the symbol '*' is represented as star.
18:google is the best tools for search keyword.
19:goooooogle yes!
20:go! go! Let's go.
```

5) Search for lines that do not begin with an English letter. The command is:

```
# grep -n '^[^a-zA-Z]' regular_express.txt
1:"Open Source" is a good mechanism to develop programs.
21:#I am VBird
```

$ means that the string before it must appear at the end of the line. For example, '\.$' matches a dot (.) at the end of a line. To search for lines that end in a dot:

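```
# grep -n '\.$' regular_express.txt
```

The dot (.) is a special character in regular expressions, so it must be escaped with \. The result is:

```
1:"Open Source" is a good mechanism to develop programs.
2:apple is my favorite food.
3:Football game is not use feet only.
4:this dress doesn't fit me.
5:However ,this dress is about $ 3183 dollars.
6:GNU is free air not free beer.
...
```

6) Be careful with text files created on Microsoft Windows. Every line break there carries an extra carriage-return character, displayed as ^M, so the last character of each line is a hidden ^M. Pay special attention to this when processing text that came from Windows!

You can remove the ^M characters with:

```
cat dos_file | tr -d '\r' > unix_file
```

Once they are gone, '^$' matches exactly the empty lines, which contain nothing but a line start and a line end.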
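The sketch below shows why the hidden carriage return matters. It assumes GNU grep on Linux, which treats \r as an ordinary character. With CRLF endings, '\.$' finds nothing because \r sits between the dot and the end of the line. For the same reason, a "blank" CRLF line does not match '^$'. The file names dos_file and unix_file follow the command above, and their contents are made up for the demo.

```shell
#!/usr/bin/env bash
# Assumes GNU grep on Linux, which treats \r as an ordinary character.
tmpdir=$(mktemp -d)
trap 'rm -rf "$tmpdir"' EXIT
cd "$tmpdir" || exit 1

# Two lines ending in a dot and one blank line, all with CRLF endings.
printf 'apple is my favorite food.\r\n\r\nGNU is free air not free beer.\r\n' > dos_file

before_dot=$(grep -c '\.$' dos_file)   # 0: each dot is followed by \r
before_blank=$(grep -c '^$' dos_file)  # 0: the "blank" line still holds \r

# Strip the carriage returns, then search again.
cat dos_file | tr -d '\r' > unix_file
after_dot=$(grep -c '\.$' unix_file)
after_blank=$(grep -c '^$' unix_file)

echo "ends with '.': $before_dot -> $after_dot; empty lines: $before_blank -> $after_blank"
```

Before cleanup, both searches find nothing. After the tr step, two lines end in a dot and one line is empty.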

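The command to search for empty lines is:

```
# grep -n '^$' regular_express.txt
22:
23:
```

7) The command to search for non-empty lines is:

```
# grep -vn '^$' regular_express.txt
1:"Open Source" is a good mechanism to develop programs.
2:apple is my favorite food.
3:Football game is not use feet only.
4:this dress doesn't fit me.
```

8) The repetition character * and the any-character dot (.). In bash, * is a wildcard that stands for any number of arbitrary characters. In a regular expression it means something different: zero or more occurrences of the preceding character. Keep the two apart. For example, in oo* the first o must be present, while the second o may appear once, many times or not at all, so oo* means at least one o. The dot (.) stands for exactly one arbitrary character, and that character must be present. In the following example, the wildcard pattern g??d is written as the regex 'g..d'; good, gxxd, gabd and so on all fit g??d.

```
# grep -n 'g..d' regular_express.txt
1:"Open Source" is a good mechanism to develop programs.
9:Oh! the soup taste good!
16:The world is the same with 'glad'.
```

Search for strings that contain two or more consecutive o's:

```
# grep -n 'ooo*' regular_express.txt
1:"Open Source" is a good mechanism to develop programs.
2:apple is my favorite food.
3:Football game is not use feet only.
9:Oh! the soup taste good!
18:google is the best tools for search keyword.
19:goooooogle yes!
```

In 'ooo*', the first two o's must be present, while the third may be absent or repeated any number of times. To search for strings that start and end with g and have at least one o in between, that is, gog, goog, gooog and so on:

```
# grep -n 'goo*g' regular_express.txt
18:google is the best tools for search keyword.
19:goooooogle yes!
```

9) Use {} to limit the number of consecutive repetitions.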

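The * symbol can only express zero or more occurrences of the preceding character. To limit the number of repetitions precisely, use the {range} form. The range consists of numbers separated by a comma (,): "2,5" means 2 to 5 repetitions, "2" means exactly 2, and "2," means 2 or more.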
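As the paragraph above says, * by itself only means "zero or more", so it can set a minimum but never a maximum. The sketch below runs the patterns from item 8 against a made-up five-word file and prints the numbers of the matching lines. The words were chosen for the demo.

```shell
#!/usr/bin/env bash
tmpdir=$(mktemp -d)
trap 'rm -rf "$tmpdir"' EXIT
sample="$tmpdir/words.txt"
cat > "$sample" <<'EOF'
gd
god
good
goooooogle
glad
EOF

# Print the numbers of the lines that match a basic regular expression.
show() {
  printf '%-8s -> ' "$1"
  grep -n "$1" "$sample" | cut -d: -f1 | tr '\n' ' '
  echo
}

show 'g..d'   # g, exactly two arbitrary characters, d: good, glad
show 'goo*d'  # g, at least one o, d: god, good
show 'ooo*'   # two or more o's in a row, no upper limit: good, goooooogle
```

'ooo*' accepts both the 2 o's in good and the 6 in goooooogle. Capping the count requires the interval form.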

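Note: In basic regular expressions (plain grep), braces are ordinary characters unless escaped, so the repetition range must be written with \{ and \}. Keeping the pattern in single quotes stops the shell from interpreting the braces or backslashes. With grep -E, the braces are written without backslashes.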
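The sketch below checks the interval syntax on two lines from the sample output, google and goooooogle, written to a made-up two-line file rather than the book's regular_express.txt. It assumes GNU grep and also shows the grep -E spelling, where the braces need no backslashes.

```shell
#!/usr/bin/env bash
tmpdir=$(mktemp -d)
trap 'rm -rf "$tmpdir"' EXIT
sample="$tmpdir/sample.txt"
cat > "$sample" <<'EOF'
google is the best tools for search keyword.
goooooogle yes!
EOF

# Basic regex (plain grep): the interval braces must be escaped.
bre_2_5=$(grep -c 'go\{2,5\}g' "$sample")     # google only (goooooogle has 6 o's)
bre_2_more=$(grep -c 'go\{2,\}g' "$sample")   # both lines
# Extended regex (grep -E): the same interval without backslashes.
ere_2_5=$(grep -Ec 'go{2,5}g' "$sample")
# Unescaped braces in a basic regex are literal characters,
# so this looks for the text "go{2,5}g" and finds nothing.
literal=$(grep -c 'go{2,5}g' "$sample")

echo "BRE 2-5: $bre_2_5, BRE 2+: $bre_2_more, ERE 2-5: $ere_2_5, literal: $literal"
```

The script prints counts of 1, 2, 1 and 0.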

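Search for lines containing two consecutive o's. The pattern is not anchored, so a longer run of o's, such as the one in goooooogle, also contains two o's in a row and therefore matches:

```
# grep -n 'o\{2\}' regular_express.txt
1:"Open Source" is a good mechanism to develop programs.
2:apple is my favorite food.
3:Football game is not use feet only.
9:Oh! the soup taste good!
18:google is the best tools for search keyword.
19:goooooogle yes!
```

Search for lines containing g followed by 2 to 5 o's and then another g:

```
# grep -n 'go\{2,5\}g' regular_express.txt
18:google is the best tools for search keyword.
```

Search for lines containing g followed by two or more o's and then g: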

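```
# grep -n 'go\{2,\}g' regular_express.txt
18:google is the best tools for search keyword.
19:goooooogle yes!
```

10) Inside [], ^ means negation only when it comes first. Placed after other characters, it stands for a literal ^. For example, '[^a-z.!^ -]' matches any single character that is not a lowercase letter, a dot (.), an exclamation mark (!), a caret (^), a space or a hyphen (-). Note the small space inside the []. The dot needs no backslash here because inside [] it is an ordinary character, and the hyphen goes last so that it is not read as a range. Also note that negation in shell wildcards is written [!range], whereas in regular expressions it is [^range]; keep the two apart.

11) Extended regular expressions and egrep. Extended regular expressions add a few special symbols on top of basic regular expressions, which makes some operations more convenient. For example, to remove blank lines and lines that begin with #, basic regular expressions require two passes:

```
# grep -v '^$' regular_express.txt | grep -v '^#'
"Open Source" is a good mechanism to develop programs.
apple is my favorite food.
Football game is not use feet only.
this dress doesn't fit me.
```

egrep supports extended regular expressions and their extra symbols, which makes this much simpler.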

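Note: By default grep interprets patterns as basic regular expressions, while egrep understands extended ones. egrep is in fact equivalent to grep -E, so grep -E supports extended regular expressions as well.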

It is used like this:

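```
# egrep -v '^$|^#' regular_express.txt
"Open Source" is a good mechanism to develop programs.
apple is my favorite food.
Football game is not use feet only.
this dress doesn't fit me.
```

Here the | symbol means OR, so a line is selected if it matches either ^$ or ^#.

Note: egrep is very handy, and I needed exactly this while writing a shell script. Sometimes grep -v has to exclude more than one thing, and a usage such as egrep -v "192.168.1.101|102|104" is a good fit for egrep.

Here are a few of the extended special symbols:

+: like *, but means one or more repetitions of the preceding character.

?: means zero or one occurrence of the preceding character.

|: means OR. For example, 'gd|good|dog' matches strings containing gd, good or dog.

(): groups part of a pattern into a unit. For example, to search for glad or good, you can use 'g(la|oo)d'. The advantage of () is that +, ? and * can then be applied to the whole group.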
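The extended symbols can be tried with grep -E, which the note above describes as equivalent to egrep. Recent GNU grep releases warn that egrep is obsolescent, so grep -E is the safer spelling. The sketch below uses a hypothetical five-line sample file. It checks that the one-pass '^$|^#' filter gives the same result as the two-stage grep -v pipeline, then exercises +, ?, | and ().

```shell
#!/usr/bin/env bash
tmpdir=$(mktemp -d)
trap 'rm -rf "$tmpdir"' EXIT
sample="$tmpdir/sample.txt"
cat > "$sample" <<'EOF'
"Open Source" is a good mechanism to develop programs.

#I am VBird
The world is the same with 'glad'.
gd and dog
EOF

# One extended regex with | replaces the two-stage grep -v pipeline.
two_pass=$(grep -v '^$' "$sample" | grep -v '^#')
one_pass=$(grep -Ev '^$|^#' "$sample")
[ "$two_pass" = "$one_pass" ] && echo 'pipeline and one-pass filter agree'

grep -En 'go+d' "$sample"         # + : one or more o   -> good (line 1)
grep -En 'go?d' "$sample"         # ? : zero or one o   -> gd (line 5)
grep -En 'g(la|oo)d' "$sample"    # group with |        -> good, glad (lines 1, 4)
grep -En 'gd|good|dog' "$sample"  # plain alternation   -> lines 1, 5
```

'go?d' does not match "good": after g, the optional o can take at most one o, and the next character must then be d.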
