Spark修炼之道 (Basics): Linux Big Data Development Fundamentals, Section 15: Basic Regular Expressions (Part 1)

    xiaoxiao, 2026-02-23

    Reference books: 鸟哥的Linux私房菜基础学习篇 (3rd Edition); Linux Shell Scripting Cookbook

    Main content of this section:

    Basic regular expressions

    1. Basic Regular Expressions

    (1) ^ (start-of-line anchor)

    ^ matches the beginning of a line. For example, '^Spark' matches every line that starts with Spark.

    # grep -n prints the line number of each matching line
    root@sparkslave02:~/ShellLearning# grep -n '^Spark' /hadoopLearning/spark-1.5.0-bin-hadoop2.4/README.md
    3:Spark is a fast and general cluster computing system for Big Data. It provides
    22:Spark is built using [Apache Maven](http://maven.apache.org/).
    53:Spark also comes with several sample programs in the `examples` directory.
    83:Spark uses the Hadoop core library to talk to HDFS and other Hadoop-supported

    (2) $ (end-of-line anchor)

    'Spark$' matches every line that ends with Spark.

    root@sparkslave02:~/ShellLearning# grep -n 'Spark$' /hadoopLearning/spark-1.5.0-bin-hadoop2.4/README.md
    1:# Apache Spark
    20:## Building Spark

    (3) . (match any single character)

    For example, Spa.k matches Spark, Spaak, and so on.

    root@sparkslave02:~/ShellLearning# grep -n 'Spa.k' /hadoopLearning/spark-1.5.0-bin-hadoop2.4/README.md
    1:# Apache Spark
    3:Spark is a fast and general cluster computing system for Big Data. It provides
    6:rich set of higher-level tools including Spark SQL for SQL and
    # remaining output omitted

    The command above does not match the lowercase spark. To match it as well, use the -i option:

    # -i makes the match case-insensitive
    root@sparkslave02:~/ShellLearning# grep -in 'Spa.k' /hadoopLearning/spark-1.5.0-bin-hadoop2.4/README.md
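To see the effect of -i in isolation, here is a small self-contained sketch (the three-line input is made up for illustration, not taken from the README):

```shell
# Three spellings of the same word; without -i, 'Spa.k' only matches
# the exact-case form, while -i matches all three.
printf 'Spark\nspark\nSPARK\n' | grep -c 'Spa.k'    # prints 1
printf 'Spark\nspark\nSPARK\n' | grep -ic 'Spa.k'   # prints 3
```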

    (4) [] (match any one of the listed characters)

    [Ss]park matches only Spark and spark.

    root@sparkslave02:~/ShellLearning# grep -n '[Ss]park' /hadoopLearning/spark-1.5.0-bin-hadoop2.4/README.md
    1:# Apache Spark
    3:Spark is a fast and general cluster computing system for Big Data. It provides
    6:rich set of higher-level tools including Spark SQL for SQL and DataFrames,
    8:and Spark Streaming for stream processing.
    10:<http://spark.apache.org/>
    # remaining output omitted

    (5) [^] (match any character not listed)

    For example, '[^T]he' does not match The, but does match the, che, and so on.

    root@sparkslave02:~/ShellLearning# grep -n '[^T]he' /hadoopLearning/spark-1.5.0-bin-hadoop2.4/README.md
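Since the command above is shown without its output, a quick check with made-up input illustrates the behavior. Note that [^T] must consume one real character, so a bare "he" at the start of a line does not match either:

```shell
# [^T] requires some character other than 'T' immediately before "he"
printf 'The\nthe\nche\nhe\n' | grep -n '[^T]he'
# prints:
# 2:the
# 3:che
```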

    (6) [-] (match a character in a range)

    For example, [a-h]he matches only ahe, bhe, che, ..., hhe; it does not match ihe, the, and so on.

    root@sparkslave02:~/ShellLearning# grep -n '[a-h]he' /hadoopLearning/spark-1.5.0-bin-hadoop2.4/README.md
    1:# Apache Spark
    6:rich set of higher-level tools including Spark SQL for SQL and DataFrames,
    10:<http://spark.apache.org/>
    16:guide, on the [project web page](http://spark.apache.org/documentation.html)

    (7) ? (match zero or one occurrence)

    For example, t\?he matches he and the; it cannot match both t's in tthe, because ? allows at most one occurrence of the preceding character.

    # ? is a special character in basic regular expressions and must be escaped with \
    root@sparkslave02:~/ShellLearning# grep -n 't\?he' /hadoopLearning/spark-1.5.0-bin-hadoop2.4/README.md
    1:# Apache Spark
    6:rich set of higher-level tools including Spark SQL for SQL and DataFrames,
    10:<http://spark.apache.org/>
    15:You can find the latest Spark documentation, including a programming
    16:guide, on the [project web page](http://spark.apache.org/documentation.html)
    # remaining output omitted

    (8) + (match one or more occurrences)

    'S\+park' matches Spark, SSpark, SSSpark, and so on (at least one S).

    # + must also be escaped in basic regular expressions
    root@sparkslave02:~/ShellLearning# grep -n 'S\+park' /hadoopLearning/spark-1.5.0-bin-hadoop2.4/README.md
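The command above is shown without output; a small sketch with made-up input shows which strings the pattern accepts:

```shell
# \+ requires at least one 'S' before "park", so the bare "park" line fails
printf 'park\nSpark\nSSpark\n' | grep -n 'S\+park'
# prints:
# 2:Spark
# 3:SSpark
```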

    (9) * (match zero or more occurrences)

    'S*park' matches park, Spark, SSpark, SSSpark, and so on.

    root@sparkslave02:~/ShellLearning# grep -n 'S*park' /hadoopLearning/spark-1.5.0-bin-hadoop2.4/README.md
    1:# Apache Spark
    3:Spark is a fast and general cluster computing system for Big Data. It provides
    6:rich set of higher-level tools including Spark SQL for SQL and DataFrames,
    8:and Spark Streaming for stream processing.
    10:<http://spark.apache.org/>
    15:You can find the latest Spark documentation, including a programming
    # remaining output omitted

    (10) {n} (match exactly n occurrences)

    For example, [a-z]\{3\} matches any three consecutive lowercase letters, equivalent to [a-z][a-z][a-z].

    root@sparkslave02:~/ShellLearning# grep -n '[a-z]\{3\}' /hadoopLearning/spark-1.5.0-bin-hadoop2.4/README.md
    1:# Apache Spark
    3:Spark is a fast and general cluster computing system for Big Data. It provides

    (11) Other repetition counts

    \{n,\} matches at least n occurrences; \{n,m\} matches at least n and at most m occurrences.
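Like \{n\}, these interval forms need backslashes in basic regular expressions. A sketch with made-up input:

```shell
# \{2,\}  requires two or more 'S' before "park"
# \{2,3\} requires between two and three
printf 'Spark\nSSpark\nSSSSpark\n' | grep -n 'S\{2,\}park'   # lines 2 and 3
printf 'Spark\nSSpark\nSSSSpark\n' | grep -c 'S\{2,3\}park'  # prints 2
```

The last line still matches SSSSpark because the pattern only needs some substring of the line to match: three of the four S's plus "park" satisfy S\{2,3\}park.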

    (12) \ (the escape character)

    In basic regular expressions, ?, +, (, ), {, and } are special characters, but without the escape character they are treated as ordinary literal characters. To use them as regular-expression operators you must prefix them with \, as the earlier examples show.
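The difference is easy to demonstrate with echo. Note that grep -E (extended regular expressions) reverses the convention, so there ? works unescaped; a sketch:

```shell
echo "the"  | grep -c 't?he'    # prints 0: unescaped ? is a literal '?' in BRE
echo "t?he" | grep -c 't?he'    # prints 1: the literal string "t?he" is present
echo "the"  | grep -c 't\?he'   # prints 1: \? acts as the zero-or-one operator
echo "the"  | grep -cE 't?he'   # prints 1: with -E, ? is an operator unescaped
```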

    (13) () (group characters)

    For example, Sp\(ar\)\?k matches Spark and Spk: the group \(ar\) as a whole is made optional by \?.

    root@sparkslave02:~/ShellLearning# echo "Spark Spk Spak" | grep -n 'Sp\(ar\)\?k'
    1:Spark Spk Spak

    (14) Hands-on: matching URLs

    root@sparkslave02:~/ShellLearning/Chapter15# grep -n '[A-Za-z]*://[A-Za-z]*\.\(\([A-Za-z]*\)\.\?\)*' /hadoopLearning/spark-1.5.0-bin-hadoop2.4/README.md

    The full pattern above can be built up in the following steps:

    (1) Match the protocol prefix, e.g. http://

    root@sparkslave02:~/ShellLearning/Chapter15# grep -n '[A-Za-z]*://' /hadoopLearning/spark-1.5.0-bin-hadoop2.4/README.md

    (2) Match the domain name

    root@sparkslave02:~/ShellLearning/Chapter15# grep -n '[A-Za-z]*://[A-Za-z]*\.[A-Za-z]*' /hadoopLearning/spark-1.5.0-bin-hadoop2.4/README.md

    (3) Handle the repeated dot-separated parts

    root@sparkslave02:~/ShellLearning/Chapter15# grep -n '[A-Za-z]*://[A-Za-z]*\.\(\([A-Za-z]*\)\.\?\)*' /hadoopLearning/spark-1.5.0-bin-hadoop2.4/README.md
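To verify the final pattern against a known string, grep -o (print only the matched portion of each line) is handy. An illustrative test, not from the original:

```shell
# -o prints just the part of the line that the pattern matched
echo "Visit http://spark.apache.org/ for docs" \
  | grep -o '[A-Za-z]*://[A-Za-z]*\.\(\([A-Za-z]*\)\.\?\)*'
# prints: http://spark.apache.org
```

The match stops at "org" because the trailing "/" is outside every character class in the pattern.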