Hetian lab day 5: Data Structures and Text Processing in the Python Standard Library

xiaoxiao  2023-10-29

Contents

Part 1 Data Structures and Text Processing in the Python Standard Library: Review Questions
Part 2 Lab Exercises
Part 3 Analysis and Reflection

Part 1 Data Structures and Text Processing in the Python Standard Library: Review Questions

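[Explanation] Both search() and match() scan a string for a match. match() only checks whether the RE matches at the beginning of the string, while search() scans the whole string for a match. In other words, match() returns a result only if the match succeeds at position 0; if the match would start anywhere else, match() returns None. Use the match-object methods group(num) or groups() to retrieve the matched expressions.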
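A minimal sketch of the difference; the sample sentence is illustrative and reused from the lab code further down:

```python
import re

text = "Does this text match the pattern?"

# match() only tries position 0, where the text starts with "Does", not "this"
m = re.match(r"this", text)
print(m)  # None

# A pattern that fits the beginning of the string does match
head = re.match(r"Does", text)
print(head.group())  # Does

# search() scans the whole string and returns the first match it finds
s = re.search(r"this", text)
print(s.group(), s.start())  # this 5
```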

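| Match object method | Description |
|---|---|
| group(num=0) | Returns the string matched by the whole expression (group 0). group() can take several group numbers at once, in which case it returns a tuple of the values of those groups. |
| groups() | Returns a tuple containing the strings of all subgroups, from 1 up to the number of groups in the pattern. |
| span() | Returns a tuple with the (start, end) positions of the match. |

findall() finds all substrings matched by the regular expression and returns them as a list; if nothing matches, it returns an empty list. If the pattern contains capturing groups, findall() returns the contents of the groups instead of the whole matches. Note: match() and search() match once, while findall() matches everything.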
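A short sketch of these match-object methods and of findall(); the e-mail pattern and sample strings are illustrative only:

```python
import re

m = re.search(r"(\w+)@(\w+)\.com", "contact: alice@example.com today")
print(m.group())      # alice@example.com (same as m.group(0))
print(m.group(1, 2))  # ('alice', 'example'): several group numbers give a tuple
print(m.groups())     # ('alice', 'example'): all groups, numbered from 1
print(m.span())       # (9, 26)

# findall() returns every match as a list, or [] when nothing matches
print(re.findall(r"\d+", "3 apples, 12 pears, 0 plums"))  # ['3', '12', '0']
print(re.findall(r"\d+", "no digits here"))               # []
```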

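Syntax (a method of a compiled pattern object; the module-level re.findall(pattern, string, flags=0) has no pos/endpos): pattern.findall(string[, pos[, endpos]])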

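Parameters:
string: the string to search.
pos: optional; the index in the string where the search starts. Defaults to 0.
endpos: optional; the index where the search ends. Defaults to the length of the string.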
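A minimal sketch of pos and endpos on a compiled pattern; the digit pattern and the string are made up for illustration:

```python
import re

pattern = re.compile(r"\d+")
text = "a1b22c333d4444"

print(pattern.findall(text))        # ['1', '22', '333', '4444']
print(pattern.findall(text, 3))     # ['22', '333', '4444']: starts at index 3
print(pattern.findall(text, 3, 8))  # ['22', '33']: only text[3:8] == '22c33' is searched
```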

Part 2 Lab Exercises

```python
#!/usr/bin/env python3
# coding=utf-8
##################################################
# To run a given module, delete the ''' ''' that surrounds it.
import string

# Lab step 1
# 1. string: text constants and templates
'''
s1 = 'this is a line and you can read it loudly!'
print(s1)
# capwords() capitalizes the first letter of every word in a string
print(string.capwords(s1))
'''

'''
# maketrans() builds a translation table; combined with translate(), it
# replaces one set of characters with another.
# In the simplest call, maketrans() takes two strings: the first lists the
# characters to replace, the second lists the target characters.
# Note: both strings must have the same length; characters map one to one.
# Syntax: str.maketrans(intab, outtab), where intab holds the characters to
# replace and outtab the corresponding replacement characters.
leet = str.maketrans('abegiloprstz', '463611092572')
s3 = 'The quick brown fox jumped over the lazy dog'
print(s3)
print(s3.translate(leet))
'''

##################################################
# 2. textwrap: formatting text paragraphs
import textwrap
# Purpose: format text by adjusting where line breaks occur in a paragraph
'''
# textwrap_example is a helper module (not part of the standard library)
# that is assumed to define a string named sample_text
from textwrap_example import sample_text
print('No dedent:\n')
print(textwrap.fill(sample_text, width=100))  # fill() takes text as input; width is the output line width
dedented_text = textwrap.dedent(sample_text)  # removes the existing common indentation
print('Dedented:\n')
print(dedented_text)
'''

##################################################
# 3. The re module
import re
# 3.1 Regular-expression functions
# re.match()    checks whether the RE matches at the very start of the string
# re.search()   scans the string for the first position where the RE matches
# re.findall()  finds all substrings the RE matches and returns them as a list
# re.finditer() finds all substrings the RE matches and returns an iterator of match objects
# match() and search() return a match object on success (None otherwise);
# a match object has these methods:
# group()      returns the string matched by the whole RE
# group(n, m)  returns the strings matched by groups n and m as a tuple;
#              a nonexistent group number raises IndexError
# start()      returns the start position of the match
# end()        returns the end position of the match
# span()       returns a (start, end) tuple for the match
'''
p = re.compile('(a(b)c)d')
m = p.match('abcd')
print(m.group())
print(m.group(0))  # the whole match
print(m.group(1))
print(m.group(2))
'''

# 3.2 re.compile()
'''
# compile() turns a pattern string into a compiled pattern (regex) object
regexes = [re.compile(p) for p in ['this', 'that']]
text = 'Does this text match the pattern?'
print('Text: %r\n' % text)
for regex in regexes:
    print('Seeking "%s" ->' % regex.pattern, end=' ')
    if regex.search(text):
        print('match')
    else:
        print('no match')
'''

# 3.3 re.match()
# re.match(pattern, string, flags=0)
# pattern: the regular expression; returns a match object on success, otherwise None
# string:  the string to match
# flags:   flags that control matching, e.g. case-insensitive or multiline matching
# The following script matches the first word
'''
text = "Jgood is a handsome boy, he is cool, clever, and so on..."
m = re.match(r"(\w+)\s", text)  # \w: any word character incl. underscore; \s: any whitespace (space, tab, form feed, ...)
if m:
    print(m.group(), '\n', m.group(0), '\n', m.group(1))
else:
    print('no match!')
'''

# 4. re.search(pattern, string, flags=0): pattern, text, flags
# Takes a pattern and a text; returns a match object, or None if nothing is found.
'''
pattern = 'this'
text = 'Does this text match the pattern?'
match = re.search(pattern, text)
s = match.start()
e = match.end()
print('Found "%s"\n in "%s"\n from %d to %d ("%s")' %
      (match.re.pattern, match.string, s, e, text[s:e]))
'''

# 5. re.split(pattern, string, maxsplit=0): maxsplit limits the number of splits
# Returns the pieces as a list. For example, re.split(r'\s+', text) splits a
# string on whitespace into a list of words.
'''
p = re.compile(r'\W+')
print(p.split('This is a test, short and sweet, of split().'))
print(p.split('This is a test, short and sweet, of split().', 3))
'''

# 6. re.findall() and re.finditer()
# re.findall() returns all non-overlapping substrings of the input that match the pattern
# re.finditer() returns an iterator that yields a match object for each match
'''
text = 'abbaaabbbbaaaaa'
pattern = 'ab'
for match in re.findall(pattern, text):
    print('Found "%s"' % match)
print('------------------')
for match in re.finditer(pattern, text):
    s = match.start()
    e = match.end()
    print('Found "%s" at %d:%d' % (text[s:e], s, e))
'''

# Lab step 2
# collections, array, copy, pprint
from collections import Counter
# 1. collections
# 1.1 Counter() counts how many times each element occurs in a given sequence
'''
li = ["dog", "cat", "mouse", "dog", "cat", "dog", "tiger", "lion"]
a = Counter(li)
print(a)
print("{0}, {1}".format(a.values(), a.keys()))
print(a.most_common(3))
'''

# 1.2 deque: a double-ended queue extended from the queue structure;
# elements can be added or removed at either end
import time
from collections import deque
'''
num = 1000

def append(c):
    for i in range(num):
        c.append(i)

def appendleft(c):
    if isinstance(c, deque):
        for i in range(num):
            c.appendleft(i)
    else:
        for i in range(num):
            c.insert(0, i)

def pop(c):
    for i in range(num):
        c.pop()

def popleft(c):
    if isinstance(c, deque):
        for i in range(num):
            c.popleft()
    else:
        for i in range(num):
            c.pop(0)

if __name__ == '__main__':
    for container in [deque, list]:
        for operation in [append, appendleft, pop, popleft]:
            c = container(range(num))
            start = time.time()
            operation(c)
            end = time.time()
            elapsed = end - start
            print("completed {0}/{1} in {2} seconds".format(
                container.__name__, operation.__name__, elapsed))
'''

# 1.2.1 Basic deque operations
'''
q = deque(range(5))
q.append(5)
q.appendleft(6)
print(q)
print(q.pop())
print(q.popleft())
q.rotate(3)   # rotate() returns None; a positive n moves elements from the right end to the left
print(q)
q.rotate(-1)  # a negative n moves elements from the left end to the right
print(q)
'''

# 1.3 defaultdict behaves exactly like a normal dict except when a missing key is accessed:
# then its default_factory is called to supply a default value, and the key/value pair is stored.
from collections import defaultdict
'''
s = "the quick brown fox jumps over the the the lazy dog"
words = s.split()
location = defaultdict(list)  # list + append keeps insertion order; set + add ignores order but removes duplicates
for m, n in enumerate(words):
    location[n].append(m)
print(location)
'''

# 2. array
# Similar to a list, but restricted to elements of a single type.
# It uses less memory than a list, though some operations can be slower.

# 3. pprint provides a more readable way to print data structures
import pprint
'''
matrix = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
a = pprint.PrettyPrinter(width=20)
a.pprint(matrix)
'''

# 4. copy
# Assignment between Python objects passes references; to copy an object,
# use the copy module from the standard library.
# 1) copy.copy: shallow copy; copies only the parent object, not its inner child objects
# 2) copy.deepcopy: deep copy; copies the object and its child objects
import copy
'''
a = [1, 2, 3, 4, ['a', 'b']]  # original object
b = a                         # assignment: passes a reference to the object
c = copy.copy(a)              # shallow copy
d = copy.deepcopy(a)          # deep copy
a.append(5)                   # modify a
a[4].append('c')              # modify the list ['a', 'b'] inside a
print('a = ', a)
print('b = ', b)  # b refers to the same object as a, so it changes along with a
print('c = ', c)  # shallow copy: the inner list is still shared, so a[4].append('c') shows up here
print('d = ', d)  # deep copy: the inner list was copied, so changing a[4] does not affect d
'''

# 5. Extending data structures
# 5.1 Singly linked list
class Node:
    def __init__(self):
        self.data = None
        self.nextNode = None

    def set_and_return_Next(self):
        # create a new node, link it after this one, and return it
        self.nextNode = Node()
        return self.nextNode

    def getNext(self):
        return self.nextNode

    def getData(self):
        return self.data

    def setData(self, d):
        self.data = d


class LinkedList:
    def buildList(self, array):
        # build the list from a non-empty sequence and return its head node
        self.head = Node()
        self.head.setData(array[0])
        self.temp = self.head
        for i in array[1:]:
            self.temp = self.temp.set_and_return_Next()
            self.temp.setData(i)
        self.tail = self.temp
        return self.head

    def printList(self):
        tempNode = self.head
        while tempNode is not self.tail:
            print(tempNode.getData())
            tempNode = tempNode.getNext()
        print(self.tail.getData())


myArray = [3, 5, 4, 6, 2, 7, 8, 9, 10, 21]
myList = LinkedList()
myList.buildList(myArray)
myList.printList()
```
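The textwrap part of the lab imports sample_text from textwrap_example, a helper module that is not part of the standard library. A self-contained sketch, with a made-up placeholder paragraph standing in for sample_text, shows what dedent() and fill() do:

```python
import textwrap

# Hypothetical placeholder for sample_text; the lab's own text is not shown
sample_text = """
    The textwrap module formats plain text by choosing where the
    line breaks go. It can remove indentation shared by every line
    and rewrap a paragraph so that no line exceeds a given width.
    """

# dedent() strips the whitespace prefix common to every line
dedented = textwrap.dedent(sample_text).strip()
print(dedented)

# fill() rewraps the paragraph so that no line is longer than width characters
wrapped = textwrap.fill(dedented, width=40)
print(wrapped)
```

fill() treats the existing newlines as ordinary whitespace, so the line breaks in the output depend only on width.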
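The lab only describes array in a comment. A minimal sketch of its single-type restriction, assuming the 'i' (signed int) type code:

```python
from array import array

# 'i' is the type code for a signed int: every element must be an integer
nums = array("i", [1, 2, 3])
nums.append(4)
print(nums)           # array('i', [1, 2, 3, 4])
print(nums.itemsize)  # bytes per element; platform-dependent (commonly 4)

# A value of another type is rejected instead of being stored
rejected = False
try:
    nums.append(2.5)
except TypeError:
    rejected = True
print("float rejected:", rejected)  # float rejected: True
```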

Part 3 Analysis and Reflection

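1) Python's string type (str) provides methods for processing strings. Going further, the standard library's re module lets Python process strings with regular expressions. A regular expression is a string template (a pattern): Python can search a string for the parts that fit the pattern, or replace those parts with other content. For example, you can search a text for all of its numbers. The key to using regular expressions is building a pattern that fits your needs. In addition, the standard library provides richer formatting for string output, for example the string and textwrap modules.

2) What is the difference between a deep copy and a shallow copy in Python?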

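copy.copy makes a shallow copy: it copies only the parent (outer) object and not the child objects inside it, so the copy still refers to the same child objects.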

copy.deepcopy makes a deep copy: it copies the object together with its child objects, recursively.
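A short sketch of the difference, using `is` to check which inner objects are shared (the list contents are illustrative):

```python
import copy

original = [1, 2, ["a", "b"]]
shallow = copy.copy(original)
deep = copy.deepcopy(original)

print(shallow is original)        # False: the outer list is new
print(shallow[2] is original[2])  # True: the inner list is shared
print(deep[2] is original[2])     # False: the inner list was copied as well

original[2].append("c")
print(shallow[2])  # ['a', 'b', 'c']
print(deep[2])     # ['a', 'b']
```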
