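1. OS basics
Python has several built-in modules and functions for working with files. These functions are spread across modules such as os, os.path, shutil, and pathlib. This article lists the most commonly used file operations in Python.
Python's built-in os module has many useful functions for listing directory contents and filtering the results. To get a list of all files and folders in a particular directory, use os.listdir() in legacy versions of Python or os.scandir() in Python 3.5 and later. If you also want file and directory attributes such as file size and modification date, os.scandir() is the preferred choice.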
import os
temp = os.listdir()  # names of the entries in the current working directory
print(temp)
print(type(temp))
for i in temp:
    print(type(i))
Output:
['.idea', '123.json', 'flask-tutorial.pdf', 'index.py', 'list-of-companies-in-nasdaq-exchanges-csv_json.json', 'Python工程师招聘数据.csv', 'subway.csv', 'test.csv', 'test.py', 'zzx.pdf', 'zzx.txt']
<class 'list'>
<class 'str'>
<class 'str'>
<class 'str'>
<class 'str'>
<class 'str'>
<class 'str'>
<class 'str'>
<class 'str'>
<class 'str'>
<class 'str'>
<class 'str'>
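As we can see, os.listdir() returns a Python list in which each entry is the name of a file or directory as a string. Now let's try os.scandir().
import os
temp = os.scandir()  # iterator over the entries of the current working directory
print(type(temp))
print(temp)
for i in temp:
    print(type(i))
Output: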
<class 'nt.ScandirIterator'>
<nt.ScandirIterator object at 0x0000022F4C459E30>
<class 'nt.DirEntry'>
<class 'nt.DirEntry'>
<class 'nt.DirEntry'>
<class 'nt.DirEntry'>
<class 'nt.DirEntry'>
<class 'nt.DirEntry'>
<class 'nt.DirEntry'>
<class 'nt.DirEntry'>
<class 'nt.DirEntry'>
<class 'nt.DirEntry'>
<class 'nt.DirEntry'>
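os.scandir() returns an iterator rather than a list. Each item it yields is a DirEntry object; to get the file names, iterate over it and print each entry's name attribute.
import os
temp = os.scandir()
for i in temp:
    print(type(i))
    print(i.name)  # the entry's name, without the directory part
Output: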
<class 'nt.DirEntry'>
.idea
<class 'nt.DirEntry'>
123.json
<class 'nt.DirEntry'>
flask-tutorial.pdf
<class 'nt.DirEntry'>
index.py
<class 'nt.DirEntry'>
list-of-companies-in-nasdaq-exchanges-csv_json.json
<class 'nt.DirEntry'>
Python工程师招聘数据.csv
<class 'nt.DirEntry'>
subway.csv
<class 'nt.DirEntry'>
test.csv
<class 'nt.DirEntry'>
test.py
<class 'nt.DirEntry'>
zzx.pdf
<class 'nt.DirEntry'>
zzx.txt
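Because os.scandir() returns an iterator, it can only be consumed once, and it holds a system resource until it is closed. The following is a minimal sketch of both points, assuming Python 3.6 or later (where the iterator can be used in a with statement). The helper name list_names and the temporary demo directory are illustrative, not part of the original examples.

```python
import os
import tempfile

def list_names(path):
    """Return the sorted names of all entries in `path` using os.scandir()."""
    # The with statement closes the iterator as soon as the block ends.
    with os.scandir(path) as entries:
        return sorted(entry.name for entry in entries)

# Work in a throwaway directory so the sketch does not depend on local files.
with tempfile.TemporaryDirectory() as demo_dir:
    for name in ("a.txt", "b.pdf"):
        open(os.path.join(demo_dir, name), "w").close()

    print(list_names(demo_dir))

    it = os.scandir(demo_dir)
    first_pass = [entry.name for entry in it]
    second_pass = [entry.name for entry in it]  # the iterator is already exhausted
    it.close()
    print(len(first_pass), len(second_pass))
```

If the entries are needed more than once, collecting them into a list first (as list_names does) avoids the empty second pass.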
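An example: print the names of the files in a directory.
import os
pathname = "D:\\文档"
for file in os.listdir(pathname):
    # os.listdir() returns bare names, so join each one with the directory before testing it
    if os.path.isfile(os.path.join(pathname, file)):
        print(os.path.join(pathname, file))  # full path
        print(file)                          # file name only
Output: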
D:\文档\06579594.pdf
06579594.pdf
D:\文档\1-s2.0-S1053811913004709-main.pdf
1-s2.0-S1053811913004709-main.pdf
D:\文档\10240.full.pdf
10240.full.pdf
D:\文档\7220180345_张振兴_反射机制实践.cpp
7220180345_张振兴_反射机制实践.cpp
D:\文档\Comparison of fluctuations in global network topology of modeled and empirical brain functional connectivity - 副本.pdf
Comparison of fluctuations in global network topology of modeled and empirical brain functional connectivity - 副本.pdf
D:\文档\Comparison of fluctuations in global network topology of modeled and empirical brain functional connectivity.pdf
Comparison of fluctuations in global network topology of modeled and empirical brain functional connectivity.pdf
D:\文档\Complex_brain_networks_Graph_theoretical_analysis_ - 副本.pdf
Complex_brain_networks_Graph_theoretical_analysis_ - 副本.pdf
D:\文档\Complex_brain_networks_Graph_theoretical_analysis_.pdf
Complex_brain_networks_Graph_theoretical_analysis_.pdf
D:\文档\ENIGMA-Viewer_Interactive_visualization_strategies - 副本.pdf
ENIGMA-Viewer_Interactive_visualization_strategies - 副本.pdf
D:\文档\ENIGMA-Viewer_Interactive_visualization_strategies.pdf
ENIGMA-Viewer_Interactive_visualization_strategies.pdf
D:\文档\Exploring_the_Human_Connectome_Topology_in_Group_S - 副本.pdf
Exploring_the_Human_Connectome_Topology_in_Group_S - 副本.pdf
D:\文档\Exploring_the_Human_Connectome_Topology_in_Group_S.pdf
Exploring_the_Human_Connectome_Topology_in_Group_S.pdf
D:\文档\journal.pcbi.1006497.pdf
journal.pcbi.1006497.pdf
D:\文档\nihms320307 - 副本.pdf
nihms320307 - 副本.pdf
D:\文档\nihms367529 - 副本.pdf
nihms367529 - 副本.pdf
D:\文档\nihms367529.pdf
nihms367529.pdf
D:\文档\nihms482729 - 副本.pdf
nihms482729 - 副本.pdf
D:\文档\nihms482729.pdf
nihms482729.pdf
D:\文档\nihms899155 - 副本.pdf
nihms899155 - 副本.pdf
D:\文档\NIHMS899155-supplement.zip
NIHMS899155-supplement.zip
D:\文档\nihms899155.pdf
nihms899155.pdf
D:\文档\paper2737 - 副本.pdf
paper2737 - 副本.pdf
D:\文档\paper2737.pdf
paper2737.pdf
D:\文档\Relative Contributions of Anatomy, Stationary Dynamics, and Non-stationarities - 副本.PDF
Relative Contributions of Anatomy, Stationary Dynamics, and Non-stationarities - 副本.PDF
D:\文档\Relative Contributions of Anatomy, Stationary Dynamics, and Non-stationarities.PDF
Relative Contributions of Anatomy, Stationary Dynamics, and Non-stationarities.PDF
D:\文档\The human brain is intrinsically organized into dynamic, anticorrelated functional networks.pdf
The human brain is intrinsically organized into dynamic, anticorrelated functional networks.pdf
D:\文档\THREE.JS开发指南.pdf
THREE.JS开发指南.pdf
D:\文档\Time-resolved resting-state brain networks - 副本.pdf
Time-resolved resting-state brain networks - 副本.pdf
D:\文档\Time-resolved resting-state brain networks.pdf
Time-resolved resting-state brain networks.pdf
D:\文档\Xia_BC2011 - 副本.pdf
Xia_BC2011 - 副本.pdf
D:\文档\Xia_BC2011.pdf
Xia_BC2011.pdf
D:\文档\《数据可视化》课程设计-zzx.doc
《数据可视化》课程设计-zzx.doc
D:\文档\《计算机通信与网络工程》复习题.docx
《计算机通信与网络工程》复习题.docx
D:\文档\数据通信与计算机网络_复习题总[1].pdf
数据通信与计算机网络_复习题总[1].pdf
D:\文档\浙江大学智能图形图像第八期讲习班PPT.zip
浙江大学智能图形图像第八期讲习班PPT.zip
D:\文档\知识工程.doc
知识工程.doc
The second approach:
import os
for file in os.scandir("D:\\文档"):
    if file.is_file():
        print(file.name)
Using os.scandir() reads more clearly and is easier to follow than os.listdir(). Call is_file() on each entry yielded by the ScandirIterator; if it returns True, the entry is a file.
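The opening of this section notes that os.scandir() is the preferred choice when file attributes such as size and modification date are also needed. A minimal sketch of that use follows. The function name describe_files and the temporary demo directory are assumptions made for illustration; entry.stat() returns an os.stat_result whose st_size and st_mtime fields are read here.

```python
import os
import tempfile
from datetime import datetime

def describe_files(path):
    """Return (name, size_in_bytes, modified_time) for each regular file in `path`."""
    rows = []
    with os.scandir(path) as entries:
        for entry in entries:
            if entry.is_file():
                info = entry.stat()
                modified = datetime.fromtimestamp(info.st_mtime)
                rows.append((entry.name, info.st_size, modified))
    return sorted(rows)

# Demo in a throwaway directory containing one file and one subdirectory.
with tempfile.TemporaryDirectory() as demo_dir:
    with open(os.path.join(demo_dir, "notes.txt"), "w") as fh:
        fh.write("hello")
    os.mkdir(os.path.join(demo_dir, "subdir"))

    for name, size, modified in describe_files(demo_dir):
        print(name, size, modified.strftime("%Y-%m-%d %H:%M"))
```

The subdirectory is skipped because is_file() returns False for it, so only notes.txt and its 5-byte size are reported.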
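2. File name pattern matching
After getting the list of files in a directory with one of the methods above, you may want to search for files that match a particular pattern. Python has several built-in methods for modifying and manipulating strings. Two of them, .startswith() and .endswith(), are very useful when matching file names. To use them, first get a directory listing and then iterate over it.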
import os
for file in os.listdir("D:\\文档"):
    # endswith() is case-sensitive, so names ending in .PDF are not matched
    if file.endswith(".pdf"):
        print(file)
Output:
06579594.pdf
1-s2.0-S1053811913004709-main.pdf
10240.full.pdf
Comparison of fluctuations in global network topology of modeled and empirical brain functional connectivity - 副本.pdf
Comparison of fluctuations in global network topology of modeled and empirical brain functional connectivity.pdf
Complex_brain_networks_Graph_theoretical_analysis_ - 副本.pdf
Complex_brain_networks_Graph_theoretical_analysis_.pdf
ENIGMA-Viewer_Interactive_visualization_strategies - 副本.pdf
ENIGMA-Viewer_Interactive_visualization_strategies.pdf
Exploring_the_Human_Connectome_Topology_in_Group_S - 副本.pdf
Exploring_the_Human_Connectome_Topology_in_Group_S.pdf
journal.pcbi.1006497.pdf
nihms320307 - 副本.pdf
nihms367529 - 副本.pdf
nihms367529.pdf
nihms482729 - 副本.pdf
nihms482729.pdf
nihms899155 - 副本.pdf
nihms899155.pdf
paper2737 - 副本.pdf
paper2737.pdf
The human brain is intrinsically organized into dynamic, anticorrelated functional networks.pdf
THREE.JS开发指南.pdf
Time-resolved resting-state brain networks - 副本.pdf
Time-resolved resting-state brain networks.pdf
Xia_BC2011 - 副本.pdf
Xia_BC2011.pdf
数据通信与计算机网络_复习题总[1].pdf
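The listing above omits the files whose extension is written as .PDF, because .endswith() compares case-sensitively. A minimal sketch of one way to handle this, and to test several extensions at once, is shown below. The helper has_extension and the sample names are hypothetical; the sketch relies on the fact that str.endswith() accepts a tuple of suffixes.

```python
def has_extension(filename, extensions):
    """Return True if filename ends with any of the given extensions, ignoring case."""
    suffixes = tuple(ext.lower() for ext in extensions)
    return filename.lower().endswith(suffixes)

# Hypothetical sample names, used instead of a real directory listing.
names = ["nihms367529.pdf", "NIHMS899155-supplement.zip", "report.PDF", "notes.doc"]

pdfs = [n for n in names if has_extension(n, [".pdf"])]
archives_or_docs = [n for n in names if has_extension(n, [".zip", ".doc"])]
lowercase_nihms = [n for n in names if n.startswith("nihms")]

print(pdfs)              # ['nihms367529.pdf', 'report.PDF']
print(archives_or_docs)  # ['NIHMS899155-supplement.zip', 'notes.doc']
print(lowercase_nihms)   # ['nihms367529.pdf']
```

Note that .startswith() is case-sensitive as well, which is why the name beginning with NIHMS is not in the last list.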
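Simple file name pattern matching with fnmatch
String methods are limited in their matching abilities. The fnmatch module offers more capable functions for pattern matching. Here we use fnmatch.fnmatch(), a function that supports wildcards such as * and ?. For example, to find all .pdf files in a directory with fnmatch, you can do this: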
import os
import fnmatch
for file in os.listdir("D:\\文档"):
    if fnmatch.fnmatch(file, "*.pdf"):
        print(file)
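This gives the same result as the listing above, with one difference: on Windows, fnmatch.fnmatch() compares names case-insensitively, so the files ending in .PDF are matched as well.
More advanced pattern matching:
import os
import fnmatch
for file in os.listdir("D:\\文档"):
    if fnmatch.fnmatch(file, "Complex*.pdf"):
        print(file)
Output: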
Complex_brain_networks_Graph_theoretical_analysis_ - 副本.pdf
Complex_brain_networks_Graph_theoretical_analysis_.pdf
The * in the pattern matches any sequence of characters, so running this code finds all PDF files whose names start with Complex.
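Besides *, fnmatch patterns also support ? (exactly one character) and [seq] (one character from a set). The sketch below applies them to a hypothetical list of names rather than a real directory. fnmatch.filter() returns the matching names from a list, and fnmatch.fnmatchcase() always compares case-sensitively regardless of the operating system.

```python
import fnmatch

# Hypothetical sample names, used instead of a real directory listing.
names = ["nihms367529.pdf", "nihms482729.pdf", "paper2737.pdf", "report.PDF", "test.py"]

# Each ? stands for exactly one character: "nihms" followed by six characters.
six_char_ids = fnmatch.filter(names, "nihms??????.pdf")

# [0-9] matches a single digit; * then matches whatever follows.
numbered_papers = fnmatch.filter(names, "paper[0-9]*.pdf")

# fnmatchcase() never normalizes case, so report.PDF is excluded on every platform.
strict_pdfs = [n for n in names if fnmatch.fnmatchcase(n, "*.pdf")]

print(six_char_ids)     # ['nihms367529.pdf', 'nihms482729.pdf']
print(numbered_papers)  # ['paper2737.pdf']
print(strict_pdfs)      # ['nihms367529.pdf', 'nihms482729.pdf', 'paper2737.pdf']
```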
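3. Example
Get the paths of the PDF files in a given directory and its subdirectories. Notes: list1 stores the paths of the PDF files that are found; the get_pdf function searches recursively and stores each PDF file path in the list.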
import os
pathname = "D:\\文档"
list1 = []
def get_pdf(f_pathname, f_list):
    for file in os.listdir(f_pathname):
        full_path = os.path.join(f_pathname, file)
        if os.path.isdir(full_path):
            # recurse into the subdirectory, collecting into the same list
            get_pdf(full_path, f_list)
        elif file[-4:].upper() == ".PDF":
            # case-insensitive extension check; store the full path
            f_list.append(full_path)
get_pdf(pathname, list1)
print(list1)
print(len(list1))
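The same recursive search can also be written without explicit recursion by using os.walk(), which visits a directory and all of its subdirectories. The following is a sketch of that alternative, not the approach used above. The function name find_pdfs and the temporary demo tree are assumptions made for illustration.

```python
import os
import tempfile

def find_pdfs(root):
    """Return the sorted full paths of all PDF files under `root`, searched recursively."""
    found = []
    # os.walk() yields (directory path, subdirectory names, file names) for each directory.
    for dirpath, dirnames, filenames in os.walk(root):
        for name in filenames:
            if name.lower().endswith(".pdf"):  # case-insensitive extension check
                found.append(os.path.join(dirpath, name))
    return sorted(found)

# Build a small throwaway tree: two PDF files at different depths and one text file.
with tempfile.TemporaryDirectory() as demo_dir:
    os.makedirs(os.path.join(demo_dir, "sub", "deeper"))
    for rel in ("a.pdf", os.path.join("sub", "b.PDF"), os.path.join("sub", "deeper", "c.txt")):
        open(os.path.join(demo_dir, rel), "w").close()

    demo_result = [os.path.relpath(p, demo_dir) for p in find_pdfs(demo_dir)]

print(demo_result)
print(len(demo_result))
```

Returning a new list from the function, rather than appending to a list defined outside it, keeps each call independent of earlier ones.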