Working with XML files using ElementTree

xiaoxiao, 2022-07-13

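Background: The label files of the PASCAL VOC dataset are XML files, and my project required modifying them, so I learned a few of the most basic XML operations to do the job. Python offers three ways to work with XML: SAX, DOM, and ElementTree. Of these, ElementTree is the simplest and most lightweight. It treats the XML file as a tree, and a handful of basic operations are enough to read and modify the data in it.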

Original XML file

<annotation>
  <folder>VOC2007</folder>
  <filename>000009.jpg</filename>
  <source>
    <database>The VOC2007 Database</database>
    <annotation>PASCAL VOC2007</annotation>
    <image>flickr</image>
    <flickrid>325443404</flickrid>
  </source>
  <owner>
    <flickrid>autox4u</flickrid>
    <name>Perry Aidelbaum</name>
  </owner>
  <size>
    <width>500</width>
    <height>375</height>
    <depth>3</depth>
  </size>
  <segmented>0</segmented>
  <object>
    <name>horse</name>
    <pose>Right</pose>
    <truncated>0</truncated>
    <difficult>0</difficult>
    <bndbox>
      <xmin>69</xmin>
      <ymin>172</ymin>
      <xmax>270</xmax>
      <ymax>330</ymax>
    </bndbox>
  </object>
  <object>
    <name>person</name>
    <pose>Right</pose>
    <truncated>0</truncated>
    <difficult>0</difficult>
    <bndbox>
      <xmin>150</xmin>
      <ymin>141</ymin>
      <xmax>229</xmax>
      <ymax>284</ymax>
    </bndbox>
  </object>
  <object>
    <name>person</name>
    <pose>Right</pose>
    <truncated>0</truncated>
    <difficult>0</difficult>
    <bndbox>
      <xmin>285</xmin>
      <ymin>201</ymin>
      <xmax>327</xmax>
      <ymax>331</ymax>
    </bndbox>
  </object>
  <object>
    <name>person</name>
    <pose>Left</pose>
    <truncated>0</truncated>
    <difficult>0</difficult>
    <bndbox>
      <xmin>258</xmin>
      <ymin>198</ymin>
      <xmax>297</xmax>
      <ymax>329</ymax>
    </bndbox>
  </object>
</annotation>

Task

Delete every <object> element whose <name> is not "person".

Basic structure of a node

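<tag attrib=...> text </tag> tail
Example: <APP_KEY channel=''> hello123456789 </APP_KEY>

tag: the label that identifies what kind of data the element holds; here, APP_KEY.
attrib: the attributes, stored as a dictionary; here, {'channel': ''}.
text: the text string inside the element, which can hold data; here, hello123456789.
tail: the trailing string that follows the closing tag. It is optional, and the example has none.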
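To see the four parts on a live element, here is a minimal sketch. It assumes the APP_KEY example is wrapped in a hypothetical <root> element followed by the word "trailing" (neither is in the original example), so that the element has a tail to show.

```python
import xml.etree.ElementTree as ET

# Hypothetical wrapper: <root> and the word "trailing" are added only
# so that APP_KEY has a tail to display.
snippet = "<root><APP_KEY channel=''> hello123456789 </APP_KEY>trailing</root>"
root = ET.fromstring(snippet)
node = root.find("APP_KEY")

print(node.tag)          # APP_KEY
print(node.attrib)       # {'channel': ''}
print(repr(node.text))   # ' hello123456789 '  (whitespace is preserved)
print(repr(node.tail))   # 'trailing'
```

Note that text keeps the surrounding whitespace exactly as written in the file, so call .strip() before comparing values if the source may contain padding.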

The work mostly comes down to walking the nodes according to the shape of the tree, then acting on the values of node.tag, node.attrib, and node.text.
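As an illustration of that kind of traversal, the sketch below reads every object's class name and bounding box. The helper name read_boxes and the shortened annotation string are assumptions made for this example; only the tag names come from the VOC file above.

```python
import xml.etree.ElementTree as ET

# Shortened VOC-style annotation, used only for illustration.
xml_text = """
<annotation>
  <filename>000009.jpg</filename>
  <object>
    <name>horse</name>
    <bndbox><xmin>69</xmin><ymin>172</ymin><xmax>270</xmax><ymax>330</ymax></bndbox>
  </object>
  <object>
    <name>person</name>
    <bndbox><xmin>150</xmin><ymin>141</ymin><xmax>229</xmax><ymax>284</ymax></bndbox>
  </object>
</annotation>
""".strip()


def read_boxes(root):
    """Return a list of (name, xmin, ymin, xmax, ymax) tuples (hypothetical helper)."""
    boxes = []
    for obj in root.iter("object"):          # visit every <object> in the tree
        name = obj.findtext("name")          # text of the <name> child
        bndbox = obj.find("bndbox")
        coords = tuple(int(bndbox.findtext(key))
                       for key in ("xmin", "ymin", "xmax", "ymax"))
        boxes.append((name, *coords))
    return boxes


root = ET.fromstring(xml_text)
boxes = read_boxes(root)
for box in boxes:
    print(box)
```

findtext() returns the child's text directly, which saves a separate find() followed by .text.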

Code

from xml.etree import ElementTree as ET

tree = ET.parse('000009.xml')    # load the XML file
root = tree.getroot()            # get the root node

# Find every direct child of the root whose tag is "object".
# findall() returns a list; find() returns only the first match.
ob_node = root.findall('object')

for node in ob_node:
    obj = node.find('name')
    if obj.text != 'person':
        root.remove(node)        # delete the node
    # print(obj.tag, obj.attrib, obj.text)

tree.write('000009.xml')         # overwrites the original file; pass a different path to create a new one
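The same filtering step can be wrapped in a small function so it can be tried on an in-memory string before touching real annotation files. This is only a sketch: the function name keep_only and the sample string are assumptions, not part of the original script. Unlike the script above, it uses findtext(), so an <object> with no <name> child is also removed instead of raising an error.

```python
import xml.etree.ElementTree as ET


def keep_only(root, keep="person"):
    """Remove each direct <object> child whose <name> is not `keep`.

    Returns the number of removed elements (hypothetical helper).
    """
    removed = 0
    # findall() returns a list, so removing from root while looping is safe.
    for node in root.findall("object"):
        if node.findtext("name") != keep:
            root.remove(node)
            removed += 1
    return removed


# Minimal sample, for illustration only.
sample = (
    "<annotation>"
    "<object><name>horse</name></object>"
    "<object><name>person</name></object>"
    "<object><name>person</name></object>"
    "</annotation>"
)
root = ET.fromstring(sample)
removed = keep_only(root)
remaining = [obj.findtext("name") for obj in root.findall("object")]
print(removed, remaining)   # 1 ['person', 'person']
```

To save the result, wrap the root with ET.ElementTree(root) and call its write() method; writing to a separate output path keeps the original annotation intact.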
