For background on XML, see w3school.
This article assumes you already know the basic concepts of XML and introduces some common ways to work with XML using the Python lxml module.
lxml is the most feature-rich and easy-to-use library for processing XML and HTML in the Python language.
All code in this article works with etree (from lxml import etree), and a comment at the end of a line shows the output of that line.
The Element class
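```python
from lxml import etree

root = etree.Element("ROOT")
print(root.tag)  # ROOT
root.append(etree.Element("child1"))
child2 = etree.SubElement(root, "child2")  # create and attach in one step
child3 = etree.SubElement(root, "child3")
print(etree.tostring(root))  # b'<ROOT><child1/><child2/><child3/></ROOT>'
print(type(root))  # <class 'lxml.etree._Element'>
```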
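By default etree.tostring() returns compact bytes. While experimenting it can help to read indented text instead; the sketch below uses the pretty_print and encoding keyword arguments of tostring() (assuming a reasonably recent lxml) and rebuilds the tree so it runs on its own.

```python
from lxml import etree

# Rebuild the same tree as above
root = etree.Element("ROOT")
root.append(etree.Element("child1"))
etree.SubElement(root, "child2")
etree.SubElement(root, "child3")

# encoding="unicode" returns str instead of bytes;
# pretty_print=True adds line breaks and indentation
text = etree.tostring(root, pretty_print=True, encoding="unicode")
print(text)
```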
An Element can be treated like a list of its children
```python
print(len(root))  # 3
for i in root:
    print(i.tag)  # child1, child2, child3 (one per line)
```
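A few more list-style operations, as a minimal sketch (the tag names are just the ones used above):

```python
from lxml import etree

root = etree.Element("ROOT")
for name in ("child1", "child2", "child3"):
    etree.SubElement(root, name)

print(root[0].tag)                # child1  (indexing)
print([c.tag for c in root[1:]])  # ['child2', 'child3']  (slicing)
print(root.index(root[-1]))       # 2  (position of a child)

# insert() and remove() behave like their list counterparts
root.insert(0, etree.Element("child0"))
root.remove(root[-1])
print([c.tag for c in root])      # ['child0', 'child1', 'child2']
```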
An Element carries its attributes like a dictionary
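```python
# Keyword arguments become attributes
root = etree.Element("ROOT", test1="18", test2="test2")
print(etree.tostring(root))  # b'<ROOT test1="18" test2="test2"/>'
print(root.get("test1"))  # 18
print(root.keys(), root.values())  # ['test1', 'test2'] ['18', 'test2']
```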
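Besides get(), keys() and values(), attributes can be changed through set() and the dict-like attrib property. A small sketch:

```python
from lxml import etree

root = etree.Element("ROOT", test1="18", test2="test2")

root.set("test3", "new")               # add or overwrite an attribute
print(root.items())                    # [('test1', '18'), ('test2', 'test2'), ('test3', 'new')]
print(root.get("missing", "default"))  # get() accepts a fallback value

# .attrib is a live, dict-like view of the element's attributes
attrs = root.attrib
del attrs["test2"]
print(dict(attrs))                     # {'test1': '18', 'test3': 'new'}
print(etree.tostring(root))            # b'<ROOT test1="18" test3="new"/>'
```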
Writing text into an Element
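```python
# root is still the element with test1/test2 from the previous block
root.text = "TEXT"
print(root.text)  # TEXT
print(etree.tostring(root))  # b'<ROOT test1="18" test2="test2">TEXT</ROOT>'

# SubElement() returns the new child, so its text can be set on the same line
root = etree.Element("root")
etree.SubElement(root, "child").text = "Child 1"
etree.SubElement(root, "child").text = "Child 2"
etree.SubElement(root, "another").text = "Child 3"
print(etree.tostring(root))  # b'<root><child>Child 1</child><child>Child 2</child><another>Child 3</another></root>'
```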
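One point that often surprises people: text that follows a child's closing tag is stored in that child's tail, not in the parent's text. A sketch with an illustrative <br/> child:

```python
from lxml import etree

root = etree.Element("root")
root.text = "before "
br = etree.SubElement(root, "br")
br.tail = " after"  # text after <br/> belongs to br.tail, not root.text

print(etree.tostring(root))                  # b'<root>before <br/> after</root>'
print(etree.tostring(br))                    # b'<br/> after'  (tail is included by default)
print(etree.tostring(br, with_tail=False))   # b'<br/>'
print("".join(root.itertext()))              # all text in document order: 'before  after'
```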
Finding values with XPath
For XPath syntax, see w3school.
```python
print(root.xpath("string()"))  # Child 1Child 2Child 3
print(root.xpath("//text()"))  # ['Child 1', 'Child 2', 'Child 3']
```
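xpath() returns different Python types depending on the expression: a list for node sets and text, a float for numbers, a bool for booleans. A sketch on the same root/child tree, also passing an XPath variable ($v) as a keyword argument:

```python
from lxml import etree

root = etree.Element("root")
etree.SubElement(root, "child").text = "Child 1"
etree.SubElement(root, "child").text = "Child 2"
etree.SubElement(root, "another").text = "Child 3"

print(root.xpath("count(child)"))      # 2.0  (numbers come back as float)
print(root.xpath("child/text()"))      # ['Child 1', 'Child 2']
print(root.xpath("child[2]")[0].text)  # Child 2  (XPath positions start at 1)
print(root.xpath("boolean(another)"))  # True

# $v is bound from the keyword argument, so no string formatting is needed
matches = root.xpath("child[. = $v]", v="Child 2")
print(len(matches))                    # 1
```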
Serialization
Converting a string to an Element object (parsing)
```python
xml = """<top>
  <Ifmgr>
    <Interfaces>
      <Interface>
        <Name></Name>
        <AdminStatus></AdminStatus>
      </Interface>
    </Interfaces>
  </Ifmgr>
</top>"""

# Both functions parse a string and return its root Element
data = etree.XML(xml)
print(type(data))  # <class 'lxml.etree._Element'>
data = etree.fromstring(xml)
print(type(data))  # <class 'lxml.etree._Element'>
```
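One practical detail when parsing: if the string carries an XML declaration with an encoding (as the rpc-reply examples further down do), lxml refuses str input and expects bytes. The sketch below shows that, plus the reverse direction (serializing with a declaration) via tostring(); the small document is made up for illustration.

```python
from lxml import etree

xml_text = '<?xml version="1.0" encoding="UTF-8"?><top><Ifmgr/></top>'

# str input together with an encoding declaration is rejected
try:
    etree.fromstring(xml_text)
    error = None
except ValueError as exc:
    error = exc
print(repr(error))

# Encoding to bytes first makes it parse
data = etree.fromstring(xml_text.encode("utf-8"))
print(data.tag)  # top

# Serializing back to bytes, including an XML declaration
out = etree.tostring(data, xml_declaration=True, encoding="UTF-8")
print(out)
```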
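Namespaces
In XML, element names are chosen by the developer, so when documents from different sources use the same element name, a naming conflict can occur. Namespaces avoid this by qualifying element names with a URI. See w3school: namespaces.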
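```python
# Clark notation: "{namespace URI}local name"
root = etree.Element("{http://www.h3c.com/netconf/data:1.0}top")
print(etree.tostring(root))  # b'<ns0:top xmlns:ns0="http://www.h3c.com/netconf/data:1.0"/>'
Ifmgr = etree.SubElement(root, "{http://www.h3c.com/netconf/data:1.0}Ifmgr")
Ifmgr.text = "G0/0"
print(etree.tostring(root))  # b'<ns0:top xmlns:ns0="http://www.h3c.com/netconf/data:1.0"><ns0:Ifmgr>G0/0</ns0:Ifmgr></ns0:top>'

# Without nsmap lxml invents the prefix ns0; nsmap={None: uri} makes the URI the default namespace
H3C_DATA_1_0 = "http://www.h3c.com/netconf/data:1.0"
FULL_NS = "{%s}" % H3C_DATA_1_0
root = etree.Element(FULL_NS + "top", nsmap={None: H3C_DATA_1_0})
# The child needs the namespace in its tag as well; nsmap only controls how it is written out
Ifmgr = etree.SubElement(root, FULL_NS + "Ifmgr")
Ifmgr.text = "G0/0"
print(etree.tostring(root))  # b'<top xmlns="http://www.h3c.com/netconf/data:1.0"><Ifmgr>G0/0</Ifmgr></top>'
```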
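When namespaced tags get long, etree.QName can build and take apart the {uri}localname form. A sketch reusing the H3C namespace URI from above:

```python
from lxml import etree

H3C_DATA_1_0 = "http://www.h3c.com/netconf/data:1.0"

# QName(uri, localname).text gives the Clark-notation tag "{uri}localname"
qn = etree.QName(H3C_DATA_1_0, "top")
print(qn.text)  # {http://www.h3c.com/netconf/data:1.0}top

root = etree.Element(qn.text, nsmap={None: H3C_DATA_1_0})
child = etree.SubElement(root, etree.QName(H3C_DATA_1_0, "Ifmgr").text)

# QName(element) splits an existing tag back into its parts
print(etree.QName(child).localname)  # Ifmgr
print(etree.QName(child).namespace)  # http://www.h3c.com/netconf/data:1.0
print(root.nsmap)                    # {None: 'http://www.h3c.com/netconf/data:1.0'}
```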
Using the E-factory instead of Element and SubElement to generate XML quickly
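```python
from lxml.builder import ElementMaker

# Every tag created by E is placed in the H3C namespace, written as the default namespace
E = ElementMaker(namespace=H3C_DATA_1_0, nsmap={None: H3C_DATA_1_0})
top = E.top(
    E.Ifmgr(
        E.Interfaces(
            E.Interface()
        )
    )
)
print(etree.tostring(top))  # b'<top xmlns="http://www.h3c.com/netconf/data:1.0"><Ifmgr><Interfaces><Interface/></Interfaces></Ifmgr></top>'
```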
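The E-factory also makes it easy to generate repeated elements from data, since string arguments become text and children can be unpacked from a list. A sketch (the interface names are taken from the sample reply further down):

```python
from lxml import etree
from lxml.builder import ElementMaker

H3C_DATA_1_0 = "http://www.h3c.com/netconf/data:1.0"
E = ElementMaker(namespace=H3C_DATA_1_0, nsmap={None: H3C_DATA_1_0})

names = ["GigabitEthernet0/0", "GigabitEthernet0/1"]

# A string argument becomes the element's text; the list comprehension
# produces one <Interface> per name, unpacked as children of <Interfaces>
top = E.top(
    E.Ifmgr(
        E.Interfaces(*[E.Interface(E.Name(n)) for n in names])
    )
)
print(etree.tostring(top, pretty_print=True, encoding="unicode"))
```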
Finding elements with ElementPath
Four lookup methods are available:
```python
find()      # returns the first match, or None if nothing matches
findtext()  # returns the text of the first match
findall()   # returns a list of all matches
iterfind()  # returns an iterator over all matches
```
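A compact sketch of the four methods side by side, on a small made-up document; note that find() returns None and findtext() can fall back to a default when nothing matches:

```python
from lxml import etree

root = etree.fromstring(
    b"<Interfaces>"
    b"<Interface><Name>G0/0</Name></Interface>"
    b"<Interface><Name>G0/1</Name></Interface>"
    b"</Interfaces>"
)

print(root.find("Interface/Name").text)            # G0/0
print(root.find("Missing"))                        # None
print(root.findtext("Missing", default="n/a"))     # n/a
print(len(root.findall(".//Name")))                # 2
print([e.text for e in root.iterfind(".//Name")])  # ['G0/0', 'G0/1']
```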
```python
ret_no_ns = """<?xml version="1.0" encoding="UTF-8"?>
<rpc-reply>
  <data>
    <top>
      <Ifmgr>
        <Interfaces>
          <Interface><IfIndex>1</IfIndex><Name>GigabitEthernet0/0</Name><AdminStatus>1</AdminStatus></Interface>
          <Interface><IfIndex>2</IfIndex><Name>GigabitEthernet0/1</Name><AdminStatus>1</AdminStatus></Interface>
          <Interface><IfIndex>129</IfIndex><Name>NULL0</Name><AdminStatus>1</AdminStatus></Interface>
          <Interface><IfIndex>130</IfIndex><Name>InLoopBack0</Name><AdminStatus>1</AdminStatus></Interface>
        </Interfaces>
      </Ifmgr>
    </top>
  </data>
</rpc-reply>"""

# The document has an encoding declaration, so it must be passed as bytes
ret = etree.XML(ret_no_ns.encode())
print(ret.find("data"))               # <Element data at 0x...>
print(ret.find("data").tag)           # data
print(ret.find(".//Name").tag)        # Name
print(ret.findtext(".//Name"))        # GigabitEthernet0/0
print(ret.find(".//Name").text)       # GigabitEthernet0/0
print(ret.findall(".//Name"))         # [<Element Name at 0x...>, ...] (4 elements)
print(type(ret.iterfind(".//Name")))  # a lazy iterator type, not a list
```
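Paths are relative to the element you call them on, so a common pattern is to iterate over the repeating element and read its children. A sketch building an IfIndex-to-Name dict from a trimmed copy of the reply above:

```python
from lxml import etree

xml = b"""<rpc-reply><data><top><Ifmgr><Interfaces>
<Interface><IfIndex>1</IfIndex><Name>GigabitEthernet0/0</Name></Interface>
<Interface><IfIndex>129</IfIndex><Name>NULL0</Name></Interface>
</Interfaces></Ifmgr></top></data></rpc-reply>"""
ret = etree.fromstring(xml)

# "IfIndex" and "Name" are evaluated relative to each matched <Interface>
if_map = {
    int(iface.findtext("IfIndex")): iface.findtext("Name")
    for iface in ret.iterfind(".//Interface")
}
print(if_map)  # {1: 'GigabitEthernet0/0', 129: 'NULL0'}
```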
```python
ret = """<?xml version="1.0" encoding="UTF-8"?>
<rpc-reply xmlns="urn:ietf:params:xml:ns:netconf:base:1.0" message-id="urn:uuid:c94f3285-d747-4e19-abd3-efbe281e3133">
  <data>
    <top xmlns="http://www.h3c.com/netconf/data:1.0">
      <Ifmgr>
        <Interfaces>
          <Interface><IfIndex>1</IfIndex><Name>GigabitEthernet0/0</Name><AdminStatus>1</AdminStatus></Interface>
          <Interface><IfIndex>2</IfIndex><Name>GigabitEthernet0/1</Name><AdminStatus>1</AdminStatus></Interface>
          <Interface><IfIndex>129</IfIndex><Name>NULL0</Name><AdminStatus>1</AdminStatus></Interface>
          <Interface><IfIndex>130</IfIndex><Name>InLoopBack0</Name><AdminStatus>1</AdminStatus></Interface>
        </Interfaces>
      </Ifmgr>
    </top>
  </data>
</rpc-reply>"""

ret = etree.fromstring(ret.encode())
# <Name> lives in the H3C default namespace, so the path must use its full {uri}
data = ret.findall('.//{http://www.h3c.com/netconf/data:1.0}Name')
IfList = [i.text for i in data]
print(IfList)  # ['GigabitEthernet0/0', 'GigabitEthernet0/1', 'NULL0', 'InLoopBack0']
```
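Writing the full {uri} in every path gets tedious. find()/findall() and xpath() also accept a namespaces dict that maps a prefix of your choosing to the URI. A sketch on a trimmed copy of the namespaced reply (the prefix h3c is an arbitrary choice; only the URI has to match the document):

```python
from lxml import etree

H3C = "http://www.h3c.com/netconf/data:1.0"
xml = b"""<rpc-reply xmlns="urn:ietf:params:xml:ns:netconf:base:1.0"><data>
<top xmlns="http://www.h3c.com/netconf/data:1.0"><Ifmgr><Interfaces>
<Interface><IfIndex>1</IfIndex><Name>GigabitEthernet0/0</Name></Interface>
<Interface><IfIndex>129</IfIndex><Name>NULL0</Name></Interface>
</Interfaces></Ifmgr></top></data></rpc-reply>"""
ret = etree.fromstring(xml)

ns = {"h3c": H3C}  # prefix -> namespace URI

print([e.text for e in ret.findall(".//h3c:Name", namespaces=ns)])
# ['GigabitEthernet0/0', 'NULL0']

# xpath() takes the same mapping; prefixes are required for namespaced elements
print(ret.xpath("//h3c:Interface[h3c:IfIndex='129']/h3c:Name/text()", namespaces=ns))
# ['NULL0']

# Without a prefix nothing matches, because <Name> is in the H3C namespace
print(ret.findall(".//Name"))  # []
```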