XML处理也是个常见的编程工作,虽然说在Clojure里你很少使用XML做配置文件,但是跟遗留系统集成或者处理和其他系统通讯,可能都需要处理XML。
Clojure的标准库clojure.xml就是用来干这个事情的。一个简单的例子如下,首先我们要解析的是下面这个简单的XML:
<?
xml version="1.0" encoding="UTF-8"
?>
<
books
>
<
book
>
<
title
>
The joy of clojure
</
title
>
<
author
>
Michael Fogus / Chris House
</
author
>
</
book
>
<
book
>
<
title
>
Programming clojure
</
title
>
<
author
>
Stuart Halloway
</
author
>
</
book
>
<
book
>
<
title
>
Practical clojure
</
title
>
<
author
>
Luke Van der Hart
</
author
>
</
book
>
</
books
>
解析xml用clojure.xml/parse方法即可,该方法返回一个clojure.xml/element这个struct-map组成的一棵树:
user
=>
(use
'
[clojure.xml])
nil user
=>
(parse
"
test.xml
"
) {:tag :books, :attrs nil, :content [{:tag :book, :attrs nil, :content [{:tag :title, :attrs nil, :content [
"
The joy of clojure
"
]} {:tag :author, :attrs nil, :content [
"
Michael Fogus / Chris House
"
]}]} {:tag :book, :attrs nil, :content [{:tag :title, :attrs nil, :content [
"
Programming clojure
"
]} {:tag :author, :attrs nil, :content [
"
Stuart Halloway
"
]}]} {:tag :book, :attrs nil, :content [{:tag :title, :attrs nil, :content [
"
Practical clojure
"
]} {:tag :author, :attrs nil, :content [
"
Luke Van der Hart
"
]}]}]}
这是一个嵌套的数据结构,每个节点都是clojure.xml/element结构,element包括:
(defstruct element :tag :attrs :content)
tag、attrs和content属性,tag就是该节点的标签,attrs是一个属性的map,而content是它的内容或者子节点。element是一个struct map,它也定义了三个方法来分别获取这三个属性:
user
=>
(
def
x (parse
"
test.xml
"
))
#
'user/x
user
=>
(tag x) :books user
=>
(attrs x) nil user
=>
(content x) [{:tag :book, :attrs nil, :content [{:tag :title, :attrs nil, :content [
"
The joy of clojure
"
]} {:tag :author, :attrs nil, :content [
"
Michael Fogus / Chris House
"
]}]} {:tag :book, :attrs nil, :content [{:tag :title, :attrs nil, :content [
"
Programming clojure
"
]} {:tag :author, :attrs nil, :content [
"
Stuart Halloway
"
]}]} {:tag :book, :attrs nil, :content [{:tag :title, :attrs nil, :content [
"
Practical clojure
"
]} {:tag :author, :attrs nil, :content [
"
Luke Van der Hart
"
]}]}]
books节点是root node,它的content就是三个book子节点,子节点组织成一个vector,我们可以随意操作:
user
=>
(tag (first (content x))) :book user
=>
(content (first (content x))) [{:tag :title, :attrs nil, :content [
"
The joy of clojure
"
]} {:tag :author, :attrs nil, :content [
"
Michael Fogus / Chris House
"
]}] user
=>
(content (first (content (first (content x))))) [
"
The joy of clojure
"
]
额外提下,clojure.xml是利用SAX API做解析的。同样它还有个方法,可以将解析出来的结构还原成xml,通过emit:
user=> (emit x)
<?
xml version='1.0' encoding='UTF-8'
?>
<
books
>
<
book
>
<
title
>
The joy of clojure
</
title
>
<
author
>
Michael Fogus / Chris House
</
author
>
</
book
>
<
book
>
如果你要按照深度优先的顺序遍历xml,可以利用xml-seq将解析出来的树构成一个按照深度优先顺序排列节点的LazySeq,接下来就可以按照seq的方式处理,比如利用for来过滤节点:
user=> (for [node (xml-seq x) :when (= :author (:tag node))] (first (:content node))) ("Michael Fogus / Chris House" "Stuart Halloway" "Luke Van der Hart")
通过:when指定了条件,要求节点的tag是author,这样就可以查找出所有的author节点的content,是不是很方便?就像写英语描述一样。
更进一步,如果你想操作parse解析出来的这棵树,你还可以利用clojure.zip这个标准库,它有xml-zip函数将xml转换成zipper结构,并提供一系列方法来操作这棵树:
user=>(def xz (xml-
zip x))
#
'user/xz
user=>
(node (down xz)) {:tag :book, :attrs nil, :content [{:tag :title, :attrs nil, :content [
"
The joy of clojure
"
]} {:tag :author, :attrs nil, :content [
"
Michael Fogus / Chris House
"
]}]}user
=> (->
xz down right node) {:tag :book, :attrs nil, :content [{:tag :title, :attrs nil, :content [
"
Programming clojure
"
]} {:tag :author, :attrs nil, :content [
"
Stuart Halloway
"
]}]}user
=> (->
xz down right right node) {:tag :book, :attrs nil, :content [{:tag :title, :attrs nil, :content [
"
Practical clojure
"
]} {:tag :author, :attrs nil, :content [
"
Luke Van der Hart
"
]}]}user
=> (->
xz down right right lefts) ({:tag :book, :attrs nil, :content [{:tag :title, :attrs nil, :content [
"
The joy of clojure
"
]} {:tag :author, :attrs nil, :content [
"
Michael Fogus / Chris House
"
]}]} {:tag :book, :attrs nil, :content [{:tag :title, :attrs nil, :content [
"
Programming clojure
"
]} {:tag :author, :attrs nil, :content [
"
Stuart Halloway
"
]}]})
是不是酷得一塌糊涂?可以通过up,down,left,right,lefts,rights,来查找节点的邻近节点,可以通过node来得到节点本身。一切显得那么自然。更进一步,你还可以“编辑“这棵树,比如删除The joy of clojure这本书:
user
=>
(
def
loc
-
in
-
new
-
tree (remove (down xz)))
#
'user/loc-in-new-tree
user
=>
(root loc
-
in
-
new
-
tree) {:tag :books, :attrs nil, :content [{:tag :book, :attrs nil, :content [{:tag :title, :attrs nil, :content [
"
Programming clojure
"
]} {:tag :author, :attrs nil, :content [
"
Stuart Halloway
"
]}]} {:tag :book, :attrs nil, :content [{:tag :title, :attrs nil, :content [
"
Practical clojure
"
]} {:tag :author, :attrs nil, :content [
"
Luke Van der Hart
"
]}]}]}
ok,只剩下两本书了,更多方法还包括replace做替换,edit更改节点等。因此编辑XML并重新生成,你一般可以利用clojure.zip来更改树,最后利用clojure.xml/emit将更改还原为xml。
生成xml除了emit方法,还有一个contrib库,也就是
prxml,这个库的clojure 1.3版本有人维护了一个分支,在
这里。主要方法就是prxml,它可以将clojure的数据结构转换成xml:
user
=>
(prxml [:p [:raw!
"
<i>here & gone</i>
"
]])
<
p
><
i
>
here
&
gone
</
i
></
p
>
显然,它也可以用于生成HTML。 xpath的支持可以使用clj-xpath这个开源库,遗憾的是它目前仅支持clojure 1.2。
文章转自庄周梦蝶 ,原文发布时间2012-02-18
相关资源:敏捷开发V1.0.pptx