I am new to XML parsing. This XML file has the following tree:
FHRSEstablishment
|--> Header
| |--> ...
|--> EstablishmentCollection
| |--> EstablishmentDetail
| | |-->...
| |--> Scores
| | |-->...
|--> EstablishmentCollection
| |--> EstablishmentDetail
| | |-->...
| |--> Scores
| | |-->...
But when I access it with ElementTree and look for the child tags and attributes,
import xml.etree.ElementTree as ET
import urllib2

# ET.parse() takes the file object as its first positional argument
tree = ET.parse(
    urllib2.urlopen('http://ratings.food.gov.uk/OpenDataFiles/FHRS408en-GB.xml'))
root = tree.getroot()
for child in root:
    print child.tag, child.attrib
I only get:
Header {}
EstablishmentCollection {}
which I assume means that their attributes are empty. Why is that, and how can I access the children nested inside EstablishmentDetail and Scores?
Edit
Thanks to the answers below I can get inside the tree, but if I want to retrieve values such as those in Scores, this fails:
for node in root.find('.//EstablishmentDetail/Scores'):
    rating = node.attrib.get('Hygiene')
    print rating
and produces
None
None
None
Why is that?
You have to iter() over your root. Iterating over root directly only yields its immediate children, whereas root.iter() walks every element in the tree, so it does the trick:
import xml.etree.ElementTree as ET
import urllib2

tree = ET.parse(urllib2.urlopen('http://ratings.food.gov.uk/OpenDataFiles/FHRS408en-GB.xml'))
root = tree.getroot()
for child in root.iter():
    print child.tag, child.attrib
Output:
FHRSEstablishment {}
Header {}
ExtractDate {}
ItemCount {}
ReturnCode {}
EstablishmentCollection {}
EstablishmentDetail {}
FHRSID {}
LocalAuthorityBusinessID {}
...
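As a rough illustration of the difference, here is a self-contained sketch written for Python 3 (the snippets in this thread use Python 2). The inline SAMPLE document is a made-up miniature of the feed, not real data, so that nothing has to be downloaded. It also suggests why every attrib dict is empty: in this layout the values are stored as element text rather than as XML attributes.

```python
import xml.etree.ElementTree as ET

# Hypothetical miniature of the feed; the values are invented for illustration.
SAMPLE = """
<FHRSEstablishment>
  <Header><ItemCount>1</ItemCount></Header>
  <EstablishmentCollection>
    <EstablishmentDetail>
      <BusinessName>Example Cafe</BusinessName>
      <Scores><Hygiene>5</Hygiene></Scores>
    </EstablishmentDetail>
  </EstablishmentCollection>
</FHRSEstablishment>
"""

root = ET.fromstring(SAMPLE.strip())

# Iterating an element directly yields only its immediate children.
direct_children = [child.tag for child in root]

# iter() yields the element itself and all descendants, in document order.
all_tags = [elem.tag for elem in root.iter()]

# The value sits in the element's text; attrib only holds XML attributes.
name = root.find('.//BusinessName')

print(direct_children)
print(all_tags)
print(name.attrib, name.text)
```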
To get all the tags inside EstablishmentDetail, you need to find that tag and then loop through its children. For example:
for child in root.find('.//EstablishmentDetail'):
    print child.tag, child.attrib
Output:
FHRSID {}
LocalAuthorityBusinessID {}
BusinessName {}
BusinessType {}
BusinessTypeID {}
RatingValue {}
RatingKey {}
RatingDate {}
LocalAuthorityCode {}
LocalAuthorityName {}
LocalAuthorityWebSite {}
LocalAuthorityEmailAddress {}
Scores {}
SchemeType {}
NewRatingPending {}
Geocode {}
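Note that find() returns only the first matching element, so a loop over its result covers a single establishment. A minimal sketch of the contrast with findall(), written for Python 3 and using an invented two-entry SAMPLE document (the IDs and business names are assumptions, not taken from the real feed):

```python
import xml.etree.ElementTree as ET

# Hypothetical sample with two establishments; IDs and names are invented.
SAMPLE = """
<FHRSEstablishment>
  <EstablishmentCollection>
    <EstablishmentDetail>
      <FHRSID>1</FHRSID>
      <BusinessName>Example Cafe</BusinessName>
    </EstablishmentDetail>
    <EstablishmentDetail>
      <FHRSID>2</FHRSID>
      <BusinessName>Sample Bakery</BusinessName>
    </EstablishmentDetail>
  </EstablishmentCollection>
</FHRSEstablishment>
"""

root = ET.fromstring(SAMPLE.strip())

# find() stops at the first match, so this lists one establishment's children.
first = root.find('.//EstablishmentDetail')
first_tags = [child.tag for child in first]

# findall() returns every match; findtext() reads a child's text directly.
names = [detail.findtext('BusinessName')
         for detail in root.findall('.//EstablishmentDetail')]

print(first_tags)
print(names)
```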
As for the Hygiene score: root.find('.//Scores') returns only the first Scores tag, so when you write for each in root.find('.//Scores'): rating = each.get('Hygiene') you are looping over that tag's children, which are the Hygiene, ConfidenceInManagement and Structural tags. get('Hygiene') looks for an attribute named Hygiene on each of them, and none of the three children has one, so you get None three times.
You need to first:
- find all Scores tags;
- find Hygiene in each tag found.
for each in root.findall('.//Scores'):
    rating = each.find('.//Hygiene')
    print '' if rating is None else rating.text
Output:
5
5
5
0
5
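For comparison, here is a self-contained Python 3 sketch that reproduces both the failing attempt from the question and the two-step lookup on a small invented SAMPLE document (the scores are made up, and the empty Scores element is an assumption about how an unrated entry might look). It uses findtext() with a default, which is one way to avoid the explicit None check.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample; the scores are invented and the second entry is unrated.
SAMPLE = """
<FHRSEstablishment>
  <EstablishmentCollection>
    <EstablishmentDetail>
      <Scores>
        <Hygiene>5</Hygiene>
        <Structural>5</Structural>
        <ConfidenceInManagement>5</ConfidenceInManagement>
      </Scores>
    </EstablishmentDetail>
    <EstablishmentDetail>
      <Scores/>
    </EstablishmentDetail>
    <EstablishmentDetail>
      <Scores>
        <Hygiene>0</Hygiene>
        <Structural>10</Structural>
        <ConfidenceInManagement>5</ConfidenceInManagement>
      </Scores>
    </EstablishmentDetail>
  </EstablishmentCollection>
</FHRSEstablishment>
"""

root = ET.fromstring(SAMPLE.strip())

# The question's attempt: loops over the children of the FIRST Scores element
# and asks each child for an attribute called 'Hygiene', which does not exist.
attempt = [node.attrib.get('Hygiene')
           for node in root.find('.//EstablishmentDetail/Scores')]

# Two-step lookup: every Scores element, then the text of its Hygiene child.
# The default covers Scores elements that have no Hygiene child.
ratings = [scores.findtext('Hygiene', default='')
           for scores in root.findall('.//Scores')]

print(attempt)
print(ratings)
```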