I am currently using SAX to extract information from a number of XML documents. So far, extracting attribute values has been easy. However, I have no idea how to extract the actual value of a text node.
For example, given the following XML document:
      <w:rStyle w:val="Highlight" />
    </w:rPr>
  </w:pPr>
  <w:r>
    <w:t>Text to Extract</w:t>
  </w:r>
</w:p>
<w:p w:rsidR="00B41602" w:rsidRDefault="00B41602" w:rsidP="007C3A42">
  <w:pPr>
    <w:pStyle w:val="Copy" />
I can extract "Highlight" without any problem by reading the value of the val attribute. But I have no idea how to get into the text node and pull out "Text to Extract".
Here is the Java code I have so far for extracting attribute values:
// requires: org.xml.sax.Attributes, org.xml.sax.SAXException, org.xml.sax.helpers.DefaultHandler
private static final class SaxHandler extends DefaultHandler
{
    // invoked when document parsing starts:
    @Override
    public void startDocument() throws SAXException
    {
        System.out.println("Document processing starting:");
    }

    // invoked when document parsing has finished:
    @Override
    public void endDocument() throws SAXException
    {
        System.out.println("Document processing finished.\n");
    }

    // invoked when the parser enters element 'qName':
    @Override
    public void startElement(String uri, String localName,
            String qName, Attributes attrs) throws SAXException
    {
        if (qName.equalsIgnoreCase("Relationships"))
        {
            // do nothing
        }
        else if (qName.equalsIgnoreCase("Relationship"))
        {
            // read the "Target" attribute of the element...
            String val = attrs.getValue("Target");
            // ...and if it is present...
            if (val != null)
            {
                // ...and its value contains "image"...
                if (val.contains("image"))
                {
                    // ...then get the Id value
                    String id = attrs.getValue("Id");
                    // ...and use substring to isolate and print only the image name and number
                    int begIndex = val.lastIndexOf("/");
                    int endIndex = val.lastIndexOf(".");
                    System.out.println("Id: " + id + " & Target: " + val.substring(begIndex + 1, endIndex));
                }
            }
        }
        else
        {
            throw new IllegalArgumentException("Element '" +
                    qName + "' is not allowed here");
        }
    }

    // invoked when the parser leaves element 'qName'; no action needed:
    @Override
    public void endElement(String uri, String localName, String qName) throws SAXException
    {
        // do nothing
    }
}
But I have no idea where to start in order to get into that text node and pull out the value inside. Does anyone have any ideas?
Here is some pseudocode:
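// These members go inside your DefaultHandler subclass.
private boolean insideElementContainingTextNode;
private StringBuilder textBuilder;

@Override
public void startElement(String uri, String localName, String qName, Attributes attrs) {
    // With a default (non-namespace-aware) SAXParserFactory, qName holds the
    // prefixed name "w:t" and localName is empty. With a namespace-aware
    // parser, compare localName against "t" (and check uri) instead.
    if ("w:t".equals(qName)) {
        insideElementContainingTextNode = true;
        textBuilder = new StringBuilder();
    }
}

@Override
public void characters(char[] ch, int start, int length) {
    // The parser may deliver one text node in several chunks, so append
    // each chunk rather than assuming a single call.
    if (insideElementContainingTextNode) {
        textBuilder.append(ch, start, length);
    }
}

@Override
public void endElement(String uri, String localName, String qName) {
    if ("w:t".equals(qName)) {
        insideElementContainingTextNode = false;
        String theCompleteText = textBuilder.toString();
        textBuilder = null;
        // use theCompleteText here (print it, store it, ...)
    }
}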
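To try the idea end to end, here is a minimal, self-contained sketch of the same approach. The class name WordTextExtractor, the helper extractText, and the namespace URI urn:example:w are made up for illustration and are not from the question; the sketch also assumes the default, non-namespace-aware SAXParserFactory, so the element is matched by its qName "w:t".

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical demo class; names are illustrative only.
class WordTextExtractor {

    /** Collects the text content of every <w:t> element it sees. */
    static final class TextHandler extends DefaultHandler {
        final List<String> texts = new ArrayList<>();
        private StringBuilder current; // non-null only while inside <w:t>

        @Override
        public void startElement(String uri, String localName, String qName, Attributes attrs) {
            if ("w:t".equals(qName)) {
                current = new StringBuilder();
            }
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            // May be called several times for one text node, so accumulate.
            if (current != null) {
                current.append(ch, start, length);
            }
        }

        @Override
        public void endElement(String uri, String localName, String qName) {
            if ("w:t".equals(qName) && current != null) {
                texts.add(current.toString());
                current = null;
            }
        }
    }

    /** Parses the given XML string and returns the text of each <w:t> element. */
    static List<String> extractText(String xml) {
        try {
            // Default factory settings: not namespace-aware (an assumption of this sketch).
            SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
            TextHandler handler = new TextHandler();
            parser.parse(new InputSource(new StringReader(xml)), handler);
            return handler.texts;
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        // "urn:example:w" is a placeholder namespace URI for this demo.
        String xml = "<w:p xmlns:w=\"urn:example:w\"><w:r>"
                + "<w:t>Text to Extract</w:t></w:r></w:p>";
        System.out.println(extractText(xml)); // prints [Text to Extract]
    }
}
```

If you switch the factory to namespace-aware mode with setNamespaceAware(true), match on localName ("t") together with the namespace URI rather than on the prefixed qName, since the prefix is only a convention of the document.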