我正在尝试使用 Apache Lucene 进行标记,我对从 TokenStream 获取令牌的过程感到困惑.
I'm trying to use Apache Lucene for tokenizing, and I am baffled at the process to obtain Tokens from a TokenStream.
最糟糕的是,我正在查看 JavaDocs 中解决我问题的评论.
The worst part is that I'm looking at the comments in the JavaDocs that address my question.
http://lucene.apache.org/java/3_0_1/api/core/org/apache/lucene/analysis/TokenStream.html#incrementToken%28%29
不知何故,应该使用 AttributeSource,而不是 Token.我完全不知所措.
Somehow, an AttributeSource is supposed to be used, rather than Tokens. I'm totally at a loss.
谁能解释如何从 TokenStream 中获取类似令牌的信息?
Can anyone explain how to get token-like information from a TokenStream?
是的,这有点复杂(与好方法相比),但应该这样做:
Yeah, it's a little convoluted (compared to the good ol' way), but this should do it:
TokenStream tokenStream = analyzer.tokenStream(fieldName, reader);
OffsetAttribute offsetAttribute = tokenStream.getAttribute(OffsetAttribute.class);
TermAttribute termAttribute = tokenStream.getAttribute(TermAttribute.class);
while (tokenStream.incrementToken()) {
int startOffset = offsetAttribute.startOffset();
int endOffset = offsetAttribute.endOffset();
String term = termAttribute.term();
}
根据 Donotello 的说法,TermAttribute 已被弃用,取而代之的是 CharTermAttribute.根据 jpountz(和 Lucene 的文档),addAttribute 比 getAttribute 更可取.
According to Donotello, TermAttribute has been deprecated in favor of CharTermAttribute. According to jpountz (and Lucene's documentation), addAttribute is more desirable than getAttribute.
TokenStream tokenStream = analyzer.tokenStream(fieldName, reader);
OffsetAttribute offsetAttribute = tokenStream.addAttribute(OffsetAttribute.class);
CharTermAttribute charTermAttribute = tokenStream.addAttribute(CharTermAttribute.class);
tokenStream.reset();
while (tokenStream.incrementToken()) {
int startOffset = offsetAttribute.startOffset();
int endOffset = offsetAttribute.endOffset();
String term = charTermAttribute.toString();
}
这篇关于如何从 Lucene TokenStream 中获取 Token?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持html5模板网!
如何检测 32 位 int 上的整数溢出?How can I detect integer overflow on 32 bits int?(如何检测 32 位 int 上的整数溢出?)
return 语句之前的局部变量,这有关系吗?Local variables before return statements, does it matter?(return 语句之前的局部变量,这有关系吗?)
如何将整数转换为整数?How to convert Integer to int?(如何将整数转换为整数?)
如何在给定范围内创建一个随机打乱数字的 intHow do I create an int array with randomly shuffled numbers in a given range(如何在给定范围内创建一个随机打乱数字的 int 数组)
java的行为不一致==Inconsistent behavior on java#39;s ==(java的行为不一致==)
为什么 Java 能够将 0xff000000 存储为 int?Why is Java able to store 0xff000000 as an int?(为什么 Java 能够将 0xff000000 存储为 int?)