1. <small id='iLmWi'></small><noframes id='iLmWi'>


        How do I get a Token from a Lucene TokenStream?

        Date: 2023-09-27

        <small id='OlXbr'></small><noframes id='OlXbr'>


                1. This article explains how to get Tokens from a Lucene TokenStream.

                  Problem description

                  I'm trying to use Apache Lucene for tokenizing, and I am baffled by the process of obtaining Tokens from a TokenStream.

                  The worst part is that I'm looking at the comments in the JavaDocs that address my question.

                  http://lucene.apache.org/java/3_0_1/api/core/org/apache/lucene/analysis/TokenStream.html#incrementToken%28%29

                  Somehow, an AttributeSource is supposed to be used, rather than Tokens. I'm totally at a loss.

                  Can anyone explain how to get token-like information from a TokenStream?

                  Recommended answer

                  Yeah, it's a little convoluted (compared to the good ol' way), but this should do it:

                  // Old API (Lucene 3.0.x): TermAttribute is deprecated in later releases.
                  TokenStream tokenStream = analyzer.tokenStream(fieldName, reader);
                  OffsetAttribute offsetAttribute = tokenStream.getAttribute(OffsetAttribute.class);
                  TermAttribute termAttribute = tokenStream.getAttribute(TermAttribute.class);
                  
                  // Each incrementToken() call advances the stream and updates the attributes in place.
                  while (tokenStream.incrementToken()) {
                      int startOffset = offsetAttribute.startOffset();
                      int endOffset = offsetAttribute.endOffset();
                      String term = termAttribute.term();
                  }
                  

                  The new way

                  According to Donotello, TermAttribute has been deprecated in favor of CharTermAttribute. According to jpountz (and Lucene's documentation), addAttribute is preferable to getAttribute: addAttribute registers the attribute if the stream does not have it yet and otherwise returns the existing instance, whereas getAttribute throws an IllegalArgumentException when the attribute is missing.

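                  import org.apache.lucene.analysis.TokenStream;
                  import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
                  import org.apache.lucene.analysis.tokenattributes.OffsetAttribute;
                  
                  // Assumes analyzer, fieldName and reader are already defined.
                  TokenStream tokenStream = analyzer.tokenStream(fieldName, reader);
                  OffsetAttribute offsetAttribute = tokenStream.addAttribute(OffsetAttribute.class);
                  CharTermAttribute charTermAttribute = tokenStream.addAttribute(CharTermAttribute.class);
                  
                  tokenStream.reset();  // must be called before the first incrementToken()
                  while (tokenStream.incrementToken()) {
                      int startOffset = offsetAttribute.startOffset();
                      int endOffset = offsetAttribute.endOffset();
                      String term = charTermAttribute.toString();
                  }
                  tokenStream.end();    // performs end-of-stream operations, e.g. setting the final offset
                  tokenStream.close();  // releases resources associated with the stream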
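
                  The attribute-based loop above can look odd if you expect one Token object per token. The following is a toy model of the pattern, written with the standard library only. It is not Lucene's API: the names AttributeStreamSketch, TermAttr, OffsetAttr, WhitespaceStream and describeTokens are made up for illustration. It only sketches the idea that the stream owns a fixed set of attribute objects and overwrites them on every incrementToken() call, so the consumer must copy out any values it wants to keep.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model of an attribute-based token stream; hypothetical names, not Lucene's API. */
public class AttributeStreamSketch {

    /** Mutable holder for the current token's text, reused across tokens. */
    static final class TermAttr {
        private final StringBuilder buffer = new StringBuilder();
        void set(CharSequence text) { buffer.setLength(0); buffer.append(text); }
        @Override public String toString() { return buffer.toString(); }
    }

    /** Mutable holder for the current token's character offsets. */
    static final class OffsetAttr {
        private int start;
        private int end;
        void set(int s, int e) { start = s; end = e; }
        int startOffset() { return start; }
        int endOffset() { return end; }
    }

    /** Whitespace tokenizer that updates its two attributes in place. */
    static final class WhitespaceStream {
        final TermAttr term = new TermAttr();
        final OffsetAttr offset = new OffsetAttr();
        private final String text;
        private int pos = 0;

        WhitespaceStream(String text) { this.text = text; }

        /** Advances to the next token; returns false once the input is exhausted. */
        boolean incrementToken() {
            while (pos < text.length() && Character.isWhitespace(text.charAt(pos))) pos++;
            if (pos >= text.length()) return false;
            int start = pos;
            while (pos < text.length() && !Character.isWhitespace(text.charAt(pos))) pos++;
            term.set(text.subSequence(start, pos));
            offset.set(start, pos);
            return true;
        }
    }

    /** Consumes the stream, copying each attribute value out before advancing. */
    public static List<String> describeTokens(String text) {
        WhitespaceStream stream = new WhitespaceStream(text);
        List<String> out = new ArrayList<>();
        while (stream.incrementToken()) {
            // toString() takes a snapshot; the attribute itself is overwritten next round.
            out.add(stream.term.toString() + "@"
                    + stream.offset.startOffset() + "-" + stream.offset.endOffset());
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(describeTokens("hello  lucene world"));
    }
}
```

                  The same reasoning explains why the Lucene loop calls charTermAttribute.toString() inside the loop body: holding on to the attribute object itself would only ever show the most recent token.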
                  


                2. <small id='rIOIz'></small><noframes id='rIOIz'>
