


      1. Getting term frequencies in Lucene

        Date: 2023-09-30



                  This article covers how to get term frequencies from a Lucene index and may be a useful reference for anyone facing the same problem.

                  Question

                  Is there a fast, easy way to get term frequencies from a Lucene index without going through the TermVectorFrequencies class, which takes an awful lot of time on large collections?

                  What I mean is: is there something like TermEnum that exposes not just the document frequency but the term frequency as well?

                  UPDATE: Using TermDocs is way too slow.

                  Recommended answer

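                  Use TermDocs to get the term frequency for a given document. As with the document frequency, you get the TermDocs from an IndexReader, using the term of interest.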
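To make the iteration pattern concrete, here is a minimal sketch of the loop. It assumes a cursor with `next()`, `doc()`, and `freq()`, which is the shape of the Lucene 3.x `TermDocs` interface; the `PostingCursor` interface and the in-memory `ArrayPostingCursor` are hypothetical stand-ins so the example runs without Lucene on the classpath. With a real index, the cursor would come from the `IndexReader` for the term of interest.

```java
final class TermFrequencySketch {
    /** Hypothetical stand-in for the part of a TermDocs-style cursor used here. */
    interface PostingCursor {
        boolean next(); // advance to the next document containing the term
        int doc();      // current document number
        int freq();     // occurrences of the term in the current document
    }

    /** In-memory cursor over {doc, freq} pairs; exists only to make the sketch runnable. */
    static final class ArrayPostingCursor implements PostingCursor {
        private final int[][] postings;
        private int pos = -1;

        ArrayPostingCursor(int[][] postings) { this.postings = postings; }

        public boolean next() { return ++pos < postings.length; }
        public int doc() { return postings[pos][0]; }
        public int freq() { return postings[pos][1]; }
    }

    /** Sums freq() over every posting: the term's total count across the collection. */
    static long totalTermFrequency(PostingCursor cursor) {
        long total = 0;
        while (cursor.next()) {
            total += cursor.freq();
        }
        return total;
    }

    /** Returns the term's frequency in one document, or 0 if the term is absent. */
    static int frequencyInDocument(PostingCursor cursor, int targetDoc) {
        while (cursor.next()) {
            if (cursor.doc() == targetDoc) return cursor.freq();
            if (cursor.doc() > targetDoc) break; // postings are in document order
        }
        return 0;
    }

    public static void main(String[] args) {
        int[][] postings = { {0, 3}, {4, 1}, {9, 2} }; // made-up sample data
        System.out.println(totalTermFrequency(new ArrayPostingCursor(postings)));
        System.out.println(frequencyInDocument(new ArrayPostingCursor(postings), 4));
    }
}
```

Note that the document frequency is simply the number of postings visited, while the collection-wide term frequency needs the full pass over `freq()` values.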

                  You won't find a faster method than TermDocs without losing some generality. TermDocs reads directly from the ".frq" file in an index segment, where each term frequency is listed in document order.
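As a rough illustration of why this read is sequential, the sketch below decodes one term's entries the way the pre-4.0 Lucene file-format documentation describes the ".frq" layout: each entry starts with a DocDelta whose upper bits are the gap to the previous document number and whose low bit, when set, means the frequency is exactly 1; otherwise the frequency follows as the next value. This is a simplified model, not Lucene's reader code: the input is assumed to be already-decoded integers (the real file stores VInts), SkipData is ignored, and the class name is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

final class FrqDecodingSketch {
    /** Decodes a simplified per-term entry sequence into {doc, freq} pairs. */
    static List<int[]> decode(int[] values) {
        List<int[]> postings = new ArrayList<>();
        int doc = 0;
        int i = 0;
        while (i < values.length) {
            int docDelta = values[i++];
            doc += docDelta >>> 1;           // upper bits: gap to the previous document
            int freq = (docDelta & 1) == 1   // low bit set: frequency is exactly 1
                    ? 1
                    : values[i++];           // otherwise the frequency is the next value
            postings.add(new int[] { doc, freq });
        }
        return postings;
    }

    public static void main(String[] args) {
        // A term occurring once in document 7 and three times in document 11.
        for (int[] p : decode(new int[] { 15, 8, 3 })) {
            System.out.println("doc " + p[0] + " freq " + p[1]);
        }
    }
}
```

Because each document number is stored as a gap from the previous one, an entry can only be interpreted after everything before it has been read, which is why the data is consumed front to back.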

                  If that's "too slow", make sure that you've optimized your index to merge multiple segments into a single segment. Iterate over the documents in order (skipping forward is fine, but you can't jump back and forth in the document list efficiently).
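One way to respect the in-order constraint is to sort the document numbers you care about first, then make a single forward pass. The sketch below assumes a forward-only cursor with a `skipTo(target)` operation, loosely modeled on the one `TermDocs` offers; `ForwardCursor` is a hypothetical in-memory stand-in, and its exact semantics (stay put if already at or past the target) are my own simplification.

```java
import java.util.Arrays;

final class InOrderLookupSketch {
    /** Hypothetical forward-only cursor over postings sorted by document number. */
    static final class ForwardCursor {
        private final int[] docs;
        private final int[] freqs;
        private int pos = 0;

        ForwardCursor(int[] docs, int[] freqs) {
            this.docs = docs;
            this.freqs = freqs;
        }

        /** Moves forward to the first posting with doc >= target; never moves back. */
        boolean skipTo(int target) {
            while (pos < docs.length && docs[pos] < target) pos++;
            return pos < docs.length;
        }

        int doc() { return docs[pos]; }
        int freq() { return freqs[pos]; }
    }

    /**
     * Looks up the term frequency for each wanted document in one forward pass.
     * Results are aligned with the wanted documents in ascending order.
     */
    static int[] frequenciesFor(ForwardCursor cursor, int[] wantedDocs) {
        int[] sorted = wantedDocs.clone();
        Arrays.sort(sorted); // ascending, so the cursor only ever moves forward
        int[] result = new int[sorted.length];
        for (int i = 0; i < sorted.length; i++) {
            if (!cursor.skipTo(sorted[i])) break; // postings exhausted; the rest stay 0
            if (cursor.doc() == sorted[i]) result[i] = cursor.freq();
        }
        return result;
    }

    public static void main(String[] args) {
        ForwardCursor cursor = new ForwardCursor(new int[] { 1, 5, 8, 20 }, new int[] { 2, 1, 4, 3 });
        // Wanted documents given out of order; they are sorted before the pass.
        System.out.println(Arrays.toString(frequenciesFor(cursor, new int[] { 20, 5, 7 })));
    }
}
```

Processing the targets in their original order (20, then 5) would require reopening the cursor for the second lookup, which is the back-and-forth access the answer warns against.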

                  Your next step might be additional processing to create an even more specialized file structure that leaves out the SkipData. Personally, I would look for a better algorithm to achieve my objective, or provide better hardware: lots of memory, either to hold a RAMDirectory or to give to the OS for use by its own file cache.


