
        Term frequency in Lucene 4.0

        Date: 2023-09-29

                  This article covers how to compute term frequency in Lucene 4.0.

                  Question

                  I'm trying to calculate term frequency using Lucene 4.0. Document frequency works fine, but I can't figure out how to get term frequency through the API. Here's the code I have:

                  private static void addDoc(IndexWriter writer, String content) throws IOException {
                      FieldType fieldType = new FieldType();
                      fieldType.setStoreTermVectors(true);
                      fieldType.setStoreTermVectorPositions(true);
                      fieldType.setIndexed(true);
                      fieldType.setIndexOptions(IndexOptions.DOCS_AND_FREQS);
                      fieldType.setStored(true);
                      Document doc = new Document();
                      doc.add(new Field("content", content, fieldType));
                      writer.addDocument(doc);
                  }
                  
                  public static void main(String[] args) throws IOException, ParseException {
                      Directory directory = new RAMDirectory();  
                      Analyzer analyzer = new WhitespaceAnalyzer(Version.LUCENE_40);
                      IndexWriterConfig config = new IndexWriterConfig(Version.LUCENE_40, analyzer);
                      IndexWriter writer = new IndexWriter(directory, config);
                      addDoc(writer, "Lucene is stupid");
                      addDoc(writer, "Java is great");
                      writer.close();
                      IndexReader reader = DirectoryReader.open(directory);
                      System.out.println(reader.docFreq(new Term("content", "Lucene")));
                      reader.close();
                  }
                  

                  I've tried something like reader.getTermVector(0, "content")... but I can't find a method that returns just the frequency of a particular term in that document.

                  Thanks!

                  Recommended answer

                  OK, figured it out. You can get a DocsEnum object from MultiFields and then iterate over it. Each call to nextDoc() advances to the next document containing the term, and freq() returns the term's frequency in that document.

                  private static void addDoc(IndexWriter writer, String content) throws IOException {
                      FieldType fieldType = new FieldType();
                      fieldType.setStoreTermVectors(true);
                      fieldType.setStoreTermVectorPositions(true);
                      fieldType.setIndexed(true);
                      fieldType.setIndexOptions(IndexOptions.DOCS_AND_FREQS);
                      fieldType.setStored(true);
                      Document doc = new Document();
                      doc.add(new Field("content", content, fieldType));
                      writer.addDocument(doc);
                  }
                  
                  public static void main(String[] args) throws IOException, ParseException {
                      Directory directory = new RAMDirectory();  
                      Analyzer analyzer = new WhitespaceAnalyzer(Version.LUCENE_40);
                      IndexWriterConfig config = new IndexWriterConfig(Version.LUCENE_40, analyzer);
                      IndexWriter writer = new IndexWriter(directory, config);
                      addDoc(writer, "bla bla bla bleu bleu");
                      addDoc(writer, "bla bla bla bla");
                      writer.close();
                      DirectoryReader reader = DirectoryReader.open(directory);
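                      // Postings for the term "bla" in the "content" field; null if the field or term does not exist.
                      DocsEnum de = MultiFields.getTermDocsEnum(reader, MultiFields.getLiveDocs(reader), "content", new BytesRef("bla"));
                      if (de != null) {
                          int doc;
                          while ((doc = de.nextDoc()) != DocsEnum.NO_MORE_DOCS) {
                              // Term frequency of "bla" in the current document: prints 3, then 4.
                              System.out.println(de.freq());
                          }
                      }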
                      reader.close();
                  }
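
To make the difference between document frequency and term frequency concrete without needing a Lucene installation, here is a minimal plain-Java sketch. It is not Lucene's API: the class TermFrequencySketch and its methods are hypothetical names made up for illustration, and the tokenization is simple whitespace splitting, which only approximates what WhitespaceAnalyzer does. It uses the same two documents as the answer above.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration class; none of these names come from Lucene.
class TermFrequencySketch {

    // Split on whitespace (an approximation of WhitespaceAnalyzer for simple input).
    static String[] tokenize(String content) {
        String trimmed = content.trim();
        return trimmed.isEmpty() ? new String[0] : trimmed.split("\\s+");
    }

    // Term frequency: how many times the term occurs in a single document.
    static int termFreq(String content, String term) {
        int count = 0;
        for (String token : tokenize(content)) {
            if (token.equals(term)) {
                count++;
            }
        }
        return count;
    }

    // Document frequency: how many documents contain the term at least once.
    static int docFreq(String[] docs, String term) {
        int count = 0;
        for (String doc : docs) {
            if (termFreq(doc, term) > 0) {
                count++;
            }
        }
        return count;
    }

    // Postings-like view: document index -> term frequency, only for documents containing the term.
    static Map<Integer, Integer> postings(String[] docs, String term) {
        Map<Integer, Integer> result = new LinkedHashMap<>();
        for (int i = 0; i < docs.length; i++) {
            int freq = termFreq(docs[i], term);
            if (freq > 0) {
                result.put(i, freq);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        String[] docs = {"bla bla bla bleu bleu", "bla bla bla bla"};
        for (Map.Entry<Integer, Integer> e : postings(docs, "bla").entrySet()) {
            System.out.println("doc " + e.getKey() + ": freq " + e.getValue());
        }
        System.out.println("docFreq: " + docFreq(docs, "bla"));
    }
}
```

The postings map plays the role of the DocsEnum iteration: one entry per matching document, each carrying that document's term frequency. The number of entries equals the document frequency that reader.docFreq reports in the question.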
                  
