
      How do I get positions from a document term vector in Lucene?

      Date: 2023-09-30


                Question

                I need to iterate over all documents in a Lucene index and obtain the positions at which each term occurs in each document. As far as I can tell from the Lucene javadoc, the way to do this is something like the following:

                IndexReader ir = obtainIndexReader();
                Terms tv = ir.getTermVector( doc, field );
                TermsEnum terms = tv.iterator();
                PostingsEnum p = null;
                while( terms.next() != null ) {
                    p = terms.postings( p, PostingsEnum.ALL );
                    while( p.nextDoc() != PostingsEnum.NO_MORE_DOCS ) {
                        int freq = p.freq();
                        for( int i = 0; i < freq; i++ ) {
                            int pos = p.nextPosition();   // Always returns -1!!!
                            BytesRef data = p.getPayload();
                            doStuff( freq, pos, data ); // Fails miserably, of course.
                        }
                    }
                }
                

                However, even though (1) the index does include positions for the relevant field and (2) the term vector claims to have positions (i.e. tv.hasPositions() == true), I keep getting -1 for every position.

                First, am I doing something wrong? Is there another way to iterate over postings on a per-document basis? Second, what is going on here? The index contains positions, the Terms instance returned by getTermVector claims to include positions, and Luke shows the correct position values, yet I still get -1 when I try to read those values in my code.

                The relevant field was configured with the following options:

                    FieldType ft = new FieldType();
                    ft.setIndexOptions( IndexOptions.DOCS_AND_FREQS_AND_POSITIONS_AND_OFFSETS );
                    ft.setStoreTermVectors( true );
                    ft.setStoreTermVectorOffsets( true );
                    ft.setStoreTermVectorPayloads( true );
                    ft.setStoreTermVectorPositions( true );
                    ft.setTokenized( true );
                    return ft;
                

                Recommended answer

                Did you set FieldType.setStoreTermVectorPositions(true) on your field type at index time? Term vector options only apply to documents indexed with that FieldType, so documents indexed before the option was enabled must be reindexed before their term vectors carry positions. See http://lucene.apache.org/core/5_5_0/core/org/apache/lucene/document/FieldType.html#setStoreTermVectorPositions(boolean)
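
For reference, the data a positional term vector holds for one document can be pictured as a map from each term to the token positions where it occurs. The following is a minimal plain-Java sketch of that idea, not Lucene code: the class `TermPositionsSketch`, the method `termPositions`, and the whitespace/lowercase tokenization are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration only: mimics the shape of a per-document
// term vector with positions, without using any Lucene classes.
class TermPositionsSketch {

    // Returns term -> list of 0-based token positions for a single document.
    // Tokenization (lowercase, split on whitespace) is an assumption of this
    // sketch and is much simpler than a real Lucene analyzer.
    static Map<String, List<Integer>> termPositions(String text) {
        Map<String, List<Integer>> positions = new TreeMap<>();
        String trimmed = text.trim();
        if (trimmed.isEmpty()) {
            return positions; // no tokens, so no terms
        }
        String[] tokens = trimmed.toLowerCase(Locale.ROOT).split("\\s+");
        for (int pos = 0; pos < tokens.length; pos++) {
            positions.computeIfAbsent(tokens[pos], t -> new ArrayList<>()).add(pos);
        }
        return positions;
    }

    public static void main(String[] args) {
        Map<String, List<Integer>> tv = termPositions("to be or not to be");
        for (Map.Entry<String, List<Integer>> e : tv.entrySet()) {
            // The list size plays the role of the term frequency in this document.
            System.out.println(e.getKey()
                    + " freq=" + e.getValue().size()
                    + " positions=" + e.getValue());
        }
    }
}
```

In Lucene terms, the length of each list corresponds to what `freq()` reports for the term, and each entry corresponds to one `nextPosition()` call. A real analyzer may assign positions differently, for example by leaving gaps where stop words were removed.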
