Is Filter faster than Query in Lucene?

Date: 2023-09-30

Question

While reading "Lucene in Action, 2nd edition" I came across the description of the Filter classes, which can be used for result filtering in Lucene. Lucene has a lot of filters that duplicate Query classes, for example NumericRangeQuery and NumericRangeFilter.

The book says that NumericRangeFilter (NRF) does exactly the same as NumericRangeQuery (NRQ) but without document scoring. Does this mean that if I do not need scoring, or if I sort documents by a document field value, I should prefer filtering over querying from a performance point of view?

Recommended answer

I received a great answer from Uwe Schindler; let me repost it here.

  If you don't cache filters, queries will be faster, as the ConjunctionScorer
  in Lucene has optimizations which are currently not used for Filters.
  Filters are fine if you cache them (e.g. if you always have the same access
  restrictions for a specific user that are applied to all his queries). In
  that case the Filter is only executed once, cached for all further
  requests, and then intersected with the query result set.
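
To illustrate the caching point outside the quoted answer, here is a minimal sketch in plain Java. It does not use Lucene's Filter API: `CachedFilterSketch`, `filterFor`, and `intersect` are hypothetical names, and a `java.util.BitSet` stands in for the set of document ids a filter allows. It only shows the pattern described above, under those assumptions: the filter is computed once per key, reused on later requests, and intersected with each query's hits.

```java
import java.util.BitSet;
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

// Conceptual sketch only: plain java.util classes, not Lucene's Filter API.
class CachedFilterSketch {
    // Hypothetical cache: filter key (e.g. a user id) -> bitset of allowed doc ids.
    private final Map<String, BitSet> cache = new HashMap<>();

    // Counts how often a filter was actually executed (cache misses).
    int computations = 0;

    // Returns the cached bitset, running the expensive computation only on a miss.
    BitSet filterFor(String key, Supplier<BitSet> expensiveComputation) {
        return cache.computeIfAbsent(key, k -> {
            computations++;
            return expensiveComputation.get();
        });
    }

    // Intersects query hits with the filter without mutating either input.
    static BitSet intersect(BitSet queryHits, BitSet allowed) {
        BitSet result = (BitSet) queryHits.clone();
        result.and(allowed);
        return result;
    }

    public static void main(String[] args) {
        CachedFilterSketch sketch = new CachedFilterSketch();

        // Assumed access restriction: this user may see documents 1, 3 and 5.
        Supplier<BitSet> accessFilter = () -> {
            BitSet allowed = new BitSet();
            allowed.set(1);
            allowed.set(3);
            allowed.set(5);
            return allowed;
        };

        BitSet firstQueryHits = new BitSet();
        firstQueryHits.set(1);
        firstQueryHits.set(2);
        firstQueryHits.set(3);

        BitSet secondQueryHits = new BitSet();
        secondQueryHits.set(4);
        secondQueryHits.set(5);

        System.out.println(intersect(firstQueryHits, sketch.filterFor("alice", accessFilter)));
        System.out.println(intersect(secondQueryHits, sketch.filterFor("alice", accessFilter)));
        System.out.println("filter executions: " + sketch.computations);
    }
}
```

The second request reuses the cached bitset, so the filter computation runs only once for both queries. Without the cache, that cost would be paid on every request, which is the case where the answer says a query is the better choice.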

  If you only want to "filter" ad hoc, e.g. by a variable numeric range
  like a bounding box in a geographic search, use queries; queries are in most
  cases faster (e.g. range queries and similar ones, called MultiTermQueries,
  are internally implemented by the same BitSet algorithm as the Filter; in
  fact they are only Filters wrapped by a Scorer implementation). But the
  Scorer that ANDs the query and your "filter" query together
  (ConjunctionScorer) is generally faster than the code that applies the
  filter after searching. Some improvement may be possible here, but in general
  filters are something in Lucene that is not really needed anymore, so there
  have already been some approaches to make Filters and Queries the same, and
  instead be able to also cache non-scoring queries. This would make lots
  of code easier.
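
As a rough illustration of what "ANDing" two clauses together means, the sketch below intersects two sorted lists of document ids by letting each side skip ahead to the other's current candidate. This is only the general leapfrog idea, written in plain Java with hypothetical names (`ConjunctionSketch`, `advance`, `conjunction`). It is not Lucene's ConjunctionScorer code and does not reproduce the optimizations mentioned in the answer.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Conceptual sketch only: sorted int arrays stand in for per-clause doc id streams.
class ConjunctionSketch {
    // Returns the smallest index >= from whose doc id is >= target,
    // or docs.length if no such doc exists. Assumes docs is sorted and distinct.
    static int advance(int[] docs, int from, int target) {
        int idx = Arrays.binarySearch(docs, from, docs.length, target);
        return idx >= 0 ? idx : -idx - 1;
    }

    // Leapfrog intersection: whichever side is behind skips to the other's doc id.
    static List<Integer> conjunction(int[] a, int[] b) {
        List<Integer> matches = new ArrayList<>();
        int i = 0;
        int j = 0;
        while (i < a.length && j < b.length) {
            if (a[i] == b[j]) {
                matches.add(a[i]);
                i++;
                j++;
            } else if (a[i] < b[j]) {
                i = advance(a, i, b[j]);
            } else {
                j = advance(b, j, a[i]);
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        int[] textClauseDocs = {1, 4, 7, 9, 20};   // docs matching the main query
        int[] rangeClauseDocs = {4, 9, 15, 20};    // docs inside the numeric range
        System.out.println(conjunction(textClauseDocs, rangeClauseDocs)); // [4, 9, 20]
    }
}
```

In this picture the range restriction is just one more clause in the conjunction, which is the setup the answer recommends for ad hoc, uncached restrictions.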

  Filters can bring a huge speed improvement with Lucene 4.0 if they are
  plugged on top of the IndexReader to filter the documents before scoring,
  but that's not yet implemented (see
  https://issues.apache.org/jira/browse/LUCENE-3212) - I am working on it. We
  may also make Filters random access (it's easy as they are bitsets), which
  could also improve the after-query filtering. But I would then also make
  Queries partially random access, if they could support it (like queries that
  are only based on FieldCache).
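
The idea of filtering before scoring with a random-access bitset can be pictured with the following plain-Java sketch. It is not the design of LUCENE-3212, which the answer describes as not yet implemented. `PreScoringFilterSketch` and `scoreAllowed` are hypothetical names, and the scoring function is a stand-in for whatever per-document work a real scorer would do.

```java
import java.util.BitSet;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.IntToDoubleFunction;

// Conceptual sketch only: not Lucene code and not the LUCENE-3212 design.
class PreScoringFilterSketch {
    // Scores only the candidates that the random-access bitset allows.
    static Map<Integer, Double> scoreAllowed(int[] candidates,
                                             BitSet allowed,
                                             IntToDoubleFunction scorer) {
        Map<Integer, Double> scores = new LinkedHashMap<>();
        for (int doc : candidates) {
            // Constant-time membership check; rejected docs never reach the scorer.
            if (!allowed.get(doc)) {
                continue;
            }
            scores.put(doc, scorer.applyAsDouble(doc));
        }
        return scores;
    }

    public static void main(String[] args) {
        BitSet allowed = new BitSet();
        allowed.set(2);
        allowed.set(4);

        int[] scoreCalls = {0}; // counts how often the stand-in scorer runs
        Map<Integer, Double> scores = scoreAllowed(
                new int[] {1, 2, 3, 4},
                allowed,
                doc -> {
                    scoreCalls[0]++;
                    return doc * 0.5; // placeholder score, not a real relevance formula
                });

        System.out.println(scores.keySet());
        System.out.println("scorer invocations: " + scoreCalls[0]);
    }
}
```

Here the scorer runs only for the two allowed documents out of four candidates, which is the kind of saving the answer attributes to filtering before scoring.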

