How to perform a Lucene query containing special characters with QueryParser?

Date: 2023-09-29

Problem description

Here is the thing: I have a term stored in the index that contains a special character, such as '-'. The simplest code looks like this:

Document doc = new Document();
// Field.Index.NOT_ANALYZED indexes the whole value as a single term: "1111-2222-3333"
doc.add(new Field("message", "1111-2222-3333", Field.Store.YES, Field.Index.NOT_ANALYZED));
writer.addDocument(doc);

Then I create a query using QueryParser, like this:

String queryStr = "1111-2222-3333";
// The analyzer given here is applied to the query text while it is parsed
QueryParser parser = new QueryParser(Version.LUCENE_36, "message", new StandardAnalyzer(Version.LUCENE_36));
Query q = parser.parse(queryStr);

I then run the query with a searcher and get no results. I have also tried this:

Query q = parser.parse(QueryParser.escape(queryStr));

Still no results.
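Escaping most likely does not help because QueryParser.escape only backslash-escapes characters that have a meaning in the query syntax. That stops '-' from being read as an operator, but the term text is still handed to the analyzer, which splits it at '-' just as before. The sketch below models this in plain Java, not with Lucene itself: `escape` copies the character set that Lucene 3.x's QueryParser.escape handles (check it against your version), while `unescape` and `analyze` are hypothetical, simplified stand-ins for the parser and for StandardAnalyzer.

```java
import java.util.ArrayList;
import java.util.List;

public class EscapeSketch {
    // Characters that Lucene 3.x QueryParser.escape() prefixes with '\'
    // (assumption: verify against the Lucene version you use).
    private static final String SYNTAX_CHARS = "\\+-!():^[]\"{}~*?|&";

    // Hypothetical stand-in for QueryParser.escape(): backslash-escape syntax characters.
    static String escape(String s) {
        StringBuilder sb = new StringBuilder();
        for (char c : s.toCharArray()) {
            if (SYNTAX_CHARS.indexOf(c) >= 0) {
                sb.append('\\');
            }
            sb.append(c);
        }
        return sb.toString();
    }

    // Hypothetical: the parser drops the escapes before passing term text to the analyzer.
    static String unescape(String s) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c == '\\' && i + 1 < s.length()) {
                c = s.charAt(++i); // keep the escaped character literally
            }
            sb.append(c);
        }
        return sb.toString();
    }

    // Simplified analyzer: split on any character that is not a letter or digit.
    static List<String> analyze(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.split("[^\\p{L}\\p{N}]+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        String escaped = escape("1111-2222-3333");
        System.out.println(escaped);                    // 1111\-2222\-3333
        System.out.println(analyze(unescape(escaped))); // [1111, 2222, 3333]
    }
}
```

The escaped string survives query-syntax parsing intact, yet the analysis step still produces three separate tokens.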

Using a TermQuery directly instead of QueryParser does what I want, but that approach is not flexible enough for text entered by users.

I suspect StandardAnalyzer does something that drops the special character from the query string. While debugging, I found that the string is split and the actual query looks like this: "message:1111 message:2222 message:3333". I don't know exactly what Lucene has done...
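To see how one input string turns into three clauses, here is a simplified plain-Java model (not Lucene code). `tokenize` is a hypothetical stand-in for StandardAnalyzer that splits on any character that is not a letter or digit and lowercases; the real tokenizer follows Unicode word-break rules, so treat this only as an approximation of the '-' case. `toQueryString` is likewise hypothetical and mimics how the parser joins several tokens into OR'ed term clauses on one field.

```java
import java.util.ArrayList;
import java.util.List;

public class SplitSketch {
    // Simplified stand-in for StandardAnalyzer: split on non-letter/non-digit
    // characters (so '-' acts as a separator) and lowercase each token.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.split("[^\\p{L}\\p{N}]+")) {
            if (!t.isEmpty()) {
                tokens.add(t.toLowerCase());
            }
        }
        return tokens;
    }

    // Mimics the printed form of a query with one optional term clause per token.
    static String toQueryString(String field, List<String> tokens) {
        List<String> clauses = new ArrayList<>();
        for (String t : tokens) {
            clauses.add(field + ":" + t);
        }
        return String.join(" ", clauses);
    }

    public static void main(String[] args) {
        List<String> tokens = tokenize("1111-2222-3333");
        System.out.println(toQueryString("message", tokens));
        // message:1111 message:2222 message:3333
    }
}
```

None of the three resulting terms equals the single indexed term "1111-2222-3333" (stored as NOT_ANALYZED), which is why the search finds nothing.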

So if I want to run a query that contains special characters, what should I do? Should I write my own analyzer, or subclass the default QueryParser? And how?

Update:

1 @The New Idiot @femtoRgon, I have already tried QueryParser.escape(queryStr), as stated in the question, but it still doesn't work.

2 I tried another way to solve the problem. I derived a QueryTokenizer from Tokenizer that splits words only on whitespace, wrapped it in a QueryAnalyzer derived from Analyzer, and finally passed the QueryAnalyzer to QueryParser.

Now it works. Originally it didn't work because the default StandardAnalyzer splits queryStr according to its default rules, which treat some special characters (such as '-') as separators. So by the time QueryParser builds the term queries, StandardAnalyzer has already split the text at those characters and dropped them. Now my own tokenizer splits queryStr only on whitespace, so the special characters stay inside the token, the resulting term matches the indexed term, and the search works.

3 @The New Idiot @femtoRgon, thank you for answering my question.
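A minimal plain-Java model of the whitespace-only approach from update 2 follows. The class and method names are hypothetical, and this is not the poster's actual QueryTokenizer: because only whitespace separates tokens, '-' stays inside the token, and that token equals the term indexed with NOT_ANALYZED. Lucene also ships a WhitespaceAnalyzer that splits on whitespace only, which may make a custom tokenizer unnecessary; note that, unlike StandardAnalyzer, it does not lowercase.

```java
import java.util.ArrayList;
import java.util.List;

public class WhitespaceSketch {
    // Simplified model of a whitespace-only tokenizer: '-' and other
    // punctuation stay inside the token.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.trim().split("\\s+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        String indexedTerm = "1111-2222-3333"; // stored with Field.Index.NOT_ANALYZED
        List<String> tokens = tokenize("1111-2222-3333");
        System.out.println(tokens);                       // [1111-2222-3333]
        System.out.println(tokens.contains(indexedTerm)); // true
    }
}
```

Multi-word input such as "1111-2222-3333 4444-5555" still yields one token per whitespace-separated word, so each hyphenated value can match its own indexed term.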
