假设我有这个给定的数据
Let's say I have this given data
{
"name" : "ABC",
"favorite_cars" : [ "ferrari","toyota" ]
}, {
"name" : "ABC",
"favorite_cars" : [ "ferrari","toyota" ]
}, {
"name" : "GEORGE",
"favorite_cars" : [ "honda","Hyundae" ]
}
每当我在搜索最喜欢的汽车是丰田的人时查询此数据时,它都会返回此数据
Whenever I query this data when searching for people who's favorite car is toyota, it returns this data
{
"name" : "ABC",
"favorite_cars" : [ "ferrari","toyota" ]
}, {
"name" : "ABC",
"favorite_cars" : [ "ferrari","toyota" ]
}
结果是两条名为 ABC 的记录.如何仅选择不同的文档?我想得到的结果只有这个
the result is Two records of with a name of ABC. How do I select distinct documents only? The result I want to get is only this
{
"name" : "ABC",
"favorite_cars" : [ "ferrari","toyota" ]
}
这是我的查询
{
"fuzzy_like_this_field" : {
"favorite_cars" : {
"like_text" : "toyota",
"max_query_terms" : 12
}
}
}
我正在使用 ElasticSearch 1.0.0.使用 java api 客户端
I am using ElasticSearch 1.0.0. with the java api client
您可以使用 聚合.使用 术语聚合结果将按一个字段分组,例如name,还提供了该字段每个值的出现次数,并将按此计数对结果进行排序(降序).
You can eliminate duplicates using aggregations. With term aggregation the results will be grouped by one field, e.g. name, also providing a count of the ocurrences of each value of the field, and will sort the results by this count (descending).
{
"query": {
"fuzzy_like_this_field": {
"favorite_cars": {
"like_text": "toyota",
"max_query_terms": 12
}
}
},
"aggs": {
"grouped_by_name": {
"terms": {
"field": "name",
"size": 0
}
}
}
}
除了 hits 之外,结果还将包含 buckets,其中 key 中的唯一值和 中的计数>doc_count:
In addition to the hits, the result will also contain the buckets with the unique values in key and with the count in doc_count:
{
"took" : 4,
"timed_out" : false,
"_shards" : {
"total" : 5,
"successful" : 5,
"failed" : 0
},
"hits" : {
"total" : 2,
"max_score" : 0.19178301,
"hits" : [ {
"_index" : "pru",
"_type" : "pru",
"_id" : "vGkoVV5cR8SN3lvbWzLaFQ",
"_score" : 0.19178301,
"_source":{"name":"ABC","favorite_cars":["ferrari","toyota"]}
}, {
"_index" : "pru",
"_type" : "pru",
"_id" : "IdEbAcI6TM6oCVxCI_3fug",
"_score" : 0.19178301,
"_source":{"name":"ABC","favorite_cars":["ferrari","toyota"]}
} ]
},
"aggregations" : {
"grouped_by_name" : {
"buckets" : [ {
"key" : "abc",
"doc_count" : 2
} ]
}
}
}
请注意,由于重复消除和结果排序,使用聚合的成本会很高.
Note that using aggregations will be costly because of duplicate elimination and result sorting.
这篇关于ElasticSearch 仅返回具有不同值的文档的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持html5模板网!
如何检测 32 位 int 上的整数溢出?How can I detect integer overflow on 32 bits int?(如何检测 32 位 int 上的整数溢出?)
return 语句之前的局部变量,这有关系吗?Local variables before return statements, does it matter?(return 语句之前的局部变量,这有关系吗?)
如何将整数转换为整数?How to convert Integer to int?(如何将整数转换为整数?)
如何在给定范围内创建一个随机打乱数字的 intHow do I create an int array with randomly shuffled numbers in a given range(如何在给定范围内创建一个随机打乱数字的 int 数组)
java的行为不一致==Inconsistent behavior on java#39;s ==(java的行为不一致==)
为什么 Java 能够将 0xff000000 存储为 int?Why is Java able to store 0xff000000 as an int?(为什么 Java 能够将 0xff000000 存储为 int?)