Lucene proximity search for a phrase with more than two words
Question
Lucene's manual clearly explains what a proximity search means for a two-word phrase, as in the "jakarta apache"~10 example at
http://lucene.apache.org/core/2_9_4/queryparsersyntax.html#Proximity Searches
However, what exactly does a search like "jakarta apache lucene"~10 do? Does it allow neighboring words to be at most 10 words apart, or does that limit apply to every pair of words?
Thanks!
Recommended answer
The slop (proximity) works like an edit distance (see PhraseQuery.setSlop). So the terms can be reordered or have extra terms added between them. This means the proximity is the maximum number of terms that can be added into the whole query, not a limit on each pair of words. That is:
"jakarta apache lucene"~3
will match:
- "jakarta lucene apache" (distance: 2)
- "jakarta extra words here apache lucene" (distance: 3)
- "jakarta some words apache separated lucene" (distance: 3)
but not:
- "lucene jakarta apache" (distance: 4)
- "jakarta too many extra words here apache lucene" (distance: 5)
- "jakarta some words apache further separated lucene" (distance: 4)
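For the in-order cases above, the distance is simply the number of extra words sitting inside the matched span. A minimal sketch of that count follows; `extra_terms_in_order` is a hypothetical helper written for illustration (it is not part of Lucene), and it assumes whitespace tokenization and that each phrase term occurs exactly once in the text.

```python
def extra_terms_in_order(phrase, text):
    """Count the extra words between the first and last phrase term.

    Assumption: each phrase term occurs exactly once in `text`.
    Returns None if a term is missing or the terms are not in phrase order.
    """
    query = phrase.split()
    tokens = text.split()
    try:
        positions = [tokens.index(term) for term in query]
    except ValueError:
        return None  # at least one phrase term is missing from the text
    if positions != sorted(positions):
        return None  # reordered terms are outside the scope of this sketch
    span = positions[-1] - positions[0] + 1
    return span - len(query)


phrase = "jakarta apache lucene"
print(extra_terms_in_order(phrase, "jakarta extra words here apache lucene"))
print(extra_terms_in_order(phrase, "jakarta some words apache further separated lucene"))
```

With a slop of 3, the first text falls within the limit and the second does not, because the extra words are counted across the whole phrase rather than per pair of neighbors.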
Some people have been confused by:
"lucene jakarta apache" (distance: 4)
The simple explanation is that swapping terms takes two edits, so:
- jakarta apache lucene (distance: 0)
- jakarta lucene apache (first swap, distance: 2)
- lucene jakarta apache (second swap, distance: 4)
The longer, but more accurate, explanation is that every edit allows a term to be moved by one position. The first move of a swap places two terms on top of each other, at the same position. Keeping this in mind explains why any set of three terms can be rearranged into any order with a distance no greater than 4.
- jakarta apache lucene (distance: 0)
- jakarta [apache,lucene] (distance: 1)
- [jakarta,apache,lucene] (all transposed onto the same position, distance: 2)
- lucene [jakarta,apache] (distance: 3)
- lucene jakarta apache (distance: 4)
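The move counting in the steps above can be written out as a toy calculation: add up how many positions each term has to travel from its place in the phrase to its place in the reordered text. This is only a model of the walk-through, not Lucene's scoring code; `single_position_moves` is a hypothetical helper, and it assumes the text is a pure reordering of the phrase with no repeated or extra terms. Behavior on a real index should be checked against Lucene itself.

```python
from itertools import permutations


def single_position_moves(phrase, reordered):
    """Sum of one-position moves needed to turn `phrase` into `reordered`.

    Assumption: `reordered` contains exactly the phrase terms, each once.
    """
    query = phrase.split()
    target = reordered.split()
    if sorted(query) != sorted(target):
        raise ValueError("text must be a reordering of the phrase terms")
    # Each term travels |new position - phrase position| single steps.
    return sum(abs(target.index(term) - i) for i, term in enumerate(query))


phrase = "jakarta apache lucene"
for order in permutations(phrase.split()):
    text = " ".join(order)
    print(text, single_position_moves(phrase, text))
```

Under this counting, no ordering of the three terms needs more than 4 moves, which is the bound stated above.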