Why doesn't Lucene support any type of update to an existing document?
Question
My use case involves indexing a Lucene document and then, on multiple future occasions, adding terms that point to this existing document, without deleting and re-adding the entire document for each new term (for performance reasons, and because I don't keep the original terms).
I do know that a document cannot be truly updated. My question is: why?
Or, more precisely, why are all forms of update (terms, stored fields) unsupported?
Why is it not possible to add another term that points to an existing document? Technically, isn't all that's needed to place the existing doc ID in the term's posting list? Why is that hard? Are there immutable statistics that get in the way?
Are there any workarounds that support my use case of adding a term (an indexed field) to an existing document?
Recommended answer
"I do know that a document cannot be truly updated. My question is why?"
Gili, editing a document changes the postings of every term related to it, and that is problematic because of how a term's posting list is structured. The posting list is sorted by doc ID and stored sequentially, in segment files that Lucene writes once and never modifies in place. So to add a document to a term's posting list, you have to give it a doc ID higher than every ID already in the list, and that is done by deleting and re-indexing the entire document.
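The append-only constraint described above can be illustrated with a toy posting list. This is only a sketch under stated assumptions: `PostingListSketch` and its methods are hypothetical names, and the simple gap encoding stands in for a sorted, sequential layout; it is not Lucene's actual codec.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Toy posting list (hypothetical, not Lucene's format): ascending doc IDs stored as gaps. */
final class PostingListSketch {
    // Each entry is the distance from the previous doc ID (the first is measured from -1).
    private final List<Integer> gaps = new ArrayList<>();
    private int lastDocId = -1;

    /** Appending is cheap, but only works for a doc ID higher than every stored ID. */
    void append(int docId) {
        if (docId <= lastDocId) {
            throw new IllegalArgumentException(
                "doc ID " + docId + " is not greater than last doc ID " + lastDocId);
        }
        gaps.add(docId - lastDocId);
        lastDocId = docId;
    }

    /** Decodes the gaps back into absolute doc IDs. */
    List<Integer> docIds() {
        List<Integer> ids = new ArrayList<>();
        int current = -1;
        for (int gap : gaps) {
            current += gap;
            ids.add(current);
        }
        return ids;
    }

    /** Adding an older doc ID forces a full decode, sort, and re-encode of the list. */
    static PostingListSketch rebuildWith(PostingListSketch old, int docId) {
        List<Integer> ids = old.docIds();
        if (!ids.contains(docId)) {
            ids.add(docId);
        }
        Collections.sort(ids);
        PostingListSketch rebuilt = new PostingListSketch();
        for (int id : ids) {
            rebuilt.append(id);
        }
        return rebuilt;
    }

    public static void main(String[] args) {
        PostingListSketch postings = new PostingListSketch();
        postings.append(3);
        postings.append(7);
        postings.append(12);
        System.out.println(postings.docIds()); // [3, 7, 12]

        try {
            postings.append(5); // an existing, lower doc ID cannot be appended
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }

        System.out.println(rebuildWith(postings, 5).docIds()); // [3, 5, 7, 12]
    }
}
```

In this sketch, giving the document a fresh, higher doc ID is the only way to use the cheap `append` path; anything else means rewriting the list. In Lucene itself, the delete-and-re-add step is what `IndexWriter.updateDocument` performs.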