Unable to delete a document with Lucene IndexWriter.deleteDocuments(term)

2023-06-28 Java development questions

Problem description

I have been struggling with this for two days now: I just can't delete a document with indexWriter.deleteDocuments(term).

Below is code that reproduces the problem; hopefully someone can point out what I have done wrong. Things I have already tried:

  1. Updating the Lucene version from 2.x to 5.x
  2. Using indexWriter.deleteDocuments() instead of indexReader.deleteDocuments()
  3. Trying the indexOption configured as NONE or DOCS_AND_FREQS_AND_POSITIONS_AND_OFFSETS

Here is the code:

import org.apache.lucene.analysis.core.SimpleAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.FieldType;
import org.apache.lucene.index.*;
import org.apache.lucene.queryparser.classic.ParseException;
import org.apache.lucene.queryparser.classic.QueryParser;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;

import java.io.IOException;
import java.nio.file.Paths;

public class TestSearch {
    static SimpleAnalyzer analyzer = new SimpleAnalyzer();

    public static void main(String[] argvs) throws IOException, ParseException {
        generateIndex("5836962b0293a47b09d345f1");
        query("5836962b0293a47b09d345f1");
        delete("5836962b0293a47b09d345f1");
        query("5836962b0293a47b09d345f1");

    }

    public static void generateIndex(String id) throws IOException {
        Directory directory = FSDirectory.open(Paths.get("/tmp/test/lucene"));
        IndexWriterConfig config = new IndexWriterConfig(analyzer);
        IndexWriter iwriter = new IndexWriter(directory, config);
        FieldType fieldType = new FieldType();
        fieldType.setStored(true);
        fieldType.setIndexOptions(IndexOptions.DOCS_AND_FREQS_AND_POSITIONS_AND_OFFSETS);
        Field idField = new Field("_id", id, fieldType);
        Document doc = new Document();
        doc.add(idField);
        iwriter.addDocument(doc);
        iwriter.close();

    }

    public static void query(String id) throws ParseException, IOException {
        Query query = new QueryParser("_id", analyzer).parse(id);
        Directory directory = FSDirectory.open(Paths.get("/tmp/test/lucene"));
        IndexReader ireader  = DirectoryReader.open(directory);
        IndexSearcher isearcher = new IndexSearcher(ireader);
        ScoreDoc[] scoreDoc = isearcher.search(query, 100).scoreDocs;
        for(ScoreDoc scdoc: scoreDoc){
            Document doc = isearcher.doc(scdoc.doc);
            System.out.println(doc.get("_id"));
        }
    }

    public static void delete(String id){
        try {
            Directory directory = FSDirectory.open(Paths.get("/tmp/test/lucene"));
            IndexWriterConfig config = new IndexWriterConfig(analyzer);
            IndexWriter iwriter = new IndexWriter(directory, config);
            Term term = new Term("_id", id);
            iwriter.deleteDocuments(term);
            iwriter.commit();
            iwriter.close();
        }catch (IOException e){
            e.printStackTrace();
        }
    }
}

First, generateIndex() creates an index in /tmp/test/lucene, and query() shows that the id can be queried successfully. Then delete() is supposed to delete the document, but the second query() still finds it, which shows that the deletion failed.

Here are the pom dependencies, in case someone needs them to run the test:

    <dependency>
        <groupId>org.apache.lucene</groupId>
        <artifactId>lucene-core</artifactId>
        <version>5.5.4</version>
        <type>jar</type>
    </dependency>
    <dependency>
        <groupId>org.apache.lucene</groupId>
        <artifactId>lucene-analyzers-common</artifactId>
        <version>5.5.4</version>
    </dependency>
    <dependency>
        <groupId>org.apache.lucene</groupId>
        <artifactId>lucene-queryparser</artifactId>
        <version>5.5.4</version>
    </dependency>
    <dependency>
        <groupId>org.apache.lucene</groupId>
        <artifactId>lucene-analyzers-smartcn</artifactId>
        <version>5.5.4</version>
    </dependency>

Looking forward to an answer.

Recommended answer

Your problem is the analyzer. SimpleAnalyzer defines tokens as maximal strings of letters (StandardAnalyzer, or even WhitespaceAnalyzer, are more typical choices), so the value you are indexing is split into the tokens "b", "a", "b", "d", "f". The delete method you've defined, however, does not go through the analyzer; it just creates a raw term, and no term equal to the full id exists in the index. You can see this in action by replacing your main with this:

generateIndex("5836962b0293a47b09d345f1");
query("5836962b0293a47b09d345f1");
delete("b");
query("5836962b0293a47b09d345f1");
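
To make the tokenization concrete, here is a minimal plain-Java sketch that imitates the behaviour described above (maximal runs of letters, lowercased). It does not call Lucene; the class LetterTokenSketch and its tokenize helper are hypothetical names made up for illustration, and the real analyzer may differ in details.

```java
import java.util.ArrayList;
import java.util.List;

public class LetterTokenSketch {

    // Hypothetical helper, not a Lucene API: splits the text into maximal
    // runs of letters and lowercases them, imitating the described behaviour.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (Character.isLetter(c)) {
                current.append(Character.toLowerCase(c));
            } else if (current.length() > 0) {
                // A non-letter ends the current token.
                tokens.add(current.toString());
                current.setLength(0);
            }
        }
        if (current.length() > 0) {
            tokens.add(current.toString());
        }
        return tokens;
    }

    public static void main(String[] args) {
        String id = "5836962b0293a47b09d345f1";
        List<String> tokens = tokenize(id);
        System.out.println(tokens);              // prints [b, a, b, d, f]
        System.out.println(tokens.contains(id)); // prints false: the full id is never a token
    }
}
```

Since the full id never appears among the tokens, a raw Term built from the full id has nothing to match, while a Term for "b" does.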

As a general rule, Term and Query objects are not analyzed; QueryParser is what analyzes its input.

For (what looks like) an identifier field, you probably don't want to analyze the field at all. In that case, add this to the FieldType:

fieldType.setTokenized(false);

You will then have to change your query as well (again, QueryParser analyzes), and use a TermQuery instead:

Query query = new TermQuery(new Term("_id", id));
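
Why the untokenized field fixes the delete can be illustrated with a toy model of a term index in plain Java. This is only a sketch under stated assumptions: ToyIndexSketch, add, deleteByTerm and letterTokens are hypothetical names invented here, and the model ignores everything Lucene actually does beyond "terms map to documents, and delete-by-term matches exact terms only".

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ToyIndexSketch {
    // term -> ids of the documents containing it (toy stand-in for an inverted index)
    private final Map<String, Set<Integer>> postings = new HashMap<>();
    private final Set<Integer> liveDocs = new HashSet<>();
    private int nextDoc = 0;

    // Hypothetical tokenizer: maximal runs of letters, lowercased.
    static List<String> letterTokens(String text) {
        List<String> tokens = new ArrayList<>();
        Matcher m = Pattern.compile("\\p{L}+").matcher(text);
        while (m.find()) {
            tokens.add(m.group().toLowerCase());
        }
        return tokens;
    }

    // tokenized=true mimics an analyzed field; false indexes the whole value as one term.
    int add(String value, boolean tokenized) {
        int docId = nextDoc++;
        List<String> terms = tokenized ? letterTokens(value) : Collections.singletonList(value);
        for (String term : terms) {
            postings.computeIfAbsent(term, k -> new HashSet<>()).add(docId);
        }
        liveDocs.add(docId);
        return docId;
    }

    // Deletes by exact term, with no analysis; returns how many documents were removed.
    int deleteByTerm(String term) {
        Set<Integer> docs = postings.remove(term);
        if (docs == null) {
            return 0;
        }
        for (Set<Integer> others : postings.values()) {
            others.removeAll(docs);
        }
        liveDocs.removeAll(docs);
        return docs.size();
    }

    int size() {
        return liveDocs.size();
    }

    public static void main(String[] args) {
        String id = "5836962b0293a47b09d345f1";

        ToyIndexSketch analyzed = new ToyIndexSketch();
        analyzed.add(id, true);
        System.out.println(analyzed.deleteByTerm(id)); // prints 0: no term equals the full id

        ToyIndexSketch exact = new ToyIndexSketch();
        exact.add(id, false);
        System.out.println(exact.deleteByTerm(id));    // prints 1: the id is indexed as one term
    }
}
```

In the toy model, as in the answer above, the delete only succeeds once the id is indexed as a single term, and lookups then have to use that same exact term rather than an analyzed query.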
