                  How to get positions from a document term vector in Lucene?


                  Question

                  I need to iterate over all documents in a Lucene index and obtain the positions at which each term occurs in each document. As far as I can tell from the Lucene javadoc, the way to do this is something like the following:

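                  import org.apache.lucene.index.IndexReader;
                  import org.apache.lucene.index.PostingsEnum;
                  import org.apache.lucene.index.Terms;
                  import org.apache.lucene.index.TermsEnum;
                  import org.apache.lucene.util.BytesRef;

                  IndexReader ir = obtainIndexReader();
                  // Term vector of a single field of a single document.
                  Terms tv = ir.getTermVector( doc, field );
                  TermsEnum terms = tv.iterator();
                  PostingsEnum p = null;
                  while( terms.next() != null ) {
                      // Request freqs, positions, offsets and payloads.
                      p = terms.postings( p, PostingsEnum.ALL );
                      while( p.nextDoc() != PostingsEnum.NO_MORE_DOCS ) {
                          int freq = p.freq();
                          for( int i = 0; i < freq; i++ ) {
                              int pos = p.nextPosition();   // Always returns -1!!!
                              BytesRef data = p.getPayload();
                              doStuff( freq, pos, data ); // Fails miserably, of course.
                          }
                      }
                  }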
                  

                  However, even though (1) the index does include positions for the relevant field and (2) the term vector reports that it has positions (i.e. tv.hasPositions() == true), I keep getting -1 for every position.

                  First, am I doing something wrong? Is there another way to iterate over postings on a per-document basis? Second, what is actually going on? The index contains positions, the Terms instance returned by getTermVector claims to include positions, and Luke shows the correct position values, yet I still get -1 when I read those values in my code.

                  The relevant field was configured with the following options:

                      FieldType ft = new FieldType();
                      ft.setIndexOptions( IndexOptions.DOCS_AND_FREQS_AND_POSITIONS_AND_OFFSETS );
                      ft.setStoreTermVectors( true );
                      ft.setStoreTermVectorOffsets( true );
                      ft.setStoreTermVectorPayloads( true );
                      ft.setStoreTermVectorPositions( true );
                      ft.setTokenized( true );
                      return ft;
                  

                  Recommended answer

                  Did you set FieldType.setStoreTermVectorPositions(true) on your field type at index time? See http://lucene.apache.org/core/5_5_0/core/org/apache/lucene/document/FieldType.html#setStoreTermVectorPositions(boolean)
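
For readers unsure what the nested loop in the question is meant to read back, here is a toy model in plain Java. It is not Lucene code and does not reproduce Lucene's storage format; the names `ToyTermVector` and `build` are hypothetical, and whitespace splitting stands in for a real analyzer. It only illustrates the idea of a per-document term vector with positions: each term maps to the token positions where it occurs, so the term frequency equals the number of stored positions.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

/** Toy, Lucene-free model of a per-document term vector with positions. */
class ToyTermVector {

    /** Maps each term to the 0-based token positions at which it occurs. */
    static Map<String, List<Integer>> build(String text) {
        // TreeMap keeps terms sorted, so iteration order is deterministic.
        Map<String, List<Integer>> vector = new TreeMap<>();
        String trimmed = text.trim();
        if (trimmed.isEmpty()) {
            return vector; // no tokens, empty vector
        }
        // Assumption: lowercasing plus whitespace splitting replaces an analyzer.
        String[] tokens = trimmed.toLowerCase(Locale.ROOT).split("\\s+");
        for (int pos = 0; pos < tokens.length; pos++) {
            vector.computeIfAbsent(tokens[pos], t -> new ArrayList<>()).add(pos);
        }
        return vector;
    }

    public static void main(String[] args) {
        Map<String, List<Integer>> tv = build("to be or not to be");
        for (Map.Entry<String, List<Integer>> e : tv.entrySet()) {
            // In this model, freq is simply the number of recorded positions.
            int freq = e.getValue().size();
            System.out.println(e.getKey() + " freq=" + freq + " positions=" + e.getValue());
        }
    }
}
```

In this model a term with no recorded positions could not report a meaningful position at all, which mirrors why the answer above asks whether positions were actually stored in the term vectors when the documents were indexed.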


