CheckIndex: pos -1 is out of bounds

CheckIndex: pos -1 is out of bounds

hossman

Hey guys, a Solr user just encountered an interesting situation...

...due to a naive "LengthFilter", an Analyzer is producing a TokenStream
where the first Token has a positionIncrement of "0", which seems to
produce this error from CheckIndex...

     WARNING: would remove reference to this segment (-fix was not specified); full exception:
     java.lang.RuntimeException: term features:usa: doc 0: pos -1 is out of bounds
         at org.apache.lucene.index.CheckIndex.check(CheckIndex.java:205)
         at org.apache.lucene.index.CheckIndex.main(CheckIndex.java:362)

...but as far as i can tell, the index is still usable.
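
To make the arithmetic concrete, here is a minimal sketch of how a leading positionIncrement of 0 could end up as position -1. It assumes the indexer starts its position counter at -1 and adds each token's increment, which is consistent with the error above but is not copied from Lucene's source. The class and method names (PositionSketch, positions) are hypothetical.

```java
import java.util.Arrays;

public class PositionSketch {
    // Hypothetical helper: turns a sequence of position increments into
    // absolute positions. ASSUMPTION: the counter starts at -1, so a normal
    // first increment of 1 lands the first token at position 0.
    static int[] positions(int[] increments) {
        int[] out = new int[increments.length];
        int pos = -1;
        for (int i = 0; i < increments.length; i++) {
            pos += increments[i];
            out[i] = pos;
        }
        return out;
    }

    public static void main(String[] args) {
        // Usual case: every token advances by one.
        System.out.println(Arrays.toString(positions(new int[] {1, 1, 1}))); // [0, 1, 2]
        // First token has increment 0, as in the report above.
        System.out.println(Arrays.toString(positions(new int[] {0, 1, 1}))); // [-1, 0, 1]
    }
}
```

Under that assumption only the first token of a field can reach -1, and only when its increment is 0.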

Questions are:
   1) is CheckIndex overly paranoid?
   2) shouldn't IndexWriter have protected against this if it is incorrect?



-Hoss


---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]


Re: CheckIndex: pos -1 is out of bounds

hossman

: Hey guys, a Solr user just encountered an interesting situation...

sorry, i forgot to paste the reference...

http://www.nabble.com/WordDelimiterFilter%2BLenghtFilter-results-in-termPosition%3D%3D-1-to16306788.html



-Hoss




Re: CheckIndex: pos -1 is out of bounds

Michael McCandless-2

Interesting!

I would be inclined to allow this, and fix CheckIndex's paranoia.  As  
far as I can tell, Lucene itself does not mind if the position is -1  
(at least PhraseQuery, SpanTermQuery happily find that Term at  
position -1), although we do prevent setting positionIncrement to a  
negative number in Token.java so you can't get less than -1.  Does  
anyone know of actual cases where Lucene would choke on this?
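
As a rough illustration of what relaxing the check could look like: since increments cannot be negative, -1 is the lowest position a token can reach, so a bounds check could accept it as the floor. This is only a sketch under that assumption. It is not CheckIndex's actual code, and the names (RelaxedPositionCheck, strictOk, relaxedOk, MIN_POSITION) are hypothetical.

```java
public class RelaxedPositionCheck {
    // ASSUMPTION: lowest reachable position when increments are never
    // negative and the first token may carry an increment of 0.
    static final int MIN_POSITION = -1;

    // Behaviour suggested by the reported error: any negative position fails.
    static boolean strictOk(int pos) {
        return pos >= 0;
    }

    // Relaxed variant: tolerate -1, still reject anything lower.
    static boolean relaxedOk(int pos) {
        return pos >= MIN_POSITION;
    }

    public static void main(String[] args) {
        for (int pos : new int[] {-2, -1, 0}) {
            System.out.println("pos " + pos
                + ": strict=" + strictOk(pos)
                + ", relaxed=" + relaxedOk(pos));
        }
    }
}
```

Anything below -1 would still be flagged, which keeps the check useful for detecting real corruption.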

Mike

Chris Hostetter wrote:

>
> : Hey guys, a Solr user just encountered an interesting situation...
>
> sorry, i forgot to paste the reference...
>
> http://www.nabble.com/WordDelimiterFilter%2BLenghtFilter-results-in-termPosition%3D%3D-1-to16306788.html
>
>
>
> -Hoss

