[jira] Created: (LUCENE-1253) LengthFilter may generate a TokenStream where first token has positionIncrement==0

[jira] Created: (LUCENE-1253) LengthFilter may generate a TokenStream where first token has positionIncrement==0

JIRA jira@apache.org
LengthFilter may generate a TokenStream where first token has positionIncrement==0
----------------------------------------------------------------------------------

                 Key: LUCENE-1253
                 URL: https://issues.apache.org/jira/browse/LUCENE-1253
             Project: Lucene - Java
          Issue Type: Bug
          Components: Analysis
    Affects Versions: 2.3.1
            Reporter: Walter Ferrara
            Priority: Minor


See for reference:
http://www.nabble.com/WordDelimiterFilter%2BLenghtFilter-results-in-termPosition%3D%3D-1-td16306788.html
and http://www.nabble.com/Lucene---Java-f24284.html

It seems that LengthFilter (at least) can produce a stream in which the first Token has a positionIncrement of 0, which causes CheckIndex and Luke's "Reconstruct&Edit" function to throw an exception.

Should something be done to avoid this situation, or could the error be ignored (by allowing a Term with a position of -1 and relaxing the CheckIndex checks)?
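
For readers wondering where the -1 comes from, here is a minimal sketch in Java. It assumes the indexer keeps a position counter that starts at -1 and adds each token's positionIncrement to it, which is consistent with the -1 reported here but is not copied from the Lucene source. The class and method names are hypothetical.

```java
import java.util.Arrays;

public class TermPositions {

    /**
     * Converts positionIncrements into absolute positions.
     * Assumption: the counter starts at -1, so a first increment of 1 gives position 0.
     */
    static int[] positions(int[] posIncs) {
        int[] out = new int[posIncs.length];
        int pos = -1; // assumed position "before" the first token
        for (int i = 0; i < posIncs.length; i++) {
            pos += posIncs[i];
            out[i] = pos;
        }
        return out;
    }

    public static void main(String[] args) {
        // Ordinary stream: every token advances by one.
        System.out.println(Arrays.toString(positions(new int[] {1, 1, 1})));
        // Stream whose first token has positionIncrement == 0.
        System.out.println(Arrays.toString(positions(new int[] {0, 0, 1})));
    }
}
```

Under that assumption the first call prints [0, 1, 2] and the second prints [-1, -1, 0], so any leading token with an increment of 0 ends up at position -1.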


--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]


[jira] Commented: (LUCENE-1253) LengthFilter may generate a TokenStream where first token has positionIncrement==0

JIRA jira@apache.org

    [ https://issues.apache.org/jira/browse/LUCENE-1253?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12583486#action_12583486 ]

Hoss Man commented on LUCENE-1253:
----------------------------------

The more general question is: should LengthFilter have an option to (or by default) adjust the positionIncrement of the Tokens it lets through, so that their positions stay relative to the positions of the tokens it strips out?

i.e., given a stream of tokens expressed as <term,positionIncrement> ...

  <a,1> <b,1> <c,1> <ddddd,0> <e,0> <f,2> <ggggg,0> <hhhhhh,1>

should the resulting stream after using a LengthFilter with min=3 be...

  <ddddd,0> <ggggg,0> <hhhhhh,1>

...(which I believe is the current behavior) or should it be...

   <ddddd,3> <ggggg,2> <hhhhhh,1>
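
To make the two candidate outputs concrete, here is a minimal sketch in Java that does not depend on Lucene. `Tok`, `dropOnly` and `preservePositions` are hypothetical names made up for illustration; this is not LengthFilter's actual code, only one way the two behaviors could be modeled.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class LengthFilterPositions {

    /** Hypothetical stand-in for a Lucene Token: a term plus its positionIncrement. */
    static final class Tok {
        final String term;
        final int posInc;
        Tok(String term, int posInc) { this.term = term; this.posInc = posInc; }
        @Override public String toString() { return "<" + term + "," + posInc + ">"; }
    }

    /** Drops tokens shorter than min and leaves the surviving increments untouched. */
    static List<Tok> dropOnly(List<Tok> in, int min) {
        List<Tok> out = new ArrayList<>();
        for (Tok t : in) {
            if (t.term.length() >= min) out.add(t);
        }
        return out;
    }

    /** Drops tokens shorter than min and adds their increments to the next surviving token. */
    static List<Tok> preservePositions(List<Tok> in, int min) {
        List<Tok> out = new ArrayList<>();
        int skipped = 0; // increments of tokens dropped since the last survivor
        for (Tok t : in) {
            if (t.term.length() >= min) {
                out.add(new Tok(t.term, t.posInc + skipped));
                skipped = 0;
            } else {
                skipped += t.posInc;
            }
        }
        return out;
    }

    /** Renders a stream in the <term,positionIncrement> notation used in this thread. */
    static String render(List<Tok> toks) {
        StringBuilder sb = new StringBuilder();
        for (Tok t : toks) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(t);
        }
        return sb.toString();
    }

    /** The example stream from this comment. */
    static List<Tok> sample() {
        return Arrays.asList(
            new Tok("a", 1), new Tok("b", 1), new Tok("c", 1), new Tok("ddddd", 0),
            new Tok("e", 0), new Tok("f", 2), new Tok("ggggg", 0), new Tok("hhhhhh", 1));
    }

    public static void main(String[] args) {
        System.out.println(render(dropOnly(sample(), 3)));
        System.out.println(render(preservePositions(sample(), 3)));
    }
}
```

Run against the example stream with min=3, `dropOnly` yields `<ddddd,0> <ggggg,0> <hhhhhh,1>` and `preservePositions` yields `<ddddd,3> <ggggg,2> <hhhhhh,1>`. The second variant keeps each surviving token at the absolute position it had before filtering.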

FWIW: StopFilter seems to have code to handle this (but I haven't tested that it works correctly)

The question of whether or not it's legal for the first token of a stream to have a positionIncrement of "0" is being discussed on the list; most likely, if it needs to be changed, that would be done in IndexWriter/DocumentsWriter.


[jira] Updated: (LUCENE-1253) LengthFilter ignoring relative positionIncrement of tokens skipped

JIRA jira@apache.org

     [ https://issues.apache.org/jira/browse/LUCENE-1253?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hoss Man updated LUCENE-1253:
-----------------------------

    Summary: LengthFilter ignoring relative positionIncrement of tokens skipped  (was: LengthFilter may generate a TokenStream where first token has positionIncrement==0)

tweaking issue Summary to describe the more general problem with LengthFilter
