[jira] Created: (LUCENE-1350) SnowballFilter resets the payload


[jira] Created: (LUCENE-1350) SnowballFilter resets the payload

SnowballFilter resets the payload
---------------------------------

                 Key: LUCENE-1350
                 URL: https://issues.apache.org/jira/browse/LUCENE-1350
             Project: Lucene - Java
          Issue Type: Bug
          Components: Analysis, contrib/*
            Reporter: Doron Cohen
            Assignee: Doron Cohen


Passing tokens with payloads through SnowballFilter results in tokens with no payloads.
A workaround for this is to apply stemming first and only then run whatever logic creates the payload, but this is not always convenient.
Patch to follow that preserves the payload.
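
For illustration, here is a minimal sketch of the problem against the 2.3-era non-reuse next() API (PayloadLossSketch and MarkerPayloadFilter are made-up names, not part of Lucene):

import java.io.IOException;
import java.io.StringReader;

import org.apache.lucene.analysis.Token;
import org.apache.lucene.analysis.TokenFilter;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.WhitespaceTokenizer;
import org.apache.lucene.analysis.snowball.SnowballFilter;
import org.apache.lucene.index.Payload;

public class PayloadLossSketch {

  // Made-up filter that attaches a one-byte payload to every token it sees.
  static class MarkerPayloadFilter extends TokenFilter {
    MarkerPayloadFilter(TokenStream in) { super(in); }
    public Token next() throws IOException {
      Token t = input.next();
      if (t != null) {
        t.setPayload(new Payload(new byte[] { 42 }));
      }
      return t;
    }
  }

  public static void main(String[] args) throws IOException {
    TokenStream ts = new SnowballFilter(
        new MarkerPayloadFilter(new WhitespaceTokenizer(new StringReader("payloads vanish"))),
        "English");
    for (Token t = ts.next(); t != null; t = ts.next()) {
      // Prints a null payload for each token: SnowballFilter built a fresh Token
      // and did not carry the payload over.
      System.out.println(t.termText() + " -> payload=" + t.getPayload());
    }
  }
}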



[jira] Updated: (LUCENE-1350) SnowballFilter resets the payload


     [ https://issues.apache.org/jira/browse/LUCENE-1350?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Doron Cohen updated LUCENE-1350:
--------------------------------

    Attachment: LUCENE-1350.patch

The patch fixes this by using Token.clone().
I'll search for other filters that might reset payloads - but if you are aware of any, it would be useful to know.
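
Roughly, the clone-based fix for the non-reuse next() looks like the sketch below (not the attached patch itself; stem() stands in for the actual Snowball stemming call):

public Token next() throws IOException {
  Token token = input.next();
  if (token == null) {
    return null;
  }
  String newTerm = stem(token.termText());  // Snowball stemming call elided
  Token result = (Token) token.clone();     // the copy keeps payload, flags, type, offsets, position increment
  result.setTermText(newTerm);
  return result;
}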

> SnowballFilter resets the payload
> ---------------------------------
>
>                 Key: LUCENE-1350
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1350
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: Analysis, contrib/*
>            Reporter: Doron Cohen
>            Assignee: Doron Cohen
>         Attachments: LUCENE-1350.patch
>
>
> Passing tokens with payloads through SnowballFilter results in tokens with no payloads.
> A workaround for this is to apply stemming first and only then run whatever logic creates the payload, but this is not always convenient.
> Patch to follow that preserves the payload.


[jira] Commented: (LUCENE-1350) SnowballFilter resets the payload


    [ https://issues.apache.org/jira/browse/LUCENE-1350?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12619925#action_12619925 ]

DM Smith commented on LUCENE-1350:
----------------------------------

When we move to the reuse pattern across all of Lucene, the problem will be nearly everywhere.

The pattern for Token after the deprecations are removed is:
public Token next(Token token) {
...
token.clear(); // This clears Payload
token.setTermBuffer(newBuffer);
...
}
In https://issues.apache.org/jira/browse/LUCENE-1333, I've changed Snowball's next(Token token) to follow this pattern.

Using clone is probably not the best approach.
The following pattern works:
public Token next(Token token) {
...
Payload payload = token.getPayload();
token.clear(); // This clears Payload
token.setTermBuffer(newBuffer);
token.setPayload(payload);
...
}

If payload is to be preserved in the face of the reuse pattern, perhaps clear() should not clear Payload. Since Payload is experimental and marked as subject to change, I don't think that this break of backward compatibility should be an issue. If it is, I think there is a better pattern for Token.

The filter-order issue concerning payload also pertains to the flags field, which is likewise marked experimental, and I think it pertains to type as well.

The most typical pattern of Token reuse is:
token.clear(); // reset everything except startOffset, endOffset and type to their defaults.
token.setStartOffset(newStartOffset);
token.setEndOffset(newEndOffset);
token.setType(Token.DEFAULT_TYPE);
token.setTermBuffer(newTerm); // or some variation of this.

This is rather tedious, and I think clear() is a bit too aggressive in resetting payload and flags to their defaults. I think it would be good to add the following to Token and deprecate clear():
public void reuse(char[] buffer, int offset, int length, int startOffset, int endOffset, String type)
{
  setTermBuffer(buffer, offset, length);
  this.positionIncrement = 1;
  this.startOffset = startOffset;
  this.endOffset = endOffset;
  this.type = type;
}

public void reuse(String buffer, int offset, int length, int startOffset, int endOffset, String type)
{
  setTermBuffer(buffer, offset, length);
  this.positionIncrement = 1;
  this.startOffset = startOffset;
  this.endOffset = endOffset;
  this.type = type;
}

public void reuse(String buffer, int startOffset, int endOffset, String type)
{
  setTermBuffer(buffer);
  this.positionIncrement = 1;
  this.startOffset = startOffset;
  this.endOffset = endOffset;
  this.type = type;
}

public void reuse(char[] buffer, int offset, int length, int startOffset, int endOffset)
{
  setTermBuffer(buffer, offset, length);
  this.positionIncrement = 1;
  this.startOffset = startOffset;
  this.endOffset = endOffset;
}

public void reuse(String buffer, int offset, int length, int startOffset, int endOffset)
{
  setTermBuffer(buffer, offset, length);
  this.positionIncrement = 1;
  this.startOffset = startOffset;
  this.endOffset = endOffset;
}

public void reuse(String buffer, int startOffset, int endOffset)
{
  setTermBuffer(buffer);
  this.positionIncrement = 1;
  this.startOffset = startOffset;
  this.endOffset = endOffset;
}
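
For what it's worth, a consumer filter's reuse path with one of these proposed methods might look like this (a sketch only: reuse() does not exist yet, and process() is a placeholder for the filter's real work):

public Token next(Token result) throws IOException {
  result = input.next(result);
  if (result == null) {
    return null;
  }
  String newTerm = process(new String(result.termBuffer(), 0, result.termLength()));
  // Proposed reuse(): swaps in the new term, offsets and type, resets
  // positionIncrement to 1, and leaves payload and flags untouched.
  result.reuse(newTerm, result.startOffset(), result.endOffset(), result.type());
  return result;
}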

> SnowballFilter resets the payload
> ---------------------------------
>
>                 Key: LUCENE-1350
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1350
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: Analysis, contrib/*
>            Reporter: Doron Cohen
>            Assignee: Doron Cohen
>         Attachments: LUCENE-1350.patch
>
>
> Passing tokens with payloads through SnowballFilter results in tokens with no payloads.
> A workaround for this is to apply stemming first and only then run whatever logic creates the payload, but this is not always convenient.
> Patch to follow that preserves the payload.


[jira] Commented: (LUCENE-1350) SnowballFilter resets the payload


    [ https://issues.apache.org/jira/browse/LUCENE-1350?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12619942#action_12619942 ]

Doron Cohen commented on LUCENE-1350:
-------------------------------------

These are interesting points to consider.

Let's look at the two cases reuse and non-reuse.

The current SnowballFilter implements the non-reuse method next(). It actually clones, just without using clone(). So I still think that fixing it to use clone() is acceptable. Hope you agree with this.

(Btw, I checked LUCENE-1142 ("Updated Snowball package") - there too the non-reuse version is used.)

For the reuse case, i.e. next(Token), there is a distinction between *consumers* and *producers* (in the TokenStream API) - only producers invoke clear(), and then set everything. Filters are consumers, so calling clear() there is not appropriate.

(As a side comment, I think an explicit call like setEndOffset() is somewhat clearer than a method with 3 int args.)
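
For concreteness, a consumer-style filter under the reuse API could look like the sketch below (transform() is a placeholder for the per-term work); since clear() is never called, payload, flags and type ride through untouched:

public Token next(Token result) throws IOException {
  result = input.next(result);
  if (result == null) {
    return null;
  }
  // Only the term changes; payload, flags, type and offsets are left as the producer set them.
  char[] newTerm = transform(result.termBuffer(), result.termLength());
  result.setTermBuffer(newTerm, 0, newTerm.length);
  return result;
}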

> SnowballFilter resets the payload
> ---------------------------------
>
>                 Key: LUCENE-1350
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1350
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: Analysis, contrib/*
>            Reporter: Doron Cohen
>            Assignee: Doron Cohen
>         Attachments: LUCENE-1350.patch
>
>
> Passing tokens with payloads through SnowballFilter results in tokens with no payloads.
> A workaround for this is to apply stemming first and only then run whatever logic creates the payload, but this is not always convenient.
> Patch to follow that preserves the payload.


[jira] Commented: (LUCENE-1350) SnowballFilter resets the payload


    [ https://issues.apache.org/jira/browse/LUCENE-1350?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12619961#action_12619961 ]

DM Smith commented on LUCENE-1350:
----------------------------------

The non-reuse interface is deprecated. LUCENE-1333 deals with cleaning that up and applying reuse in all of Lucene. To date, it was partially applied to core. This results in sub-optimal performance with Filter chains that use both reuse and non-reuse inputs and filters.

So LUCENE-1333 updates SnowballFilter to use next(Token).

The documentation in TokenStream documents that only producers invoke clear().

To me, it is not clear-cut what a producer or a consumer actually is. Obviously, input streams are producers. Some filters generate multiple tokens as a replacement for the current one (e.g. NGram, stemming, ...). To me, these are producers.

If the rule of thumb is that filters are consumers, merely changing their token's term, then there are lots of places that need to be changed. I noticed that SnowballFilter's methodology is fairly common:
Token token = input.next();
...
String newTerm = ....;
...
return new Token(newTerm, token.startOffset(), token.endOffset(), token.type());

In migrating this to the reuse pattern, I saw new Token(...) as a producer pattern, and to maintain equivalent behavior clear() needed to be called:
public Token next(Token token)
{
token = input.next(token);
...
String newTerm = ....;
...
token.clear(); // do most of the initialization that new Token does
token.setTermBuffer(newTerm); // new method introduced in LUCENE-1333
return token;
}

I don't know why the following pattern was not originally used (some filters do this) or why you didn't migrate to this:
Token token = input.next();
...
String newTerm = ....;
...
token.setTermText(newTerm);
return token;

This would be faster than cloning and would preserve all fields.



> SnowballFilter resets the payload
> ---------------------------------
>
>                 Key: LUCENE-1350
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1350
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: Analysis, contrib/*
>            Reporter: Doron Cohen
>            Assignee: Doron Cohen
>         Attachments: LUCENE-1350.patch
>
>
> Passing tokens with payloads through SnowballFilter results in tokens with no payloads.
> A workaround for this is to apply stemming first and only then run whatever logic creates the payload, but this is not always convenient.
> Patch to follow that preserves the payload.


[jira] Commented: (LUCENE-1350) SnowballFilter resets the payload


    [ https://issues.apache.org/jira/browse/LUCENE-1350?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12619982#action_12619982 ]

DM Smith commented on LUCENE-1350:
----------------------------------

Survey of other potential problem areas (I did not look at test cases); a sketch of the common fix pattern follows the list:
o.a.l.index.memory.SynonymTokenFilter (generated synonyms do not propagate payload or flags)
o.a.l.analysis.de.GermanStemFilter (creates a replacement token and does not propagate payload or flags)
o.a.l.analysis.ngram.EdgeNGramTokenFilter (generated ngrams do not propagate payload or flags; positionIncrement is inappropriate)
o.a.l.analysis.ngram.NGramTokenFilter (generated ngrams do not propagate payload or flags; positionIncrement is inappropriate)
o.a.l.analysis.br.BrazilianStemFilter (creates a replacement token and does not propagate payload or flags)
o.a.l.analysis.fr.ElisionFilter (creates a replacement token and does not propagate payload or flags)
o.a.l.analysis.fr.FrenchStemFilter (creates a replacement token and does not propagate payload or flags)
o.a.l.analysis.shingle.ShingleFilter (generated shingles do not propagate payload or flags)
o.a.l.analysis.ru.RussianLowerCaseFilter (creates a replacement token and does not propagate payload or flags)
o.a.l.analysis.ru.RussianStemFilter (creates a replacement token and does not propagate payload or flags)
o.a.l.analysis.el.GreekLowerCaseFilter (creates a replacement token and does not propagate payload or flags)
o.a.l.analysis.compound.CompoundWordTokenFilterBase  (generated parts do not propagate payload or flags)
o.a.l.analysis.th.ThaiWordFilter (creates a replacement for non-Thai words; generated parts do not propagate payload or flags)
o.a.l.analysis.nl.DutchStemFilter (creates a replacement token and does not propagate payload or flags)
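
A sketch of the common fix for the filters above that must build a new Token (the helper name is made up; it assumes the trunk Token API with payload and flags accessors):

private static Token replacementFor(Token source, String newTerm) {
  Token replacement = new Token(newTerm, source.startOffset(), source.endOffset(), source.type());
  replacement.setPositionIncrement(source.getPositionIncrement());
  replacement.setPayload(source.getPayload());  // currently dropped by the filters above
  replacement.setFlags(source.getFlags());      // likewise dropped
  return replacement;
}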

> SnowballFilter resets the payload
> ---------------------------------
>
>                 Key: LUCENE-1350
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1350
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: Analysis, contrib/*
>            Reporter: Doron Cohen
>            Assignee: Doron Cohen
>         Attachments: LUCENE-1350.patch
>
>
> Passing tokens with payloads through SnowballFilter results in tokens with no payloads.
> A workaround for this is to apply stemming first and only then run whatever logic creates the payload, but this is not always convenient.
> Patch to follow that preserves the payload.


[jira] Commented: (LUCENE-1350) SnowballFilter resets the payload


    [ https://issues.apache.org/jira/browse/LUCENE-1350?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12620408#action_12620408 ]

Doron Cohen commented on LUCENE-1350:
-------------------------------------

{quote}
The non-reuse interface is deprecated. LUCENE-1333 deals with cleaning that up and applying reuse in all of Lucene. To date, it was partially applied to core. This results in sub-optimal performance with Filter chains that use both reuse and non-reuse inputs and filters.
{quote}

Non-reuse TokenStream API is not deprecated in the trunk. I guess you mean it will be deprecated by LUCENE-1333.

{quote}
To me, it is not clear-cut what a producer or a consumer actually is. Obviously, input streams are producers. Some filters generate multiple tokens as a replacement for the current one (e.g. NGram, stemming, ...). To me, these are producers.
{quote}

Right, such filters function as producers.  Javadocs should say something weaker, like "most filters are consumers" or "filters are usually consumers".

{quote}
I don't know why the following pattern was not originally used (some filters do this) or why you didn't migrate to this:
Token token = input.next();
...
String newTerm = ....;
...
token.setTermText(newTerm);
return token;

This would be faster than cloning and would preserve all fields.
{quote}

Good point, thanks.

So I wonder what's next with this issue. The complete LUCENE-1333 is slated for 2.4. So it seems appropriate to fix the filters' behavior now, to preserve payload (and flags, thanks for pointing this out), following the above (reuse) code pattern. Makes sense?


> SnowballFilter resets the payload
> ---------------------------------
>
>                 Key: LUCENE-1350
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1350
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: Analysis, contrib/*
>            Reporter: Doron Cohen
>            Assignee: Doron Cohen
>         Attachments: LUCENE-1350.patch
>
>
> Passing tokens with payloads through SnowballFilter results in tokens with no payloads.
> A workaround for this is to apply stemming first and only then run whatever logic creates the payload, but this is not always convenient.
> Patch to follow that preserves the payload.


[jira] Commented: (LUCENE-1350) SnowballFilter resets the payload


    [ https://issues.apache.org/jira/browse/LUCENE-1350?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12620435#action_12620435 ]

DM Smith commented on LUCENE-1350:
----------------------------------

{quote}
Non-reuse TokenStream API is not deprecated in the trunk. I guess you mean it will be deprecated by LUCENE-1333.
{quote}

My argument for actually deprecating it in LUCENE-1333 was that it was implicitly deprecated already.

{quote}
So I wonder what's next with this issue. The complete LUCENE-1333 is slated for 2.4. So it seems appropriate to fix the filters' behavior now, to preserve payload (and flags, thanks for pointing this out), following the above (reuse) code pattern. Makes sense?
{quote}

Yes. This issue is a bug fix, and if there is a bug-fix release, this should go with it.

I think you should expand this issue to fix all the other TokenFilters that have the same problem, and to propagate the flags as well. As you found out, the fix is trivial. If you want, I'd be happy to work up a patch for it.

Regarding LUCENE-1333:
BTW, it can go in before 2.4. I just set it to 2.4 since that was the next non-bugfix release.

It will be easy for me to adjust it after this goes in.

I'll change the JavaDoc to be clearer about when clear() should be called and what a producer and a consumer are, since that is part of its scope.

> SnowballFilter resets the payload
> ---------------------------------
>
>                 Key: LUCENE-1350
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1350
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: Analysis, contrib/*
>            Reporter: Doron Cohen
>            Assignee: Doron Cohen
>         Attachments: LUCENE-1350.patch
>
>
> Passing tokens with payloads through SnowballFilter results in tokens with no payloads.
> A workaround for this is to apply stemming first and only then run whatever logic creates the payload, but this is not always convenient.
> Patch to follow that preserves the payload.


[jira] Commented: (LUCENE-1350) SnowballFilter resets the payload


    [ https://issues.apache.org/jira/browse/LUCENE-1350?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12620521#action_12620521 ]

Doron Cohen commented on LUCENE-1350:
-------------------------------------

{quote}
I think you should expand this issue to fix all the other TokenFilters that have the same problem, and to propagate the flags as well.
{quote}

Agree. I will rename the issue accordingly.

{quote}
As you found out, the fix is trivial. If you want, I'd be happy to work up a patch for it.
{quote}

This would be great, thanks! (Would it fit in a single patch file, please? :-))


> SnowballFilter resets the payload
> ---------------------------------
>
>                 Key: LUCENE-1350
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1350
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: Analysis, contrib/*
>            Reporter: Doron Cohen
>            Assignee: Doron Cohen
>         Attachments: LUCENE-1350.patch
>
>
> Passing tokens with payloads through SnowballFilter results in tokens with no payloads.
> A workaround for this is to apply stemming first and only then run whatever logic creates the payload, but this is not always convenient.
> Patch to follow that preserves the payload.


[jira] Updated: (LUCENE-1350) Filters which are "consumers" should not reset the payload or flags and should better reuse the token


     [ https://issues.apache.org/jira/browse/LUCENE-1350?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Doron Cohen updated LUCENE-1350:
--------------------------------

      Description:
Passing tokens with payloads through SnowballFilter results in tokens with no payloads.
A workaround for this is to apply stemming first and only then run whatever logic creates the payload, but this is not always convenient.

Other "consumer" filters have similar problem.

These filters can - and should - reuse the token, by implementing next(Token), effectively also fixing the unwanted resetting.


  was:
Passing tokens with payloads through SnowballFilter results in tokens with no payloads.
A workaround for this is to apply stemming first and only then run whatever logic creates the payload, but this is not always convenient.
Patch to follow that preserves the payload.


    Lucene Fields: [New, Patch Available]  (was: [Patch Available, New])
    Fix Version/s: 2.3.3
          Summary: Filters which are "consumers" should not reset the payload or flags and should better reuse the token  (was: SnowballFilter resets the payload)

> Filters which are "consumers" should not reset the payload or flags and should better reuse the token
> -----------------------------------------------------------------------------------------------------
>
>                 Key: LUCENE-1350
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1350
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: Analysis, contrib/*
>            Reporter: Doron Cohen
>            Assignee: Doron Cohen
>             Fix For: 2.3.3
>
>         Attachments: LUCENE-1350.patch
>
>
> Passing tokens with payloads through SnowballFilter results in tokens with no payloads.
> A workaround for this is to apply stemming first and only then run whatever logic creates the payload, but this is not always convenient.
> Other "consumer" filters have similar problem.
> These filters can - and should - reuse the token, by implementing next(Token), effectively also fixing the unwanted resetting.


[jira] Commented: (LUCENE-1350) Filters which are "consumers" should not reset the payload or flags and should better reuse the token


    [ https://issues.apache.org/jira/browse/LUCENE-1350?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12620970#action_12620970 ]

Michael McCandless commented on LUCENE-1350:
--------------------------------------------


It seems like there are three different things here:

  # Many filters (eg SnowballFilter) incorrectly erase the Payload,
    token Type and token flags, because they are basically doing
    their own Token cloning.  This is pre-existing (before re-use API
    was created).
  # Separately, these filters do not use the re-use API, which we are
    wanting to migrate to anyway.
  # Adding new "reuse" methods on Token which are like clear() except
    they also take args to replace the termBuffer, start/end offset,
    etc, and they do not clear the payload/flags to their defaults.

Since in LUCENE-1333 we are aggressively moving all Lucene core &
contrib TokenStream & TokenFilters to use the re-use API (formally
deprecating the original non-reuse API), we may as well fix 1 & 2 at
once.

I think the reuse API proposal is reasonable: it mirrors the current
constructors on Token.  But, since we are migrating to the reuse API, you
need the analog (of all these constructors) without making a new
Token.

But maybe change the name from "reuse" to "update", "set", "reset",
"reinit", or "change"?  But: I think this method should still reset
payload, position increment, etc., to their defaults?  Ie, calling this
method should get you the same result as creating a new Token(...),
passing in the termBuffer, start/end offset, etc., I think?
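
In other words, under that reading a method on Token would look roughly like the sketch below (hypothetical - no such method exists yet); it mirrors the Token(String, int, int, String) constructor, so payload, flags and positionIncrement go back to their defaults:

public void reinit(String newTerm, int newStartOffset, int newEndOffset, String newType) {
  clear();                     // payload, flags and positionIncrement back to their defaults
  setTermText(newTerm);
  setStartOffset(newStartOffset);
  setEndOffset(newEndOffset);
  setType(newType);
}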

Should we just absorb this issue into LUCENE-1333?  DM, of your list
above (of filters that lose payload), are there any that are not fixed
in LUCENE-1333?  I'm confused on the overlap and it's hard to work
with all the patches.  Actually if in LUCENE-1333 you could
consolidate down to a single patch (big toplevel "svn diff"), that'd
be great :)


> Filters which are "consumers" should not reset the payload or flags and should better reuse the token
> -----------------------------------------------------------------------------------------------------
>
>                 Key: LUCENE-1350
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1350
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: Analysis, contrib/*
>            Reporter: Doron Cohen
>            Assignee: Doron Cohen
>             Fix For: 2.3.3
>
>         Attachments: LUCENE-1350.patch
>
>
> Passing tokens with payloads through SnowballFilter results in tokens with no payloads.
> A workaround for this is to apply stemming first and only then run whatever logic creates the payload, but this is not always convenient.
> Other "consumer" filters have similar problem.
> These filters can - and should - reuse the token, by implementing next(Token), effectively also fixing the unwanted resetting.


[jira] Commented: (LUCENE-1350) Filters which are "consumers" should not reset the payload or flags and should better reuse the token


    [ https://issues.apache.org/jira/browse/LUCENE-1350?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12621041#action_12621041 ]

Doron Cohen commented on LUCENE-1350:
-------------------------------------

Mike, thanks for clearing things...

You're right - this is fixed by LUCENE-1333.
If LUCENE-1333 gets committed soon there's no point in
doing this here, just making more work for DM in reworking 1333.
The only motivation to do this is if there will be another
fix release 2.3.3.3, in which case it would make sense to
fix this issue, but not the deprecation of the non-reuse
API done by 1333. Or do you agree with DM that since payloads
and flags are marked experimental they can remain broken
(in regard of this issue) until 2.4? (I not perfect, but I can
live with it).

For the reuse methods names, I like *reinit()*...


> Filters which are "consumers" should not reset the payload or flags and should better reuse the token
> -----------------------------------------------------------------------------------------------------
>
>                 Key: LUCENE-1350
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1350
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: Analysis, contrib/*
>            Reporter: Doron Cohen
>            Assignee: Doron Cohen
>             Fix For: 2.3.3
>
>         Attachments: LUCENE-1350.patch
>
>
> Passing tokens with payloads through SnowballFilter results in tokens with no payloads.
> A workaround for this is to apply stemming first and only then run whatever logic creates the payload, but this is not always convenient.
> Other "consumer" filters have similar problem.
> These filters can - and should - reuse the token, by implementing next(Token), effectively also fixing the unwanted resetting.


[jira] Issue Comment Edited: (LUCENE-1350) Filters which are "consumers" should not reset the payload or flags and should better reuse the token


    [ https://issues.apache.org/jira/browse/LUCENE-1350?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12621041#action_12621041 ]

doronc edited comment on LUCENE-1350 at 8/8/08 1:18 PM:
-------------------------------------------------------------

Mike, thanks for clearing things...

You're right - this is fixed by LUCENE-1333.
If LUCENE-1333 gets committed soon there's no point in
doing this here, just making more work for DM in reworking 1333.
The only motivation to do this is if there will be another
fix release 2.3.3.3, in which case it would make sense to
fix this issue, but not the deprecation of the non-reuse
API done by 1333. Or do you agree with DM that since payloads
and flags are marked experimental they can remain broken
(in regard of this issue) until 2.4? (not perfect, but I can
live with it).

For the reuse methods names, I like *reinit()*...


      was (Author: doronc):
    Mike, thanks for clearing things...

You're right - this is fixed by LUCENE-1333.
If LUCENE-1333 gets committed soon there's no point in
doing this here, just making more work for DM in reworking 1333.
The only motivation to do this is if there will be another
fix release 2.3.3.3, in which case it would make sense to
fix this issue, but not the deprecation of the non-reuse
API done by 1333. Or do you agree with DM that since payloads
and flags are marked experimental they can remain broken
(in regard of this issue) until 2.4? (I not perfect, but I can
live with it).

For the reuse methods names, I like *reinit()*...

 

> Filters which are "consumers" should not reset the payload or flags and should better reuse the token
> -----------------------------------------------------------------------------------------------------
>
>                 Key: LUCENE-1350
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1350
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: Analysis, contrib/*
>            Reporter: Doron Cohen
>            Assignee: Doron Cohen
>             Fix For: 2.3.3
>
>         Attachments: LUCENE-1350.patch
>
>
> Passing tokens with payloads through SnowballFilter results in tokens with no payloads.
> A workaround for this is to apply stemming first and only then run whatever logic creates the payload, but this is not always convenient.
> Other "consumer" filters have similar problem.
> These filters can - and should - reuse the token, by implementing next(Token), effectively also fixing the unwanted resetting.


[jira] Updated: (LUCENE-1350) Filters which are "consumers" should not reset the payload or flags and should better reuse the token


     [ https://issues.apache.org/jira/browse/LUCENE-1350?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

DM Smith updated LUCENE-1350:
-----------------------------

    Attachment: LUCENE-1350.patch

{quote}
Should we just absorb this issue into LUCENE-1333? DM, of your list
above (of filters that lose payload), are there any that are not fixed
in LUCENE-1333? I'm confused on the overlap and it's hard to work
with all the patches. Actually if in LUCENE-1333 you could
consolidate down to a single patch (big toplevel "svn diff"), that'd
be great
{quote}

LUCENE-1333 will have to include all of this. I have already created a patch for LUCENE-1350 and LUCENE-1333, which satisfies this requirement. If LUCENE-1350 goes first, then the patch for LUCENE-1333 will need to be re-built. If LUCENE-1333 goes first then this one can be closed.

I really don't care which is done first. If both are going to be in the next release, then I think just do LUCENE-1333. But if for some reason we are going to do a release before 2.4 and only LUCENE-1350 is going into it, then that's fine with me.

As to the effort, I have already done the work. And I was happy to do it :)

> Filters which are "consumers" should not reset the payload or flags and should better reuse the token
> -----------------------------------------------------------------------------------------------------
>
>                 Key: LUCENE-1350
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1350
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: Analysis, contrib/*
>            Reporter: Doron Cohen
>            Assignee: Doron Cohen
>             Fix For: 2.3.3
>
>         Attachments: LUCENE-1350.patch, LUCENE-1350.patch
>
>
> Passing tokens with payloads through SnowballFilter results in tokens with no payloads.
> A workaround for this is to apply stemming first and only then run whatever logic creates the payload, but this is not always convenient.
> Other "consumer" filters have similar problem.
> These filters can - and should - reuse the token, by implementing next(Token), effectively also fixing the unwanted resetting.


[jira] Commented: (LUCENE-1350) Filters which are "consumers" should not reset the payload or flags and should better reuse the token


    [ https://issues.apache.org/jira/browse/LUCENE-1350?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12621175#action_12621175 ]

Doron Cohen commented on LUCENE-1350:
-------------------------------------

DM, thanks for taking care of this large change!
From Mike's comments on LUCENE-1333 it seems LUCENE-1333 will
be committed and this one will be canceled, so I feel kinda bad about
the time you put into the last patch here.

> Filters which are "consumers" should not reset the payload or flags and should better reuse the token
> -----------------------------------------------------------------------------------------------------
>
>                 Key: LUCENE-1350
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1350
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: Analysis, contrib/*
>            Reporter: Doron Cohen
>            Assignee: Doron Cohen
>             Fix For: 2.3.3
>
>         Attachments: LUCENE-1350.patch, LUCENE-1350.patch
>
>
> Passing tokens with payloads through SnowballFilter results in tokens with no payloads.
> A workaround for this is to apply stemming first and only then run whatever logic creates the payload, but this is not always convenient.
> Other "consumer" filters have similar problem.
> These filters can - and should - reuse the token, by implementing next(Token), effectively also fixing the unwanted resetting.


[jira] Updated: (LUCENE-1350) Filters which are "consumers" should not reset the payload or flags and should better reuse the token


     [ https://issues.apache.org/jira/browse/LUCENE-1350?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Doron Cohen updated LUCENE-1350:
--------------------------------

    Attachment: LUCENE-1350-test.patch

Attaching the test, which fails now but passes with LUCENE-1333.


> Filters which are "consumers" should not reset the payload or flags and should better reuse the token
> -----------------------------------------------------------------------------------------------------
>
>                 Key: LUCENE-1350
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1350
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: Analysis, contrib/*
>            Reporter: Doron Cohen
>            Assignee: Doron Cohen
>             Fix For: 2.3.3
>
>         Attachments: LUCENE-1350-test.patch, LUCENE-1350.patch, LUCENE-1350.patch
>
>
> Passing tokens with payloads through SnowballFilter results in tokens with no payloads.
> A workaround for this is to apply stemming first and only then run whatever logic creates the payload, but this is not always convenient.
> Other "consumer" filters have similar problem.
> These filters can - and should - reuse the token, by implementing next(Token), effectively also fixing the unwanted resetting.


[jira] Updated: (LUCENE-1350) Filters which are "consumers" should not reset the payload or flags and should better reuse the token


     [ https://issues.apache.org/jira/browse/LUCENE-1350?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mark Miller updated LUCENE-1350:
--------------------------------

    Fix Version/s: 2.4

> Filters which are "consumers" should not reset the payload or flags and should better reuse the token
> -----------------------------------------------------------------------------------------------------
>
>                 Key: LUCENE-1350
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1350
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: Analysis, contrib/*
>            Reporter: Doron Cohen
>            Assignee: Doron Cohen
>             Fix For: 2.3.3, 2.4
>
>         Attachments: LUCENE-1350-test.patch, LUCENE-1350.patch, LUCENE-1350.patch
>
>
> Passing tokens with payloads through SnowballFilter results in tokens with no payloads.
> A workaround for this is to apply stemming first and only then run whatever logic creates the payload, but this is not always convenient.
> Other "consumer" filters have similar problem.
> These filters can - and should - reuse the token, by implementing next(Token), effectively also fixing the unwanted resetting.


[jira] Resolved: (LUCENE-1350) Filters which are "consumers" should not reset the payload or flags and should better reuse the token


     [ https://issues.apache.org/jira/browse/LUCENE-1350?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michael McCandless resolved LUCENE-1350.
----------------------------------------

       Resolution: Duplicate
    Fix Version/s:     (was: 2.3.3)
    Lucene Fields: [New, Patch Available]  (was: [Patch Available, New])

Isn't this one now a dup of LUCENE-1333?

> Filters which are "consumers" should not reset the payload or flags and should better reuse the token
> -----------------------------------------------------------------------------------------------------
>
>                 Key: LUCENE-1350
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1350
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: Analysis, contrib/*
>            Reporter: Doron Cohen
>            Assignee: Doron Cohen
>             Fix For: 2.4
>
>         Attachments: LUCENE-1350-test.patch, LUCENE-1350.patch, LUCENE-1350.patch
>
>
> Passing tokens with payloads through SnowballFilter results in tokens with no payloads.
> A workaround for this is to apply stemming first and only then run whatever logic creates the payload, but this is not always convenient.
> Other "consumer" filters have similar problem.
> These filters can - and should - reuse the token, by implementing next(Token), effectively also fixing the unwanted resetting.


[jira] Commented: (LUCENE-1350) Filters which are "consumers" should not reset the payload or flags and should better reuse the token


    [ https://issues.apache.org/jira/browse/LUCENE-1350?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12628158#action_12628158 ]

Doron Cohen commented on LUCENE-1350:
-------------------------------------

Yes, it is a dup - thanks Mike for taking care of this (I planned to do it yesterday but didn't make it).

> Filters which are "consumers" should not reset the payload or flags and should better reuse the token
> -----------------------------------------------------------------------------------------------------
>
>                 Key: LUCENE-1350
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1350
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: Analysis, contrib/*
>            Reporter: Doron Cohen
>            Assignee: Doron Cohen
>             Fix For: 2.4
>
>         Attachments: LUCENE-1350-test.patch, LUCENE-1350.patch, LUCENE-1350.patch
>
>
> Passing tokens with payloads through SnowballFilter results in tokens with no payloads.
> A workaround for this is to apply stemming first and only then run whatever logic creates the payload, but this is not always convenient.
> Other "consumer" filters have similar problem.
> These filters can - and should - reuse the token, by implementing next(Token), effectively also fixing the unwanted resetting.
