who clears attributes?

who clears attributes?

Yonik Seeley-2-2
CharTokenizer.incrementToken() clears *all* attributes in the entire
tokenizer chain.
StandardTokenizer.incrementToken() clears only the term attribute.

So... which is right?  Seems like the tokenizer should be responsible?

On a performance-related note, CharTokenizer.clearAttributes() could be
more efficient - 2 new objects (the unmodifiable map and the iterator
object) are created for every incrementToken() call.

-Yonik
http://www.lucidimagination.com

---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]


RE: who clears attributes?

Uwe Schindler
I already removed the unmodifiable iterator, so one new instance is removed
(see the JIRA issue). But you are right, the CharTokenizer should only clear
the TermAttribute, as it is only using this attribute.

-----
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: [hidden email]





RE: who clears attributes?

Uwe Schindler
In my opinion, clearing the attributes in CharTokenizer is completely
unnecessary. The TermAttribute and OffsetAttribute are always initialized
correctly (at the very least, termLength is set to 0) when incrementToken()
returns true.

I would simply remove the call to clearAttributes() altogether.

-----
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: [hidden email]





Re: who clears attributes?

Yonik Seeley-2-2
In reply to this post by Uwe Schindler
On Mon, Aug 10, 2009 at 12:44 PM, Uwe Schindler<[hidden email]> wrote:
>the CharTokenizer should only clear the TermAttribute, as it is only using this attribute.

It's certainly not clear to me - is there an established convention?
Either Tokenizer clears all attributes, or each tokenizer clears those
attributes it cares about.  But in the latter case, wouldn't that
potentially cause multiple TokenFilters to clear the same attribute?

-Yonik
http://www.lucidimagination.com


Re: who clears attributes?

Yonik Seeley-2-2
> , or each tokenizer
should read "or each Tokenizer or TokenFilter"




RE: who clears attributes?

Uwe Schindler
In reply to this post by Yonik Seeley-2-2
> On Mon, Aug 10, 2009 at 12:44 PM, Uwe Schindler<[hidden email]> wrote:
> >the CharTokenizer should only clear the TermAttribute, as it is only
> using this attribute.

I changed this in the latest patch for
https://issues.apache.org/jira/browse/LUCENE-1796

> It's certainly not clear to me - is there an established convention?
> Either Tokenizer clears all attributes, or each tokenizer clears those
> attributes it cares about.  But in the latter case, wouldn't that
> potentially cause multiple TokenFilters to clear the same attribute?

Clearing attributes in TokenFilters is not ideal. The problem is that
calling clear() on an AttributeImpl may clear more than the directly
referenced values: the multi-attribute implementations currently in use,
like Token/TokenWrapper, always clear all 6 standard attributes.
Because of this, I would only clear attributes in TokenStream/Tokenizer, but
then do so by default for all Tokenizers. Maybe we should implement this. The
remaining problem is the iterator creation, but I have no better
solution, as Maps only offer iterators for enumerating values... :(

Uwe
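
The side effect described in the message above can be sketched in a few lines of plain Java. This is a simplified model, not the Lucene API: the interfaces Attr, TermAttr and PayloadAttr and the class CombinedToken are hypothetical stand-ins for attribute interfaces and for a multi-attribute implementation such as Token.

```java
/** Hypothetical stand-ins for attribute interfaces; not the Lucene API. */
interface Attr {
    void clear();
}

interface TermAttr extends Attr {
    String term();
    void setTerm(String term);
}

interface PayloadAttr extends Attr {
    String payload();
    void setPayload(String payload);
}

/** Multi-attribute implementation: one object backs both interfaces. */
class CombinedToken implements TermAttr, PayloadAttr {
    private String term = "";
    private String payload = null;

    public String term() { return term; }
    public void setTerm(String term) { this.term = term; }
    public String payload() { return payload; }
    public void setPayload(String payload) { this.payload = payload; }

    // One clear() serves every interface, so it resets all fields.
    public void clear() {
        term = "";
        payload = null;
    }
}

class CombinedClearDemo {
    /** Returns the payload left after a filter clears "only" the term. */
    static String payloadAfterTermClear() {
        CombinedToken token = new CombinedToken();
        token.setTerm("foo");
        token.setPayload("important");
        TermAttr termView = token;  // a filter that only knows about the term
        termView.clear();           // also wipes the payload
        return token.payload();
    }

    public static void main(String[] args) {
        System.out.println(payloadAfterTermClear());
    }
}
```

In this model the filter holds only a term view, yet its clear() call also discards the payload, which is the reason given above for not clearing inside filters.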



Re: who clears attributes?

Yonik Seeley-2-2
Thinking through this a little more, I don't see an alternative to the
tokenizer clearing all attributes at the start of incrementToken().

Consider a DefaultPayloadTokenFilter that only sets a payload if one
isn't already set - it's clear that this filter can't clear the
payload attribute, so it must be cleared by the head of the chain -
the tokenizer.  Right?

-Yonik
http://www.lucidimagination.com
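
The argument in the message above can be illustrated with a small model in plain Java. It is a sketch under stated assumptions, not Lucene code: ChainState, SketchTokenizer and DefaultPayloadFilter are hypothetical names, and the shared state is reduced to a term and a payload. The clearFirst flag switches between a tokenizer that clears everything at the start of incrementToken() and one that does not.

```java
import java.util.ArrayList;
import java.util.List;

/** Shared per-token state for the whole chain; a stand-in, not the Lucene API. */
class ChainState {
    String term = "";
    String payload = null;  // null means "no payload set"

    void clearAll() {
        term = "";
        payload = null;
    }
}

/** Hypothetical tokenizer: splits on whitespace and tags words starting with '#'. */
class SketchTokenizer {
    private final String[] words;
    private final boolean clearFirst;
    private int pos = 0;

    SketchTokenizer(String text, boolean clearFirst) {
        this.words = text.trim().split("\\s+");
        this.clearFirst = clearFirst;
    }

    boolean incrementToken(ChainState state) {
        if (pos >= words.length) return false;
        if (clearFirst) state.clearAll();  // head of the chain resets everything
        String word = words[pos++];
        state.term = word;
        if (word.startsWith("#")) state.payload = "tag";
        return true;
    }
}

/** Hypothetical filter: sets a default payload only if none is present. */
class DefaultPayloadFilter {
    private final SketchTokenizer input;

    DefaultPayloadFilter(SketchTokenizer input) {
        this.input = input;
    }

    boolean incrementToken(ChainState state) {
        if (!input.incrementToken(state)) return false;
        // The filter cannot clear the payload itself: that would destroy
        // a payload legitimately set upstream for this token.
        if (state.payload == null) state.payload = "default";
        return true;
    }
}

class PayloadChainDemo {
    /** Runs the chain and records "term=payload" for every token. */
    static List<String> payloads(String text, boolean clearFirst) {
        ChainState state = new ChainState();
        DefaultPayloadFilter chain =
            new DefaultPayloadFilter(new SketchTokenizer(text, clearFirst));
        List<String> seen = new ArrayList<>();
        while (chain.incrementToken(state)) {
            seen.add(state.term + "=" + state.payload);
        }
        return seen;
    }

    public static void main(String[] args) {
        System.out.println(payloads("#a b", true));
        System.out.println(payloads("#a b", false));
    }
}
```

For the input "#a b", the clearing tokenizer lets the filter assign "default" to "b", while the non-clearing one leaves the stale "tag" payload from "#a" attached to "b".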


Re: who clears attributes?

Michael Busch
Clearing the attributes should be required in those places where we
cleared (or reinit'ed) Token previously, right?

  Michael




RE: who clears attributes?

Uwe Schindler
In reply to this post by Yonik Seeley-2-2
Yes. Is there a way to enforce this for all Tokenizers automatically? As
incrementToken() will be abstract in 3.0, there cannot be a default impl. So
all Tokenizers should call clearAttributes() as the first call in
incrementToken().

Then we still have the problem of the slow iterator creation (which was
sped up a little bit by removing the unmodifiable wrapper). This can be
solved by using an additional ArrayList in AttributeSource that holds all
AttributeImpl instances, but this would bring an additional initialization
cost when creating the Tokenizer chain.

-----
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: [hidden email]
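
The ArrayList idea from the message above might look roughly like the following. This is only a sketch in plain Java, not the actual AttributeSource: the names Clearable, TermState, PayloadState and SketchAttributeSource are hypothetical, and the map is keyed by class purely for illustration.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical stand-in for an attribute implementation; not the Lucene API. */
interface Clearable {
    void clear();
}

class TermState implements Clearable {
    String term = "";
    public void clear() { term = ""; }
}

class PayloadState implements Clearable {
    String payload = null;
    public void clear() { payload = null; }
}

class SketchAttributeSource {
    private final Map<Class<?>, Clearable> byType = new LinkedHashMap<>();
    // Second view of the same instances, filled once at registration time.
    private final List<Clearable> all = new ArrayList<>();

    <T extends Clearable> T add(Class<T> type, T impl) {
        Clearable existing = byType.get(type);
        if (existing != null) return type.cast(existing);
        byType.put(type, impl);
        all.add(impl);  // the extra setup cost is paid here, once per attribute
        return impl;
    }

    void clearAttributes() {
        // Indexed loop over the list: no Iterator object is allocated per call.
        for (int i = 0, n = all.size(); i < n; i++) {
            all.get(i).clear();
        }
    }
}

class AttributeCacheDemo {
    static String run() {
        SketchAttributeSource source = new SketchAttributeSource();
        TermState term = source.add(TermState.class, new TermState());
        PayloadState payload = source.add(PayloadState.class, new PayloadState());
        term.term = "foo";
        payload.payload = "bar";
        source.clearAttributes();
        return "[" + term.term + "][" + payload.payload + "]";
    }

    public static void main(String[] args) {
        System.out.println(run());
    }
}
```

The trade-off matches the one named above: registration does slightly more work when the chain is built, and in exchange the per-token clearAttributes() call allocates nothing.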






Re: who clears attributes?

Earwin Burrfoot
I'll deviate from the topic somewhat.
What exactly are the benefits of the new TokenStream API? Are we sure
we want it released with 2.9?
So far I only see various elaborate problems, and haven't seen a
single piece of code become simpler.




--
Kirill Zakharenko/Кирилл Захаренко ([hidden email])
Home / Mobile: +7 (495) 683-567-4 / +7 (903) 5-888-423
ICQ: 104465785


Re: who clears attributes?

Grant Ingersoll-2

On Aug 10, 2009, at 2:00 PM, Earwin Burrfoot wrote:

> I'll deviate from the topic somewhat.
> What are exact benefits that new tokenstream API yields? Are we sure
> we want it released with 2.9?
> By now I only see various elaborate problems, but haven't seen a
> single piece of code becoming simpler.

In theory, it sets up for more indexing/searching possibilities in  
3.0, but in the meantime, it is proving to be quite problematic due to  
back compatibility restrictions.

I have serious doubts about releasing this new API until these  
performance issues are resolved and better proven out from a usability  
standpoint.
It simply is too much to swallow for most users, as Analyzers/
TokenStreams/etc. are easily the most common place for people to  
inject their own capabilities and there is no way we should be
taking a 30% hit in performance for some theoretical speed up and new  
search capability 1 year from now.

I'm almost thinking we should have a 2.5 release instead of 2.9.  I  
know, that stinks, because we all want to get onto 3.0, but the fact  
is, 2.9 was _SUPPOSED_ to be a deprecation release,
when in reality it probably has as many changes as 2.3 did and it has  
a lot of back compatibility breakages.  Going to 2.5 would give this  
token stuff a chance to marinate, as well as
all the per segment changes and the NRT stuff.  Just a thought.

-Grant


Re: who clears attributes?

Mark Miller-3
Grant Ingersoll wrote:
> 2.9 was _SUPPOSED_ to be a deprecation release,
What's a deprecation release? We deprecate stuff in every release ...
does it make sense to do a release just to deprecate anything we might
not have yet? And if you add deprecations, wouldn't you add features to
move to?

I'm not a fan of 3.0 just being 2.9 with deprecations removed either.
Why not add new features as well? Sure, we should be *way* more careful
about breaking back compat there, but who cares if a few features are
introduced? Doing a release takes a lot of project steam - why waste it?

--
- Mark

http://www.lucidimagination.com





Re: who clears attributes?

Michael Busch
In reply to this post by Grant Ingersoll-2
I think we should change the backwards-compatibility policy as proposed
in LUCENE-1698 and remove some deprecated things (including the old
TokenStream API, maybe query parser) in 3.1, not 3.0.
I don't think we should have a 2.5 release - this clearly shows the
disadvantages of our current bw-policy.

  Michael




2.5 versus 2.9, was Re: who clears attributes?

Grant Ingersoll-2

On Aug 10, 2009, at 3:06 PM, Michael Busch wrote:

> I think we should change the backwards-compatibility policy as  
> proposed in LUCENE-1698 and remove some deprecated things (including
> the old TokenStream API, maybe query parser) in 3.1, not 3.0.

Maybe.  I'm not convinced yet that the current QP should go away
either.  The new QP sounds good on paper, but at first glance it looks
complicated (a whole package filled with classes just to implement the
old QP, versus a single JavaCC file and a single Java class).  The old
one has been around for a long time and has seen a lot of field use
(and admittedly has warts), whereas both the new QP and the new Token
stuff are essentially last-minute additions to the last minor release
right before we do a major release, remove a whole bunch of deprecated
APIs, and essentially commit to these new ways of doing things without
any field testing.  And I haven't even mentioned the new per-segment
stuff yet.

This is my reason for suggesting 2.5.  It gives us some real running  
time before we have to commit to them.  If I had to vote on 2.9 today  
in light of what it means for 3.0, it likely would be -1.

-Grant




Re: who clears attributes?

Mark Miller-3
In reply to this post by Michael Busch
Michael Busch wrote:
> I think we should change the backwards-compatibility policy as
> proposed in LUCENE-1698 and remove some deprecated things (including
> the old TokenStream API, maybe query parser) in 3.1, not 3.0.
> I don't think we should have a 2.5 release - this clearly shows the
> disadvantages of our current bw-policy.
>
>  Michael
>
I think the only advantage to that policy is to save major number space
(it will take us longer to get to Lucene 10) - and the disadvantages are
laid out in the comments.

If we find we have a lot we need to remove after 3.0, jumping to Lucene
4 makes the most sense to me.

I still like the idea of at least *attempting* back compat between major
versions - it's much more intuitive than the every-other-minor stuff.

--
- Mark

http://www.lucidimagination.com





Re: 2.5 versus 2.9, was Re: who clears attributes?

Michael Busch
In reply to this post by Grant Ingersoll-2
You didn't really comment on my proposal: I suggested not removing the
old Token API and old queryparser in 3.0. Instead, with 3.0 we change the
bw-policy, so that we can remove deprecated things in minor releases
(e.g. 3.1 in this case).

I think your 2.5 proposal has drawbacks: if we release 2.5 now to test
the new major features in the field, do you then want to stop adding new
features to trunk until we release 2.9, so that we don't end up in the
same situation again? How long should this testing in the field take?

  Michael




Re: who clears attributes?

Earwin Burrfoot
In reply to this post by Grant Ingersoll-2
On Mon, Aug 10, 2009 at 22:50, Grant Ingersoll<[hidden email]> wrote:

>
> On Aug 10, 2009, at 2:00 PM, Earwin Burrfoot wrote:
>
>> I'll deviate from the topic somewhat.
>> What are exact benefits that new tokenstream API yields? Are we sure
>> we want it released with 2.9?
>> By now I only see various elaborate problems, but haven't seen a
>> single piece of code becoming simpler.
>
> In theory, it sets up for more indexing/searching possibilities in 3.0, but
> in the meantime, it is proving to be quite problematic due to back
> compatibility restrictions.
I'm not quite sure exactly which indexing/searching possibilities
the new API opens up for us.
Some new ways of handling text? Okay, I'd like each token to have one
more number in addition to posIncr, so I can have my 'true multiword
synonyms'. Maybe, just maybe, there will be a couple of other
extensions. Use cases here are really scarce. Plus, if they're
successful/useful, they will most probably be included out of the box,
so we don't need much flexibility here.
Something other than text? Numbers, with good range queries. Dates.
Spatial data. Your-type-here. For these, a flexible, stream-oriented
text-processing API is totally useless.

> I have serious doubts about releasing this new API until these performance
> issues are resolved and better proven out from a usability standpoint.
> It simply is too much to swallow for most users, as
> Analyzers/TokenStreams/etc. are easily the most common place for people to
> inject their own capabilities and there is no way we should be
> taking a 30% hit in performance for some theoretical speed up and new search
> capability 1 year from now.
I have a feeling that the best idea, before more damage is done, is to
roll back this new API, store the patch, and try rolling it out once
again when we have use cases/more code to justify it.

--
Kirill Zakharenko/Кирилл Захаренко ([hidden email])
Home / Mobile: +7 (495) 683-567-4 / +7 (903) 5-888-423
ICQ: 104465785


RE: who clears attributes?

Uwe Schindler
In reply to this post by Grant Ingersoll-2
Hi Grant,

> I have serious doubts about releasing this new API until these
> performance issues are resolved and better proven out from a usability
> standpoint.

I think LUCENE-1796 has fixed the performance problems, which were caused by
a missing reflection cache needed for bw compatibility. I hope to commit
soon!

2.9 may be a little bit slower when you mix old and new API and do not reuse
Tokenizers (but Robert is already adding reusableTokenStream to all contrib
analyzers). When the backwards layer is removed completely or
setOnlyUseNewAPI is enabled, there is no speed impact at all.

Michael: The added cost of TokenWrapper was already there in 2.9 before the
TokenStream overhaul, too, as TokenWrapper-like code was implemented
similarly inside DocInverter.

Uwe



Re: who clears attributes?

Michael Busch
On 8/10/09 12:52 PM, Uwe Schindler wrote:
> Michael: The TokenWrapper added cost was there in 2.9 before the TokenStream
> overhaul, too, as the TokenWrapper-like code was there implemented
> similarly inside DocInverter.

You're right. It will only be more costly in case you mix multiple old
and new TokenStreams in a chain. Then the delegation is done more than once.

  Michael


RE: who clears attributes?

Uwe Schindler
But TokenWrapper is used there every time; it is not used for delegating,
only for exchanging the inner Token instance.

The delegation costs arise because a Filter implementing the old API in
front of a new-API Tokenizer needs to be wrapped twice: DocInverter
-> oldAPIFilter.incrementToken() [bw layer] -> oldAPIFilter.next(Token)
[native old-style impl] -> newAPIFilter.next(Token) [bw layer] ->
newAPIFilter.incrementToken() [native new-style impl]

If both filters implemented only the new API, there would be direct calls
from the filter to the input TokenStream. If all streams/filters
implemented only the old API, the bw delegation would only be used for the
incrementToken() calls from DocInverter.

-----
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: [hidden email]
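
The call chain described in the message above can be reproduced with a toy model. This is a sketch in plain Java and makes no claim about how the real backwards-compatibility layer is implemented: BwStream, OldApiFilter and NewApiTokenizer are hypothetical classes, tokens are plain strings, and the two default methods simply delegate to each other while recording a trace.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical model of a stream that supports both APIs via a bw layer. */
abstract class BwStream {
    final String name;
    final List<String> trace;

    BwStream(String name, List<String> trace) {
        this.name = name;
        this.trace = trace;
    }

    /** Old-style API; the default is the bw layer delegating to the new API. */
    String next() {
        trace.add(name + ".next [bw layer]");
        return incrementToken();
    }

    /** New-style API; the default is the bw layer delegating to the old API. */
    String incrementToken() {
        trace.add(name + ".incrementToken [bw layer]");
        return next();
    }
}

/** Tokenizer written against the new API only. */
class NewApiTokenizer extends BwStream {
    private boolean done = false;

    NewApiTokenizer(List<String> trace) {
        super("newAPITokenizer", trace);
    }

    @Override
    String incrementToken() {
        trace.add(name + ".incrementToken [native]");
        if (done) return null;
        done = true;
        return "token";
    }
}

/** Filter written against the old API only. */
class OldApiFilter extends BwStream {
    private final BwStream input;

    OldApiFilter(BwStream input, List<String> trace) {
        super("oldAPIFilter", trace);
        this.input = input;
    }

    @Override
    String next() {
        trace.add(name + ".next [native]");
        return input.next();
    }
}

class DelegationDemo {
    /** Records every hop made for a single new-API call from the consumer. */
    static List<String> traceOneCall() {
        List<String> trace = new ArrayList<>();
        BwStream chain = new OldApiFilter(new NewApiTokenizer(trace), trace);
        chain.incrementToken();  // the consumer (DocInverter in the thread) uses the new API
        return trace;
    }

    public static void main(String[] args) {
        for (String hop : traceOneCall()) {
            System.out.println(hop);
        }
    }
}
```

In this model a single consumer call passes through two bw-layer hops before reaching the native tokenizer code; if both classes overrode incrementToken(), the trace would contain only native entries.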




