Cloning TermAttribute objects

Adriano Crestani
Hi,

Why does the TermAttributeImpl.clone() method use buff.clone() instead of
System.arraycopy to clone its internal buffer? Is it for performance reasons?

I have the following scenario:

...
public boolean incrementToken() {
  ...
  String twoHundredKCharsString = "abc....";
  String smallString = "test";

  // the internal buffer grows to hold the 200k chars
  termAttribute.setTermBuffer(twoHundredKCharsString);
  State largeStringState = captureState();

  // the term is now 4 chars long, but the buffer keeps its size
  termAttribute.setTermBuffer(smallString);
  State smallStringState = captureState();

  ...
}
...

And guess what? smallStringState has a TermAttribute object that still
holds an internal buffer of 200k chars, although its term is only 4
chars long!
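
To make the effect reproducible without Lucene, here is a minimal standalone sketch. SimpleTermBuffer is a hypothetical stand-in, not Lucene's TermAttributeImpl; it only assumes the two behaviors described above: the buffer grows but never shrinks, and clone() copies the whole array.

```java
// Hypothetical stand-in for a term attribute; not Lucene's actual code.
final class SimpleTermBuffer implements Cloneable {
    private char[] buffer = new char[16];
    private int length;

    void setTermBuffer(String s) {
        if (buffer.length < s.length()) {
            buffer = new char[s.length()]; // grow only, never shrink
        }
        s.getChars(0, s.length(), buffer, 0);
        length = s.length();
    }

    int termLength() { return length; }
    int capacity() { return buffer.length; }
    String term() { return new String(buffer, 0, length); }

    @Override
    public SimpleTermBuffer clone() {
        try {
            SimpleTermBuffer copy = (SimpleTermBuffer) super.clone();
            copy.buffer = buffer.clone(); // copies the full array, unused capacity included
            return copy;
        } catch (CloneNotSupportedException e) {
            throw new AssertionError(e);
        }
    }
}

final class CloneCapacityDemo {
    public static void main(String[] args) {
        SimpleTermBuffer term = new SimpleTermBuffer();
        term.setTermBuffer(new String(new char[200000]).replace('\0', 'a'));
        SimpleTermBuffer largeState = term.clone();

        term.setTermBuffer("test");
        SimpleTermBuffer smallState = term.clone();

        // prints: test length=4 capacity=200000
        System.out.println(smallState.term() + " length=" + smallState.termLength()
                + " capacity=" + smallState.capacity());
    }
}
```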

From what I found searching, clone() and System.arraycopy perform about
the same for small arrays, and clone() only performs better for large
arrays.

So, if large string inputs are not a realistic scenario, why not use
System.arraycopy instead of clone() and copy only the chars in use? And
if they are a realistic scenario, Lucene should definitely not be
copying the entire buffer for small strings.
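
As a rough sketch of that alternative, the clone below allocates an array of exactly the term length and fills it with System.arraycopy. TrimmedCloneTerm and its methods are hypothetical names for illustration only, and this is not how Lucene implements clone().

```java
// Hypothetical class for illustration; not Lucene's TermAttributeImpl.
final class TrimmedCloneTerm implements Cloneable {
    private char[] buffer = new char[16];
    private int length;

    void setTermBuffer(String s) {
        if (buffer.length < s.length()) {
            buffer = new char[s.length()]; // grow only, never shrink
        }
        s.getChars(0, s.length(), buffer, 0);
        length = s.length();
    }

    int capacity() { return buffer.length; }
    String term() { return new String(buffer, 0, length); }

    @Override
    public TrimmedCloneTerm clone() {
        try {
            TrimmedCloneTerm copy = (TrimmedCloneTerm) super.clone();
            // Copy only the chars in use, so the clone does not inherit spare capacity.
            char[] trimmed = new char[length];
            System.arraycopy(buffer, 0, trimmed, 0, length);
            copy.buffer = trimmed;
            return copy;
        } catch (CloneNotSupportedException e) {
            throw new AssertionError(e);
        }
    }
}

final class TrimmedCloneDemo {
    public static void main(String[] args) {
        TrimmedCloneTerm term = new TrimmedCloneTerm();
        term.setTermBuffer(new String(new char[200000]).replace('\0', 'a'));
        term.setTermBuffer("test");
        TrimmedCloneTerm state = term.clone();
        // The original keeps its large buffer; the clone holds only 4 chars.
        System.out.println("original=" + term.capacity() + " clone=" + state.capacity());
    }
}
```

The trade-off is that the clone has no spare capacity, so a later, longer term on the clone forces a new allocation.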

Maybe the TermAttribute interface could expose a method like
shrinkBuffer(), so that users could invoke it when they need to.
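
One possible shape for such a method, as a sketch only: shrinkBuffer() is the name proposed above and, as far as this message assumes, not an existing Lucene API. ShrinkableTerm is a hypothetical class used to keep the example self-contained.

```java
// Hypothetical class for illustration; shrinkBuffer() is a proposal, not a Lucene method.
final class ShrinkableTerm {
    private char[] buffer = new char[16];
    private int length;

    void setTermBuffer(String s) {
        if (buffer.length < s.length()) {
            buffer = new char[s.length()]; // grow only, never shrink
        }
        s.getChars(0, s.length(), buffer, 0);
        length = s.length();
    }

    // Releases unused capacity by reallocating the buffer to the current term length.
    void shrinkBuffer() {
        if (buffer.length > length) {
            char[] trimmed = new char[length];
            System.arraycopy(buffer, 0, trimmed, 0, length);
            buffer = trimmed;
        }
    }

    int capacity() { return buffer.length; }
    String term() { return new String(buffer, 0, length); }
}

final class ShrinkDemo {
    public static void main(String[] args) {
        ShrinkableTerm term = new ShrinkableTerm();
        term.setTermBuffer(new String(new char[200000]).replace('\0', 'a'));
        term.setTermBuffer("test");
        term.shrinkBuffer(); // the caller decides when trimming is worth the reallocation
        System.out.println(term.term() + " capacity=" + term.capacity());
    }
}
```

Calling it explicitly before captureState() would leave the decision to the user, which avoids reallocating on every token.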

Thoughts?

Best Regards,
Adriano Crestani

---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

Re: Cloning TermAttribute objects

Adriano Crestani-2
Keeping this thread alive.

I would appreciate a response from the community about this issue.

Thanks in advance,
Adriano Crestani

On Tue, Jul 13, 2010 at 3:59 AM, Adriano Crestani
<[hidden email]> wrote:

> Hi,
>
> Why TermAttributeImpl.clone() method uses buff.clone() instead of
> System.arrayCopy to clone its internal buffer? Performance reasons?
>
> [...]
