Field constructor, avoiding String.intern()

Field constructor, avoiding String.intern()

Tatu Saloranta
After profiling in-memory indexing, I noticed that
calls to String.intern() showed up surprisingly high,
especially the one from the Field() constructor. This is
understandable given the overhead String.intern() has
(it is a native, synchronized method, and the overhead is
incurred even if the String is already interned), and the
fact that it essentially gets called once per
document+field combination.

Now, it would be quite easy to improve things a bit
(in theory), such that most intern() calls could be
avoided, transparently to the calling app. For example,
each IndexWriter could use a simple HashMap for
caching interned Strings. This approach
is more than twice as fast as directly calling
intern(). One could also use a per-thread cache or a
global one; all of these would probably be faster.
However, the Field constructor hard-codes the call to
intern(), so it would be necessary to add a new
constructor that indicates that the field name is known to
be interned.
There would also need to be a way to invoke the
new optional functionality.

Has anyone tried this approach to see if the speedup is
worth the hassle (in my case it'd probably be
something like 2 - 3%, assuming the profiler's 5% for
intern() is accurate)?
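
A minimal sketch of the per-writer cache idea described above, for
illustration only. The class name InternCache and its methods are
hypothetical and not part of Lucene; the sketch assumes the cache is
owned by a single writer or thread, so it does no locking. It only
shows the mechanism and makes no claim about the actual speedup.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical per-writer cache of interned field names (not a Lucene class).
 * Not thread-safe: intended to be owned by one writer or one thread.
 */
final class InternCache {
    private final Map<String, String> cache = new HashMap<String, String>();

    /** Returns the canonical (interned) instance whose contents equal name. */
    String intern(String name) {
        String canonical = cache.get(name);
        if (canonical == null) {
            // First sighting of this name: pay for String.intern() once,
            // then remember the canonical instance for later lookups.
            canonical = name.intern();
            cache.put(canonical, canonical);
        }
        return canonical;
    }

    /** Number of distinct field names seen so far. */
    int size() {
        return cache.size();
    }
}
```

A caller would route every field name through intern(String) and pass
the result to a constructor that trusts the name to be interned already;
as noted above, such a constructor would have to be added first.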

-+ Tatu +-


---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

Re: Field constructor, avoiding String.intern()

Wolfgang Hoschek-2
I noticed that, too, but in my case the difference was often much  
more extreme: it was one of the primary bottlenecks on indexing. This  
is the primary reason why MemoryIndex.addField(...) navigates around  
the problem by taking a parameter of type "String fieldName" instead  
of type "Field":

        public void addField(String fieldName, TokenStream stream) {
                /*
                 * Note that this method signature avoids having a user call new
                 * o.a.l.d.Field(...) which would be much too expensive due to the
                 * String.intern() usage of that class.
                 */
                ...
        }

Wolfgang.

Re: Field constructor, avoiding String.intern()

James Kennedy
In our case, we're trying to optimize document() retrieval, and we found that disabling the String interning in the Field constructor improved performance dramatically. I agree that interning should be an option on the constructor. For document retrieval, at least for a small number of fields, the performance gain of using equals() on interned strings is no match for the performance loss of interning the field name of each field.


Re: Field constructor, avoiding String.intern()

Robert Engels
I don't think it is just the performance gain of equals() where
intern() matters.

It also reduces memory consumption dramatically when working with  
large collections of documents in memory - although this could also  
be done with constants, there is nothing in Java to enforce it (thus  
the use of intern()).
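
A small sketch of the memory point above, for illustration only. It
counts how many distinct String objects back a batch of field names
with equal contents, with and without intern(). The class and method
names are hypothetical, and new String("title") stands in for a name
that is freshly created for each document.

```java
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.Set;

/** Hypothetical demo class; not part of Lucene. */
final class InternSharingDemo {

    /** Counts distinct String objects by identity (==), not by equals(). */
    static int distinctInstances(String[] names) {
        Set<String> seen =
            Collections.newSetFromMap(new IdentityHashMap<String, Boolean>());
        for (String n : names) {
            seen.add(n);
        }
        return seen.size();
    }

    /** Builds count field names with equal contents, optionally interned. */
    static String[] fieldNames(int count, boolean intern) {
        String[] names = new String[count];
        for (int i = 0; i < count; i++) {
            String fresh = new String("title"); // a new object on every pass
            names[i] = intern ? fresh.intern() : fresh;
        }
        return names;
    }
}
```

Without interning, every element is its own object; with interning,
all elements refer to one shared instance, which is where the saving
comes from when many documents are held in memory at once.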


Re: Field constructor, avoiding String.intern()

James Kennedy
True. However, in the case where you are processing Documents one at a time and discarding them (e.g. we use a HitCollector to process all documents from a search), or where memory is not an issue, it would be nice to have the ability to disable the interning for performance's sake.



Re: Field constructor, avoiding String.intern()

Mike Klaas
In reply to this post by James Kennedy
On 2/23/07, James Kennedy <[hidden email]> wrote:
>
> In our case, we're trying to optimize document() retrieval and we found that
> disabling the String interning in the Field constructor improved performance
> dramatically. I agree that interning should be an option on the constructor.

Out of curiosity, how much is "dramatically"?

-Mike

Re: Field constructor, avoiding String.intern()

Doug Cutting
In reply to this post by James Kennedy
James Kennedy wrote:
> True. However, in the case where you are processing Documents one at a time
> and discarding them (e.g. We use hitCollector to process all documents from
> a search), or memory is not an issue, it would be nice to have the ability
> to disable the interning for performance sake.

Accessing documents from a hit-collector is not advised.  It is
generally best to compose queries and filters to reduce the number of
matches.  When that's not feasible, a hit collector that uses a
FieldCache to filter by or collect field values is much faster than
accessing documents.
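
The pattern described above can be sketched without any Lucene classes.
This is an illustration under stated assumptions, not Lucene's API: the
DocCollector interface is a hypothetical stand-in for a hit-collector
callback, and the String[] indexed by document id is a stand-in for the
kind of per-document value array a field cache provides. The point is
that the collector reads one array slot per hit instead of loading a
stored document.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical stand-in for a hit-collector callback: one call per hit. */
interface DocCollector {
    void collect(int docId, float score);
}

/**
 * Counts hits per field value using a preloaded array indexed by doc id,
 * so no document (and no Field object) is created per hit.
 */
final class CachedValueCounter implements DocCollector {
    private final String[] valueByDoc; // assumed to be loaded once, up front
    private final Map<String, Integer> counts = new HashMap<String, Integer>();

    CachedValueCounter(String[] valueByDoc) {
        this.valueByDoc = valueByDoc;
    }

    @Override
    public void collect(int docId, float score) {
        // A single array lookup replaces fetching the stored document.
        counts.merge(valueByDoc[docId], 1, Integer::sum);
    }

    Map<String, Integer> counts() {
        return counts;
    }
}
```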

Doug

Re: Field constructor, avoiding String.intern()

Chris Hostetter-3

: Accessing documents from a hit-collector is not advised.  It is
: generally best to compose queries and filters to reduce the number of
: matches.  When that's not feasible, a hit collector that uses a
: FieldCache to filter by or collect field values is much faster than
: accessing documents.

i make no judgements about the merits of intern(), but i would like to
reiterate Doug's point (if you are using the Document class in a
HitCollector you are probably doing something wrong) and make a suggestion:

*If* people agree that interning field names makes sense when
indexing Documents, but does not make sense when reading
Documents out of an IndexReader/IndexSearcher, then this sounds like yet
another justification for separating the implementations of those use
cases...

https://issues.apache.org/jira/browse/LUCENE-778#action_12475526

...having an "option" when Documents/Fields are constructed to determine
whether intern is called seems .... odd.



-Hoss


Re: Field constructor, avoiding String.intern()

James Kennedy
In reply to this post by Mike Klaas
Search time over 10,000,000 documents (3 fields each) was cut roughly in half.
However, keep in mind that we're using slightly modified Lucene document retrieval code, with a HitCollector to aggregate search results.


Re: Field constructor, avoiding String.intern()

Wolfgang Hoschek-2
In reply to this post by James Kennedy

On Feb 23, 2007, at 10:28 AM, James Kennedy wrote:

>
> True. However, in the case where you are processing Documents one at a time
> and discarding them (e.g. We use hitCollector to process all documents from
> a search), or memory is not an issue, it would be nice to have the ability
> to disable the interning for performance sake.

I don't know how much it would increase overall throughput in a
variety of use cases, but one approach could be to add a
"copy-like-this" factory method such as Field.createField(Reader) to
Field.java, analogous to the method Term.createTerm(String text) that
was added to Term.java some time ago for a similar reason.

This would guarantee that the name continues to be interned, yet
avoid the interning overhead in use cases where a field
with the same parametrization (but a different content String/Reader)
is constructed many times, which is probably the most common case
where intern() overhead might matter.

For example, something like

Field f1 = ...
Field f2 = f1.createSimilarField(Reader);

   /**
    * Optimized construction of new Terms by reusing same field as this Term
    * - avoids field.intern() overhead
    * @param text The text of the new term (field is implicitly same as this Term instance)
    * @return A new Term
    */
   public Term createTerm(String text)
   {
       return new Term(field,text,false);
   }
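
Applied to fields, the same idea might look roughly like the sketch
below. This is an illustration only: SimpleField is a hypothetical,
much simplified class and not Lucene's Field, and createSimilarField is
the name used in the example above rather than an existing method. The
public constructor interns the name; the private one trusts a name that
is already interned.

```java
import java.io.Reader;

/** Hypothetical, simplified field class; not Lucene's Field. */
final class SimpleField {
    private final String name;  // always the interned instance
    private final Reader value;

    /** Normal construction: the name is interned here, once. */
    SimpleField(String name, Reader value) {
        this(name, value, true);
    }

    /** Private so that only this class can skip the intern() call. */
    private SimpleField(String name, Reader value, boolean intern) {
        this.name = intern ? name.intern() : name;
        this.value = value;
    }

    /** New field with the same, already interned name and new content. */
    SimpleField createSimilarField(Reader newValue) {
        return new SimpleField(name, newValue, false);
    }

    String name() {
        return name;
    }

    Reader value() {
        return value;
    }
}
```

Because the non-interning constructor is private, callers cannot
create a field whose name skipped intern(); they can only copy a name
from a field that was built the normal way.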

Wolfgang.

Re: Field constructor, avoiding String.intern()

Otis Gospodnetic-2
In reply to this post by Tatu Saloranta
Somehow I missed the very first message in this thread.
James, did you upload a patch with your changes to JIRA?

Otis
