Add Term.createTerm to avoid 99% of String.intern() calls

Wolfgang Hoschek
For the MemoryIndex, I'm seeing large performance overheads due to
repeated String interning when creating temporary o.a.l.index.Term objects.
For example, consider a FuzzyTermQuery or similar scanning all terms
in the index via TermEnum: 40% of the time is spent in String.intern()
called from new Term(). (According to profiling, allocating temporary
memory and FuzzyTermEnum.termCompare are less of a problem.)
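The repeated work is redundant by construction: String.intern() returns the JVM's canonical instance for a given string value, so once a field name has been interned, every later intern() call on an equal string hands back that same reference. A minimal standalone sketch using only the JDK (the class name InternDemo and the field name "contents" are illustrative, not taken from MemoryIndex):

```java
public class InternDemo {
    /** Returns true if interning n fresh copies of name always yields the same reference. */
    static boolean allInternsIdentical(String name, int n) {
        String canonical = name.intern();
        for (int i = 0; i < n; i++) {
            // new String(...) forces a distinct object with equal contents,
            // yet intern() still maps it back to the one canonical instance.
            if (new String(name).intern() != canonical) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        String literal = "contents";           // string literals are interned by the JVM
        String copy = new String("contents");  // distinct object, equal contents
        System.out.println(copy == literal);           // false: different objects
        System.out.println(copy.intern() == literal);  // true: canonical instance returned
        System.out.println(allInternsIdentical("contents", 1000)); // true
    }
}
```

Since the answer never changes after the first call for a given field, interning the field once and reusing the result skips the lookups without changing which reference a term ends up holding.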

Note that the field name only needs to be interned once, not again
for every term. But the non-interning Term constructor is private and
hence not accessible from o.a.l.index.memory.*. TermBuffer isn't what
I'm looking for, and it's private anyway. The best solution I came up
with is an additional, safe public method in Term:

   /** Constructs a term with the given text and the same interned field name as
    * this term (minimizes interning overhead). */
   public Term createTerm(String txt) { // WH
       return new Term(field, txt, false);
   }

Besides dramatically improving performance, this has the benefit of  
keeping the non-interning constructor private.
Comments/opinions, anyone?
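To see the proposal in isolation, here is a self-contained miniature of the design. SketchTerm is a hypothetical stand-in, not Lucene's Term; it assumes only what the post describes: a public constructor that interns the field name, a private constructor that can skip interning, and createTerm() reusing this term's already-interned field. The intern counter exists purely for illustration.

```java
/**
 * Hypothetical miniature of the proposed design, NOT Lucene's Term class.
 */
public final class SketchTerm {
    private static int internCalls = 0; // illustration only: counts intern() calls

    private final String field;
    private final String text;

    /** Public constructor: always interns the field name. */
    public SketchTerm(String field, String text) {
        this(field, text, true);
    }

    /** Private constructor: interning is optional, so outside code cannot skip it directly. */
    private SketchTerm(String field, String text, boolean intern) {
        if (intern) {
            internCalls++;
            field = field.intern();
        }
        this.field = field;
        this.text = text;
    }

    /** Same interned field name as this term, new text, no interning. */
    public SketchTerm createTerm(String txt) {
        return new SketchTerm(field, txt, false);
    }

    public String field() { return field; }
    public String text() { return text; }
    static int internCalls() { return internCalls; }

    public static void main(String[] args) {
        SketchTerm base = new SketchTerm(new String("contents"), "");
        SketchTerm t = base.createTerm("lucene");
        System.out.println(t.field() == base.field()); // true: same interned reference
        System.out.println(internCalls());             // 1: only the public constructor interned
    }
}
```

Because outside code can reach the non-interning path only through an existing term, whose field is already interned, every instance still holds an interned field name. That is the safety argument for keeping the private constructor private.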

Here's a sketch of how it can be used:

public Term term() {
    if (cachedTerm == null) cachedTerm = new Term((String) sortedFields[j].getKey(), "");
    return cachedTerm.createTerm((String) ...);
}

public boolean next() {
    if (...) cachedTerm = null;
    ...
}
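The usage sketch above, filled out as a runnable toy: CachedTermEnumSketch, MiniTerm, the TreeMap-of-sorted-term-lists layout, and resetting cachedTerm exactly when next() moves on to a new field are all assumptions for illustration, since the post elides the actual condition and MemoryIndex's internals. With 2 fields and 5 terms, the field name is interned twice rather than five times.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class CachedTermEnumSketch {
    static int internCalls = 0; // illustration only: counts intern() calls

    /** Hypothetical minimal term: interns only when built via the public-style constructor. */
    static final class MiniTerm {
        final String field;
        final String text;

        MiniTerm(String field, String text) { this(field, text, true); }

        private MiniTerm(String field, String text, boolean intern) {
            if (intern) {
                internCalls++;
                field = field.intern();
            }
            this.field = field;
            this.text = text;
        }

        MiniTerm createTerm(String txt) { return new MiniTerm(field, txt, false); }
    }

    private final List<Map.Entry<String, List<String>>> sortedFields;
    private int j = 0;            // index of the current field
    private int i = -1;           // index of the current term within that field
    private MiniTerm cachedTerm;  // base term for the current field; null after a field change

    CachedTermEnumSketch(TreeMap<String, List<String>> index) {
        sortedFields = new ArrayList<>(index.entrySet());
    }

    public boolean next() {
        while (j < sortedFields.size()) {
            if (++i < sortedFields.get(j).getValue().size()) return true;
            j++;
            i = -1;
            cachedTerm = null; // assumed reset condition: we moved to a new field
        }
        return false;
    }

    public MiniTerm term() {
        // Intern the field name once per field, then reuse it for every term text.
        if (cachedTerm == null) cachedTerm = new MiniTerm(sortedFields.get(j).getKey(), "");
        return cachedTerm.createTerm(sortedFields.get(j).getValue().get(i));
    }

    public static void main(String[] args) {
        TreeMap<String, List<String>> index = new TreeMap<>();
        index.put("title", List.of("index", "memory"));
        index.put("body", List.of("apache", "lucene", "search"));
        CachedTermEnumSketch e = new CachedTermEnumSketch(index);
        int count = 0;
        while (e.next()) {
            MiniTerm t = e.term();
            System.out.println(t.field + ":" + t.text);
            count++;
        }
        System.out.println(count + " terms, " + internCalls + " intern calls"); // 5 terms, 2 intern calls
    }
}
```

The number of intern() calls now scales with the number of fields instead of the number of terms, which is where the savings in the subject line would come from for a full term scan.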

I'll send the full patch for MemoryIndex if this is accepted.

