this == that

classic Classic list List threaded Threaded
8 messages Options
Reply | Threaded
Open this post in threaded view
|

this == that

Karl Wettin-3
The code is filled with string equality code using == rather than  
equals(). I honestly don't think it saves a single clock tick as the  
JIT takes care of it when the first line of code in the equals method  
is if (this == that) return true;

Please correct me if I'm wrong.

I can commit to do the changes to the core code if it is considered  
interesting.

---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

Reply | Threaded
Open this post in threaded view
|

Re: this == that

Tatu Saloranta
--- karl wettin <[hidden email]> wrote:

> The code is filled with string equality code using
> == rather than  
> equals(). I honestly don't think it saves a single
> clock tick as the  
> JIT takes care of it when the first line of code in
> the equals method  
> is if (this == that) return true;

In case where (this == that) is true, this may well be
correct, but:

>
> Please correct me if I'm wrong.

... you are then assuming 100% match rate: if so this
might be true. But in (this != that) case difference
will be more significant; after identity comparison
String lengths are compared, and if those match, then
char-by-char comparison.
So it probably does not make sense to de-optimize code
in this way.

-+ Tatu +-


__________________________________________________
Do You Yahoo!?
Tired of spam?  Yahoo! Mail has the best spam protection around
http://mail.yahoo.com 

---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

Reply | Threaded
Open this post in threaded view
|

Re: this == that

Karl Wettin-3

30 apr 2006 kl. 04.48 skrev Tatu Saloranta:

>> JIT takes care of it when the first line of code in
>> the equals method
>> is if (this == that) return true;
>
> In case where (this == that) is true, this may well be
> correct, but:
>
>>
>> Please correct me if I'm wrong.
>
> ... you are then assuming 100% match rate: if so this
> might be true. But in (this != that) case difference
> will be more significant

Oh, I didn't think that far. Silly me. : )

Perhaps there should be a Wiki-page that discuss the code style.  
Things like this. And the use of bytes as floats. And stuff.

By the way, I have about eight working days to spend on Lucene  
starting tomorrow. I will work on the instanciated index, the rich  
positioning and the contextual spell suggester, in that order. So I  
hope at least something of above will be tested and working really  
soon. Time will tell.

---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

Reply | Threaded
Open this post in threaded view
|

Re: this == that

DM Smith
In reply to this post by Karl Wettin-3
karl wettin wrote:
> The code is filled with string equality code using == rather than
> equals(). I honestly don't think it saves a single clock tick as the
> JIT takes care of it when the first line of code in the equals method
> is if (this == that) return true;
If the strings are intern() then it should be a touch faster.
If the strings are not interned then I think it may be a premature
optimization.

IMHO, using intern to optimize space is a reasonable optimization, but
using == to compare such strings is error prone as it is possible that
the comparison is looking at strings that have not been interned.

Unless it object identity is what is being tested or intern is an
invariant, I think it is dangerous. It is easy to forget to intern or to
propagate the pattern via cut and paste to an inappropriate context.
>
> Please correct me if I'm wrong.
>
> I can commit to do the changes to the core code if it is considered
> interesting.

---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

Reply | Threaded
Open this post in threaded view
|

Re: this == that

jian chen
I am wondering if interning Strings will be really that critical for
performance. The biggest bottle neck is still disk. So, maybe we can use
String.equals(...) instead of ==.

Jian

On 5/1/06, DM Smith <[hidden email]> wrote:

>
> karl wettin wrote:
> > The code is filled with string equality code using == rather than
> > equals(). I honestly don't think it saves a single clock tick as the
> > JIT takes care of it when the first line of code in the equals method
> > is if (this == that) return true;
> If the strings are intern() then it should be a touch faster.
> If the strings are not interned then I think it may be a premature
> optimization.
>
> IMHO, using intern to optimize space is a reasonable optimization, but
> using == to compare such strings is error prone as it is possible that
> the comparison is looking at strings that have not been interned.
>
> Unless it object identity is what is being tested or intern is an
> invariant, I think it is dangerous. It is easy to forget to intern or to
> propagate the pattern via cut and paste to an inappropriate context.
> >
> > Please correct me if I'm wrong.
> >
> > I can commit to do the changes to the core code if it is considered
> > interesting.
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [hidden email]
> For additional commands, e-mail: [hidden email]
>
>
Reply | Threaded
Open this post in threaded view
|

Re: this == that

Yonik Seeley
On 5/1/06, jian chen <[hidden email]> wrote:
> I am wondering if interning Strings will be really that critical for
> performance.

Probably not as much as it was for early JVMs.

> The biggest bottle neck is still disk.

Depends on the index and workload.  Queries are often CPU-bound.

-Yonik
http://incubator.apache.org/solr Solr, the open-source Lucene search server

---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

Reply | Threaded
Open this post in threaded view
|

Re: this == that

Chris Hostetter-3
In reply to this post by jian chen

A couple of responses to various comments in this thread...

: > Unless it object identity is what is being tested or intern is an
: > invariant, I think it is dangerous. It is easy to forget to intern or to
: > propagate the pattern via cut and paste to an inappropriate context.

interning the Strings before doing the object equality test is an
invariant of all the lucene internals I've ever looked at ... if you think
there's a particular spot where this is not garunteed, please point it
out.

: I am wondering if interning Strings will be really that critical for
: performance. The biggest bottle neck is still disk. So, maybe we can use
: String.equals(...) instead of ==.

1) disk is not going to be a bottle neck if your entire index lives in
RAM, particularly if you are using a RAMDirectory (and not even relying on
the filesystem cache)

2) I think you are missing the point of an earlier comment (either that or
i am).  In the case where this *does* equal that, there may not be much
advantage to...

      void doSomething(String a, String b);
         String aa = a.intern();
         String bb = b.intern();
         if (aa == bb) {
            ...
         }
      }

...versus...

      void doSomething(String a, String b);
         if (a.equals(b)) {
            ...
         }
      }

...since the very first thing String.equals does is an object equality
test -- but in the case where the two strings are *NOT* equal, interning
and using == saves the cost of testing succesive characters (or bytes,
whichever String.equals might use, i haven't looked lately).  This should
make a significant differnce when the common case is that most comparisons
will fail (ie: scanning a TermEnum looking for a particular input string)




-Hoss


---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

Reply | Threaded
Open this post in threaded view
|

Re: this == that

Tatu Saloranta
In reply to this post by jian chen
--- jian chen <[hidden email]> wrote:

> I am wondering if interning Strings will be really
> that critical for
> performance. The biggest bottle neck is still disk.
> So, maybe we can use
> String.equals(...) instead of ==.

I would bet big bucks for it saving significant amount
of time, even with later JVMs. This based on XML
parser work, where canonical tokenization (using
String.intern() for names) saves lots of time both via
reduced GC activity, and due to ultra-fast lookups
(esp. when using HashMap etc), latter of which is due
to identity comparison being fast.

But, if anyone is really concerned about 'impurity' of
this optimization, the thing to do really is to do
extensive benchmarking and see what happens.
Theoretically it is also possible that the overhead of
String.intern() may count against using this approach
(it's a native method call, always), so there is a
chance that for some use cases, String.equals() might
be better choice. But I'd be against making this
change without benchmarking it well first to avoid
potential significant performance degradation.

-+ Tatu +-

>
> Jian
>
> On 5/1/06, DM Smith <[hidden email]> wrote:
> >
> > karl wettin wrote:
> > > The code is filled with string equality code
> using == rather than
> > > equals(). I honestly don't think it saves a
> single clock tick as the
> > > JIT takes care of it when the first line of code
> in the equals method
> > > is if (this == that) return true;
> > If the strings are intern() then it should be a
> touch faster.
> > If the strings are not interned then I think it
> may be a premature
> > optimization.
> >
> > IMHO, using intern to optimize space is a
> reasonable optimization, but
> > using == to compare such strings is error prone as
> it is possible that
> > the comparison is looking at strings that have not
> been interned.
> >
> > Unless it object identity is what is being tested
> or intern is an
> > invariant, I think it is dangerous. It is easy to
> forget to intern or to
> > propagate the pattern via cut and paste to an
> inappropriate context.
> > >
> > > Please correct me if I'm wrong.
> > >
> > > I can commit to do the changes to the core code
> if it is considered
> > > interesting.
> >
> >
>
---------------------------------------------------------------------
> > To unsubscribe, e-mail:
> [hidden email]
> > For additional commands, e-mail:
> [hidden email]
> >
> >
>


__________________________________________________
Do You Yahoo!?
Tired of spam?  Yahoo! Mail has the best spam protection around
http://mail.yahoo.com 

---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]