(byte)((i & 0x7f) | 0x80) == (byte)(i | 0x80)

(byte)((i & 0x7f) | 0x80) == (byte)(i | 0x80)

Charlie-24
Hello,

In:

   public abstract class IndexOutput
   public void writeVInt(int i)
   writeByte((byte)((i & 0x7f) | 0x80));

I think

  (byte)((i & 0x7f) | 0x80) == (byte)(i | 0x80)

since the (byte) cast already keeps only the low 8 bits, and | 0x80 sets
the top bit of that byte anyway, so there is no need for (& 0x7f). If so,
we could change that line to

   writeByte((byte)(i | 0x80));

which may speed things up a little. Correct me if (i & 0x7f) is necessary.
Thank you.

--
Best regards,
 Charlie
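
The claim can be checked mechanically. Below is a minimal sketch; the class and method names are hypothetical and not part of Lucene. The reasoning: the cast to byte keeps only bits 0-7, and | 0x80 forces bit 7 to 1 in both forms, so clearing bit 7 beforehand with & 0x7f cannot change the result. The two int values do differ in bits 8 and up, but the cast discards those.

```java
// Hypothetical demo class for illustration; not part of Lucene.
final class MaskEquivalence {
    /** The expression as quoted from writeVInt. */
    static byte withMask(int i) {
        return (byte) ((i & 0x7f) | 0x80);
    }

    /** The proposed shorter form. */
    static byte withoutMask(int i) {
        return (byte) (i | 0x80);
    }

    /** Returns true if both forms agree for every int in [from, to]. */
    static boolean agreeOnRange(int from, int to) {
        // A long counter avoids overflow when 'to' is Integer.MAX_VALUE.
        for (long i = from; i <= to; i++) {
            if (withMask((int) i) != withoutMask((int) i)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // Every possible low byte, plus values near both ends of the int range.
        System.out.println(agreeOnRange(0, 255));
        System.out.println(agreeOnRange(Integer.MIN_VALUE, Integer.MIN_VALUE + 1000));
        System.out.println(agreeOnRange(Integer.MAX_VALUE - 1000, Integer.MAX_VALUE));
    }
}
```

Since only the low 8 bits survive the cast, checking the range 0..255 already covers every distinct case; the other ranges are there only as a sanity check.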


---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]


Re: (byte)((i & 0x7f) | 0x80) == (byte)(i | 0x80)

Yonik Seeley
On 4/26/06, Charlie <[hidden email]> wrote:

> I thought
>
>   (byte)((i & 0x7f) | 0x80) == (byte)(i | 0x80)
>
> As (byte) is able to truncate the last byte for us already, no need of
> (& 0x7f). If so, we may change that line to
>
>    writeByte((byte)(i | 0x80));
>
> and may speed up a little bit. Correct me if (i & 0x7f) is necessary.
> Thank you.

I wouldn't bother optimizing these methods... I think they will be
changed in the future anyway.
1) The current code outputs modified UTF-8 instead of true UTF-8.
2) I think we may be going to byte-oriented counts for length (away
from the number of Java chars, since a character can take a variable
number of chars under the latest Unicode standards).
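
To make the two points concrete, here is a small sketch. It uses the JDK's DataOutputStream.writeUTF as a stand-in for "modified UTF-8"; that Lucene's own writer behaves the same way is an assumption here, not something confirmed in this thread. The class and method names are hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical demo class; uses the JDK's modified UTF-8, not Lucene's code.
final class Utf8Comparison {
    /** Bytes of s in the JDK's modified UTF-8, without writeUTF's 2-byte length prefix. */
    static byte[] modifiedUtf8(String s) {
        try {
            ByteArrayOutputStream buf = new ByteArrayOutputStream();
            new DataOutputStream(buf).writeUTF(s);
            byte[] all = buf.toByteArray();
            return Arrays.copyOfRange(all, 2, all.length);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Bytes of s in standard UTF-8. */
    static byte[] standardUtf8(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // U+1D11E lies outside the BMP: one code point, but two Java chars.
        String supplementary = new String(Character.toChars(0x1D11E));
        System.out.println("chars:                " + supplementary.length());
        System.out.println("code points:          "
                + supplementary.codePointCount(0, supplementary.length()));
        System.out.println("modified UTF-8 bytes: " + modifiedUtf8(supplementary).length);
        System.out.println("standard UTF-8 bytes: " + standardUtf8(supplementary).length);
        // U+0000 is the other case where the two encodings differ.
        System.out.println("U+0000 modified/standard: "
                + modifiedUtf8("\0").length + "/" + standardUtf8("\0").length);
    }
}
```

The supplementary character shows why a count of Java chars and a count of characters can disagree, and why the byte length also depends on which of the two encodings is written.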

Marvin Humphrey has done the first, and seems close to finishing #2.

http://www.mail-archive.com/java-dev@.../msg01970.html
http://www.mail-archive.com/java-dev@.../msg02109.html
http://www.mail-archive.com/java-dev@.../msg02468.html
http://www.mail-archive.com/java-dev@.../msg03801.html

-Yonik
http://incubator.apache.org/solr Solr, the open-source Lucene search server


Re[2]: (byte)((i & 0x7f) | 0x80) == (byte)(i | 0x80)

Charlie-24
ok, thanks for your reply.

But I thought
Method: public void writeVInt(int i)
is not about UTF-8, it is about how to write an int in variable length.
Is it included as a part of future unicode character writing?

--
Best regards,
 Charlie



Re: Re[2]: (byte)((i & 0x7f) | 0x80) == (byte)(i | 0x80)

Yonik Seeley
On 4/26/06, Charlie <[hidden email]> wrote:
> ok, thanks for your reply.
>
> But I thought
> Method: public void writeVInt(int i)
> is not about UTF-8, it is about how to write an int in variable length.

Oh, sorry... wrong function.  This is similar to optimizations I had
seen in the char writing function (which I didn't make, for the
reasons outlined in my post...)


-Yonik
http://incubator.apache.org/solr Solr, the open-source Lucene search server


Re: Re[2]: (byte)((i & 0x7f) | 0x80) == (byte)(i | 0x80)

Marvin Humphrey
In reply to this post by Charlie-24

On Apr 26, 2006, at 10:21 AM, Charlie wrote:

> But I thought
> Method: public void writeVInt(int i)
> is not about UTF-8, it is about how to write an int in variable length.
> Is it included as a part of future unicode character writing?

writeVInt, and also writeVLong, which contains the same code, would
not be changing.  So the question is, does this assertion hold?

   (byte)((i & 0x7f) | 0x80) == (byte)(i | 0x80)

Marvin Humphrey
Rectangular Research
http://www.rectangular.com/
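
For context, here is a minimal sketch of a variable-length int writer built around the quoted line. It assumes the usual scheme of 7 payload bits per byte with the high bit meaning "more bytes follow"; only the writeByte expression itself comes from this thread, and the class, the ByteArrayOutputStream target and the `mask` switch are hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.util.Arrays;

// Hypothetical stand-in for IndexOutput.writeVInt, for illustration only.
final class VIntSketch {
    /** Encodes i with 7 payload bits per byte; 'mask' selects which form of the expression is used. */
    static byte[] encode(int i, boolean mask) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        while ((i & ~0x7f) != 0) {                       // more than 7 significant bits remain
            byte b = mask ? (byte) ((i & 0x7f) | 0x80)   // form quoted from the current code
                          : (byte) (i | 0x80);           // proposed form
            out.write(b);                                // write(int) stores only the low 8 bits
            i >>>= 7;                                    // unsigned shift, so negative ints terminate
        }
        out.write((byte) i);                             // last byte, high bit clear
        return out.toByteArray();
    }

    public static void main(String[] args) {
        int[] samples = {0, 1, 127, 128, 300, 16384, Integer.MAX_VALUE, -1, Integer.MIN_VALUE};
        for (int v : samples) {
            boolean same = Arrays.equals(encode(v, true), encode(v, false));
            System.out.println(v + " -> " + encode(v, true).length + " byte(s), identical: " + same);
        }
    }
}
```

A long variant would differ only in the parameter type, which fits the remark that writeVLong contains the same code.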



Re: (byte)((i & 0x7f) | 0x80) == (byte)(i | 0x80)

Yonik Seeley
In reply to this post by Charlie-24
On 4/26/06, Charlie <[hidden email]> wrote:
>    writeByte((byte)((i & 0x7f) | 0x80));
>    writeByte((byte)(i | 0x80));

Yes, these two lines are equivalent.
It's fairly likely that the JVM already does this optimization for you though...
at least gcc -O already compiles to identical assembly for those two lines.

-Yonik
http://incubator.apache.org/solr Solr, the open-source Lucene search server


Re: Re[2]: (byte)((i & 0x7f) | 0x80) == (byte)(i | 0x80)

Yonik Seeley
In reply to this post by Marvin Humphrey
On 4/26/06, Marvin Humphrey <[hidden email]> wrote:
> So the question is, does this assertion hold?
>
>    (byte)((i & 0x7f) | 0x80) == (byte)(i | 0x80)

Yes.

I just tested this, and Java 5 with -server reports a 15% performance
boost for writeVInt alone, tested over numbers from 1 to 1000000.  Of
course the overall boost is likely to be very small, but it's a very
simple change to make.

-Yonik
http://incubator.apache.org/solr Solr, the open-source Lucene search server
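
A measurement of this kind could be set up roughly as follows. This is not the benchmark that was actually run (the thread does not show it); it is a naive timing sketch with hypothetical names, writing into a fixed buffer instead of an IndexOutput. Results from hand-rolled loops like this vary a lot with JVM version and flags, so treat any number it prints with caution.

```java
// Hypothetical timing harness; names and structure are illustrative only.
final class VIntTiming {
    private static final byte[] BUF = new byte[5];   // a VInt of an int needs at most 5 bytes

    /** Encodes i into BUF using the masked expression; returns the byte count. */
    static int writeMasked(int i) {
        int n = 0;
        while ((i & ~0x7f) != 0) {
            BUF[n++] = (byte) ((i & 0x7f) | 0x80);
            i >>>= 7;
        }
        BUF[n++] = (byte) i;
        return n;
    }

    /** Same as writeMasked, but with the proposed shorter expression. */
    static int writeUnmasked(int i) {
        int n = 0;
        while ((i & ~0x7f) != 0) {
            BUF[n++] = (byte) (i | 0x80);
            i >>>= 7;
        }
        BUF[n++] = (byte) i;
        return n;
    }

    /** Encodes 1..limit and folds every output byte into a checksum so the work is not discarded. */
    static long run(boolean masked, int limit) {
        long checksum = 0;
        for (int v = 1; v <= limit; v++) {
            int n = masked ? writeMasked(v) : writeUnmasked(v);
            for (int k = 0; k < n; k++) {
                checksum = checksum * 31 + BUF[k];
            }
        }
        return checksum;
    }

    public static void main(String[] args) {
        final int limit = 1_000_000;
        for (int warm = 0; warm < 20; warm++) {      // give the JIT a chance to compile both paths
            run(true, limit);
            run(false, limit);
        }
        for (boolean masked : new boolean[] {true, false}) {
            long start = System.nanoTime();
            long sum = run(masked, limit);
            long micros = (System.nanoTime() - start) / 1_000;
            System.out.println((masked ? "masked:   " : "unmasked: ") + micros + " us, checksum " + sum);
        }
    }
}
```

The two checksums should be equal, which doubles as a check that both variants produce the same bytes. For numbers worth quoting, a dedicated benchmarking harness would be a better choice than this loop.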


Re: (byte)((i & 0x7f) | 0x80) == (byte)(i | 0x80)

Tatu Saloranta
In reply to this post by Yonik Seeley
--- Yonik Seeley <[hidden email]> wrote:

> On 4/26/06, Charlie <[hidden email]> wrote:
> >    writeByte((byte)((i & 0x7f) | 0x80));
> >    writeByte((byte)(i | 0x80));
>
> Yes, these two lines are equivalent.
> It's fairly likely that the JVM already does this
> optimization for you though...
> at least gcc -O already compiles to identical
> assembly for those two lines.

For many other kinds of operations that is true; HotSpot
is quite good... but for some reason, more often than
not, it does not aggressively optimize byte-shuffling
code like this. So I am not surprised that in this
particular case there is a significant performance boost
(the same goes for UTF-8 decoding, for example -- it's
surprising how much can be gained with
straightforward changes).
Perhaps the HotSpot folks have focused more on efficient
speculative inlining of method calls, thinking that
these low-level cases get hand-optimized well? ;-)

-+ Tatu +-


