[jira] [Commented] (LUCENE-7966) build mr-jar and use some java 9 methods if available

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
1 message Options
Reply | Threaded
Open this post in threaded view
|

[jira] [Commented] (LUCENE-7966) build mr-jar and use some java 9 methods if available

JIRA jira@apache.org

    [ https://issues.apache.org/jira/browse/LUCENE-7966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16162647#comment-16162647 ]

Uwe Schindler commented on LUCENE-7966:
---------------------------------------

bq. One thing that we lose here is the verbosity of exception message (field names, doc ids)

Yes, but there is a reason to change to the new methods! Robert did not add that to the issue description. I directed him on the weekend during a private chat to the new methods (which was caused by my comment in a former issue with the missing check).

To make it short: Those methods are highly optimized by Hotspot. In fact they replaced the bounds checks in ByteBuffers and other places throughout the JDK to call those methods instead of hardcoded if/then/else statements. I also talked with RĂ©mi Forax about it this summer to get more information. He strongly recommends to use the new methods from Objects and Arrays instead of manually crafted if/then/else checks. The reason is: Those methods are intrinsics and are more or less replaced by highly optimized bounds check code. In addition, if you add several levels of bounds checks more magic is happening: a high level method checks bounds and call lower level method, which does bounds checks again (e.g., a ByteBuffer in the JDK), and this lower level method again accesses a Java array (that implcitely does bounds checks, too), - then Hotspot adds all calls of those Objects' index check methods to the internal AST and it can then safely eliminate all of them except one. And if it sees that a variable is unsigned it will also remove negative checks ... and so on. This was done, because they had very hard problems in eliminating those checks everywhere when somebody creates an if statement (there are tons of ways to do the same check with if statements!), the OpenJDK developers argued to use a intrinsic replacement method instead of hardcoded bounds checks with hundreds of variants. By that they only have to optimize a single "if statement" and can replace it by assembly code and remove duplicates.

Even shorter: If you use the Objects methods instead of if statements, you can add more safety to your code, as duplicate checks are eliminated. So we can even start to add checks into BytesRef. Or we can remove the stupid try/catch blocks in ByteBufferIndexInput.

> build mr-jar and use some java 9 methods if available
> -----------------------------------------------------
>
>                 Key: LUCENE-7966
>                 URL: https://issues.apache.org/jira/browse/LUCENE-7966
>             Project: Lucene - Core
>          Issue Type: Improvement
>          Components: general/build
>            Reporter: Robert Muir
>         Attachments: LUCENE-7966.patch, LUCENE-7966.patch
>
>
> See background: http://openjdk.java.net/jeps/238
> It would be nice to use some of the newer array methods and range checking methods in java 9 for example, without waiting for lucene 10 or something. If we build an MR-jar, we can start migrating our code to use java 9 methods right now, it will use optimized methods from java 9 when thats available, otherwise fall back to java 8 code.  
> This patch adds:
> {code}
> Objects.checkIndex(int,int)
> Objects.checkFromToIndex(int,int,int)
> Objects.checkFromIndexSize(int,int,int)
> Arrays.mismatch(byte[],int,int,byte[],int,int)
> Arrays.compareUnsigned(byte[],int,int,byte[],int,int)
> Arrays.equal(byte[],int,int,byte[],int,int)
> // did not add char/int/long/short/etc but of course its possible if needed
> {code}
> It sets these up in {{org.apache.lucene.future}} as 1-1 mappings to java methods. This way, we can simply directly replace call sites with java 9 methods when java 9 is a minimum. Simple 1-1 mappings mean also that we only have to worry about testing that our java 8 fallback methods work.
> I found that many of the current byte array methods today are willy-nilly and very lenient for example, passing invalid offsets at times and relying on compare methods not throwing exceptions, etc. I fixed all the instances in core/codecs but have not looked at the problems with AnalyzingSuggester. Also SimpleText still uses a silly method in ArrayUtil in similar crazy way, have not removed that one yet.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]