[jira] [Comment Edited] (LUCENE-4848) Add Directory implementations using NIO2 APIs

classic Classic list List threaded Threaded
1 message Options
Reply | Threaded
Open this post in threaded view
|

[jira] [Comment Edited] (LUCENE-4848) Add Directory implementations using NIO2 APIs

JIRA jira@apache.org

    [ https://issues.apache.org/jira/browse/LUCENE-4848?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13607577#comment-13607577 ]

Michael Poindexter edited comment on LUCENE-4848 at 3/20/13 1:08 PM:
---------------------------------------------------------------------

bq. in my previous patch I also added the new Directory implementation into the list of FSDirectories to be used randomly while testing (when tests.directory is not given). This is missing in the latest patch. It should simply be added to the list in LuceneTestCase.java:

I actually removed it on purpose for now.  I wanted the changes to LuceneTestCase to be as uninvasive as possible, so I do some work to poke at the thread pool only if AsyncFSDirectory is selected as test.directory.  Given that, I didn't want for it to be randomly selected.

bq. If its in the base class every implementor has to implement it just for a stupid test that only works with SimpleFSDir. So this must be fixed. Just move it to SimpleFSDir or maybe we remove this method completely and fix the test.

Sounds good to me!

bq. Maybe this new directory should be spun off into a separate issue? Do we have any idea how it performs on different operating systems? Should it be in core lucene? Can we nuke WindowsDirectory? I just want to understand the benefits, because it seems it does an async IO request but then blocks on the future to return back. so this seems no different than sync io to me. Besides, the rest of this patch is plenty for one issue since its the .store API (we need to proceed with caution).

You are right, it is no different than sync IO in the end.  It's really only useful on Windows where it will use IO completion ports which means there is no need to synchronize on the file position (like java does for FileChannel, or Lucene does internally for SimpleFSDirectory).  On at least Linux/BSD/Mac the Sun JDK will just do what basically FileChannel does anyway and incur the additional overhead of notifying a future, so it's unlikely to be useful there unless one is highly concerned about the Thread.interrupt thing.

If you want to spin it off I'm fine with that, it's no problem for me to split the patch, let me know if everyone thinks that's a good idea.

bq. I opted not to override the default thread factory because it's never known when this thread factory will be constructed (it may be initialized before LuceneTestCase is instantiated) and, more importantly, I think we shouldn't be messing with the VM we're running on unless we really don't have a choice.

I agree!  If there is a better way I am all ears.  I couldn't find any other way to mark the threads as ignorable (the Sun JRE at least doesn't name them in any useful way, and they have no property that could be looked up).  We can't even figure out from looking at their stacks where they come from since they are just from a generic Executor.

I think as gross as it is it should be fairly safe to set the default thread factory in LuceneTestCase.  It should work across different vendor VM's since it is documented as part of the public Java API (see http://docs.oracle.com/javase/7/docs/api/java/nio/channels/AsynchronousChannelGroup.html).  You are right that it could have been initialized before LuceneTestCase is run, but I think that's not a problem.  The two cases here are:
1.) Someone already did asynch io causing the pool to be created before LuceneTestCase.  The tests should pass since the threads exist beforehand.
2.) Nobody has done async io yet in our VM (the likely case).  We'll create the threads using a special name that says to ignore them.


               
      was (Author: mpoindexter):
    .bq in my previous patch I also added the new Directory implementation into the list of FSDirectories to be used randomly while testing (when tests.directory is not given). This is missing in the latest patch. It should simply be added to the list in LuceneTestCase.java:

I actually removed it on purpose for now.  I wanted the changes to LuceneTestCase to be as uninvasive as possible, so I do some work to poke at the thread pool only if AsyncFSDirectory is selected as test.directory.  Given that, I didn't want for it to be randomly selected.

.bq If its in the base class every implementor has to implement it just for a stupid test that only works with SimpleFSDir. So this must be fixed. Just move it to SimpleFSDir or maybe we remove this method completely and fix the test.

Sounds good to me!

.bq Maybe this new directory should be spun off into a separate issue? Do we have any idea how it performs on different operating systems? Should it be in core lucene? Can we nuke WindowsDirectory? I just want to understand the benefits, because it seems it does an async IO request but then blocks on the future to return back. so this seems no different than sync io to me. Besides, the rest of this patch is plenty for one issue since its the .store API (we need to proceed with caution).

You are right, it is no different than sync IO in the end.  It's really only useful on Windows where it will use IO completion ports which means there is no need to synchronize on the file position (like java does for FileChannel, or Lucene does internally for SimpleFSDirectory).  On at least Linux/BSD/Mac the Sun JDK will just do what basically FileChannel does anyway and incur the additional overhead of notifying a future, so it's unlikely to be useful there unless one is highly concerned about the Thread.interrupt thing.

If you want to spin it off I'm fine with that, it's no problem for me to split the patch, let me know if everyone thinks that's a good idea.

.bq I opted not to override the default thread factory because it's never known when this thread factory will be constructed (it may be initialized before LuceneTestCase is instantiated) and, more importantly, I think we shouldn't be messing with the VM we're running on unless we really don't have a choice.

I agree!  If there is a better way I am all ears.  I couldn't find any other way to mark the threads as ignorable (the Sun JRE at least doesn't name them in any useful way, and they have no property that could be looked up).  We can't even figure out from looking at their stacks where they come from since they are just from a generic Executor.

I think as gross as it is it should be fairly safe to set the default thread factory in LuceneTestCase.  It should work across different vendor VM's since it is documented as part of the public Java API (see http://docs.oracle.com/javase/7/docs/api/java/nio/channels/AsynchronousChannelGroup.html).  You are right that it could have been initialized before LuceneTestCase is run, but I think that's not a problem.  The two cases here are:
1.) Someone already did asynch io causing the pool to be created before LuceneTestCase.  The tests should pass since the threads exist beforehand.
2.) Nobody has done async io yet in our VM (the likely case).  We'll create the threads using a special name that says to ignore them.


                 

> Add Directory implementations using NIO2 APIs
> ---------------------------------------------
>
>                 Key: LUCENE-4848
>                 URL: https://issues.apache.org/jira/browse/LUCENE-4848
>             Project: Lucene - Core
>          Issue Type: Task
>            Reporter: Michael Poindexter
>            Assignee: Uwe Schindler
>            Priority: Minor
>         Attachments: jdk7directory.zip, LUCENE-4848-MMapDirectory.patch, LUCENE-4848.patch, LUCENE-4848.patch, LUCENE-4848.patch, LUCENE-4848.patch, LUCENE-4848.patch, LUCENE-4848.patch.txt
>
>
> I have implemented 3 Directory subclasses using NIO2 API's (available on JDK7).  These may be suitable for inclusion in a Lucene contrib module.
> See the mailing list at http://lucene.markmail.org/thread/lrv7miivzmjm3ml5 for more details about this code and the advantages it provides.
> The code is attached as a zip to this issue.  I'll be happy to make any changes requested.  I've included some minimal smoke tests, but any help in how to use the normal Lucene tests to perform more thorough testing would be appreciated.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]