Direct I/O

Michael Sokolov-4 (Sep 16, 2019)
https://bugs.openjdk.java.net/browse/JDK-8189192 makes it appear that
Direct I/O is (or may be?) now available in the JDK since JDK 10. Should
we try using that API in NativeUnixDirectory in order to avoid JNI
calls?

---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

Re: Direct I/O

Michael McCandless-2 (Sep 17, 2019)
Whoa! That would be awesome -- no more JNI to use Direct I/O?
Looks like you use it like this:

FileChannel fc = FileChannel.open(p, StandardOpenOption.WRITE,
                                  ExtendedOpenOption.DIRECT);

But it looks like you need to enable the jdk.unsupported module, added with http://openjdk.java.net/jeps/260
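Expanding that one-liner into a complete program, as a sketch under these assumptions: JDK 10 or later (for `ExtendedOpenOption.DIRECT` and `FileStore.getBlockSize()`) and a file system that actually accepts O_DIRECT. The class and method names are hypothetical. The buffer is made block-aligned in memory with `ByteBuffer.alignedSlice`, and the write uses a block-aligned file position and a whole-block length; if the OS or file system refuses direct I/O, the sketch reports that instead of failing.

```java
import com.sun.nio.file.ExtendedOpenOption; // exported by the jdk.unsupported module
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical example class; not Lucene's NativeUnixDirectory.
public class DirectWriteSketch {

    /** Returns a direct buffer of `size` bytes whose start address is a multiple of `align`.
     *  `align` must be a power of two and `size` a multiple of it. */
    static ByteBuffer allocateAligned(int size, int align) {
        // Over-allocate by one unit, then slice off an aligned region of at least `size` bytes.
        ByteBuffer aligned = ByteBuffer.allocateDirect(size + align).alignedSlice(align);
        aligned.limit(size);
        return aligned.slice(); // capacity is exactly `size`, still aligned
    }

    /** Writes one block at offset 0 with O_DIRECT; returns false if that is not possible here. */
    static boolean tryDirectWrite(Path file) {
        try {
            // Block size of the file store holding the directory (FileStore.getBlockSize, JDK 10+).
            int blockSize = Math.toIntExact(Files.getFileStore(file.getParent()).getBlockSize());
            ByteBuffer block = allocateAligned(blockSize, blockSize);
            while (block.hasRemaining()) {
                block.put((byte) 'x');
            }
            block.flip();
            try (FileChannel fc = FileChannel.open(file, StandardOpenOption.CREATE,
                    StandardOpenOption.WRITE, ExtendedOpenOption.DIRECT)) {
                // Both the file position (0) and the length (blockSize) are block multiples.
                return fc.write(block, 0) == blockSize;
            }
        } catch (UnsupportedOperationException | IOException e) {
            // e.g. the OS or the file system rejects direct I/O
            return false;
        }
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("direct-io-sketch");
        Path file = dir.resolve("block.bin");
        System.out.println(tryDirectWrite(file) ? "direct write succeeded"
                                                : "direct I/O not available here");
        Files.deleteIfExists(file);
        Files.delete(dir);
    }
}
```

Over-allocating and slicing is just one way to get an aligned buffer; it wastes at most one block of memory per buffer.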

Re: Direct I/O

Dawid Weiss-2 (Sep 17, 2019)
Isn't that restricted to aligned block-only access though? I can
imagine this would complicate the implementation if somebody wanted to
use it directly.

Dawid
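To make the alignment restriction concrete: under direct IO, reads and writes generally have to start at a block-aligned file offset and cover whole blocks, so fetching an arbitrary byte range means reading the enclosing aligned range and skipping the extra bytes. A minimal sketch of that arithmetic, assuming a power-of-two block size; `AlignedRange` and its methods are hypothetical names.

```java
// Hypothetical helper: the block-aligned range enclosing the byte range [offset, offset + length).
public final class AlignedRange {
    final long alignedStart;  // offset rounded down to a block boundary
    final long alignedLength; // whole blocks covering the request
    final int skip;           // leading bytes of the aligned read that the caller did not ask for

    private AlignedRange(long alignedStart, long alignedLength, int skip) {
        this.alignedStart = alignedStart;
        this.alignedLength = alignedLength;
        this.skip = skip;
    }

    // Both helpers rely on blockSize being a power of two (mask arithmetic).
    static long roundDown(long value, long blockSize) {
        return value & -blockSize;
    }

    static long roundUp(long value, long blockSize) {
        return (value + blockSize - 1) & -blockSize;
    }

    static AlignedRange of(long offset, long length, long blockSize) {
        if (Long.bitCount(blockSize) != 1) {
            throw new IllegalArgumentException("block size must be a power of two");
        }
        long start = roundDown(offset, blockSize);
        long end = roundUp(offset + length, blockSize);
        return new AlignedRange(start, end - start, (int) (offset - start));
    }

    public static void main(String[] args) {
        // Asking for 100 bytes at offset 5000 with 4096-byte blocks means reading the
        // single block [4096, 8192) and skipping its first 904 bytes.
        AlignedRange r = of(5000, 100, 4096);
        System.out.println(r.alignedStart + " " + r.alignedLength + " " + r.skip); // 4096 4096 904
    }
}
```

A 200-byte read at offset 4000 straddles a block boundary and becomes an 8192-byte read from offset 0, which shows how small unaligned reads can turn into much larger ones.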

Re: Direct I/O

Uwe Schindler (Sep 17, 2019)
We already discussed this at Berlin Buzzwords (Mike and Michael). Yes, it's possible, and it may work for merges, where block IO is possible. But most of us said: it's fine not to use the IO cache for merging, but then the merged pages won't be hot. Merges become invisible to the OS, so you have to warm merged segments if you write them directly. If you read directly while merging, you won't pollute the cache with one-time reads, but you also won't use the cache if the data is already cached.
It would be better to make a proposal for fadvise/madvise. The JDK people are open to that, and I am a JDK committer now, so I can make a prototype.

Uwe

--
Uwe Schindler
Achterdiek 19, 28357 Bremen
https://www.thetaphi.de

Re: Direct I/O

Michael McCandless-2 (Sep 18, 2019)
Dawid, it's confusing: direct IO is different from a direct ByteBuffer!

Direct IO means you bypass all the kernel "smarts": the Linux buffer cache is not used, no IO scheduling, no write cache that the pdflush daemon must periodically move to disk, etc. This is normally a bad idea; it's better to use fadvise/madvise to give the kernel hints about what you are doing, and to use the buffer cache for what it's good at. Linus hates that direct IO is even an option for us ...

Back when I wrote NativeUnixDirectory, the idea was to keep ongoing merges from so heavily impacting ongoing searches when you are doing indexing and searching on one node. We open the newly merged segment files using direct IO and do our own buffering, so all writes go straight to disk instead of using up precious hot pages that are in use for searching. I think I ran some simple performance tests back then, but I don't remember the results ... more testing is needed to see if it really helps.

At Amazon, we are using segment-based replication every 60 seconds to copy newly indexed segments out to all searchers, so we never have nodes doing both indexing and searching; it's one or the other. But copying max-sized newly merged segments out to the searchers is causing some thrashing, so we are exploring using direct IO for those writes and then separately warming the new segments after the copy.
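One simple, OS-level form of warming is to read the newly copied file once through an ordinary (non-direct) channel so its pages end up in the page cache; whether they stay there is up to the kernel. A minimal sketch with a hypothetical `WarmSketch.warm` helper, not a description of how the replication setup above actually warms segments.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical helper: reads a file end to end through the normal (page-cached) path so that,
// memory permitting, the kernel keeps its pages cached for later searches.
public class WarmSketch {

    static long warm(Path file) throws IOException {
        long total = 0;
        ByteBuffer buf = ByteBuffer.allocateDirect(1 << 16); // 64 KiB read chunks
        try (FileChannel fc = FileChannel.open(file, StandardOpenOption.READ)) {
            int n;
            while ((n = fc.read(buf)) != -1) {
                total += n;
                buf.clear(); // contents are discarded; only the caching side effect matters
            }
        }
        return total; // bytes read, useful as a sanity check
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("segment", ".bin");
        Files.write(tmp, new byte[200_000]);
        System.out.println("warmed " + warm(tmp) + " bytes");
        Files.delete(tmp);
    }
}
```

Reading sequentially also lets the kernel's readahead do larger IOs, which is why a plain read loop is a reasonable first approximation before hint APIs like fadvise are available.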

Re: Direct I/O

Michael McCandless-2 (Sep 18, 2019)
In reply to this post by Uwe Schindler
Aha! Yes, Uwe, now I remember you explained this to me: we can now do direct IO purely in Java. I think we should fix up NativeUnixDirectory and then run some more benchmarks to see if it helps. I'll open an issue.

And definitely a big +1 to giving us fadvise/madvise in Java so we can test that too. Long term, it's better to give hints to the kernel and then let it manage its buffer cache appropriately.

Re: Direct I/O

Michael McCandless-2

On Wed, Sep 18, 2019 at 9:23 AM Michael McCandless <[hidden email]> wrote:

Re: Direct I/O

Dawid Weiss-2 (Sep 18, 2019)
In reply to this post by Michael McCandless-2
Thanks for the explanation, Mike!

D.

Re: Direct I/O

Uwe Schindler (Sep 18, 2019)
Direct IO does in fact have the problem Dawid raised, though it was named slightly wrongly: block alignment is needed on disk, not in memory. In short, you can't read or write a single byte anywhere in the file; you need a buffering layer in between that takes care of alignment. NativeUnixDirectory does this.

Uwe
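A sketch of such a buffering layer for writes, under stated assumptions: it collects arbitrary-sized writes in a block-aligned direct buffer, hands the channel only whole blocks at block-aligned positions, and on close pads the final partial block with zeros and then truncates the file back to its logical length. `AlignedBlockWriter` is hypothetical and far simpler than NativeUnixDirectory; it accepts any `FileChannel` (so it can be exercised without O_DIRECT), requires a power-of-two block size, and assumes the platform allows truncating a direct-IO file to an unaligned length.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical buffering layer: callers write any number of bytes; the channel only ever
// sees whole, block-aligned blocks. Takes ownership of the channel and closes it.
public class AlignedBlockWriter implements AutoCloseable {
    private final FileChannel channel;
    private final ByteBuffer buffer;  // block-aligned in memory, capacity = blockSize
    private final int blockSize;
    private long filePos = 0;         // next block-aligned file offset to write at
    private long logicalLength = 0;   // bytes the caller actually wrote

    AlignedBlockWriter(FileChannel channel, int blockSize) {
        if (Integer.bitCount(blockSize) != 1) {
            throw new IllegalArgumentException("block size must be a power of two");
        }
        this.channel = channel;
        this.blockSize = blockSize;
        ByteBuffer aligned = ByteBuffer.allocateDirect(2 * blockSize).alignedSlice(blockSize);
        aligned.limit(blockSize);
        this.buffer = aligned.slice(); // exactly one block, still aligned
    }

    void writeBytes(byte[] src, int off, int len) throws IOException {
        while (len > 0) {
            int n = Math.min(len, buffer.remaining());
            buffer.put(src, off, n);
            off += n;
            len -= n;
            logicalLength += n;
            if (!buffer.hasRemaining()) {
                flushBlock(); // buffer holds exactly one full block
            }
        }
    }

    private void flushBlock() throws IOException {
        buffer.flip();
        int written = channel.write(buffer, filePos);
        if (written != blockSize) {
            // A short write would leave the next position unaligned, so treat it as an error.
            throw new IOException("short write: " + written + " of " + blockSize + " bytes");
        }
        filePos += blockSize;
        buffer.clear();
    }

    @Override
    public void close() throws IOException {
        try {
            if (buffer.position() > 0) {
                // Pad the final partial block with zeros so the write is still a whole block.
                while (buffer.hasRemaining()) {
                    buffer.put((byte) 0);
                }
                flushBlock();
                // Cut the padding back off so the file has its logical length.
                channel.truncate(logicalLength);
            }
        } finally {
            channel.close();
        }
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("aligned", ".bin");
        // Plain channel for portability; adding ExtendedOpenOption.DIRECT would try O_DIRECT.
        FileChannel fc = FileChannel.open(file, StandardOpenOption.WRITE);
        try (AlignedBlockWriter w = new AlignedBlockWriter(fc, 4096)) {
            byte[] chunk = new byte[1000];
            for (int i = 0; i < 10; i++) {
                w.writeBytes(chunk, 0, chunk.length);
            }
        }
        // Three 4096-byte blocks (12288 bytes) were written, then truncated back to 10000.
        System.out.println(Files.size(file)); // 10000
        Files.delete(file);
    }
}
```

The pad-then-truncate step on close is one way to handle the final partial block; the reading side needs the mirror-image logic of rounding the requested range out to whole blocks.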

Re: Direct I/O

Michael McCandless-2
Ahh yes, sorry, you are right, Dawid and Uwe.