detected corrupted index / performance improvement

Re: detected corrupted index / performance improvement

Mike Klaas
Oh, it certainly causes some random access--I don't deny that.  I
just want to emphasize that this isn't at all the same as truly random
writes, which would be expected to perform an order of magnitude slower.

Just did a test where I wrote out a 1 GB file in 1K chunks.  Then
wrote it out as two files, alternating 512-byte chunks, and then as
four files with 256-byte chunks.  Some speed is lost--perhaps 10% at
each doubling--but the speed is still essentially "sequential" speed.
You can get back the original performance by using consistently sized
chunks (1K to each file, round-robin).
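
For reference, a rough sketch of that kind of test in plain java.io
(the file names, total size, and chunk sizes below are placeholders,
not the exact harness used for the numbers above):

import java.io.FileOutputStream;
import java.io.IOException;

public class InterleavedWriteTest {
  // Write ~1 GB total: pass numFiles=1 for one file in 1K chunks,
  // numFiles=2 for two files in 512-byte chunks, numFiles=4 for four
  // files in 256-byte chunks, and compare the elapsed times.
  public static void main(String[] args) throws IOException {
    final long totalBytes = 1L << 30;                 // ~1 GB
    final int numFiles = Integer.parseInt(args[0]);   // 1, 2, 4, ...
    final int chunkSize = 1024 / numFiles;            // 1K, 512, 256 ...
    final byte[] chunk = new byte[chunkSize];

    FileOutputStream[] outs = new FileOutputStream[numFiles];
    for (int i = 0; i < numFiles; i++)
      outs[i] = new FileOutputStream("test-" + i + ".dat");

    long start = System.currentTimeMillis();
    long written = 0;
    int next = 0;
    while (written < totalBytes) {
      outs[next].write(chunk);            // append to one file...
      written += chunkSize;
      next = (next + 1) % numFiles;       // ...then round-robin to the next
    }
    for (int i = 0; i < numFiles; i++)
      outs[i].close();

    double secs = (System.currentTimeMillis() - start) / 1000.0;
    System.out.println(numFiles + " file(s): "
        + (totalBytes / (1024.0 * 1024.0)) / secs + " MB/s");
  }
}

(There's no fsync in the loop--the point is just to see how well the
OS and controller coalesce the interleaved appends.)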

HDD controllers are actually quite good at batching writes back into
sequential order.  Why else do you think sync() takes so long :)

-Mike

On 7-Feb-08, at 3:35 PM, robert engels wrote:

> I don't think that is true - though I'm probably wrong :).
>
> My understanding is that several files are written in parallel
> (during the merge), causing random access. After the files are
> written, they are all reread and written as a CFS file
> (essentially sequential - although the reads and writes are going
> to cause head movement).
>
> The code:
>
> private IndexOutput tvx, tvf, tvd;              // To write term vectors
> private FieldsWriter fieldsWriter;
>
> is my clue that several files are written at once.
>
> On Feb 7, 2008, at 5:19 PM, Mike Klaas wrote:
>
>>
>> On 7-Feb-08, at 2:00 PM, robert engels wrote:
>>
>>> My point is that commit needs to be used in most applications,  
>>> and the commit in Lucene is very slow.
>>>
>>> You don't have 2x the IO cost, mainly because only the log file
>>> needs to be sync'd.  The index only has to be sync'd eventually,
>>> in order to prune the log file - this can be done in the
>>> background, improving the performance of the update and commit cycle.
>>>
>>> Also, writing the log file is very efficient because it is an
>>> append-only/sequential operation. Writing out a segment means
>>> writing multiple files - essentially causing random-access writes.
>>
>> For large segments, multiple sequentially-written large files  
>> should perform similarly to one large sequentially-written file.  
>> It is only close to random access on the smallest segments (which  
>> a sufficiently-large flush-by-ram shouldn't produce).
>>
>> -Mike
>>
>>
>


Re: detected corrupted index / performance improvement

Michael McCandless

Mike, you're right: all Lucene files are written sequentially
(whether flushing or merging).

It's just a matter of how many files are open at once, and whether we
are also reading from source file(s); that affects IO throughput far
less than truly random-access writes would.

Plus, as of LUCENE-843, bytes are written to tvx/tvd/tvf and fdx/fdt
"as we go", which is better because we get the bytes to the OS earlier
so it can properly schedule their arrival to stable storage.  So by
the time we flush a segment, the OS should have committed most of
those bytes.

When writing a segment, we write fnm, then open tii/tis/frq/prx at
once and write (sequentially) to them, then write to nrm.

Merging is far more IO intensive.  With mergeFactor=10, we read from
40 input streams and write to 4 output streams when merging the
tii/tis/frq/prx files.
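
(That's the 4 tii/tis/frq/prx files for each of the 10 segments being
merged, plus the 4 corresponding output files.)  To make the "many
streams, but each one sequential" point concrete, here's a minimal
sketch against the Directory/IndexInput/IndexOutput API -- the file
names, chunk size and round-robin loop are only illustrative, not the
actual merge code:

import java.io.IOException;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.store.IndexInput;
import org.apache.lucene.store.IndexOutput;

public class SequentialStreamsSketch {
  public static void main(String[] args) throws IOException {
    Directory dir = FSDirectory.getDirectory("/tmp/sketch");  // placeholder path
    String[] src = { "a.frq", "a.prx" };    // hypothetical input names
    String[] dst = { "b.frq", "b.prx" };    // hypothetical output names

    IndexInput[] in = new IndexInput[src.length];
    IndexOutput[] out = new IndexOutput[dst.length];
    for (int i = 0; i < src.length; i++) {
      in[i] = dir.openInput(src[i]);
      out[i] = dir.createOutput(dst[i]);
    }

    // Round-robin over the open streams, copying a chunk at a time.
    // Each stream only reads forward or appends, so the per-file access
    // pattern stays sequential even though the head alternates between files.
    byte[] buf = new byte[4096];
    boolean done = false;
    while (!done) {
      done = true;
      for (int i = 0; i < in.length; i++) {
        long remaining = in[i].length() - in[i].getFilePointer();
        if (remaining > 0) {
          int n = (int) Math.min(buf.length, remaining);
          in[i].readBytes(buf, 0, n);     // forward-only read
          out[i].writeBytes(buf, n);      // append-only write
          done = false;
        }
      }
    }

    for (int i = 0; i < in.length; i++) {
      in[i].close();
      out[i].close();
    }
  }
}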

Mike

Mike Klaas wrote:

> Oh, it certainly causes some random access--I don't deny that.  I
> just want to emphasize that this isn't at all the same as truly
> random writes, which would be expected to perform an order of
> magnitude slower.
>
> Just did a test where I wrote out a 1 GB file in 1K chunks.  Then
> wrote it out as two files, alternating 512-byte chunks, and then as
> four files with 256-byte chunks.  Some speed is lost--perhaps 10% at
> each doubling--but the speed is still essentially "sequential" speed.
> You can get back the original performance by using consistently
> sized chunks (1K to each file, round-robin).
>
> HDD controllers are actually quite good at batching writes back into
> sequential order.  Why else do you think sync() takes so long :)
>
> -Mike
>
> On 7-Feb-08, at 3:35 PM, robert engels wrote:
>
>> I don't think that is true - though I'm probably wrong :).
>>
>> My understanding is that several files are written in parallel
>> (during the merge), causing random access. After the files are
>> written, they are all reread and written as a CFS file
>> (essentially sequential - although the reads and writes are going
>> to cause head movement).
>>
>> The code:
>>
>> private IndexOutput tvx, tvf, tvd;              // To write term vectors
>> private FieldsWriter fieldsWriter;
>>
>> is my clue that several files are written at once.
>>
>> On Feb 7, 2008, at 5:19 PM, Mike Klaas wrote:
>>
>>>
>>> On 7-Feb-08, at 2:00 PM, robert engels wrote:
>>>
>>>> My point is that commit needs to be used in most applications,  
>>>> and the commit in Lucene is very slow.
>>>>
>>>> You don't have 2x the IO cost, mainly because only the log file
>>>> needs to be sync'd.  The index only has to be sync'd eventually,
>>>> in order to prune the log file - this can be done in the
>>>> background, improving the performance of the update and commit cycle.
>>>>
>>>> Also, writing the log file is very efficient because it is an
>>>> append-only/sequential operation. Writing out a segment means
>>>> writing multiple files - essentially causing random-access writes.
>>>
>>> For large segments, multiple sequentially-written large files  
>>> should perform similarly to one large sequentially-written file.  
>>> It is only close to random access on the smallest segments (which  
>>> a sufficiently-large flush-by-ram shouldn't produce).
>>>
>>> -Mike
>>>
>>>
>>
>
>


Re: detected corrupted index / performance improvement

Doug Cutting
Michael McCandless wrote:
> Merging is far more IO intensive.  With mergeFactor=10, we read from
> 40 input streams and write to 4 output streams when merging the
> tii/tis/frq/prx files.

If your disk can transfer at 50MB/s, and takes 5ms/seek, then 250kB
reads and writes are the break-even point, where half the time is spent
seeking and half transferring, and throughput is 25MB/s.  With 44 files
open, that means the OS needs just 11MB of buffering to keep things
above this threshold.  Since most systems have considerably larger
buffer pools than 11MB, merging with mergeFactor=10 shouldn't be seek-bound.
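
Spelling that arithmetic out as a quick sketch (just plugging in the
assumed 50MB/s transfer rate, 5ms seek, and 40 inputs + 4 outputs from
above):

public class MergeBreakEven {
  public static void main(String[] args) {
    double transferBytesPerSec = 50e6;   // assumed sequential transfer rate
    double seekSec = 0.005;              // assumed average seek time
    int openFiles = 40 + 4;              // merge inputs + outputs

    // Break-even chunk size: transfer time per chunk equals seek time,
    // so half the time is spent seeking and throughput is half of peak.
    double breakEvenBytes = transferBytesPerSec * seekSec;   // 250 kB
    double bufferBytes = openFiles * breakEvenBytes;         // ~11 MB

    System.out.println("break-even chunk: " + breakEvenBytes / 1e3 + " kB");
    System.out.println("OS buffering needed: " + bufferBytes / 1e6 + " MB");
    System.out.println("throughput at break-even: "
        + transferBytesPerSec / 2 / 1e6 + " MB/s");
  }
}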

Doug


Re: detected corrupted index / performance improvement

Robert Engels
But wouldn't that mean we should be using at least 250k buffers for
the IndexInput, not the 16k or so that is the default?

Is the OS smart enough to figure out that the file is being read
sequentially, and to adjust its physical read size to 256k based on
the other concurrent IO operations?  It seems that would be hard for
it to figure out without performing poorly in the general case.

On Feb 8, 2008, at 11:25 AM, Doug Cutting wrote:

> Michael McCandless wrote:
>> Merging is far more IO intensive.  With mergeFactor=10, we read from
>> 40 input streams and write to 4 output streams when merging the
>> tii/tis/frq/prx files.
>
> If your disk can transfer at 50MB/s, and takes 5ms/seek, then 250kB  
> reads and writes are the break-even point, where half the time is  
> spent seeking and half transferring, and throughput is 25MB/s.  
> With 44 files open, that means the OS needs just 11MB of buffering  
> to keep things above this threshold.  Since most systems have  
> considerably larger buffer pools than 11MB, merging with  
> mergeFactor=10 shouldn't be seek-bound.
>
> Doug
>
>


Re: detected corrupted index / performance improvement

Doug Cutting
robert engels wrote:
> But wouldn't that mean we should be using at least 250k buffers for
> the IndexInput, not the 16k or so that is the default?
>
> Is the OS smart enough to figure out that the file is being read
> sequentially, and to adjust its physical read size to 256k based on
> the other concurrent IO operations?  It seems that would be hard for
> it to figure out without performing poorly in the general case.

Benchmarks have shown that OSes do a decent job at this.  You can
increase the application's buffer sizes, but you might just end up
wasting memory if the OS is already doing the right thing.  The Linux
kernel dynamically increases the readahead window based on the access
pattern: the more you read sequentially, the larger the readahead window.

Doug

---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

Reply | Threaded
Open this post in threaded view
|

Re: detected corrupted index / performance improvement

Doug Cutting
Doug Cutting wrote:
> The Linux
> kernel dynamically increases the readahead window based on the access
> pattern: the more you read sequentially, the larger the readahead window.

Sorry, it appears that's in 2.6.23, which isn't yet broadly used.

http://kernelnewbies.org/Linux_2_6_23#head-102af265937262a7a21766ae58fddc1a29a5d8d7

In the meantime, on Linux, one can set both the kernel's readahead
buffer size and the device's.  These are additive: the first determines
what requests will be made to the device, the second determines how much
beyond that the device will attempt to read.

# set kernel read-ahead buffer to 1MB
echo 1024 > /sys/block/sda/queue/read_ahead_kb

# set device read-ahead buffer to 1024 sectors
hdparm -a1024 /dev/sda1

I don't know how much these actually help things...

Doug
