Optimizing indexes with multiple processors?

20 messages

Optimizing indexes with multiple processors?

Kevin Burton
Is it possible to get Lucene to do an index optimize on multiple
processors?

It's a single-threaded algorithm currently, right?

It's a shame, since I have a quad machine but I'm only using a quarter of its
capacity.  That's a heck of a performance hit.

Kevin

--


Use Rojo (RSS/Atom aggregator)! - visit http://rojo.com.
See irc.freenode.net #rojo if you want to chat.

Rojo is Hiring! - http://www.rojonetworks.com/JobsAtRojo.html

   Kevin A. Burton, Location - San Francisco, CA
      AIM/YIM - sfburtonator,  Web - http://peerfear.org/
GPG fingerprint: 5FB2 F3E2 760E 70A8 6174 D393 E84D 8D04 99F1 4412


---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]


Re: Optimizing indexes with multiple processors?

Bill Au
Optimize is disk I/O bound.  So I am not sure what multiple CPUs will buy you.

Bill


Re: Optimizing indexes with multiple processors?

Chris Collins
In reply to this post by Kevin Burton
You can segment your index into n physical parts (perhaps 4), then index
those n parts concurrently.  When you query, you use some kind of
multi-searcher to span the parts.  The one thing you may care about: if you
are going to do a recrawl / update of documents against the existing index,
you will need a reproducible way of hashing your documents over the
partitions (assuming you are deleting the previous version of each document).

Hope that helps

Chris
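The partitioning scheme above depends on a stable document-to-partition hash, so that a recrawl can find and delete the old copy in the same sub-index it was written to. A minimal sketch in plain Java (`IndexPartitioner` is a hypothetical helper, not part of Lucene):

```java
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;

/**
 * Route each document to one of n sub-indexes by a stable hash of its
 * primary key, so a later update can delete the old version from the
 * same partition it originally landed in.
 */
public class IndexPartitioner {
    private final int partitions;

    public IndexPartitioner(int partitions) {
        this.partitions = partitions;
    }

    /** Stable across JVM runs (unlike Object.hashCode identity hashes). */
    public int partitionFor(String docKey) {
        CRC32 crc = new CRC32();
        crc.update(docKey.getBytes(StandardCharsets.UTF_8));
        // CRC32 values are non-negative longs, so the modulus is safe.
        return (int) (crc.getValue() % partitions);
    }
}
```

Each indexer thread then owns one partition, and a multi-searcher spans all of them at query time.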


Re: Optimizing indexes with multiple processors?

Chris Collins
In reply to this post by Bill Au
I found with a fast RAID controller that I can easily be CPU-bound; some of
the I/O cost is latency.  You can hide the latency by having overlapping I/O
(you get that with multiple indexers going at the same time).

I think there could be more horsepower to get out of the inverter and merge
aspects of indexing.  I am currently profiling this with JProbe.

If you're using high-latency disks (such as a filer) during merge, you may
want to consider increasing the size of the buffers to reduce the number of
RPCs to the filer... however, my previous attempts to change this failed.

C
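The overlapping-I/O idea above — several indexer threads, each feeding its own sub-index, so one thread computes while another waits on disk — can be sketched with plain JDK concurrency. The `Indexer` interface is a hypothetical stand-in for whatever wraps an IndexWriter per partition:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.*;

/** Drive several per-partition indexers concurrently to hide disk latency. */
public class ConcurrentIndexing {
    public interface Indexer {
        void addDocument(String doc) throws Exception;
    }

    public static void indexAll(List<? extends Indexer> partitions,
                                List<String> docs) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(partitions.size());
        try {
            List<Future<?>> pending = new ArrayList<>();
            for (int i = 0; i < docs.size(); i++) {
                // Round-robin for the sketch; a real system would use a
                // stable document hash so deletes find the right partition.
                Indexer target = partitions.get(i % partitions.size());
                String doc = docs.get(i);
                pending.add(pool.submit(() -> {
                    target.addDocument(doc);
                    return null;
                }));
            }
            for (Future<?> f : pending) f.get();   // surface any failure
        } finally {
            pool.shutdown();
        }
    }
}
```

Note each partition must be written by only one thread at a time in real use; the fixed pool sized to the partition count is the simplest way to arrange that.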


Re: Optimizing indexes with multiple processors?

Chris Collins
To follow up.  I was surprised to find, from an experiment indexing 4k
documents to local disk (Dell PE with onboard RAID and a 256 MB cache), the
following data in my profile:

70% of the time was spent inverting the document
30% in merge

OK, that part isn't surprising.  However, only about 1% of the merge's 30%
was spent in the OS.flush call (not very I/O-bound at all with this
controller), and almost all of the inversion time was in the
StandardAnalyzer, pegged in the javacc-generated code.  The profile was
based on duration, not CPU.  The profiler was JProbe.  I was using a
lower-case analyzer, and this was a slightly hacked lucene-1.4.3 source line
in which I swapped out some of the synchronized data structures
(Hashtable -> HashMap, Vector -> ArrayList).

<<ChRiS>>
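For illustration, the swap Chris describes — replacing the per-call locking of Hashtable and Vector with their unsynchronized counterparts — looks like this. The field names are illustrative, not actual Lucene internals:

```java
import java.util.*;

/**
 * Hashtable and Vector synchronize every call, which is wasted work when
 * a structure is only ever touched by a single indexing thread. HashMap
 * and ArrayList expose the same operations without per-call locking.
 */
public class UnsyncSwap {
    // Before: every get/put takes a monitor lock.
    static Map<String, Integer> termFreqsSync = new Hashtable<>();
    static List<String> postingsSync = new Vector<>();

    // After: identical usage, no synchronization overhead.
    static Map<String, Integer> termFreqs = new HashMap<>();
    static List<String> postings = new ArrayList<>();

    /** Increment a term's frequency count, returning the new count. */
    public static int bumpTerm(Map<String, Integer> freqs, String term) {
        return freqs.merge(term, 1, Integer::sum);
    }
}
```

The swap is only safe where single-threaded access is guaranteed, which is presumably why the Lucene 1.4.3 originals were conservative.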


Re: Optimizing indexes with multiple processors?

Kevin Burton
In reply to this post by Bill Au
Bill Au wrote:

>Optimize is disk I/O bound.  So I am not sure what multiple CPUs will buy you.

Now on my system with large indexes... I often have the CPU at 100%...

Kevin


Re: Optimizing indexes with multiple processors?

Kevin Burton
In reply to this post by Chris Collins
Chris Collins wrote:

>To follow up.  I was surprised to find that from the experiment of indexing 4k
>documents to local disk (Dell PE with onboard RAID with 256MB cache). I got the
>following data from my profile:
>
>70 % time was spent in inverting the document
>30 % in merge
>
>  
>
Oh, yeah... that's indexing.  I'm more interested in merging multiple
indexes...

Kevin


Re: Optimizing indexes with multiple processors?

Chris Collins
In reply to this post by Kevin Burton
Yes, that would line up with being pretty much CPU-bound.  So if you had
two Xeons with HT, you'd have almost 4 resources (threads) of execution to
take advantage of.

In my current tests, where multiple threads produce work for an index and
one index writer thread does addDocument, I am seeing that I am CPU-bound
on the indexer.  Since I am on a dual Xeon with HT, if I were using 4
indices I could improve my throughput by more than 1x but less than 4x.

C


Reply | Threaded
Open this post in threaded view
|

Re: Optimizing indexes with multiple processors?

Chris Collins
In reply to this post by Kevin Burton
Well, I am currently looking at merging too.  In my application, merging
will occur against a filer (read: a higher-latency device).  I am currently
working on how to stage indices on local disk before moving them to a
filer.  Assume I must move to a filer eventually, for whatever crazy reason
I need to... don't ask, it ain't funny :-}

In that case I have a different performance issue: FSInputStream and
FSOutputStream inherit the buffer size of 1 KB from OutputStream and
InputStream.  It would be useful to increase this to reduce the number of
RPCs to the filer when doing merges, assuming that reads and writes are
sequential (CIFS supports a 64 KB block and NFS supports up to, I think,
32 KB).  I haven't spent much time on this so far, so it's not like I know
it's hard to do.  From preliminary experiments it's obvious that changing
the OutputStream buffer size alone is not the thing to do.

If anyone has successfully increased the FSOutputStream and FSInputStream
buffers and got it not to blow up on array copies, I would love to know
the shortcut.

Chris


Re: Optimizing indexes with multiple processors?

Chris Collins
In reply to this post by Kevin Burton
Kevin, I would be curious to know more about your merging issues.  As I
mentioned, I am concerned about merge time, and in my case it's against a
filer, which of course has high latency.  The other issue is that I
effectively index things with a primary key.  I need an efficient way of
preventing old records from trampling on new records; this occurs due to a
potentially out-of-order set of writes to the index from multiple nodes in
a processing farm.

C
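One way to guard against the out-of-order writes described above (a hypothetical sketch, not a Lucene facility): have each node stamp every record with a monotonically increasing version, such as the crawl timestamp, and only apply a write whose version is newer than the last one seen for that primary key:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Last-write-wins guard keyed by a document's primary key. */
public class LatestWins {
    private final Map<String, Long> latest = new ConcurrentHashMap<>();

    /** Returns true if this (key, version) should be written to the index. */
    public boolean shouldApply(String key, long version) {
        // merge keeps the maximum atomically, so concurrent nodes can't race.
        long winner = latest.merge(key, version, Math::max);
        return winner == version;
    }
}
```

In a multi-node farm this map would itself need to be shared or partitioned alongside the index; the sketch only shows the per-key comparison.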


Re: Optimizing indexes with multiple processors?

John Haxby-2
In reply to this post by Chris Collins
Chris Collins wrote:

>Ok that part isnt surprising.  However only about 1% of 30% of the merge was
>spent in the OS.flush call (not very IO bound at all with this controller).
>  
>
On Linux, at least, measuring the time taken in OS.flush is not a good way
to determine whether you're I/O-bound -- all that does is transfer the data
to the kernel.  Later, possibly much later, the kernel will actually write
the data to disk.

The upshot is that if the size of the index is around the size of physical
memory in the system, optimizing will appear CPU-bound.  Once the index
exceeds physical memory, you'll see the effects of I/O.  OS.flush will
still probably be very quick, but you'll see a lot of I/O wait if you run,
say, top.

jch


Re: Optimizing indexes with multiple processors?

Bill Au
In reply to this post by Kevin Burton
That's not true in my case.  The CPU never went over 50%.  I/O wait is
often greater than CPU usage and can be as high as 90%.

Bill


Re: Optimizing indexes with multiple processors?

Chris Collins
In reply to this post by John Haxby-2
Hi John, your comments are correct.  But given that we know our box has
almost 80 MB/s of sustainable bandwidth and very low latency to disk, and
observing that the I/O we are doing in Lucene is small in comparison, I am
reasonably confident that this time figure is not far out (for this run).

As I may have mentioned before, in my case we have reasonably fast hardware
RAID.  At that point the bottlenecks of course change.  We also have the
case where we write to a filer, which has good bandwidth but high latency.
There we see that merge is I/O-bound, as you would expect.  That's why I
assume changing the buffer sizes of the FS streams could help, assuming the
merge operations read and write the segments in a linear fashion; in this
case the latency is not really a function of the disks, but of the latency
in the RPC between the client (indexer) and the filer.  By increasing the
buffer sizes we would reduce the number of RPCs.

From an I/O-bound point of view, one needs to consider whether you have
saturated the device or are just stuck waiting for the disk to rotate
around.  Long gone is the elevator algorithm as the preferred disk
optimization at Seagate :-}; disks can take many commands and re-order them
to minimize latency.  If it's a latency issue and not necessarily a
bandwidth one, then overlapping I/O can improve throughput (splitting the
index and having multiple writer threads would give you that).  In fact, in
my silly filer example, having multiple writers does show a good effect.
Of course, this depends on whether you can finagle your application to
allow you to split the indices.

Further, I have done longer runs to plot throughput over time (16M-doc
crawls); I only profiled 4k docs since I didn't want to wait forever with
JProbe.  Not sure what the correct jargon is here, so excuse my
description: the in-memory objects were merged out to disk, but we didn't
get the second-order effect of the maybeMerge function finding enough
segments at that level to trigger the merging of multiple segments for the
next tier (segments * mergeFactor).  Indexer throughput is of course not
constant; over time, the time to index one document increases when you take
into account the cost of the merges.  But due to the pyramid effect of how
the merger works, the larger-order merges happen less and less often.

Back to my observations.  On the CPU side of indexing, the inversion itself
is dwarfed by the standard tokenizer.  My hat off to Doug (what is hogging
the CPU is auto-generated code :-}).  Given multiple cores / HT / SMP, you
certainly can capitalize on them if you are willing to write the code.  Not
all I/O-bound problems are created equal: if it is merely latency, you
still have room to improve throughput if you massage your indexing
approach.  Using a single indexing thread and seeing you're I/O-bound
should not be a reason to give up :-}

As you can tell, I have two indexing worlds, one where my disk is fast
(CPU-bound) and one where it is slow (I/O-bound).  I have to capitalize on
both to get my job done, and each has its own distinctive challenges.

Regards

Chris

Re: Optimizing indexes with multiple processors?

Peter A. Friend
In reply to this post by Chris Collins

On Jun 9, 2005, at 11:52 PM, Chris Collins wrote:

> In that case I have a different performance issue: FSInputStream and
> FSOutputStream inherit the buffer size of 1 KB from OutputStream and
> InputStream.  It would be useful to increase this to reduce the number
> of RPCs to the filer when doing merges, assuming that reads and writes
> are sequential (CIFS supports a 64 KB block and NFS supports up to, I
> think, 32 KB).  I haven't spent much time on this so far, so it's not
> like I know it's hard to do.  From preliminary experiments it's obvious
> that changing the OutputStream buffer size alone is not the thing to do.
>
> If anyone has successfully increased the FSOutputStream and
> FSInputStream buffers and got it not to blow up on array copies, I
> would love to know the shortcut.

I just started up with Lucene, and I have been looking at the NFS issues.
Since the OS doesn't report the block size in use by the NetApp, EMC, or
whatever, you need to tweak it manually.  I found this in
src/java/org/apache/lucene/store/OutputStream.java:

/** Abstract class for output to a file in a Directory.  A random-access
 * output stream.  Used for all Lucene index output operations.
 * @see Directory
 * @see InputStream
 */
public abstract class OutputStream {
   static final int BUFFER_SIZE = 1024;

I changed that value to 8k, and based on the truss output from an index
run, it is working.  I haven't gotten much beyond that to see if it causes
problems elsewhere.  The value also needs to be altered on the read end of
things.  Ideally, this would be made settable via a system property.

Peter
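A sketch of the property-driven buffer size Peter suggests, plus a counting stream to show how a larger buffer cuts the number of writes that would become filer RPCs. The property name is hypothetical (Lucene 1.4 hard-codes BUFFER_SIZE), and BufferedOutputStream stands in for the Lucene stream:

```java
import java.io.*;

/** Demonstrates that a bigger buffer means fewer writes hit the "wire". */
public class BufferSizeDemo {
    /** Counts how many write calls reach the underlying device. */
    static class CountingStream extends OutputStream {
        int writes = 0;
        @Override public void write(int b) { writes++; }
        @Override public void write(byte[] b, int off, int len) { writes++; }
    }

    /** Number of underlying writes needed to push `total` bytes. */
    public static int writesFor(int defaultBufferSize, int total) throws IOException {
        CountingStream wire = new CountingStream();
        // Hypothetical property; falls back to the given default if unset.
        int size = Integer.getInteger("store.bufferSize", defaultBufferSize);
        try (BufferedOutputStream out = new BufferedOutputStream(wire, size)) {
            for (int i = 0; i < total; i++) out.write(i & 0xff);
        }
        return wire.writes;
    }
}
```

With a 1 KB buffer, 8 KB of output takes eight underlying writes; with an 8 KB buffer it takes one. Against a filer, each of those underlying writes is roughly one RPC, which is the saving Chris is after.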



Re: Optimizing indexes with multiple processors?

Chris Collins
How many documents did you try to index?  I am using a relatively large
minMergeDocs, which causes me to run out of memory when I make such a
change (I am using 1/2 GB of heap, btw).  I believe changing it in the
OutputStream object means that a lot of in-memory-only objects use that
size too... I assume the real bang for the buck is in FSOutputStream and
FSInputStream.  Unravelling that case drops me into the array-copy issues
that I have to debug.

I don't know that I would have used truss in this regard; it only points
out what size hit the kernel, not what went over the wire.  I would suggest
using Ethereal to confirm how it's ending up on the wire.  What goes over
the wire is something the CIFS / NFS client negotiates with the server.  I
believe NetApp, for instance, supports up to 32k with NFS and almost 64k
with CIFS.


Regards

<<ChRiS>>


Re: Optimizing indexes with multiple processors?

Peter A. Friend

On Jun 10, 2005, at 9:33 AM, Chris Collins wrote:

> How many documents did you try to index?

Only about 4000 at the moment.

>  I am using a relatively large
> minMergeDocs that causes me to run out of memory when I make such a
> change (I am using 1/2 GB of heap, btw).

I was running out of memory as well until I gave Java a larger heap to work
with.  I am assuming that a dedicated indexing machine (as well as a search
machine) is going to need a mountain of memory.  I figure I will be giving
Java gigs to play with.

> I believe changing it in the outputstream object
> means that a lot of in memory only objects use that size too.

This I need to look into.  At a guess, I would think there would be an
OutputStream object for each open segment and for each file within that
segment.  A consolidated index *might* use less, but of course we are
trying to improve performance here, and the consolidated index does incur a
cost.  Assuming 10 segments and 10 files within each segment, that's 100
OutputStream objects, or about 800 KB with 8 KB buffers.  That will grow
quickly with merge tweaks.  Those larger writes do save a bunch of system
calls and make (maybe) better use of your filer's block size.  Of course
this could be utterly incorrect; I need to look into it a bit more
carefully.

> I don't know that I would have used truss in this regard; it only points
> out what size hit the kernel, not what went over the wire.  I would
> suggest using Ethereal to confirm how it's ending up on the wire.

True, hadn't gotten that far yet. :-)

Peter




Re: Optimizing indexes with multiple processors?

Kevin Burton
In reply to this post by Chris Collins
Chris Collins wrote:

>Well I am currently looking at merging too.  In my application merging will
>occur against a filer (read as higher latency device).  I am currently working
>on how to stage indices on local disk before moving to a filer.  Assume I must
>move to a filer eventually for whatever crazzy reason I need to....dont ask it
>aint funny :-}
>
>In that case I have a different performance issue, that is that FSInputStream
>and FSOutputStream inherit the buffer size of 1k from OS and IS  This would be
>useful to increase to reduce the amount of RPC's to the filer when doing merges
>..... assuming that reads and writes are sequential (CIFS supports a 64k block
>and NFS supports upto I think 32k).
>
Yeah... I already did this, actually... on local disks the performance
benefit wasn't noticeable.  The variables are private/final... I made them
public and non-final, and it worked.

Note that OutputStream has a bug when I set it higher... I don't have the
trace, I'm afraid...

> I haven't spent much time on this so far
>so its not like I know its hard todo.  From preliminary experiments its obvious
>that changing the OS buffersize is not the thing todo.
>
>If anyone has successfully increased the FSOutputStream and FSInputStream
>buffers and got it not to blow up on array copies I would love to know the
>short cut
>
Maybe that was my problem...

Kevin

--


Use Rojo (RSS/Atom aggregator)! - visit http://rojo.com.
See irc.freenode.net #rojo if you want to chat.

Rojo is Hiring! - http://www.rojonetworks.com/JobsAtRojo.html

   Kevin A. Burton, Location - San Francisco, CA
      AIM/YIM - sfburtonator,  Web - http://peerfear.org/
GPG fingerprint: 5FB2 F3E2 760E 70A8 6174 D393 E84D 8D04 99F1 4412




Re: Optimizing indexes with multiple processors?

Kevin Burton
In reply to this post by Peter A. Friend
Peter A. Friend wrote:

>
> I changed that value to 8k, and based on the truss output from an  
> index run, it is working. Haven't gotten much beyond that to see if  
> it causes problems elsewhere. The value also needs to be altered on  
> the read end of things. Ideally, this will be made settable via a  
> system property.

Has anyone tried to tweak this on a RAID array on XFS?  It's confusing to figure out the ideal read size.

My performance benchmarks didn't show any benefit from setting this variable higher, but I'm worried this is due to caching.

I tried to flush the caches by creating a 5G file and then cat'ing it to /dev/null, but I have no way to verify that this actually works.

I just made the BUFFER_SIZE variables non-final so that I can set them at any time.
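[Editor's note: a sketch of the runtime-override route. `StreamConfig` and
`BufferTuner` are hypothetical stand-ins, not Lucene classes; the point is
that once the field is non-final, as in the patch above, reflection can set
it without it even being public.]

```java
import java.lang.reflect.Field;

// Hypothetical stand-in for a class holding a private buffer-size constant.
class StreamConfig {
    private static int BUFFER_SIZE = 1024;  // non-final, as in the patch

    static int bufferSize() {
        return BUFFER_SIZE;
    }
}

class BufferTuner {
    // Override the private static field at runtime via reflection.
    static void setBufferSize(int size) throws Exception {
        Field f = StreamConfig.class.getDeclaredField("BUFFER_SIZE");
        f.setAccessible(true);  // bypass the private modifier
        f.setInt(null, size);   // null receiver because the field is static
    }
}
```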

Kevin





Re: Optimizing indexes with multiple processors?

Chris Collins
In reply to this post by Peter A. Friend
Don't forget that when a document is indexed it starts life in its own segment.
If you have a minMergeDocs of 4k, you could have an awful lot of one-doc
segments on the segment stack... that's why I run out of memory.  If each of
these at some point has a buffer of 8k, or say 64k, you blow up pretty quickly.
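[Editor's note: a back-of-envelope sketch of that blow-up. The segment and
per-segment file counts below are illustrative assumptions, not measured
Lucene values.]

```java
// Rough model: every segment on the merge stack keeps one stream (and
// therefore one buffer) per open file.
class BufferOverhead {
    static long bytes(long segmentsOnStack, long filesPerSegment, long bufferSize) {
        return segmentsOnStack * filesPerSegment * bufferSize;
    }
}
```

With 4,096 one-doc segments on the stack, roughly 10 files each, and 8k
buffers, that is 4096 * 10 * 8192 bytes, about 320 MB; at 64k buffers it is
about 2.5 GB, which comfortably exhausts a 512 MB heap.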

regards

C

--- "Peter A. Friend" <[hidden email]> wrote:

>
> On Jun 10, 2005, at 9:33 AM, Chris Collins wrote:
>
> > How many documents did you try to index?
>
> Only about 4000 at the moment.
>
> >   I am using a relatively large
> > minMergeDoc that causes me to run out of memory when I make such a  
> > change. (I
> > am using 1/2 gb of heap btw).
>
> I was running out of memory as well until I gave Java a larger heap  
> to work with. I am assuming that a dedicated indexing machine (as  
> well as search) is going to need a mountain of memory. I figure I  
> will be giving Java gigs to play with.
>
> > I believe changing it in the outputstream object
> > means that a lot of in memory only objects use that size too.
>
> This I need to look into. At a guess, I would think that there would
> be an OutputStream object for each open segment, and each file in
> that segment. A consolidated index *might* use less, but of course we
> are trying to improve performance here, and the consolidated index
> does incur a cost. Assuming 10 segments and 10 files within each
> segment, that's 100 OutputStream objects, or 819,200 bytes at 8k
> each; this grows quickly with maxMerge tweaks. Those larger writes
> do save a bunch of system calls and make (maybe) better use of your
> filer's block size. Of course this could be utterly incorrect; I need
> to look into this a bit more carefully.
>
> > I don't know that I would have used truss in this regard; it only
> > points out what size hit the kernel, not what went over the wire.
> > I would suggest using ethereal to confirm that's how it's ending up
> > on the wire.
>
> True, hadn't gotten that far yet. :-)
>
> Peter
>
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [hidden email]
> For additional commands, e-mail: [hidden email]
>
>






Re: Optimizing indexes with multiple processors?

Chris Collins
In reply to this post by Kevin Burton
Yeah, I think the bug is related to an array copy that expects 1k blocks (if I
recall, it was RAMDirectory or something like that).

C

--- Kevin Burton <[hidden email]> wrote:

> Yeah.. I already did this actually ... on local disks the performance
> benefit wasn't noticable.  The variables are  private/final ... I made
> them public and non-final and it worked.
>
> Note that OutputStream has a bug when I set it higher... I don't have
> the trace I'm afraid...
>
> [snip]



