Binding lucene instance/threads to a particular processor(or core)

classic Classic list List threaded Threaded
18 messages Options
Reply | Threaded
Open this post in threaded view
|

Binding lucene instance/threads to a particular processor(or core)

Anshum-2
Hi,

I have been trying to bind my lucene instance (JVM - Sun Hotspot*) to a
particular core so as to improve the performance. Is there a way to do so or
is there support in lucene to explicitly control the thread - processor
linkup?

--
--
The facts expressed here belong to everybody, the opinions to me.
The distinction is yours to draw............
Reply | Threaded
Open this post in threaded view
|

Re: Binding lucene instance/threads to a particular processor(or core)

Ulf Dittmer-2
This sounds odd. Why would restricting it to a single
core improve performance? The point of using multiple
cores (and multiple threads) is to improve performance
isn't it? I'd leave thread scheduling decisions to the
JVM. Plus, I don't think there is anything in Java to
facilitate this (short of using JNI).

Are you talking about indexing or searching? You may
be able to use multiple parallel threads to improve
indexing performance. I don't think Lucene uses
multi-threading for searching; not unless you have
multiple indices, anyway.

Ulf


--- Anshum <[hidden email]> wrote:

> Hi,
>
> I have been trying to bind my lucene instance (JVM -
> Sun Hotspot*) to a
> particular core so as to improve the performance. Is
> there a way to do so or
> is there support in lucene to explicitly control the
> thread - processor
> linkup?
>
> --
> --
> The facts expressed here belong to everybody, the
> opinions to me.
> The distinction is yours to draw............
>



      ____________________________________________________________________________________
Be a better friend, newshound, and
know-it-all with Yahoo! Mobile.  Try it now.  http://mobile.yahoo.com/;_ylt=Ahu06i62sR8HDtDypao8Wcj9tAcJ

---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

Reply | Threaded
Open this post in threaded view
|

Re: Binding lucene instance/threads to a particular processor(or core)

Glen Newton
BInding threads to processors - in many situations - improves
throughput by reducing memory overhead. When a thread is running on a
core, its state is local; if it is timeshared-out and either 1)
swapped back in on the same core, it is likely that there will be  the
core's L1 cache; or 2) onto another core, there will not be a cache
hit and the state will have to be fetched from L2 or main memory,
incurring a performance hit, esp in the latter. See Lundberg, L. 1997.
Evaluating the Performance Implications of Binding Threads to
Processors. 393.http://ieeexplore.ieee.org/iel3/5020/13768/00634520.pdf
for more info.

If you are using JVM on Solaris on SPARC, you should take a look at
the following links for tuning (the Sun JVM on Solaris SPARC has many
more performance tuning parameters available), including threading:
- http://java.sun.com/docs/hotspot/threads/threads.html
- http://java.sun.com/j2se/1.5.0/docs/guide/vm/thread-priorities.html
- http://www-1.ibm.com/support/docview.wss?rs=180&context=SSEQTP&uid=swg21107291
- http://java.sun.com/javase/technologies/performance.jsp

-Glen






On 21/04/2008, Ulf Dittmer <[hidden email]> wrote:

> This sounds odd. Why would restricting it to a single
>  core improve performance? The point of using multiple
>  cores (and multiple threads) is to improve performance
>  isn't it? I'd leave thread scheduling decisions to the
>  JVM. Plus, I don't think there is anything in Java to
>  facilitate this (short of using JNI).
>
>  Are you talking about indexing or searching? You may
>  be able to use multiple parallel threads to improve
>  indexing performance. I don't think Lucene uses
>  multi-threading for searching; not unless you have
>  multiple indices, anyway.
>
>  Ulf
>
>
>
>  --- Anshum <[hidden email]> wrote:
>
>  > Hi,
>  >
>  > I have been trying to bind my lucene instance (JVM -
>  > Sun Hotspot*) to a
>  > particular core so as to improve the performance. Is
>  > there a way to do so or
>  > is there support in lucene to explicitly control the
>  > thread - processor
>  > linkup?
>  >
>  > --
>  > --
>  > The facts expressed here belong to everybody, the
>  > opinions to me.
>  > The distinction is yours to draw............
>  >
>
>
>
>
>       ____________________________________________________________________________________
>  Be a better friend, newshound, and
>  know-it-all with Yahoo! Mobile.  Try it now.  http://mobile.yahoo.com/;_ylt=Ahu06i62sR8HDtDypao8Wcj9tAcJ
>
>  ---------------------------------------------------------------------
>  To unsubscribe, e-mail: [hidden email]
>  For additional commands, e-mail: [hidden email]
>
>


--

-

---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

Reply | Threaded
Open this post in threaded view
|

Re: Binding lucene instance/threads to a particular processor(or core)

Glen Newton
And this discussion on bound threads may also shed light on things:
http://coding.derkeiler.com/Archive/Java/comp.lang.java.programmer/2007-11/msg02801.html

-Glen

On 21/04/2008, Glen Newton <[hidden email]> wrote:

> BInding threads to processors - in many situations - improves
>  throughput by reducing memory overhead. When a thread is running on a
>  core, its state is local; if it is timeshared-out and either 1)
>  swapped back in on the same core, it is likely that there will be  the
>  core's L1 cache; or 2) onto another core, there will not be a cache
>  hit and the state will have to be fetched from L2 or main memory,
>  incurring a performance hit, esp in the latter. See Lundberg, L. 1997.
>  Evaluating the Performance Implications of Binding Threads to
>  Processors. 393.http://ieeexplore.ieee.org/iel3/5020/13768/00634520.pdf
>  for more info.
>
>  If you are using JVM on Solaris on SPARC, you should take a look at
>  the following links for tuning (the Sun JVM on Solaris SPARC has many
>  more performance tuning parameters available), including threading:
>  - http://java.sun.com/docs/hotspot/threads/threads.html
>  - http://java.sun.com/j2se/1.5.0/docs/guide/vm/thread-priorities.html
>  - http://www-1.ibm.com/support/docview.wss?rs=180&context=SSEQTP&uid=swg21107291
>  - http://java.sun.com/javase/technologies/performance.jsp
>
>
>  -Glen
>
>
>
>
>
>
>
>  On 21/04/2008, Ulf Dittmer <[hidden email]> wrote:
>  > This sounds odd. Why would restricting it to a single
>  >  core improve performance? The point of using multiple
>  >  cores (and multiple threads) is to improve performance
>  >  isn't it? I'd leave thread scheduling decisions to the
>  >  JVM. Plus, I don't think there is anything in Java to
>  >  facilitate this (short of using JNI).
>  >
>  >  Are you talking about indexing or searching? You may
>  >  be able to use multiple parallel threads to improve
>  >  indexing performance. I don't think Lucene uses
>  >  multi-threading for searching; not unless you have
>  >  multiple indices, anyway.
>  >
>  >  Ulf
>  >
>  >
>  >
>  >  --- Anshum <[hidden email]> wrote:
>  >
>  >  > Hi,
>  >  >
>  >  > I have been trying to bind my lucene instance (JVM -
>  >  > Sun Hotspot*) to a
>  >  > particular core so as to improve the performance. Is
>  >  > there a way to do so or
>  >  > is there support in lucene to explicitly control the
>  >  > thread - processor
>  >  > linkup?
>  >  >
>  >  > --
>  >  > --
>  >  > The facts expressed here belong to everybody, the
>  >  > opinions to me.
>  >  > The distinction is yours to draw............
>  >  >
>  >
>  >
>  >
>  >
>  >       ____________________________________________________________________________________
>  >  Be a better friend, newshound, and
>  >  know-it-all with Yahoo! Mobile.  Try it now.  http://mobile.yahoo.com/;_ylt=Ahu06i62sR8HDtDypao8Wcj9tAcJ
>  >
>  >  ---------------------------------------------------------------------
>  >  To unsubscribe, e-mail: [hidden email]
>  >  For additional commands, e-mail: [hidden email]
>  >
>  >
>
>
>
> --
>
>  -
>


--

-

---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

Reply | Threaded
Open this post in threaded view
|

Re: Binding lucene instance/threads to a particular processor(or core)

Glen Newton
I realised that not everyone on this list might be able to access the
IEEE paper I pointed-out, so I will include the abstract and some
paragraphs from the paper which I have included below.

Also of interest (and should be available to all): Fedorova et al.
2005. Performance of Multithreaded Chip Multiprocessors And
Implications For Operating System Design. Usenix 2005.
http://www.eecs.harvard.edu/margo/papers/usenix05/paper.pdf
"Abstract: We investigated how operating system design should be
adapted for multithreaded chip multiprocessors (CMT) – a new
generation of processors that exploit thread-level parallelism to mask
the memory latency in modern workloads. We
determined that the L2 cache is a critical shared resource on CMT and
that an insufficient amount of L2 cache can undermine the ability to
hide memory latency on these processors. To use the L2 cache as
efficiently as possible, we propose an L2-conscious scheduling
algorithm and quantify its performance potential. Using this algorithm
it is possible to reduce miss ratios in the L2 cache by 25-37% and
improve processor throughput by 27-45%."


From Lundberg, L. 1997:
Abstract: "The default scheduling algorithm in Solaris and other
operating systems may result in frequent relocation of threads at
run-time. Excessive thread relocation cause
poor memory performance. This can be avoided by binding threads to
processors. However, binding threads to processors may result in an
unbalanced load. By considering a previously obtained theoretical
result and by evaluating a set of multithreaded Solaris
programs using a multiprocessor with 8 processors, we are able to
bound the maximum performance loss due to binding threads, The
theoretical result is also recapitulated. By evaluating another set of
multithreaded programs, we show that the gain of binding threads to
processors may be substantial, particularly for programs with fine
grained parallelism."

First paragraph: "The thread concept in Solaris [3][5] and other
operating systems makes it possible to write multithreaded programs,
which can be executed in parallel on a multiprocessor. Previous
experience from real world programs [4] show that, using the default
scheduling algorithm in Solaris, threads are frequently relocated from
one processor
to another at run-time. After each such relocation, the code and data
associated with the relocated thread is moved from the cache memory of
the 0113 processor to the cache of the new processor. This reduces the
performance. Run-time relocation of threads to processors can also
result in unpredictable response times. This is a problem in systems
which operate in a real-time environment. In order to avoid poor
memory performance and unpredictable real-time behaviour due to
frequent thread relocation, threads can be bound to processors using
the processor-bind directive [3] [5]. The major problem with binding
threads is that one can end up with an unbalanced load, i.e. some
processors may be extremely busy during some time periods while other
processors are idle."

-Glen

On 21/04/2008, Glen Newton <[hidden email]> wrote:

> And this discussion on bound threads may also shed light on things:
>  http://coding.derkeiler.com/Archive/Java/comp.lang.java.programmer/2007-11/msg02801.html
>
>
>  -Glen
>
>
>  On 21/04/2008, Glen Newton <[hidden email]> wrote:
>  > BInding threads to processors - in many situations - improves
>  >  throughput by reducing memory overhead. When a thread is running on a
>  >  core, its state is local; if it is timeshared-out and either 1)
>  >  swapped back in on the same core, it is likely that there will be  the
>  >  core's L1 cache; or 2) onto another core, there will not be a cache
>  >  hit and the state will have to be fetched from L2 or main memory,
>  >  incurring a performance hit, esp in the latter. See Lundberg, L. 1997.
>  >  Evaluating the Performance Implications of Binding Threads to
>  >  Processors. 393.http://ieeexplore.ieee.org/iel3/5020/13768/00634520.pdf
>  >  for more info.
>  >
>  >  If you are using JVM on Solaris on SPARC, you should take a look at
>  >  the following links for tuning (the Sun JVM on Solaris SPARC has many
>  >  more performance tuning parameters available), including threading:
>  >  - http://java.sun.com/docs/hotspot/threads/threads.html
>  >  - http://java.sun.com/j2se/1.5.0/docs/guide/vm/thread-priorities.html
>  >  - http://www-1.ibm.com/support/docview.wss?rs=180&context=SSEQTP&uid=swg21107291
>  >  - http://java.sun.com/javase/technologies/performance.jsp
>  >
>  >
>  >  -Glen
>  >
>  >
>  >
>  >
>  >
>  >
>  >
>  >  On 21/04/2008, Ulf Dittmer <[hidden email]> wrote:
>  >  > This sounds odd. Why would restricting it to a single
>  >  >  core improve performance? The point of using multiple
>  >  >  cores (and multiple threads) is to improve performance
>  >  >  isn't it? I'd leave thread scheduling decisions to the
>  >  >  JVM. Plus, I don't think there is anything in Java to
>  >  >  facilitate this (short of using JNI).
>  >  >
>  >  >  Are you talking about indexing or searching? You may
>  >  >  be able to use multiple parallel threads to improve
>  >  >  indexing performance. I don't think Lucene uses
>  >  >  multi-threading for searching; not unless you have
>  >  >  multiple indices, anyway.
>  >  >
>  >  >  Ulf
>  >  >
>  >  >
>  >  >
>  >  >  --- Anshum <[hidden email]> wrote:
>  >  >
>  >  >  > Hi,
>  >  >  >
>  >  >  > I have been trying to bind my lucene instance (JVM -
>  >  >  > Sun Hotspot*) to a
>  >  >  > particular core so as to improve the performance. Is
>  >  >  > there a way to do so or
>  >  >  > is there support in lucene to explicitly control the
>  >  >  > thread - processor
>  >  >  > linkup?
>  >  >  >
>  >  >  > --
>  >  >  > --
>  >  >  > The facts expressed here belong to everybody, the
>  >  >  > opinions to me.
>  >  >  > The distinction is yours to draw............
>  >  >  >
>  >  >
>  >  >
>  >  >
>  >  >
>  >  >       ____________________________________________________________________________________
>  >  >  Be a better friend, newshound, and
>  >  >  know-it-all with Yahoo! Mobile.  Try it now.  http://mobile.yahoo.com/;_ylt=Ahu06i62sR8HDtDypao8Wcj9tAcJ
>  >  >
>  >  >  ---------------------------------------------------------------------
>  >  >  To unsubscribe, e-mail: [hidden email]
>  >  >  For additional commands, e-mail: [hidden email]
>  >  >
>  >  >
>  >
>  >
>  >
>  > --
>  >
>  >  -
>  >
>
>
>
> --
>
>  -
>


--

-
adb
Reply | Threaded
Open this post in threaded view
|

Re: Binding lucene instance/threads to a particular processor(or core)

adb
That paper from 1997 is pretty old, but mirrors our experiences in those days.
Then, we used Solaris processor sets to really improve performance by binding
one of our processes to a particular CPU while leaving the other CPUs to manage
the thread intensive work.

You can bind processes/LWPs to a CPU on Solaris with psrset.

The Solaris thread model in the late '90s was also a significant factor in
performance of multi-threaded programs.  The default thread library in Solaris 8
implemented a MxN unbound thread model (threads/LWPS).  In those days we found
that it did not perform well, so used the bound thread model (i.e. 1:1) where a
Solaris thread was bound permanently to an LWP.  That improved performance a
lot.  In Solaris 8, Sun had what they called the 'alternate' thread library (T2)
around 2000, which became the default library in Solaris 9, and implemented a
1:1 model of Solaris threads to LWPs.  That new library had dramatic performance
improvements over the old.

Some background info for Java and threading

http://java.sun.com/j2se/1.5.0/docs/guide/vm/thread-priorities.html

Antony


Glen Newton wrote:

> I realised that not everyone on this list might be able to access the
> IEEE paper I pointed-out, so I will include the abstract and some
> paragraphs from the paper which I have included below.
>
> Also of interest (and should be available to all): Fedorova et al.
> 2005. Performance of Multithreaded Chip Multiprocessors And
> Implications For Operating System Design. Usenix 2005.
> http://www.eecs.harvard.edu/margo/papers/usenix05/paper.pdf
> "Abstract: We investigated how operating system design should be
> adapted for multithreaded chip multiprocessors (CMT) – a new
> generation of processors that exploit thread-level parallelism to mask
> the memory latency in modern workloads. We
> determined that the L2 cache is a critical shared resource on CMT and
> that an insufficient amount of L2 cache can undermine the ability to
> hide memory latency on these processors. To use the L2 cache as
> efficiently as possible, we propose an L2-conscious scheduling
> algorithm and quantify its performance potential. Using this algorithm
> it is possible to reduce miss ratios in the L2 cache by 25-37% and
> improve processor throughput by 27-45%."
>
>
> From Lundberg, L. 1997:
> Abstract: "The default scheduling algorithm in Solaris and other
> operating systems may result in frequent relocation of threads at
> run-time. Excessive thread relocation cause
> poor memory performance. This can be avoided by binding threads to
> processors. However, binding threads to processors may result in an
> unbalanced load. By considering a previously obtained theoretical
> result and by evaluating a set of multithreaded Solaris
> programs using a multiprocessor with 8 processors, we are able to
> bound the maximum performance loss due to binding threads, The
> theoretical result is also recapitulated. By evaluating another set of
> multithreaded programs, we show that the gain of binding threads to
> processors may be substantial, particularly for programs with fine
> grained parallelism."
>
> First paragraph: "The thread concept in Solaris [3][5] and other
> operating systems makes it possible to write multithreaded programs,
> which can be executed in parallel on a multiprocessor. Previous
> experience from real world programs [4] show that, using the default
> scheduling algorithm in Solaris, threads are frequently relocated from
> one processor
> to another at run-time. After each such relocation, the code and data
> associated with the relocated thread is moved from the cache memory of
> the 0113 processor to the cache of the new processor. This reduces the
> performance. Run-time relocation of threads to processors can also
> result in unpredictable response times. This is a problem in systems
> which operate in a real-time environment. In order to avoid poor
> memory performance and unpredictable real-time behaviour due to
> frequent thread relocation, threads can be bound to processors using
> the processor-bind directive [3] [5]. The major problem with binding
> threads is that one can end up with an unbalanced load, i.e. some
> processors may be extremely busy during some time periods while other
> processors are idle."
>
> -Glen
>
> On 21/04/2008, Glen Newton <[hidden email]> wrote:
>> And this discussion on bound threads may also shed light on things:
>>  http://coding.derkeiler.com/Archive/Java/comp.lang.java.programmer/2007-11/msg02801.html
>>
>>
>>  -Glen
>>
>>
>>  On 21/04/2008, Glen Newton <[hidden email]> wrote:
>>  > BInding threads to processors - in many situations - improves
>>  >  throughput by reducing memory overhead. When a thread is running on a
>>  >  core, its state is local; if it is timeshared-out and either 1)
>>  >  swapped back in on the same core, it is likely that there will be  the
>>  >  core's L1 cache; or 2) onto another core, there will not be a cache
>>  >  hit and the state will have to be fetched from L2 or main memory,
>>  >  incurring a performance hit, esp in the latter. See Lundberg, L. 1997.
>>  >  Evaluating the Performance Implications of Binding Threads to
>>  >  Processors. 393.http://ieeexplore.ieee.org/iel3/5020/13768/00634520.pdf
>>  >  for more info.
>>  >
>>  >  If you are using JVM on Solaris on SPARC, you should take a look at
>>  >  the following links for tuning (the Sun JVM on Solaris SPARC has many
>>  >  more performance tuning parameters available), including threading:
>>  >  - http://java.sun.com/docs/hotspot/threads/threads.html
>>  >  - http://java.sun.com/j2se/1.5.0/docs/guide/vm/thread-priorities.html
>>  >  - http://www-1.ibm.com/support/docview.wss?rs=180&context=SSEQTP&uid=swg21107291
>>  >  - http://java.sun.com/javase/technologies/performance.jsp
>>  >
>>  >
>>  >  -Glen
>>  >
>>  >
>>  >
>>  >
>>  >
>>  >
>>  >
>>  >  On 21/04/2008, Ulf Dittmer <[hidden email]> wrote:
>>  >  > This sounds odd. Why would restricting it to a single
>>  >  >  core improve performance? The point of using multiple
>>  >  >  cores (and multiple threads) is to improve performance
>>  >  >  isn't it? I'd leave thread scheduling decisions to the
>>  >  >  JVM. Plus, I don't think there is anything in Java to
>>  >  >  facilitate this (short of using JNI).
>>  >  >
>>  >  >  Are you talking about indexing or searching? You may
>>  >  >  be able to use multiple parallel threads to improve
>>  >  >  indexing performance. I don't think Lucene uses
>>  >  >  multi-threading for searching; not unless you have
>>  >  >  multiple indices, anyway.
>>  >  >
>>  >  >  Ulf
>>  >  >
>>  >  >
>>  >  >
>>  >  >  --- Anshum <[hidden email]> wrote:
>>  >  >
>>  >  >  > Hi,
>>  >  >  >
>>  >  >  > I have been trying to bind my lucene instance (JVM -
>>  >  >  > Sun Hotspot*) to a
>>  >  >  > particular core so as to improve the performance. Is
>>  >  >  > there a way to do so or
>>  >  >  > is there support in lucene to explicitly control the
>>  >  >  > thread - processor
>>  >  >  > linkup?
>>  >  >  >
>>  >  >  > --
>>  >  >  > --
>>  >  >  > The facts expressed here belong to everybody, the
>>  >  >  > opinions to me.
>>  >  >  > The distinction is yours to draw............
>>  >  >  >
>>  >  >
>>  >  >
>>  >  >
>>  >  >
>>  >  >       ____________________________________________________________________________________
>>  >  >  Be a better friend, newshound, and
>>  >  >  know-it-all with Yahoo! Mobile.  Try it now.  http://mobile.yahoo.com/;_ylt=Ahu06i62sR8HDtDypao8Wcj9tAcJ
>>  >  >
>>  >  >  ---------------------------------------------------------------------
>>  >  >  To unsubscribe, e-mail: [hidden email]
>>  >  >  For additional commands, e-mail: [hidden email]
>>  >  >
>>  >  >
>>  >
>>  >
>>  >
>>  > --
>>  >
>>  >  -
>>  >
>>
>>
>>
>> --
>>
>>  -
>>
>
>


---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

Reply | Threaded
Open this post in threaded view
|

Re: Binding lucene instance/threads to a particular processor(or core)

Anshum-2
The paper seems pretty good but I am still wondering if there was a way to
achieve this through the command line parameters. I'm just trying this to
optimize the code, if this works, would let all know else would keep
everyone informed :)
Any other suggestions for handling a concurrency of over 7 search requests
per second for an index size of over 15Gigs containing over 13 million
records?
Also, could someone help me with obtaining a 'index size' - 'concurrency' -
'processor power' - 'memory' relationship formula (or something similar)?

--
Anshum


On Tue, Apr 22, 2008 at 3:55 AM, Antony Bowesman <[hidden email]> wrote:

> That paper from 1997 is pretty old, but mirrors our experiences in those
> days. Then, we used Solaris processor sets to really improve performance by
> binding one of our processes to a particular CPU while leaving the other
> CPUs to manage the thread intensive work.
>
> You can bind processes/LWPs to a CPU on Solaris with psrset.
>
> The Solaris thread model in the late '90s was also a significant factor in
> performance of multi-threaded programs.  The default thread library in
> Solaris 8 implemented a MxN unbound thread model (threads/LWPS).  In those
> days we found that it did not perform well, so used the bound thread model
> (i.e. 1:1) where a Solaris thread was bound permanently to an LWP.  That
> improved performance a lot.  In Solaris 8, Sun had what they called the
> 'alternate' thread library (T2) around 2000, which became the default
> library in Solaris 9, and implemented a 1:1 model of Solaris threads to
> LWPs.  That new library had dramatic performance improvements over the old.
>
> Some background info for Java and threading
>
> http://java.sun.com/j2se/1.5.0/docs/guide/vm/thread-priorities.html
>
> Antony
>
>
>
> Glen Newton wrote:
>
> > I realised that not everyone on this list might be able to access the
> > IEEE paper I pointed-out, so I will include the abstract and some
> > paragraphs from the paper which I have included below.
> >
> > Also of interest (and should be available to all): Fedorova et al.
> > 2005. Performance of Multithreaded Chip Multiprocessors And
> > Implications For Operating System Design. Usenix 2005.
> > http://www.eecs.harvard.edu/margo/papers/usenix05/paper.pdf
> > "Abstract: We investigated how operating system design should be
> > adapted for multithreaded chip multiprocessors (CMT) – a new
> > generation of processors that exploit thread-level parallelism to mask
> > the memory latency in modern workloads. We
> > determined that the L2 cache is a critical shared resource on CMT and
> > that an insufficient amount of L2 cache can undermine the ability to
> > hide memory latency on these processors. To use the L2 cache as
> > efficiently as possible, we propose an L2-conscious scheduling
> > algorithm and quantify its performance potential. Using this algorithm
> > it is possible to reduce miss ratios in the L2 cache by 25-37% and
> > improve processor throughput by 27-45%."
> >
> >
> > From Lundberg, L. 1997:
> > Abstract: "The default scheduling algorithm in Solaris and other
> > operating systems may result in frequent relocation of threads at
> > run-time. Excessive thread relocation cause
> > poor memory performance. This can be avoided by binding threads to
> > processors. However, binding threads to processors may result in an
> > unbalanced load. By considering a previously obtained theoretical
> > result and by evaluating a set of multithreaded Solaris
> > programs using a multiprocessor with 8 processors, we are able to
> > bound the maximum performance loss due to binding threads, The
> > theoretical result is also recapitulated. By evaluating another set of
> > multithreaded programs, we show that the gain of binding threads to
> > processors may be substantial, particularly for programs with fine
> > grained parallelism."
> >
> > First paragraph: "The thread concept in Solaris [3][5] and other
> > operating systems makes it possible to write multithreaded programs,
> > which can be executed in parallel on a multiprocessor. Previous
> > experience from real world programs [4] show that, using the default
> > scheduling algorithm in Solaris, threads are frequently relocated from
> > one processor
> > to another at run-time. After each such relocation, the code and data
> > associated with the relocated thread is moved from the cache memory of
> > the 0113 processor to the cache of the new processor. This reduces the
> > performance. Run-time relocation of threads to processors can also
> > result in unpredictable response times. This is a problem in systems
> > which operate in a real-time environment. In order to avoid poor
> > memory performance and unpredictable real-time behaviour due to
> > frequent thread relocation, threads can be bound to processors using
> > the processor-bind directive [3] [5]. The major problem with binding
> > threads is that one can end up with an unbalanced load, i.e. some
> > processors may be extremely busy during some time periods while other
> > processors are idle."
> >
> > -Glen
> >
> > On 21/04/2008, Glen Newton <[hidden email]> wrote:
> >
> > > And this discussion on bound threads may also shed light on things:
> > >
> > > http://coding.derkeiler.com/Archive/Java/comp.lang.java.programmer/2007-11/msg02801.html
> > >
> > >
> > >  -Glen
> > >
> > >
> > >  On 21/04/2008, Glen Newton <[hidden email]> wrote:
> > >  > BInding threads to processors - in many situations - improves
> > >  >  throughput by reducing memory overhead. When a thread is running
> > > on a
> > >  >  core, its state is local; if it is timeshared-out and either 1)
> > >  >  swapped back in on the same core, it is likely that there will be
> > >  the
> > >  >  core's L1 cache; or 2) onto another core, there will not be a
> > > cache
> > >  >  hit and the state will have to be fetched from L2 or main memory,
> > >  >  incurring a performance hit, esp in the latter. See Lundberg, L.
> > > 1997.
> > >  >  Evaluating the Performance Implications of Binding Threads to
> > >  >  Processors. 393.
> > > http://ieeexplore.ieee.org/iel3/5020/13768/00634520.pdf
> > >  >  for more info.
> > >  >
> > >  >  If you are using JVM on Solaris on SPARC, you should take a look
> > > at
> > >  >  the following links for tuning (the Sun JVM on Solaris SPARC has
> > > many
> > >  >  more performance tuning parameters available), including
> > > threading:
> > >  >  - http://java.sun.com/docs/hotspot/threads/threads.html
> > >  >  -
> > > http://java.sun.com/j2se/1.5.0/docs/guide/vm/thread-priorities.html
> > >  >  -
> > > http://www-1.ibm.com/support/docview.wss?rs=180&context=SSEQTP&uid=swg21107291
> > >  >  - http://java.sun.com/javase/technologies/performance.jsp
> > >  >
> > >  >
> > >  >  -Glen
> > >  >
> > >  >
> > >  >
> > >  >
> > >  >
> > >  >
> > >  >
> > >  >  On 21/04/2008, Ulf Dittmer <[hidden email]> wrote:
> > >  >  > This sounds odd. Why would restricting it to a single
> > >  >  >  core improve performance? The point of using multiple
> > >  >  >  cores (and multiple threads) is to improve performance
> > >  >  >  isn't it? I'd leave thread scheduling decisions to the
> > >  >  >  JVM. Plus, I don't think there is anything in Java to
> > >  >  >  facilitate this (short of using JNI).
> > >  >  >
> > >  >  >  Are you talking about indexing or searching? You may
> > >  >  >  be able to use multiple parallel threads to improve
> > >  >  >  indexing performance. I don't think Lucene uses
> > >  >  >  multi-threading for searching; not unless you have
> > >  >  >  multiple indices, anyway.
> > >  >  >
> > >  >  >  Ulf
> > >  >  >
> > >  >  >
> > >  >  >
> > >  >  >  --- Anshum <[hidden email]> wrote:
> > >  >  >
> > >  >  >  > Hi,
> > >  >  >  >
> > >  >  >  > I have been trying to bind my lucene instance (JVM -
> > >  >  >  > Sun Hotspot*) to a
> > >  >  >  > particular core so as to improve the performance. Is
> > >  >  >  > there a way to do so or
> > >  >  >  > is there support in lucene to explicitly control the
> > >  >  >  > thread - processor
> > >  >  >  > linkup?
> > >  >  >  >
> > >  >  >  > --
> > >  >  >  > --
> > >  >  >  > The facts expressed here belong to everybody, the
> > >  >  >  > opinions to me.
> > >  >  >  > The distinction is yours to draw............
> > >  >  >  >
> > >  >  >
> > >  >  >
> > >  >  >
> > >  >  >
> > >  >  >
> > > ____________________________________________________________________________________
> > >  >  >  Be a better friend, newshound, and
> > >  >  >  know-it-all with Yahoo! Mobile.  Try it now.
> > > http://mobile.yahoo.com/;_ylt=Ahu06i62sR8HDtDypao8Wcj9tAcJ
> > >  >  >
> > >  >  >
> > >  ---------------------------------------------------------------------
> > >  >  >  To unsubscribe, e-mail: [hidden email]
> > >  >  >  For additional commands, e-mail:
> > > [hidden email]
> > >  >  >
> > >  >  >
> > >  >
> > >  >
> > >  >
> > >  > --
> > >  >
> > >  >  -
> > >  >
> > >
> > >
> > >
> > > --
> > >
> > >  -
> > >
> > >
> >
> >
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [hidden email]
> For additional commands, e-mail: [hidden email]
>
>


--
--
The facts expressed here belong to everybody, the opinions to me.
The distinction is yours to draw............
Reply | Threaded
Open this post in threaded view
|

Re: Binding lucene instance/threads to a particular processor(or core)

Glen Newton
Anshun,

I think I am dealing with an index of similar scale: 6.4 million
records, 83 GB index (see [1] for more info)

I mistakenly thought from your original posting that you were
interested in binding threads to processors for indexing, but it is
sounding like you want to do this for searching. I am not sure if this
would work well for searching, as there is a great deal more ephemeral
state. But I am not sure.

With respect to handling concurrency for search, could you describe
your environment better?
- Is it an web-base application?
- What sort of problems do you have now?
- What are your java command line parameters (heap, etc.)
- What are the performance expectations, i.e. average search < N1
seconds; median search time < N2 seconds
- Are you storing any fields and if so, do you need to store so many?
- Do you have the choice of serving searches from multiple machines?
i.e. load balance across >1 machine; We've found this to be one of the
best solutions for scaling;
- One thing that I do for both indexing and searching is that the
threads that are doing these tasks I always shift their priority to
MAX, so that they are run in preference to threads doing other things,
like preparing Documents, etc.

If it is a web-type environment, one solution is to set-up a
ThreadPoolExecutor[2] with a fixed number of threads and a limited
queue size (use a bound BlockingQueue[3]) . You would have to
experiment with the numbers to get the sweet-spot for your situation.
I would suggest starting with 2 times the number processors (cores)
and a queue of say 20. Requests are queued, but if the request cannot
be queued, at least the application then knows that it is too busy and
you can give the user a message "The system is too busy at this
moment, please try again in a few seconds..." kind of thing. The
advantage is also that when things are very busy, you have the
accepted requests handled in a reasonable amount of time, and some
users being told things are too busy, as opposed to all the requests
going in and the system thrashing (and perhaps running-out of memory
at the same time) and everyone's queries being very slow.

But I would suggest setting-up a test harness to emulate your
production conditions to try these things out...

-Glen

[1]http://zzzoot.blogspot.com/2008/04/lucene-indexing-performance-benchmarks.html
[2]http://java.sun.com/j2se/1.5.0/docs/api/java/util/concurrent/ThreadPoolExecutor.html
[3]http://java.sun.com/j2se/1.5.0/docs/api/java/util/concurrent/BlockingQueue.html


2008/4/22 Anshum <[hidden email]>:

> The paper seems pretty good but I am still wondering if there was a way to
>  achieve this through the command line parameters. I'm just trying this to
>  optimize the code, if this works, would let all know else would keep
>  everyone informed :)
>  Any other suggestions for handling a concurrency of over 7 search requests
>  per second for an index size of over 15Gigs containing over 13 million
>  records?
>  Also, could someone help me with obtaining a 'index size' - 'concurrency' -
>  'processor power' - 'memory' relationship formula (or something similar)?
>
>  --
>  Anshum
>
>
>
>
>  On Tue, Apr 22, 2008 at 3:55 AM, Antony Bowesman <[hidden email]> wrote:
>
>  > That paper from 1997 is pretty old, but mirrors our experiences in those
>  > days. Then, we used Solaris processor sets to really improve performance by
>  > binding one of our processes to a particular CPU while leaving the other
>  > CPUs to manage the thread intensive work.
>  >
>  > You can bind processes/LWPs to a CPU on Solaris with psrset.
>  >
>  > The Solaris thread model in the late '90s was also a significant factor in
>  > performance of multi-threaded programs.  The default thread library in
>  > Solaris 8 implemented a MxN unbound thread model (threads/LWPS).  In those
>  > days we found that it did not perform well, so used the bound thread model
>  > (i.e. 1:1) where a Solaris thread was bound permanently to an LWP.  That
>  > improved performance a lot.  In Solaris 8, Sun had what they called the
>  > 'alternate' thread library (T2) around 2000, which became the default
>  > library in Solaris 9, and implemented a 1:1 model of Solaris threads to
>  > LWPs.  That new library had dramatic performance improvements over the old.
>  >
>  > Some background info for Java and threading
>  >
>  > http://java.sun.com/j2se/1.5.0/docs/guide/vm/thread-priorities.html
>  >
>  > Antony
>  >
>  >
>  >
>  > Glen Newton wrote:
>  >
>  > > I realised that not everyone on this list might be able to access the
>  > > IEEE paper I pointed-out, so I will include the abstract and some
>  > > paragraphs from the paper which I have included below.
>  > >
>  > > Also of interest (and should be available to all): Fedorova et al.
>  > > 2005. Performance of Multithreaded Chip Multiprocessors And
>  > > Implications For Operating System Design. Usenix 2005.
>  > > http://www.eecs.harvard.edu/margo/papers/usenix05/paper.pdf
>  > > "Abstract: We investigated how operating system design should be
>  > > adapted for multithreaded chip multiprocessors (CMT) – a new
>  > > generation of processors that exploit thread-level parallelism to mask
>  > > the memory latency in modern workloads. We
>  > > determined that the L2 cache is a critical shared resource on CMT and
>  > > that an insufficient amount of L2 cache can undermine the ability to
>  > > hide memory latency on these processors. To use the L2 cache as
>  > > efficiently as possible, we propose an L2-conscious scheduling
>  > > algorithm and quantify its performance potential. Using this algorithm
>  > > it is possible to reduce miss ratios in the L2 cache by 25-37% and
>  > > improve processor throughput by 27-45%."
>  > >
>  > >
>  > > From Lundberg, L. 1997:
>  > > Abstract: "The default scheduling algorithm in Solaris and other
>  > > operating systems may result in frequent relocation of threads at
>  > > run-time. Excessive thread relocation cause
>  > > poor memory performance. This can be avoided by binding threads to
>  > > processors. However, binding threads to processors may result in an
>  > > unbalanced load. By considering a previously obtained theoretical
>  > > result and by evaluating a set of multithreaded Solaris
>  > > programs using a multiprocessor with 8 processors, we are able to
>  > > bound the maximum performance loss due to binding threads, The
>  > > theoretical result is also recapitulated. By evaluating another set of
>  > > multithreaded programs, we show that the gain of binding threads to
>  > > processors may be substantial, particularly for programs with fine
>  > > grained parallelism."
>  > >
>  > > First paragraph: "The thread concept in Solaris [3][5] and other
>  > > operating systems makes it possible to write multithreaded programs,
>  > > which can be executed in parallel on a multiprocessor. Previous
>  > > experience from real world programs [4] show that, using the default
>  > > scheduling algorithm in Solaris, threads are frequently relocated from
>  > > one processor
>  > > to another at run-time. After each such relocation, the code and data
>  > > associated with the relocated thread is moved from the cache memory of
>  > > the 0113 processor to the cache of the new processor. This reduces the
>  > > performance. Run-time relocation of threads to processors can also
>  > > result in unpredictable response times. This is a problem in systems
>  > > which operate in a real-time environment. In order to avoid poor
>  > > memory performance and unpredictable real-time behaviour due to
>  > > frequent thread relocation, threads can be bound to processors using
>  > > the processor-bind directive [3] [5]. The major problem with binding
>  > > threads is that one can end up with an unbalanced load, i.e. some
>  > > processors may be extremely busy during some time periods while other
>  > > processors are idle."
>  > >
>  > > -Glen
>  > >
>  > > On 21/04/2008, Glen Newton <[hidden email]> wrote:
>  > >
>  > > > And this discussion on bound threads may also shed light on things:
>  > > >
>  > > > http://coding.derkeiler.com/Archive/Java/comp.lang.java.programmer/2007-11/msg02801.html
>  > > >
>  > > >
>  > > >  -Glen
>  > > >
>  > > >
>  > > >  On 21/04/2008, Glen Newton <[hidden email]> wrote:
>  > > >  > BInding threads to processors - in many situations - improves
>  > > >  >  throughput by reducing memory overhead. When a thread is running
>  > > > on a
>  > > >  >  core, its state is local; if it is timeshared-out and either 1)
>  > > >  >  swapped back in on the same core, it is likely that there will be
>  > > >  the
>  > > >  >  core's L1 cache; or 2) onto another core, there will not be a
>  > > > cache
>  > > >  >  hit and the state will have to be fetched from L2 or main memory,
>  > > >  >  incurring a performance hit, esp in the latter. See Lundberg, L.
>  > > > 1997.
>  > > >  >  Evaluating the Performance Implications of Binding Threads to
>  > > >  >  Processors. 393.
>  > > > http://ieeexplore.ieee.org/iel3/5020/13768/00634520.pdf
>  > > >  >  for more info.
>  > > >  >
>  > > >  >  If you are using JVM on Solaris on SPARC, you should take a look
>  > > > at
>  > > >  >  the following links for tuning (the Sun JVM on Solaris SPARC has
>  > > > many
>  > > >  >  more performance tuning parameters available), including
>  > > > threading:
>  > > >  >  - http://java.sun.com/docs/hotspot/threads/threads.html
>  > > >  >  -
>  > > > http://java.sun.com/j2se/1.5.0/docs/guide/vm/thread-priorities.html
>  > > >  >  -
>  > > > http://www-1.ibm.com/support/docview.wss?rs=180&context=SSEQTP&uid=swg21107291
>  > > >  >  - http://java.sun.com/javase/technologies/performance.jsp
>  > > >  >
>  > > >  >
>  > > >  >  -Glen
>  > > >  >
>  > > >  >
>  > > >  >
>  > > >  >
>  > > >  >
>  > > >  >
>  > > >  >
>  > > >  >  On 21/04/2008, Ulf Dittmer <[hidden email]> wrote:
>  > > >  >  > This sounds odd. Why would restricting it to a single
>  > > >  >  >  core improve performance? The point of using multiple
>  > > >  >  >  cores (and multiple threads) is to improve performance
>  > > >  >  >  isn't it? I'd leave thread scheduling decisions to the
>  > > >  >  >  JVM. Plus, I don't think there is anything in Java to
>  > > >  >  >  facilitate this (short of using JNI).
>  > > >  >  >
>  > > >  >  >  Are you talking about indexing or searching? You may
>  > > >  >  >  be able to use multiple parallel threads to improve
>  > > >  >  >  indexing performance. I don't think Lucene uses
>  > > >  >  >  multi-threading for searching; not unless you have
>  > > >  >  >  multiple indices, anyway.
>  > > >  >  >
>  > > >  >  >  Ulf
>  > > >  >  >
>  > > >  >  >
>  > > >  >  >
>  > > >  >  >  --- Anshum <[hidden email]> wrote:
>  > > >  >  >
>  > > >  >  >  > Hi,
>  > > >  >  >  >
>  > > >  >  >  > I have been trying to bind my lucene instance (JVM -
>  > > >  >  >  > Sun Hotspot*) to a
>  > > >  >  >  > particular core so as to improve the performance. Is
>  > > >  >  >  > there a way to do so or
>  > > >  >  >  > is there support in lucene to explicitly control the
>  > > >  >  >  > thread - processor
>  > > >  >  >  > linkup?
>  > > >  >  >  >
>  > > >  >  >  > --
>  > > >  >  >  > --
>  > > >  >  >  > The facts expressed here belong to everybody, the
>  > > >  >  >  > opinions to me.
>  > > >  >  >  > The distinction is yours to draw............
>  > > >  >  >  >
>  > > >  >  >
>  > > >  >  >
>  > > >  >  >
>  > > >  >  >
>  > > >  >  >
>  > > > ____________________________________________________________________________________
>  > > >  >  >  Be a better friend, newshound, and
>  > > >  >  >  know-it-all with Yahoo! Mobile.  Try it now.
>  > > > http://mobile.yahoo.com/;_ylt=Ahu06i62sR8HDtDypao8Wcj9tAcJ
>  > > >  >  >
>  > > >  >  >
>  > > >  ---------------------------------------------------------------------
>  > > >  >  >  To unsubscribe, e-mail: [hidden email]
>  > > >  >  >  For additional commands, e-mail:
>  > > > [hidden email]
>  > > >  >  >
>  > > >  >  >
>  > > >  >
>  > > >  >
>  > > >  >
>  > > >  > --
>  > > >  >
>  > > >  >  -
>  > > >  >
>  > > >
>  > > >
>  > > >
>  > > > --
>  > > >
>  > > >  -
>  > > >
>  > > >
>  > >
>  > >
>  >
>  > ---------------------------------------------------------------------
>  > To unsubscribe, e-mail: [hidden email]
>  > For additional commands, e-mail: [hidden email]
>  >
>  >
>
>
>  --
>
>
> --
>  The facts expressed here belong to everybody, the opinions to me.
>  The distinction is yours to draw............
>



--

-
Reply | Threaded
Open this post in threaded view
|

RE: Binding lucene instance/threads to a particular processor(or core)

Renaud Waldura-5
In reply to this post by Anshum-2
Anshum:

Have you looked into the ConcurrentMultiSearcher? It would have you split
your index into N sub-indices, and search each with a dedicated thread.

--Renaud
 

-----Original Message-----
From: Anshum [mailto:[hidden email]]
Sent: Monday, April 21, 2008 9:10 PM
To: [hidden email]
Subject: Re: Binding lucene instance/threads to a particular processor(or
core)

The paper seems pretty good but I am still wondering if there was a way to
achieve this through the command line parameters. I'm just trying this to
optimize the code, if this works, would let all know else would keep
everyone informed :) Any other suggestions for handling a concurrency of
over 7 search requests per second for an index size of over 15Gigs
containing over 13 million records?
Also, could someone help me with obtaining a 'index size' - 'concurrency' -
'processor power' - 'memory' relationship formula (or something similar)?

--
Anshum


On Tue, Apr 22, 2008 at 3:55 AM, Antony Bowesman <[hidden email]> wrote:

> That paper from 1997 is pretty old, but mirrors our experiences in
> those days. Then, we used Solaris processor sets to really improve
> performance by binding one of our processes to a particular CPU while
> leaving the other CPUs to manage the thread intensive work.
>
> You can bind processes/LWPs to a CPU on Solaris with psrset.
>
> The Solaris thread model in the late '90s was also a significant
> factor in performance of multi-threaded programs.  The default thread
> library in Solaris 8 implemented a MxN unbound thread model
> (threads/LWPS).  In those days we found that it did not perform well,
> so used the bound thread model (i.e. 1:1) where a Solaris thread was
> bound permanently to an LWP.  That improved performance a lot.  In
> Solaris 8, Sun had what they called the 'alternate' thread library
> (T2) around 2000, which became the default library in Solaris 9, and
> implemented a 1:1 model of Solaris threads to LWPs.  That new library had
dramatic performance improvements over the old.

>
> Some background info for Java and threading
>
> http://java.sun.com/j2se/1.5.0/docs/guide/vm/thread-priorities.html
>
> Antony
>
>
>
> Glen Newton wrote:
>
> > I realised that not everyone on this list might be able to access
> > the IEEE paper I pointed-out, so I will include the abstract and
> > some paragraphs from the paper which I have included below.
> >
> > Also of interest (and should be available to all): Fedorova et al.
> > 2005. Performance of Multithreaded Chip Multiprocessors And
> > Implications For Operating System Design. Usenix 2005.
> > http://www.eecs.harvard.edu/margo/papers/usenix05/paper.pdf
> > "Abstract: We investigated how operating system design should be
> > adapted for multithreaded chip multiprocessors (CMT) - a new
> > generation of processors that exploit thread-level parallelism to
> > mask the memory latency in modern workloads. We determined that the
> > L2 cache is a critical shared resource on CMT and that an
> > insufficient amount of L2 cache can undermine the ability to hide
> > memory latency on these processors. To use the L2 cache as
> > efficiently as possible, we propose an L2-conscious scheduling
> > algorithm and quantify its performance potential. Using this
> > algorithm it is possible to reduce miss ratios in the L2 cache by
> > 25-37% and improve processor throughput by 27-45%."
> >
> >
> > From Lundberg, L. 1997:
> > Abstract: "The default scheduling algorithm in Solaris and other
> > operating systems may result in frequent relocation of threads at
> > run-time. Excessive thread relocation cause poor memory performance.
> > This can be avoided by binding threads to processors. However,
> > binding threads to processors may result in an unbalanced load. By
> > considering a previously obtained theoretical result and by
> > evaluating a set of multithreaded Solaris programs using a
> > multiprocessor with 8 processors, we are able to bound the maximum
> > performance loss due to binding threads, The theoretical result is
> > also recapitulated. By evaluating another set of multithreaded
> > programs, we show that the gain of binding threads to processors may
> > be substantial, particularly for programs with fine grained
> > parallelism."
> >
> > First paragraph: "The thread concept in Solaris [3][5] and other
> > operating systems makes it possible to write multithreaded programs,
> > which can be executed in parallel on a multiprocessor. Previous
> > experience from real world programs [4] show that, using the default
> > scheduling algorithm in Solaris, threads are frequently relocated
> > from one processor to another at run-time. After each such
> > relocation, the code and data associated with the relocated thread
> > is moved from the cache memory of the 0113 processor to the cache of
> > the new processor. This reduces the performance. Run-time relocation
> > of threads to processors can also result in unpredictable response
> > times. This is a problem in systems which operate in a real-time
> > environment. In order to avoid poor memory performance and
> > unpredictable real-time behaviour due to frequent thread relocation,
> > threads can be bound to processors using the processor-bind
> > directive [3] [5]. The major problem with binding threads is that
> > one can end up with an unbalanced load, i.e. some processors may be
> > extremely busy during some time periods while other processors are
> > idle."
> >
> > -Glen
> >
> > On 21/04/2008, Glen Newton <[hidden email]> wrote:
> >
> > > And this discussion on bound threads may also shed light on things:
> > >
> > > http://coding.derkeiler.com/Archive/Java/comp.lang.java.programmer
> > > /2007-11/msg02801.html
> > >
> > >
> > >  -Glen
> > >
> > >
> > >  On 21/04/2008, Glen Newton <[hidden email]> wrote:
> > >  > BInding threads to processors - in many situations - improves  
> > > >  throughput by reducing memory overhead. When a thread is
> > > running on a  >  core, its state is local; if it is timeshared-out
> > > and either 1)  >  swapped back in on the same core, it is likely
> > > that there will be  the  >  core's L1 cache; or 2) onto another
> > > core, there will not be a cache  >  hit and the state will have to
> > > be fetched from L2 or main memory,  >  incurring a performance
> > > hit, esp in the latter. See Lundberg, L.
> > > 1997.
> > >  >  Evaluating the Performance Implications of Binding Threads to  
> > > >  Processors. 393.
> > > http://ieeexplore.ieee.org/iel3/5020/13768/00634520.pdf
> > >  >  for more info.
> > >  >
> > >  >  If you are using JVM on Solaris on SPARC, you should take a
> > > look at  >  the following links for tuning (the Sun JVM on Solaris
> > > SPARC has many  >  more performance tuning parameters available),
> > > including
> > > threading:
> > >  >  - http://java.sun.com/docs/hotspot/threads/threads.html
> > >  >  -
> > > http://java.sun.com/j2se/1.5.0/docs/guide/vm/thread-priorities.htm
> > > l
> > >  >  -
> > > http://www-1.ibm.com/support/docview.wss?rs=180&context=SSEQTP&uid
> > > =swg21107291  >  -
> > > http://java.sun.com/javase/technologies/performance.jsp
> > >  >
> > >  >
> > >  >  -Glen
> > >  >
> > >  >
> > >  >
> > >  >
> > >  >
> > >  >
> > >  >
> > >  >  On 21/04/2008, Ulf Dittmer <[hidden email]> wrote:
> > >  >  > This sounds odd. Why would restricting it to a single  >  >  
> > > core improve performance? The point of using multiple  >  >  cores
> > > (and multiple threads) is to improve performance  >  >  isn't it?
> > > I'd leave thread scheduling decisions to the  >  >  JVM. Plus, I
> > > don't think there is anything in Java to  >  >  facilitate this
> > > (short of using JNI).
> > >  >  >
> > >  >  >  Are you talking about indexing or searching? You may  >  >  
> > > be able to use multiple parallel threads to improve  >  >  
> > > indexing performance. I don't think Lucene uses  >  >  
> > > multi-threading for searching; not unless you have  >  >  multiple
> > > indices, anyway.
> > >  >  >
> > >  >  >  Ulf
> > >  >  >
> > >  >  >
> > >  >  >
> > >  >  >  --- Anshum <[hidden email]> wrote:
> > >  >  >
> > >  >  >  > Hi,
> > >  >  >  >
> > >  >  >  > I have been trying to bind my lucene instance (JVM -  >  
> > > >  > Sun Hotspot*) to a  >  >  > particular core so as to improve
> > > the performance. Is  >  >  > there a way to do so or  >  >  > is
> > > there support in lucene to explicitly control the  >  >  > thread
> > > - processor  >  >  > linkup?
> > >  >  >  >
> > >  >  >  > --
> > >  >  >  > --
> > >  >  >  > The facts expressed here belong to everybody, the  >  >  
> > > > opinions to me.
> > >  >  >  > The distinction is yours to draw............
> > >  >  >  >
> > >  >  >
> > >  >  >
> > >  >  >
> > >  >  >
> > >  >  >
> > > __________________________________________________________________
> > > __________________  >  >  Be a better friend, newshound, and  >  >  
> > > know-it-all with Yahoo! Mobile.  Try it now.
> > > http://mobile.yahoo.com/;_ylt=Ahu06i62sR8HDtDypao8Wcj9tAcJ
> > >  >  >
> > >  >  >
> > >  
> > > ------------------------------------------------------------------
> > > ---  >  >  To unsubscribe, e-mail:
> > > [hidden email]
> > >  >  >  For additional commands, e-mail:
> > > [hidden email]
> > >  >  >
> > >  >  >
> > >  >
> > >  >
> > >  >
> > >  > --
> > >  >
> > >  >  -
> > >  >
> > >
> > >
> > >
> > > --
> > >
> > >  -
> > >
> > >
> >
> >
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [hidden email]
> For additional commands, e-mail: [hidden email]
>
>


--
--
The facts expressed here belong to everybody, the opinions to me.
The distinction is yours to draw............



---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

Reply | Threaded
Open this post in threaded view
|

RE: Binding lucene instance/threads to a particular processor(or core)

Renaud Waldura-5
In reply to this post by Glen Newton
> one solution is to set-up a ThreadPoolExecutor[2] with a fixed
> number of threads and a limited queue size (use a bound BlockingQueue[3])

Yes, this is precisely how the ConcurrentMultiSearcher works.
https://issues.apache.org/jira/browse/LUCENE-423



-----Original Message-----
From: Glen Newton [mailto:[hidden email]]
Sent: Tuesday, April 22, 2008 5:40 AM
To: [hidden email]
Subject: Re: Binding lucene instance/threads to a particular processor(or
core)

Anshun,

I think I am dealing with an index of similar scale: 6.4 million records, 83
GB index (see [1] for more info)

I mistakenly thought from your original posting that you were interested in
binding threads to processors for indexing, but it is sounding like you want
to do this for searching. I am not sure if this would work well for
searching, as there is a great deal more ephemeral state. But I am not sure.

With respect to handling concurrency for search, could you describe your
environment better?
- Is it an web-base application?
- What sort of problems do you have now?
- What are your java command line parameters (heap, etc.)
- What are the performance expectations, i.e. average search < N1 seconds;
median search time < N2 seconds
- Are you storing any fields and if so, do you need to store so many?
- Do you have the choice of serving searches from multiple machines?
i.e. load balance across >1 machine; We've found this to be one of the best
solutions for scaling;
- One thing that I do for both indexing and searching is that the threads
that are doing these tasks I always shift their priority to MAX, so that
they are run in preference to threads doing other things, like preparing
Documents, etc.

If it is a web-type environment, one solution is to set-up a
ThreadPoolExecutor[2] with a fixed number of threads and a limited queue
size (use a bound BlockingQueue[3]) . You would have to experiment with the
numbers to get the sweet-spot for your situation.
I would suggest starting with 2 times the number processors (cores) and a
queue of say 20. Requests are queued, but if the request cannot be queued,
at least the application then knows that it is too busy and you can give the
user a message "The system is too busy at this moment, please try again in a
few seconds..." kind of thing. The advantage is also that when things are
very busy, you have the accepted requests handled in a reasonable amount of
time, and some users being told things are too busy, as opposed to all the
requests going in and the system thrashing (and perhaps running-out of
memory at the same time) and everyone's queries being very slow.

But I would suggest setting-up a test harness to emulate your production
conditions to try these things out...

-Glen

[1]http://zzzoot.blogspot.com/2008/04/lucene-indexing-performance-benchmarks
.html
[2]http://java.sun.com/j2se/1.5.0/docs/api/java/util/concurrent/ThreadPoolEx
ecutor.html
[3]http://java.sun.com/j2se/1.5.0/docs/api/java/util/concurrent/BlockingQueu
e.html


2008/4/22 Anshum <[hidden email]>:
> The paper seems pretty good but I am still wondering if there was a
> way to  achieve this through the command line parameters. I'm just
> trying this to  optimize the code, if this works, would let all know
> else would keep  everyone informed :)  Any other suggestions for
> handling a concurrency of over 7 search requests  per second for an
> index size of over 15Gigs containing over 13 million  records?
>  Also, could someone help me with obtaining a 'index size' -
> 'concurrency' -  'processor power' - 'memory' relationship formula (or
something similar)?
>
>  --
>  Anshum
>
>
>
>
>  On Tue, Apr 22, 2008 at 3:55 AM, Antony Bowesman <[hidden email]>
wrote:

>
>  > That paper from 1997 is pretty old, but mirrors our experiences in
> those  > days. Then, we used Solaris processor sets to really improve
> performance by  > binding one of our processes to a particular CPU
> while leaving the other  > CPUs to manage the thread intensive work.
>  >
>  > You can bind processes/LWPs to a CPU on Solaris with psrset.
>  >
>  > The Solaris thread model in the late '90s was also a significant
> factor in  > performance of multi-threaded programs.  The default
> thread library in  > Solaris 8 implemented a MxN unbound thread model
> (threads/LWPS).  In those  > days we found that it did not perform
> well, so used the bound thread model  > (i.e. 1:1) where a Solaris
> thread was bound permanently to an LWP.  That  > improved performance
> a lot.  In Solaris 8, Sun had what they called the  > 'alternate'
> thread library (T2) around 2000, which became the default  > library
> in Solaris 9, and implemented a 1:1 model of Solaris threads to  > LWPs.
That new library had dramatic performance improvements over the old.

>  >
>  > Some background info for Java and threading  >  >
> http://java.sun.com/j2se/1.5.0/docs/guide/vm/thread-priorities.html
>  >
>  > Antony
>  >
>  >
>  >
>  > Glen Newton wrote:
>  >
>  > > I realised that not everyone on this list might be able to access
> the  > > IEEE paper I pointed-out, so I will include the abstract and
> some  > > paragraphs from the paper which I have included below.
>  > >
>  > > Also of interest (and should be available to all): Fedorova et al.
>  > > 2005. Performance of Multithreaded Chip Multiprocessors And  > >
> Implications For Operating System Design. Usenix 2005.
>  > > http://www.eecs.harvard.edu/margo/papers/usenix05/paper.pdf
>  > > "Abstract: We investigated how operating system design should be  
> > > adapted for multithreaded chip multiprocessors (CMT) - a new  > >
> generation of processors that exploit thread-level parallelism to mask  
> > > the memory latency in modern workloads. We  > > determined that
> the L2 cache is a critical shared resource on CMT and  > > that an
> insufficient amount of L2 cache can undermine the ability to  > > hide
> memory latency on these processors. To use the L2 cache as  > >
> efficiently as possible, we propose an L2-conscious scheduling  > >
> algorithm and quantify its performance potential. Using this algorithm  
> > > it is possible to reduce miss ratios in the L2 cache by 25-37% and  
> > > improve processor throughput by 27-45%."
>  > >
>  > >
>  > > From Lundberg, L. 1997:
>  > > Abstract: "The default scheduling algorithm in Solaris and other  
> > > operating systems may result in frequent relocation of threads at  
> > > run-time. Excessive thread relocation cause  > > poor memory
> performance. This can be avoided by binding threads to  > >
> processors. However, binding threads to processors may result in an  >
> > unbalanced load. By considering a previously obtained theoretical  >
> > result and by evaluating a set of multithreaded Solaris  > >
> programs using a multiprocessor with 8 processors, we are able to  > >
> bound the maximum performance loss due to binding threads, The  > >
> theoretical result is also recapitulated. By evaluating another set of  
> > > multithreaded programs, we show that the gain of binding threads
> to  > > processors may be substantial, particularly for programs with
> fine  > > grained parallelism."
>  > >
>  > > First paragraph: "The thread concept in Solaris [3][5] and other  
> > > operating systems makes it possible to write multithreaded
> programs,  > > which can be executed in parallel on a multiprocessor.
> Previous  > > experience from real world programs [4] show that, using
> the default  > > scheduling algorithm in Solaris, threads are
> frequently relocated from  > > one processor  > > to another at
> run-time. After each such relocation, the code and data  > >
> associated with the relocated thread is moved from the cache memory of  
> > > the 0113 processor to the cache of the new processor. This reduces
> the  > > performance. Run-time relocation of threads to processors can
> also  > > result in unpredictable response times. This is a problem in
> systems  > > which operate in a real-time environment. In order to
> avoid poor  > > memory performance and unpredictable real-time
> behaviour due to  > > frequent thread relocation, threads can be bound
> to processors using  > > the processor-bind directive [3] [5]. The
> major problem with binding  > > threads is that one can end up with an
> unbalanced load, i.e. some  > > processors may be extremely busy
> during some time periods while other  > > processors are idle."
>  > >
>  > > -Glen
>  > >
>  > > On 21/04/2008, Glen Newton <[hidden email]> wrote:
>  > >
>  > > > And this discussion on bound threads may also shed light on things:
>  > > >
>  > > >
> http://coding.derkeiler.com/Archive/Java/comp.lang.java.programmer/200
> 7-11/msg02801.html
>  > > >
>  > > >
>  > > >  -Glen
>  > > >
>  > > >
>  > > >  On 21/04/2008, Glen Newton <[hidden email]> wrote:
>  > > >  > BInding threads to processors - in many situations -
> improves  > > >  >  throughput by reducing memory overhead. When a
> thread is running  > > > on a  > > >  >  core, its state is local; if
> it is timeshared-out and either 1)  > > >  >  swapped back in on the
> same core, it is likely that there will be  > > >  the  > > >  >  
> core's L1 cache; or 2) onto another core, there will not be a  > > >
> cache  > > >  >  hit and the state will have to be fetched from L2 or
> main memory,  > > >  >  incurring a performance hit, esp in the
> latter. See Lundberg, L.
>  > > > 1997.
>  > > >  >  Evaluating the Performance Implications of Binding Threads
> to  > > >  >  Processors. 393.
>  > > > http://ieeexplore.ieee.org/iel3/5020/13768/00634520.pdf
>  > > >  >  for more info.
>  > > >  >
>  > > >  >  If you are using JVM on Solaris on SPARC, you should take a
> look  > > > at  > > >  >  the following links for tuning (the Sun JVM
> on Solaris SPARC has  > > > many  > > >  >  more performance tuning
> parameters available), including  > > > threading:
>  > > >  >  - http://java.sun.com/docs/hotspot/threads/threads.html
>  > > >  >  -
>  > > >
> http://java.sun.com/j2se/1.5.0/docs/guide/vm/thread-priorities.html
>  > > >  >  -
>  > > >
> http://www-1.ibm.com/support/docview.wss?rs=180&context=SSEQTP&uid=swg
> 21107291  > > >  >  -
> http://java.sun.com/javase/technologies/performance.jsp
>  > > >  >
>  > > >  >
>  > > >  >  -Glen
>  > > >  >
>  > > >  >
>  > > >  >
>  > > >  >
>  > > >  >
>  > > >  >
>  > > >  >
>  > > >  >  On 21/04/2008, Ulf Dittmer <[hidden email]> wrote:
>  > > >  >  > This sounds odd. Why would restricting it to a single  >
> > >  >  >  core improve performance? The point of using multiple  > >
> >  >  >  cores (and multiple threads) is to improve performance  > > >  
> >  >  isn't it? I'd leave thread scheduling decisions to the  > > >  >  
> >  JVM. Plus, I don't think there is anything in Java to  > > >  >  >  
> facilitate this (short of using JNI).
>  > > >  >  >
>  > > >  >  >  Are you talking about indexing or searching? You may  >
> > >  >  >  be able to use multiple parallel threads to improve  > > >  
> >  >  indexing performance. I don't think Lucene uses  > > >  >  >  
> multi-threading for searching; not unless you have  > > >  >  >  
> multiple indices, anyway.
>  > > >  >  >
>  > > >  >  >  Ulf
>  > > >  >  >
>  > > >  >  >
>  > > >  >  >
>  > > >  >  >  --- Anshum <[hidden email]> wrote:
>  > > >  >  >
>  > > >  >  >  > Hi,
>  > > >  >  >  >
>  > > >  >  >  > I have been trying to bind my lucene instance (JVM -  
> > > >  >  >  > Sun Hotspot*) to a  > > >  >  >  > particular core so
> as to improve the performance. Is  > > >  >  >  > there a way to do so
> or  > > >  >  >  > is there support in lucene to explicitly control
> the  > > >  >  >  > thread - processor  > > >  >  >  > linkup?
>  > > >  >  >  >
>  > > >  >  >  > --
>  > > >  >  >  > --
>  > > >  >  >  > The facts expressed here belong to everybody, the  > >
> >  >  >  > opinions to me.
>  > > >  >  >  > The distinction is yours to draw............
>  > > >  >  >  >
>  > > >  >  >
>  > > >  >  >
>  > > >  >  >
>  > > >  >  >
>  > > >  >  >
>  > > >
> ______________________________________________________________________
> ______________  > > >  >  >  Be a better friend, newshound, and  > > >  
> >  >  know-it-all with Yahoo! Mobile.  Try it now.
>  > > > http://mobile.yahoo.com/;_ylt=Ahu06i62sR8HDtDypao8Wcj9tAcJ
>  > > >  >  >
>  > > >  >  >
>  > > >  
> ---------------------------------------------------------------------
>  > > >  >  >  To unsubscribe, e-mail:
> [hidden email]
>  > > >  >  >  For additional commands, e-mail:
>  > > > [hidden email]  > > >  >  >  > > >  >  >  > >
> >  >  > > >  >  > > >  >  > > >  > --  > > >  >  > > >  >  -  > > >  >  
> > > >  > > >  > > >  > > > --  > > >  > > >  -  > > >  > > >  > >  > >  
> >  >
> ---------------------------------------------------------------------
>  > To unsubscribe, e-mail: [hidden email]
>  > For additional commands, e-mail: [hidden email]  
> >  >
>
>
>  --
>
>
> --
>  The facts expressed here belong to everybody, the opinions to me.
>  The distinction is yours to draw............
>



--

-



---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

Reply | Threaded
Open this post in threaded view
|

Re: Binding lucene instance/threads to a particular processor(or core)

Glen Newton
So even if you only have one index, this is the way to go to manage
this kind of problem.

Looking at the implementation and having used ThreadPoolExecutor (TPE)
a lot, I would make the following suggestions for this class so as to
better support this particular use case:
Better access to the configuration of the TPE is needed: the ability
to choose the number of threads (pools size and max pool size), the
type and nature of the queue, etc. Also, the default behaviour of TPE
is to throw an exception when a job is submitted and the queue is
full: throwing an exception is expensive, especially when dozens or
hundreds of searches are being rejected and many exceptions are
occurring in a high load situation. Instead, being able to set the
RejectedExecutionHandler on the TPE would allow for a graceful
handling of rejected queries (feedback to user application, etc).
Also, TPE allows for a custom ThreadFactory, which I use to produce
threads with the highest priority to do the searching.

Right now the implementation sets the #threads and queue size as a
function of the number of Searchables, which is reasonable for this
use case. But it would generalize better if these were a function of
the number of cores on the machine, or some combination.

That said, I would suggest having adding the ability to set the
ThreadPoolExecutor completely, with a getter/setter. This would allow
this class to be useful beyond the use case of multiple indexes,
becoming more generalizable to a number of use cases including
allowing it to support the use case of one (or more indexes) and a
high work load of queries that need to be managed. It could use the
same defaults if the TPE is not set externally (is null).

-Glen

2008/4/22 Renaud Waldura <[hidden email]>:

> > one solution is to set-up a ThreadPoolExecutor[2] with a fixed
>  > number of threads and a limited queue size (use a bound BlockingQueue[3])
>
>  Yes, this is precisely how the ConcurrentMultiSearcher works.
>  https://issues.apache.org/jira/browse/LUCENE-423
>
>
>
>
>  -----Original Message-----
>  From: Glen Newton [mailto:[hidden email]]
>  Sent: Tuesday, April 22, 2008 5:40 AM
>  To: [hidden email]
>  Subject: Re: Binding lucene instance/threads to a particular processor(or
>  core)
>
>
>
> Anshun,
>
>  I think I am dealing with an index of similar scale: 6.4 million records, 83
>  GB index (see [1] for more info)
>
>  I mistakenly thought from your original posting that you were interested in
>  binding threads to processors for indexing, but it is sounding like you want
>  to do this for searching. I am not sure if this would work well for
>  searching, as there is a great deal more ephemeral state. But I am not sure.
>
>  With respect to handling concurrency for search, could you describe your
>  environment better?
>  - Is it an web-base application?
>  - What sort of problems do you have now?
>  - What are your java command line parameters (heap, etc.)
>  - What are the performance expectations, i.e. average search < N1 seconds;
>  median search time < N2 seconds
>  - Are you storing any fields and if so, do you need to store so many?
>  - Do you have the choice of serving searches from multiple machines?
>  i.e. load balance across >1 machine; We've found this to be one of the best
>  solutions for scaling;
>  - One thing that I do for both indexing and searching is that the threads
>  that are doing these tasks I always shift their priority to MAX, so that
>  they are run in preference to threads doing other things, like preparing
>  Documents, etc.
>
>  If it is a web-type environment, one solution is to set-up a
>  ThreadPoolExecutor[2] with a fixed number of threads and a limited queue
>  size (use a bound BlockingQueue[3]) . You would have to experiment with the
>  numbers to get the sweet-spot for your situation.
>  I would suggest starting with 2 times the number processors (cores) and a
>  queue of say 20. Requests are queued, but if the request cannot be queued,
>  at least the application then knows that it is too busy and you can give the
>  user a message "The system is too busy at this moment, please try again in a
>  few seconds..." kind of thing. The advantage is also that when things are
>  very busy, you have the accepted requests handled in a reasonable amount of
>  time, and some users being told things are too busy, as opposed to all the
>  requests going in and the system thrashing (and perhaps running-out of
>  memory at the same time) and everyone's queries being very slow.
>
>  But I would suggest setting-up a test harness to emulate your production
>  conditions to try these things out...
>
>  -Glen
>
>  [1]http://zzzoot.blogspot.com/2008/04/lucene-indexing-performance-benchmarks
>  .html
>  [2]http://java.sun.com/j2se/1.5.0/docs/api/java/util/concurrent/ThreadPoolEx
>  ecutor.html
>  [3]http://java.sun.com/j2se/1.5.0/docs/api/java/util/concurrent/BlockingQueu
>  e.html
>
>
>  2008/4/22 Anshum <[hidden email]>:
>  > The paper seems pretty good but I am still wondering if there was a
>  > way to  achieve this through the command line parameters. I'm just
>  > trying this to  optimize the code, if this works, would let all know
>  > else would keep  everyone informed :)  Any other suggestions for
>  > handling a concurrency of over 7 search requests  per second for an
>  > index size of over 15Gigs containing over 13 million  records?
>  >  Also, could someone help me with obtaining a 'index size' -
>  > 'concurrency' -  'processor power' - 'memory' relationship formula (or
>  something similar)?
>  >
>  >  --
>  >  Anshum
>  >
>  >
>  >
>  >
>  >  On Tue, Apr 22, 2008 at 3:55 AM, Antony Bowesman <[hidden email]>
>  wrote:
>  >
>  >  > That paper from 1997 is pretty old, but mirrors our experiences in
>  > those  > days. Then, we used Solaris processor sets to really improve
>  > performance by  > binding one of our processes to a particular CPU
>  > while leaving the other  > CPUs to manage the thread intensive work.
>  >  >
>  >  > You can bind processes/LWPs to a CPU on Solaris with psrset.
>  >  >
>  >  > The Solaris thread model in the late '90s was also a significant
>  > factor in  > performance of multi-threaded programs.  The default
>  > thread library in  > Solaris 8 implemented a MxN unbound thread model
>  > (threads/LWPS).  In those  > days we found that it did not perform
>  > well, so used the bound thread model  > (i.e. 1:1) where a Solaris
>  > thread was bound permanently to an LWP.  That  > improved performance
>  > a lot.  In Solaris 8, Sun had what they called the  > 'alternate'
>  > thread library (T2) around 2000, which became the default  > library
>  > in Solaris 9, and implemented a 1:1 model of Solaris threads to  > LWPs.
>  That new library had dramatic performance improvements over the old.
>  >  >
>  >  > Some background info for Java and threading  >  >
>  > http://java.sun.com/j2se/1.5.0/docs/guide/vm/thread-priorities.html
>  >  >
>  >  > Antony
>  >  >
>  >  >
>  >  >
>  >  > Glen Newton wrote:
>  >  >
>  >  > > I realised that not everyone on this list might be able to access
>  > the  > > IEEE paper I pointed-out, so I will include the abstract and
>  > some  > > paragraphs from the paper which I have included below.
>  >  > >
>  >  > > Also of interest (and should be available to all): Fedorova et al.
>  >  > > 2005. Performance of Multithreaded Chip Multiprocessors And  > >
>  > Implications For Operating System Design. Usenix 2005.
>  >  > > http://www.eecs.harvard.edu/margo/papers/usenix05/paper.pdf
>  >  > > "Abstract: We investigated how operating system design should be
>  > > > adapted for multithreaded chip multiprocessors (CMT) - a new  > >
>  > generation of processors that exploit thread-level parallelism to mask
>  > > > the memory latency in modern workloads. We  > > determined that
>  > the L2 cache is a critical shared resource on CMT and  > > that an
>  > insufficient amount of L2 cache can undermine the ability to  > > hide
>  > memory latency on these processors. To use the L2 cache as  > >
>  > efficiently as possible, we propose an L2-conscious scheduling  > >
>  > algorithm and quantify its performance potential. Using this algorithm
>  > > > it is possible to reduce miss ratios in the L2 cache by 25-37% and
>  > > > improve processor throughput by 27-45%."
>  >  > >
>  >  > >
>  >  > > From Lundberg, L. 1997:
>  >  > > Abstract: "The default scheduling algorithm in Solaris and other
>  > > > operating systems may result in frequent relocation of threads at
>  > > > run-time. Excessive thread relocation cause  > > poor memory
>  > performance. This can be avoided by binding threads to  > >
>  > processors. However, binding threads to processors may result in an  >
>  > > unbalanced load. By considering a previously obtained theoretical  >
>  > > result and by evaluating a set of multithreaded Solaris  > >
>  > programs using a multiprocessor with 8 processors, we are able to  > >
>  > bound the maximum performance loss due to binding threads, The  > >
>  > theoretical result is also recapitulated. By evaluating another set of
>  > > > multithreaded programs, we show that the gain of binding threads
>  > to  > > processors may be substantial, particularly for programs with
>  > fine  > > grained parallelism."
>  >  > >
>  >  > > First paragraph: "The thread concept in Solaris [3][5] and other
>  > > > operating systems makes it possible to write multithreaded
>  > programs,  > > which can be executed in parallel on a multiprocessor.
>  > Previous  > > experience from real world programs [4] show that, using
>  > the default  > > scheduling algorithm in Solaris, threads are
>  > frequently relocated from  > > one processor  > > to another at
>  > run-time. After each such relocation, the code and data  > >
>  > associated with the relocated thread is moved from the cache memory of
>  > > > the 0113 processor to the cache of the new processor. This reduces
>  > the  > > performance. Run-time relocation of threads to processors can
>  > also  > > result in unpredictable response times. This is a problem in
>  > systems  > > which operate in a real-time environment. In order to
>  > avoid poor  > > memory performance and unpredictable real-time
>  > behaviour due to  > > frequent thread relocation, threads can be bound
>  > to processors using  > > the processor-bind directive [3] [5]. The
>  > major problem with binding  > > threads is that one can end up with an
>  > unbalanced load, i.e. some  > > processors may be extremely busy
>  > during some time periods while other  > > processors are idle."
>  >  > >
>  >  > > -Glen
>  >  > >
>  >  > > On 21/04/2008, Glen Newton <[hidden email]> wrote:
>  >  > >
>  >  > > > And this discussion on bound threads may also shed light on things:
>  >  > > >
>  >  > > >
>  > http://coding.derkeiler.com/Archive/Java/comp.lang.java.programmer/200
>  > 7-11/msg02801.html
>  >  > > >
>  >  > > >
>  >  > > >  -Glen
>  >  > > >
>  >  > > >
>  >  > > >  On 21/04/2008, Glen Newton <[hidden email]> wrote:
>  >  > > >  > BInding threads to processors - in many situations -
>  > improves  > > >  >  throughput by reducing memory overhead. When a
>  > thread is running  > > > on a  > > >  >  core, its state is local; if
>  > it is timeshared-out and either 1)  > > >  >  swapped back in on the
>  > same core, it is likely that there will be  > > >  the  > > >  >
>  > core's L1 cache; or 2) onto another core, there will not be a  > > >
>  > cache  > > >  >  hit and the state will have to be fetched from L2 or
>  > main memory,  > > >  >  incurring a performance hit, esp in the
>  > latter. See Lundberg, L.
>  >  > > > 1997.
>  >  > > >  >  Evaluating the Performance Implications of Binding Threads
>  > to  > > >  >  Processors. 393.
>  >  > > > http://ieeexplore.ieee.org/iel3/5020/13768/00634520.pdf
>  >  > > >  >  for more info.
>  >  > > >  >
>  >  > > >  >  If you are using JVM on Solaris on SPARC, you should take a
>  > look  > > > at  > > >  >  the following links for tuning (the Sun JVM
>  > on Solaris SPARC has  > > > many  > > >  >  more performance tuning
>  > parameters available), including  > > > threading:
>  >  > > >  >  - http://java.sun.com/docs/hotspot/threads/threads.html
>  >  > > >  >  -
>  >  > > >
>  > http://java.sun.com/j2se/1.5.0/docs/guide/vm/thread-priorities.html
>  >  > > >  >  -
>  >  > > >
>  > http://www-1.ibm.com/support/docview.wss?rs=180&context=SSEQTP&uid=swg
>  > 21107291  > > >  >  -
>  > http://java.sun.com/javase/technologies/performance.jsp
>  >  > > >  >
>  >  > > >  >
>  >  > > >  >  -Glen
>  >  > > >  >
>  >  > > >  >
>  >  > > >  >
>  >  > > >  >
>  >  > > >  >
>  >  > > >  >
>  >  > > >  >
>  >  > > >  >  On 21/04/2008, Ulf Dittmer <[hidden email]> wrote:
>  >  > > >  >  > This sounds odd. Why would restricting it to a single  >
>  > > >  >  >  core improve performance? The point of using multiple  > >
>  > >  >  >  cores (and multiple threads) is to improve performance  > > >
>  > >  >  isn't it? I'd leave thread scheduling decisions to the  > > >  >
>  > >  JVM. Plus, I don't think there is anything in Java to  > > >  >  >
>  > facilitate this (short of using JNI).
>  >  > > >  >  >
>  >  > > >  >  >  Are you talking about indexing or searching? You may  >
>  > > >  >  >  be able to use multiple parallel threads to improve  > > >
>  > >  >  indexing performance. I don't think Lucene uses  > > >  >  >
>  > multi-threading for searching; not unless you have  > > >  >  >
>  > multiple indices, anyway.
>  >  > > >  >  >
>  >  > > >  >  >  Ulf
>  >  > > >  >  >
>  >  > > >  >  >
>  >  > > >  >  >
>  >  > > >  >  >  --- Anshum <[hidden email]> wrote:
>  >  > > >  >  >
>  >  > > >  >  >  > Hi,
>  >  > > >  >  >  >
>  >  > > >  >  >  > I have been trying to bind my lucene instance (JVM -
>  > > > >  >  >  > Sun Hotspot*) to a  > > >  >  >  > particular core so
>  > as to improve the performance. Is  > > >  >  >  > there a way to do so
>  > or  > > >  >  >  > is there support in lucene to explicitly control
>  > the  > > >  >  >  > thread - processor  > > >  >  >  > linkup?
>  >  > > >  >  >  >
>  >  > > >  >  >  > --
>  >  > > >  >  >  > --
>  >  > > >  >  >  > The facts expressed here belong to everybody, the  > >
>  > >  >  >  > opinions to me.
>  >  > > >  >  >  > The distinction is yours to draw............
>  >  > > >  >  >  >
>  >  > > >  >  >
>  >  > > >  >  >
>  >  > > >  >  >
>  >  > > >  >  >
>  >  > > >  >  >
>  >  > > >
>  > ______________________________________________________________________
>  > ______________  > > >  >  >  Be a better friend, newshound, and  > > >
>  > >  >  know-it-all with Yahoo! Mobile.  Try it now.
>  >  > > > http://mobile.yahoo.com/;_ylt=Ahu06i62sR8HDtDypao8Wcj9tAcJ
>  >  > > >  >  >
>  >  > > >  >  >
>  >  > > >
>  > ---------------------------------------------------------------------
>  >  > > >  >  >  To unsubscribe, e-mail:
>  > [hidden email]
>  >  > > >  >  >  For additional commands, e-mail:
>  >  > > > [hidden email]  > > >  >  >  > > >  >  >  > >
>  > >  >  > > >  >  > > >  >  > > >  > --  > > >  >  > > >  >  -  > > >  >
>  > > > >  > > >  > > >  > > > --  > > >  > > >  -  > > >  > > >  > >  > >
>  > >  >
>  > ---------------------------------------------------------------------
>  >  > To unsubscribe, e-mail: [hidden email]
>  >  > For additional commands, e-mail: [hidden email]
>  > >  >
>  >
>  >
>  >  --
>  >
>  >
>  > --
>  >  The facts expressed here belong to everybody, the opinions to me.
>  >  The distinction is yours to draw............
>  >
>
>
>
>  --
>
>  -
>
>
>
>  ---------------------------------------------------------------------
>  To unsubscribe, e-mail: [hidden email]
>  For additional commands, e-mail: [hidden email]
>
>



--

-

---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

Reply | Threaded
Open this post in threaded view
|

RE: Binding lucene instance/threads to a particular processor(or core)

Renaud Waldura-5
That's an excellent idea. I would certainely use such an improved
MultiSearcher.
You should submit a patch.
 

-----Original Message-----
From: Glen Newton [mailto:[hidden email]]
Sent: Tuesday, April 22, 2008 10:50 AM
To: [hidden email]
Subject: Re: Binding lucene instance/threads to a particular processor(or
core)

So even if you only have one index, this is the way to go to manage this
kind of problem.

Looking at the implementation and having used ThreadPoolExecutor (TPE) a
lot, I would make the following suggestions for this class so as to better
support this particular use case:
Better access to the configuration of the TPE is needed: the ability to
choose the number of threads (pools size and max pool size), the type and
nature of the queue, etc. Also, the default behaviour of TPE is to throw an
exception when a job is submitted and the queue is
full: throwing an exception is expensive, especially when dozens or hundreds
of searches are being rejected and many exceptions are occurring in a high
load situation. Instead, being able to set the RejectedExecutionHandler on
the TPE would allow for a graceful handling of rejected queries (feedback to
user application, etc).
Also, TPE allows for a custom ThreadFactory, which I use to produce threads
with the highest priority to do the searching.

Right now the implementation sets the #threads and queue size as a function
of the number of Searchables, which is reasonable for this use case. But it
would generalize better if these were a function of the number of cores on
the machine, or some combination.

That said, I would suggest having adding the ability to set the
ThreadPoolExecutor completely, with a getter/setter. This would allow this
class to be useful beyond the use case of multiple indexes, becoming more
generalizable to a number of use cases including allowing it to support the
use case of one (or more indexes) and a high work load of queries that need
to be managed. It could use the same defaults if the TPE is not set
externally (is null).

-Glen

2008/4/22 Renaud Waldura <[hidden email]>:

> > one solution is to set-up a ThreadPoolExecutor[2] with a fixed
>  > number of threads and a limited queue size (use a bound
> BlockingQueue[3])
>
>  Yes, this is precisely how the ConcurrentMultiSearcher works.
>  https://issues.apache.org/jira/browse/LUCENE-423
>
>
>
>
>  -----Original Message-----
>  From: Glen Newton [mailto:[hidden email]]
>  Sent: Tuesday, April 22, 2008 5:40 AM
>  To: [hidden email]
>  Subject: Re: Binding lucene instance/threads to a particular
> processor(or
>  core)
>
>
>
> Anshun,
>
>  I think I am dealing with an index of similar scale: 6.4 million
> records, 83  GB index (see [1] for more info)
>
>  I mistakenly thought from your original posting that you were
> interested in  binding threads to processors for indexing, but it is
> sounding like you want  to do this for searching. I am not sure if
> this would work well for  searching, as there is a great deal more
ephemeral state. But I am not sure.

>
>  With respect to handling concurrency for search, could you describe
> your  environment better?
>  - Is it an web-base application?
>  - What sort of problems do you have now?
>  - What are your java command line parameters (heap, etc.)
>  - What are the performance expectations, i.e. average search < N1
> seconds;  median search time < N2 seconds
>  - Are you storing any fields and if so, do you need to store so many?
>  - Do you have the choice of serving searches from multiple machines?
>  i.e. load balance across >1 machine; We've found this to be one of
> the best  solutions for scaling;
>  - One thing that I do for both indexing and searching is that the
> threads  that are doing these tasks I always shift their priority to
> MAX, so that  they are run in preference to threads doing other
> things, like preparing  Documents, etc.
>
>  If it is a web-type environment, one solution is to set-up a  
> ThreadPoolExecutor[2] with a fixed number of threads and a limited
> queue  size (use a bound BlockingQueue[3]) . You would have to
> experiment with the  numbers to get the sweet-spot for your situation.
>  I would suggest starting with 2 times the number processors (cores)
> and a  queue of say 20. Requests are queued, but if the request cannot
> be queued,  at least the application then knows that it is too busy
> and you can give the  user a message "The system is too busy at this
> moment, please try again in a  few seconds..." kind of thing. The
> advantage is also that when things are  very busy, you have the
> accepted requests handled in a reasonable amount of  time, and some
> users being told things are too busy, as opposed to all the  requests
> going in and the system thrashing (and perhaps running-out of  memory at
the same time) and everyone's queries being very slow.

>
>  But I would suggest setting-up a test harness to emulate your
> production  conditions to try these things out...
>
>  -Glen
>
>  
> [1]http://zzzoot.blogspot.com/2008/04/lucene-indexing-performance-benc
> hmarks
>  .html
>  
> [2]http://java.sun.com/j2se/1.5.0/docs/api/java/util/concurrent/Thread
> PoolEx
>  ecutor.html
>  
> [3]http://java.sun.com/j2se/1.5.0/docs/api/java/util/concurrent/Blocki
> ngQueu
>  e.html
>
>
>  2008/4/22 Anshum <[hidden email]>:
>  > The paper seems pretty good but I am still wondering if there was a  
> > way to  achieve this through the command line parameters. I'm just  
> > trying this to  optimize the code, if this works, would let all know  
> > else would keep  everyone informed :)  Any other suggestions for  >
> handling a concurrency of over 7 search requests  per second for an  >
> index size of over 15Gigs containing over 13 million  records?
>  >  Also, could someone help me with obtaining a 'index size' -  >
> 'concurrency' -  'processor power' - 'memory' relationship formula (or  
> something similar)?
>  >
>  >  --
>  >  Anshum
>  >
>  >
>  >
>  >
>  >  On Tue, Apr 22, 2008 at 3:55 AM, Antony Bowesman
> <[hidden email]>
>  wrote:
>  >
>  >  > That paper from 1997 is pretty old, but mirrors our experiences
> in  > those  > days. Then, we used Solaris processor sets to really
> improve  > performance by  > binding one of our processes to a
> particular CPU  > while leaving the other  > CPUs to manage the thread
intensive work.

>  >  >
>  >  > You can bind processes/LWPs to a CPU on Solaris with psrset.
>  >  >
>  >  > The Solaris thread model in the late '90s was also a significant  
> > factor in  > performance of multi-threaded programs.  The default  >
> thread library in  > Solaris 8 implemented a MxN unbound thread model  
> > (threads/LWPS).  In those  > days we found that it did not perform  
> > well, so used the bound thread model  > (i.e. 1:1) where a Solaris  
> > thread was bound permanently to an LWP.  That  > improved
> performance  > a lot.  In Solaris 8, Sun had what they called the  >
'alternate'

>  > thread library (T2) around 2000, which became the default  >
> library  > in Solaris 9, and implemented a 1:1 model of Solaris threads to
> LWPs.
>  That new library had dramatic performance improvements over the old.
>  >  >
>  >  > Some background info for Java and threading  >  >  >
> http://java.sun.com/j2se/1.5.0/docs/guide/vm/thread-priorities.html
>  >  >
>  >  > Antony
>  >  >
>  >  >
>  >  >
>  >  > Glen Newton wrote:
>  >  >
>  >  > > I realised that not everyone on this list might be able to
> access  > the  > > IEEE paper I pointed-out, so I will include the
> abstract and  > some  > > paragraphs from the paper which I have included
below.

>  >  > >
>  >  > > Also of interest (and should be available to all): Fedorova et al.
>  >  > > 2005. Performance of Multithreaded Chip Multiprocessors And  >
> >  > Implications For Operating System Design. Usenix 2005.
>  >  > > http://www.eecs.harvard.edu/margo/papers/usenix05/paper.pdf
>  >  > > "Abstract: We investigated how operating system design should
> be  > > > adapted for multithreaded chip multiprocessors (CMT) - a new  
> > >  > generation of processors that exploit thread-level parallelism
> to mask  > > > the memory latency in modern workloads. We  > >
> determined that  > the L2 cache is a critical shared resource on CMT
> and  > > that an  > insufficient amount of L2 cache can undermine the
> ability to  > > hide  > memory latency on these processors. To use the
> L2 cache as  > >  > efficiently as possible, we propose an
> L2-conscious scheduling  > >  > algorithm and quantify its performance
> potential. Using this algorithm  > > > it is possible to reduce miss
> ratios in the L2 cache by 25-37% and  > > > improve processor throughput
by 27-45%."

>  >  > >
>  >  > >
>  >  > > From Lundberg, L. 1997:
>  >  > > Abstract: "The default scheduling algorithm in Solaris and
> other  > > > operating systems may result in frequent relocation of
> threads at  > > > run-time. Excessive thread relocation cause  > >
> poor memory  > performance. This can be avoided by binding threads to  
> > >  > processors. However, binding threads to processors may result
> in an  >  > > unbalanced load. By considering a previously obtained
> theoretical  >  > > result and by evaluating a set of multithreaded
> Solaris  > >  > programs using a multiprocessor with 8 processors, we
> are able to  > >  > bound the maximum performance loss due to binding
> threads, The  > >  > theoretical result is also recapitulated. By
> evaluating another set of  > > > multithreaded programs, we show that
> the gain of binding threads  > to  > > processors may be substantial,
> particularly for programs with  > fine  > > grained parallelism."
>  >  > >
>  >  > > First paragraph: "The thread concept in Solaris [3][5] and
> other  > > > operating systems makes it possible to write
> multithreaded  > programs,  > > which can be executed in parallel on a
multiprocessor.

>  > Previous  > > experience from real world programs [4] show that,
> using  > the default  > > scheduling algorithm in Solaris, threads are  
> > frequently relocated from  > > one processor  > > to another at  >
> run-time. After each such relocation, the code and data  > >  >
> associated with the relocated thread is moved from the cache memory of  
> > > > the 0113 processor to the cache of the new processor. This
> reduces  > the  > > performance. Run-time relocation of threads to
> processors can  > also  > > result in unpredictable response times.
> This is a problem in  > systems  > > which operate in a real-time
> environment. In order to  > avoid poor  > > memory performance and
> unpredictable real-time  > behaviour due to  > > frequent thread
> relocation, threads can be bound  > to processors using  > > the
> processor-bind directive [3] [5]. The  > major problem with binding  >
> > threads is that one can end up with an  > unbalanced load, i.e. some  
> > > processors may be extremely busy  > during some time periods while
other  > > processors are idle."
>  >  > >
>  >  > > -Glen
>  >  > >
>  >  > > On 21/04/2008, Glen Newton <[hidden email]> wrote:
>  >  > >
>  >  > > > And this discussion on bound threads may also shed light on
things:

>  >  > > >
>  >  > > >
>  >
> http://coding.derkeiler.com/Archive/Java/comp.lang.java.programmer/200
>  > 7-11/msg02801.html
>  >  > > >
>  >  > > >
>  >  > > >  -Glen
>  >  > > >
>  >  > > >
>  >  > > >  On 21/04/2008, Glen Newton <[hidden email]> wrote:
>  >  > > >  > BInding threads to processors - in many situations -  >
> improves  > > >  >  throughput by reducing memory overhead. When a  >
> thread is running  > > > on a  > > >  >  core, its state is local; if  
> > it is timeshared-out and either 1)  > > >  >  swapped back in on the  
> > same core, it is likely that there will be  > > >  the  > > >  >  >
> core's L1 cache; or 2) onto another core, there will not be a  > > >  
> > cache  > > >  >  hit and the state will have to be fetched from L2
> or  > main memory,  > > >  >  incurring a performance hit, esp in the  
> > latter. See Lundberg, L.
>  >  > > > 1997.
>  >  > > >  >  Evaluating the Performance Implications of Binding
> Threads  > to  > > >  >  Processors. 393.
>  >  > > > http://ieeexplore.ieee.org/iel3/5020/13768/00634520.pdf
>  >  > > >  >  for more info.
>  >  > > >  >
>  >  > > >  >  If you are using JVM on Solaris on SPARC, you should
> take a  > look  > > > at  > > >  >  the following links for tuning
> (the Sun JVM  > on Solaris SPARC has  > > > many  > > >  >  more
> performance tuning  > parameters available), including  > > > threading:
>  >  > > >  >  - http://java.sun.com/docs/hotspot/threads/threads.html
>  >  > > >  >  -
>  >  > > >
>  > http://java.sun.com/j2se/1.5.0/docs/guide/vm/thread-priorities.html
>  >  > > >  >  -
>  >  > > >
>  >
> http://www-1.ibm.com/support/docview.wss?rs=180&context=SSEQTP&uid=swg
>  > 21107291  > > >  >  -
>  > http://java.sun.com/javase/technologies/performance.jsp
>  >  > > >  >
>  >  > > >  >
>  >  > > >  >  -Glen
>  >  > > >  >
>  >  > > >  >
>  >  > > >  >
>  >  > > >  >
>  >  > > >  >
>  >  > > >  >
>  >  > > >  >
>  >  > > >  >  On 21/04/2008, Ulf Dittmer <[hidden email]> wrote:
>  >  > > >  >  > This sounds odd. Why would restricting it to a single  
> >  > > >  >  >  core improve performance? The point of using multiple  
> > >  > >  >  >  cores (and multiple threads) is to improve performance  
> > > >  > >  >  isn't it? I'd leave thread scheduling decisions to the  
> > > >  >  > >  JVM. Plus, I don't think there is anything in Java to  
> > > >  >  >  > facilitate this (short of using JNI).
>  >  > > >  >  >
>  >  > > >  >  >  Are you talking about indexing or searching? You may  
> >  > > >  >  >  be able to use multiple parallel threads to improve  >
> > >  > >  >  indexing performance. I don't think Lucene uses  > > >  >  
> >  > multi-threading for searching; not unless you have  > > >  >  >  
> > multiple indices, anyway.
>  >  > > >  >  >
>  >  > > >  >  >  Ulf
>  >  > > >  >  >
>  >  > > >  >  >
>  >  > > >  >  >
>  >  > > >  >  >  --- Anshum <[hidden email]> wrote:
>  >  > > >  >  >
>  >  > > >  >  >  > Hi,
>  >  > > >  >  >  >
>  >  > > >  >  >  > I have been trying to bind my lucene instance (JVM
> -  > > > >  >  >  > Sun Hotspot*) to a  > > >  >  >  > particular core
> so  > as to improve the performance. Is  > > >  >  >  > there a way to
> do so  > or  > > >  >  >  > is there support in lucene to explicitly
> control  > the  > > >  >  >  > thread - processor  > > >  >  >  > linkup?
>  >  > > >  >  >  >
>  >  > > >  >  >  > --
>  >  > > >  >  >  > --
>  >  > > >  >  >  > The facts expressed here belong to everybody, the  
> > >  > >  >  >  > opinions to me.
>  >  > > >  >  >  > The distinction is yours to draw............
>  >  > > >  >  >  >
>  >  > > >  >  >
>  >  > > >  >  >
>  >  > > >  >  >
>  >  > > >  >  >
>  >  > > >  >  >
>  >  > > >
>  >
> ______________________________________________________________________
>  > ______________  > > >  >  >  Be a better friend, newshound, and  >
> > >  > >  >  know-it-all with Yahoo! Mobile.  Try it now.
>  >  > > > http://mobile.yahoo.com/;_ylt=Ahu06i62sR8HDtDypao8Wcj9tAcJ
>  >  > > >  >  >
>  >  > > >  >  >
>  >  > > >
>  >
> ---------------------------------------------------------------------
>  >  > > >  >  >  To unsubscribe, e-mail:
>  > [hidden email]
>  >  > > >  >  >  For additional commands, e-mail:
>  >  > > > [hidden email]  > > >  >  >  > > >  >  >  
> > >  > >  >  > > >  >  > > >  >  > > >  > --  > > >  >  > > >  >  -  >
> > >  >  > > > >  > > >  > > >  > > > --  > > >  > > >  -  > > >  > > >  
> > >  > >  > >  >  >
> ---------------------------------------------------------------------
>  >  > To unsubscribe, e-mail: [hidden email]
>  >  > For additional commands, e-mail:
> [hidden email]  > >  >  >  >  >  --  >  >  > --  >  
> The facts expressed here belong to everybody, the opinions to me.
>  >  The distinction is yours to draw............
>  >
>
>
>
>  --
>
>  -
>
>
>
>  ---------------------------------------------------------------------
>  To unsubscribe, e-mail: [hidden email]
>  For additional commands, e-mail: [hidden email]
>
>



--

-

---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]



---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

Reply | Threaded
Open this post in threaded view
|

Re: Binding lucene instance/threads to a particular processor(or core)

Anshum-2
In reply to this post by Glen Newton
Hi Glen,

As far as stats for index/search are concerned, here they are:
* Yes, it is a web based application
* I am currently facing issues when the number of concurrent searches goes
high. The search is not able to handle over 2.5 searches per second.
* JVM command line parameters: -server mode; Max heap size 2GB (As it is a
32 bit machine/OS and that is the limit in the case)
* Performance expectations :
  * Mean search time :1-2 seconds (though that is a lot considering that I
have tried other engines, though with smaller indexes and they seem to have
sub 0.01s searching time)
* No, I do not store any fields.
* Yes, I am using multiple machines ( more than a couple ).serving the same
index; a little more than just a vanilla load balancing mechanism; sending
similar requests to the same server to better OS level caching.
* We do not index and search on the same machine, so in that case, I would
not think that changing the priority of these threads would matter. In other
words, there is nothing but search that happens on the search server. If
that is what you meant when you wrote about increasing the thead priority.

I had already started trying the TPE and  guess it would provide a lot of
improvement but 'm looking for other ways as well ! :)

--
Anshum

On Tue, Apr 22, 2008 at 11:19 PM, Glen Newton <[hidden email]> wrote:

> So even if you only have one index, this is the way to go to manage
> this kind of problem.
>
> Looking at the implementation and having used ThreadPoolExecutor (TPE)
> a lot, I would make the following suggestions for this class so as to
> better support this particular use case:
> Better access to the configuration of the TPE is needed: the ability
> to choose the number of threads (pools size and max pool size), the
> type and nature of the queue, etc. Also, the default behaviour of TPE
> is to throw an exception when a job is submitted and the queue is
> full: throwing an exception is expensive, especially when dozens or
> hundreds of searches are being rejected and many exceptions are
> occurring in a high load situation. Instead, being able to set the
> RejectedExecutionHandler on the TPE would allow for a graceful
> handling of rejected queries (feedback to user application, etc).
> Also, TPE allows for a custom ThreadFactory, which I use to produce
> threads with the highest priority to do the searching.
>
> Right now the implementation sets the #threads and queue size as a
> function of the number of Searchables, which is reasonable for this
> use case. But it would generalize better if these were a function of
> the number of cores on the machine, or some combination.
>
> That said, I would suggest having adding the ability to set the
> ThreadPoolExecutor completely, with a getter/setter. This would allow
> this class to be useful beyond the use case of multiple indexes,
> becoming more generalizable to a number of use cases including
> allowing it to support the use case of one (or more indexes) and a
> high work load of queries that need to be managed. It could use the
> same defaults if the TPE is not set externally (is null).
>
> -Glen
>
> 2008/4/22 Renaud Waldura <[hidden email]>:
> > > one solution is to set-up a ThreadPoolExecutor[2] with a fixed
> >  > number of threads and a limited queue size (use a bound
> BlockingQueue[3])
> >
> >  Yes, this is precisely how the ConcurrentMultiSearcher works.
> >  https://issues.apache.org/jira/browse/LUCENE-423
> >
> >
> >
> >
> >  -----Original Message-----
> >  From: Glen Newton [mailto:[hidden email]]
> >  Sent: Tuesday, April 22, 2008 5:40 AM
> >  To: [hidden email]
> >  Subject: Re: Binding lucene instance/threads to a particular
> processor(or
> >  core)
> >
> >
> >
> > Anshun,
> >
> >  I think I am dealing with an index of similar scale: 6.4 million
> records, 83
> >  GB index (see [1] for more info)
> >
> >  I mistakenly thought from your original posting that you were
> interested in
> >  binding threads to processors for indexing, but it is sounding like you
> want
> >  to do this for searching. I am not sure if this would work well for
> >  searching, as there is a great deal more ephemeral state. But I am not
> sure.
> >
> >  With respect to handling concurrency for search, could you describe
> your
> >  environment better?
> >  - Is it an web-base application?
> >  - What sort of problems do you have now?
> >  - What are your java command line parameters (heap, etc.)
> >  - What are the performance expectations, i.e. average search < N1
> seconds;
> >  median search time < N2 seconds
> >  - Are you storing any fields and if so, do you need to store so many?
> >  - Do you have the choice of serving searches from multiple machines?
> >  i.e. load balance across >1 machine; We've found this to be one of the
> best
> >  solutions for scaling;
> >  - One thing that I do for both indexing and searching is that the
> threads
> >  that are doing these tasks I always shift their priority to MAX, so
> that
> >  they are run in preference to threads doing other things, like
> preparing
> >  Documents, etc.
> >
> >  If it is a web-type environment, one solution is to set-up a
> >  ThreadPoolExecutor[2] with a fixed number of threads and a limited
> queue
> >  size (use a bound BlockingQueue[3]) . You would have to experiment with
> the
> >  numbers to get the sweet-spot for your situation.
> >  I would suggest starting with 2 times the number processors (cores) and
> a
> >  queue of say 20. Requests are queued, but if the request cannot be
> queued,
> >  at least the application then knows that it is too busy and you can
> give the
> >  user a message "The system is too busy at this moment, please try again
> in a
> >  few seconds..." kind of thing. The advantage is also that when things
> are
> >  very busy, you have the accepted requests handled in a reasonable
> amount of
> >  time, and some users being told things are too busy, as opposed to all
> the
> >  requests going in and the system thrashing (and perhaps running-out of
> >  memory at the same time) and everyone's queries being very slow.
> >
> >  But I would suggest setting-up a test harness to emulate your
> production
> >  conditions to try these things out...
> >
> >  -Glen
> >
> >  [1]
> http://zzzoot.blogspot.com/2008/04/lucene-indexing-performance-benchmarks
> >  .html
> >  [2]
> http://java.sun.com/j2se/1.5.0/docs/api/java/util/concurrent/ThreadPoolEx
> >  ecutor.html
> >  [3]
> http://java.sun.com/j2se/1.5.0/docs/api/java/util/concurrent/BlockingQueu
> >  e.html
> >
> >
> >  2008/4/22 Anshum <[hidden email]>:
> >  > The paper seems pretty good but I am still wondering if there was a
> >  > way to  achieve this through the command line parameters. I'm just
> >  > trying this to  optimize the code, if this works, would let all know
> >  > else would keep  everyone informed :)  Any other suggestions for
> >  > handling a concurrency of over 7 search requests  per second for an
> >  > index size of over 15Gigs containing over 13 million  records?
> >  >  Also, could someone help me with obtaining a 'index size' -
> >  > 'concurrency' -  'processor power' - 'memory' relationship formula
> (or
> >  something similar)?
> >  >
> >  >  --
> >  >  Anshum
> >  >
> >  >
> >  >
> >  >
> >  >  On Tue, Apr 22, 2008 at 3:55 AM, Antony Bowesman <[hidden email]>
> >  wrote:
> >  >
> >  >  > That paper from 1997 is pretty old, but mirrors our experiences in
> >  > those  > days. Then, we used Solaris processor sets to really improve
> >  > performance by  > binding one of our processes to a particular CPU
> >  > while leaving the other  > CPUs to manage the thread intensive work.
> >  >  >
> >  >  > You can bind processes/LWPs to a CPU on Solaris with psrset.
> >  >  >
> >  >  > The Solaris thread model in the late '90s was also a significant
> >  > factor in  > performance of multi-threaded programs.  The default
> >  > thread library in  > Solaris 8 implemented a MxN unbound thread model
> >  > (threads/LWPS).  In those  > days we found that it did not perform
> >  > well, so used the bound thread model  > (i.e. 1:1) where a Solaris
> >  > thread was bound permanently to an LWP.  That  > improved performance
> >  > a lot.  In Solaris 8, Sun had what they called the  > 'alternate'
> >  > thread library (T2) around 2000, which became the default  > library
> >  > in Solaris 9, and implemented a 1:1 model of Solaris threads to  >
> LWPs.
> >  That new library had dramatic performance improvements over the old.
> >  >  >
> >  >  > Some background info for Java and threading  >  >
> >  > http://java.sun.com/j2se/1.5.0/docs/guide/vm/thread-priorities.html
> >  >  >
> >  >  > Antony
> >  >  >
> >  >  >
> >  >  >
> >  >  > Glen Newton wrote:
> >  >  >
> >  >  > > I realised that not everyone on this list might be able to
> access
> >  > the  > > IEEE paper I pointed-out, so I will include the abstract and
> >  > some  > > paragraphs from the paper which I have included below.
> >  >  > >
> >  >  > > Also of interest (and should be available to all): Fedorova et
> al.
> >  >  > > 2005. Performance of Multithreaded Chip Multiprocessors And  > >
> >  > Implications For Operating System Design. Usenix 2005.
> >  >  > > http://www.eecs.harvard.edu/margo/papers/usenix05/paper.pdf
> >  >  > > "Abstract: We investigated how operating system design should be
> >  > > > adapted for multithreaded chip multiprocessors (CMT) - a new  > >
> >  > generation of processors that exploit thread-level parallelism to
> mask
> >  > > > the memory latency in modern workloads. We  > > determined that
> >  > the L2 cache is a critical shared resource on CMT and  > > that an
> >  > insufficient amount of L2 cache can undermine the ability to  > >
> hide
> >  > memory latency on these processors. To use the L2 cache as  > >
> >  > efficiently as possible, we propose an L2-conscious scheduling  > >
> >  > algorithm and quantify its performance potential. Using this
> algorithm
> >  > > > it is possible to reduce miss ratios in the L2 cache by 25-37%
> and
> >  > > > improve processor throughput by 27-45%."
> >  >  > >
> >  >  > >
> >  >  > > From Lundberg, L. 1997:
> >  >  > > Abstract: "The default scheduling algorithm in Solaris and other
> >  > > > operating systems may result in frequent relocation of threads at
> >  > > > run-time. Excessive thread relocation cause  > > poor memory
> >  > performance. This can be avoided by binding threads to  > >
> >  > processors. However, binding threads to processors may result in an
>  >
> >  > > unbalanced load. By considering a previously obtained theoretical
>  >
> >  > > result and by evaluating a set of multithreaded Solaris  > >
> >  > programs using a multiprocessor with 8 processors, we are able to  >
> >
> >  > bound the maximum performance loss due to binding threads, The  > >
> >  > theoretical result is also recapitulated. By evaluating another set
> of
> >  > > > multithreaded programs, we show that the gain of binding threads
> >  > to  > > processors may be substantial, particularly for programs with
> >  > fine  > > grained parallelism."
> >  >  > >
> >  >  > > First paragraph: "The thread concept in Solaris [3][5] and other
> >  > > > operating systems makes it possible to write multithreaded
> >  > programs,  > > which can be executed in parallel on a multiprocessor.
> >  > Previous  > > experience from real world programs [4] show that,
> using
> >  > the default  > > scheduling algorithm in Solaris, threads are
> >  > frequently relocated from  > > one processor  > > to another at
> >  > run-time. After each such relocation, the code and data  > >
> >  > associated with the relocated thread is moved from the cache memory
> of
> >  > > > the 0113 processor to the cache of the new processor. This
> reduces
> >  > the  > > performance. Run-time relocation of threads to processors
> can
> >  > also  > > result in unpredictable response times. This is a problem
> in
> >  > systems  > > which operate in a real-time environment. In order to
> >  > avoid poor  > > memory performance and unpredictable real-time
> >  > behaviour due to  > > frequent thread relocation, threads can be
> bound
> >  > to processors using  > > the processor-bind directive [3] [5]. The
> >  > major problem with binding  > > threads is that one can end up with
> an
> >  > unbalanced load, i.e. some  > > processors may be extremely busy
> >  > during some time periods while other  > > processors are idle."
> >  >  > >
> >  >  > > -Glen
> >  >  > >
> >  >  > > On 21/04/2008, Glen Newton <[hidden email]> wrote:
> >  >  > >
> >  >  > > > And this discussion on bound threads may also shed light on
> things:
> >  >  > > >
> >  >  > > >
> >  >
> http://coding.derkeiler.com/Archive/Java/comp.lang.java.programmer/200
> >  > 7-11/msg02801.html
> >  >  > > >
> >  >  > > >
> >  >  > > >  -Glen
> >  >  > > >
> >  >  > > >
> >  >  > > >  On 21/04/2008, Glen Newton <[hidden email]> wrote:
> >  >  > > >  > BInding threads to processors - in many situations -
> >  > improves  > > >  >  throughput by reducing memory overhead. When a
> >  > thread is running  > > > on a  > > >  >  core, its state is local; if
> >  > it is timeshared-out and either 1)  > > >  >  swapped back in on the
> >  > same core, it is likely that there will be  > > >  the  > > >  >
> >  > core's L1 cache; or 2) onto another core, there will not be a  > > >
> >  > cache  > > >  >  hit and the state will have to be fetched from L2 or
> >  > main memory,  > > >  >  incurring a performance hit, esp in the
> >  > latter. See Lundberg, L.
> >  >  > > > 1997.
> >  >  > > >  >  Evaluating the Performance Implications of Binding Threads
> >  > to  > > >  >  Processors. 393.
> >  >  > > > http://ieeexplore.ieee.org/iel3/5020/13768/00634520.pdf
> >  >  > > >  >  for more info.
> >  >  > > >  >
> >  >  > > >  >  If you are using JVM on Solaris on SPARC, you should take
> a
> >  > look  > > > at  > > >  >  the following links for tuning (the Sun JVM
> >  > on Solaris SPARC has  > > > many  > > >  >  more performance tuning
> >  > parameters available), including  > > > threading:
> >  >  > > >  >  - http://java.sun.com/docs/hotspot/threads/threads.html
> >  >  > > >  >  -
> >  >  > > >
> >  > http://java.sun.com/j2se/1.5.0/docs/guide/vm/thread-priorities.html
> >  >  > > >  >  -
> >  >  > > >
> >  >
> http://www-1.ibm.com/support/docview.wss?rs=180&context=SSEQTP&uid=swg
> >  > 21107291  > > >  >  -
> >  > http://java.sun.com/javase/technologies/performance.jsp
> >  >  > > >  >
> >  >  > > >  >
> >  >  > > >  >  -Glen
> >  >  > > >  >
> >  >  > > >  >
> >  >  > > >  >
> >  >  > > >  >
> >  >  > > >  >
> >  >  > > >  >
> >  >  > > >  >
> >  >  > > >  >  On 21/04/2008, Ulf Dittmer <[hidden email]> wrote:
> >  >  > > >  >  > This sounds odd. Why would restricting it to a single  >
> >  > > >  >  >  core improve performance? The point of using multiple  > >
> >  > >  >  >  cores (and multiple threads) is to improve performance  > >
> >
> >  > >  >  isn't it? I'd leave thread scheduling decisions to the  > > >
>  >
> >  > >  JVM. Plus, I don't think there is anything in Java to  > > >  >  >
> >  > facilitate this (short of using JNI).
> >  >  > > >  >  >
> >  >  > > >  >  >  Are you talking about indexing or searching? You may  >
> >  > > >  >  >  be able to use multiple parallel threads to improve  > > >
> >  > >  >  indexing performance. I don't think Lucene uses  > > >  >  >
> >  > multi-threading for searching; not unless you have  > > >  >  >
> >  > multiple indices, anyway.
> >  >  > > >  >  >
> >  >  > > >  >  >  Ulf
> >  >  > > >  >  >
> >  >  > > >  >  >
> >  >  > > >  >  >
> >  >  > > >  >  >  --- Anshum <[hidden email]> wrote:
> >  >  > > >  >  >
> >  >  > > >  >  >  > Hi,
> >  >  > > >  >  >  >
> >  >  > > >  >  >  > I have been trying to bind my lucene instance (JVM -
> >  > > > >  >  >  > Sun Hotspot*) to a  > > >  >  >  > particular core so
> >  > as to improve the performance. Is  > > >  >  >  > there a way to do
> so
> >  > or  > > >  >  >  > is there support in lucene to explicitly control
> >  > the  > > >  >  >  > thread - processor  > > >  >  >  > linkup?
> >  >  > > >  >  >  >
> >  >  > > >  >  >  > --
> >  >  > > >  >  >  > --
> >  >  > > >  >  >  > The facts expressed here belong to everybody, the  >
> >
> >  > >  >  >  > opinions to me.
> >  >  > > >  >  >  > The distinction is yours to draw............
> >  >  > > >  >  >  >
> >  >  > > >  >  >
> >  >  > > >  >  >
> >  >  > > >  >  >
> >  >  > > >  >  >
> >  >  > > >  >  >
> >  >  > > >
> >  >
> ______________________________________________________________________
> >  > ______________  > > >  >  >  Be a better friend, newshound, and  > >
> >
> >  > >  >  know-it-all with Yahoo! Mobile.  Try it now.
> >  >  > > > http://mobile.yahoo.com/;_ylt=Ahu06i62sR8HDtDypao8Wcj9tAcJ
> >  >  > > >  >  >
> >  >  > > >  >  >
> >  >  > > >
> >  > ---------------------------------------------------------------------
> >  >  > > >  >  >  To unsubscribe, e-mail:
> >  > [hidden email]
> >  >  > > >  >  >  For additional commands, e-mail:
> >  >  > > > [hidden email]  > > >  >  >  > > >  >  >  >
> >
> >  > >  >  > > >  >  > > >  >  > > >  > --  > > >  >  > > >  >  -  > > >
>  >
> >  > > > >  > > >  > > >  > > > --  > > >  > > >  -  > > >  > > >  > >  >
> >
> >  > >  >
> >  > ---------------------------------------------------------------------
> >  >  > To unsubscribe, e-mail: [hidden email]
> >  >  > For additional commands, e-mail: [hidden email]
> >  > >  >
> >  >
> >  >
> >  >  --
> >  >
> >  >
> >  > --
> >  >  The facts expressed here belong to everybody, the opinions to me.
> >  >  The distinction is yours to draw............
> >  >
> >
> >
> >
> >  --
> >
> >  -
> >
> >
> >
> >  ---------------------------------------------------------------------
> >  To unsubscribe, e-mail: [hidden email]
> >  For additional commands, e-mail: [hidden email]
> >
> >
>
>
>
> --
>
> -
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [hidden email]
> For additional commands, e-mail: [hidden email]
>
>


--
--
The facts expressed here belong to everybody, the opinions to me.
The distinction is yours to draw............
Reply | Threaded
Open this post in threaded view
|

Re: Binding lucene instance/threads to a particular processor(or core)

Toke Eskildsen
In reply to this post by Anshum-2
On Tue, 2008-04-22 at 09:40 +0530, Anshum wrote:
> Any other suggestions for handling a concurrency of over 7 search requests
> per second for an index size of over 15Gigs containing over 13 million
> records?

Our index is 30GB+ with 9 million records and a machine handles an
average search in about 250 ms. This is without proper warm-up and using
conventional harddisks, simple queries and simple processing of hits
(extraction of a field from the first 20 hits).

How often do you change your index and do you perform any kind of
warm-up? What is a typical query, how many hits does it return and what
do you do with the hits?

> Also, could someone help me with obtaining a 'index size' - 'concurrency' -
> 'processor power' - 'memory' relationship formula (or something similar)?

I doubt that you'll find a formula as such, as there are so many
variables influencing performance.


---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

Reply | Threaded
Open this post in threaded view
|

Re: Binding lucene instance/threads to a particular processor(or core)

Glen Newton
In reply to this post by Anshum-2
Hi Anshum,

2008/4/23 Anshum <[hidden email]>:
> Hi Glen,
>
>  As far as stats for index/search are concerned, here they are:
>  * Yes, it is a web based application
>  * I am currently facing issues when the number of concurrent searches goes
>  high. The search is not able to handle over 2.5 searches per second.
What is the OS, hardware etc you are using?

I am assuming you are only opening one instance of IndexSearcher for
all of your threads/queries (I would do it in the servlet init()
method) to use, as indicated in
http://lucene.apache.org/java/2_3_1/api/org/apache/lucene/search/IndexSearcher.html

>  * JVM command line parameters: -server mode; Max heap size 2GB (As it is a
>  32 bit machine/OS and that is the limit in the case)
What JVM/version?

Have you tried running a profiler to see what is happening or even
jconsole (add  -Dcom.sun.management.jmxremote to your command line
parameters and (in Linux at least) run jconsole after the JVM has
started.

A general suggestion:
 -XX:+AggressiveOpts -XX:+ScavengeBeforeFullGC   -Xms2000m
-XX:+UseConcMarkSweepGC

Although the garbage collection depends on your configuration.

>  * Performance expectations :
>   * Mean search time :1-2 seconds (though that is a lot considering that I
>  have tried other engines, though with smaller indexes and they seem to have
>  sub 0.01s searching time)

Yes, I am also surprised by this poor performance.

>  * No, I do not store any fields.
>  * Yes, I am using multiple machines ( more than a couple ).serving the same
>  index; a little more than just a vanilla load balancing mechanism; sending
>  similar requests to the same server to better OS level caching
>  * We do not index and search on the same machine, so in that case, I would
>  not think that changing the priority of these threads would matter. In other
>  words, there is nothing but search that happens on the search server. If
>  that is what you meant when you wrote about increasing the thead priority.

Your servlet environment has many threads that have to do a lot of
things before they get around to the part of your code that does the
searching. Once they get to the search part, you want them to have
higher priority than the threads that are handling requests much
further back in the servlet pipeline. So set the thread priority
higher when it gets to your code. Of course you would need to test to
see the impact on performance in your particular case.

>
>  I had already started trying the TPE and  guess it would provide a lot of
>  improvement but 'm looking for other ways as well ! :)

-glen


>  --
>  Anshum
>
>
>
>  On Tue, Apr 22, 2008 at 11:19 PM, Glen Newton <[hidden email]> wrote:
>
>  > So even if you only have one index, this is the way to go to manage
>  > this kind of problem.
>  >
>  > Looking at the implementation and having used ThreadPoolExecutor (TPE)
>  > a lot, I would make the following suggestions for this class so as to
>  > better support this particular use case:
>  > Better access to the configuration of the TPE is needed: the ability
>  > to choose the number of threads (pools size and max pool size), the
>  > type and nature of the queue, etc. Also, the default behaviour of TPE
>  > is to throw an exception when a job is submitted and the queue is
>  > full: throwing an exception is expensive, especially when dozens or
>  > hundreds of searches are being rejected and many exceptions are
>  > occurring in a high load situation. Instead, being able to set the
>  > RejectedExecutionHandler on the TPE would allow for a graceful
>  > handling of rejected queries (feedback to user application, etc).
>  > Also, TPE allows for a custom ThreadFactory, which I use to produce
>  > threads with the highest priority to do the searching.
>  >
>  > Right now the implementation sets the #threads and queue size as a
>  > function of the number of Searchables, which is reasonable for this
>  > use case. But it would generalize better if these were a function of
>  > the number of cores on the machine, or some combination.
>  >
>  > That said, I would suggest having adding the ability to set the
>  > ThreadPoolExecutor completely, with a getter/setter. This would allow
>  > this class to be useful beyond the use case of multiple indexes,
>  > becoming more generalizable to a number of use cases including
>  > allowing it to support the use case of one (or more indexes) and a
>  > high work load of queries that need to be managed. It could use the
>  > same defaults if the TPE is not set externally (is null).
>  >
>  > -Glen
>  >
>  > 2008/4/22 Renaud Waldura <[hidden email]>:
>  > > > one solution is to set-up a ThreadPoolExecutor[2] with a fixed
>  > >  > number of threads and a limited queue size (use a bound
>  > BlockingQueue[3])
>  > >
>  > >  Yes, this is precisely how the ConcurrentMultiSearcher works.
>  > >  https://issues.apache.org/jira/browse/LUCENE-423
>  > >
>  > >
>  > >
>  > >
>  > >  -----Original Message-----
>  > >  From: Glen Newton [mailto:[hidden email]]
>  > >  Sent: Tuesday, April 22, 2008 5:40 AM
>  > >  To: [hidden email]
>  > >  Subject: Re: Binding lucene instance/threads to a particular
>  > processor(or
>  > >  core)
>  > >
>  > >
>  > >
>  > > Anshun,
>  > >
>  > >  I think I am dealing with an index of similar scale: 6.4 million
>  > records, 83
>  > >  GB index (see [1] for more info)
>  > >
>  > >  I mistakenly thought from your original posting that you were
>  > interested in
>  > >  binding threads to processors for indexing, but it is sounding like you
>  > want
>  > >  to do this for searching. I am not sure if this would work well for
>  > >  searching, as there is a great deal more ephemeral state. But I am not
>  > sure.
>  > >
>  > >  With respect to handling concurrency for search, could you describe
>  > your
>  > >  environment better?
>  > >  - Is it an web-base application?
>  > >  - What sort of problems do you have now?
>  > >  - What are your java command line parameters (heap, etc.)
>  > >  - What are the performance expectations, i.e. average search < N1
>  > seconds;
>  > >  median search time < N2 seconds
>  > >  - Are you storing any fields and if so, do you need to store so many?
>  > >  - Do you have the choice of serving searches from multiple machines?
>  > >  i.e. load balance across >1 machine; We've found this to be one of the
>  > best
>  > >  solutions for scaling;
>  > >  - One thing that I do for both indexing and searching is that the
>  > threads
>  > >  that are doing these tasks I always shift their priority to MAX, so
>  > that
>  > >  they are run in preference to threads doing other things, like
>  > preparing
>  > >  Documents, etc.
>  > >
>  > >  If it is a web-type environment, one solution is to set-up a
>  > >  ThreadPoolExecutor[2] with a fixed number of threads and a limited
>  > queue
>  > >  size (use a bound BlockingQueue[3]) . You would have to experiment with
>  > the
>  > >  numbers to get the sweet-spot for your situation.
>  > >  I would suggest starting with 2 times the number processors (cores) and
>  > a
>  > >  queue of say 20. Requests are queued, but if the request cannot be
>  > queued,
>  > >  at least the application then knows that it is too busy and you can
>  > give the
>  > >  user a message "The system is too busy at this moment, please try again
>  > in a
>  > >  few seconds..." kind of thing. The advantage is also that when things
>  > are
>  > >  very busy, you have the accepted requests handled in a reasonable
>  > amount of
>  > >  time, and some users being told things are too busy, as opposed to all
>  > the
>  > >  requests going in and the system thrashing (and perhaps running-out of
>  > >  memory at the same time) and everyone's queries being very slow.
>  > >
>  > >  But I would suggest setting-up a test harness to emulate your
>  > production
>  > >  conditions to try these things out...
>  > >
>  > >  -Glen
>  > >
>  > >  [1]
>  > http://zzzoot.blogspot.com/2008/04/lucene-indexing-performance-benchmarks
>  > >  .html
>  > >  [2]
>  > http://java.sun.com/j2se/1.5.0/docs/api/java/util/concurrent/ThreadPoolEx
>  > >  ecutor.html
>  > >  [3]
>  > http://java.sun.com/j2se/1.5.0/docs/api/java/util/concurrent/BlockingQueu
>  > >  e.html
>  > >
>  > >
>  > >  2008/4/22 Anshum <[hidden email]>:
>  > >  > The paper seems pretty good but I am still wondering if there was a
>  > >  > way to  achieve this through the command line parameters. I'm just
>  > >  > trying this to  optimize the code, if this works, would let all know
>  > >  > else would keep  everyone informed :)  Any other suggestions for
>  > >  > handling a concurrency of over 7 search requests  per second for an
>  > >  > index size of over 15Gigs containing over 13 million  records?
>  > >  >  Also, could someone help me with obtaining a 'index size' -
>  > >  > 'concurrency' -  'processor power' - 'memory' relationship formula
>  > (or
>  > >  something similar)?
>  > >  >
>  > >  >  --
>  > >  >  Anshum
>  > >  >
>  > >  >
>  > >  >
>  > >  >
>  > >  >  On Tue, Apr 22, 2008 at 3:55 AM, Antony Bowesman <[hidden email]>
>  > >  wrote:
>  > >  >
>  > >  >  > That paper from 1997 is pretty old, but mirrors our experiences in
>  > >  > those  > days. Then, we used Solaris processor sets to really improve
>  > >  > performance by  > binding one of our processes to a particular CPU
>  > >  > while leaving the other  > CPUs to manage the thread intensive work.
>  > >  >  >
>  > >  >  > You can bind processes/LWPs to a CPU on Solaris with psrset.
>  > >  >  >
>  > >  >  > The Solaris thread model in the late '90s was also a significant
>  > >  > factor in  > performance of multi-threaded programs.  The default
>  > >  > thread library in  > Solaris 8 implemented a MxN unbound thread model
>  > >  > (threads/LWPS).  In those  > days we found that it did not perform
>  > >  > well, so used the bound thread model  > (i.e. 1:1) where a Solaris
>  > >  > thread was bound permanently to an LWP.  That  > improved performance
>  > >  > a lot.  In Solaris 8, Sun had what they called the  > 'alternate'
>  > >  > thread library (T2) around 2000, which became the default  > library
>  > >  > in Solaris 9, and implemented a 1:1 model of Solaris threads to  >
>  > LWPs.
>  > >  That new library had dramatic performance improvements over the old.
>  > >  >  >
>  > >  >  > Some background info for Java and threading  >  >
>  > >  > http://java.sun.com/j2se/1.5.0/docs/guide/vm/thread-priorities.html
>  > >  >  >
>  > >  >  > Antony
>  > >  >  >
>  > >  >  >
>  > >  >  >
>  > >  >  > Glen Newton wrote:
>  > >  >  >
>  > >  >  > > I realised that not everyone on this list might be able to
>  > access
>  > >  > the  > > IEEE paper I pointed-out, so I will include the abstract and
>  > >  > some  > > paragraphs from the paper which I have included below.
>  > >  >  > >
>  > >  >  > > Also of interest (and should be available to all): Fedorova et
>  > al.
>  > >  >  > > 2005. Performance of Multithreaded Chip Multiprocessors And  > >
>  > >  > Implications For Operating System Design. Usenix 2005.
>  > >  >  > > http://www.eecs.harvard.edu/margo/papers/usenix05/paper.pdf
>  > >  >  > > "Abstract: We investigated how operating system design should be
>  > >  > > > adapted for multithreaded chip multiprocessors (CMT) - a new  > >
>  > >  > generation of processors that exploit thread-level parallelism to
>  > mask
>  > >  > > > the memory latency in modern workloads. We  > > determined that
>  > >  > the L2 cache is a critical shared resource on CMT and  > > that an
>  > >  > insufficient amount of L2 cache can undermine the ability to  > >
>  > hide
>  > >  > memory latency on these processors. To use the L2 cache as  > >
>  > >  > efficiently as possible, we propose an L2-conscious scheduling  > >
>  > >  > algorithm and quantify its performance potential. Using this
>  > algorithm
>  > >  > > > it is possible to reduce miss ratios in the L2 cache by 25-37%
>  > and
>  > >  > > > improve processor throughput by 27-45%."
>  > >  >  > >
>  > >  >  > >
>  > >  >  > > From Lundberg, L. 1997:
>  > >  >  > > Abstract: "The default scheduling algorithm in Solaris and other
>  > >  > > > operating systems may result in frequent relocation of threads at
>  > >  > > > run-time. Excessive thread relocation cause  > > poor memory
>  > >  > performance. This can be avoided by binding threads to  > >
>  > >  > processors. However, binding threads to processors may result in an
>  >  >
>  > >  > > unbalanced load. By considering a previously obtained theoretical
>  >  >
>  > >  > > result and by evaluating a set of multithreaded Solaris  > >
>  > >  > programs using a multiprocessor with 8 processors, we are able to  >
>  > >
>  > >  > bound the maximum performance loss due to binding threads, The  > >
>  > >  > theoretical result is also recapitulated. By evaluating another set
>  > of
>  > >  > > > multithreaded programs, we show that the gain of binding threads
>  > >  > to  > > processors may be substantial, particularly for programs with
>  > >  > fine  > > grained parallelism."
>  > >  >  > >
>  > >  >  > > First paragraph: "The thread concept in Solaris [3][5] and other
>  > >  > > > operating systems makes it possible to write multithreaded
>  > >  > programs,  > > which can be executed in parallel on a multiprocessor.
>  > >  > Previous  > > experience from real world programs [4] show that,
>  > using
>  > >  > the default  > > scheduling algorithm in Solaris, threads are
>  > >  > frequently relocated from  > > one processor  > > to another at
>  > >  > run-time. After each such relocation, the code and data  > >
>  > >  > associated with the relocated thread is moved from the cache memory
>  > of
>  > >  > > > the 0113 processor to the cache of the new processor. This
>  > reduces
>  > >  > the  > > performance. Run-time relocation of threads to processors
>  > can
>  > >  > also  > > result in unpredictable response times. This is a problem
>  > in
>  > >  > systems  > > which operate in a real-time environment. In order to
>  > >  > avoid poor  > > memory performance and unpredictable real-time
>  > >  > behaviour due to  > > frequent thread relocation, threads can be
>  > bound
>  > >  > to processors using  > > the processor-bind directive [3] [5]. The
>  > >  > major problem with binding  > > threads is that one can end up with
>  > an
>  > >  > unbalanced load, i.e. some  > > processors may be extremely busy
>  > >  > during some time periods while other  > > processors are idle."
>  > >  >  > >
>  > >  >  > > -Glen
>  > >  >  > >
>  > >  >  > > On 21/04/2008, Glen Newton <[hidden email]> wrote:
>  > >  >  > >
>  > >  >  > > > And this discussion on bound threads may also shed light on
>  > things:
>  > >  >  > > >
>  > >  >  > > >
>  > >  >
>  > http://coding.derkeiler.com/Archive/Java/comp.lang.java.programmer/200
>  > >  > 7-11/msg02801.html
>  > >  >  > > >
>  > >  >  > > >
>  > >  >  > > >  -Glen
>  > >  >  > > >
>  > >  >  > > >
>  > >  >  > > >  On 21/04/2008, Glen Newton <[hidden email]> wrote:
>  > >  >  > > >  > BInding threads to processors - in many situations -
>  > >  > improves  > > >  >  throughput by reducing memory overhead. When a
>  > >  > thread is running  > > > on a  > > >  >  core, its state is local; if
>  > >  > it is timeshared-out and either 1)  > > >  >  swapped back in on the
>  > >  > same core, it is likely that there will be  > > >  the  > > >  >
>  > >  > core's L1 cache; or 2) onto another core, there will not be a  > > >
>  > >  > cache  > > >  >  hit and the state will have to be fetched from L2 or
>  > >  > main memory,  > > >  >  incurring a performance hit, esp in the
>  > >  > latter. See Lundberg, L.
>  > >  >  > > > 1997.
>  > >  >  > > >  >  Evaluating the Performance Implications of Binding Threads
>  > >  > to  > > >  >  Processors. 393.
>  > >  >  > > > http://ieeexplore.ieee.org/iel3/5020/13768/00634520.pdf
>  > >  >  > > >  >  for more info.
>  > >  >  > > >  >
>  > >  >  > > >  >  If you are using JVM on Solaris on SPARC, you should take
>  > a
>  > >  > look  > > > at  > > >  >  the following links for tuning (the Sun JVM
>  > >  > on Solaris SPARC has  > > > many  > > >  >  more performance tuning
>  > >  > parameters available), including  > > > threading:
>  > >  >  > > >  >  - http://java.sun.com/docs/hotspot/threads/threads.html
>  > >  >  > > >  >  -
>  > >  >  > > >
>  > >  > http://java.sun.com/j2se/1.5.0/docs/guide/vm/thread-priorities.html
>  > >  >  > > >  >  -
>  > >  >  > > >
>  > >  >
>  > http://www-1.ibm.com/support/docview.wss?rs=180&context=SSEQTP&uid=swg
>  > >  > 21107291  > > >  >  -
>  > >  > http://java.sun.com/javase/technologies/performance.jsp
>  > >  >  > > >  >
>  > >  >  > > >  >
>  > >  >  > > >  >  -Glen
>  > >  >  > > >  >
>  > >  >  > > >  >
>  > >  >  > > >  >
>  > >  >  > > >  >
>  > >  >  > > >  >
>  > >  >  > > >  >
>  > >  >  > > >  >
>  > >  >  > > >  >  On 21/04/2008, Ulf Dittmer <[hidden email]> wrote:
>  > >  >  > > >  >  > This sounds odd. Why would restricting it to a single  >
>  > >  > > >  >  >  core improve performance? The point of using multiple  > >
>  > >  > >  >  >  cores (and multiple threads) is to improve performance  > >
>  > >
>  > >  > >  >  isn't it? I'd leave thread scheduling decisions to the  > > >
>  >  >
>  > >  > >  JVM. Plus, I don't think there is anything in Java to  > > >  >  >
>  > >  > facilitate this (short of using JNI).
>  > >  >  > > >  >  >
>  > >  >  > > >  >  >  Are you talking about indexing or searching? You may  >
>  > >  > > >  >  >  be able to use multiple parallel threads to improve  > > >
>  > >  > >  >  indexing performance. I don't think Lucene uses  > > >  >  >
>  > >  > multi-threading for searching; not unless you have  > > >  >  >
>  > >  > multiple indices, anyway.
>  > >  >  > > >  >  >
>  > >  >  > > >  >  >  Ulf
>  > >  >  > > >  >  >
>  > >  >  > > >  >  >
>  > >  >  > > >  >  >
>  > >  >  > > >  >  >  --- Anshum <[hidden email]> wrote:
>  > >  >  > > >  >  >
>  > >  >  > > >  >  >  > Hi,
>  > >  >  > > >  >  >  >
>  > >  >  > > >  >  >  > I have been trying to bind my lucene instance (JVM -
>  > >  > > > >  >  >  > Sun Hotspot*) to a  > > >  >  >  > particular core so
>  > >  > as to improve the performance. Is  > > >  >  >  > there a way to do
>  > so
>  > >  > or  > > >  >  >  > is there support in lucene to explicitly control
>  > >  > the  > > >  >  >  > thread - processor  > > >  >  >  > linkup?
>  > >  >  > > >  >  >  >
>  > >  >  > > >  >  >  > --
>  > >  >  > > >  >  >  > --
>  > >  >  > > >  >  >  > The facts expressed here belong to everybody, the  >
>  > >
>  > >  > >  >  >  > opinions to me.
>  > >  >  > > >  >  >  > The distinction is yours to draw............
>  > >  >  > > >  >  >  >
>  > >  >  > > >  >  >
>  > >  >  > > >  >  >
>  > >  >  > > >  >  >
>  > >  >  > > >  >  >
>  > >  >  > > >  >  >
>  > >  >  > > >
>  > >  >
>  > ______________________________________________________________________
>  > >  > ______________  > > >  >  >  Be a better friend, newshound, and  > >
>  > >
>  > >  > >  >  know-it-all with Yahoo! Mobile.  Try it now.
>  > >  >  > > > http://mobile.yahoo.com/;_ylt=Ahu06i62sR8HDtDypao8Wcj9tAcJ
>  > >  >  > > >  >  >
>  > >  >  > > >  >  >
>  > >  >  > > >
>  > >  > ---------------------------------------------------------------------
>  > >  >  > > >  >  >  To unsubscribe, e-mail:
>  > >  > [hidden email]
>  > >  >  > > >  >  >  For additional commands, e-mail:
>  > >  >  > > > [hidden email]  > > >  >  >  > > >  >  >  >
>  > >
>  > >  > >  >  > > >  >  > > >  >  > > >  > --  > > >  >  > > >  >  -  > > >
>  >  >
>  > >  > > > >  > > >  > > >  > > > --  > > >  > > >  -  > > >  > > >  > >  >
>  > >
>  > >  > >  >
>  > >  > ---------------------------------------------------------------------
>  > >  >  > To unsubscribe, e-mail: [hidden email]
>  > >  >  > For additional commands, e-mail: [hidden email]
>  > >  > >  >
>  > >  >
>  > >  >
>  > >  >  --
>  > >  >
>  > >  >
>  > >  > --
>  > >  >  The facts expressed here belong to everybody, the opinions to me.
>  > >  >  The distinction is yours to draw............
>  > >  >
>  > >
>  > >
>  > >
>  > >  --
>  > >
>  > >  -
>  > >
>  > >
>  > >
>  > >  ---------------------------------------------------------------------
>  > >  To unsubscribe, e-mail: [hidden email]
>  > >  For additional commands, e-mail: [hidden email]
>  > >
>  > >
>  >
>  >
>  >
>  > --
>  >
>  > -
>  >
>  > ---------------------------------------------------------------------
>  > To unsubscribe, e-mail: [hidden email]
>  > For additional commands, e-mail: [hidden email]
>  >
>  >
>
>
>  --
>  --
>  The facts expressed here belong to everybody, the opinions to me.
>  The distinction is yours to draw............
>



--

-

---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

Reply | Threaded
Open this post in threaded view
|

Re: Binding lucene instance/threads to a particular processor(or core)

Anshum-2
Hi Glen,

I am using Red Hat Enterprise Linux ES release 4, kernel : 2.6.9-55.ELsmp.
Its a 32 bit Dual processor, HT enabled machine with 12G of RAM.
The JVM would be : Java HotSpot(TM) Client VM (build 1.6.0_02-ea-b02, mixed
mode)

and Yes, I am using a single searcher instance for all searches.'
As far as increasing the thread priority is concerned, would certainly give
it a shot as well :)

I understand that there would be an improvement if I move from the Dual
Processor Single Core HT enabled machine to a Single Processor Quad Core or
a Dual Processor Dual Core machine but I still don't have any stats for
translating the upgrade to a quantitative performance improvement value.
Also, let me run the profiler and analyze the bottleneck. I have already
done it at a macro level, let me dig down a little deeper.

--
Anshum

On Thu, Apr 24, 2008 at 12:47 AM, Glen Newton <[hidden email]> wrote:

> Hi Anshum,
>
> 2008/4/23 Anshum <[hidden email]>:
> > Hi Glen,
> >
> >  As far as stats for index/search are concerned, here they are:
> >  * Yes, it is a web based application
> >  * I am currently facing issues when the number of concurrent searches
> goes
> >  high. The search is not able to handle over 2.5 searches per second.
> What is the OS, hardware etc you are using?
>
> I am assuming you are only opening one instance of IndexSearcher for
> all of your threads/queries (I would do it in the servlet init()
> method) to use, as indicated in
>
> http://lucene.apache.org/java/2_3_1/api/org/apache/lucene/search/IndexSearcher.html
>
> >  * JVM command line parameters: -server mode; Max heap size 2GB (As it
> is a
> >  32 bit machine/OS and that is the limit in the case)
> What JVM/version?
>
> Have you tried running a profiler to see what is happening or even
> jconsole (add  -Dcom.sun.management.jmxremote to your command line
> parameters and (in Linux at least) run jconsole after the JVM has
> started.
>
> A general suggestion:
>  -XX:+AggressiveOpts -XX:+ScavengeBeforeFullGC   -Xms2000m
> -XX:+UseConcMarkSweepGC
>
> Although the garbage collection depends on your configuration.
>
> >  * Performance expectations :
> >   * Mean search time :1-2 seconds (though that is a lot considering that
> I
> >  have tried other engines, though with smaller indexes and they seem to
> have
> >  sub 0.01s searching time)
>
> Yes, I am also surprised by this poor performance.
>
> >  * No, I do not store any fields.
> >  * Yes, I am using multiple machines ( more than a couple ).serving the
> same
> >  index; a little more than just a vanilla load balancing mechanism;
> sending
> >  similar requests to the same server to better OS level caching
> >  * We do not index and search on the same machine, so in that case, I
> would
> >  not think that changing the priority of these threads would matter. In
> other
> >  words, there is nothing but search that happens on the search server.
> If
> >  that is what you meant when you wrote about increasing the thead
> priority.
>
> Your servlet environment has many threads that have to do a lot of
> things before they get around to the part of your code that does the
> searching. Once they get to the search part, you want them to have
> higher priority than the threads that are handling requests much
> further back in the servlet pipeline. So set the thread priority
> higher when it gets to your code. Of course you would need to test to
> see the impact on performance in your particular case.
>
> >
> >  I had already started trying the TPE and  guess it would provide a lot
> of
> >  improvement but 'm looking for other ways as well ! :)
>
> -glen
>
>
> >  --
> >  Anshum
> >
> >
> >
> >  On Tue, Apr 22, 2008 at 11:19 PM, Glen Newton <[hidden email]>
> wrote:
> >
> >  > So even if you only have one index, this is the way to go to manage
> >  > this kind of problem.
> >  >
> >  > Looking at the implementation and having used ThreadPoolExecutor
> (TPE)
> >  > a lot, I would make the following suggestions for this class so as to
> >  > better support this particular use case:
> >  > Better access to the configuration of the TPE is needed: the ability
> >  > to choose the number of threads (pools size and max pool size), the
> >  > type and nature of the queue, etc. Also, the default behaviour of TPE
> >  > is to throw an exception when a job is submitted and the queue is
> >  > full: throwing an exception is expensive, especially when dozens or
> >  > hundreds of searches are being rejected and many exceptions are
> >  > occurring in a high load situation. Instead, being able to set the
> >  > RejectedExecutionHandler on the TPE would allow for a graceful
> >  > handling of rejected queries (feedback to user application, etc).
> >  > Also, TPE allows for a custom ThreadFactory, which I use to produce
> >  > threads with the highest priority to do the searching.
> >  >
> >  > Right now the implementation sets the #threads and queue size as a
> >  > function of the number of Searchables, which is reasonable for this
> >  > use case. But it would generalize better if these were a function of
> >  > the number of cores on the machine, or some combination.
> >  >
> >  > That said, I would suggest having adding the ability to set the
> >  > ThreadPoolExecutor completely, with a getter/setter. This would allow
> >  > this class to be useful beyond the use case of multiple indexes,
> >  > becoming more generalizable to a number of use cases including
> >  > allowing it to support the use case of one (or more indexes) and a
> >  > high work load of queries that need to be managed. It could use the
> >  > same defaults if the TPE is not set externally (is null).
> >  >
> >  > -Glen
> >  >
> >  > 2008/4/22 Renaud Waldura <[hidden email]>:
> >  > > > one solution is to set-up a ThreadPoolExecutor[2] with a fixed
> >  > >  > number of threads and a limited queue size (use a bound
> >  > BlockingQueue[3])
> >  > >
> >  > >  Yes, this is precisely how the ConcurrentMultiSearcher works.
> >  > >  https://issues.apache.org/jira/browse/LUCENE-423
> >  > >
> >  > >
> >  > >
> >  > >
> >  > >  -----Original Message-----
> >  > >  From: Glen Newton [mailto:[hidden email]]
> >  > >  Sent: Tuesday, April 22, 2008 5:40 AM
> >  > >  To: [hidden email]
> >  > >  Subject: Re: Binding lucene instance/threads to a particular
> >  > processor(or
> >  > >  core)
> >  > >
> >  > >
> >  > >
> >  > > Anshun,
> >  > >
> >  > >  I think I am dealing with an index of similar scale: 6.4 million
> >  > records, 83
> >  > >  GB index (see [1] for more info)
> >  > >
> >  > >  I mistakenly thought from your original posting that you were
> >  > interested in
> >  > >  binding threads to processors for indexing, but it is sounding
> like you
> >  > want
> >  > >  to do this for searching. I am not sure if this would work well
> for
> >  > >  searching, as there is a great deal more ephemeral state. But I am
> not
> >  > sure.
> >  > >
> >  > >  With respect to handling concurrency for search, could you
> describe
> >  > your
> >  > >  environment better?
> >  > >  - Is it an web-base application?
> >  > >  - What sort of problems do you have now?
> >  > >  - What are your java command line parameters (heap, etc.)
> >  > >  - What are the performance expectations, i.e. average search < N1
> >  > seconds;
> >  > >  median search time < N2 seconds
> >  > >  - Are you storing any fields and if so, do you need to store so
> many?
> >  > >  - Do you have the choice of serving searches from multiple
> machines?
> >  > >  i.e. load balance across >1 machine; We've found this to be one of
> the
> >  > best
> >  > >  solutions for scaling;
> >  > >  - One thing that I do for both indexing and searching is that the
> >  > threads
> >  > >  that are doing these tasks I always shift their priority to MAX,
> so
> >  > that
> >  > >  they are run in preference to threads doing other things, like
> >  > preparing
> >  > >  Documents, etc.
> >  > >
> >  > >  If it is a web-type environment, one solution is to set-up a
> >  > >  ThreadPoolExecutor[2] with a fixed number of threads and a limited
> >  > queue
> >  > >  size (use a bound BlockingQueue[3]) . You would have to experiment
> with
> >  > the
> >  > >  numbers to get the sweet-spot for your situation.
> >  > >  I would suggest starting with 2 times the number processors
> (cores) and
> >  > a
> >  > >  queue of say 20. Requests are queued, but if the request cannot be
> >  > queued,
> >  > >  at least the application then knows that it is too busy and you
> can
> >  > give the
> >  > >  user a message "The system is too busy at this moment, please try
> again
> >  > in a
> >  > >  few seconds..." kind of thing. The advantage is also that when
> things
> >  > are
> >  > >  very busy, you have the accepted requests handled in a reasonable
> >  > amount of
> >  > >  time, and some users being told things are too busy, as opposed to
> all
> >  > the
> >  > >  requests going in and the system thrashing (and perhaps
> running-out of
> >  > >  memory at the same time) and everyone's queries being very slow.
> >  > >
> >  > >  But I would suggest setting-up a test harness to emulate your
> >  > production
> >  > >  conditions to try these things out...
> >  > >
> >  > >  -Glen
> >  > >
> >  > >  [1]
> >  >
> http://zzzoot.blogspot.com/2008/04/lucene-indexing-performance-benchmarks
> >  > >  .html
> >  > >  [2]
> >  >
> http://java.sun.com/j2se/1.5.0/docs/api/java/util/concurrent/ThreadPoolEx
> >  > >  ecutor.html
> >  > >  [3]
> >  >
> http://java.sun.com/j2se/1.5.0/docs/api/java/util/concurrent/BlockingQueu
> >  > >  e.html
> >  > >
> >  > >
> >  > >  2008/4/22 Anshum <[hidden email]>:
> >  > >  > The paper seems pretty good but I am still wondering if there
> was a
> >  > >  > way to  achieve this through the command line parameters. I'm
> just
> >  > >  > trying this to  optimize the code, if this works, would let all
> know
> >  > >  > else would keep  everyone informed :)  Any other suggestions for
> >  > >  > handling a concurrency of over 7 search requests  per second for
> an
> >  > >  > index size of over 15Gigs containing over 13 million  records?
> >  > >  >  Also, could someone help me with obtaining a 'index size' -
> >  > >  > 'concurrency' -  'processor power' - 'memory' relationship
> formula
> >  > (or
> >  > >  something similar)?
> >  > >  >
> >  > >  >  --
> >  > >  >  Anshum
> >  > >  >
> >  > >  >
> >  > >  >
> >  > >  >
> >  > >  >  On Tue, Apr 22, 2008 at 3:55 AM, Antony Bowesman <
> [hidden email]>
> >  > >  wrote:
> >  > >  >
> >  > >  >  > That paper from 1997 is pretty old, but mirrors our
> experiences in
> >  > >  > those  > days. Then, we used Solaris processor sets to really
> improve
> >  > >  > performance by  > binding one of our processes to a particular
> CPU
> >  > >  > while leaving the other  > CPUs to manage the thread intensive
> work.
> >  > >  >  >
> >  > >  >  > You can bind processes/LWPs to a CPU on Solaris with psrset.
> >  > >  >  >
> >  > >  >  > The Solaris thread model in the late '90s was also a
> significant
> >  > >  > factor in  > performance of multi-threaded programs.  The
> default
> >  > >  > thread library in  > Solaris 8 implemented a MxN unbound thread
> model
> >  > >  > (threads/LWPS).  In those  > days we found that it did not
> perform
> >  > >  > well, so used the bound thread model  > (i.e. 1:1) where a
> Solaris
> >  > >  > thread was bound permanently to an LWP.  That  > improved
> performance
> >  > >  > a lot.  In Solaris 8, Sun had what they called the  >
> 'alternate'
> >  > >  > thread library (T2) around 2000, which became the default  >
> library
> >  > >  > in Solaris 9, and implemented a 1:1 model of Solaris threads to
>  >
> >  > LWPs.
> >  > >  That new library had dramatic performance improvements over the
> old.
> >  > >  >  >
> >  > >  >  > Some background info for Java and threading  >  >
> >  > >  >
> http://java.sun.com/j2se/1.5.0/docs/guide/vm/thread-priorities.html
> >  > >  >  >
> >  > >  >  > Antony
> >  > >  >  >
> >  > >  >  >
> >  > >  >  >
> >  > >  >  > Glen Newton wrote:
> >  > >  >  >
> >  > >  >  > > I realised that not everyone on this list might be able to
> >  > access
> >  > >  > the  > > IEEE paper I pointed-out, so I will include the
> abstract and
> >  > >  > some  > > paragraphs from the paper which I have included below.
> >  > >  >  > >
> >  > >  >  > > Also of interest (and should be available to all): Fedorova
> et
> >  > al.
> >  > >  >  > > 2005. Performance of Multithreaded Chip Multiprocessors And
>  > >
> >  > >  > Implications For Operating System Design. Usenix 2005.
> >  > >  >  > > http://www.eecs.harvard.edu/margo/papers/usenix05/paper.pdf
> >  > >  >  > > "Abstract: We investigated how operating system design
> should be
> >  > >  > > > adapted for multithreaded chip multiprocessors (CMT) - a new
>  > >
> >  > >  > generation of processors that exploit thread-level parallelism
> to
> >  > mask
> >  > >  > > > the memory latency in modern workloads. We  > > determined
> that
> >  > >  > the L2 cache is a critical shared resource on CMT and  > > that
> an
> >  > >  > insufficient amount of L2 cache can undermine the ability to  >
> >
> >  > hide
> >  > >  > memory latency on these processors. To use the L2 cache as  > >
> >  > >  > efficiently as possible, we propose an L2-conscious scheduling
>  > >
> >  > >  > algorithm and quantify its performance potential. Using this
> >  > algorithm
> >  > >  > > > it is possible to reduce miss ratios in the L2 cache by
> 25-37%
> >  > and
> >  > >  > > > improve processor throughput by 27-45%."
> >  > >  >  > >
> >  > >  >  > >
> >  > >  >  > > From Lundberg, L. 1997:
> >  > >  >  > > Abstract: "The default scheduling algorithm in Solaris and
> other
> >  > >  > > > operating systems may result in frequent relocation of
> threads at
> >  > >  > > > run-time. Excessive thread relocation cause  > > poor memory
> >  > >  > performance. This can be avoided by binding threads to  > >
> >  > >  > processors. However, binding threads to processors may result in
> an
> >  >  >
> >  > >  > > unbalanced load. By considering a previously obtained
> theoretical
> >  >  >
> >  > >  > > result and by evaluating a set of multithreaded Solaris  > >
> >  > >  > programs using a multiprocessor with 8 processors, we are able
> to  >
> >  > >
> >  > >  > bound the maximum performance loss due to binding threads, The
>  > >
> >  > >  > theoretical result is also recapitulated. By evaluating another
> set
> >  > of
> >  > >  > > > multithreaded programs, we show that the gain of binding
> threads
> >  > >  > to  > > processors may be substantial, particularly for programs
> with
> >  > >  > fine  > > grained parallelism."
> >  > >  >  > >
> >  > >  >  > > First paragraph: "The thread concept in Solaris [3][5] and
> other
> >  > >  > > > operating systems makes it possible to write multithreaded
> >  > >  > programs,  > > which can be executed in parallel on a
> multiprocessor.
> >  > >  > Previous  > > experience from real world programs [4] show that,
> >  > using
> >  > >  > the default  > > scheduling algorithm in Solaris, threads are
> >  > >  > frequently relocated from  > > one processor  > > to another at
> >  > >  > run-time. After each such relocation, the code and data  > >
> >  > >  > associated with the relocated thread is moved from the cache
> memory
> >  > of
> >  > >  > > > the 0113 processor to the cache of the new processor. This
> >  > reduces
> >  > >  > the  > > performance. Run-time relocation of threads to
> processors
> >  > can
> >  > >  > also  > > result in unpredictable response times. This is a
> problem
> >  > in
> >  > >  > systems  > > which operate in a real-time environment. In order
> to
> >  > >  > avoid poor  > > memory performance and unpredictable real-time
> >  > >  > behaviour due to  > > frequent thread relocation, threads can be
> >  > bound
> >  > >  > to processors using  > > the processor-bind directive [3] [5].
> The
> >  > >  > major problem with binding  > > threads is that one can end up
> with
> >  > an
> >  > >  > unbalanced load, i.e. some  > > processors may be extremely busy
> >  > >  > during some time periods while other  > > processors are idle."
> >  > >  >  > >
> >  > >  >  > > -Glen
> >  > >  >  > >
> >  > >  >  > > On 21/04/2008, Glen Newton <[hidden email]> wrote:
> >  > >  >  > >
> >  > >  >  > > > And this discussion on bound threads may also shed light
> on
> >  > things:
> >  > >  >  > > >
> >  > >  >  > > >
> >  > >  >
> >  >
> http://coding.derkeiler.com/Archive/Java/comp.lang.java.programmer/200
> >  > >  > 7-11/msg02801.html
> >  > >  >  > > >
> >  > >  >  > > >
> >  > >  >  > > >  -Glen
> >  > >  >  > > >
> >  > >  >  > > >
> >  > >  >  > > >  On 21/04/2008, Glen Newton <[hidden email]>
> wrote:
> >  > >  >  > > >  > BInding threads to processors - in many situations -
> >  > >  > improves  > > >  >  throughput by reducing memory overhead. When
> a
> >  > >  > thread is running  > > > on a  > > >  >  core, its state is
> local; if
> >  > >  > it is timeshared-out and either 1)  > > >  >  swapped back in on
> the
> >  > >  > same core, it is likely that there will be  > > >  the  > > >  >
> >  > >  > core's L1 cache; or 2) onto another core, there will not be a  >
> > >
> >  > >  > cache  > > >  >  hit and the state will have to be fetched from
> L2 or
> >  > >  > main memory,  > > >  >  incurring a performance hit, esp in the
> >  > >  > latter. See Lundberg, L.
> >  > >  >  > > > 1997.
> >  > >  >  > > >  >  Evaluating the Performance Implications of Binding
> Threads
> >  > >  > to  > > >  >  Processors. 393.
> >  > >  >  > > > http://ieeexplore.ieee.org/iel3/5020/13768/00634520.pdf
> >  > >  >  > > >  >  for more info.
> >  > >  >  > > >  >
> >  > >  >  > > >  >  If you are using JVM on Solaris on SPARC, you should
> take
> >  > a
> >  > >  > look  > > > at  > > >  >  the following links for tuning (the
> Sun JVM
> >  > >  > on Solaris SPARC has  > > > many  > > >  >  more performance
> tuning
> >  > >  > parameters available), including  > > > threading:
> >  > >  >  > > >  >  -
> http://java.sun.com/docs/hotspot/threads/threads.html
> >  > >  >  > > >  >  -
> >  > >  >  > > >
> >  > >  >
> http://java.sun.com/j2se/1.5.0/docs/guide/vm/thread-priorities.html
> >  > >  >  > > >  >  -
> >  > >  >  > > >
> >  > >  >
> >  >
> http://www-1.ibm.com/support/docview.wss?rs=180&context=SSEQTP&uid=swg
> >  > >  > 21107291  > > >  >  -
> >  > >  > http://java.sun.com/javase/technologies/performance.jsp
> >  > >  >  > > >  >
> >  > >  >  > > >  >
> >  > >  >  > > >  >  -Glen
> >  > >  >  > > >  >
> >  > >  >  > > >  >
> >  > >  >  > > >  >
> >  > >  >  > > >  >
> >  > >  >  > > >  >
> >  > >  >  > > >  >
> >  > >  >  > > >  >
> >  > >  >  > > >  >  On 21/04/2008, Ulf Dittmer <[hidden email]>
> wrote:
> >  > >  >  > > >  >  > This sounds odd. Why would restricting it to a
> single  >
> >  > >  > > >  >  >  core improve performance? The point of using multiple
>  > >
> >  > >  > >  >  >  cores (and multiple threads) is to improve performance
>  > >
> >  > >
> >  > >  > >  >  isn't it? I'd leave thread scheduling decisions to the  >
> > >
> >  >  >
> >  > >  > >  JVM. Plus, I don't think there is anything in Java to  > > >
>  >  >
> >  > >  > facilitate this (short of using JNI).
> >  > >  >  > > >  >  >
> >  > >  >  > > >  >  >  Are you talking about indexing or searching? You
> may  >
> >  > >  > > >  >  >  be able to use multiple parallel threads to improve
>  > > >
> >  > >  > >  >  indexing performance. I don't think Lucene uses  > > >  >
>  >
> >  > >  > multi-threading for searching; not unless you have  > > >  >  >
> >  > >  > multiple indices, anyway.
> >  > >  >  > > >  >  >
> >  > >  >  > > >  >  >  Ulf
> >  > >  >  > > >  >  >
> >  > >  >  > > >  >  >
> >  > >  >  > > >  >  >
> >  > >  >  > > >  >  >  --- Anshum <[hidden email]> wrote:
> >  > >  >  > > >  >  >
> >  > >  >  > > >  >  >  > Hi,
> >  > >  >  > > >  >  >  >
> >  > >  >  > > >  >  >  > I have been trying to bind my lucene instance
> (JVM -
> >  > >  > > > >  >  >  > Sun Hotspot*) to a  > > >  >  >  > particular
> core so
> >  > >  > as to improve the performance. Is  > > >  >  >  > there a way to
> do
> >  > so
> >  > >  > or  > > >  >  >  > is there support in lucene to explicitly
> control
> >  > >  > the  > > >  >  >  > thread - processor  > > >  >  >  > linkup?
> >  > >  >  > > >  >  >  >
> >  > >  >  > > >  >  >  > --
> >  > >  >  > > >  >  >  > --
> >  > >  >  > > >  >  >  > The facts expressed here belong to everybody,
> the  >
> >  > >
> >  > >  > >  >  >  > opinions to me.
> >  > >  >  > > >  >  >  > The distinction is yours to draw............
> >  > >  >  > > >  >  >  >
> >  > >  >  > > >  >  >
> >  > >  >  > > >  >  >
> >  > >  >  > > >  >  >
> >  > >  >  > > >  >  >
> >  > >  >  > > >  >  >
> >  > >  >  > > >
> >  > >  >
> >  >
> ______________________________________________________________________
> >  > >  > ______________  > > >  >  >  Be a better friend, newshound, and
>  > >
> >  > >
> >  > >  > >  >  know-it-all with Yahoo! Mobile.  Try it now.
> >  > >  >  > > >
> http://mobile.yahoo.com/;_ylt=Ahu06i62sR8HDtDypao8Wcj9tAcJ
> >  > >  >  > > >  >  >
> >  > >  >  > > >  >  >
> >  > >  >  > > >
> >  > >  >
> ---------------------------------------------------------------------
> >  > >  >  > > >  >  >  To unsubscribe, e-mail:
> >  > >  > [hidden email]
> >  > >  >  > > >  >  >  For additional commands, e-mail:
> >  > >  >  > > > [hidden email]  > > >  >  >  > > >  >
>  >  >
> >  > >
> >  > >  > >  >  > > >  >  > > >  >  > > >  > --  > > >  >  > > >  >  -  >
> > >
> >  >  >
> >  > >  > > > >  > > >  > > >  > > > --  > > >  > > >  -  > > >  > > >  >
> >  >
> >  > >
> >  > >  > >  >
> >  > >  >
> ---------------------------------------------------------------------
> >  > >  >  > To unsubscribe, e-mail:
> [hidden email]
> >  > >  >  > For additional commands, e-mail:
> [hidden email]
> >  > >  > >  >
> >  > >  >
> >  > >  >
> >  > >  >  --
> >  > >  >
> >  > >  >
> >  > >  > --
> >  > >  >  The facts expressed here belong to everybody, the opinions to
> me.
> >  > >  >  The distinction is yours to draw............
> >  > >  >
> >  > >
> >  > >
> >  > >
> >  > >  --
> >  > >
> >  > >  -
> >  > >
> >  > >
> >  > >
> >  > >
>  ---------------------------------------------------------------------
> >  > >  To unsubscribe, e-mail: [hidden email]
> >  > >  For additional commands, e-mail: [hidden email]
> >  > >
> >  > >
> >  >
> >  >
> >  >
> >  > --
> >  >
> >  > -
> >  >
> >  > ---------------------------------------------------------------------
> >  > To unsubscribe, e-mail: [hidden email]
> >  > For additional commands, e-mail: [hidden email]
> >  >
> >  >
> >
> >
> >  --
> >  --
> >  The facts expressed here belong to everybody, the opinions to me.
> >  The distinction is yours to draw............
> >
>
>
>
> --
>
> -
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [hidden email]
> For additional commands, e-mail: [hidden email]
>
>


--
--
The facts expressed here belong to everybody, the opinions to me.
The distinction is yours to draw............
Reply | Threaded
Open this post in threaded view
|

Re: Binding lucene instance/threads to a particular processor(or core)

Anshum-2
In reply to this post by Glen Newton
Hi Glen,

After starting the JVM as you suggested, the daemon (actually the JVM) crashed unexpectedly.
I didn't really understand the reason behind it from the log (attached), could you just have a look and help me deciphering the same.

Thanks,
Anshum



On Thu, Apr 24, 2008 at 12:47 AM, Glen Newton <[hidden email]> wrote:
Hi Anshum,

2008/4/23 Anshum <[hidden email]>:
> Hi Glen,
>
>  As far as stats for index/search are concerned, here they are:
>  * Yes, it is a web based application
>  * I am currently facing issues when the number of concurrent searches goes
>  high. The search is not able to handle over 2.5 searches per second.
What is the OS, hardware etc you are using?

I am assuming you are only opening one instance of IndexSearcher for
all of your threads/queries (I would do it in the servlet init()
method) to use, as indicated in
http://lucene.apache.org/java/2_3_1/api/org/apache/lucene/search/IndexSearcher.html

>  * JVM command line parameters: -server mode; Max heap size 2GB (As it is a
>  32 bit machine/OS and that is the limit in the case)
What JVM/version?

Have you tried running a profiler to see what is happening or even
jconsole (add  -Dcom.sun.management.jmxremote to your command line
parameters and (in Linux at least) run jconsole after the JVM has
started.

A general suggestion:
 -XX:+AggressiveOpts -XX:+ScavengeBeforeFullGC   -Xms2000m
-XX:+UseConcMarkSweepGC

Although the garbage collection depends on your configuration.

>  * Performance expectations :
>   * Mean search time :1-2 seconds (though that is a lot considering that I
>  have tried other engines, though with smaller indexes and they seem to have
>  sub 0.01s searching time)

Yes, I am also surprised by this poor performance.

>  * No, I do not store any fields.
>  * Yes, I am using multiple machines ( more than a couple ).serving the same
>  index; a little more than just a vanilla load balancing mechanism; sending
>  similar requests to the same server to better OS level caching
>  * We do not index and search on the same machine, so in that case, I would
>  not think that changing the priority of these threads would matter. In other
>  words, there is nothing but search that happens on the search server. If
>  that is what you meant when you wrote about increasing the thead priority.

Your servlet environment has many threads that have to do a lot of
things before they get around to the part of your code that does the
searching. Once they get to the search part, you want them to have
higher priority than the threads that are handling requests much
further back in the servlet pipeline. So set the thread priority
higher when it gets to your code. Of course you would need to test to
see the impact on performance in your particular case.

>
>  I had already started trying the TPE and  guess it would provide a lot of
>  improvement but 'm looking for other ways as well ! :)

-glen


>  --
>  Anshum
>
>
>
>  On Tue, Apr 22, 2008 at 11:19 PM, Glen Newton <[hidden email]> wrote:
>
>  > So even if you only have one index, this is the way to go to manage
>  > this kind of problem.
>  >
>  > Looking at the implementation and having used ThreadPoolExecutor (TPE)
>  > a lot, I would make the following suggestions for this class so as to
>  > better support this particular use case:
>  > Better access to the configuration of the TPE is needed: the ability
>  > to choose the number of threads (pools size and max pool size), the
>  > type and nature of the queue, etc. Also, the default behaviour of TPE
>  > is to throw an exception when a job is submitted and the queue is
>  > full: throwing an exception is expensive, especially when dozens or
>  > hundreds of searches are being rejected and many exceptions are
>  > occurring in a high load situation. Instead, being able to set the
>  > RejectedExecutionHandler on the TPE would allow for a graceful
>  > handling of rejected queries (feedback to user application, etc).
>  > Also, TPE allows for a custom ThreadFactory, which I use to produce
>  > threads with the highest priority to do the searching.
>  >
>  > Right now the implementation sets the #threads and queue size as a
>  > function of the number of Searchables, which is reasonable for this
>  > use case. But it would generalize better if these were a function of
>  > the number of cores on the machine, or some combination.
>  >
>  > That said, I would suggest having adding the ability to set the
>  > ThreadPoolExecutor completely, with a getter/setter. This would allow
>  > this class to be useful beyond the use case of multiple indexes,
>  > becoming more generalizable to a number of use cases including
>  > allowing it to support the use case of one (or more indexes) and a
>  > high work load of queries that need to be managed. It could use the
>  > same defaults if the TPE is not set externally (is null).
>  >
>  > -Glen
>  >
>  > 2008/4/22 Renaud Waldura <[hidden email]>:
>  > > > one solution is to set-up a ThreadPoolExecutor[2] with a fixed
>  > >  > number of threads and a limited queue size (use a bound
>  > BlockingQueue[3])
>  > >
>  > >  Yes, this is precisely how the ConcurrentMultiSearcher works.
>  > >  https://issues.apache.org/jira/browse/LUCENE-423
>  > >
>  > >
>  > >
>  > >
>  > >  -----Original Message-----
>  > >  From: Glen Newton [mailto:[hidden email]]
>  > >  Sent: Tuesday, April 22, 2008 5:40 AM
>  > >  To: [hidden email]
>  > >  Subject: Re: Binding lucene instance/threads to a particular
>  > processor(or
>  > >  core)
>  > >
>  > >
>  > >
>  > > Anshun,
>  > >
>  > >  I think I am dealing with an index of similar scale: 6.4 million
>  > records, 83
>  > >  GB index (see [1] for more info)
>  > >
>  > >  I mistakenly thought from your original posting that you were
>  > interested in
>  > >  binding threads to processors for indexing, but it is sounding like you
>  > want
>  > >  to do this for searching. I am not sure if this would work well for
>  > >  searching, as there is a great deal more ephemeral state. But I am not
>  > sure.
>  > >
>  > >  With respect to handling concurrency for search, could you describe
>  > your
>  > >  environment better?
>  > >  - Is it an web-base application?
>  > >  - What sort of problems do you have now?
>  > >  - What are your java command line parameters (heap, etc.)
>  > >  - What are the performance expectations, i.e. average search < N1
>  > seconds;
>  > >  median search time < N2 seconds
>  > >  - Are you storing any fields and if so, do you need to store so many?
>  > >  - Do you have the choice of serving searches from multiple machines?
>  > >  i.e. load balance across >1 machine; We've found this to be one of the
>  > best
>  > >  solutions for scaling;
>  > >  - One thing that I do for both indexing and searching is that the
>  > threads
>  > >  that are doing these tasks I always shift their priority to MAX, so
>  > that
>  > >  they are run in preference to threads doing other things, like
>  > preparing
>  > >  Documents, etc.
>  > >
>  > >  If it is a web-type environment, one solution is to set-up a
>  > >  ThreadPoolExecutor[2] with a fixed number of threads and a limited
>  > queue
>  > >  size (use a bound BlockingQueue[3]) . You would have to experiment with
>  > the
>  > >  numbers to get the sweet-spot for your situation.
>  > >  I would suggest starting with 2 times the number processors (cores) and
>  > a
>  > >  queue of say 20. Requests are queued, but if the request cannot be
>  > queued,
>  > >  at least the application then knows that it is too busy and you can
>  > give the
>  > >  user a message "The system is too busy at this moment, please try again
>  > in a
>  > >  few seconds..." kind of thing. The advantage is also that when things
>  > are
>  > >  very busy, you have the accepted requests handled in a reasonable
>  > amount of
>  > >  time, and some users being told things are too busy, as opposed to all
>  > the
>  > >  requests going in and the system thrashing (and perhaps running-out of
>  > >  memory at the same time) and everyone's queries being very slow.
>  > >
>  > >  But I would suggest setting-up a test harness to emulate your
>  > production
>  > >  conditions to try these things out...
>  > >
>  > >  -Glen
>  > >
>  > >  [1]
>  > http://zzzoot.blogspot.com/2008/04/lucene-indexing-performance-benchmarks
>  > >  .html
>  > >  [2]
>  > http://java.sun.com/j2se/1.5.0/docs/api/java/util/concurrent/ThreadPoolEx
>  > >  ecutor.html
>  > >  [3]
>  > http://java.sun.com/j2se/1.5.0/docs/api/java/util/concurrent/BlockingQueu
>  > >  e.html
>  > >
>  > >
>  > >  2008/4/22 Anshum <[hidden email]>:
>  > >  > The paper seems pretty good but I am still wondering if there was a
>  > >  > way to  achieve this through the command line parameters. I'm just
>  > >  > trying this to  optimize the code, if this works, would let all know
>  > >  > else would keep  everyone informed :)  Any other suggestions for
>  > >  > handling a concurrency of over 7 search requests  per second for an
>  > >  > index size of over 15Gigs containing over 13 million  records?
>  > >  >  Also, could someone help me with obtaining a 'index size' -
>  > >  > 'concurrency' -  'processor power' - 'memory' relationship formula
>  > (or
>  > >  something similar)?
>  > >  >
>  > >  >  --
>  > >  >  Anshum
>  > >  >
>  > >  >
>  > >  >
>  > >  >
>  > >  >  On Tue, Apr 22, 2008 at 3:55 AM, Antony Bowesman <[hidden email]>
>  > >  wrote:
>  > >  >
>  > >  >  > That paper from 1997 is pretty old, but mirrors our experiences in
>  > >  > those  > days. Then, we used Solaris processor sets to really improve
>  > >  > performance by  > binding one of our processes to a particular CPU
>  > >  > while leaving the other  > CPUs to manage the thread intensive work.
>  > >  >  >
>  > >  >  > You can bind processes/LWPs to a CPU on Solaris with psrset.
>  > >  >  >
>  > >  >  > The Solaris thread model in the late '90s was also a significant
>  > >  > factor in  > performance of multi-threaded programs.  The default
>  > >  > thread library in  > Solaris 8 implemented a MxN unbound thread model
>  > >  > (threads/LWPS).  In those  > days we found that it did not perform
>  > >  > well, so used the bound thread model  > (i.e. 1:1) where a Solaris
>  > >  > thread was bound permanently to an LWP.  That  > improved performance
>  > >  > a lot.  In Solaris 8, Sun had what they called the  > 'alternate'
>  > >  > thread library (T2) around 2000, which became the default  > library
>  > >  > in Solaris 9, and implemented a 1:1 model of Solaris threads to  >
>  > LWPs.
>  > >  That new library had dramatic performance improvements over the old.
>  > >  >  >
>  > >  >  > Some background info for Java and threading  >  >
>  > >  > http://java.sun.com/j2se/1.5.0/docs/guide/vm/thread-priorities.html
>  > >  >  >
>  > >  >  > Antony
>  > >  >  >
>  > >  >  >
>  > >  >  >
>  > >  >  > Glen Newton wrote:
>  > >  >  >
>  > >  >  > > I realised that not everyone on this list might be able to
>  > access
>  > >  > the  > > IEEE paper I pointed-out, so I will include the abstract and
>  > >  > some  > > paragraphs from the paper which I have included below.
>  > >  >  > >
>  > >  >  > > Also of interest (and should be available to all): Fedorova et
>  > al.
>  > >  >  > > 2005. Performance of Multithreaded Chip Multiprocessors And  > >
>  > >  > Implications For Operating System Design. Usenix 2005.
>  > >  >  > > http://www.eecs.harvard.edu/margo/papers/usenix05/paper.pdf
>  > >  >  > > "Abstract: We investigated how operating system design should be
>  > >  > > > adapted for multithreaded chip multiprocessors (CMT) - a new  > >
>  > >  > generation of processors that exploit thread-level parallelism to
>  > mask
>  > >  > > > the memory latency in modern workloads. We  > > determined that
>  > >  > the L2 cache is a critical shared resource on CMT and  > > that an
>  > >  > insufficient amount of L2 cache can undermine the ability to  > >
>  > hide
>  > >  > memory latency on these processors. To use the L2 cache as  > >
>  > >  > efficiently as possible, we propose an L2-conscious scheduling  > >
>  > >  > algorithm and quantify its performance potential. Using this
>  > algorithm
>  > >  > > > it is possible to reduce miss ratios in the L2 cache by 25-37%
>  > and
>  > >  > > > improve processor throughput by 27-45%."
>  > >  >  > >
>  > >  >  > >
>  > >  >  > > From Lundberg, L. 1997:
>  > >  >  > > Abstract: "The default scheduling algorithm in Solaris and other
>  > >  > > > operating systems may result in frequent relocation of threads at
>  > >  > > > run-time. Excessive thread relocation cause  > > poor memory
>  > >  > performance. This can be avoided by binding threads to  > >
>  > >  > processors. However, binding threads to processors may result in an
>  >  >
>  > >  > > unbalanced load. By considering a previously obtained theoretical
>  >  >
>  > >  > > result and by evaluating a set of multithreaded Solaris  > >
>  > >  > programs using a multiprocessor with 8 processors, we are able to  >
>  > >
>  > >  > bound the maximum performance loss due to binding threads, The  > >
>  > >  > theoretical result is also recapitulated. By evaluating another set
>  > of
>  > >  > > > multithreaded programs, we show that the gain of binding threads
>  > >  > to  > > processors may be substantial, particularly for programs with
>  > >  > fine  > > grained parallelism."
>  > >  >  > >
>  > >  >  > > First paragraph: "The thread concept in Solaris [3][5] and other
>  > >  > > > operating systems makes it possible to write multithreaded
>  > >  > programs,  > > which can be executed in parallel on a multiprocessor.
>  > >  > Previous  > > experience from real world programs [4] show that,
>  > using
>  > >  > the default  > > scheduling algorithm in Solaris, threads are
>  > >  > frequently relocated from  > > one processor  > > to another at
>  > >  > run-time. After each such relocation, the code and data  > >
>  > >  > associated with the relocated thread is moved from the cache memory
>  > of
>  > >  > > > the 0113 processor to the cache of the new processor. This
>  > reduces
>  > >  > the  > > performance. Run-time relocation of threads to processors
>  > can
>  > >  > also  > > result in unpredictable response times. This is a problem
>  > in
>  > >  > systems  > > which operate in a real-time environment. In order to
>  > >  > avoid poor  > > memory performance and unpredictable real-time
>  > >  > behaviour due to  > > frequent thread relocation, threads can be
>  > bound
>  > >  > to processors using  > > the processor-bind directive [3] [5]. The
>  > >  > major problem with binding  > > threads is that one can end up with
>  > an
>  > >  > unbalanced load, i.e. some  > > processors may be extremely busy
>  > >  > during some time periods while other  > > processors are idle."
>  > >  >  > >
>  > >  >  > > -Glen
>  > >  >  > >
>  > >  >  > > On 21/04/2008, Glen Newton <[hidden email]> wrote:
>  > >  >  > >
>  > >  >  > > > And this discussion on bound threads may also shed light on
>  > things:
>  > >  >  > > >
>  > >  >  > > >
>  > >  >
>  > http://coding.derkeiler.com/Archive/Java/comp.lang.java.programmer/200
>  > >  > 7-11/msg02801.html
>  > >  >  > > >
>  > >  >  > > >
>  > >  >  > > >  -Glen
>  > >  >  > > >
>  > >  >  > > >
>  > >  >  > > >  On 21/04/2008, Glen Newton <[hidden email]> wrote:
>  > >  >  > > >  > BInding threads to processors - in many situations -
>  > >  > improves  > > >  >  throughput by reducing memory overhead. When a
>  > >  > thread is running  > > > on a  > > >  >  core, its state is local; if
>  > >  > it is timeshared-out and either 1)  > > >  >  swapped back in on the
>  > >  > same core, it is likely that there will be  > > >  the  > > >  >
>  > >  > core's L1 cache; or 2) onto another core, there will not be a  > > >
>  > >  > cache  > > >  >  hit and the state will have to be fetched from L2 or
>  > >  > main memory,  > > >  >  incurring a performance hit, esp in the
>  > >  > latter. See Lundberg, L.
>  > >  >  > > > 1997.
>  > >  >  > > >  >  Evaluating the Performance Implications of Binding Threads
>  > >  > to  > > >  >  Processors. 393.
>  > >  >  > > > http://ieeexplore.ieee.org/iel3/5020/13768/00634520.pdf
>  > >  >  > > >  >  for more info.
>  > >  >  > > >  >
>  > >  >  > > >  >  If you are using JVM on Solaris on SPARC, you should take
>  > a
>  > >  > look  > > > at  > > >  >  the following links for tuning (the Sun JVM
>  > >  > on Solaris SPARC has  > > > many  > > >  >  more performance tuning
>  > >  > parameters available), including  > > > threading:
>  > >  >  > > >  >  - http://java.sun.com/docs/hotspot/threads/threads.html
>  > >  >  > > >  >  -
>  > >  >  > > >
>  > >  > http://java.sun.com/j2se/1.5.0/docs/guide/vm/thread-priorities.html
>  > >  >  > > >  >  -
>  > >  >  > > >
>  > >  >
>  > http://www-1.ibm.com/support/docview.wss?rs=180&context=SSEQTP&uid=swg
>  > >  > 21107291  > > >  >  -
>  > >  > http://java.sun.com/javase/technologies/performance.jsp
>  > >  >  > > >  >
>  > >  >  > > >  >
>  > >  >  > > >  >  -Glen
>  > >  >  > > >  >
>  > >  >  > > >  >
>  > >  >  > > >  >
>  > >  >  > > >  >
>  > >  >  > > >  >
>  > >  >  > > >  >
>  > >  >  > > >  >
>  > >  >  > > >  >  On 21/04/2008, Ulf Dittmer <[hidden email]> wrote:
>  > >  >  > > >  >  > This sounds odd. Why would restricting it to a single  >
>  > >  > > >  >  >  core improve performance? The point of using multiple  > >
>  > >  > >  >  >  cores (and multiple threads) is to improve performance  > >
>  > >
>  > >  > >  >  isn't it? I'd leave thread scheduling decisions to the  > > >
>  >  >
>  > >  > >  JVM. Plus, I don't think there is anything in Java to  > > >  >  >
>  > >  > facilitate this (short of using JNI).
>  > >  >  > > >  >  >
>  > >  >  > > >  >  >  Are you talking about indexing or searching? You may  >
>  > >  > > >  >  >  be able to use multiple parallel threads to improve  > > >
>  > >  > >  >  indexing performance. I don't think Lucene uses  > > >  >  >
>  > >  > multi-threading for searching; not unless you have  > > >  >  >
>  > >  > multiple indices, anyway.
>  > >  >  > > >  >  >
>  > >  >  > > >  >  >  Ulf
>  > >  >  > > >  >  >
>  > >  >  > > >  >  >
>  > >  >  > > >  >  >
>  > >  >  > > >  >  >  --- Anshum <[hidden email]> wrote:
>  > >  >  > > >  >  >
>  > >  >  > > >  >  >  > Hi,
>  > >  >  > > >  >  >  >
>  > >  >  > > >  >  >  > I have been trying to bind my lucene instance (JVM -
>  > >  > > > >  >  >  > Sun Hotspot*) to a  > > >  >  >  > particular core so
>  > >  > as to improve the performance. Is  > > >  >  >  > there a way to do
>  > so
>  > >  > or  > > >  >  >  > is there support in lucene to explicitly control
>  > >  > the  > > >  >  >  > thread - processor  > > >  >  >  > linkup?
>  > >  >  > > >  >  >  >
>  > >  >  > > >  >  >  > --
>  > >  >  > > >  >  >  > --
>  > >  >  > > >  >  >  > The facts expressed here belong to everybody, the  >
>  > >
>  > >  > >  >  >  > opinions to me.
>  > >  >  > > >  >  >  > The distinction is yours to draw............
>  > >  >  > > >  >  >  >
>  > >  >  > > >  >  >
>  > >  >  > > >  >  >
>  > >  >  > > >  >  >
>  > >  >  > > >  >  >
>  > >  >  > > >  >  >
>  > >  >  > > >
>  > >  >
>  > ______________________________________________________________________
>  > >  > ______________  > > >  >  >  Be a better friend, newshound, and  > >
>  > >
>  > >  > >  >  know-it-all with Yahoo! Mobile.  Try it now.
>  > >  >  > > > http://mobile.yahoo.com/;_ylt=Ahu06i62sR8HDtDypao8Wcj9tAcJ
>  > >  >  > > >  >  >
>  > >  >  > > >  >  >
>  > >  >  > > >
>  > >  > ---------------------------------------------------------------------
>  > >  >  > > >  >  >  To unsubscribe, e-mail:
>  > >  > [hidden email]
>  > >  >  > > >  >  >  For additional commands, e-mail:
>  > >  >  > > > [hidden email]  > > >  >  >  > > >  >  >  >
>  > >
>  > >  > >  >  > > >  >  > > >  >  > > >  > --  > > >  >  > > >  >  -  > > >
>  >  >
>  > >  > > > >  > > >  > > >  > > > --  > > >  > > >  -  > > >  > > >  > >  >
>  > >
>  > >  > >  >
>  > >  > ---------------------------------------------------------------------
>  > >  >  > To unsubscribe, e-mail: [hidden email]
>  > >  >  > For additional commands, e-mail: [hidden email]
>  > >  > >  >
>  > >  >
>  > >  >
>  > >  >  --
>  > >  >
>  > >  >
>  > >  > --
>  > >  >  The facts expressed here belong to everybody, the opinions to me.
>  > >  >  The distinction is yours to draw............
>  > >  >
>  > >
>  > >
>  > >
>  > >  --
>  > >
>  > >  -
>  > >
>  > >
>  > >
>  > >  ---------------------------------------------------------------------
>  > >  To unsubscribe, e-mail: [hidden email]
>  > >  For additional commands, e-mail: [hidden email]
>  > >
>  > >
>  >
>  >
>  >
>  > --
>  >
>  > -
>  >
>  > ---------------------------------------------------------------------
>  > To unsubscribe, e-mail: [hidden email]
>  > For additional commands, e-mail: [hidden email]
>  >
>  >
>
>
>  --
>  --
>  The facts expressed here belong to everybody, the opinions to me.
>  The distinction is yours to draw............
>



--

-

---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]




--
--
The facts expressed here belong to everybody, the opinions to me.
The distinction is yours to draw............
---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

jvm_Crash_error.log (17K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: Binding lucene instance/threads to a particular processor(or core)

Glen Newton
Hi Anshum,

I looked at the log and couldn't make too much sense of it.

But I have an update to my original suggestion: try the following
command line parameters:
   -Xms1024m -XX:-UseParallelGC -XX:+ScavengeBeforeFullGC

I got rid of the "-XX:+AggressiveOpts" because perhaps these are not
always stable.
I changed the "-XX:+UseConcMarkSweepGC" to "-XX:-UseParallelGC" as
I've found it to be ~10% faster in both indexing and searching, in the
multithreaded multi-core configuration that I have (YMMV).

Let me know if the above parameters work OK.

Other possibilities:
1 - -XX:+UseBiasedLocking   See
http://java.sun.com/performance/reference/whitepapers/tuning.html#section4.2.5

2 - -XX:+UseFastAccessorMethods

3 - -XX:CompileThreshold=2

But I would suggest doing some profiling to see where the bottlenexk
is in your system, using either a commercial profiler or some the
tools included in the JDK:
http://java.sun.com/javase/6/webnotes/trouble/TSG-VM/html/tooldescr.html

thanks,

Glen

2008/4/25 Anshum <[hidden email]>:

> Hi Glen,
>
> After starting the JVM as you suggested, the daemon (actually the JVM)
> crashed unexpectedly.
> I didn't really understand the reason behind it from the log (attached),
> could you just have a look and help me deciphering the same.
>
>  Thanks,
>  Anshum
>
>
>
>
>
> On Thu, Apr 24, 2008 at 12:47 AM, Glen Newton <[hidden email]> wrote:
> > Hi Anshum,
> >
> > 2008/4/23 Anshum <[hidden email]>:
> >
> > > Hi Glen,
> > >
> > >  As far as stats for index/search are concerned, here they are:
> > >  * Yes, it is a web based application
> > >  * I am currently facing issues when the number of concurrent searches
> goes
> > >  high. The search is not able to handle over 2.5 searches per second.
> > What is the OS, hardware etc you are using?
> >
> > I am assuming you are only opening one instance of IndexSearcher for
> > all of your threads/queries (I would do it in the servlet init()
> > method) to use, as indicated in
> >
> http://lucene.apache.org/java/2_3_1/api/org/apache/lucene/search/IndexSearcher.html
> >
> >
> > >  * JVM command line parameters: -server mode; Max heap size 2GB (As it
> is a
> > >  32 bit machine/OS and that is the limit in the case)
> > What JVM/version?
> >
> > Have you tried running a profiler to see what is happening or even
> > jconsole (add  -Dcom.sun.management.jmxremote to your command line
> > parameters and (in Linux at least) run jconsole after the JVM has
> > started.
> >
> > A general suggestion:
> >  -XX:+AggressiveOpts -XX:+ScavengeBeforeFullGC   -Xms2000m
> > -XX:+UseConcMarkSweepGC
> >
> > Although the garbage collection depends on your configuration.
> >
> >
> > >  * Performance expectations :
> > >   * Mean search time :1-2 seconds (though that is a lot considering that
> I
> > >  have tried other engines, though with smaller indexes and they seem to
> have
> > >  sub 0.01s searching time)
> >
> > Yes, I am also surprised by this poor performance.
> >
> >
> > >  * No, I do not store any fields.
> > >  * Yes, I am using multiple machines ( more than a couple ).serving the
> same
> > >  index; a little more than just a vanilla load balancing mechanism;
> sending
> > >  similar requests to the same server to better OS level caching
> > >  * We do not index and search on the same machine, so in that case, I
> would
> > >  not think that changing the priority of these threads would matter. In
> other
> > >  words, there is nothing but search that happens on the search server.
> If
> > >  that is what you meant when you wrote about increasing the thead
> priority.
> >
> > Your servlet environment has many threads that have to do a lot of
> > things before they get around to the part of your code that does the
> > searching. Once they get to the search part, you want them to have
> > higher priority than the threads that are handling requests much
> > further back in the servlet pipeline. So set the thread priority
> > higher when it gets to your code. Of course you would need to test to
> > see the impact on performance in your particular case.
> >
> >
> > >
> > >  I had already started trying the TPE and  guess it would provide a lot
> of
> > >  improvement but 'm looking for other ways as well ! :)
> >
> > -glen
> >
> >
> >
> >
> >
> > >  --
> > >  Anshum
> > >
> > >
> > >
> > >  On Tue, Apr 22, 2008 at 11:19 PM, Glen Newton <[hidden email]>
> wrote:
> > >
> > >  > So even if you only have one index, this is the way to go to manage
> > >  > this kind of problem.
> > >  >
> > >  > Looking at the implementation and having used ThreadPoolExecutor
> (TPE)
> > >  > a lot, I would make the following suggestions for this class so as to
> > >  > better support this particular use case:
> > >  > Better access to the configuration of the TPE is needed: the ability
> > >  > to choose the number of threads (pools size and max pool size), the
> > >  > type and nature of the queue, etc. Also, the default behaviour of TPE
> > >  > is to throw an exception when a job is submitted and the queue is
> > >  > full: throwing an exception is expensive, especially when dozens or
> > >  > hundreds of searches are being rejected and many exceptions are
> > >  > occurring in a high load situation. Instead, being able to set the
> > >  > RejectedExecutionHandler on the TPE would allow for a graceful
> > >  > handling of rejected queries (feedback to user application, etc).
> > >  > Also, TPE allows for a custom ThreadFactory, which I use to produce
> > >  > threads with the highest priority to do the searching.
> > >  >
> > >  > Right now the implementation sets the #threads and queue size as a
> > >  > function of the number of Searchables, which is reasonable for this
> > >  > use case. But it would generalize better if these were a function of
> > >  > the number of cores on the machine, or some combination.
> > >  >
> > >  > That said, I would suggest having adding the ability to set the
> > >  > ThreadPoolExecutor completely, with a getter/setter. This would allow
> > >  > this class to be useful beyond the use case of multiple indexes,
> > >  > becoming more generalizable to a number of use cases including
> > >  > allowing it to support the use case of one (or more indexes) and a
> > >  > high work load of queries that need to be managed. It could use the
> > >  > same defaults if the TPE is not set externally (is null).
> > >  >
> > >  > -Glen
> > >  >
> > >  > 2008/4/22 Renaud Waldura <[hidden email]>:
> > >  > > > one solution is to set-up a ThreadPoolExecutor[2] with a fixed
> > >  > >  > number of threads and a limited queue size (use a bound
> > >  > BlockingQueue[3])
> > >  > >
> > >  > >  Yes, this is precisely how the ConcurrentMultiSearcher works.
> > >  > >  https://issues.apache.org/jira/browse/LUCENE-423
> > >  > >
> > >  > >
> > >  > >
> > >  > >
> > >  > >  -----Original Message-----
> > >  > >  From: Glen Newton [mailto:[hidden email]]
> > >  > >  Sent: Tuesday, April 22, 2008 5:40 AM
> > >  > >  To: [hidden email]
> > >  > >  Subject: Re: Binding lucene instance/threads to a particular
> > >  > processor(or
> > >  > >  core)
> > >  > >
> > >  > >
> > >  > >
> > >  > > Anshun,
> > >  > >
> > >  > >  I think I am dealing with an index of similar scale: 6.4 million
> > >  > records, 83
> > >  > >  GB index (see [1] for more info)
> > >  > >
> > >  > >  I mistakenly thought from your original posting that you were
> > >  > interested in
> > >  > >  binding threads to processors for indexing, but it is sounding
> like you
> > >  > want
> > >  > >  to do this for searching. I am not sure if this would work well
> for
> > >  > >  searching, as there is a great deal more ephemeral state. But I am
> not
> > >  > sure.
> > >  > >
> > >  > >  With respect to handling concurrency for search, could you
> describe
> > >  > your
> > >  > >  environment better?
> > >  > >  - Is it an web-base application?
> > >  > >  - What sort of problems do you have now?
> > >  > >  - What are your java command line parameters (heap, etc.)
> > >  > >  - What are the performance expectations, i.e. average search < N1
> > >  > seconds;
> > >  > >  median search time < N2 seconds
> > >  > >  - Are you storing any fields and if so, do you need to store so
> many?
> > >  > >  - Do you have the choice of serving searches from multiple
> machines?
> > >  > >  i.e. load balance across >1 machine; We've found this to be one of
> the
> > >  > best
> > >  > >  solutions for scaling;
> > >  > >  - One thing that I do for both indexing and searching is that the
> > >  > threads
> > >  > >  that are doing these tasks I always shift their priority to MAX,
> so
> > >  > that
> > >  > >  they are run in preference to threads doing other things, like
> > >  > preparing
> > >  > >  Documents, etc.
> > >  > >
> > >  > >  If it is a web-type environment, one solution is to set-up a
> > >  > >  ThreadPoolExecutor[2] with a fixed number of threads and a limited
> > >  > queue
> > >  > >  size (use a bound BlockingQueue[3]) . You would have to experiment
> with
> > >  > the
> > >  > >  numbers to get the sweet-spot for your situation.
> > >  > >  I would suggest starting with 2 times the number processors
> (cores) and
> > >  > a
> > >  > >  queue of say 20. Requests are queued, but if the request cannot be
> > >  > queued,
> > >  > >  at least the application then knows that it is too busy and you
> can
> > >  > give the
> > >  > >  user a message "The system is too busy at this moment, please try
> again
> > >  > in a
> > >  > >  few seconds..." kind of thing. The advantage is also that when
> things
> > >  > are
> > >  > >  very busy, you have the accepted requests handled in a reasonable
> > >  > amount of
> > >  > >  time, and some users being told things are too busy, as opposed to
> all
> > >  > the
> > >  > >  requests going in and the system thrashing (and perhaps
> running-out of
> > >  > >  memory at the same time) and everyone's queries being very slow.
> > >  > >
> > >  > >  But I would suggest setting-up a test harness to emulate your
> > >  > production
> > >  > >  conditions to try these things out...
> > >  > >
> > >  > >  -Glen
> > >  > >
> > >  > >  [1]
> > >  >
> http://zzzoot.blogspot.com/2008/04/lucene-indexing-performance-benchmarks
> > >  > >  .html
> > >  > >  [2]
> > >  >
> http://java.sun.com/j2se/1.5.0/docs/api/java/util/concurrent/ThreadPoolEx
> > >  > >  ecutor.html
> > >  > >  [3]
> > >  >
> http://java.sun.com/j2se/1.5.0/docs/api/java/util/concurrent/BlockingQueu
> > >  > >  e.html
> > >  > >
> > >  > >
> > >  > >  2008/4/22 Anshum <[hidden email]>:
> > >  > >  > The paper seems pretty good but I am still wondering if there
> was a
> > >  > >  > way to  achieve this through the command line parameters. I'm
> just
> > >  > >  > trying this to  optimize the code, if this works, would let all
> know
> > >  > >  > else would keep  everyone informed :)  Any other suggestions for
> > >  > >  > handling a concurrency of over 7 search requests  per second for
> an
> > >  > >  > index size of over 15Gigs containing over 13 million  records?
> > >  > >  >  Also, could someone help me with obtaining a 'index size' -
> > >  > >  > 'concurrency' -  'processor power' - 'memory' relationship
> formula
> > >  > (or
> > >  > >  something similar)?
> > >  > >  >
> > >  > >  >  --
> > >  > >  >  Anshum
> > >  > >  >
> > >  > >  >
> > >  > >  >
> > >  > >  >
> > >  > >  >  On Tue, Apr 22, 2008 at 3:55 AM, Antony Bowesman
> <[hidden email]>
> > >  > >  wrote:
> > >  > >  >
> > >  > >  >  > That paper from 1997 is pretty old, but mirrors our
> experiences in
> > >  > >  > those  > days. Then, we used Solaris processor sets to really
> improve
> > >  > >  > performance by  > binding one of our processes to a particular
> CPU
> > >  > >  > while leaving the other  > CPUs to manage the thread intensive
> work.
> > >  > >  >  >
> > >  > >  >  > You can bind processes/LWPs to a CPU on Solaris with psrset.
> > >  > >  >  >
> > >  > >  >  > The Solaris thread model in the late '90s was also a
> significant
> > >  > >  > factor in  > performance of multi-threaded programs.  The
> default
> > >  > >  > thread library in  > Solaris 8 implemented a MxN unbound thread
> model
> > >  > >  > (threads/LWPS).  In those  > days we found that it did not
> perform
> > >  > >  > well, so used the bound thread model  > (i.e. 1:1) where a
> Solaris
> > >  > >  > thread was bound permanently to an LWP.  That  > improved
> performance
> > >  > >  > a lot.  In Solaris 8, Sun had what they called the  >
> 'alternate'
> > >  > >  > thread library (T2) around 2000, which became the default  >
> library
> > >  > >  > in Solaris 9, and implemented a 1:1 model of Solaris threads to
> >
> > >  > LWPs.
> > >  > >  That new library had dramatic performance improvements over the
> old.
> > >  > >  >  >
> > >  > >  >  > Some background info for Java and threading  >  >
> > >  > >  >
> http://java.sun.com/j2se/1.5.0/docs/guide/vm/thread-priorities.html
> > >  > >  >  >
> > >  > >  >  > Antony
> > >  > >  >  >
> > >  > >  >  >
> > >  > >  >  >
> > >  > >  >  > Glen Newton wrote:
> > >  > >  >  >
> > >  > >  >  > > I realised that not everyone on this list might be able to
> > >  > access
> > >  > >  > the  > > IEEE paper I pointed-out, so I will include the
> abstract and
> > >  > >  > some  > > paragraphs from the paper which I have included below.
> > >  > >  >  > >
> > >  > >  >  > > Also of interest (and should be available to all): Fedorova
> et
> > >  > al.
> > >  > >  >  > > 2005. Performance of Multithreaded Chip Multiprocessors And
> > >
> > >  > >  > Implications For Operating System Design. Usenix 2005.
> > >  > >  >  > > http://www.eecs.harvard.edu/margo/papers/usenix05/paper.pdf
> > >  > >  >  > > "Abstract: We investigated how operating system design
> should be
> > >  > >  > > > adapted for multithreaded chip multiprocessors (CMT) - a new
> > >
> > >  > >  > generation of processors that exploit thread-level parallelism
> to
> > >  > mask
> > >  > >  > > > the memory latency in modern workloads. We  > > determined
> that
> > >  > >  > the L2 cache is a critical shared resource on CMT and  > > that
> an
> > >  > >  > insufficient amount of L2 cache can undermine the ability to  >
> >
> > >  > hide
> > >  > >  > memory latency on these processors. To use the L2 cache as  > >
> > >  > >  > efficiently as possible, we propose an L2-conscious scheduling
> > >
> > >  > >  > algorithm and quantify its performance potential. Using this
> > >  > algorithm
> > >  > >  > > > it is possible to reduce miss ratios in the L2 cache by
> 25-37%
> > >  > and
> > >  > >  > > > improve processor throughput by 27-45%."
> > >  > >  >  > >
> > >  > >  >  > >
> > >  > >  >  > > From Lundberg, L. 1997:
> > >  > >  >  > > Abstract: "The default scheduling algorithm in Solaris and
> other
> > >  > >  > > > operating systems may result in frequent relocation of
> threads at
> > >  > >  > > > run-time. Excessive thread relocation cause  > > poor memory
> > >  > >  > performance. This can be avoided by binding threads to  > >
> > >  > >  > processors. However, binding threads to processors may result in
> an
> > >  >  >
> > >  > >  > > unbalanced load. By considering a previously obtained
> theoretical
> > >  >  >
> > >  > >  > > result and by evaluating a set of multithreaded Solaris  > >
> > >  > >  > programs using a multiprocessor with 8 processors, we are able
> to  >
> > >  > >
> > >  > >  > bound the maximum performance loss due to binding threads, The
> > >
> > >  > >  > theoretical result is also recapitulated. By evaluating another
> set
> > >  > of
> > >  > >  > > > multithreaded programs, we show that the gain of binding
> threads
> > >  > >  > to  > > processors may be substantial, particularly for programs
> with
> > >  > >  > fine  > > grained parallelism."
> > >  > >  >  > >
> > >  > >  >  > > First paragraph: "The thread concept in Solaris [3][5] and
> other
> > >  > >  > > > operating systems makes it possible to write multithreaded
> > >  > >  > programs,  > > which can be executed in parallel on a
> multiprocessor.
> > >  > >  > Previous  > > experience from real world programs [4] show that,
> > >  > using
> > >  > >  > the default  > > scheduling algorithm in Solaris, threads are
> > >  > >  > frequently relocated from  > > one processor  > > to another at
> > >  > >  > run-time. After each such relocation, the code and data  > >
> > >  > >  > associated with the relocated thread is moved from the cache
> memory
> > >  > of
> > >  > >  > > > the 0113 processor to the cache of the new processor. This
> > >  > reduces
> > >  > >  > the  > > performance. Run-time relocation of threads to
> processors
> > >  > can
> > >  > >  > also  > > result in unpredictable response times. This is a
> problem
> > >  > in
> > >  > >  > systems  > > which operate in a real-time environment. In order
> to
> > >  > >  > avoid poor  > > memory performance and unpredictable real-time
> > >  > >  > behaviour due to  > > frequent thread relocation, threads can be
> > >  > bound
> > >  > >  > to processors using  > > the processor-bind directive [3] [5].
> The
> > >  > >  > major problem with binding  > > threads is that one can end up
> with
> > >  > an
> > >  > >  > unbalanced load, i.e. some  > > processors may be extremely busy
> > >  > >  > during some time periods while other  > > processors are idle."
> > >  > >  >  > >
> > >  > >  >  > > -Glen
> > >  > >  >  > >
> > >  > >  >  > > On 21/04/2008, Glen Newton <[hidden email]> wrote:
> > >  > >  >  > >
> > >  > >  >  > > > And this discussion on bound threads may also shed light
> on
> > >  > things:
> > >  > >  >  > > >
> > >  > >  >  > > >
> > >  > >  >
> > >  >
> http://coding.derkeiler.com/Archive/Java/comp.lang.java.programmer/200
> > >  > >  > 7-11/msg02801.html
> > >  > >  >  > > >
> > >  > >  >  > > >
> > >  > >  >  > > >  -Glen
> > >  > >  >  > > >
> > >  > >  >  > > >
> > >  > >  >  > > >  On 21/04/2008, Glen Newton <[hidden email]>
> wrote:
> > >  > >  >  > > >  > BInding threads to processors - in many situations -
> > >  > >  > improves  > > >  >  throughput by reducing memory overhead. When
> a
> > >  > >  > thread is running  > > > on a  > > >  >  core, its state is
> local; if
> > >  > >  > it is timeshared-out and either 1)  > > >  >  swapped back in on
> the
> > >  > >  > same core, it is likely that there will be  > > >  the  > > >  >
> > >  > >  > core's L1 cache; or 2) onto another core, there will not be a  >
> > >
> > >  > >  > cache  > > >  >  hit and the state will have to be fetched from
> L2 or
> > >  > >  > main memory,  > > >  >  incurring a performance hit, esp in the
> > >  > >  > latter. See Lundberg, L.
> > >  > >  >  > > > 1997.
> > >  > >  >  > > >  >  Evaluating the Performance Implications of Binding
> Threads
> > >  > >  > to  > > >  >  Processors. 393.
> > >  > >  >  > > > http://ieeexplore.ieee.org/iel3/5020/13768/00634520.pdf
> > >  > >  >  > > >  >  for more info.
> > >  > >  >  > > >  >
> > >  > >  >  > > >  >  If you are using JVM on Solaris on SPARC, you should
> take
> > >  > a
> > >  > >  > look  > > > at  > > >  >  the following links for tuning (the
> Sun JVM
> > >  > >  > on Solaris SPARC has  > > > many  > > >  >  more performance
> tuning
> > >  > >  > parameters available), including  > > > threading:
> > >  > >  >  > > >  >  -
> http://java.sun.com/docs/hotspot/threads/threads.html
> > >  > >  >  > > >  >  -
> > >  > >  >  > > >
> > >  > >  >
> http://java.sun.com/j2se/1.5.0/docs/guide/vm/thread-priorities.html
> > >  > >  >  > > >  >  -
> > >  > >  >  > > >
> > >  > >  >
> > >  >
> http://www-1.ibm.com/support/docview.wss?rs=180&context=SSEQTP&uid=swg
> > >  > >  > 21107291  > > >  >  -
> > >  > >  > http://java.sun.com/javase/technologies/performance.jsp
> > >  > >  >  > > >  >
> > >  > >  >  > > >  >
> > >  > >  >  > > >  >  -Glen
> > >  > >  >  > > >  >
> > >  > >  >  > > >  >
> > >  > >  >  > > >  >
> > >  > >  >  > > >  >
> > >  > >  >  > > >  >
> > >  > >  >  > > >  >
> > >  > >  >  > > >  >
> > >  > >  >  > > >  >  On 21/04/2008, Ulf Dittmer <[hidden email]>
> wrote:
> > >  > >  >  > > >  >  > This sounds odd. Why would restricting it to a
> single  >
> > >  > >  > > >  >  >  core improve performance? The point of using multiple
> > >
> > >  > >  > >  >  >  cores (and multiple threads) is to improve performance
> > >
> > >  > >
> > >  > >  > >  >  isn't it? I'd leave thread scheduling decisions to the  >
> > >
> > >  >  >
> > >  > >  > >  JVM. Plus, I don't think there is anything in Java to  > > >
> >  >
> > >  > >  > facilitate this (short of using JNI).
> > >  > >  >  > > >  >  >
> > >  > >  >  > > >  >  >  Are you talking about indexing or searching? You
> may  >
> > >  > >  > > >  >  >  be able to use multiple parallel threads to improve
> > > >
> > >  > >  > >  >  indexing performance. I don't think Lucene uses  > > >  >
> >
> > >  > >  > multi-threading for searching; not unless you have  > > >  >  >
> > >  > >  > multiple indices, anyway.
> > >  > >  >  > > >  >  >
> > >  > >  >  > > >  >  >  Ulf
> > >  > >  >  > > >  >  >
> > >  > >  >  > > >  >  >
> > >  > >  >  > > >  >  >
> > >  > >  >  > > >  >  >  --- Anshum <[hidden email]> wrote:
> > >  > >  >  > > >  >  >
> > >  > >  >  > > >  >  >  > Hi,
> > >  > >  >  > > >  >  >  >
> > >  > >  >  > > >  >  >  > I have been trying to bind my lucene instance
> (JVM -
> > >  > >  > > > >  >  >  > Sun Hotspot*) to a  > > >  >  >  > particular
> core so
> > >  > >  > as to improve the performance. Is  > > >  >  >  > there a way to
> do
> > >  > so
> > >  > >  > or  > > >  >  >  > is there support in lucene to explicitly
> control
> > >  > >  > the  > > >  >  >  > thread - processor  > > >  >  >  > linkup?
> > >  > >  >  > > >  >  >  >
> > >  > >  >  > > >  >  >  > --
> > >  > >  >  > > >  >  >  > --
> > >  > >  >  > > >  >  >  > The facts expressed here belong to everybody,
> the  >
> > >  > >
> > >  > >  > >  >  >  > opinions to me.
> > >  > >  >  > > >  >  >  > The distinction is yours to draw............
> > >  > >  >  > > >  >  >  >
> > >  > >  >  > > >  >  >
> > >  > >  >  > > >  >  >
> > >  > >  >  > > >  >  >
> > >  > >  >  > > >  >  >
> > >  > >  >  > > >  >  >
> > >  > >  >  > > >
> > >  > >  >
> > >  >
> ______________________________________________________________________
> > >  > >  > ______________  > > >  >  >  Be a better friend, newshound, and
> > >
> > >  > >
> > >  > >  > >  >  know-it-all with Yahoo! Mobile.  Try it now.
> > >  > >  >  > > >
> http://mobile.yahoo.com/;_ylt=Ahu06i62sR8HDtDypao8Wcj9tAcJ
> > >  > >  >  > > >  >  >
> > >  > >  >  > > >  >  >
> > >  > >  >  > > >
> > >  > >  >
> ---------------------------------------------------------------------
> > >  > >  >  > > >  >  >  To unsubscribe, e-mail:
> > >  > >  > [hidden email]
> > >  > >  >  > > >  >  >  For additional commands, e-mail:
> > >  > >  >  > > > [hidden email]  > > >  >  >  > > >  >
> >  >
> > >  > >
> > >  > >  > >  >  > > >  >  > > >  >  > > >  > --  > > >  >  > > >  >  -  >
> > >
> > >  >  >
> > >  > >  > > > >  > > >  > > >  > > > --  > > >  > > >  -  > > >  > > >  >
> >  >
> > >  > >
> > >  > >  > >  >
> > >  > >  >
> ---------------------------------------------------------------------
> > >  > >  >  > To unsubscribe, e-mail:
> [hidden email]
> > >  > >  >  > For additional commands, e-mail:
> [hidden email]
> > >  > >  > >  >
> > >  > >  >
> > >  > >  >
> > >  > >  >  --
> > >  > >  >
> > >  > >  >
> > >  > >  > --
> > >  > >  >  The facts expressed here belong to everybody, the opinions to
> me.
> > >  > >  >  The distinction is yours to draw............
> > >  > >  >
> > >  > >
> > >  > >
> > >  > >
> > >  > >  --
> > >  > >
> > >  > >  -
> > >  > >
> > >  > >
> > >  > >
> > >  > >
> ---------------------------------------------------------------------
> > >  > >  To unsubscribe, e-mail: [hidden email]
> > >  > >  For additional commands, e-mail: [hidden email]
> > >  > >
> > >  > >
> > >  >
> > >  >
> > >  >
> > >  > --
> > >  >
> > >  > -
> > >  >
> > >  > ---------------------------------------------------------------------
> > >  > To unsubscribe, e-mail: [hidden email]
> > >  > For additional commands, e-mail: [hidden email]
> > >  >
> > >  >
> > >
> > >
> > >  --
> > >  --
> > >  The facts expressed here belong to everybody, the opinions to me.
> > >  The distinction is yours to draw............
> > >
> >
> >
> >
> > --
> >
> >
> >
> >
> > -
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: [hidden email]
> > For additional commands, e-mail: [hidden email]
> >
> >
>
>
>
> --
> --
> The facts expressed here belong to everybody, the opinions to me.
> The distinction is yours to draw............
> ---------------------------------------------------------------------
>  To unsubscribe, e-mail: [hidden email]
>  For additional commands, e-mail: [hidden email]
>



--

-

---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]