solr performance

solr performance

Jack L
Hello,

I have a question about solr's performance of accepting
inserts and indexing. If I have 10 million documents that
I'd like to index, I suppose it will take some time to
submit them to solr. Is there any faster way to do this
than through the web interface?

--
Best regards,
Jack

__________________________________________________
Do You Yahoo!?
Tired of spam?  Yahoo! Mail has the best spam protection around
http://mail.yahoo.com 

Re: solr performance

Erik Hatcher
You could build your index using Lucene directly and then point a
Solr instance at it once it's built.  My suspicion, though, is that the
overhead of forming a document as an XML string and posting to Solr
via HTTP won't be that much different from indexing with Lucene
directly.
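
For reference, the XML-over-HTTP route described above can be sketched roughly as follows. This is a minimal illustration and not anyone's actual indexer: the URL http://localhost:8983/solr/update, the field names "id" and "title", and the helper names build_add_xml and post_to_solr are all assumptions made for the example.

```python
import urllib.request
from xml.sax.saxutils import escape, quoteattr

# Assumed update endpoint; adjust host, port, and path for your install.
SOLR_UPDATE_URL = "http://localhost:8983/solr/update"


def build_add_xml(doc):
    """Render one document (dict of field name -> value) as a Solr <add> message."""
    fields = "".join(
        "<field name=%s>%s</field>" % (quoteattr(name), escape(str(value)))
        for name, value in doc.items()
    )
    return "<add><doc>%s</doc></add>" % fields


def post_to_solr(xml, url=SOLR_UPDATE_URL):
    """POST an XML update message and return the HTTP status code."""
    request = urllib.request.Request(
        url,
        data=xml.encode("utf-8"),
        headers={"Content-Type": "text/xml; charset=utf-8"},
    )
    with urllib.request.urlopen(request) as response:
        return response.status


if __name__ == "__main__":
    xml = build_add_xml({"id": "doc-1", "title": "Fish & chips"})
    print(xml)
    # post_to_solr(xml)          # needs a running Solr instance
    # post_to_solr("<commit/>")  # makes the added documents searchable
```

The string building is cheap next to the HTTP round trip, which is why the per-request overhead is the part worth measuring.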

My largest Solr index is currently at 1.4M and it takes a max of 3ms  
to add a document (according to Solr's console), most of them 1ms.  
My single threaded indexer is indexing around 1000 documents per  
minute, but I think I can get this number even faster by  
parallelizing the indexer.

I'm curious what rates others are indexing at?

        Erik



On Feb 20, 2007, at 2:21 AM, Jack L wrote:

> Hello,
>
> I have a question about solr's performance of accepting
> inserts and indexing. If I have 10 million documents that
> I'd like to index, I suppose it will take some time to
> submit them to solr. Is there any faster way to do this
> than through the web interface?

AW: solr performance

Burkamp, Christian
I agree. There's probably no need to go to the index directly.
My current Solr test server has more than 5M documents and a size of about 60GB.
I still index at 13 docs per second, and this includes filtering of the documents.
(If you have your content ready in XML format, performance will be even better.)
Indexing performance does not seem to drop as the index grows.
Optimizing the index, however, does take a huge amount of time for large indexes.

--Christian

-----Original Message-----
From: Erik Hatcher [mailto:[hidden email]]
Sent: Tuesday, 20 February 2007 11:43
To: [hidden email]
Subject: Re: solr performance


You could build your index using Lucene directly and then point a  
Solr instance at it once its built.  My suspicion is that the  
overhead of forming a document as an XML string and posting to Solr  
via HTTP won't be that much different than indexing with Lucene  
directly.

My largest Solr index is currently at 1.4M and it takes a max of 3ms  
to add a document (according to Solr's console), most of them 1ms.  
My single threaded indexer is indexing around 1000 documents per  
minute, but I think I can get this number even faster by  
parallelizing the indexer.

I'm curious what rates others are indexing at ???

        Erik




Re: AW: solr performance

Walter Underwood, Netflix
Indexing rates depend heavily on document size (text) and pre-indexing
processing. Other things probably matter, too, like number of fields.

My application is indexing 20X faster than Christian's, because I have
small documents (a few hundred bytes) that are extracted from an RDBMS
and submitted in Solr's XML format.

I am probably seeing something close to the maximum rate at 250 docs/s.
This is on a dual-CPU 3 GHz Xeon, Fedora Core 4, JDK 1.5. A fast RAID
would probably make it go faster, but that is about the only speedup
I can think of.

This has been discussed before, so check the mailing list archives.

wunder

On 2/20/07 2:58 AM, "Burkamp, Christian" <[hidden email]> wrote:

> I do agree. There's probably no need to go to the index directly.
> My current solr test server has more than 5M documents and a size of about
> 60GB.
> I still index at 13 docs per second and this still includes filtering of the
> documents.
> (If you have your content ready in XML format performance will be even
> better).
> It seems to me that indexing performance does not drop as the index increases.
> Optimizing the index although does take huge amounts of time for large
> indexes.
>
> --Christian

Resolr performance

Jack L
In reply to this post by Erik Hatcher
Thanks to all who replied. It's encouraging :)

The numbers vary quite a bit though, from 13 docs/s (Burkamp)
to 250 docs/s (Walter) to 1000 docs/s. I understand the results also
depend on the doc size and hardware.

I have a question for Erik: you mentioned a "single threaded indexer"
(below). I'm not familiar with Solr at all; I searched the Solr
wiki for "thread" and didn't find anything. Can I actually configure
Solr to be single-threaded or multi-threaded?

And I'm not sure what you meant by parallelizing the indexer.
Running multiple instances of the indexer, or multiple instances
of Solr?

Thanks,

Jack

> My largest Solr index is currently at 1.4M and it takes a max of 3ms
> to add a document (according to Solr's console), most of them 1ms.
> My single threaded indexer is indexing around 1000 documents per  
> minute, but I think I can get this number even faster by  
> parallelizing the indexer.



Resolr performance

Chris Hostetter-3

: The numbers vary quite a bit though, from 13 docs/s (Burkamp)
: to 250 docs/s (Walter) to 1000 docs/s I understand the results also depend
: on the doc size and hardware.

It also depends a lot on how much analysis you do of each field ... and
that doesn't even begin to get to the issue of what kinds of work
you do to gather the data up and format it into XML docs to send to Solr
... i've yet to see an application where Solr is the bottleneck during
indexing. typically we set up a producer/consumer model for indexing and the queue
is almost always empty (a few docs might queue up during a segment merge)
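
A producer/consumer setup of the kind described above might look like the sketch below. It is only an illustration: `send` is a hypothetical callable that posts one document to Solr, `source_docs` stands in for whatever gathers and formats the data, and error handling is left out. The returned queue depth is a rough way to see whether the sending side keeps up.

```python
import queue
import threading


def run_pipeline(source_docs, send, queue_size=100):
    """Feed docs through a bounded queue from one producer to one consumer."""
    work = queue.Queue(maxsize=queue_size)
    done = object()   # sentinel telling the consumer to stop
    max_depth = 0     # deepest the queue got; near zero means the sender keeps up

    def producer():
        nonlocal max_depth
        for doc in source_docs:
            work.put(doc)  # blocks if the consumer falls behind
            max_depth = max(max_depth, work.qsize())
        work.put(done)

    def consumer():
        while True:
            doc = work.get()
            if doc is done:
                break
            send(doc)

    threads = [threading.Thread(target=producer), threading.Thread(target=consumer)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return max_depth
```

If the reported depth stays near zero, the time is going into producing documents rather than into sending them.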

: I have a question for Erik: you mentioned "single threaded indexer"
: (below). I'm not familiar with solr at all and did a search on solr
: wiki for "thread" and didn't find anything. Is it so that I can
: actually configure solr to be single-threaded and multi-threaded?
:
: And I'm not sure what you meant by parallelizing the indexer?
: Running multiple instances of the indexer, or multiple instances
: of solr?

i believe what erik was referring to was making the client code you write,
which gathers up your data and submits it to solr, multithreaded ... for
most cases a single threaded app that reads docs one at a time from your
authoritative data store and sends them to Solr works fine, but if
you want to speed things up, then unless you find that Solr is your bottleneck,
making the process that sends the updates multithreaded can probably help.
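
As a rough sketch of what a multithreaded sending process could look like, assuming a hypothetical `send(doc)` function that posts one document to Solr and a worker count of 4 picked arbitrarily:

```python
from concurrent.futures import ThreadPoolExecutor


def send_all(docs, send, workers=4):
    """Send every doc using a pool of worker threads; return the number sent."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() hands docs to idle threads; list() waits for all of them
        # and re-raises the first exception, if any.
        results = list(pool.map(send, docs))
    return len(results)
```

Solr itself is not reconfigured here; the parallelism lives entirely in the client.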



-Hoss


Re: Resolr performance

Walter Underwood, Netflix
In reply to this post by Jack L
Try running your submits while watching a CPU load meter.
Do this on a multi-CPU machine.

If all CPUs are busy, you are running as fast as possible.

If one CPU is busy (around 50% usage on a dual-CPU system),
parallel submits might help.

If no CPU is 100% busy, the bottleneck is probably disk
or network.
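
The three cases above can be written down as a small helper. This is a sketch only: the function name `diagnose` and the 90% threshold for "busy" are assumptions, and the per-CPU percentages are expected to come from whatever load meter you are watching.

```python
def diagnose(cpu_percents, busy=90.0):
    """Classify an indexing run from per-CPU utilization samples (0-100)."""
    if not cpu_percents:
        raise ValueError("need at least one CPU sample")
    busy_count = sum(1 for p in cpu_percents if p >= busy)
    if busy_count == len(cpu_percents):
        return "all CPUs busy: running about as fast as this machine allows"
    if busy_count >= 1:
        return "some CPUs idle: parallel submits might help"
    return "no CPU saturated: bottleneck is probably disk or network"


if __name__ == "__main__":
    print(diagnose([98.0, 5.0]))  # one busy CPU on a dual-CPU box
```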

wunder


Re: Resolr performance

Erik Hatcher
In reply to this post by Jack L

On Feb 20, 2007, at 1:46 PM, Jack L wrote:
> The numbers vary quite a bit though, from 13 docs/s (Burkamp)
> to 250 docs/s (Walter) to 1000 docs/s. I understand the results also
> depend on the doc size and hardware.

my number 1000 was per minute, not second!  however, i've done a few
runs today where i fire up Solr and run 4 of my Ruby-based indexers
on 4 separate large files to load (of 50k documents each).  I've
indexed (while reading e-mail, editing code, etc.) 200k-document chunks a
couple of times today at around 158 documents / sec.
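
To put the rates quoted in this thread side by side, here is a back-of-the-envelope calculation for the original 10 million documents. It assumes a constant rate for the whole run, which is a simplification; the labels are just shorthand for the figures reported above.

```python
def hours_to_index(total_docs, docs_per_second):
    """Wall-clock hours needed at a constant indexing rate."""
    return total_docs / docs_per_second / 3600


RATES = {
    "Christian (with filtering)": 13,
    "Erik, single indexer": 1000 / 60,  # 1000 docs per minute
    "Erik, four indexers": 158,
    "Walter (small documents)": 250,
}

if __name__ == "__main__":
    for who, rate in RATES.items():
        hours = hours_to_index(10_000_000, rate)
        print("%-28s %6.1f docs/s -> %6.1f hours" % (who, rate, hours))
```

The spread runs from roughly 214 hours at 13 docs/s down to roughly 11 hours at 250 docs/s.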

> I have a question for Erik: you mentioned "single threaded indexer"
> (below). I'm not familiar with solr at all and did a search on solr
> wiki for "thread" and didn't find anything. Is it so that I can
> actually configure solr to be single-threaded and multi-threaded?
>
> And I'm not sure what you meant by parallelizing the indexer?
> Running multiple instances of the indexer, or multiple instances
> of solr?

Thanks to the others that clarified.  I run my indexers in  
parallel... but a single instance of Solr (which in turn handles  
requests in parallel as well).

        Erik


Re[4]: solr performance

Jack L
Thanks to all who replied.

> my number 1000 was per minute, not second!

I can't read! :-p

> couple of times today at around 158 documents / sec.

This is not bad at all. How about search performance?
How many concurrent queries have people been handling?
What do the response times look like?

> Thanks to the others that clarified.  I run my indexers in
> parallel... but a single instance of Solr (which in turn handles  
> requests in parallel as well).

Do you find multi-threaded posting helpful?
I suppose that when Solr is indexing, the bottleneck is
Solr's indexer rather than the poster?

Jack








Re: Re[4]: solr performance

Mike Klaas
On 2/21/07, Jack L <[hidden email]> wrote:

> > Thanks to the others that clarified.  I run my indexers in
> > parallel... but a single instance of Solr (which in turn handles
> > requests in parallel as well).
>
> Do you feel if multi-threaded posting is helpful?
> I suppose when solr does indexing, it's bound more
> on solr indexer than the poster?

It certainly is bound more on solr than the poster, but I've found
multithreading beneficial as it removes whatever latency factors might
exist--http connections, xml parsing, i/o, the poster, etc.  For us,
concurrent analysis was less of a gain, but then again our analysis is
relatively light.
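
One way to picture the latency argument is a toy model in which each client thread pays some latency outside Solr per document, while Solr's own per-document work caps the total rate. The model and the numbers in the usage note are assumptions for illustration, not measurements from this thread.

```python
def modeled_throughput(threads, client_ms, server_ms):
    """Docs/s in a toy model: client latency overlaps across threads, server work does not."""
    per_thread = 1000.0 / (client_ms + server_ms)  # one thread, fully serial
    server_cap = 1000.0 / server_ms                # server busy all the time
    return min(threads * per_thread, server_cap)


if __name__ == "__main__":
    for n in (1, 4, 16, 64):
        print(n, "threads ->", round(modeled_throughput(n, 59, 1), 1), "docs/s")
```

With an assumed 59 ms of latency outside Solr and 1 ms inside it, one thread manages about 16.7 docs/s, and adding threads raises the rate until the 1000 docs/s server ceiling is reached.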

-Mike

Re: Re[4]: solr performance

Erik Hatcher
In reply to this post by Jack L

On Feb 21, 2007, at 4:25 PM, Jack L wrote:
>> couple of times today at around 158 documents / sec.
>
> This is not bad at all. How about search performance?
> How many concurrent queries have people been having?
> What does the response time look like?

I'm the only user :)   What I've done is a proof-of-concept for our  
library.  We have 3.7M records that I've indexed and faceted.  Search  
performance (in my unrealistic single user scenario) is blazing (50ms  
or so) for purely full-text queries.  For queries that return facets,  
the response times are actually quite good too (~900ms, or less  
depending on the request) - provided the filter cache is warmed and  
large enough.  This is running on my laptop (MacBook Pro, 2GB RAM,  
1.83GHz) - I'm sure on a beefier box it'll only get better.

>> Thanks to the others that clarified.  I run my indexers in
>> parallel... but a single instance of Solr (which in turn handles
>> requests in parallel as well).
>
> Do you feel if multi-threaded posting is helpful?

It depends.  If the data processing can be parallelized and your  
hardware supports it, it can certainly make a big difference... it  
did in my case.  Both CPUs were cooking during my parallel indexing  
runs.

        Erik




Re: Re[4]: solr performance

sunnyShiny06
Hi,
I was reading this post and was wondering how I can parallelize document processing?
Thanks, Erik


Re: Resolr performance

sunnyShiny06
In reply to this post by Walter Underwood, Netflix
Hi,

When I check my CPUs, they are not fully utilized. How can I change this?
Do I have to change a parameter?

Thanks a lot,
Johanna


Re: solr performance

Mark Miller-3
In reply to this post by sunnyShiny06
Kick off some indexing more than once, e.g., post a folder of docs, and
while that's working, post another.

I've been thinking about a multi-threaded UpdateProcessor as well; that
could be interesting.

- Mark


Re: solr performance

sunnyShiny06
OK ...
Actually, my problem is more that multi-threaded requests take a long time, around 3 sec at 100 threads/sec.
I thought this could have helped me, but it is not actually related :s
Sorry


Re: solr performance

Yonik Seeley-2
In reply to this post by Mark Miller-3
On Thu, Dec 4, 2008 at 8:39 AM, Mark Miller <[hidden email]> wrote:
> Kick off some indexing more than once - eg, post a folder of docs, and while
> thats working, post another.
>
> I've been thinking about a multi threaded UpdateProcessor as well - that
> could be interesting.

Not sure how that would work (unless you didn't want responses), but
I've thought about it from the SolrJ side - something you could
quickly add documents to and it would manage a number of threads under
the covers to maximize throughput.  Not sure what would be the best
for error handling though - perhaps just polling (allow user to ask
for failed or successful operations).
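
The idea can be illustrated in a few lines of Python. To be clear, this is not SolrJ and not an existing API: `BackgroundAdder`, `add`, `wait`, and `poll` are hypothetical names, and `send` is an assumed callable that posts one document and raises on failure.

```python
import queue
import threading


class BackgroundAdder:
    """Accept documents quickly and send them from worker threads."""

    def __init__(self, send, workers=4):
        self._send = send          # hypothetical callable: send(doc)
        self._queue = queue.Queue()
        self._lock = threading.Lock()
        self._failed = []          # (doc, exception) pairs
        self._succeeded = 0
        for _ in range(workers):
            threading.Thread(target=self._run, daemon=True).start()

    def add(self, doc):
        """Queue a document and return immediately."""
        self._queue.put(doc)

    def _run(self):
        while True:
            doc = self._queue.get()
            try:
                self._send(doc)
                with self._lock:
                    self._succeeded += 1
            except Exception as exc:
                with self._lock:
                    self._failed.append((doc, exc))
            finally:
                self._queue.task_done()

    def wait(self):
        """Block until every queued document has been attempted."""
        self._queue.join()

    def poll(self):
        """Return (number succeeded, list of failures) so far."""
        with self._lock:
            return self._succeeded, list(self._failed)
```

The caller never sees a per-document response; it asks for the failures when it cares, which is the polling trade-off mentioned above.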

-Yonik

Re: Resolr performance

Yonik Seeley-2
In reply to this post by sunnyShiny06
On Thu, Dec 4, 2008 at 8:36 AM, sunnyfr <[hidden email]> wrote:
> When I check my CPU, all my CPU are not full, how can I change this ?

If this is while you are indexing, then it simply means that you are
not feeding documents to Solr fast enough (use multiple threads to
send to Solr, and send multiple documents in each update request if
possible).  If CPU utilization is still low, then it means you are IO
(disk) bound... if you want to go faster, get faster disks.
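
Sending multiple documents per update request means putting several <doc> elements inside one <add> message. A minimal sketch, with the batch size and the helper names `batches` and `add_xml` chosen for the example:

```python
from xml.sax.saxutils import escape, quoteattr


def batches(docs, size):
    """Yield successive lists of at most `size` documents."""
    batch = []
    for doc in docs:
        batch.append(doc)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch


def add_xml(batch):
    """Render a list of dict documents as a single <add> message."""
    parts = []
    for doc in batch:
        fields = "".join(
            "<field name=%s>%s</field>" % (quoteattr(k), escape(str(v)))
            for k, v in doc.items()
        )
        parts.append("<doc>%s</doc>" % fields)
    return "<add>%s</add>" % "".join(parts)
```

Each string returned by `add_xml` is then posted as one request, so the HTTP overhead is paid once per batch instead of once per document.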

-Yonik


Re: Resolr performance

sunnyShiny06
When I run my stress test, sending multi-threaded requests at around 100/sec, I have not started indexing at all ...
Maybe it's my cache? I will check that.


Re: Resolr performance

Yonik Seeley-2
On Thu, Dec 4, 2008 at 8:52 AM, sunnyfr <[hidden email]> wrote:
>
> When I run my stress test ..sending multi thread ... around 100/sec I don't
> start indexation at all ...

If you can't go higher than 100 requests / sec and the CPUs aren't at
100%, then the possibilities are:
- If the index is bigger than the free memory the OS can use for caching,
then cache misses (at the OS level) can cause CPU usage to drop. These
cache misses are most likely to happen when retrieving stored fields for hits.
- You can also be network IO bound if you are making requests from a
different machine.
- Internal locking contention... pretty much every system will reach a
peak number of requests/sec and then start declining as you add more
concurrent requests.
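
One way to find that peak is to measure the request rate at several concurrency levels. The sketch below assumes a hypothetical zero-argument `issue_request` callable that performs one query against Solr; the levels and request counts are arbitrary.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def requests_per_second(issue_request, concurrency, total_requests):
    """Issue total_requests calls on `concurrency` threads; return the observed rate."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        # Drain the iterator so every request has finished before timing stops.
        for _ in pool.map(lambda _: issue_request(), range(total_requests)):
            pass
    elapsed = time.perf_counter() - start
    return total_requests / elapsed


def sweep(issue_request, levels=(1, 2, 4, 8, 16), total_requests=200):
    """Measure throughput at each concurrency level and report the best one."""
    rates = {n: requests_per_second(issue_request, n, total_requests) for n in levels}
    best = max(rates, key=rates.get)
    return best, rates
```

If the rate stops rising or starts falling while the CPUs are still not saturated, one of the causes listed above is the likely explanation.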

If you haven't yet, try a nightly build from December - the
index-level locking should be improved under high load for non-Windows
systems.

-Yonik



Re: solr performance

Mark Miller-3
In reply to this post by Yonik Seeley-2
Yonik Seeley wrote:
>  
> Not sure what would be the best
> for error handling though - perhaps just polling (allow user to ask
> for failed or successful operations).
>  
That's how I've handled similar situations in the past. You're submitting a
batch of data to be processed, and if you're so inclined to see how it
went, you can inspect some kind of report object. If the batch process
blocks, you could return the report object, or if not, you could return
a batch/job id (with reports valid for x amount of time after they are
done?).
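
A sketch of that report-object pattern, covering both the blocking and the job-id variants. All names here (`BatchSubmitter`, `BatchReport`, `submit`, `submit_and_wait`, `report`) are hypothetical, `send` is an assumed callable that posts one document and raises on failure, and report expiry is left out.

```python
import itertools
import threading


class BatchReport:
    """Outcome of one batch; filled in while the batch is processed."""

    def __init__(self, total):
        self.total = total
        self.succeeded = 0
        self.failed = []   # (doc, error message) pairs
        self.done = False


class BatchSubmitter:
    def __init__(self, send):
        self._send = send  # hypothetical callable: send(doc)
        self._ids = itertools.count(1)
        self._reports = {}
        self._threads = {}

    def submit(self, docs):
        """Non-blocking: start processing in the background, return a job id."""
        docs = list(docs)
        job_id = next(self._ids)
        report = BatchReport(len(docs))
        self._reports[job_id] = report
        thread = threading.Thread(target=self._process, args=(docs, report))
        self._threads[job_id] = thread
        thread.start()
        return job_id

    def submit_and_wait(self, docs):
        """Blocking: return the finished report directly."""
        job_id = self.submit(docs)
        self._threads[job_id].join()
        return self._reports[job_id]

    def report(self, job_id):
        """Look up the report for a job; it may still be in progress."""
        return self._reports[job_id]

    def _process(self, docs, report):
        for doc in docs:
            try:
                self._send(doc)
                report.succeeded += 1
            except Exception as exc:
                report.failed.append((doc, str(exc)))
        report.done = True
```

A caller that does not care about the outcome simply never asks for the report.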

It seems like a sound enough method to me, but it would be interesting
to hear if someone has a better idea.

- Mark