How fast is Solr insert or am i doing something wrong?

AE-4
Hi:

I just want to know whether this is the norm or a problem with my configuration. I created a simple file with 10,000 records and 4 fields per record: id, title, desc, link.

First I used solrb, i.e. the Ruby gem library, to perform the inserts according to the instructions, and it has taken about an hour and is still running. I know Ruby is slow, but is it the same in other languages? What should my target be, i.e. are there any benchmarks for inserts? Is there a way to load a bunch of data at one time?

Is it good practice to do a <commit> after every insert? Is this what is taking the time? Are there any general rules of thumb?

Thanks for any feedback.


 
Re: How fast is Solr insert or am i doing something wrong?

Yonik Seeley-2
On 1/29/07, Antonio Eggberg <[hidden email]> wrote:
> Is it good practice to do a <commit> after every insert? Is this what is taking the time? Are there any general rules of thumb?

Definitely don't do a commit after every insert.  Do a single one at the end.
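
A minimal sketch of the difference, not taken from solrb itself. `CountingConnection` is a hypothetical stand-in that only counts calls, so the example runs without a Solr server; its `add` and `commit` methods are assumed to mirror the general shape of a client connection, and the sample records are made up. In Solr a commit is what makes pending changes visible to searches, so committing per document repeats that work once for every record.

```ruby
# Hypothetical stand-in for a Solr client connection. It only counts
# calls so the two strategies can be compared without a running server.
class CountingConnection
  attr_reader :adds, :commits

  def initialize
    @adds = 0
    @commits = 0
  end

  # Pretend to send one document to the index.
  def add(_doc)
    @adds += 1
  end

  # Pretend to issue a <commit>.
  def commit
    @commits += 1
  end
end

# Made-up sample data: 10,000 records with the four fields from the question.
records = (1..10_000).map do |i|
  { :id => i, :title => "title #{i}", :desc => "desc #{i}",
    :link => "http://example.com/#{i}" }
end

# Slow pattern: one commit per document.
per_insert = CountingConnection.new
records.each do |doc|
  per_insert.add(doc)
  per_insert.commit
end

# Recommended pattern: add everything, then commit once at the end.
single_commit = CountingConnection.new
records.each { |doc| single_commit.add(doc) }
single_commit.commit

puts "commit per insert: #{per_insert.commits} commits"
puts "single commit:     #{single_commit.commits} commit"
```

With a real client the same loop structure applies: keep the commit call outside the loop.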

-Yonik

Re: How fast is Solr insert or am i doing something wrong?

Coda Hale
SOLR-121 just got applied to the Solrb library, which allows
Solr::Connection#add to accept arrays of documents:

  connection.add([doc1, doc2, doc3])

Which means you can do something like this:

  connection.add(records.map { |r| make_solr_doc(r) })

Posting more than a single document in a request speeds things up by
quite a bit -- I've got a batch job which adds 250K+ documents to an
index in less than an hour -- about 10 fields, only the doc id stored.
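
As a rough sketch of the batching idea for larger data sets: `RecordingConnection` below is a hypothetical stand-in that only records how many requests would be made, and `make_solr_doc`, the sample records, and the batch size of 500 are likewise assumptions for illustration. Splitting the records with `each_slice` keeps each request to a bounded size instead of building one very large array.

```ruby
# Hypothetical stand-in: counts one "request" per add call, whether it
# receives a single document or an array of documents.
class RecordingConnection
  attr_reader :requests, :docs_added

  def initialize
    @requests = 0
    @docs_added = 0
  end

  def add(docs)
    @requests += 1
    @docs_added += docs.is_a?(Array) ? docs.size : 1
  end
end

# Hypothetical helper that turns a raw record into a document hash.
def make_solr_doc(record)
  { :id => record[:id], :title_text => record[:title] }
end

# Made-up sample data.
records = (1..10_000).map { |i| { :id => i, :title => "title #{i}" } }

BATCH_SIZE = 500 # assumed value; tune it for your document size

connection = RecordingConnection.new
records.each_slice(BATCH_SIZE) do |batch|
  # One request per batch instead of one request per document.
  connection.add(batch.map { |r| make_solr_doc(r) })
end

puts "#{connection.docs_added} documents in #{connection.requests} requests"
```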


--
Coda Hale
http://blog.codahale.com

SV: Re: How fast is Solr insert or am i doing something wrong?

AE-4
Thanks, Coda and Yonik, for the prompt answers.

I will give SOLR-121 a try. Cool.
Cheers




Re: SV: Re: How fast is Solr insert or am i doing something wrong?

Erik Hatcher
Wow, I'm in awe of the uptake of solrb already!  Answers now being  
provided before I even get a chance to chime in.  And we haven't even  
published a gem yet (though I did get it building successfully on a  
nightly build server, and will get the gems published sometime soon).

I've indexed 50k runs in around 10 minutes, with one document per  
POST.  I'll surely start using the multiple per post feature soon.
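
For a rough comparison, the figures quoted in this thread can be converted to documents per second. This is only back-of-the-envelope arithmetic: the runs used different hardware, schemas, and document sizes, and two of the figures are bounds ("about an hour and still counting", "less than an hour"), so the rates are not directly comparable.

```ruby
# Figures as quoted in the thread: [label, documents, elapsed minutes].
runs = [
  ["10,000 docs, about an hour (rate is at most this)", 10_000, 60],
  ["250K docs, under an hour (rate is at least this)", 250_000, 60],
  ["50k docs, around 10 minutes, one doc per POST", 50_000, 10]
]

# Convert a document count and a duration in minutes to docs per second.
def docs_per_second(docs, minutes)
  docs / (minutes * 60.0)
end

runs.each do |label, docs, minutes|
  printf("%-52s %6.1f docs/s\n", label, docs_per_second(docs, minutes))
end
```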

        Erik



Re: How fast is Solr insert or am i doing something wrong?

Erik Hatcher
In reply to this post by Yonik Seeley-2

On Jan 29, 2007, at 7:08 PM, Yonik Seeley wrote:

> On 1/29/07, Antonio Eggberg <[hidden email]> wrote:
>> Is it good practice to do a <commit> after every insert? Is this
>> what is taking the time? Are there any general rules of thumb?
>
> Definitely don't do a commit after every insert.  Do a single one  
> at the end.

For sure.  solrb has an autocommit flag that was initially  
contributed as on by default, but I turned it off by default before  
committing.  autocommit is handy for little demos (see solrb README,  
pasted below) and one-off communication, but not at all suitable for  
batch runs.

        Erik

   require 'solr'  # load the library
   include Solr    # Allow Solr:: to be omitted from class/module references

   # connect to the solr instance
   conn = Connection.new('http://localhost:8983/solr', :autocommit => :on)

   # add a document to the index
   conn.add(:id => 123, :title_text => 'Lucene in Action')

   # update the document
   conn.update(:id => 123, :title_text => 'Solr in Action')

   # print out the first hit in a query for 'action'
   response = conn.query('action')
   print response.hits[0]

   # iterate through all the hits for 'action'
   conn.query('action') do |hit|
     puts hit.inspect
   end

   # delete document by id
   conn.delete(123)