Solr Indexing MAX FILE LIMIT

6 messages

Solr Indexing MAX FILE LIMIT

mitra
Hello guys,

I'm using Apache Solr 3.6.1 on Tomcat 7 to index CSV files with curl, on a Windows machine.

** My question: what is the maximum CSV file size when doing an HTTP POST, or when using the following curl command?
curl "http://localhost:8080/solr/update/csv" -F "stream.file=D:\eighth.csv" -F "commit=true" -F "optimize=true" -F "encapsulator=\"" -F "keepEmpty=true"

** My requirement is quite large: we have to index CSV files ranging from 8 to 10 GB.

** What would be the optimum settings for index parameters such as commit, for better performance on a machine with 8 GB of RAM?
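(For reference: commit frequency is usually tuned in solrconfig.xml's <updateHandler> section via autoCommit. A hedged sketch; the numbers are placeholders, not tested recommendations for an 8 GB machine:)

```xml
<!-- solrconfig.xml, inside <updateHandler>; values are examples only -->
<autoCommit>
  <maxDocs>100000</maxDocs> <!-- hard commit after this many docs... -->
  <maxTime>60000</maxTime>  <!-- ...or after 60 seconds, whichever comes first -->
</autoCommit>
```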

Please guide me on this.

Thanks in Advance

RE: Solr Indexing MAX FILE LIMIT

Markus Jelsma-2
Hi - instead of trying to make the system ingest such large files, perhaps you can split them into many smaller pieces.

-----Original message-----

> From:mitra <[hidden email]>
> Sent: Tue 13-Nov-2012 09:05
> To: [hidden email]
> Subject: Solr Indexing MAX FILE LIMIT

RE: Solr Indexing MAX FILE LIMIT

mitra
Thank you.


*** I understand that the default HTTP POST size limit in Tomcat is 2 MB. Can we change that somehow, so that I don't need to split the 10 GB CSV into 2 MB chunks?
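(For what it's worth, the 2 MB figure is Tomcat's default for the Connector's maxPostSize attribute, which can be raised or disabled in conf/server.xml. A sketch, assuming the stock HTTP connector on port 8080:)

```xml
<!-- conf/server.xml: maxPostSize is in bytes; a negative value disables the limit -->
<Connector port="8080" protocol="HTTP/1.1"
           connectionTimeout="20000"
           maxPostSize="-1" />
```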

curl "http://localhost:8080/solr/update/csv" -F "stream.file=D:\eighth.csv" -F "commit=true" -F "optimize=true" -F "encapsulator=\"" -F "keepEmpty=true"

*** As I mentioned, I'm using the above command to post, rather than the format below:

curl http://localhost:8080/solr/update/csv --data-binary @eighth.csv -H 'Content-type:text/plain; charset=utf-8'

*** My question: does the limit still apply even when not using the --data-binary format above?

Re: Solr Indexing MAX FILE LIMIT

Erick Erickson
Have you considered writing a small SolrJ (or other client) program that processes the rows in your huge file and sends them to Solr in sensible chunks? That would give you much finer control over how the file is processed, how many docs are sent to Solr at a time, and what to do with errors. You could even run N simultaneous programs to increase throughput...

FWIW,
Erick


On Tue, Nov 13, 2012 at 3:42 AM, mitra <[hidden email]> wrote:


Re: Solr Indexing MAX FILE LIMIT

mitra
Thank you, Erick.

I didn't know we could write a Java class for that. Can you provide me with some info on how to do it?

Thanks

Re: Solr Indexing MAX FILE LIMIT

Alexandre Rafalovitch
Maybe you can start by testing this with split -l and xargs :-) These are standard Unix toolkit approaches, and since you already use one of them (curl), you may be happy to use the others too.
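(A toy run of that recipe on a 5-row stand-in file. The curl stage is replaced by a line count so the sketch runs without a Solr instance; filenames and the chunk size are illustrative:)

```shell
# Toy demo of split -l + xargs on a tiny file standing in for the 10 GB one.
# Real use: swap the `wc -l >> uploaded.log` stage for the curl POST, e.g.
#   curl http://localhost:8080/solr/update/csv --data-binary @- \
#        -H 'Content-type:text/plain; charset=utf-8'
printf 'id,name\n1,a\n2,b\n3,c\n4,d\n5,e\n' > sample.csv

head -n 1 sample.csv > header.csv              # /update/csv wants the header row
tail -n +2 sample.csv | split -l 2 - piece_    # 2 rows per piece (~500000 for real data)

# 4 "uploads" in parallel; each piece gets the header glued back on first
ls piece_* | xargs -P 4 -I {} sh -c 'cat header.csv {} | wc -l >> uploaded.log'
```

Splitting the body separately and re-attaching the header keeps every piece a valid standalone CSV, so each POST can still use header=true.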

Regards,
   Alex.

Personal blog: http://blog.outerthoughts.com/
LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
- Time is the quality of nature that keeps events from happening all at
once. Lately, it doesn't seem to be working.  (Anonymous  - via GTD book)


On Wed, Nov 14, 2012 at 11:33 PM, mitra <[hidden email]> wrote:
