Lucene Sizing Metrics

Lucene Sizing Metrics

hurtlingturtle
Hi,
Are there any sizing metrics available for Lucene indexes?  I am unclear on how this would scale up.  I am considering which indexing technology to use to index many hundreds of terabytes of documents and email content, so that the content can be searched for keywords and phrases and the results are security-trimmed.
My concerns are around the following sizing details...
Cores
RAM
Servers
Disks
Shards

and anything else you think would be relevant :)
thanks
hurtlingturtle
Re: Lucene Sizing Metrics

Ted Dunning
It all depends on your data and your policies.

That much data is not a good fit for a single machine, but is quite
plausible for SolrCloud.
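
For what it's worth, the shard count and replication factor are fixed when a SolrCloud collection is created through the Collections API, so they are worth settling during the experiments rather than afterwards. A minimal sketch that only builds the CREATE request URL; the host, the collection name "mail_archive" and the counts are made-up placeholders, not recommendations:

```python
from urllib.parse import urlencode


def create_collection_url(base_url, name, num_shards, replication_factor):
    """Build a Solr Collections API CREATE URL (does not send the request)."""
    params = {
        "action": "CREATE",
        "name": name,                              # hypothetical collection name
        "numShards": num_shards,                   # assumed shard count
        "replicationFactor": replication_factor,   # assumed copies per shard
    }
    return f"{base_url}/admin/collections?{urlencode(params)}"


# Placeholder values for illustration only.
url = create_collection_url("http://localhost:8983/solr", "mail_archive", 64, 2)
print(url)
```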

I would recommend that you run some experiments with different trade-offs.
It is common for a Lucene index to be a fraction of the size of the
original text, which would mean that your final index would be several
terabytes and might require dozens to hundreds of instances to search
and maintain effectively.  The error bars on such an estimate, however,
are huge, and you should test it for yourself.
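
To make that arithmetic concrete, here is a rough back-of-envelope sketch. Every input is an assumption chosen for illustration: the 300 TB raw size stands in for "many hundreds of terabytes", and the index-to-text ratios, the 50 GB per-shard target and the replica count are guesses that your own experiments should replace.

```python
import math


def estimate_shards(raw_tb, index_ratio, shard_gb, replicas=1):
    """Return (index_tb, primary_shards, total_shard_copies).

    raw_tb      -- size of the original text in TB (assumed)
    index_ratio -- index size as a fraction of raw size (assumed, measure it)
    shard_gb    -- target size of one shard in GB (assumed)
    replicas    -- number of copies of each shard (assumed)
    """
    if raw_tb <= 0 or index_ratio <= 0 or shard_gb <= 0 or replicas < 1:
        raise ValueError("all inputs must be positive")
    index_tb = raw_tb * index_ratio
    primary_shards = math.ceil(index_tb * 1024 / shard_gb)
    return index_tb, primary_shards, primary_shards * replicas


# Sweep the ratio to see how wide the error bars are.
for ratio in (0.01, 0.02, 0.10, 0.30):
    index_tb, shards, copies = estimate_shards(300, ratio, shard_gb=50, replicas=2)
    print(f"ratio={ratio:.2f}  index={index_tb:.1f} TB  "
          f"shards={shards}  shard copies={copies}")
```

The point of the sweep is that changing the assumed ratio from 1% to 30% moves the shard count by more than an order of magnitude, which is why indexing a representative sample and measuring the ratio comes first.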

On Thu, Mar 28, 2013 at 7:09 PM, hurtlingturtle wrote:

> Hi,
> Are there any sizing metrics available for Lucene indexes?  I am unclear on
> how this would scale up.  I am considering what indexing technology to use
> to index many hundreds of terabytes of documents and email content to
> enable searching of that content for keywords and phrases and also
> ensuring that the results are security trimmed.
> My concerns are around the following sizing details...
> Cores
> RAM
> Servers
> Disks
> Shards
>
> and anything else you think would be relevant :)
> thanks
> hurtlingturtle