Biggest index

Biggest index

spring
Hi,

I have some questions about index size on a single machine:

What is the biggest index you use in production?
Do you use MultiReader/Searcher?
What hardware do you need to serve it?
What kind of application is it?

Thank you.


---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]


Re: Biggest index

Otis Gospodnetic-2
Questions like these are always hard to answer well.  Actually, no, they are easy, right Erik: "It depends" ;)

Just kidding...partially.  Anyhow, you should ask a few more questions then:

- what is the response latency? (average, median, Nth percentile...)
- are stored fields involved, if so how many and how big are they?
- what kind of queries are involved (some are costlier than others)
- what is the search rate?
...
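
The latency and search-rate questions above can be answered from a query log. A minimal sketch, assuming you have per-query latencies in milliseconds collected over a known time window; the function names and the sample numbers are invented for illustration and are not from this thread:

```python
import math
import statistics


def percentile(samples, pct):
    """Nearest-rank percentile: the smallest sample such that at least
    pct percent of all samples are less than or equal to it."""
    if not samples:
        raise ValueError("need at least one sample")
    if not 0 < pct <= 100:
        raise ValueError("pct must be in (0, 100]")
    ordered = sorted(samples)
    rank = math.ceil(pct * len(ordered) / 100)  # 1-based rank
    return ordered[rank - 1]


def summarize(latencies_ms, window_seconds):
    """Summarize latency (average, median, Nth percentile) and search rate."""
    return {
        "mean_ms": statistics.mean(latencies_ms),
        "median_ms": statistics.median(latencies_ms),
        "p95_ms": percentile(latencies_ms, 95),
        "queries_per_second": len(latencies_ms) / window_seconds,
    }


# Hypothetical measurements: ten queries observed in a 5-second window.
sample = [12, 15, 11, 240, 14, 13, 16, 12, 18, 900]
print(summarize(sample, window_seconds=5))
```

Note how two slow queries pull the mean far above the median, which is why the percentiles matter when sizing hardware.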


Otis

--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch

----- Original Message ----
From: "[hidden email]" <[hidden email]>
To: [hidden email]
Sent: Monday, March 10, 2008 5:06:04 PM
Subject: Biggest index

Hi,

I have some questions about index size on a single machine:

What is the biggest index you use in production?
Do you use MultiReader/Searcher?
What hardware do you need to serve it?
What kind of application is it?

Thank you.




RE: Biggest index

spring
Yes of course, the answers to your questions are important too.
But no answers at all so far :(

For my case I can say (not in production yet):

2 ID fields and one content field per doc. Search on the content field only.
Simple searches like "content:foo" or "content:foo*".
1.5 GB of index per 1 million docs.
About 50 million docs now.
Growth of at most 10 million docs per year.

So I will have a 75 GB index soon.

Can searching this index be handled by a single machine?

Thank you.
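
The 75 GB figure follows from the numbers in this message (50 million docs at 1.5 GB per million). A small sketch of the same arithmetic projected forward, assuming index size keeps scaling linearly with document count and growth stays at the stated maximum; both are simplifying assumptions, and the names are made up for illustration:

```python
GB_PER_MILLION_DOCS = 1.5          # from the message
CURRENT_MILLION_DOCS = 50          # from the message
MAX_GROWTH_MILLION_PER_YEAR = 10   # from the message (upper bound)


def projected_index_gb(years_from_now):
    """Estimated index size in GB, assuming size is linear in doc count."""
    docs_millions = (CURRENT_MILLION_DOCS
                     + MAX_GROWTH_MILLION_PER_YEAR * years_from_now)
    return docs_millions * GB_PER_MILLION_DOCS


for year in range(6):
    print(f"year +{year}: {projected_index_gb(year):.0f} GB")
```

Under these assumptions the index grows by at most 15 GB per year, from 75 GB today to 150 GB after five years.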

> -----Original Message-----
> From: Otis Gospodnetic [mailto:[hidden email]]
> Sent: Tuesday, March 11, 2008 20:07
> To: [hidden email]
> Subject: Re: Biggest index
>
> Questions like these are always hard to answer well.  
> Actually, no, they are easy, right Erik: "It depends" ;)
>
> Just kidding...partially.  Anyhow, you should ask a few more
> questions then:
>
> - what is the response latency? (average, median, Nth percentile...)
> - are stored fields involved, if so how many and how big are they?
> - what kind of queries are involved (some are costlier than others)
> - what is the search rate?
> ...
>
>
> Otis
>
> --
> Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch


Re: Biggest index

Grant Ingersoll-2
How big is your machine and how big are your docs? (unique terms,  
etc.)   Even if it would fit, it sounds like you are going to have to  
go distributed sooner or later, so you might as well start planning  
for it.

-Grant

On Mar 14, 2008, at 8:51 AM, <[hidden email]> <[hidden email]> wrote:

> Yes of course, the answers to your questions are important too.
> But no answers at all so far :(
>
> For my case I can say (not in production yet):
>
> 2 ID fields and one content field per doc. Search on the content field only.
> Simple searches like "content:foo" or "content:foo*".
> 1.5 GB of index per 1 million docs.
> About 50 million docs now.
> Growth of at most 10 million docs per year.
>
> So I will have a 75 GB index soon.
>
> Can searching this index be handled by a single machine?
>
> Thank you.

--------------------------
Grant Ingersoll
http://www.lucenebootcamp.com
Next Training: April 7, 2008 at ApacheCon Europe in Amsterdam

Lucene Helpful Hints:
http://wiki.apache.org/lucene-java/BasicsOfPerformance
http://wiki.apache.org/lucene-java/LuceneFAQ








Re: Biggest index

John Wang-9
We are running on one box in prod with 20 million docs in one index.
-John

On Fri, Mar 14, 2008 at 8:01 PM, Grant Ingersoll <[hidden email]>
wrote:

> How big is your machine and how big are your docs? (unique terms,
> etc.)   Even if it would fit, it sounds like you are going to have to
> go distributed sooner or later, so you might as well start planning
> for it.
>
> -Grant

Re: Biggest index

adb
In reply to this post by spring
[hidden email] wrote:
> Yes of course, the answers to your questions are important too.
> But no answers at all so far :(

One example:

1.5 million documents
Approx 15 fields per document
DB is 10-15GB (can't find correct figure)
All on one machine.  No stats on search usage though.

We're about to embark on a project with 25-40M documents (email data) per annum,
with no deletes, over 10 years.  Planning for index distribution, but haven't
decided on the partitioning yet.
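
For a rough sense of scale, 25-40M documents per year over 10 years comes to 250-400M documents in total. The sketch below works that out and shows two common ways to assign a document to a partition, by hash of its ID or by year received. It is only an illustration: the function names, the example ID and the starting year are hypothetical, and nothing here reflects a decision made in this thread.

```python
import zlib
from datetime import date


def total_docs_millions(per_year_millions, years):
    """Total documents (in millions) with steady intake and no deletes."""
    return per_year_millions * years


def partition_by_hash(doc_id, num_partitions):
    """Spread documents evenly; every query must visit all partitions.
    crc32 is used because it is stable across runs, unlike Python's
    built-in hash() for strings."""
    return zlib.crc32(doc_id.encode("utf-8")) % num_partitions


def partition_by_year(received, first_year):
    """One partition per year; date-restricted queries touch fewer
    partitions, and old partitions never change when nothing is deleted."""
    return received.year - first_year


print(total_docs_millions(25, 10), total_docs_millions(40, 10))
print(partition_by_hash("msg-0001", 8))             # hypothetical ID
print(partition_by_year(date(2010, 5, 1), 2008))    # hypothetical dates
```

Hash partitioning balances load but fans every search out to all partitions; time-based partitioning suits append-only email data but can leave the newest partition doing most of the indexing work.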

Antony


