multiple small indexes or one big index?

Alexander Rosemann
Hi all, I was wondering whether you could give me some advice on how to
improve my search performance.

I have 90 Lucene indexes, each having different fields (~5 per
Document). When I search, I always have to go through all indexes to
build my result set. Searching one index takes approx. 100ms, so
searching all of them one after another takes 9s in total.

How can I reduce the time it needs to search?

I decided to create this many indexes because putting all data in one
index would mean that a document would have ~400 fields, with most of
them left empty. Is that ok? Would a single index be faster compared to
multiple small ones?

Any pointers are much appreciated.

Regards,
Alex

---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

Re: multiple small indexes or one big index?

Erick Erickson
I'd start by putting them all in one index. There's no penalty
in Lucene for having empty fields in a document, unlike an
RDBMS.

Alternatively, if you're opening and then closing searchers for each
search, that's very expensive. Could you open the searchers
once and keep them open (all 90 of them)? That alone might
do the trick and be less of a change to your program. You
could also fire multiple threads at the searches, but check whether
you're CPU-bound first (if you are, multiple threads won't
help much, if at all).
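
A minimal sketch of the "open once, keep open" idea in plain Java. The
SearcherCache class and its opener function are hypothetical stand-ins and
not part of Lucene; in a real program the opener would create an
IndexSearcher for the given index, and S would be IndexSearcher.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

/** Hypothetical cache: opens one searcher per index path and reuses it. */
class SearcherCache<S> {
    private final Map<String, S> open = new ConcurrentHashMap<>();
    private final Function<String, S> opener; // stand-in for the expensive "open a searcher" step
    private final AtomicInteger openCount = new AtomicInteger();

    SearcherCache(Function<String, S> opener) {
        this.opener = opener;
    }

    /** Returns the cached searcher, opening it only on first use. */
    S get(String indexPath) {
        return open.computeIfAbsent(indexPath, path -> {
            openCount.incrementAndGet();
            return opener.apply(path);
        });
    }

    /** How many times the expensive open step actually ran. */
    int timesOpened() {
        return openCount.get();
    }
}
```

Every search then asks the cache for its searcher instead of opening and
closing one, so the open cost is paid once per index rather than once per
query.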

You haven't said how big these indexes are or how many documents
you're talking about, so treat this advice with caution.

Do look at putting it all in one index, though, and let us know if you
have some data indicating how big things are or would be.

Best
Erick

On Wed, Jun 1, 2011 at 4:35 PM, Alexander Rosemann
<[hidden email]> wrote:

> Hi all, I was wondering whether you could give me some advice on how to
> improve my search performance.

Re: multiple small indexes or one big index?

Alexander Rosemann
Many thanks for the tips, Erick! I do close each searcher after a
search... I will change that first thing tomorrow and let you know how
that goes. Multi-threaded searching will be next, and if that doesn't
help, I will switch to one big index.
All indexes together are rather small: ~200MB and 50,000 documents.

-Alex

On 01.06.2011 23:26, Erick Erickson wrote:

> I'd start by putting them all in one index. There's no penalty
> in Lucene for having empty fields in a document, unlike an
> RDBMS.

Re: multiple small indexes or one big index?

Shai Erera
> All indexes together are rather small, ~200MB and 50,000 documents.

Then I would definitely consider merging them into one index. Even if you
keep the searchers open, it will still take 90 x N ms to search them
sequentially, where N is the time in ms to search one index.

Also, multi-threading will help, but only up to a point, because you
cannot truly run 90 searches in parallel (unless you have some sort of
super-computer there).
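
To illustrate the "only up to a point" remark, here is a rough sketch in
plain Java. ParallelSearchSketch and both of its methods are hypothetical
names, and each per-index search is modelled as a Callable that returns a
list of hits rather than a real Lucene call. The estimate assumes that
every search takes the same time and that each thread gets a core to
itself.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class ParallelSearchSketch {

    /** Rough wall-clock estimate: searches run in waves of `threads` at a time. */
    static long estimateMillis(int indexes, int threads, long perIndexMillis) {
        long waves = (indexes + threads - 1) / threads; // ceiling division
        return waves * perIndexMillis;
    }

    /** Runs one task per index on a fixed pool and concatenates the hits in task order. */
    static List<String> searchAll(List<Callable<List<String>>> perIndexSearches, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<String> merged = new ArrayList<>();
            for (Future<List<String>> f : pool.invokeAll(perIndexSearches)) {
                merged.addAll(f.get()); // invokeAll has already waited for completion
            }
            return merged;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while searching", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("a per-index search failed", e.getCause());
        } finally {
            pool.shutdown();
        }
    }
}
```

With the numbers from this thread (90 indexes, 100ms each), the estimate is
9000ms on one thread and 2300ms on four, so the gain is bounded by the
number of threads that can really run at once.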

On the other hand, if you merge them into one index then you'll be talking
about an index that's <20GB and <5M docs, which is definitely reasonable for
Lucene, and performance (depending of course on the search application) is
generally very good.

Starting with Lucene 3.1 you can perform your searches in parallel (over one
index) using IndexSearcher, which comes in handy if your index has multiple
segments. Look at
http://lucene.apache.org/java/3_1_0/api/core/org/apache/lucene/search/IndexSearcher.html#IndexSearcher(org.apache.lucene.index.IndexReader,
java.util.concurrent.ExecutorService).

Having said that, keeping the indexes separate may have advantages that your
application needs. For example, if those indexes are completely rebuilt very
frequently, then it's much better to delete an index and rebuild it than to
delete 50K docs from the merged large index. But that really depends on your
application's needs.

I'd say, if you don't see a strong case for keeping them apart, merge them
into one. Besides performance, there's also index management overhead (such
as synchronizing commits, making sure all are closed/opened together, etc.)
that may just be unnecessary.

BTW, in Lucene in Action 2nd Edition, there's an example class called
SearcherManager which manages IndexSearcher instances and ensures an
IndexSearcher instance is closed only after the last thread has released it.
It can also manage the reopen() logic for you, as well as warming up the
index. You might want to give it a try too!
LUCENE-2955 (https://issues.apache.org/jira/browse/LUCENE-2955) makes
use of it, so you can consult it for examples (it's still not committed).
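
The "closed only after the last thread has released it" behaviour is
essentially reference counting. The sketch below shows that idea in plain
Java; RefCountedResource is a hypothetical class written for illustration,
not the SearcherManager from the book or from LUCENE-2955, and the boolean
flag stands in for actually closing a searcher.

```java
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical reference-counted holder for a shared, closeable resource. */
class RefCountedResource {
    // Starts at 1: the owner (e.g. the manager) holds the first reference.
    private final AtomicInteger refs = new AtomicInteger(1);
    private volatile boolean closed = false;

    /** A searching thread calls this before use; fails once the resource is closed. */
    boolean tryAcquire() {
        while (true) {
            int n = refs.get();
            if (n == 0) {
                return false; // already closed, caller must fetch a fresh instance
            }
            if (refs.compareAndSet(n, n + 1)) {
                return true;
            }
        }
    }

    /** Called when a holder is done; the last release performs the real close. */
    void release() {
        if (refs.decrementAndGet() == 0) {
            closed = true; // a real implementation would close the searcher here
        }
    }

    boolean isClosed() {
        return closed;
    }
}
```

When the owner swaps in a reopened searcher it releases its own reference
to the old one, which then closes as soon as the last in-flight search has
released it.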

Hope this helps,
Shai

On Thu, Jun 2, 2011 at 12:37 AM, Alexander Rosemann <
[hidden email]> wrote:

> Many thanks for the tips, Erick! I do close each searcher after a search...

Re: multiple small indexes or one big index?

Alexander Rosemann
Many, many thanks for the input. I have applied the little change of not
closing the searchers each time, and search times have already dropped by
half!

I'll try to merge all indexes into a single one next. I'll let you know
how that goes.


On 02.06.2011 05:28, Shai Erera wrote:

> Then I would definitely consider merging them under one index.

Re: multiple small indexes or one big index?

Erick Erickson
At this size, really consider going to a single index. The lack of
administrative headaches alone is probably well worth the effort.

I almost guarantee that the time you spend rewriting things to keep
the searchers open (and finding the bugs!) will be far more than the
time needed to just put all the data in a single index.

But that might just be my preferences showing....

Best
Erick

On Wed, Jun 1, 2011 at 5:37 PM, Alexander Rosemann
<[hidden email]> wrote:

> Many thanks for the tips, Erick! I do close each searcher after a search...

Re: multiple small indexes or one big index?

Alexander Rosemann
Hi Erick, caching the IndexSearchers didn't take too much effort and has
already decreased search time by 30%!

I am busy changing the code to use a single index, as you suggested.
There are still a few things left to be done, but once I have it working
I'll let you know how much faster it is for me.

Thanks,
Alex

On 02.06.2011 13:04, Erick Erickson wrote:

> At this size, really consider going to a single index. The lack of
> administrative headaches alone is probably well worth the effort....

Re: multiple small indexes or one big index?

Erick Erickson
Sounds good, just be sure to keep your (now single) searcher open! Also,
be sure to measure query times after the searcher has warmed up. The first
few queries will fill up caches etc., so times should improve after those.
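
One way to follow this advice is to discard the first few runs when timing.
The helper below is a minimal sketch in plain Java; WarmTimingSketch is a
hypothetical name, and the Runnable stands in for whatever code executes
one query. The warm-up and measured run counts are arbitrary choices for
the caller.

```java
class WarmTimingSketch {

    /**
     * Runs the query warmupRuns times without timing it, then returns the
     * average duration in nanoseconds over measuredRuns further runs.
     */
    static long averageNanosAfterWarmup(Runnable query, int warmupRuns, int measuredRuns) {
        if (measuredRuns <= 0) {
            throw new IllegalArgumentException("measuredRuns must be positive");
        }
        for (int i = 0; i < warmupRuns; i++) {
            query.run(); // not timed: these runs only fill caches
        }
        long total = 0;
        for (int i = 0; i < measuredRuns; i++) {
            long start = System.nanoTime();
            query.run();
            total += System.nanoTime() - start;
        }
        return total / measuredRuns;
    }
}
```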

Best
Erick

On Thu, Jun 2, 2011 at 9:28 AM, Alexander Rosemann
<[hidden email]> wrote:

> Hi Erick, caching the IndexSearchers didn't take too much effort and has
> already decreased search time by 30%!

Re: multiple small indexes or one big index?

Alexander Rosemann
No worries, I'll keep that in mind now.
In addition, I am going to switch to another collector as well. At the
moment I collect the results and then sort them using the standard
Collections.sort approach... I have to look at what Lucene offers and
switch to something else.
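
For comparison with collecting everything and sorting afterwards, here is a
sketch of keeping only the best n hits while collecting, using a bounded
priority queue from the JDK. TopNSketch and Hit are hypothetical classes
made up for this illustration and are not Lucene types; the actual
collectors Lucene ships should be looked up in its own documentation.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

class TopNSketch {

    /** Hypothetical hit: a document id plus a relevance score. */
    static final class Hit {
        final int docId;
        final float score;

        Hit(int docId, float score) {
            this.docId = docId;
            this.score = score;
        }
    }

    /** Keeps only the n best hits while collecting; returns them best-first. */
    static List<Hit> topN(Iterable<Hit> hits, int n) {
        if (n <= 0) {
            return new ArrayList<>();
        }
        Comparator<Hit> byScore = Comparator.comparingDouble((Hit h) -> h.score);
        // Min-heap on score: the weakest of the current top n sits at the head.
        PriorityQueue<Hit> heap = new PriorityQueue<>(n, byScore);
        for (Hit h : hits) {
            if (heap.size() < n) {
                heap.add(h);
            } else if (h.score > heap.peek().score) {
                heap.poll(); // evict the current weakest
                heap.add(h);
            }
        }
        List<Hit> best = new ArrayList<>(heap);
        best.sort(byScore.reversed());
        return best;
    }
}
```

Only n hits are held and sorted at the end, instead of the full result
list, which matters more as the number of matching documents grows.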

Thanks,
Alex

On 02.06.2011 15:36, Erick Erickson wrote:

> Sounds good, just be sure to keep your (now single) searcher open! Also,
> be sure to measure queries after a while. The first few queries will fill up
> caches etc, so the time should improve after the first few.
>
> Best
> Erick
>
> On Thu, Jun 2, 2011 at 9:28 AM, Alexander Rosemann
> <[hidden email]>  wrote:
>> Hi Erick, caching the IndexSearchers didn't took too much effort and
>> decreased searching already by 30%!
>>
>> I am busy changing the code to use a single index as you suggested atm.
>> Still a few things left to be done but once I have it working I let you know
>> how much faster it is for me.
>>
>> Thanks,
>> Alex
>>
>> On 02.06.2011 13:04, Erick Erickson wrote:
>>>
>>> At this size, really consider going to a single index. The lack of
>>> administrative headaches alone is probably well worth the effort....
>>>
>>> I almost guarantee that the time you spend re-writing things to keep
>>> the searchers open (and finding the bugs!) will be far more than just
>>> putting all the data in a single index.
>>>
>>> But that might just be my preferences showing....
>>>
>>> Best
>>> Erick
>>>
>>> On Wed, Jun 1, 2011 at 5:37 PM, Alexander Rosemann
>>> <[hidden email]>    wrote:
>>>>
>>>> Many thanks for the tips, Erick! I do close each searcher after a
>>>> search...
>>>> I will change that first thing tmrw. and let you know how that went.
>>>> Multi-threaded searching will be next and if that hasn't helped, I will
>>>> switch to one big index.
>>>> All indexes together are rather small, ~200MB and 50.000 documents.
>>>>
>>>> -Alex
>>>>
>>>> On 01.06.2011 23:26, Erick Erickson wrote:
>>>>>
>>>>> I'd start by putting them all in one index. There's no penalty
>>>>> in Lucene for having empty fields in a document, unlike an
>>>>> RDBMS.
>>>>>
>>>>> Alternately, if you're opening then closing searchers each
>>>>> time, that's very expensive. Could you open the searchers
>>>>> once and keep them open (all 90 of them)? That alone might
>>>>> do the trick and be less of a change to your program. You
>>>>> could also fire multiple threads at the searches, but check if
>>>>> you're CPU bound first (if you are, multiple threads won't
>>>>> help much/at all).
>>>>>
>>>>> You haven't said how big these indexes are nor how many
>>>>> documents you're talking about here, so this advice is suspect.
>>>>>
>>>>> Do look at putting it all in one index though, let us know if you
>>>>> have some data indicating how big stuff is/would be.
>>>>>
>>>>> Best
>>>>> Erick
>>>>>

Re: multiple small indexes or one big index?

Alexander Rosemann
Alright. With all the changes you suggested I am down from 9s to <1s.
Again, many thanks to both of you, Erick and Shai!

Regards,
Alex

Re: multiple small indexes or one big index?

Erick Erickson
OK, if they're all in a single index, you might also try using Lucene
sorting. Be aware that the first sort on a field takes extra time to
warm the caches.

But note that sorting is for single-valued, un-tokenized fields.

Best
Erick
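
To illustrate why sorting during the search can beat collecting every hit and calling Collections.sort afterwards, here is a rough sketch in plain Java. It is not Lucene code: Hit, TopHits and firstN are made-up names for this example, and the single sortKey string per hit is only meant to mirror the "single-valued, un-tokenized" requirement. The idea shown is that only the best n hits need to be kept while results stream in.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical stand-in for a search hit; this is not a Lucene class.
final class Hit {
    final int docId;
    final String sortKey; // exactly one un-tokenized value per document

    Hit(int docId, String sortKey) {
        this.docId = docId;
        this.sortKey = sortKey;
    }
}

final class TopHits {
    // Ascending by sort key; ties broken by doc id so the order is deterministic.
    static final Comparator<Hit> ORDER =
            Comparator.comparing((Hit h) -> h.sortKey).thenComparingInt(h -> h.docId);

    // Returns the first n hits in ORDER without sorting the full result set.
    static List<Hit> firstN(Iterable<Hit> hits, int n) {
        if (n <= 0) {
            return new ArrayList<>();
        }
        // Max-heap on ORDER: the head is the worst hit currently kept.
        PriorityQueue<Hit> heap = new PriorityQueue<>(n, ORDER.reversed());
        for (Hit h : hits) {
            if (heap.size() < n) {
                heap.add(h);
            } else if (ORDER.compare(h, heap.peek()) < 0) {
                heap.poll(); // drop the current worst hit
                heap.add(h); // keep the better one
            }
        }
        // Only the n survivors are sorted at the end.
        List<Hit> result = new ArrayList<>(heap);
        result.sort(ORDER);
        return result;
    }
}
```

With 50.000 documents and a page of, say, 10 results, this keeps at most 10 hits in memory at any time, whereas the collect-then-sort approach holds and orders all matches.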

Re: multiple small indexes or one big index?

Itamar Syn-Hershko-2
In reply to this post by Alexander Rosemann
Erick,

Sorry about reopening this more than a week late...

You were asking about the size of each index. At what index size would
you consider splitting into several indices with multiple searches, etc.,
for what reasons, and does it matter which Lucene version is used?

Thanks :)

Itamar.