Performance searching over multiple indexes

Performance searching over multiple indexes

Ard Schrijvers
Hello,

I am experimenting with Lucene's MultiSearcher, running some simple
BooleanQueries that each combine a couple of TermQueries. I am finding
that a single Lucene index of just 100,000 docs (~10 k each) is roughly
100 times faster than about 100 separate indexes searched through
MultiSearcher. The difference is most visible when the number of hits
gets lower (i.e., with more TermQueries). A single index seems to be
way faster. I must admit I did optimize the single index (but I can't
imagine this explains the 100x).

Is it correct that a single index is much faster when the query consists
of many TermQueries and the number of hits is low? Does Lucene do
something like starting with the term that has the fewest hits, and
then continuing with the remaining terms in order of increasing hit
count? Is this more efficient within one index, or is it the combining
of the hits that makes it slower?
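
The "start with the rarest term" strategy asked about above can be
sketched as follows. This is a generic illustration in plain Java, not
Lucene's actual scorer code: the class PostingsIntersection and its
intersect method are hypothetical names, postings are modelled as sorted
int arrays of doc IDs, and binary search stands in for whatever skipping
a real index performs.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

/** Hypothetical helper, not a Lucene class: ANDs sorted postings lists. */
final class PostingsIntersection {

    /** Returns the doc IDs present in every list; each list must be sorted ascending. */
    static List<Integer> intersect(List<int[]> postings) {
        List<Integer> result = new ArrayList<>();
        if (postings.isEmpty()) {
            return result;
        }
        // Put the shortest (rarest) list first so the candidate set starts small.
        List<int[]> byLength = new ArrayList<>(postings);
        byLength.sort(Comparator.comparingInt((int[] p) -> p.length));

        for (int doc : byLength.get(0)) {
            boolean inAll = true;
            for (int i = 1; i < byLength.size() && inAll; i++) {
                // Assumption: binary search replaces the "advance" step of a real index.
                inAll = Arrays.binarySearch(byLength.get(i), doc) >= 0;
            }
            if (inAll) {
                result.add(doc);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<int[]> postings = Arrays.asList(
                new int[] {1, 3, 5, 7, 9, 11},  // frequent term
                new int[] {3, 9},               // rare term, drives the loop
                new int[] {2, 3, 4, 9, 10});
        System.out.println(intersect(postings)); // prints [3, 9]
    }
}
```

The work is bounded by the length of the rarest list. With 100 separate
indexes, each index has its own term dictionary and postings, so an
intersection like this has to be set up once per index, even for
indexes that contain no match.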

Hopefully somebody can enlighten me,

thx

Regards Ard

--

Hippo
Oosteinde 11
1017WT Amsterdam
The Netherlands
Tel  +31 (0)20 5224466
-------------------------------------------------------------
[hidden email] / [hidden email] / http://www.hippo.nl
--------------------------------------------------------------

---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

RE: Performance searching over multiple indexes

Fang_Li
Using more than one index will definitely decrease search performance.
Most of Lucene's search latency comes from loading the hits. If there
are no hits, a search takes a short time, dozens of milliseconds, and
that is roughly constant as long as the number of documents is below
1M. Searching 100 indexes will therefore take about 100 times longer.
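
The reasoning above (a roughly fixed cost per index, paid once for each
index searched) can be illustrated with a minimal sequential
multi-index loop. This is a sketch in plain Java, not MultiSearcher's
real code: SequentialMultiSearch, SubIndex and Hit are hypothetical
names, and each sub-index is modelled as a function returning
doc ID -> score.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.PriorityQueue;

/** Hypothetical stand-in for a MultiSearcher-style loop; not Lucene code. */
final class SequentialMultiSearch {

    /** One sub-index: returns doc ID -> score for the query within that index. */
    interface SubIndex {
        Map<Integer, Double> search(String query);
    }

    /** A hit: the sub-index it came from, its local doc ID, and its score. */
    static final class Hit {
        final int index;
        final int doc;
        final double score;

        Hit(int index, int doc, double score) {
            this.index = index;
            this.doc = doc;
            this.score = score;
        }
    }

    /** Queries every sub-index in turn and returns the k best hits, best first. */
    static List<Hit> search(List<SubIndex> indexes, String query, int k) {
        // Min-heap on score, so the weakest retained hit is evicted first.
        PriorityQueue<Hit> best =
                new PriorityQueue<>(Comparator.comparingDouble((Hit h) -> h.score));
        for (int i = 0; i < indexes.size(); i++) {
            // One full lookup per sub-index: this fixed cost is paid N times.
            for (Map.Entry<Integer, Double> e : indexes.get(i).search(query).entrySet()) {
                best.add(new Hit(i, e.getKey(), e.getValue()));
                if (best.size() > k) {
                    best.poll();
                }
            }
        }
        List<Hit> out = new ArrayList<>(best);
        out.sort(Comparator.comparingDouble((Hit h) -> h.score).reversed());
        return out;
    }

    public static void main(String[] args) {
        List<SubIndex> indexes = new ArrayList<>();
        indexes.add(q -> Collections.singletonMap(7, 0.4));
        indexes.add(q -> Collections.<Integer, Double>emptyMap()); // no hit, still queried
        indexes.add(q -> Collections.singletonMap(2, 0.9));
        for (Hit h : search(indexes, "lucene", 2)) {
            System.out.println("index " + h.index + ", doc " + h.doc + ", score " + h.score);
        }
    }
}
```

Note that the second sub-index contributes no hits but is still
queried, which is where the per-index overhead accumulates.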

I don't think it is a good idea to use many indexes when the number of
documents is small. You can also try ParallelMultiSearcher, which
searches all indexes in parallel.

I suppose you did not include the index opening time in the comparison.


-----Original Message-----
From: Ard Schrijvers [mailto:[hidden email]]
Sent: Thursday, October 25, 2007 6:09 PM
To: [hidden email]
Subject: Performance searching over multiple indexes

RE: Performance searching over multiple indexes

Ard Schrijvers
Hello,

> Using more than one index will definitely decrease search
> performance. Most of Lucene's search latency comes from
> loading the hits. If there are no hits, a search takes a
> short time, dozens of milliseconds, and that is roughly
> constant as long as the number of documents is below 1M.
> Searching 100 indexes will therefore take about 100 times longer.

I was kind of experiencing this indeed :-)

>
> I don't think it is a good idea to use many indexes when the
> number of documents is small. You can also try
> ParallelMultiSearcher, which searches all indexes in parallel.

Yes, I have tried it already, but for few documents (say 100,000) in 100
indexes, ParallelMultiSearcher is actually even slower (I think because
of locks; it probably only makes sense if you have a couple of very
large indexes).
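
The parallel fan-out discussed here can be sketched with a plain thread
pool. This is an illustration only, not ParallelMultiSearcher's
implementation: ParallelFanOut and searchAll are hypothetical names, and
each per-index search is modelled as a Callable returning matching doc
IDs. When every sub-search is tiny, the cost of scheduling and joining
the tasks may outweigh the work itself, which would be consistent with
the slowdown described above.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Hypothetical fan-out over sub-indexes; not Lucene code. */
final class ParallelFanOut {

    /** Runs one task per sub-index on a pool and concatenates results in index order. */
    static List<Integer> searchAll(List<Callable<List<Integer>>> perIndexSearches, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            // invokeAll blocks until all tasks finish; futures keep submission order.
            List<Future<List<Integer>>> futures = pool.invokeAll(perIndexSearches);
            List<Integer> merged = new ArrayList<>();
            for (Future<List<Integer>> f : futures) {
                merged.addAll(f.get());
            }
            return merged;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        // 100 tiny sub-searches, each returning a single doc ID.
        List<Callable<List<Integer>>> tasks = new ArrayList<>();
        for (int i = 0; i < 100; i++) {
            final int doc = i;
            tasks.add(() -> Collections.singletonList(doc));
        }
        System.out.println(searchAll(tasks, 4).size()); // prints 100
    }
}
```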

>
> I suppose you did not include the index opening time in the comparison.

No. The original reasoning behind the "many indexes" was to have fast
incremental updating: keep one index in memory and flush it to
persistent indexes every x seconds. These persistent indexes are then
merged, much like the segments in Lucene. I think the idea is OK, but
the number of persistent indexes must be kept small. I'll do some more
testing,

thx for your advice,

regards Ard
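
The flush-and-merge scheme described in this reply can be modelled in a
few lines. This is a toy sketch under stated assumptions, not the
poster's code and not Lucene's merge policy: FlushingIndexSet is a
hypothetical name, documents are plain strings, a buffered-document
threshold stands in for the "every x seconds" timer, and a merge simply
collapses all persistent segments into one.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical model of "buffer in memory, flush, then merge"; not Lucene code. */
final class FlushingIndexSet {
    private final int maxBuffered;  // assumption: size threshold replaces the timer
    private final int maxSegments;  // merge once more persistent segments than this exist
    private List<String> buffer = new ArrayList<>();
    private final List<List<String>> segments = new ArrayList<>();

    FlushingIndexSet(int maxBuffered, int maxSegments) {
        this.maxBuffered = maxBuffered;
        this.maxSegments = maxSegments;
    }

    /** Adds a document to the in-memory buffer, flushing when the buffer is full. */
    void add(String doc) {
        buffer.add(doc);
        if (buffer.size() >= maxBuffered) {
            flush();
        }
    }

    /** Turns the in-memory buffer into a new persistent segment. */
    void flush() {
        if (buffer.isEmpty()) {
            return;
        }
        segments.add(buffer);
        buffer = new ArrayList<>();
        if (segments.size() > maxSegments) {
            mergeAll();
        }
    }

    /** Simplification: collapse every persistent segment into a single one. */
    private void mergeAll() {
        List<String> merged = new ArrayList<>();
        for (List<String> segment : segments) {
            merged.addAll(segment);
        }
        segments.clear();
        segments.add(merged);
    }

    int segmentCount() {
        return segments.size();
    }

    int docCount() {
        int total = buffer.size();
        for (List<String> segment : segments) {
            total += segment.size();
        }
        return total;
    }

    public static void main(String[] args) {
        FlushingIndexSet set = new FlushingIndexSet(2, 3);
        for (int i = 0; i < 10; i++) {
            set.add("doc" + i);
        }
        // prints "2 segments, 10 docs"
        System.out.println(set.segmentCount() + " segments, " + set.docCount() + " docs");
    }
}
```

Keeping maxSegments low bounds the number of indexes a search has to
visit, at the price of more frequent merging.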
