Should I use one or many indexes?

acorcutt
Hello, I'm just starting out with Lucene and need to decide how to organize my indexes.

I have a project where many users (thousands) will each need to search only their own documents, maybe up to 1000 documents each at a time.

I'm trying to decide whether to go with one large index or one smaller index per user. I would guess that a smaller index per user would give faster search results, but is it a good idea, performance- and memory-wise, to have many index readers/writers open at once? With the one-big-index approach I can just keep one of each open globally. Also, is there a big performance loss from opening and closing readers as a user needs them?

I guess my options are really just:
1. One large index with one global reader/writer.
2. Many indexes with many global readers/writers.
3. Many indexes, opened and closed as needed.

any advice appreciated,
thanks
Anthony




Re: Should I use one or many indexes?

Fredrik Andersson-2-2
If the users should only be able to search their own documents, it would
probably make sense to keep each user's index locally. Besides greater
query speed, it would also simplify updating and appending to the
index. So that would mean one index, one IndexModifier and one
IndexSearcher living on each person's computer. Or am I missing something?

Fredrik


Re: Should I use one or many indexes?

acorcutt
Sorry, I did not mention that this will be on a web server, so one index per user would mean many IndexModifiers running at once on the server.
I guess I'm mainly trying to determine whether running many IndexModifiers is a bad thing. I can see that smaller indexes will be quicker, but would that speed be lost by running maybe 1000 at once?

Re: Should I use one or many indexes?

Fredrik Andersson-2-2
Ok, I figured you had some setup like that.

Personally, I would prefer one large index. The overhead of opening,
closing and managing thousands of searchers/modifiers is much bigger
than that of building the per-user restriction into the query. Also, you
risk running out of file handles, depending on your OS.
If your documents are indexed like (userId, title, text, date), it's no
hassle to search the subset belonging to a certain userId, even if the index
holds several million records, by adding a required clause on the userId
field to the query (for example, a BooleanQuery that combines the user's
query with a required TermQuery on userId).
I have personally used this tactic in several projects where we've
handled user restrictions / subset searches, and it's very convenient.
Regarding the IndexModifier: you can only have one modifier open on an
index at a time, since it takes a write lock. So you would probably want a
single "crawler" that updates the index continuously, while an arbitrary
number of read-only IndexSearchers are attached to the very same index.
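
To make the idea of a per-user restriction concrete, here is a toy inverted index in Python. It is only a sketch of the principle, not Lucene code: `SharedIndex` and its methods are hypothetical, and the date field is left out for brevity. Every document from every user goes into one structure, and each search is intersected with the set of documents carrying the caller's userId.

```python
from collections import defaultdict


class SharedIndex:
    """Toy inverted index holding every user's documents (not Lucene)."""

    def __init__(self):
        self.postings = defaultdict(set)  # (field, term) -> set of doc ids
        self.docs = {}                    # doc id -> stored fields

    def add(self, doc_id, user_id, title, text):
        self.docs[doc_id] = {"userId": user_id, "title": title, "text": text}
        # userId is indexed as one exact term, like an untokenized field.
        self.postings[("userId", user_id)].add(doc_id)
        for field, value in (("title", title), ("text", text)):
            for term in value.lower().split():
                self.postings[(field, term)].add(doc_id)

    def search(self, user_id, term):
        """Ids of docs owned by user_id with term in the title or text."""
        term = term.lower()
        matches = (self.postings.get(("title", term), set())
                   | self.postings.get(("text", term), set()))
        # Required clause: keep only the caller's own documents.
        return sorted(matches & self.postings.get(("userId", user_id), set()))


index = SharedIndex()
index.add(1, "alice", "Budget report", "quarterly budget figures")
index.add(2, "bob", "Budget notes", "draft budget")
index.add(3, "alice", "Holiday plan", "flights and hotels")
print(index.search("alice", "budget"))  # [1]
```

The restriction costs one extra set intersection per query, which is why the size of the shared index matters far less than the number of documents that actually match.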

Anyway, my $0.02!



Re: Should I use one or many indexes?

acorcutt
OK, thanks. I'm going to have to look into how Lucene handles concurrent inserts/updates/deletes if I go for one big index; I need them in near real time, from multiple users at once. With smaller indexes I don't see a problem with updates, as most indexes would be less than 100 MB with only one user updating each.
At the moment SQL Server is looking like a better option.
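
For the one-big-index route, a common way to reconcile "only one writer" with "updates from many users at once" is to funnel all updates through a queue that a single writer thread drains. The sketch below is in Python with a plain dict standing in for the index; `SingleWriter`, `submit` and `flush` are hypothetical names, and it says nothing about how quickly Lucene itself makes changes visible to searchers.

```python
import queue
import threading


class SingleWriter:
    """Serializes updates from many threads through one writer thread."""

    def __init__(self):
        self.index = {}               # toy "index": doc id -> text
        self.pending = queue.Queue()  # thread-safe FIFO of updates
        self.worker = threading.Thread(target=self._run, daemon=True)
        self.worker.start()

    def submit(self, op, doc_id, text=None):
        """Called by any request thread; returns without waiting."""
        self.pending.put((op, doc_id, text))

    def _run(self):
        # Only this thread ever touches self.index for writing.
        while True:
            op, doc_id, text = self.pending.get()
            if op == "upsert":
                self.index[doc_id] = text
            elif op == "delete":
                self.index.pop(doc_id, None)
            self.pending.task_done()

    def flush(self):
        """Block until every queued update has been applied."""
        self.pending.join()


writer = SingleWriter()
users = [threading.Thread(target=writer.submit, args=("upsert", i, f"doc {i}"))
         for i in range(5)]
for t in users:
    t.start()
for t in users:
    t.join()
writer.submit("delete", 0)
writer.flush()
print(sorted(writer.index))  # [1, 2, 3, 4]
```

Request threads never block on the index itself, and updates are applied in the order they were queued, so the latency a user sees depends on how fast the single writer can keep up.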