How to Use ParallelReader


Liu_Andy2
Hi,

There is a class named org.apache.lucene.index.ParallelReader; its javadoc states:
An IndexReader which reads multiple, parallel indexes. Each index added must have the same number of documents, but typically each contains different fields. Each document contains the union of the fields of all documents with the same document number. When searching, matches for a query term are from the first index added that has the field.
This is useful, e.g., with collections that have large fields which change rarely and small fields that change more frequently. The smaller fields may be re-indexed in a new index and both indexes may be searched together.
Warning: It is up to you to make sure all indexes are created and modified the same way. For example, if you add documents to one index, you need to add the same documents in the same order to the other indexes. Failure to do so will result in undefined behavior.
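The contract in the javadoc above can be modeled in a few lines of plain Java. Every index holds the same number of documents, doc n is the union of the fields of doc n in each index, and a field is served by the first index added that has it. This is a hypothetical in-memory stand-in, not Lucene's ParallelReader. The class ParallelDocsModel and its methods are made up for illustration, an "index" is just a list of field maps, and the document number is simply the list position.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical toy model of the ParallelReader contract (NOT Lucene code).
 * An "index" is a list of documents; a document is a field -> value map;
 * the document number is the position in the list.
 */
public class ParallelDocsModel {
    private final List<List<Map<String, String>>> indexes = new ArrayList<>();

    /** Adds an index; as the javadoc requires, all indexes must have the same number of documents. */
    public void add(List<Map<String, String>> index) {
        if (!indexes.isEmpty() && indexes.get(0).size() != index.size()) {
            throw new IllegalArgumentException("all indexes must have the same number of documents: "
                    + indexes.get(0).size() + " != " + index.size());
        }
        indexes.add(index);
    }

    /** Position of the first index added in which any document has the field, or -1. */
    private int owner(String field) {
        for (int i = 0; i < indexes.size(); i++) {
            for (Map<String, String> doc : indexes.get(i)) {
                if (doc.containsKey(field)) {
                    return i;
                }
            }
        }
        return -1;
    }

    /** Value of a field for document n, taken from the first index added that has that field. */
    public String get(int n, String field) {
        int i = owner(field);
        return i < 0 ? null : indexes.get(i).get(n).get(field);
    }

    /** Document n: the union of the fields of document n across all indexes. */
    public Map<String, String> document(int n) {
        Map<String, String> union = new LinkedHashMap<>();
        for (List<Map<String, String>> index : indexes) {
            for (String field : index.get(n).keySet()) {
                union.putIfAbsent(field, get(n, field));
            }
        }
        return union;
    }
}
```

Nothing in this model checks that the lists were built in the same order; a mismatch just pairs the wrong documents silently, which is one way the "undefined behavior" in the warning can show up.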

My question is: If I just want to update the small fields in one index and do not want to update the large fields in another index, how can I make sure these two indexes are synchronized and have the same document number?

Thanks and Regards     
Andy

---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]


Re: How to Use ParallelReader

Chris Hostetter-3

: My question is: If I just want to update the small fields in one index
: and do not want to update the large fields in another index, how can I
: make sure these two indexes are synchronized and have the same document
: number?

the short answer: build them in the same order, use the exact same
IndexWriter settings, and optimize both indexes.  you can rebuild either
of them again and again and again if you want -- as long as you keep doing
it in the same order.
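Here is a minimal sketch of the "rebuild in the same order" advice, again with plain lists standing in for indexes. All of the following are hypothetical assumptions: every record has a unique uid, the shared build order is simply "sort by uid", and the frequently-changing index is rebuilt from scratch on every change instead of being updated with delete + add. The model has no segments or deletions, so the IndexWriter settings and the optimize step mentioned above have no counterpart here.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch (NOT Lucene code): doc number == position in the list. */
public class RebuildInOrder {
    /** Hypothetical source record holding both the large and the small fields. */
    public static final class Source {
        private final String uid, body, name;

        public Source(String uid, String body, String name) {
            this.uid = uid;
            this.body = body;
            this.name = name;
        }

        public String uid() { return uid; }
        public String body() { return body; }
        public String name() { return name; }
    }

    /** The single canonical order every build must use; assumed here to be "sorted by uid". */
    static List<Source> canonicalOrder(List<Source> records) {
        List<Source> sorted = new ArrayList<>(records);
        sorted.sort(Comparator.comparing(Source::uid));
        return sorted;
    }

    /** Rarely rebuilt index: the large fields (plus uid). */
    public static List<Map<String, String>> buildLarge(List<Source> records) {
        List<Map<String, String>> index = new ArrayList<>();
        for (Source s : canonicalOrder(records)) {
            index.add(Map.of("uid", s.uid(), "body", s.body()));
        }
        return index;
    }

    /** Frequently rebuilt index: only the small fields, regenerated from all records each time. */
    public static List<Map<String, String>> buildSmall(List<Source> records) {
        List<Map<String, String>> index = new ArrayList<>();
        for (Source s : canonicalOrder(records)) {
            index.add(Map.of("name", s.name()));
        }
        return index;
    }
}
```

Because the small index is regenerated from the full record set in the same canonical order, doc n in it describes the same record as doc n in the large index, however many times it is rebuilt.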

ParallelReader is a pretty special case class that not a lot of people
seem to use (or if they are, they don't talk about it much), but there have
been a few discussions about it in the past .. i would suggest searching
the mail archives and paying special attention to anything by Chuck Williams
... he's pretty much the foremost authority on ParallelReader.



-Hoss



Re: How to Use ParallelReader

Otis Gospodnetic-2
In reply to this post by Liu_Andy2
Hi,

----- Original Message ----
From: Chris Hostetter <[hidden email]>
To: [hidden email]
Sent: Saturday, June 16, 2007 3:10:08 AM
Subject: Re: How to Use ParallelReader


: My question is: If I just want to update the small fields in one index
: and do not want to update the large fields in another index, how can I
: make sure these two indexes are synchronized and have the same document
: number?

the short answer: build them in the same order, use the exact same
IndexWriter settings, and optimize both indexes.  you can rebuild either
of them again and again and again if you want -- as long as you keep doing
it in the same order.

OG: I think I understood how ParallelReader (PR) worked at one point, but have since forgotten.  I can't recall how one gets docIds to match up after updates (del+add).  For example:

docId   index1    index2
1       uid:10    name:Chuck
2       uid:20    name:Mark
3       uid:30    name:Chris
4       uid:40    name:Tarzan

OG: If I need to change Chris' name to Yonik, I have to delete docId 3 in index2 and re-add.  When it gets re-added we have docId 3 == isDeleted and the new doc with name:Yonik has docId == 5.  Say that both indices are then closed and even optimized, and then re-opened, aren't docs going to be misaligned?

docId   index1    index2
1       uid:10    name:Chuck
2       uid:20    name:Mark
3       uid:30    name:Tarzan
4       uid:40    name:Yonik


OG: No?
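The scenario above can be replayed in a toy model to see where the doc numbers drift. Assumptions: this is not Lucene. A delete only blanks a slot, an add appends at the end, and the hypothetical optimize() method compacts away blank slots. Doc numbers are 0-based here, so Otis's docId 3 is slot 2.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical toy model (NOT Lucene code) of delete + re-add followed by optimize. */
public class DelAddMisalignment {
    /** One slot per doc number; null marks a deleted document. */
    private final List<String> slots = new ArrayList<>();

    /** New documents always get the next doc number at the end. */
    public void add(String value) { slots.add(value); }

    /** A delete only marks the slot; later doc numbers do not move yet. */
    public void delete(int docId) { slots.set(docId, null); }

    /** Removes deleted slots, shifting every later document down one doc number per deletion. */
    public void optimize() { slots.removeIf(v -> v == null); }

    /** Number of slots, including deleted ones that have not been optimized away. */
    public int maxDoc() { return slots.size(); }

    public String doc(int docId) { return slots.get(docId); }

    public static void main(String[] args) {
        DelAddMisalignment index1 = new DelAddMisalignment();
        DelAddMisalignment index2 = new DelAddMisalignment();
        String[][] rows = {{"uid:10", "name:Chuck"}, {"uid:20", "name:Mark"},
                           {"uid:30", "name:Chris"}, {"uid:40", "name:Tarzan"}};
        for (String[] row : rows) {
            index1.add(row[0]);
            index2.add(row[1]);
        }

        // Change Chris -> Yonik in index2 only, via delete + re-add.
        index2.delete(2);
        index2.add("name:Yonik");
        index1.optimize();
        index2.optimize();

        for (int i = 0; i < index1.maxDoc(); i++) {
            System.out.println(i + "  " + index1.doc(i) + "  " + index2.doc(i));
        }
    }
}
```

Running main prints the same pairing as the second table: after optimize, uid:30 sits next to name:Tarzan and uid:40 next to name:Yonik. Before optimize, the model's two indexes also differ in size (4 vs 5 slots), because the deleted slot is still counted.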

Otis




