refresh segments for deleted documents?

refresh segments for deleted documents?

Robert Engels
I implemented IndexReader.reopen(). My original implementation did not
"refresh" the deleted documents, and it seemed to work. The latest
implementation does re-read the deletions.

However, after inspecting the IndexReader code, I am not sure this is necessary.

When a document is deleted, IndexReader marks the bit as deleted in the
SegmentReader, and if the SegmentReader instance is "reused", the document
is still deleted. If the segment had been merged, it would not be "reused"
anyway.

Doug, can you comment on exactly why the 'deletions' need to be re-read?
Doesn't seem necessary to me.
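
The reasoning in this post can be illustrated with a small toy model. This is only a sketch: ToySegmentReader and reopenReusingInstance() are hypothetical names invented for illustration and are not Lucene classes or methods. It assumes deletions are kept as an in-memory bit set on the reader instance and that a reopen of an unchanged segment hands back the same instance.

```java
import java.util.BitSet;

// Toy stand-in for a segment reader (hypothetical; NOT Lucene's SegmentReader).
// It models only the in-memory "deleted" bits discussed in the post.
class ToySegmentReader {
    private final int maxDoc;
    private final BitSet deleted = new BitSet();

    ToySegmentReader(int maxDoc) {
        this.maxDoc = maxDoc;
    }

    // Mark a document as deleted in this instance's in-memory bit set.
    void delete(int docId) {
        if (docId < 0 || docId >= maxDoc) {
            throw new IndexOutOfBoundsException("docId " + docId);
        }
        deleted.set(docId);
    }

    boolean isDeleted(int docId) {
        return deleted.get(docId);
    }

    // Number of documents that are not marked as deleted.
    int numDocs() {
        return maxDoc - deleted.cardinality();
    }

    // Assumed reopen behaviour for an unchanged segment: reuse this instance.
    ToySegmentReader reopenReusingInstance() {
        return this;
    }
}

class SameReaderDemo {
    public static void main(String[] args) {
        ToySegmentReader reader = new ToySegmentReader(5);
        reader.delete(2);
        // The reused instance still carries the deletion made through it.
        ToySegmentReader reopened = reader.reopenReusingInstance();
        System.out.println(reopened.isDeleted(2));
        System.out.println(reopened.numDocs());
    }
}
```

Under these assumptions the deletion survives the reopen, but only because it was made through the same reader instance that is later reused.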

Re: refresh segments for deleted documents?

Doug Cutting
Robert Engels wrote:
> Doug, can you comment on exactly why the 'deletions' need to be re-read?
> Doesn't seem necessary to me.

A common idiom is to use one IndexReader for searches and a separate
one for deletions.  For example, one might do something like:

1. Open IndexReader A.
2. Start serving queries against A.
3. Open IndexReader B.
4. Process queued deletions/updates against B.
5. Close B.
6. Open IndexWriter C.
7. Process queued additions/updates against C.
8. Close C.
9. Sleep until 1 minute has elapsed.
10. Go to step 1.
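
The steps above can be sketched with a toy in-memory index. This is a rough illustration only: BatchedIndex and its methods are hypothetical names, not Lucene APIs, and the one-minute sleep is left out. It assumes an update is modelled as a delete followed by an add, and that a searcher works on a snapshot taken when it is opened.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Toy in-memory "index" (hypothetical; not a Lucene class).
class BatchedIndex {
    // State that a newly opened searcher will see.
    private final Map<String, String> committed = new LinkedHashMap<>();
    private final List<String> queuedDeletes = new ArrayList<>();
    private final Map<String, String> queuedAdds = new LinkedHashMap<>();

    void queueDelete(String id) {
        queuedDeletes.add(id);
    }

    void queueAdd(String id, String text) {
        queuedAdds.put(id, text);
    }

    // An update is modelled as a delete plus an add of the same id.
    void queueUpdate(String id, String text) {
        queueDelete(id);
        queueAdd(id, text);
    }

    // Steps 1-2: a searcher gets a read-only copy taken at open time.
    Map<String, String> openSnapshot() {
        return Collections.unmodifiableMap(new LinkedHashMap<>(committed));
    }

    // Steps 3-8: apply all queued deletions first, then all queued additions.
    void applyBatch() {
        for (String id : queuedDeletes) {
            committed.remove(id);
        }
        queuedDeletes.clear();
        committed.putAll(queuedAdds);
        queuedAdds.clear();
    }
}

class PublishCycleDemo {
    public static void main(String[] args) {
        BatchedIndex index = new BatchedIndex();
        index.queueAdd("1", "first");
        index.applyBatch();

        Map<String, String> a = index.openSnapshot();   // step 1
        index.queueUpdate("1", "first, revised");
        index.queueAdd("2", "second");
        index.applyBatch();                             // steps 3-8

        System.out.println(a);                          // A still serves the old version
        a = index.openSnapshot();                       // step 10: back to step 1
        System.out.println(a);
    }
}
```

In this sketch the searcher never sees a half-applied batch; it moves from one published version to the next only when a new snapshot is opened.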

This would publish a new version of the index every minute, attempting
to batch insertions, updates, and deletes, as is optimal.  In this case,
if you re-open A, its deletions could be out of sync, but if you re-open
B, its deletions would not be out of sync.
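
A toy model of why A can be stale while B is not, under stated assumptions: StoredDeletions and SnapshotReader are hypothetical names, not Lucene classes. The sketch assumes a reader loads the deleted bits once when opened, writes its own deletions back on close, and that a reopen reuses the segment and may or may not re-read the stored deletions.

```java
import java.util.BitSet;

// Hypothetical stand-in for the deletions stored alongside a segment.
class StoredDeletions {
    private final BitSet bits = new BitSet();

    void write(BitSet newBits) {
        bits.clear();
        bits.or(newBits);
    }

    BitSet read() {
        return (BitSet) bits.clone();
    }
}

// Toy reader that loads deleted bits at open time (not Lucene's implementation).
class SnapshotReader {
    private final StoredDeletions store;
    private BitSet deleted;

    SnapshotReader(StoredDeletions store) {
        this.store = store;
        this.deleted = store.read();
    }

    // Deletions are recorded in memory on this instance only.
    void delete(int docId) {
        deleted.set(docId);
    }

    // Closing writes this reader's deletions back to the shared store.
    void close() {
        store.write(deleted);
    }

    boolean isDeleted(int docId) {
        return deleted.get(docId);
    }

    // Reopen that reuses the segment; deletions are re-read only on request.
    void reopen(boolean rereadDeletions) {
        if (rereadDeletions) {
            deleted = store.read();
        }
    }
}

class StaleDeletionsDemo {
    public static void main(String[] args) {
        StoredDeletions store = new StoredDeletions();
        SnapshotReader a = new SnapshotReader(store);   // serves searches
        SnapshotReader b = new SnapshotReader(store);   // performs deletions
        b.delete(3);
        b.close();

        a.reopen(false);
        System.out.println(a.isDeleted(3));  // A never saw B's deletion
        a.reopen(true);
        System.out.println(a.isDeleted(3));  // re-reading picks it up
    }
}
```

B's own bits always reflect the deletions it made, which matches the case where reusing the instance is safe; A only catches up if the reopen re-reads the stored deletions.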

So perhaps in your usage pattern deletions are never out of sync at
re-open, but there are also reasonable usage patterns where deletions
can become out of sync on re-open.

Doug

---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]


RE: refresh segments for deleted documents?

Robert Engels
Thanks. I understand now. In my usage pattern deletions are never out of
sync - that is why it works.



-----Original Message-----
From: Doug Cutting [mailto:[hidden email]]
Sent: Monday, May 01, 2006 5:36 PM
To: [hidden email]
Subject: Re: refresh segments for deleted documents?

