Deletions

John Griffin-3
Guys (and Gals),

A question on index deletions: what exactly happens to the Lucene document
numbers in an index when a document is deleted? Let's say I have a 5-doc
index.

Document #    Doc
0             doc1
1             doc2
2             doc3
3             doc4
4             doc5

If document #2 (doc3) is deleted, is this what I'm left with?

Document #    Doc
0             doc1
1             doc2
2             doc4
3             doc5

This is my assumption. If not, what DOES happen?

TIA

John G.

Re: Deletions

Anshum-2
Hi John,

A deletion in Lucene is just a delayed delete. In other words, the doc
is only marked as deleted in the deletable file, leaving a void in the
numbering of docs. The actual shifting of document IDs happens only when you
optimize the index. In that case the deletable file is used to physically
remove the docs from the index.
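To make the "void in the numbering" concrete, here is a minimal sketch of a delayed delete. It is a toy model, not Lucene code: the class ToyIndex is hypothetical, and its method names only loosely mirror IndexReader. It shows that marking document #2 as deleted leaves every other document number where it was.

```java
import java.util.Arrays;
import java.util.BitSet;
import java.util.List;

// Toy model of a delayed delete. ToyIndex is hypothetical, not a Lucene class.
class ToyIndex {
    private final List<String> docs;              // list position = document number
    private final BitSet deleted = new BitSet();  // one "deleted" bit per document number

    ToyIndex(List<String> docs) {
        this.docs = docs;
    }

    // Only flips a bit; nothing is moved or renumbered.
    void delete(int docId) {
        deleted.set(docId);
    }

    boolean isDeleted(int docId) {
        return deleted.get(docId);
    }

    // One more than the largest document number, holes included.
    int maxDoc() {
        return docs.size();
    }

    // Number of live (not deleted) documents.
    int numDocs() {
        return docs.size() - deleted.cardinality();
    }

    String document(int docId) {
        if (isDeleted(docId)) {
            throw new IllegalArgumentException("document " + docId + " is deleted");
        }
        return docs.get(docId);
    }

    public static void main(String[] args) {
        ToyIndex index = new ToyIndex(Arrays.asList("doc1", "doc2", "doc3", "doc4", "doc5"));
        index.delete(2);                        // document #2 holds doc3
        System.out.println(index.maxDoc());     // 5
        System.out.println(index.numDocs());    // 4
        System.out.println(index.document(3));  // doc4
    }
}
```

In this model doc4 is still document #3 after the delete, so the numbering has a hole at 2 until something rewrites the index.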

Hope that clears the doubt :)

--
Anshum Gupta
Naukri Labs!

On Fri, Jul 11, 2008 at 8:24 AM, John Griffin <[hidden email]>
wrote:

> [original message snipped]

--
The facts expressed here belong to everybody, the opinions to me.
The distinction is yours to draw............

Re: Deletions

Michael McCandless-2

The deleted docs are actually stored separately, per segment, into  
files named _X_N.del, where X is the segment name and N is a  
generation count (keeps increasing by 1 every time new deletes are  
committed to that segment).
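A rough sketch of that naming scheme, assuming the generation starts at 1 on the first commit and is written as a plain decimal number (the exact encoding used on disk is not stated here, so treat that as an assumption). The class SegmentDeletes is hypothetical and only models the bookkeeping, not the file contents.

```java
import java.util.BitSet;

// Hypothetical model of per-segment deletes with a generation counter.
class SegmentDeletes {
    private final String segmentName;
    private final BitSet deleted = new BitSet();
    private long generation = 0;     // 0 means no deletes committed yet (assumption)
    private boolean dirty = false;   // true when there are uncommitted deletes

    SegmentDeletes(String segmentName) {
        this.segmentName = segmentName;
    }

    // Marks a document (numbered within this segment) as deleted.
    void delete(int localDocId) {
        if (!deleted.get(localDocId)) {
            deleted.set(localDocId);
            dirty = true;
        }
    }

    // Committing new deletes bumps the generation by 1, which gives a new file name.
    // A commit with no new deletes keeps the current name.
    String commit() {
        if (dirty) {
            generation++;
            dirty = false;
        }
        return currentFileName();
    }

    // Follows the _X_N.del pattern; null while nothing has been committed.
    String currentFileName() {
        return generation == 0 ? null : "_" + segmentName + "_" + generation + ".del";
    }

    public static void main(String[] args) {
        SegmentDeletes seg = new SegmentDeletes("0");
        seg.delete(2);
        System.out.println(seg.commit());  // _0_1.del
        seg.delete(4);
        System.out.println(seg.commit());  // _0_2.del
        System.out.println(seg.commit());  // _0_2.del (no new deletes)
    }
}
```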

Normal segment merging will also collapse the deletes in those  
segments that were merged, thus collapsing down the docIDs.  You can  
also call IndexWriter.expungeDeletes() to collapse all holes from the  
index.  That method just merges adjacent segments that have deletes ...
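As an illustration of how a merge collapses the docIDs, here is a small sketch using John's 5-doc example split across two segments. MergeSketch and Segment are hypothetical names, and a real merge rewrites much more than a list of strings; the point is only that live docs are copied in order and renumbered by position.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.BitSet;
import java.util.List;

// Hypothetical sketch of a merge that drops deleted docs and renumbers the rest.
class MergeSketch {
    // A segment: its documents plus a bitset of deleted local document numbers.
    static final class Segment {
        final List<String> docs;
        final BitSet deleted = new BitSet();

        Segment(List<String> docs, int... deletedIds) {
            this.docs = docs;
            for (int id : deletedIds) {
                deleted.set(id);
            }
        }
    }

    // Copies only live docs, in order, into one new segment with no deletes.
    // A doc's new number is its position in the result, so the holes vanish.
    static Segment merge(List<Segment> segments) {
        List<String> live = new ArrayList<>();
        for (Segment s : segments) {
            for (int i = 0; i < s.docs.size(); i++) {
                if (!s.deleted.get(i)) {
                    live.add(s.docs.get(i));
                }
            }
        }
        return new Segment(live);
    }

    public static void main(String[] args) {
        // Before the merge, doc4 is index-wide document #3 (3 docs precede it).
        Segment a = new Segment(Arrays.asList("doc1", "doc2", "doc3"), 2); // doc3 deleted
        Segment b = new Segment(Arrays.asList("doc4", "doc5"));
        Segment merged = merge(Arrays.asList(a, b));
        System.out.println(merged.docs);                  // [doc1, doc2, doc4, doc5]
        System.out.println(merged.docs.indexOf("doc4"));  // 2
    }
}
```

After the merge the numbering matches the second table in the original question, which is why that table describes the index after a merge rather than right after the delete.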

Lucene used to also have a file "deletable" which tracked those index  
files that should be deleted, but that is no longer used as of 2.1.  
Instead, Lucene computes (using reference counting) which files in the  
directory are no longer referenced.
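The reference-counting idea can be sketched roughly as follows. This is an assumption-laden toy, not Lucene's actual bookkeeping: FileRefCounts is a hypothetical class, each commit is modeled as just a list of file names, and the file names in the example are illustrative.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: count how many commits reference each index file.
class FileRefCounts {
    private final Map<String, Integer> refs = new HashMap<>();

    // A new commit references these files.
    void incRef(Collection<String> files) {
        for (String f : files) {
            refs.merge(f, 1, Integer::sum);
        }
    }

    // A commit is dropped. Returns the files nothing references any more,
    // which are the ones that can be removed from the directory.
    List<String> decRef(Collection<String> files) {
        List<String> unreferenced = new ArrayList<>();
        for (String f : files) {
            Integer count = refs.get(f);
            if (count == null) {
                continue;                 // unknown file, nothing to do
            }
            if (count == 1) {
                refs.remove(f);
                unreferenced.add(f);
            } else {
                refs.put(f, count - 1);
            }
        }
        return unreferenced;
    }

    public static void main(String[] args) {
        List<String> commit1 = Arrays.asList("_0.cfs", "_0_1.del");
        List<String> commit2 = Arrays.asList("_0.cfs", "_0_2.del");
        FileRefCounts counts = new FileRefCounts();
        counts.incRef(commit1);
        counts.incRef(commit2);
        // Dropping the older commit frees only the old deletes file.
        System.out.println(counts.decRef(commit1));  // [_0_1.del]
    }
}
```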

Mike
