Dedup won't actually dedup

Jon Shoberg
Any idea why dedup won't actually remove the items?

Thoughts?

*** First Pass ************************************

051008 144843 Clearing old deletions in
051008 144843 Reading url hashes...
051008 144902 Sorting url hashes...
051008 144905 Deleting url duplicates...
051008 144907 Deleted 4082 url duplicates.
051008 144907 Reading content hashes...
051008 144918 Sorting content hashes...
051008 144923 Deleting content duplicates...
051008 144925 Deleted 228430 content duplicates.
051008 144925 Duplicate deletion complete locally.  Now returning to NFS...
051008 144925 DeleteDuplicates complete

*** Second Pass ***********************************

051008 144932 Reading url hashes...
051008 144949 Sorting url hashes...
051008 144953 Deleting url duplicates...
051008 144955 Deleted 4082 url duplicates.
051008 144955 Reading content hashes...
051008 145005 Sorting content hashes...
051008 145011 Deleting content duplicates...
051008 145012 Deleted 228430 content duplicates.
051008 145012 Duplicate deletion complete locally.  Now returning to NFS...
051008 145012 DeleteDuplicates complete


Re: Dedup won't actually dedup

Piotr Kosiorowski
Hello Jon,
As far as I remember, dedup only marks the records as deleted; it does
not physically remove them.
Also, the first action of dedup is to clear old deletions (as shown in
the log). So if you repeat it, you will get the same number of deleted
records each time.
Regards
Piotr
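
A small toy model may help illustrate the behaviour described above. This is only a sketch and not Nutch's DeleteDuplicates code: the class `ToyIndex` and its methods `dedup`, `storedDocs` and `liveDocs` are hypothetical names, and duplicates are found by comparing plain strings rather than url or content hashes. It shows why a pass that first clears old deletion marks and then re-marks duplicates reports the same count on every run.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.BitSet;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Toy model only: hypothetical names, not the Nutch or Lucene API.
class ToyIndex {
    private final List<String> docs = new ArrayList<>();
    private final BitSet deleted = new BitSet(); // deletion marks, one bit per document

    ToyIndex(List<String> contents) {
        docs.addAll(contents);
    }

    // One dedup pass: clear the old marks, then mark every repeated document.
    int dedup() {
        deleted.clear(); // corresponds to "Clearing old deletions" in the log
        Set<String> seen = new HashSet<>();
        int marked = 0;
        for (int i = 0; i < docs.size(); i++) {
            if (!seen.add(docs.get(i))) { // add() returns false for a repeat
                deleted.set(i);
                marked++;
            }
        }
        return marked;
    }

    // Marked documents are still physically stored.
    int storedDocs() {
        return docs.size();
    }

    // Documents that are not marked as deleted.
    int liveDocs() {
        return docs.size() - deleted.cardinality();
    }

    public static void main(String[] args) {
        ToyIndex index = new ToyIndex(Arrays.asList("a", "b", "a", "c", "b"));
        System.out.println("first pass marked:  " + index.dedup());
        System.out.println("second pass marked: " + index.dedup());
        System.out.println("stored=" + index.storedDocs() + " live=" + index.liveDocs());
    }
}
```

In this model both passes mark the same two documents, and the stored document count never changes, which matches the identical "Deleted ..." counts in the two log excerpts.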

Re: Dedup won't actually dedup

Michael Ji
I checked the code of DeleteDuplicates.java. There is a function
called deleteDuplicates() that contains this line:

    readers[indexedDoc.index].delete(indexedDoc.doc);  // delete it

I believe the physical deletion happens when the readers are
closed. Is that right?

Michael Ji
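
On the question of when physical deletion happens: as far as I know, in the Lucene releases of that period closing a reader only writes the deletion marks to disk, and the marked documents are dropped later, when segments are merged (for example by `IndexWriter.optimize()`). The sketch below is a toy model of that distinction under this assumption, not Lucene code: `ToySegment`, `ToyReader`, `committedMarks` and `merge` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.BitSet;
import java.util.List;

// Toy model only: hypothetical names, not the Lucene API.
class ToySegment {
    final List<String> docs;
    final BitSet committedMarks = new BitSet(); // stands in for marks saved on disk

    ToySegment(List<String> docs) {
        this.docs = new ArrayList<>(docs);
    }

    // Merging copies only the unmarked documents into a new segment.
    ToySegment merge() {
        List<String> kept = new ArrayList<>();
        for (int i = 0; i < docs.size(); i++) {
            if (!committedMarks.get(i)) {
                kept.add(docs.get(i));
            }
        }
        return new ToySegment(kept);
    }
}

class ToyReader {
    private final ToySegment segment;
    private final BitSet pending = new BitSet(); // marks held in memory

    ToyReader(ToySegment segment) {
        this.segment = segment;
    }

    // Marks a document as deleted; nothing is removed.
    void delete(int doc) {
        pending.set(doc);
    }

    // Persists the marks; the documents themselves stay in the segment.
    void close() {
        segment.committedMarks.or(pending);
    }
}

class ToyDemo {
    public static void main(String[] args) {
        ToySegment segment = new ToySegment(Arrays.asList("a", "b", "a"));
        ToyReader reader = new ToyReader(segment);
        reader.delete(2);
        reader.close();
        System.out.println("after close: " + segment.docs.size() + " docs stored");
        System.out.println("after merge: " + segment.merge().docs.size() + " docs stored");
    }
}
```

Here `close()` leaves all three documents stored, and only `merge()` produces a segment without the marked one.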

Re: Dedup won't actually dedup

quovadis
In reply to this post by Piotr Kosiorowski
Is it possible to get the marked records deleted?

On Sun, 09 Oct 2005 11:05:31 +0200
 Piotr Kosiorowski <[hidden email]> wrote:

>Hello Jon,
>As far as I remember dedup marks the records as deleted
>only without physically removing them.
>And first action of dedup is to clear old deletions (as it
>is written in log). So if you repeat it you will get the
>same number of deleted records each time.
>Regards
>Piotr