Re: finding potential duplicate documents

Marco Dissel
Any tips on this issue?

Thanks

Marco
  ----- Original Message -----
  From: Marco Dissel
  To: [hidden email]
  Sent: Friday, May 13, 2005 9:05 AM
  Subject: finding potential duplicate documents


  Hello

  I've got many documents that are potentially duplicates (the result of merging several external systems). Any tips on how to find documents that are potential duplicates, using a variable ranking threshold (e.g. a match score > 0.5)?

  I can use the similarity (MoreLikeThis) method from the Sandbox, but that always compares one document against the index. Is there a way to return all the potential duplicate documents in the index without iterating over every document and comparing it with all the other documents in the index?

  Thanks
  Marco
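
  To make the question concrete, here is a rough sketch of one possible way to collect candidate pairs without comparing every document against every other one. It is not the MoreLikeThis code from the Sandbox and does not use any Lucene API. It assumes plain Python, word 3-grams ("shingles") as the unit of comparison, and Jaccard overlap as the match score. The names shingles and find_potential_duplicates, the parameter k, and the in-memory dict of documents are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict
from itertools import combinations


def shingles(text, k=3):
    """Return the set of k-word shingles in text (lowercased, whitespace-split)."""
    words = text.lower().split()
    if len(words) < k:
        # Short texts become a single shingle; empty texts have none.
        return {" ".join(words)} if words else set()
    return {" ".join(words[i:i + k]) for i in range(len(words) - k + 1)}


def find_potential_duplicates(docs, threshold=0.5, k=3):
    """Find document pairs whose Jaccard shingle overlap exceeds threshold.

    docs is a hypothetical in-memory mapping of doc_id -> text.
    Returns a list of (id_a, id_b, score), highest score first.
    """
    doc_shingles = {doc_id: shingles(text, k) for doc_id, text in docs.items()}

    # Inverted index: shingle -> ids of the documents that contain it.
    postings = defaultdict(set)
    for doc_id, doc_set in doc_shingles.items():
        for shingle in doc_set:
            postings[shingle].add(doc_id)

    # Only pairs that share at least one shingle are ever touched, so
    # unrelated documents are never compared with each other.
    shared = defaultdict(int)
    for ids in postings.values():
        for a, b in combinations(sorted(ids), 2):
            shared[(a, b)] += 1

    results = []
    for (a, b), intersection in shared.items():
        union = len(doc_shingles[a]) + len(doc_shingles[b]) - intersection
        score = intersection / union
        if score > threshold:
            results.append((a, b, score))

    results.sort(key=lambda r: (-r[2], r[0], r[1]))
    return results


if __name__ == "__main__":
    sample = {
        1: "the quick brown fox jumps over the lazy dog",
        2: "the quick brown fox jumps over the lazy cat",
        3: "completely different text about lucene indexing here",
    }
    for id_a, id_b, score in find_potential_duplicates(sample, threshold=0.5):
        print(id_a, id_b, round(score, 2))
```

  The threshold argument plays the role of the variable ranking cutoff mentioned above. Shingles that occur in a very large share of the documents produce long posting lists, so a real index would probably need to skip or cap those.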

