MD5 vs TextProfile Signature

Rajasekar Karthik
Hi,
I'm wondering which does a better job: the MD5 signature or the TextProfile signature? From what I understand from the APIs, when a page has content, MD5 computes a hash of the page's raw binary content, whereas TextProfile computes a hash of a plain-text profile of the page. I believe these signature values are then used to remove duplicates.
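To make the difference concrete, here is a minimal Java sketch. It is my own simplification, not the actual TextProfile implementation: the class and method names are hypothetical, and the profile step (lowercase words of at least two letters or digits, counted and sorted by frequency) is an assumption that skips details such as any frequency quantization the real signature may apply.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

public class SignatureSketch {

    // Hex-encoded MD5 digest of a byte array.
    static String md5Hex(byte[] data) {
        try {
            byte[] digest = MessageDigest.getInstance("MD5").digest(data);
            StringBuilder sb = new StringBuilder();
            for (byte b : digest) {
                sb.append(String.format("%02x", b));
            }
            return sb.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 not available", e);
        }
    }

    // Exact signature: MD5 of the raw bytes, so any byte-level change yields a new hash.
    static String rawSignature(byte[] rawContent) {
        return md5Hex(rawContent);
    }

    // Hypothetical, simplified text-profile signature: count lowercase tokens of
    // length >= 2, sort by descending count (ties alphabetical), hash the profile.
    static String textProfileSignature(String plainText) {
        Map<String, Integer> counts = new TreeMap<>(); // alphabetical key order
        for (String token : plainText.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (token.length() >= 2) {
                counts.merge(token, 1, Integer::sum);
            }
        }
        List<Map.Entry<String, Integer>> entries = new ArrayList<>(counts.entrySet());
        // List.sort is stable, so equal counts keep the TreeMap's alphabetical order.
        entries.sort((x, y) -> Integer.compare(y.getValue(), x.getValue()));
        StringBuilder profile = new StringBuilder();
        for (Map.Entry<String, Integer> e : entries) {
            profile.append(e.getKey()).append(' ').append(e.getValue()).append('\n');
        }
        return md5Hex(profile.toString().getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) {
        // Two pages whose markup, case and punctuation differ but whose words match.
        byte[] rawA = "<p>Hello, World!</p>".getBytes(StandardCharsets.UTF_8);
        byte[] rawB = "<div>hello world</div>".getBytes(StandardCharsets.UTF_8);
        String textA = "Hello, World!"; // assumed output of a text extractor
        String textB = "hello world";

        // prints "raw equal: false"
        System.out.println("raw equal: " + rawSignature(rawA).equals(rawSignature(rawB)));
        // prints "profile equal: true"
        System.out.println("profile equal: "
                + textProfileSignature(textA).equals(textProfileSignature(textB)));
    }
}
```

The raw-content hash changes with any markup or punctuation difference, while the text profile ignores case, punctuation and word order. That is why a profile-based signature tends to catch near-duplicates that a plain MD5 misses, at the cost of occasionally treating genuinely different pages as duplicates.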

Wouldn't it be better to remove unwanted characters from a page before hashing it? The content of two pages could be the same except for these unwanted characters.
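A rough Java sketch of that idea, assuming "unwanted characters" means anything other than letters, digits and whitespace. That definition and the class name are hypothetical; in practice the character set would depend on the pages being crawled.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.regex.Pattern;

public class NormalizedSignature {

    // Hypothetical definition of "unwanted": anything that is not a letter,
    // digit or whitespace (punctuation, symbols, and so on).
    static final Pattern UNWANTED = Pattern.compile("[^\\p{L}\\p{N}\\s]");
    static final Pattern WHITESPACE = Pattern.compile("\\s+");

    // Strip unwanted characters, then collapse runs of whitespace to one space.
    static String normalize(String text) {
        String stripped = UNWANTED.matcher(text).replaceAll("");
        return WHITESPACE.matcher(stripped).replaceAll(" ").trim();
    }

    // MD5 of the normalized text, hex-encoded.
    static String signature(String text) {
        try {
            MessageDigest md = MessageDigest.getInstance("MD5");
            byte[] digest = md.digest(normalize(text).getBytes(StandardCharsets.UTF_8));
            StringBuilder sb = new StringBuilder();
            for (byte b : digest) {
                sb.append(String.format("%02x", b));
            }
            return sb.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 not available", e);
        }
    }

    public static void main(String[] args) {
        String a = "Price: 10 EUR *** special offer!!";
        String b = "Price 10 EUR  special offer";
        // Both normalize to "Price 10 EUR special offer"; prints true
        System.out.println(signature(a).equals(signature(b)));
    }
}
```

The trade-off is that the looser the normalization, the more pages collapse to the same signature, so stripping too much can make distinct pages look like duplicates.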

Thanks,
Karthik