Unique IDs for URLs in crawl file

Björn Wilmsmann
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Hi everybody,

I need to attach a unique ID to each URL in the file processed by the
Nutch crawler, so that I can identify each URL when saving the parsed
and indexed results to a database. Does anybody have an idea of the
best way, and the best place, to implement such a feature?
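
To make the question concrete, here is one possible approach as a
minimal sketch: derive the ID from the URL string itself by hashing it,
so the same URL always maps to the same ID and no separate counter has
to be kept in sync with the crawl file. This is plain Java and does not
use any Nutch API. The class name UrlId and the method forUrl are
hypothetical names chosen for illustration, and the choice of SHA-256
is an assumption, not something Nutch prescribes.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper, not part of Nutch: maps a URL string to a
// deterministic ID by hashing it.
final class UrlId {

    // Returns the SHA-256 digest of the URL as a 64-character
    // lowercase hex string.
    static String forUrl(String url) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-256");
            byte[] digest = md.digest(url.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder(digest.length * 2);
            for (byte b : digest) {
                // Mask to 0..255 so negative bytes format as two hex digits.
                hex.append(String.format("%02x", b & 0xff));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // SHA-256 is required on every Java platform, so this
            // should not happen in practice.
            throw new IllegalStateException("SHA-256 not available", e);
        }
    }

    public static void main(String[] args) {
        // Example URL only; any line from the crawl file would do.
        System.out.println(forUrl("http://example.com/"));
    }
}
```

Such an ID is only stable if every URL is written in the same form
before hashing (for example with or without a trailing slash), so it
would have to be computed after whatever URL normalization the crawl
applies. The alternative is to let the database assign a sequential
key and store the URL next to it.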

- --
Best regards,
Björn Wilmsmann


-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.5 (Darwin)

iD8DBQFFYiGvgz0R1bg11MERAi3cAJ9Vv+EXu3AHf5jPEdVX6AJzyvbFogCeOs4Q
zobesdszGf52elrTB2Al6Ik=
=6nM5
-----END PGP SIGNATURE-----