Nutch parsers

discoversk
Hello,

1. How do the Nutch parsers extract URLs from documents such as HTML, DOC, and PDF files?

2. If we have 1000 URLs at depth 0 and have set topN to 100, what algorithm does Nutch use to select 100 URLs out of the 1000?

Thanks.
Salim