Crawling using nutch jar/job file

Crawling using nutch jar/job file

kranthi reddy
Hi,

I'm not sure whether my question makes sense, so please correct me if I'm
wrong.

As I understand it, the nutch*.job or nutch*.jar file that I build with
"ant" contains everything Nutch needs to do the crawling.

Is it possible to crawl using hadoop directly, like this:

bin/hadoop jar *.jar classpath input output

If so, where do I specify the depth, threads, etc.?

If it is possible, how can it be done?

Any help would be greatly appreciated.
I couldn't find any help online.

Thank you,
kranthi reddy.B

Re: Crawling using nutch jar/job file

brainstorm-2-2
Short answer: Hadoop does not crawl at all.
Longer answer: Nutch does the crawling, using Hadoop as a backend for
distributed storage and task/job processing.

You need to read:

http://en.wikipedia.org/wiki/Hadoop
http://en.wikipedia.org/wiki/Nutch
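
To make the point above concrete: since the crawler lives in Nutch, options such as depth and threads would be passed to a Nutch class rather than to Hadoop itself. The sketch below only prints a candidate command and does not run anything. It assumes a Nutch 0.9/1.x-era build in which org.apache.nutch.crawl.Crawl (the class behind "bin/nutch crawl") is available; the job file name, the "urls" and "crawl" directories, and the option values are placeholders that are not taken from this thread.

```shell
# Dry-run sketch: build the command line, do not execute it.
# All values below are illustrative assumptions, not from the thread.
NUTCH_JOB="nutch-0.9.job"   # hypothetical name; use the file "ant" produced
SEED_DIR="urls"             # directory holding the seed URL list
OUT_DIR="crawl"             # directory where the crawl data would be written
DEPTH=3                     # number of fetch rounds
THREADS=10                  # number of fetcher threads

crawl_cmd() {
    # "hadoop jar <file> <main class> <args>": everything after the class
    # name is handed to Nutch's Crawl class, which reads -dir, -depth and
    # -threads itself.
    echo "bin/hadoop jar $NUTCH_JOB org.apache.nutch.crawl.Crawl $SEED_DIR -dir $OUT_DIR -depth $DEPTH -threads $THREADS"
}

# Print the command so it can be inspected before running it by hand.
crawl_cmd
```

Whether this works as-is depends on the Nutch version and on Hadoop being configured; "bin/nutch crawl" with the same options is the more usual entry point.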

On Sun, Jul 13, 2008 at 8:12 PM, kranthi reddy <[hidden email]> wrote:

>  Is it possible to crawling using hadoop like
>
> bin/hadoop jar *.jar classpath input output
>
> where do i get to mention the depth,threads etc.