multiple website crawling


multiple website crawling

Michael Ji
hi there,

If I put multiple web URLs in the plain-text file
"urls" used by the following command, will it fetch
multiple websites for me?

"
bin/nutch crawl urls -dir crawl.test -depth 3 >&
crawl.log
"

I tried it, but didn't get any search results back.
Did I miss anything?

thanks,

Michael,



Re: multiple website crawling

luti
Please check your crawl-urlfilter.txt. If you are using an older
version of Nutch (e.g. 0.6 final), it ships with an entry that
restricts crawling to nutch.org.
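
From memory, the relevant lines in conf/crawl-urlfilter.txt
look roughly like this:

"
# accept hosts in nutch.org
+^http://([a-z0-9]*\.)*nutch.org/
# skip everything else
-.
"

Replace the nutch.org pattern with one matching your own sites;
otherwise the final "-." rule skips everything else.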



Re: multiple website crawling

Michael Ji
hi there,

I already edited that file, so it is now "*.*", meaning I
accept any website.

If I use the step-by-step crawl commands below instead, where
do I specify the search depth?

"
bin/nutch admin db1 -create
bin/nutch inject db1 -urlfile urls-full
bin/nutch generate db1 segments1
s1=`ls -d segments1/2* | tail -1`
bin/nutch fetch $s1 >& m1.log
bin/nutch updatedb db1 $s1
bin/nutch generate db1 segments -topN 10000
bin/nutch index $s1
bin/nutch dedup segments1 dedup.tmp
"

thanks,

Michael

--- "[hidden email]" <[hidden email]>
wrote:

> Please check your crawl-urlfilter.txt. If you use
> older version of nutch
> (e.g. 0.6 final), there is an entry, that specifies
> that, crawl only
> from nutch.org.
>
> Feng (Michael) Ji wrotte:
>
> >hi there,
> >
> >If I put multiple web URL in the plain text file
> >"urls" in the following command, will it fetch
> >multiple website for me?
> >
> >"
> >bin/nutch crawl urls -dir crawl.test -depth 3 >&
> >crawl.log
> >"
> >
> >I tried it, but didn't get a return search result.
> >Anything I missed?
> >
> >thanks,
> >
> >Michael,
> >
> >
> >__________________________________________________
> >Do You Yahoo!?
> >Tired of spam?  Yahoo! Mail has the best spam
> protection around
> >http://mail.yahoo.com 
> >
> >
> >  
> >
>
>



               
____________________________________________________
Start your day with Yahoo! - make it your home page
http://www.yahoo.com/r/hs