nutch 0.9 and eclipse 3.3 -


Lev Kantorovich
I configured Eclipse following the RunNutchInEclipse0.9 guide and got the following:

When I run Nutch 0.9 using the bin/nutch crawl command inside cygwin,
everything is OK: the apache.org site (I just use what is written in the
Tutorial) is processed as it should be. When I do the same from Eclipse,
it runs but doesn't process any URLs; one of the printouts on the console says
"No URLs to fetch - check your seed list and URL filters."

I checked the bin/nutch script and added the following as VM arguments:

 

-Xmx1000m
-Dhadoop.log.dir=c:\nutch-0.9\logs
-Dhadoop.log.file=hadoop.log
-Djava.library.path=c:\nutch-0.9\lib\native\Windows_2003-x86-32

 

It didn't help.
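One way to compare the Eclipse launch with the cygwin run (a hypothetical helper, not part of the RunNutchInEclipse0.9 guide) is to run a tiny class from the same Eclipse launch configuration and print what that JVM actually sees: the -D values above, and which copy of crawl-urlfilter.txt, if any, is on the classpath. This assumes Nutch looks its conf files up by name on the classpath, which is consistent with bin/nutch starting its CLASSPATH with the conf directory. The class name LaunchCheck is made up for illustration.

```java
import java.net.URL;

/**
 * Hypothetical diagnostic (not part of Nutch). Run it with the same
 * Eclipse launch configuration as the crawl to see what that JVM gets.
 */
public class LaunchCheck {

    /** Returns where a resource is found on the classpath, or "NOT FOUND". */
    static String locate(String resourceName) {
        URL url = LaunchCheck.class.getClassLoader().getResource(resourceName);
        return url == null ? "NOT FOUND" : url.toString();
    }

    public static void main(String[] args) {
        // The -D values from the launch configuration, as this JVM sees them.
        String[] keys = {"hadoop.log.dir", "hadoop.log.file", "java.library.path"};
        for (String key : keys) {
            System.out.println(key + " = " + System.getProperty(key));
        }
        // Assumption: Nutch resolves these files by name on the classpath,
        // the way bin/nutch makes conf/ visible by putting it on CLASSPATH first.
        String[] confFiles = {"nutch-default.xml", "crawl-urlfilter.txt"};
        for (String name : confFiles) {
            System.out.println(name + " -> " + locate(name));
        }
    }
}
```

If crawl-urlfilter.txt shows NOT FOUND, or points at a different copy than the one you edited, that could explain why the Eclipse run filters URLs differently from the cygwin run.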

 

Looking through the list archives, I found that other people have also had
this problem with Nutch 0.9.

What configuration (or anything else) is missing? Can someone confirm that
Nutch 0.9 runs successfully in Eclipse, and advise on the settings?

 

I would really appreciate any help.

Thank you,
Lev Kantorovich


Re: nutch 0.9 and eclipse 3.3 -

Tranquil
I had the same problem; it's as if it doesn't know where to read the URLs from.

I gave up on it and am using Ant/Vim in the meantime... It would be great if
someone could help with this issue.

On Nov 19, 2007 9:18 PM, Lev Kantorovich <[hidden email]> wrote:

> When I do the same from Eclipse - it runs, but doesn't process any URLs -
> one of printouts on the console says
> "No URLs to fetch - check your seed list and URL filters."


--
Eyal Edri

Re: nutch 0.9 and eclipse 3.3 -

Tranquil
In reply to this post by Lev Kantorovich
Check the conf/crawl-urlfilter.txt file and change the line:

+^http://([a-z0-9]*\.)*MY.DOMAIN.NAME/

to

+.


and you're set.
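Why that one line matters: as the comments in the stock filter file say, each non-comment line is a regular expression prefixed by '+' (include) or '-' (exclude), the first pattern that matches a URL decides, and a URL that matches no pattern is ignored. The Java class below is a simplified, hypothetical model of that rule (UrlFilterSketch and accepts are made-up names, not Nutch's API). It assumes substring-style matching (Matcher.find()), which is why the one-character pattern "." accepts any URL. The seed URL and the apache.org rule are examples assuming the tutorial setup, where MY.DOMAIN.NAME is replaced with apache.org.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

/** Hypothetical, simplified model of a Nutch regex URL filter file. */
public class UrlFilterSketch {

    /** One "+regex" (include) or "-regex" (exclude) rule. */
    private static final class Rule {
        final boolean include;
        final Pattern pattern;

        Rule(boolean include, Pattern pattern) {
            this.include = include;
            this.pattern = pattern;
        }
    }

    private final List<Rule> rules = new ArrayList<>();

    /** Parses rule lines; blank lines and '#' comments are skipped. */
    public UrlFilterSketch(List<String> lines) {
        for (String raw : lines) {
            String line = raw.trim();
            if (line.isEmpty() || line.startsWith("#")) {
                continue;
            }
            char sign = line.charAt(0);
            if (sign != '+' && sign != '-') {
                throw new IllegalArgumentException("Rule must start with + or -: " + line);
            }
            rules.add(new Rule(sign == '+', Pattern.compile(line.substring(1))));
        }
    }

    /** The first matching rule decides; a URL that matches no rule is rejected. */
    public boolean accepts(String url) {
        for (Rule rule : rules) {
            if (rule.pattern.matcher(url).find()) {
                return rule.include;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        String seed = "http://lucene.apache.org/nutch/"; // example seed URL (assumption)
        String offSite = "http://example.com/";

        UrlFilterSketch placeholder =
                new UrlFilterSketch(List.of("+^http://([a-z0-9]*\\.)*MY.DOMAIN.NAME/"));
        UrlFilterSketch apacheOnly =
                new UrlFilterSketch(List.of("+^http://([a-z0-9]*\\.)*apache.org/"));
        UrlFilterSketch acceptAll = new UrlFilterSketch(List.of("+."));

        System.out.println("placeholder rule, seed:    " + placeholder.accepts(seed));
        System.out.println("apache.org rule, seed:     " + apacheOnly.accepts(seed));
        System.out.println("apache.org rule, off-site: " + apacheOnly.accepts(offSite));
        System.out.println("'+.' rule, off-site:       " + acceptAll.accepts(offSite));
    }
}
```

With only the untouched MY.DOMAIN.NAME rule, the seed matches nothing and is dropped, which is consistent with the "No URLs to fetch" message. Note that "+." also lets off-site URLs through; replacing MY.DOMAIN.NAME with the domain you actually want keeps the crawl scoped.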

Eyal.