Nutch Crawling error

Reza Harditya
Hi,

I'm a new Nutch user, currently on Nutch 0.8.1. Whenever I try to start
crawling according to the tutorial, I get the following error:

Injector: starting
Injector: crawlDb: crawl2/crawldb
Injector: urlDir: urls
Injector: Converting injected urls to crawl db entries.
Exception in thread "main" java.io.IOException: Job failed!
        at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:357)
        at org.apache.nutch.crawl.Injector.inject(Injector.java:138)
        at org.apache.nutch.crawl.Crawl.main(Crawl.java:105)
------------------------------------------------------------------------------------------------------------

From the log, I found a more detailed description which is:

2007-05-14 09:32:57,977 INFO  crawl.Injector - Injector: starting
2007-05-14 09:32:57,978 INFO  crawl.Injector - Injector: crawlDb: crawl2/crawldb
2007-05-14 09:32:57,978 INFO  crawl.Injector - Injector: urlDir: urls
2007-05-14 09:32:57,978 INFO  crawl.Injector - Injector: Converting injected urls to crawl db entries.
2007-05-14 09:32:58,908 WARN  mapred.LocalJobRunner - job_lzlk81
java.lang.RuntimeException: java.net.UnknownHostException: dhcppc0: dhcppc0
        at org.apache.hadoop.io.SequenceFile$Writer.<init>(SequenceFile.java:76)
        at org.apache.hadoop.io.SequenceFile$Writer.<init>(SequenceFile.java:89)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:77)
        at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:91)
Caused by: java.net.UnknownHostException: dhcppc0: dhcppc0
        at java.net.InetAddress.getLocalHost(InetAddress.java:1308)
        at org.apache.hadoop.io.SequenceFile$Writer.<init>(SequenceFile.java:73)
        ... 3 more


At first I suspected that the error was caused by Tomcat not running properly,
but after some checking I have confirmed that Tomcat is indeed running.

Could somebody let me know what I might be doing wrong here?

Cheers,

Re: Nutch Crawling error

Dennis Kubes
For some reason the Nutch process can't resolve the hosts.  This could
be due to an incorrect DNS setup on the machine, or to a firewall or proxy
in place.  See if you can ping one of the hosts in the URLs that you are
trying to fetch.
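
A rough way to run the same kind of check from inside a JVM, rather than
with ping, is sketched below. The class name SeedHostCheck and the
placeholder URL are made up for illustration and are not part of Nutch;
only java.net.URI and java.net.InetAddress from the standard library are
used. Note that it tests name resolution only, not reachability.

```java
import java.net.InetAddress;
import java.net.URI;
import java.net.UnknownHostException;

// Hypothetical helper, not part of Nutch: checks whether the host of a seed URL resolves.
class SeedHostCheck {

    // Returns true if the host part of the URL can be resolved to an IP address.
    static boolean resolves(String url) {
        String host = URI.create(url).getHost();
        if (host == null) {
            return false; // no host component, e.g. a relative URL
        }
        try {
            InetAddress.getByName(host);
            return true;
        } catch (UnknownHostException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Pass seed URLs on the command line; the default is only a placeholder.
        String[] urls = args.length > 0 ? args : new String[] {"http://127.0.0.1/"};
        for (String url : urls) {
            System.out.println(url + " -> " + (resolves(url) ? "resolves" : "does NOT resolve"));
        }
    }
}
```

If every seed host resolves here and the crawl still fails, the lookup
that is failing is probably not one of the seed hosts.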

Dennis Kubes


Re: Nutch Crawling error

Reza Harditya
I have checked and confirmed that the hosts I'm trying to fetch are actually
accessible (ping requests and loading the site itself). However, I still get
the same error.

Any other alternatives?



Re: Nutch Crawling error

Reza Harditya
Caused by: java.net.UnknownHostException: dhcppc0: dhcppc0
        at java.net.InetAddress.getLocalHost(InetAddress.java:1308)
        at org.apache.hadoop.io.SequenceFile$Writer.<init>(SequenceFile.java:73)

Could it be because I have both Apache and Tomcat installed on the host
where I installed Nutch, so that it cannot determine whether 'localhost'
points to Apache or to Tomcat? Or does that not matter?

Both servers (Apache and Tomcat) are listening on their default ports,
80 and 8080.





Re: Nutch Crawling error

Dennis Kubes
If dhcppc0 is the host that you are on, you might want to check that your
hosts file has the localhost line pointing to 127.0.0.1 and that dhcppc0
also points to 127.0.0.1.  Something like this:

127.0.0.1               yourhost.domain.com yourhost localhost.localdomain localhost
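
The stack trace shows the failure coming from InetAddress.getLocalHost(),
so one way to test a hosts file change without running a crawl is to make
that call directly. The sketch below is a stand-alone diagnostic; the class
name LocalHostCheck is made up for illustration and is not part of Nutch
or Hadoop.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

// Hypothetical diagnostic, not part of Nutch or Hadoop.
class LocalHostCheck {

    // Describes the outcome of the same lookup that fails in the stack trace.
    static String describe() {
        try {
            InetAddress local = InetAddress.getLocalHost();
            return "OK: " + local.getHostName() + " -> " + local.getHostAddress();
        } catch (UnknownHostException e) {
            return "FAILED: " + e.getMessage();
        }
    }

    public static void main(String[] args) {
        System.out.println(describe());
    }
}
```

A line starting with FAILED would suggest that the machine's own host name
still has no usable mapping.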

Dennis Kubes


Re: Nutch Crawling error

Reza Harditya
Hi Dennis,

Yes, dhcppc0 is the machine that Nutch is on, and yes, it is already
pointing to 127.0.0.1.
My hosts file already looks like this:
127.0.0.1       loacalhost.localdomain  localhost

However, I don't quite follow what you mean by "127.0.0.1
yourhost.domain.com yourhost localhost.localdomain localhost". What should
I put in place of yourhost.domain.com? Is it dhcppc0?

Cheers,

Reza



Re: Nutch Crawling error

Dennis Kubes
It should look like this, but replace "domain" with your own domain.  Try
this and let me know if it works.

127.0.0.1               dhcppc0.domain.com dhcppc0 localhost.localdomain localhost
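
To make the difference between the two hosts file lines concrete, here is
a small sketch that checks whether hosts-file text maps an address to a
given name. The class HostsFileCheck and both sample strings are
assumptions for illustration, modeled on the lines quoted in this thread.
The code parses text only, never reads the real hosts file, and compares
names case-sensitively for simplicity.

```java
import java.util.Arrays;

// Hypothetical helper for illustration; it parses text and does not touch the real hosts file.
class HostsFileCheck {

    // Returns true if some non-comment line maps the given IP to the given host name.
    static boolean maps(String hostsText, String ip, String hostName) {
        for (String line : hostsText.split("\\R")) {
            int hash = line.indexOf('#');
            String entry = (hash >= 0 ? line.substring(0, hash) : line).trim();
            if (entry.isEmpty()) {
                continue; // blank line or comment-only line
            }
            String[] fields = entry.split("\\s+");
            // fields[0] is the address; the remaining fields are names for it.
            if (fields[0].equals(ip)
                    && Arrays.asList(fields).subList(1, fields.length).contains(hostName)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        String before = "127.0.0.1       localhost.localdomain  localhost";
        String after = "127.0.0.1  dhcppc0.domain.com dhcppc0 localhost.localdomain localhost";
        System.out.println("before: " + maps(before, "127.0.0.1", "dhcppc0"));
        System.out.println("after:  " + maps(after, "127.0.0.1", "dhcppc0"));
    }
}
```

The first sample has no entry for dhcppc0, which matches the kind of hosts
file that produced the UnknownHostException above.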

Dennis Kubes


Re: Nutch Crawling error

Reza Harditya
Thanks Dennis, worked like a charm :)

Forgive me for going off on a tangent in this thread, but I don't
understand which crawl directory the search engine fetches its search
results from.

Let's say I ran the crawl from the root of the Nutch installation and put
the crawl result in a directory called 'my.crawl'. I know that the search
engine itself fetches the search results from the 'crawl' directory under
webapps when using the web interface. So how does the content of
'my.crawl' get copied to 'crawl'? Do I have to do it manually for every
crawl?

Reza



Re: Nutch Crawling error

Doğacan Güney-3
Hi,

On 5/15/07, Reza Harditya <[hidden email]> wrote:

> Thanks Dennis, Worked like a charm :)
>
> Forgive me for running in tangent in this thread here, but I just don't
> understand from which crawl directory does the search engine fetch the
> search result from?
>
> I mean, let's say I ran the crawl from the root of Nutch installation and
> put the crawl result in a directory called 'my.crawl'. And I know that the
> search engine itself is fetching the search result from the 'crawl'
> directory under webapps when using the web interface. So how does the
> content of 'my.crawl' gets copied to 'crawl'? Do I have to do it manually
> for every crawl?

Check the "searcher.dir" configuration setting. Your webapp reads this
setting and fetches results from that directory. If it is a relative
path, it is resolved relative to the directory where you started your
webapp.
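
The relative-path behaviour can be pictured with plain path arithmetic.
The sketch below is not Nutch code; the class SearcherDirDemo is made up,
and it assumes that "where you started your webapp" corresponds to the
JVM working directory (the user.dir system property). The directory names
'crawl' and 'my.crawl' are the ones mentioned in the question.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical illustration; this is plain path arithmetic, not Nutch code.
class SearcherDirDemo {

    // Resolves a configured directory against the directory the process was started from.
    static Path resolve(String configuredDir, Path startDir) {
        Path dir = Paths.get(configuredDir);
        return dir.isAbsolute() ? dir : startDir.resolve(dir).normalize();
    }

    public static void main(String[] args) {
        // Assumption: the start directory is the JVM working directory.
        Path startDir = Paths.get(System.getProperty("user.dir"));
        System.out.println(resolve("crawl", startDir));
        System.out.println(resolve("my.crawl", startDir));
    }
}
```

Under that assumption, setting searcher.dir to an absolute path would
avoid any dependence on where the webapp was started.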



--
Doğacan Güney