CrawlDB data-loss and unable to inject 1.12 on Hadoop 2.7.3

Markus Jelsma-2
Hello,

This Wednesday we ran into trouble with the Nutch 1.12 injector on Hadoop 2.7.3. We ran 2.7.2 before and had no trouble running jobs.

2017-01-18 15:36:53,005 FATAL [main] org.apache.hadoop.mapred.YarnChild: Error running child : java.lang.IncompatibleClassChangeError: Found interface org.apache.hadoop.mapreduce.Counter, but class was expected
        at org.apache.nutch.crawl.Injector$InjectMapper.map(Injector.java:216)
        at org.apache.nutch.crawl.Injector$InjectMapper.map(Injector.java:100)
        at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:146)
        at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:787)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
        at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:164)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:422)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1698)
        at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
Exception in thread "main" java.lang.IncompatibleClassChangeError: Found interface org.apache.hadoop.mapreduce.Counter, but class was expected
        at org.apache.nutch.crawl.Injector.inject(Injector.java:383)
        at org.apache.nutch.crawl.Injector.run(Injector.java:467)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
        at org.apache.nutch.crawl.Injector.main(Injector.java:441)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
        at org.apache.hadoop.util.RunJar.main(RunJar.java:136)

Our processes kept retrying the injection for a few minutes until we shut them down manually. Meanwhile, our CrawlDB on HDFS was gone. Thanks to snapshots and/or backups we could restore it, so enable those if you haven't done so yet.
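Losing the CrawlDB is a good reason to keep the previous version around whenever a job rewrites it. Below is a minimal sketch of that idea on the local filesystem using only java.nio: the job writes into a fresh temporary directory, and only if it succeeds is the old "current" moved aside to "old" and the new output moved into place. The SafeInstall class, the Job interface and the current/old layout are assumptions for illustration, not Nutch's actual code; on HDFS, snapshots remain the more robust safeguard.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.stream.Stream;

public class SafeInstall {

    /** Hypothetical job: writes its complete output into the given directory or throws. */
    @FunctionalInterface
    interface Job {
        void run(Path output) throws Exception;
    }

    /**
     * Runs the job into a fresh temporary directory under db. Only if it succeeds is
     * db/current moved to db/old and the new output moved to db/current.
     * On failure the partial output is deleted and db/current is left untouched.
     */
    static boolean runAndInstall(Path db, Job job) throws IOException {
        Path current = db.resolve("current");
        Path old = db.resolve("old");
        Path tmp = Files.createTempDirectory(db, "tmp-");
        try {
            job.run(tmp);
        } catch (Exception e) {
            deleteRecursively(tmp); // discard partial output
            return false;
        }
        if (Files.exists(old)) {
            deleteRecursively(old); // keep only one previous version
        }
        if (Files.exists(current)) {
            Files.move(current, old);
        }
        Files.move(tmp, current);
        return true;
    }

    static void deleteRecursively(Path dir) throws IOException {
        try (Stream<Path> paths = Files.walk(dir)) {
            // Reverse order visits children before their parent directories.
            paths.sorted(Comparator.reverseOrder()).forEach(p -> {
                try {
                    Files.delete(p);
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
            });
        }
    }

    public static void main(String[] args) throws IOException {
        Path db = Files.createTempDirectory("crawldb-");
        boolean ok = runAndInstall(db, out -> Files.write(out.resolve("part-00000"), "v1".getBytes()));
        System.out.println("first run installed: " + ok);
        ok = runAndInstall(db, out -> { throw new IllegalStateException("simulated job failure"); });
        System.out.println("failed run installed: " + ok + ", current still present: "
                + Files.exists(db.resolve("current").resolve("part-00000")));
    }
}
```

The point of the design is that a crashing or retried job can only ever touch its own temporary directory; the last good copy is replaced only after a complete new one exists.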

These freak Hadoop errors can be notoriously difficult to debug, but it seems we are in luck: recompiling Nutch against Hadoop 2.7.3 instead of 2.4.0 fixes it. You are also in luck if your job file uses the old org.apache.hadoop.mapred.* API; only jobs using the newer org.apache.hadoop.mapreduce.* API seem to fail.
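An IncompatibleClassChangeError such as "Found interface org.apache.hadoop.mapreduce.Counter, but class was expected" generally means the calling code was compiled against a version in which that type was a class, while the type found at runtime is an interface. A small stdlib-only helper can show which jar a class is actually loaded from and whether it is an interface. The ClassOrigin name is hypothetical, and the helper is only useful when run with the same classpath as the failing job.

```java
import java.net.URL;
import java.security.CodeSource;

public class ClassOrigin {

    /**
     * Loads the named class without initializing it and reports whether it is an
     * interface or a class, plus the location (jar or directory) it was loaded from.
     */
    static String describe(String className) throws ClassNotFoundException {
        Class<?> c = Class.forName(className, false, ClassOrigin.class.getClassLoader());
        CodeSource source = c.getProtectionDomain().getCodeSource();
        // Classes from the bootstrap loader (e.g. java.util.*) typically have no CodeSource.
        URL location = (source == null) ? null : source.getLocation();
        String kind = c.isInterface() ? "interface" : "class";
        return className + " kind=" + kind + " from="
                + (location == null ? "<bootstrap>" : location.toString());
    }

    public static void main(String[] args) throws ClassNotFoundException {
        // Defaults are JDK types so the sketch runs without Hadoop on the classpath.
        String[] names = (args.length > 0) ? args : new String[] {"java.util.List", "java.util.ArrayList"};
        for (String name : names) {
            System.out.println(describe(name));
        }
    }
}
```

For example, running it as java -cp <job classpath> ClassOrigin org.apache.hadoop.mapreduce.Counter should reveal whether an unexpected jar (for instance one bundled inside the job file) shadows the cluster's Hadoop classes.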

Reference issue: https://issues.apache.org/jira/browse/NUTCH-2354

Regards,
Markus

Re: CrawlDB data-loss and unable to inject 1.12 on Hadoop 2.7.3

Sebastian Nagel
Hi Markus,

After once facing failing jobs due to dependency issues,
I started compiling the Nutch.job against the same Hadoop version
the cluster runs. It takes a little extra time to change ivy.xml
and, rarely, to resolve a conflicting dependency, but fixing broken
data in the cluster costs you much more.
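That practice can be turned into a fail-fast check before a job touches any data. This is a minimal sketch assuming the build records which Hadoop version the job was compiled against (the COMPILED_AGAINST constant and the class name are hypothetical). It compares only the leading major.minor.patch numbers and treats any difference as a mismatch; on a real cluster the runtime version could be read with org.apache.hadoop.util.VersionInfo.getVersion().

```java
import java.util.Arrays;

public class HadoopVersionGuard {

    // Hypothetical: the Hadoop version the job file was built against (e.g. from ivy.xml at build time).
    static final String COMPILED_AGAINST = "2.7.3";

    /** Extracts the leading major.minor.patch numbers, e.g. "2.7.3-SNAPSHOT" -> [2, 7, 3]. Missing parts count as 0. */
    static int[] parse(String version) {
        String core = version.split("[^0-9.]", 2)[0]; // drop suffixes such as "-SNAPSHOT"
        String[] parts = core.split("\\.");
        int[] nums = new int[3];
        for (int i = 0; i < nums.length && i < parts.length; i++) {
            if (!parts[i].isEmpty()) {
                nums[i] = Integer.parseInt(parts[i]);
            }
        }
        return nums;
    }

    /** Strict policy: the job may only run on exactly the Hadoop release it was compiled against. */
    static boolean sameRelease(String compiledAgainst, String runtime) {
        return Arrays.equals(parse(compiledAgainst), parse(runtime));
    }

    public static void main(String[] args) {
        // Here the runtime version is passed in instead of being read from the cluster.
        String runtime = args.length > 0 ? args[0] : "2.7.2";
        if (sameRelease(COMPILED_AGAINST, runtime)) {
            System.out.println("OK: job and cluster both use Hadoop " + runtime);
        } else {
            System.out.println("Mismatch: job compiled against Hadoop " + COMPILED_AGAINST
                    + " but cluster runs " + runtime + "; rebuild before running.");
        }
    }
}
```

The exact-match rule mirrors the "compile against the cluster's version" habit; a looser rule, such as accepting any patch release within the same major.minor line, is a judgment call.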


> Reference issue: https://issues.apache.org/jira/browse/NUTCH-2354

What about the opposite: running a Nutch.job compiled with 2.7.3 on Hadoop 2.7.2?
Nothing against upgrading, but when in doubt it would be good to know.


Thanks,
Sebastian


On 01/20/2017 02:23 PM, Markus Jelsma wrote:
> [...]

RE: CrawlDB data-loss and unable to inject 1.12 on Hadoop 2.7.3

Markus Jelsma-2
Hello Sebastian,

I am not sure what happens when it is compiled with 2.7.3 but run on 2.7.2. Since the other way around caused trouble (which usually doesn't happen), we could assume this might not work well either. Unfortunately I cannot test it; both our Hadoop clusters have already been upgraded.

Everyone would have to either recompile Nutch themselves or upgrade their Hadoop cluster. The latter is mostly a good thing: 2.7.2 and 2.7.3 fixed long-standing issues for Nutch.

The question is: what do we do?

Thanks,
Markus
 
-----Original message-----
> From: Sebastian Nagel
> Sent: Saturday 21st January 2017 19:57
> Subject: Re: CrawlDB data-loss and unable to inject 1.12 on Hadoop 2.7.3
> [...]

RE: CrawlDB data-loss and unable to inject 1.12 on Hadoop 2.7.3

Markus Jelsma-2
Hmm, I may have been wrong about recompiling. For some reason the problem persists, but it seems to be caused by custom patches. I confirmed that both 1.12 and master run fine on Hadoop 2.7.3, whether compiled against 2.7.3 or 2.7.2.

Regards,
Markus

 
 
-----Original message-----
> From: Markus Jelsma
> Sent: Wednesday 25th January 2017 13:30
> Subject: RE: CrawlDB data-loss and unable to inject 1.12 on Hadoop 2.7.3
> [...]