[jira] Created: (NUTCH-284) NullPointerException during index

[jira] Created: (NUTCH-284) NullPointerException during index

Nick Burch (Jira)
NullPointerException during index
---------------------------------

         Key: NUTCH-284
         URL: http://issues.apache.org/jira/browse/NUTCH-284
     Project: Nutch
        Type: Bug

  Components: indexer  
    Versions: 0.8-dev    
    Reporter: Stefan Neufeind


For quite a while this "reduce > sort" has been going on. Then it fails. What could be wrong with this?


060524 212613 reduce > sort
060524 212614 reduce > sort
060524 212615 reduce > sort
060524 212615 found resource common-terms.utf8 at file:/home/mm/nutch-nightly-prod/conf/common-terms.utf8
060524 212615 found resource common-terms.utf8 at file:/home/mm/nutch-nightly-prod/conf/common-terms.utf8
060524 212619 Optimizing index.
060524 212619 job_jlbhhm
java.lang.NullPointerException
        at org.apache.nutch.indexer.Indexer$OutputFormat$1.write(Indexer.java:111)
        at org.apache.hadoop.mapred.ReduceTask$3.collect(ReduceTask.java:269)
        at org.apache.nutch.indexer.Indexer.reduce(Indexer.java:253)
        at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:282)
        at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:114)
Exception in thread "main" java.io.IOException: Job failed!
        at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:341)
        at org.apache.nutch.indexer.Indexer.index(Indexer.java:287)
        at org.apache.nutch.indexer.Indexer.main(Indexer.java:304)


--
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators:
   http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see:
   http://www.atlassian.com/software/jira

[jira] Commented: (NUTCH-284) NullPointerException during index

Nick Burch (Jira)
    [ http://issues.apache.org/jira/browse/NUTCH-284?page=comments#action_12413227 ]

Marko Bauhardt commented on NUTCH-284:
--------------------------------------

I think the index-basic plugin is not included, because line 111 of Indexer.java reads:

    .... doc.getField("url").stringValue() ....

The BasicIndexingFilter is what indexes the "url" field.

Check your log file or nutch-default.xml (or nutch-site.xml).

Marko
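
A minimal sketch of the failure mode described above, not the actual Nutch code. When no indexing filter has added a "url" field, the field lookup returns null, and calling a method on that result throws the NullPointerException seen in the stack trace. The class UrlFieldCheck, the method requireField, and the Map standing in for a Lucene Document are all hypothetical names chosen for illustration; the sketch only shows how a guard could turn the bare NPE into a message that points at the missing plugin.

```java
import java.util.HashMap;
import java.util.Map;

public class UrlFieldCheck {

    // Hypothetical stand-in for a Lucene Document: field name -> stored value.
    // Returns the field value, or fails with a descriptive message instead of
    // letting a later method call on null raise a bare NullPointerException.
    static String requireField(Map<String, String> doc, String name) {
        String value = doc.get(name);
        if (value == null) {
            throw new IllegalStateException(
                "Document has no \"" + name + "\" field; is the index-basic plugin enabled?");
        }
        return value;
    }

    public static void main(String[] args) {
        // A document as it might look when only non-basic filters ran (assumption).
        Map<String, String> doc = new HashMap<>();
        doc.put("title", "Example page");

        try {
            requireField(doc, "url");
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }

        // Once the "url" field is present, the lookup succeeds.
        doc.put("url", "http://example.org/");
        System.out.println(requireField(doc, "url"));
    }
}
```

A check of this kind fails at the same point as the original code, but the message names the likely cause.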



[jira] Commented: (NUTCH-284) NullPointerException during index

Nick Burch (Jira)
    [ http://issues.apache.org/jira/browse/NUTCH-284?page=comments#action_12413231 ]

Gal Nitzan commented on NUTCH-284:
----------------------------------

I just had something similar.

Try the following:

Run ant on each of your tasktracker machines:

% ant

Then restart Nutch and try again.

I think there is a problem with the classpath.

[jira] Commented: (NUTCH-284) NullPointerException during index

Nick Burch (Jira)
    [ http://issues.apache.org/jira/browse/NUTCH-284?page=comments#action_12413240 ]

Stefan Neufeind commented on NUTCH-284:
---------------------------------------

Yes, I was missing index-basic. My apologies. I needed the extra fields of index-more and thought it would index the basic fields as well.
The same thing occurred in NUTCH-51.

Would it be possible to require that index-basic is loaded (in the same way as "well, you need a scoring plugin" etc.)? If somebody writes his own index-basic2 plugin, he would have to be able to put a "provides index-basic" into his plugin to declare that it indexes the basic fields. Something like this could save some people like me from trouble and searching :-)
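
For readers who hit the same error, the fix described above amounts to listing both plugins in the plugin.includes property, typically overridden in nutch-site.xml. The fragment below is only a sketch: the index-(basic|more) part reflects what this thread discusses, while the other plugin names in the value are an assumed example and should be taken from your own nutch-default.xml.

```xml
<!-- nutch-site.xml (sketch): enable index-basic alongside index-more. -->
<!-- Every plugin id other than index-basic and index-more is an assumed example. -->
<property>
  <name>plugin.includes</name>
  <value>protocol-http|urlfilter-regex|parse-(text|html)|index-(basic|more)|query-(basic|site|url)</value>
</property>
```

The value is a regular expression matched against plugin ids, so index-(basic|more) enables both indexing filters while leaving index-more's extra fields in place.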

[jira] Closed: (NUTCH-284) NullPointerException during index

Nick Burch (Jira)
     [ http://issues.apache.org/jira/browse/NUTCH-284?page=all ]
     
Stefan Groschupf closed NUTCH-284:
----------------------------------

    Resolution: Won't Fix

>Yes, I was missing index-basic.

[jira] Commented: (NUTCH-284) NullPointerException during index

Nick Burch (Jira)
    [ http://issues.apache.org/jira/browse/NUTCH-284?page=comments#action_12414453 ]

Stefan Groschupf commented on NUTCH-284:
----------------------------------------

Please try to discuss such things on the user mailing list first, rather than opening an issue.
Maintaining the issue tracker is very time-consuming. But if there is a bug, please continue to open bug reports. :)
Thanks.

