NPE in org.apache.hadoop.io.SequenceFile$Sorter$MergeQueue

NPE in org.apache.hadoop.io.SequenceFile$Sorter$MergeQueue

Gal Nitzan
Hi,

Is anybody using Nutch trunk?

I am running Nutch 0.9 and am unable to fetch.

After 50-60K URLs I get an NPE in
org.apache.hadoop.io.SequenceFile$Sorter$MergeQueue every time.

I was wondering if anyone has a workaround, or whether something is wrong
with my setup.

I have opened a new issue in JIRA,
http://issues.apache.org/jira/browse/hadoop-1008, for this.

Any clue?

Gal



Re: NPE in org.apache.hadoop.io.SequenceFile$Sorter$MergeQueue

Dennis Kubes
This has to do with HADOOP-964.  Replace the Hadoop jar files in your Nutch
installation with the most recent versions from Hadoop.  You will also need
to apply the NUTCH-437 patch to get Nutch to work with the most recent
changes to the Hadoop codebase.

Dennis Kubes


Re: NPE in org.apache.hadoop.io.SequenceFile$Sorter$MergeQueue

Dennis Kubes
Actually I take it back.  I don't think it is the same problem but I do
think it is the right solution.

Dennis Kubes


RE: NPE in org.apache.hadoop.io.SequenceFile$Sorter$MergeQueue

Gal Nitzan

Thanks Dennis, that seems to have done the trick. I'm not totally sure, but
so it seems :)

Gal.

-----Original Message-----
From: Dennis Kubes [mailto:[hidden email]]
Sent: Tuesday, February 13, 2007 11:09 PM
To: [hidden email]
Subject: Re: NPE in org.apache.hadoop.io.SequenceFile$Sorter$MergeQueue

Actually I take it back.  I don't think it is the same problem but I do
think it is the right solution.

Dennis Kubes


RE: NPE in org.apache.hadoop.io.SequenceFile$Sorter$MergeQueue

Armel T. Nene-2
In reply to this post by Dennis Kubes
Dennis

I was wondering if this patch could fix my problem, which is, if not the
same, very similar to this one. I am using Nutch 0.8.2-dev; I checked it out
from SVN a while ago and have not updated since. Before this I was able to
crawl 10000 XML files with no errors whatsoever. These are the errors I get
when fetching:

INFO parser.custom: Custom-parse: Parsing content file:/C:/TeamBinder/AddressBook/9100/(65)E110_ST A0 (1).pdf
07/02/12 22:09:16 INFO fetcher.Fetcher: fetch of file:/C:/TeamBinder/AddressBook/9100/(65)E110_ST A0 (1).pdf failed with: java.lang.NullPointerException
07/02/12 22:09:17 INFO mapred.LocalJobRunner: 0 pages, 0 errors, 0.0 pages/s, 0 kb/s,
07/02/12 22:09:17 FATAL fetcher.Fetcher: java.lang.NullPointerException
07/02/12 22:09:17 FATAL fetcher.Fetcher: at org.apache.hadoop.io.SequenceFile$Writer.append(SequenceFile.java:198)
07/02/12 22:09:17 FATAL fetcher.Fetcher: at org.apache.hadoop.io.SequenceFile$Writer.append(SequenceFile.java:189)
07/02/12 22:09:17 FATAL fetcher.Fetcher: at org.apache.hadoop.mapred.MapTask$2.collect(MapTask.java:91)
07/02/12 22:09:17 FATAL fetcher.Fetcher: at org.apache.nutch.fetcher.Fetcher$FetcherThread.output(Fetcher.java:314)
07/02/12 22:09:17 FATAL fetcher.Fetcher: at org.apache.nutch.fetcher.Fetcher$FetcherThread.run(Fetcher.java:232)
07/02/12 22:09:17 FATAL fetcher.Fetcher: fetcher caught:java.lang.NullPointerException

One problem is that my Hadoop version is reported as hadoop-0.4.0-patched.
I don't know whether that means I am running the 0.4.0 version, and I find
it a little confusing. Once you clarify that for me, I will be able to apply
the patch to my version.

Best Regards,

Armel

-----Original Message-----
From: Dennis Kubes [mailto:[hidden email]]
Sent: 13 February 2007 21:09
To: [hidden email]
Subject: Re: NPE in org.apache.hadoop.io.SequenceFile$Sorter$MergeQueue

Actually I take it back.  I don't think it is the same problem but I do
think it is the right solution.

Dennis Kubes


How to get score in search.jsp

Anton Potekhin
Hi Nutch Gurus!

I have a small problem. I need to make some changes to search.jsp: I need
to take the first 50 results and sort them in a different way. I will change
the score of each result with the formula "new_score = nutch_score +
domain_score_from_my_db" and sort by that. But I don't understand how to get
nutch_score in search.jsp.

For now I use a workaround: I get nutch_score using the getValue() method of
the org.apache.lucene.search.Explanation class. But I think that is a very
slow way.

Can anybody help me to find a solution for this problem?

P.S. I hope that I described my problem clearly. Thanks in advance.



Re: NPE in org.apache.hadoop.io.SequenceFile$Sorter$MergeQueue

Dennis Kubes
In reply to this post by Armel T. Nene-2
It may fix the problem or it may not.  There have been many changes in
Hadoop since 0.4; I think they are now on 0.11.x.  If you are upgrading an
existing DFS installation that already holds content, that is something to
take into consideration.  That said, the changes in Hadoop from 0.4 to the
present may very well have fixed the error you are seeing, and to use the
most recent version of Hadoop you will need the NUTCH-437 patch.

Looking at your output below, though, my first thought is that something in
the PDF parser, not Hadoop, is causing the error.  Nutch uses PDFBox to parse
PDF files, so you may want to take the specific file and see whether it
parses correctly outside of Nutch using PDFBox.

Dennis Kubes

Armel T. Nene wrote:

> Dennis
>
> I was wondering if this patch could fix my problem, which is, if not the
> same, very similar to this one. I am using Nutch 0.8.2-dev; I checked it out
> from SVN a while ago and have not updated since. Before this I was able to
> crawl 10000 XML files with no errors whatsoever. These are the errors I get
> when fetching:
>
> INFO parser.custom: Custom-parse: Parsing content
> file:/C:/TeamBinder/AddressBook/9100/(65)E110_ST A0 (1).pdf
> 07/02/12 22:09:16 INFO fetcher.Fetcher: fetch of
> file:/C:/TeamBinder/AddressBook/9100/(65)E110_ST A0 (1).pdf failed with:
> java.lang.NullPointerException
> 07/02/12 22:09:17 INFO mapred.LocalJobRunner: 0 pages, 0 errors, 0.0
> pages/s, 0 kb/s,
> 07/02/12 22:09:17 FATAL fetcher.Fetcher: java.lang.NullPointerException
> 07/02/12 22:09:17 FATAL fetcher.Fetcher: at
> org.apache.hadoop.io.SequenceFile$Writer.append(SequenceFile.java:198)
> 07/02/12 22:09:17 FATAL fetcher.Fetcher: at
> org.apache.hadoop.io.SequenceFile$Writer.append(SequenceFile.java:189)
> 07/02/12 22:09:17 FATAL fetcher.Fetcher: at
> org.apache.hadoop.mapred.MapTask$2.collect(MapTask.java:91)
> 07/02/12 22:09:17 FATAL fetcher.Fetcher: at
> org.apache.nutch.fetcher.Fetcher$FetcherThread.output(Fetcher.java:314)
> 07/02/12 22:09:17 FATAL fetcher.Fetcher: at
> org.apache.nutch.fetcher.Fetcher$FetcherThread.run(Fetcher.java:232)
> 07/02/12 22:09:17 FATAL fetcher.Fetcher: fetcher
> caught:java.lang.NullPointerException
>
> One problem is that my Hadoop version is reported as hadoop-0.4.0-patched.
> I don't know whether that means I am running the 0.4.0 version, and I find
> it a little confusing. Once you clarify that for me, I will be able to apply
> the patch to my version.
>
> Best Regards,
>
> Armel

RE: How to get score in search.jsp

Anton Potekhin
In reply to this post by Anton Potekhin
I have found a solution: I've added a score variable to Hit....
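
For illustration only, once a score is available on each hit, the re-ranking
from the original question (new_score = nutch_score + domain_score_from_my_db
over the first 50 results) might look roughly like the sketch below. The
ScoredHit class, the rerank method, and the in-memory map of domain scores are
all hypothetical stand-ins, not Nutch's Hit class or any Nutch API; in
practice the domain scores would come from the database mentioned in the
question.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class Rerank {

    // Hypothetical stand-in for a search hit that carries its own score.
    // This is NOT Nutch's Hit class; it only models the fields needed here.
    static final class ScoredHit {
        final String domain;
        final String url;
        final float nutchScore;

        ScoredHit(String domain, String url, float nutchScore) {
            this.domain = domain;
            this.url = url;
            this.nutchScore = nutchScore;
        }
    }

    // Takes the first `limit` hits and orders them by
    // nutchScore + domainScore, highest first.
    // Domains missing from the map are assumed to contribute 0.
    static List<ScoredHit> rerank(List<ScoredHit> hits,
                                  Map<String, Float> domainScores,
                                  int limit) {
        int n = Math.min(limit, hits.size());
        List<ScoredHit> top = new ArrayList<>(hits.subList(0, n));
        top.sort(Comparator.comparingDouble(
                (ScoredHit h) -> h.nutchScore
                        + domainScores.getOrDefault(h.domain, 0f))
                .reversed());
        return top;
    }

    public static void main(String[] args) {
        List<ScoredHit> hits = new ArrayList<>();
        hits.add(new ScoredHit("a.example", "http://a.example/1", 1.0f));
        hits.add(new ScoredHit("b.example", "http://b.example/1", 0.8f));

        // Assumed to have been loaded from the external database.
        Map<String, Float> domainScores = new HashMap<>();
        domainScores.put("b.example", 0.5f);

        for (ScoredHit h : rerank(hits, domainScores, 50)) {
            System.out.println(h.url);
        }
    }
}
```

Sorting a copy of the first 50 hits keeps the original result list untouched,
so the default ordering can still be used elsewhere on the page.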

-----Original Message-----
From: Anton Potekhin [mailto:[hidden email]]
Sent: Wednesday, February 14, 2007 10:48 AM
To: [hidden email]
Subject: How to get score in search.jsp
Importance: High

Hi Nutch Gurus!

I have a small problem. I need to make some changes to search.jsp: I need
to take the first 50 results and sort them in a different way. I will change
the score of each result with the formula "new_score = nutch_score +
domain_score_from_my_db" and sort by that. But I don't understand how to get
nutch_score in search.jsp.

For now I use a workaround: I get nutch_score using the getValue() method of
the org.apache.lucene.search.Explanation class. But I think that is a very
slow way.

Can anybody help me to find a solution for this problem?

P.S. I hope that I described my problem clearly. Thanks in advance.

Sorry for the duplicated mail. I think I had some problems with my mail
account....