nutch latest build - inject operation failing

nutch latest build - inject operation failing

DS jha
Hi -

Looks like the latest trunk version of Nutch fails with the following
exception when performing the inject operation:

java.io.IOException: Target
file:/tmp/hadoop-user/mapred/temp/inject-temp-1280136828/_reduce_dv90x0/part-00000
already exists
        at org.apache.hadoop.fs.FileUtil.checkDest(FileUtil.java:246)
        at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:125)
        at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:116)
        at org.apache.hadoop.fs.RawLocalFileSystem.rename(RawLocalFileSystem.java:196)
        at org.apache.hadoop.fs.ChecksumFileSystem.rename(ChecksumFileSystem.java:394)
        at org.apache.hadoop.mapred.Task.moveTaskOutputs(Task.java:452)
        at org.apache.hadoop.mapred.Task.moveTaskOutputs(Task.java:469)
        at org.apache.hadoop.mapred.Task.saveTaskOutput(Task.java:426)
        at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:165)

Any thoughts?

Thanks
Jha

Re: nutch latest build - inject operation failing

Andrzej Białecki-2
DS jha wrote:
> Looks like the latest trunk version of Nutch fails with the following
> exception when performing the inject operation:
>
> java.io.IOException: Target
> file:/tmp/hadoop-user/mapred/temp/inject-temp-1280136828/_reduce_dv90x0/part-00000
> already exists

Is this really the latest trunk? Can you check the version of
lib/hadoop*.jar? It should be 0.15.3. And make sure there are no older
Hadoop libraries on the classpath.

--
Best regards,
Andrzej Bialecki     <><
  ___. ___ ___ ___ _ _   __________________________________
[__ || __|__/|__||\/|  Information Retrieval, Semantic Web
___|||__||  \|  ||  |  Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com


Re: nutch latest build - inject operation failing

DS jha
Yeah - it is using the Hadoop 0.15.3 jar file - strange!


Thanks
Jha


On Feb 7, 2008 8:11 AM, Andrzej Bialecki <[hidden email]> wrote:
> Is this really the latest trunk? Can you check the version of
> lib/hadoop*.jar? It should be 0.15.3. And make sure there are no older
> Hadoop libraries on the classpath.
> [...]

Re: nutch latest build - inject operation failing

Andrzej Białecki-2
DS jha wrote:
> [...]
> java.io.IOException: Target
> file:/tmp/hadoop-user/mapred/temp/inject-temp-1280136828/_reduce_dv90x0/part-00000
> already exists

Hmm, wait - this path is strange in itself, because it starts with
/tmp/hadoop-user ... Are you running on *nix or Windows/Cygwin? Did you
change hadoop-site.xml to redefine hadoop.tmp.dir? Or perhaps you are
running as a user with the username "user"?
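For context: unless hadoop.tmp.dir is overridden, Hadoop derives it from
the user name. If I remember the stock hadoop-default.xml of that era
correctly, the default is roughly:

<property>
  <name>hadoop.tmp.dir</name>
  <value>/tmp/hadoop-${user.name}</value>
  <description>A base for other temporary directories.</description>
</property>

so a path starting with /tmp/hadoop-user would simply mean the job ran
with the defaults, as a user named "user".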




Re: nutch latest build - inject operation failing

DS jha
This is running on Windows/Cygwin, with the username 'user' - and it is
using the default hadoop-site.xml.

Thanks,
Jha

On Feb 7, 2008 10:03 AM, Andrzej Bialecki <[hidden email]> wrote:
> Hmm, wait - this path is strange in itself, because it starts with
> /tmp/hadoop-user ... Are you running on *nix or Windows/Cygwin? Did you
> change hadoop-site.xml to redefine hadoop.tmp.dir? Or perhaps you are
> running as a user with the username "user"?
> [...]

Re: nutch latest build - inject operation failing

Dennis Kubes-2
Do you have speculative execution turned on? If so, turn it off.
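
For reference, a minimal way to turn it off in conf/hadoop-site.xml (the
property name below is what I recall for Hadoop 0.15.x):

<property>
  <name>mapred.speculative.execution</name>
  <value>false</value>
</property>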

Dennis

DS jha wrote:
> This is running on Windows/Cygwin, with the username 'user' - and it is
> using the default hadoop-site.xml.
> [...]

Re: nutch latest build - inject operation failing

DS jha
I tried setting it to false, but it still throws the same error.

Looks like with an older version of Hadoop (0.14.4) it works fine.

Thanks



On Feb 7, 2008 10:37 AM, Dennis Kubes <[hidden email]> wrote:
> Do you have speculative execution turned on? If so, turn it off.
> [...]

Re: nutch latest build - inject operation failing

Dennis Kubes-2
We would need more info on your configuration: local or DFS, and any
hadoop-site.xml changes.

Dennis

DS jha wrote:
> I tried setting it to false, but it still throws the same error.
>
> Looks like with an older version of Hadoop (0.14.4) it works fine.
> [...]

Re: nutch latest build - inject operation failing

DS jha
Local filesystem. No changes to default hadoop-site.xml (it is empty).

Thanks

On Feb 7, 2008 10:54 AM, Dennis Kubes <[hidden email]> wrote:
> We would need more info on your configuration: local or DFS, and any
> hadoop-site.xml changes.
> [...]

Re: nutch latest build - inject operation failing

Susam Pal
I can confirm this error: I just tried running the latest revision of
Nutch, rev-620818, on Debian as well as Cygwin on Windows.

It works fine on Debian but fails on Cygwin with this error:

2008-02-14 19:49:47,756 WARN  regex.RegexURLNormalizer - can't find
rules for scope 'inject', using default
2008-02-14 19:49:48,381 WARN  mapred.LocalJobRunner - job_local_1
java.io.IOException: Target
file:/D:/tmp/hadoop-guest/mapred/temp/inject-temp-322737506/_reduce_bjm6rw/part-00000
already exists
        at org.apache.hadoop.fs.FileUtil.checkDest(FileUtil.java:246)
        at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:125)
        at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:116)
        at org.apache.hadoop.fs.RawLocalFileSystem.rename(RawLocalFileSystem.java:196)
        at org.apache.hadoop.fs.ChecksumFileSystem.rename(ChecksumFileSystem.java:394)
        at org.apache.hadoop.mapred.Task.moveTaskOutputs(Task.java:452)
        at org.apache.hadoop.mapred.Task.moveTaskOutputs(Task.java:469)
        at org.apache.hadoop.mapred.Task.saveTaskOutput(Task.java:426)
        at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:165)
2008-02-14 19:49:49,225 FATAL crawl.Injector - Injector:
java.io.IOException: Job failed!
        at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:831)
        at org.apache.nutch.crawl.Injector.inject(Injector.java:162)
        at org.apache.nutch.crawl.Injector.run(Injector.java:192)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
        at org.apache.hadoop.util.ToolBase.doMain(ToolBase.java:54)
        at org.apache.nutch.crawl.Injector.main(Injector.java:182)

Indeed, the 'inject-temp-322737506' directory is present in the specified
folder on the D drive and doesn't get deleted.

Is this because multiple map/reduce tasks are running, and one of them
finds the directory already present and therefore fails?
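
Looking at the stack trace, the failing check itself seems simple: rename()
on the checksummed local filesystem goes through FileUtil.copy(), and
checkDest() refuses to overwrite an existing target. A hypothetical sketch
of that behavior (not the actual Hadoop source):

import java.io.File;
import java.io.IOException;

public class CheckDestSketch {
    // Hypothetical stand-in for FileUtil.checkDest(): if the destination
    // survives from an earlier attempt, the rename-as-copy fails the job.
    static void checkDest(File dst) throws IOException {
        if (dst.exists()) {
            throw new IOException("Target " + dst + " already exists");
        }
    }

    public static void main(String[] args) throws IOException {
        // Throws only if a stale temp output is actually present.
        checkDest(new File("/tmp/hadoop-user/mapred/temp/part-00000"));
    }
}

So anything that leaves the temp output behind - a failed delete on
Windows, or a second task writing to the same path - would produce exactly
this error.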

So, I also tried setting this in 'conf/hadoop-site.xml':

<property>
<name>mapred.speculative.execution</name>
<value>false</value>
<description></description>
</property>

I wonder why the same issue doesn't occur on Linux. I am not well
acquainted with the Hadoop code yet. Could someone throw some light on
what might be going wrong?

Regards,
Susam Pal

On 2/7/08, DS jha <[hidden email]> wrote:
> Looks like the latest trunk version of Nutch fails with the following
> exception when performing the inject operation:
> [...]

Re: nutch latest build - inject operation failing

Dennis Kubes-2
I think what might be occurring is a file path issue with Hadoop; I
have seen it in the past. Can you try on Windows using the cygdrive
path and see if that works? For the path below it would be /cygdrive/D/tmp/ ...
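
Something like this in conf/hadoop-site.xml should do it - the value below
is only an illustration of the suggestion above:

<property>
  <name>hadoop.tmp.dir</name>
  <value>/cygdrive/d/tmp</value>
</property>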

Dennis

Susam Pal wrote:
> I can confirm this error: I just tried running the latest revision of
> Nutch, rev-620818, on Debian as well as Cygwin on Windows.
>
> It works fine on Debian but fails on Cygwin with this error:
>
> java.io.IOException: Target
> file:/D:/tmp/hadoop-guest/mapred/temp/inject-temp-322737506/_reduce_bjm6rw/part-00000
> already exists
> [...]

Re: nutch latest build - inject operation failing

Susam Pal
What I did try was setting hadoop.tmp.dir to /opt/tmp, and I found the
behavior strange. I had an /opt/tmp directory in my Cygwin
installation (absolute Windows path: D:\Cygwin\opt\tmp) and I was
expecting Hadoop to use it. However, it created a new D:\opt\tmp and
wrote the temp files there. Of course, this failed with the same error.
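
That part, at least, has a likely explanation (a guess, not verified
against the Hadoop source): Hadoop uses plain Java file APIs, and the JVM
knows nothing about Cygwin mount points, so a POSIX-style path is resolved
against the current drive. A quick way to see it:

import java.io.File;

public class CygPathDemo {
    public static void main(String[] args) {
        // On Windows this prints something like D:\opt\tmp (resolved
        // against the current drive), not the Cygwin mount
        // D:\Cygwin\opt\tmp.
        System.out.println(new File("/opt/tmp").getAbsolutePath());
    }
}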

Right now I don't have a Windows system with me. I will try setting it
to /cygdrive/d/tmp/ tomorrow, when I again have access to a Windows
system, and then I'll update the mailing list with my observations.
Thanks for the suggestion.

Regards,
Susam Pal

On Thu, Feb 14, 2008 at 9:41 PM, Dennis Kubes <[hidden email]> wrote:
> I think what might be occurring is a file path issue with Hadoop; I
> have seen it in the past. Can you try on Windows using the cygdrive
> path and see if that works? For the path below it would be /cygdrive/D/tmp/ ...
> [...]

Re: nutch latest build - inject operation failing

Susam Pal
I tried setting hadoop.tmp.dir to /cygdrive/d/tmp and it created
D:\cygdrive\d\tmp\mapred\temp\inject-temp-1365510909\_reduce_n7v9vq.

The same error occurred:

2008-02-15 10:19:22,833 WARN  mapred.LocalJobRunner - job_local_1
java.io.IOException: Target file:/D:/cygdrive/d/tmp/mapred/temp/inject-temp-1365510909/_reduce_n7v9vq/part-00000 already exists
       at org.apache.hadoop.fs.FileUtil.checkDest(FileUtil.java:246)
       at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:125)
       at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:116)
       at org.apache.hadoop.fs.RawLocalFileSystem.rename(RawLocalFileSystem.java:180)
       at org.apache.hadoop.fs.ChecksumFileSystem.rename(ChecksumFileSystem.java:394)
       at org.apache.hadoop.mapred.Task.moveTaskOutputs(Task.java:452)
       at org.apache.hadoop.mapred.Task.moveTaskOutputs(Task.java:469)
       at org.apache.hadoop.mapred.Task.saveTaskOutput(Task.java:426)
       at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:165)

Regards,
Susam Pal

On Thu, Feb 14, 2008 at 10:07 PM, Susam Pal <[hidden email]> wrote:
> What I did try was setting hadoop.tmp.dir to /opt/tmp, and I found the
> behavior strange. I had an /opt/tmp directory in my Cygwin
> installation (absolute Windows path: D:\Cygwin\opt\tmp) and I was
> expecting Hadoop to use it. However, it created a new D:\opt\tmp and
> wrote the temp files there. Of course, this failed with the same error.
> [...]

Re: nutch latest build - inject operation failing

esmithers
Any resolution to this? I just tried installing on Windows and I'm hitting the same problem.

Susam Pal wrote:
> I tried setting hadoop.tmp.dir to /cygdrive/d/tmp and it created
> D:\cygdrive\d\tmp\mapred\temp\inject-temp-1365510909\_reduce_n7v9vq.
>
> The same error occurred:
>
> 2008-02-15 10:19:22,833 WARN  mapred.LocalJobRunner - job_local_1
> java.io.IOException: Target file:/D:/cygdrive/d/tmp/mapred/temp/inject-temp-1365510909/_reduce_n7v9vq/part-00000 already exists
> [...]