fetcher failing with outofmemory exception


fetcher failing with outofmemory exception

DS jha
Hello -

I am using the latest Nutch trunk on a Linux machine (single file system).
I am trying to fetch about 5-10K pages, and every time I run the fetch
command, after fetching a few hundred pages it starts throwing an
OutOfMemory exception (not related to heap size):

2008-02-08 02:41:01,395 FATAL fetcher.Fetcher - java.io.IOException: java.io.IOException: Cannot allocate memory
2008-02-08 02:41:01,719 FATAL fetcher.Fetcher - at java.lang.UNIXProcess.<init>(UNIXProcess.java:148)
2008-02-08 02:41:01,719 FATAL fetcher.Fetcher - at java.lang.ProcessImpl.start(ProcessImpl.java:65)
2008-02-08 02:41:01,719 FATAL fetcher.Fetcher - at java.lang.ProcessBuilder.start(ProcessBuilder.java:451)
2008-02-08 02:41:01,719 FATAL fetcher.Fetcher - at java.lang.Runtime.exec(Runtime.java:591)
2008-02-08 02:41:01,719 FATAL fetcher.Fetcher - at java.lang.Runtime.exec(Runtime.java:464)
2008-02-08 02:41:01,719 FATAL fetcher.Fetcher - at org.apache.hadoop.fs.ShellCommand.runCommand(ShellCommand.java:48)
2008-02-08 02:41:01,719 FATAL fetcher.Fetcher - at org.apache.hadoop.fs.ShellCommand.run(ShellCommand.java:42)
2008-02-08 02:41:01,720 FATAL fetcher.Fetcher - at org.apache.hadoop.fs.DF.getAvailable(DF.java:72)
2008-02-08 02:41:01,720 FATAL fetcher.Fetcher - at org.apache.hadoop.fs.LocalDirAllocator$AllocatorPerContext.getLocalPathForWrite(LocalDirAllocator.java:296)
2008-02-08 02:41:01,720 FATAL fetcher.Fetcher - at org.apache.hadoop.fs.LocalDirAllocator.getLocalPathForWrite(LocalDirAllocator.java:124)
2008-02-08 02:41:01,720 FATAL fetcher.Fetcher - at org.apache.hadoop.mapred.MapOutputFile.getSpillFileForWrite(MapOutputFile.java:88)
2008-02-08 02:41:01,720 FATAL fetcher.Fetcher - at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.sortAndSpillToDisk(MapTask.java:382)
2008-02-08 02:41:01,720 FATAL fetcher.Fetcher - at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.collect(MapTask.java:364)
2008-02-08 02:41:01,720 FATAL fetcher.Fetcher - at org.apache.nutch.fetcher.Fetcher$FetcherThread.output(Fetcher.java:354)
2008-02-08 02:41:01,720 FATAL fetcher.Fetcher - at org.apache.nutch.fetcher.Fetcher$FetcherThread.run(Fetcher.java:178)

The hard disk does have enough space (over 20 GB, of which less than 2 GB is used).
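
(For context on where the allocation failure is coming from: the trace ends in
Runtime.exec, which Hadoop's DF helper uses to run "df" and read the available-space
column before the map output is spilled to disk. The "Cannot allocate memory" is
raised while forking that child process, i.e. it is native memory being exhausted,
not the Java heap, which is consistent with the disk itself having plenty of room.
A rough, illustrative sketch of that kind of shell-out follows; this is not Hadoop's
actual DF code, and the class name below is made up:

import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;

// Rough sketch (not Hadoop's real DF class) of the shell-out in the trace:
// fork a child to run "df -k <dir>" and parse the "Available" column. The
// exception above is thrown while creating the child process, before df
// itself ever runs, so df's answer about free space never matters.
public class DfSketch {
    public static long availableKb(String dir) throws IOException {
        Process p = Runtime.getRuntime().exec(new String[] { "df", "-k", dir });
        BufferedReader out = new BufferedReader(new InputStreamReader(p.getInputStream()));
        out.readLine();                                    // skip the header line
        String[] fields = out.readLine().trim().split("\\s+");
        return Long.parseLong(fields[3]);                  // 4th column: available 1K-blocks
    }

    public static void main(String[] args) throws IOException {
        System.out.println(availableKb(args.length > 0 ? args[0] : ".") + " KB available");
    }
}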

I am mostly using the default Hadoop and Nutch settings. I tried changing the
number of fetch threads (from the default 35 to 50, and then to 100), but it
doesn't have any impact; the fetcher keeps throwing the above exception after
a while.
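
(On the thread count: raising it is unlikely to help here and may make things
worse, since each Java thread reserves its own native stack, sized by -Xss and
typically a few hundred KB, outside the -Xmx heap. A small illustrative sketch,
nothing Nutch-specific, that just shows the native cost of extra threads:

// Illustrative only (not Nutch code): every Java thread reserves a native
// stack outside the -Xmx heap, so a higher fetcher thread count increases
// native memory pressure on the same box rather than relieving it.
public class ThreadCostDemo {
    public static void main(String[] args) throws InterruptedException {
        int n = args.length > 0 ? Integer.parseInt(args[0]) : 100;
        Thread[] threads = new Thread[n];
        for (int i = 0; i < n; i++) {
            threads[i] = new Thread(new Runnable() {
                public void run() {
                    try {
                        Thread.sleep(30000);   // park so the process can be inspected
                    } catch (InterruptedException ignored) {
                    }
                }
            });
            threads[i].start();
        }
        // While the threads sleep, compare the process RSS (e.g. in top) for
        // n = 30 versus n = 100 to see the per-thread native cost.
        for (int i = 0; i < n; i++) {
            threads[i].join();
        }
    }
}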

Any thoughts?

Thanks
Jha.

RE: fetcher failing with outofmemory exception

Arkadi.Kosmynin
I don't know if you are using any custom plugins on the fetching stage.
I don't even know if this is possible (I don't need it). But, I have had
a similar experience with indexing. After a few thousand pages, Nutch
would start complaining about lack of memory. The culprit was my plugin
that created a connection to a database in each call.

So, if you _are_ using custom plugins, make sure that they don't leak
resources and reduce dependency on garbage collection to the minimum.
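
As a concrete illustration of the kind of leak I mean (hypothetical plugin
helper code, not anything from Nutch itself; the JDBC URL, credentials and
table are made up):

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

// Hypothetical indexing-plugin helper, for illustration only.
public class LookupHelper {

    // Leaky pattern: a new connection per call, never closed. After a few
    // thousand documents the process runs out of sockets and native memory
    // long before the Java heap fills up.
    public String lookupLeaky(String url) throws SQLException {
        Connection conn = DriverManager.getConnection("jdbc:mysql://localhost/meta", "user", "pass");
        ResultSet rs = conn.createStatement().executeQuery(
                "SELECT category FROM pages WHERE url = '" + url + "'");
        return rs.next() ? rs.getString(1) : null;
    }

    // Safer pattern: release everything in finally (or hold one shared
    // connection, or a pool, for the life of the plugin instance).
    public String lookupClosed(String url) throws SQLException {
        Connection conn = DriverManager.getConnection("jdbc:mysql://localhost/meta", "user", "pass");
        try {
            PreparedStatement ps = conn.prepareStatement("SELECT category FROM pages WHERE url = ?");
            try {
                ps.setString(1, url);
                ResultSet rs = ps.executeQuery();
                return rs.next() ? rs.getString(1) : null;
            } finally {
                ps.close();
            }
        } finally {
            conn.close();
        }
    }
}

The second form keeps resource usage flat no matter how many documents pass
through the plugin.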

Regards,

Arkadi


Re: fetcher failing with outofmemory exception

DS jha
I removed my custom plugin and ran fetch again - but it is still
failing with the same OutOfMemory exception.

Thanks much,



On Feb 8, 2008 6:12 PM,  <[hidden email]> wrote:

> I don't know if you are using any custom plugins on the fetching stage.
> I don't even know if this is possible (I don't need it). But, I have had
> a similar experience with indexing. After a few thousand pages, Nutch
> would start complaining about lack of memory. The culprit was my plugin
> that created a connection to a database in each call.
>
> So, if you _are_ using custom plugins, make sure that they don't leak
> resources and reduce dependency on garbage collection to the minimum.
>
> Regards,
>
> Arkadi

Re: fetcher failing with outofmemory exception

DS jha
I am running this on a Linux box with about 950 MB of RAM - do I need
to add more memory? I tried running with anywhere from 20-30 threads up
to about 100 threads, and also specified the -Xmx option to set the max
heap size, but after a while the fetcher always fails with the OutOfMemory
exception. Is there a minimum memory requirement for running Nutch?
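
(For reference, and this is an assumption based on the stack trace above rather
than anything verified on this box: -Xmx only sizes the Java heap, while the
"Cannot allocate memory" comes from fork() when Hadoop shells out to df, and
that fork needs native memory and overcommit/swap headroom outside the heap.
On a ~950 MB machine a larger -Xmx can therefore make the fork more likely to
fail, so adding swap or lowering -Xmx may be worth trying. A small diagnostic
sketch along those lines; illustrative only, the class name is made up:

// Illustrative sketch only: -Xmx sizes the Java heap, but Runtime.exec needs
// the kernel to fork the whole JVM process, which is where "Cannot allocate
// memory" comes from on a small box with little swap.
public class ForkCheck {
    public static void main(String[] args) throws Exception {
        Runtime rt = Runtime.getRuntime();
        System.out.println("max heap (-Xmx): " + rt.maxMemory() / (1024 * 1024) + " MB");

        // Hold some heap, roughly what a busy fetch task might have live by
        // the time the map output spills; run with -Xmx comfortably above this.
        int ballastMb = args.length > 0 ? Integer.parseInt(args[0]) : 256;
        byte[][] ballast = new byte[ballastMb][];
        for (int i = 0; i < ballastMb; i++) {
            ballast[i] = new byte[1024 * 1024];
        }

        // Now spawn a child the same way Hadoop's DF does. If this throws
        // java.io.IOException: Cannot allocate memory, the limit being hit is
        // native memory (fork/overcommit/swap), not the Java heap.
        Process p = rt.exec(new String[] { "df", "-k", "." });
        p.waitFor();
        System.out.println("fork/exec of df succeeded with " + ballast.length + " MB of heap held");
    }
}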


Thanks
-Jha





On Feb 9, 2008 7:15 PM, DS jha <[hidden email]> wrote:

> I removed my custom plugin and ran fetch again - but it is still
> failing with the same OutOfMemory exception.
>
> Thanks much,