run nutch from tomcat with ProcessBuilder


run nutch from tomcat with ProcessBuilder

DB Design
Hi,
I want to run the Nutch crawler command from a Tomcat web application with Java's ProcessBuilder. When I run the crawler command from a terminal everything is OK, but when it is run with ProcessBuilder the job fails with the error below.
Nutch version: 1.12
Java version: 8
OS: Ubuntu 16.04
Tomcat version: 8
Solr version: 6.2
Thanks for your help.
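For reference, the launch code looks roughly like this (the crawl-script path and arguments here are placeholders, not my exact command):

import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;

public class NutchLauncher {
    public static void main(String[] args) throws IOException, InterruptedException {
        // Hypothetical crawl-script path and arguments, for illustration only.
        ProcessBuilder pb = new ProcessBuilder("/opt/nutch/bin/crawl", "urls", "crawl", "2");
        pb.redirectErrorStream(true); // merge stderr into stdout
        Process p = pb.start();
        try (BufferedReader r = new BufferedReader(new InputStreamReader(p.getInputStream()))) {
            String line;
            while ((line = r.readLine()) != null) {
                System.out.println(line); // in the webapp this goes to a log instead
            }
        }
        System.out.println("crawl exited with " + p.waitFor());
    }
}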
java.lang.Exception: java.io.IOException: Mkdirs failed to create file:/generate-temp-8dca91a6-3610-4802-a534-c0cdf85cde73/_temporary/0/_temporary/attempt_local1345668275_0001_r_000001_0/fetchlist-1
    at org.apache.hadoop.mapred.LocalJobRunner$Job.runTasks(LocalJobRunner.java:462)
    at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:529)
Caused by: java.io.IOException: Mkdirs failed to create file:/generate-temp-8dca91a6-3610-4802-a534-c0cdf85cde73/_temporary/0/_temporary/attempt_local1345668275_0001_r_000001_0/fetchlist-1
    at org.apache.hadoop.fs.ChecksumFileSystem.create(ChecksumFileSystem.java:438)
    at org.apache.hadoop.fs.ChecksumFileSystem.create(ChecksumFileSystem.java:424)
    at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:906)
    at org.apache.hadoop.io.SequenceFile$Writer.<init>(SequenceFile.java:1071)
    at org.apache.hadoop.io.SequenceFile.createWriter(SequenceFile.java:270)
    at org.apache.hadoop.io.SequenceFile.createWriter(SequenceFile.java:527)
    at org.apache.hadoop.mapred.SequenceFileOutputFormat.getRecordWriter(SequenceFileOutputFormat.java:63)
    at org.apache.hadoop.mapred.lib.MultipleSequenceFileOutputFormat.getBaseRecordWriter(MultipleSequenceFileOutputFormat.java:51)
    at org.apache.hadoop.mapred.lib.MultipleOutputFormat$1.write(MultipleOutputFormat.java:104)
    at org.apache.hadoop.mapred.ReduceTask$OldTrackingRecordWriter.write(ReduceTask.java:493)
    at org.apache.hadoop.mapred.ReduceTask$3.collect(ReduceTask.java:422)
    at org.apache.nutch.crawl.Generator$Selector.reduce(Generator.java:342)
    at org.apache.nutch.crawl.Generator$Selector.reduce(Generator.java:110)
    at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:444)
    at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:392)
    at org.apache.hadoop.mapred.LocalJobRunner$Job$ReduceTaskRunnable.run(LocalJobRunner.java:319)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
2017-08-22 20:58:28,674 INFO  mapreduce.Job - Job job_local1345668275_0001 running in uber mode : false
2017-08-22 20:58:28,674 INFO  mapreduce.Job -  map 100% reduce 50%
2017-08-22 20:58:28,675 INFO  mapreduce.Job - Job job_local1345668275_0001 failed with state FAILED due to: NA
2017-08-22 20:58:28,684 INFO  mapreduce.Job - Counters: 33
    File System Counters
        FILE: Number of bytes read=1601520
        FILE: Number of bytes written=2209492
        FILE: Number of read operations=0
        FILE: Number of large read operations=0
        FILE: Number of write operations=0
    Map-Reduce Framework
        Map input records=1
        Map output records=1
        Map output bytes=77
        Map output materialized bytes=88
        Input split bytes=142
        Combine input records=0
        Combine output records=0
        Reduce input groups=0
        Reduce shuffle bytes=88
        Reduce input records=0
        Reduce output records=0
        Spilled Records=1
        Shuffled Maps =2
        Failed Shuffles=0
        Merged Map outputs=2
        GC time elapsed (ms)=6
        CPU time spent (ms)=0
        Physical memory (bytes) snapshot=0
        Virtual memory (bytes) snapshot=0
        Total committed heap usage (bytes)=531628032
    Shuffle Errors
        BAD_ID=0
        CONNECTION=0
        IO_ERROR=0
        WRONG_LENGTH=0
        WRONG_MAP=0
        WRONG_REDUCE=0
    File Input Format Counters
        Bytes Read=154
    File Output Format Counters
        Bytes Written=0
2017-08-22 20:58:28,684 ERROR crawl.Generator - Generator: java.io.IOException: Job failed!
    at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:836)
    at org.apache.nutch.crawl.Generator.generate(Generator.java:589)
    at org.apache.nutch.crawl.Generator.run(Generator.java:764)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
    at org.apache.nutch.crawl.Generator.main(Generator.java:717)

RE: run nutch from tomcat with ProcessBuilder

Markus Jelsma-2
Well, the exception doesn't mention it, but I would guess it has something to do with permissions.
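One way to check (an untested sketch, not something from your logs): print which user and working directory the Tomcat-hosted JVM runs under, since a ProcessBuilder child inherits both by default:

public class WhoAmI {
    public static void main(String[] args) {
        // Hypothetical diagnostic: a child process spawned by ProcessBuilder
        // inherits this JVM's user and, unless overridden, its working directory.
        System.out.println("user.name = " + System.getProperty("user.name"));
        System.out.println("user.dir  = " + System.getProperty("user.dir"));
    }
}

When Tomcat is started as a service, user.dir is often / and user.name is the Tomcat service account, which would fit a Mkdirs failure at the filesystem root.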

 
 

Re: run nutch from tomcat with ProcessBuilder

DB Design
Hi,
Markus, thanks for your attention.
The problem is finally solved!
When we run the crawl script with ProcessBuilder, the current directory is / (the filesystem root), and Nutch tries to create its temporary files (such as the generate temp files) there, so without a sudoer user we get a permission error.
The solution is to set the ProcessBuilder's working directory to somewhere the user has write permission.
We set the working directory to the Nutch bin folder and now everything is OK!
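For anyone hitting the same issue, here is a minimal sketch of the fix (our real paths differ; any directory the Tomcat user can write to works):

import java.io.File;
import java.io.IOException;

public class NutchLauncherFixed {
    public static void main(String[] args) throws IOException, InterruptedException {
        // Hypothetical crawl-script path and arguments, as in the earlier sketch.
        ProcessBuilder pb = new ProcessBuilder("/opt/nutch/bin/crawl", "urls", "crawl", "2");
        // The fix: give the child a writable working directory so the Hadoop
        // LocalJobRunner can create its generate-temp-* files there instead of in /.
        pb.directory(new File("/opt/nutch/bin"));
        pb.inheritIO(); // forward the child's output to this JVM's stdout/stderr
        Process p = pb.start();
        System.out.println("crawl exited with " + p.waitFor());
    }
}

If I read Generator.java correctly, the generate-temp-* path comes from mapred.temp.dir, which defaults to "." (the current directory); that is why the child's working directory matters here.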

