[jira] Created: (NUTCH-593) Nutch crawl problem

JIRA jira@apache.org
Nutch crawl problem
-------------------

                 Key: NUTCH-593
                 URL: https://issues.apache.org/jira/browse/NUTCH-593
             Project: Nutch
          Issue Type: Bug
    Affects Versions: 0.9.0
         Environment: java version : jdk-6u1-linux-amd64.bin, hadoop version : hadoop-0.12.0
            Reporter: sudarat
             Fix For: 0.9.0


I am using nutch-0.9 with hadoop-0.12.2, and when I run the command
"bin/nutch crawl urls -dir crawled -depth 3" I get this output:

- crawl started in: crawled
- rootUrlDir = input
- threads = 10
- depth = 3
- Injector: starting
- Injector: crawlDb: crawled/crawldb
- Injector: urlDir: input
- Injector: Converting injected urls to crawl db entries.
- Total input paths to process : 1
- Running job: job_0001
- map 0% reduce 0%
- map 100% reduce 0%
- map 100% reduce 100%
- Job complete: job_0001
- Counters: 6
- Map-Reduce Framework
- Map input records=3
- Map output records=1
- Map input bytes=22
- Map output bytes=52
- Reduce input records=1
- Reduce output records=1
- Injector: Merging injected urls into crawl db.
- Total input paths to process : 2
- Running job: job_0002
- map 0% reduce 0%
- map 100% reduce 0%
- map 100% reduce 58%
- map 100% reduce 100%
- Job complete: job_0002
- Counters: 6
- Map-Reduce Framework
- Map input records=3
- Map output records=1
- Map input bytes=60
- Map output bytes=52
- Reduce input records=1
- Reduce output records=1
- Injector: done
- Generator: Selecting best-scoring urls due for fetch.
- Generator: starting
- Generator: segment: crawled/segments/25501213164325
- Generator: filtering: false
- Generator: topN: 2147483647
- Total input paths to process : 2
- Running job: job_0003
- map 0% reduce 0%
- map 100% reduce 0%
- map 100% reduce 100%
- Job complete: job_0003
- Counters: 6
- Map-Reduce Framework
- Map input records=3
- Map output records=1
- Map input bytes=59
- Map output bytes=77
- Reduce input records=1
- Reduce output records=1
- Generator: 0 records selected for fetching, exiting ...
- Stopping at depth=0 - no more URLs to fetch.
- No URLs to fetch - check your seed list and URL filters.
- crawl finished: crawled
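
The "No URLs to fetch" line at the end points at the seed list and URL filters rather than at the crawl code itself: the inject job above mapped 3 input records to only 1 output record, and the generator then selected 0 records for fetching. As a minimal sketch of the two inputs the Nutch 0.9 crawl command reads (the domain below is a placeholder, and the seed directory must match the one passed on the command line, which this log shows as "input"):

input/seed.txt -- one seed URL per line:

    http://lucene.apache.org/nutch/

conf/crawl-urlfilter.txt -- keep the default skip rules above this line, and accept only the seed domain:

    +^http://([a-z0-9]*\.)*apache.org/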

But sometimes, when I crawl certain URLs, I get an error at indexing time:

- Indexer: done
- Dedup: starting
- Dedup: adding indexes in: crawled/indexes
- Total input paths to process : 2
- Running job: job_0025
- map 0% reduce 0%
- Task Id : task_0025_m_000001_0, Status : FAILED
task_0025_m_000001_0: - Error running child
task_0025_m_000001_0: java.lang.ArrayIndexOutOfBoundsException: -1
task_0025_m_000001_0: at org.apache.lucene.index.MultiReader.isDeleted(MultiReader.java:113)
task_0025_m_000001_0: at org.apache.nutch.indexer.DeleteDuplicates$InputFormat$DDRecordReader.next(DeleteDuplicates.java:176)
task_0025_m_000001_0: at org.apache.hadoop.mapred.MapTask$1.next(MapTask.java:157)
task_0025_m_000001_0: at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:46)
task_0025_m_000001_0: at org.apache.hadoop.mapred.MapTask.run(MapTask.java:175)
task_0025_m_000001_0: at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:1445)
- Task Id : task_0025_m_000000_0, Status : FAILED
task_0025_m_000000_0: - Error running child
task_0025_m_000000_0: java.lang.ArrayIndexOutOfBoundsException: -1
task_0025_m_000000_0: at org.apache.lucene.index.MultiReader.isDeleted(MultiReader.java:113)
task_0025_m_000000_0: at org.apache.nutch.indexer.DeleteDuplicates$InputFormat$DDRecordReader.next(DeleteDuplicates.java:176)
task_0025_m_000000_0: at org.apache.hadoop.mapred.MapTask$1.next(MapTask.java:157)
task_0025_m_000000_0: at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:46)
task_0025_m_000000_0: at org.apache.hadoop.mapred.MapTask.run(MapTask.java:175)
task_0025_m_000000_0: at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:1445)
- Task Id : task_0025_m_000000_1, Status : FAILED
task_0025_m_000000_1: - Error running child
task_0025_m_000000_1: java.lang.ArrayIndexOutOfBoundsException: -1
task_0025_m_000000_1: at org.apache.lucene.index.MultiReader.isDeleted(MultiReader.java:113)
task_0025_m_000000_1: at org.apache.nutch.indexer.DeleteDuplicates$InputFormat$DDRecordReader.next(DeleteDuplicates.java:176)
task_0025_m_000000_1: at org.apache.hadoop.mapred.MapTask$1.next(MapTask.java:157)
task_0025_m_000000_1: at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:46)
task_0025_m_000000_1: at org.apache.hadoop.mapred.MapTask.run(MapTask.java:175)
task_0025_m_000000_1: at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:1445)
- Task Id : task_0025_m_000001_1, Status : FAILED
task_0025_m_000001_1: - Error running child
task_0025_m_000001_1: java.lang.ArrayIndexOutOfBoundsException: -1
task_0025_m_000001_1: at org.apache.lucene.index.MultiReader.isDeleted(MultiReader.java:113)
task_0025_m_000001_1: at org.apache.nutch.indexer.DeleteDuplicates$InputFormat$DDRecordReader.next(DeleteDuplicates.java:176)
task_0025_m_000001_1: at org.apache.hadoop.mapred.MapTask$1.next(MapTask.java:157)
task_0025_m_000001_1: at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:46)
task_0025_m_000001_1: at org.apache.hadoop.mapred.MapTask.run(MapTask.java:175)
task_0025_m_000001_1: at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:1445)
- Task Id : task_0025_m_000001_2, Status : FAILED
task_0025_m_000001_2: - Error running child
task_0025_m_000001_2: java.lang.ArrayIndexOutOfBoundsException: -1
task_0025_m_000001_2: at org.apache.lucene.index.MultiReader.isDeleted(MultiReader.java:113)
task_0025_m_000001_2: at org.apache.nutch.indexer.DeleteDuplicates$InputFormat$DDRecordReader.next(DeleteDuplicates.java:176)
task_0025_m_000001_2: at org.apache.hadoop.mapred.MapTask$1.next(MapTask.java:157)
task_0025_m_000001_2: at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:46)
task_0025_m_000001_2: at org.apache.hadoop.mapred.MapTask.run(MapTask.java:175)
task_0025_m_000001_2: at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:1445)
- Task Id : task_0025_m_000000_2, Status : FAILED
task_0025_m_000000_2: - Error running child
task_0025_m_000000_2: java.lang.ArrayIndexOutOfBoundsException: -1
task_0025_m_000000_2: at org.apache.lucene.index.MultiReader.isDeleted(MultiReader.java:113)
task_0025_m_000000_2: at org.apache.nutch.indexer.DeleteDuplicates$InputFormat$DDRecordReader.next(DeleteDuplicates.java:176)
task_0025_m_000000_2: at org.apache.hadoop.mapred.MapTask$1.next(MapTask.java:157)
task_0025_m_000000_2: at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:46)
task_0025_m_000000_2: at org.apache.hadoop.mapred.MapTask.run(MapTask.java:175)
task_0025_m_000000_2: at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:1445)
- map 100% reduce 100%
- Task Id : task_0025_m_000001_3, Status : FAILED
task_0025_m_000001_3: - Error running child
task_0025_m_000001_3: java.lang.ArrayIndexOutOfBoundsException: -1
task_0025_m_000001_3: at org.apache.lucene.index.MultiReader.isDeleted(MultiReader.java:113)
task_0025_m_000001_3: at org.apache.nutch.indexer.DeleteDuplicates$InputFormat$DDRecordReader.next(DeleteDuplicates.java:176)
task_0025_m_000001_3: at org.apache.hadoop.mapred.MapTask$1.next(MapTask.java:157)
task_0025_m_000001_3: at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:46)
task_0025_m_000001_3: at org.apache.hadoop.mapred.MapTask.run(MapTask.java:175)
task_0025_m_000001_3: at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:1445)
- Task Id : task_0025_m_000000_3, Status : FAILED
task_0025_m_000000_3: - Error running child
task_0025_m_000000_3: java.lang.ArrayIndexOutOfBoundsException: -1
task_0025_m_000000_3: at org.apache.lucene.index.MultiReader.isDeleted(MultiReader.java:113)
task_0025_m_000000_3: at org.apache.nutch.indexer.DeleteDuplicates$InputFormat$DDRecordReader.next(DeleteDuplicates.java:176)
task_0025_m_000000_3: at org.apache.hadoop.mapred.MapTask$1.next(MapTask.java:157)
task_0025_m_000000_3: at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:46)
task_0025_m_000000_3: at org.apache.hadoop.mapred.MapTask.run(MapTask.java:175)
task_0025_m_000000_3: at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:1445)
Exception in thread "main" java.io.IOException: Job failed!
at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:604)
at org.apache.nutch.indexer.DeleteDuplicates.dedup(DeleteDuplicates.java:439)
at org.apache.nutch.crawl.Crawl.main(Crawl.java:135)

How do I solve this?


--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Closed: (NUTCH-593) Nutch crawl problem

JIRA jira@apache.org

     [ https://issues.apache.org/jira/browse/NUTCH-593?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andrzej Bialecki  closed NUTCH-593.
-----------------------------------

       Resolution: Fixed
    Fix Version/s:     (was: 0.9.0)
                   1.0.0




[jira] Commented: (NUTCH-593) Nutch crawl problem

JIRA jira@apache.org

    [ https://issues.apache.org/jira/browse/NUTCH-593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12566187#action_12566187 ]

Andrzej Bialecki  commented on NUTCH-593:
-----------------------------------------

This is fixed in the current code. This bug was caused by not detecting empty indexes.
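
For anyone stuck on 0.9 in the meantime: the stack trace shows the exception arising while DeleteDuplicates reads through a Lucene MultiReader built over the part indexes under crawled/indexes, and per the comment above the trigger is a part index that contains no documents. A minimal sketch of the kind of guard the fix implies, with hypothetical names (this is not the actual committed patch), in Java against the Lucene 2.x-era API:

import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.MultiReader;

// Hypothetical sketch: wrap only the non-empty part indexes in a
// MultiReader, so per-document calls such as isDeleted() are never
// issued against an index holding zero documents.
public class NonEmptyMultiReader {

  public static IndexReader open(String[] indexDirs) throws IOException {
    List<IndexReader> readers = new ArrayList<IndexReader>();
    for (String dir : indexDirs) {
      IndexReader reader = IndexReader.open(dir);
      if (reader.maxDoc() > 0) {
        readers.add(reader);   // index contains documents: keep it
      } else {
        reader.close();        // empty index: skip it instead of wrapping it
      }
    }
    return new MultiReader(readers.toArray(new IndexReader[readers.size()]));
  }
}

With a guard like this, a part index produced by a reducer that indexed nothing is simply ignored, instead of failing every map attempt of the dedup job.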

