StringIndexOutOfBoundsException


StringIndexOutOfBoundsException

bupo.Jung
Hi,
I use "org.apache.nutch.searcher.NutchBean" to search for some Chinese words.
It returns the right results most of the time, but sometimes it returns only the
total hit count and no summaries.
I found a StringIndexOutOfBoundsException in hadoop.log, as follows:

2010-12-13 19:43:54,277 ERROR searcher.NutchBean - Exception occured while executing search: java.lang.RuntimeException: java.util.concurrent.ExecutionException: java.lang.StringIndexOutOfBoundsException: String index out of range: -2
java.lang.RuntimeException: java.util.concurrent.ExecutionException: java.lang.StringIndexOutOfBoundsException: String index out of range: -2
    at org.apache.nutch.searcher.FetchedSegments.getSummary(FetchedSegments.java:297)
    at org.apache.nutch.searcher.NutchBean.getSummary(NutchBean.java:350)
    at org.apache.nutch.searcher.NutchBean.main(NutchBean.java:410)
Caused by: java.util.concurrent.ExecutionException: java.lang.StringIndexOutOfBoundsException: String index out of range: -2
    at java.util.concurrent.FutureTask$Sync.innerGet(FutureTask.java:222)
    at java.util.concurrent.FutureTask.get(FutureTask.java:83)
    at org.apache.nutch.searcher.FetchedSegments.getSummary(FetchedSegments.java:292)
    ... 2 more
Caused by: java.lang.StringIndexOutOfBoundsException: String index out of range: -2
    at java.lang.String.substring(String.java:1937)
    at org.apache.nutch.summary.basic.BasicSummarizer.getSummary(BasicSummarizer.java:188)
    at org.apache.nutch.searcher.FetchedSegments.getSummary(FetchedSegments.java:263)
    at org.apache.nutch.searcher.FetchedSegments$SummaryTask.call(FetchedSegments.java:63)
    at org.apache.nutch.searcher.FetchedSegments$SummaryTask.call(FetchedSegments.java:53)
    at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
    at java.util.concurrent.FutureTask.run(FutureTask.java:138)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
    at java.lang.Thread.run(Thread.java:662)

Does anyone have a clue what causes this error?

from bupo.jung

Re: StringIndexOutOfBoundsException

bupo.Jung
I have found the problem.
Nutch does not support Chinese out of the box. In Chinese text, two tokens may
overlap.
For example, "可爱的小女生" may be parsed into "可爱", "小女", and "女生",
so the two tokens "小女" and "女生" overlap (they share the character "女"). This
overlap causes the error at
org.apache.nutch.summary.basic.BasicSummarizer.getSummary(BasicSummarizer.java:188).
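
To illustrate the failure mode, here is a minimal sketch of how overlapping token offsets can make String.substring throw. This is not the actual BasicSummarizer code; the class OverlapDemo, the Span type, and both methods are hypothetical names made up for this example. It assumes the summarizer copies the text between consecutive tokens and expects each token to start at or after the end of the previous one.

```java
import java.util.ArrayList;
import java.util.List;

public class OverlapDemo {
    /** Hypothetical token span: [start, end) character offsets into the text. */
    static final class Span {
        final int start;
        final int end;
        Span(int start, int end) { this.start = start; this.end = end; }
    }

    /** Naive version: assumes every token starts at or after the previous token's end. */
    static List<String> gapsNaive(String text, List<Span> tokens) {
        List<String> gaps = new ArrayList<String>();
        int prevEnd = 0;
        for (Span t : tokens) {
            // Throws StringIndexOutOfBoundsException when t.start < prevEnd.
            gaps.add(text.substring(prevEnd, t.start));
            prevEnd = t.end;
        }
        return gaps;
    }

    /** Guarded version: an overlapping token yields an empty gap instead of an exception. */
    static List<String> gapsGuarded(String text, List<Span> tokens) {
        List<String> gaps = new ArrayList<String>();
        int prevEnd = 0;
        for (Span t : tokens) {
            int from = Math.min(prevEnd, t.start); // clamp so that from <= t.start
            gaps.add(text.substring(from, t.start));
            prevEnd = Math.max(prevEnd, t.end);
        }
        return gaps;
    }

    public static void main(String[] args) {
        String text = "可爱的小女生";
        List<Span> tokens = new ArrayList<Span>();
        tokens.add(new Span(0, 2)); // 可爱
        tokens.add(new Span(3, 5)); // 小女
        tokens.add(new Span(4, 6)); // 女生, overlaps the previous token by one character
        System.out.println(gapsGuarded(text, tokens));
        try {
            gapsNaive(text, tokens);
        } catch (StringIndexOutOfBoundsException e) {
            System.out.println("naive version failed: " + e.getMessage());
        }
    }
}
```

Clamping the offsets is only one possible workaround. Another would be to use an analyzer that does not emit overlapping tokens for the summarized text.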
