Re: Unable to index on Hadoop 3.2.0 with 1.16

JoeGilvary
Hi,

I wasn't on the list when this discussion happened, so I hope this will thread correctly in archives. I linked to the archive below and tried to include enough here to ensure searchers can find it if this won't thread.

I was getting an error with Nutch 1.17.  I never used 1.16, but upgraded from 1.15 recently.

java.lang.Exception: java.lang.IllegalStateException: text width is less than 1, was <-26>
        at org.apache.hadoop.mapred.LocalJobRunner$Job.runTasks(LocalJobRunner.java:492)
        at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:559)
Caused by: java.lang.IllegalStateException: text width is less than 1, was <-26>
        at org.apache.commons.lang3.Validate.validState(Validate.java:829)
        at de.vandermeer.skb.interfaces.transformers.textformat.Text_To_FormattedText.transform(Text_To_FormattedText.java:215)
        at de.vandermeer.asciitable.AT_Renderer.renderAsCollection(AT_Renderer.java:250)
        at de.vandermeer.asciitable.AT_Renderer.render(AT_Renderer.java:128)
        at de.vandermeer.asciitable.AsciiTable.render(AsciiTable.java:191)
        at org.apache.nutch.indexer.IndexWriters.describe(IndexWriters.java:326)
        at org.apache.nutch.indexer.IndexerOutputFormat.getRecordWriter(IndexerOutputFormat.java:45)
        at org.apache.hadoop.mapred.ReduceTask$NewTrackingRecordWriter.<init>(ReduceTask.java:542)
        at org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:615)
        at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:390)
        at org.apache.hadoop.mapred.LocalJobRunner$Job$ReduceTaskRunnable.run(LocalJobRunner.java:347)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)

This looks like the error that Markus Jelsma described in the earlier discussion, though the invalid text width in my case was -26. I eliminated it by updating the solr_indexer_1 entry in index-writers.xml to use only a single URL. I don't know where the -26 comes from, or the -41 Markus was getting, but the fact that the values differed told me the issue would lie in the site-specific differences between our configs.

Adding the link to the archive where I found the earlier discussion:
http://mail-archives.apache.org/mod_mbox/nutch-user/201910.mbox/%3C05eda22b-14b2-309f-3bc7-d6d85c218235@...%3E

Adding the only potentially relevant Jira link I found while searching:
https://issues.apache.org/jira/browse/NUTCH-2602

It seems potentially relevant because Markus started getting the error after migrating to 1.16 and I started getting it when I went from 1.15 to 1.17.

Thanks. Stay safe, stay healthy,

Joe

Sebastian Nagel-2
Hi Joe,

> I eliminated it when I updated the index-writers.xml for the solr_indexer_1
> to use only a single URL.

Thanks for the hint. I'm able to reproduce the error by adding an overlong URL to
  <param name="url" value="..."/>
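
Until this is fixed, a pre-flight check on index-writers.xml may help spot the values that trigger it. The following is only a minimal sketch, not part of Nutch: the class name, the method name, and the 60-character threshold are hypothetical, and it assumes that several URLs are given as one comma-separated value in the "url" param. It flags every "url" param whose value contains a comma or exceeds the threshold.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical helper, not part of Nutch.
final class IndexWriterUrlCheck {

  // Assumed threshold: the thread does not say how long "overlong" is.
  static final int MAX_URL_VALUE_LENGTH = 60;

  /**
   * Returns the value of every <param name="url" .../> that holds more than
   * one URL (assumed to be comma-separated) or is longer than maxLength.
   */
  static List<String> findSuspectUrlValues(String xml, int maxLength) {
    try {
      DocumentBuilder builder =
          DocumentBuilderFactory.newInstance().newDocumentBuilder();
      Document doc = builder.parse(
          new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
      NodeList params = doc.getElementsByTagName("param");
      List<String> suspects = new ArrayList<>();
      for (int i = 0; i < params.getLength(); i++) {
        Element param = (Element) params.item(i);
        if (!"url".equals(param.getAttribute("name"))) {
          continue; // only the "url" param is of interest here
        }
        String value = param.getAttribute("value");
        if (value.contains(",") || value.length() > maxLength) {
          suspects.add(value);
        }
      }
      return suspects;
    } catch (Exception e) {
      // Unparseable input is reported as a bad argument.
      throw new IllegalArgumentException("Could not parse XML", e);
    }
  }

  public static void main(String[] args) throws Exception {
    if (args.length != 1) {
      System.err.println("Usage: java IndexWriterUrlCheck.java <index-writers.xml>");
      return;
    }
    String xml = new String(
        Files.readAllBytes(Paths.get(args[0])), StandardCharsets.UTF_8);
    for (String value : findSuspectUrlValues(xml, MAX_URL_VALUE_LENGTH)) {
      System.out.println("Suspect url value: " + value);
    }
  }
}
```

Run it against conf/index-writers.xml before starting the indexing job; an empty result only means nothing matched the assumed threshold, not that the table rendering is guaranteed to succeed.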


Could you open an issue to fix this on
    https://issues.apache.org/jira/projects/NUTCH ?

Thanks!

Best,
Sebastian




Adil Alpkocak-2
Hi,
Can someone tell me how I can unsubscribe from this list?
Thanks,


*Doç.Dr. Adil Alpkoçak*

Dokuz Eylul University
Dept of Computer Engineering, 219
Tinaztepe, 35160 Izmir, Turkey

*Phone*: +90-232-301 7408
*Mobile*: +90-532-548 0632
*Skype*: adil.alpkocak

https://scholar.google.com/citations?user=DV5RqU8AAAAJ
https://orcid.org/0000-0001-7695-196X
http://www.researcherid.com/rid/F-3388-2013









Sebastian Nagel-2
Hi Adil,

please follow the instructions given on
  https://nutch.apache.org/mailing_lists.html

Best,
Sebastian
