Re: Apache Nutch help request for a school project :)

lewis john mcgibbney-2
I’ll have a look today. You can always use the mailing list as well. Feel
free to post your questions there and we will help you out :)

On Sun, Jun 6, 2021 at 12:43 gokmen.yontem <[hidden email]>
wrote:

> Hi Lewis,
> Sorry to bother you. I've been trying to configure Apache Nutch for
> almost 10 days now and I'm about to give up. I saw that you are
> contributing to this project and I thought maybe you can help me.
> This is how desperate I am :)
>
> Here's my repo if you have time:
> https://github.com/gorkemyontem/nutch/blob/main/docker-compose.yml
> I'm trying to use docker images so there isn't much in the repo.
>
> This is my current error:
>
> nutch    | Indexer: java.lang.RuntimeException: Indexing job did not
> succeed, job status:FAILED, reason: NA
> nutch    |      at
> org.apache.nutch.indexer.IndexingJob.index(IndexingJob.java:150)
> nutch    |      at
> org.apache.nutch.indexer.IndexingJob.run(IndexingJob.java:291)
> nutch    |      at
> org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76)
> nutch    |      at
> org.apache.nutch.indexer.IndexingJob.main(IndexingJob.java:300)
>
>
> People say that schema.xml could be wrong, but I'm using the most up to
> date one from here
>
> https://github.com/apache/nutch/blob/master/src/plugin/indexer-solr/schema.xml
>
>
> Many many thanks!
> Best wishes,
> Gorkem
>
--
http://home.apache.org/~lewismc/
http://people.apache.org/keys/committer/lewismc

Re: Apache Nutch help request for a school project :)

Sebastian Nagel-2
Hi Gorkem,

I haven't verified this by trying, but given your configuration the Solr instance
probably isn't reachable from the Nutch container via
   http://localhost:8983/solr/nutch
Inside the Docker Compose network, containers reach each other by service name
(here the same as the container name), so
   http://solr:8983/solr/nutch
might work. Cf. the docker-compose networking documentation:
   https://docs.docker.com/compose/networking/

In your docker-compose.yml there is:

services:
  solr:
    container_name: solr
    image: 'solr:8.5.2'
    ports:
      - '8983:8983'
    ...
  nutch:
    container_name: nutch
    ...
    command: '/root/nutch/bin/crawl -i -D solr.server.url=http://localhost:8983/solr/nutch -s urls crawl 1'

Please try fixing the host name in the Solr URL.
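
A minimal sketch of the compose file with the Solr host switched from localhost to the service name. It is untested and shows only the keys quoted above; all other keys of both services are left out and stay as they are in your file:

```yaml
services:
  solr:
    container_name: solr
    image: 'solr:8.5.2'
    ports:
      # Publishes Solr to the host machine only; other containers do not need this mapping.
      - '8983:8983'
  nutch:
    container_name: nutch
    # Inside this container "localhost" is the nutch container itself,
    # so Solr is addressed by its service name on the Compose network.
    command: '/root/nutch/bin/crawl -i -D solr.server.url=http://solr:8983/solr/nutch -s urls crawl 1'
```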

Important: you need to configure the Solr URL in the file conf/index-writers.xml unless you're using
Nutch 1.14 or below. See
   https://cwiki.apache.org/confluence/display/NUTCH/NutchTutorial#NutchTutorial-SetupSolrforsearch
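
One possible way to get an edited index-writers.xml into the container is a bind mount. This is only a sketch: the host path ./conf/index-writers.xml is hypothetical, and the container path assumes that Nutch is installed in /root/nutch, as the crawl command suggests. In the mounted file, the "url" parameter of the Solr writer would then point to http://solr:8983/solr/nutch.

```yaml
services:
  nutch:
    container_name: nutch
    volumes:
      # Hypothetical host path on the left; the container path on the right
      # assumes the Nutch install directory is /root/nutch.
      - './conf/index-writers.xml:/root/nutch/conf/index-writers.xml'
```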

In any case it's important to be able to read the logs (stdout/stderr and the hadoop.log)! I know this
isn't trivial when using docker-compose, but it will save you a lot of time when searching for errors.
If you need help here, please let us know. It's best to start a separate thread on the Nutch user mailing list.
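
For stdout/stderr, "docker-compose logs nutch" prints what the container wrote. To make the hadoop.log readable from the host, a bind mount of the log folder may help. The sketch below assumes that Nutch writes hadoop.log to /root/nutch/logs (the logs/ folder below the install directory); the host folder ./logs is a hypothetical name:

```yaml
services:
  nutch:
    container_name: nutch
    volumes:
      # Assumption: hadoop.log is written to /root/nutch/logs inside the container.
      # After a crawl it would then show up as ./logs/hadoop.log on the host.
      - './logs:/root/nutch/logs'
```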

Best,
Sebastian


Re: Apache Nutch help request for a school project :)

lewis john mcgibbney-2
Yep, Sebastian is absolutely correct. I sent you a pull request.
https://github.com/gorkemyontem/nutch/pull/1
HTH
lewismc


Re: Apache Nutch help request for a school project :)

lewis john mcgibbney-2
:)

On Thu, Jun 10, 2021 at 7:18 AM gokmen.yontem <[hidden email]>
wrote:

> Lewis, Sebastian
> I can’t thank you enough! Your help is much appreciated.
>
> Next time I'll follow your advice and use the mailing list, which I
> wasn't aware of.
>
> Best wishes,
> Gorkem