bin/crawl not working

bin/crawl not working

Puneet Dhanda
Hi,

I am using Nutch-1.15. The following command does not execute; it
keeps printing its usage message.

bin/crawl -i -D solr.server.url=http://localhost:8983/solr/nutch urls/ TestCrawl/ 2

Usage: crawl [options] <crawl_dir> <num_rounds>


Please assist.

RE: bin/crawl not working

Sadiki Latty
Hi Puneet,

To my recollection bin/crawl takes 3 arguments

Usage: crawl [-i|--index] [-D "key=value"] <Seed Dir> <Crawl Dir> <Num Rounds>

In addition, as of Nutch 1.14 the crawl script expects the path to the seed directory to be preceded by -s, so your example would look like this:

bin/crawl -i -D solr.server.url=http://localhost:8983/solr/nutch -s urls/ TestCrawl/ 2

Where "urls" is the path to your seed urls
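
As a rough sketch of the argument order described here (options such as -i and -s first, then the crawl directory and the number of rounds), the command could be assembled like this. The variable names are only illustrative, the paths are the ones from the question, and the optional -D setting is left out to keep the example short.

```shell
# Illustrative values taken from the question; adjust to your own layout.
SEED_DIR="urls/"        # directory holding the seed URL list
CRAWL_DIR="TestCrawl/"  # crawl directory given as the first positional argument
NUM_ROUNDS=2            # number of crawl rounds

# Options (-i, -s <seed dir>) come first, then the two positional arguments.
CRAWL_CMD="bin/crawl -i -s $SEED_DIR $CRAWL_DIR $NUM_ROUNDS"

# Print the command for review before running it from the Nutch directory.
echo "$CRAWL_CMD"
```

This matches the usage line printed by the script in the question: `crawl [options] <crawl_dir> <num_rounds>`.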

Reference: https://wiki.apache.org/nutch/bin/crawl

Hope this helps

Re: bin/crawl not working

Sebastian Nagel-2
Hi,

Please also note that the way the index writer plugins are configured has changed with 1.15;
see the release notes and https://wiki.apache.org/nutch/bin/nutch%20index.

The Solr URL can no longer be passed via -Dsolr.server.url=...
I'll update the bin/crawl wiki page.
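
A minimal sketch for checking where the Solr URL is now configured. It assumes, and this is not stated in the thread, that 1.15 keeps the index writer settings in conf/index-writers.xml and that the Solr writer's URL is a `<param name="url" .../>` entry there. Please verify against the release notes. The helper name `show_writer_urls` is made up for this example.

```shell
# Assumption: index writers are configured in conf/index-writers.xml and the
# Solr URL is a <param name="url" .../> entry; verify in the release notes.
INDEX_WRITERS_FILE="${INDEX_WRITERS_FILE:-conf/index-writers.xml}"

# Hypothetical helper: print each url parameter with its line number
# so it can be reviewed and edited by hand.
show_writer_urls() {
  grep -n 'name="url"' "$1"
}

if [ -f "$INDEX_WRITERS_FILE" ]; then
  show_writer_urls "$INDEX_WRITERS_FILE"
else
  echo "not found: $INDEX_WRITERS_FILE" >&2
fi
```

Run it from the Nutch directory, or set INDEX_WRITERS_FILE to point at the configuration file of your installation.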

Thanks,
Sebastian
