Need help installing scoring-depth plugin

Need help installing scoring-depth plugin

Chip Calhoun
I'm upgrading from Nutch 1.4 to Nutch 1.12. I limit this crawl to my seeds, so my 1.4 command was:
bin/nutch crawl phfaws -dir crawl -depth 1 -topN 50000

My understanding is that the "crawl" command is deprecated, "-depth" went with it, and I need to install the scoring-depth plugin. I'm new to adding plugins. The instructions at https://wiki.apache.org/nutch/AboutPlugins give a sample command, but I don't know what the official PluginRepository for this plugin is and the sample link for the HtmlParser plugin is dead.

I'll appreciate any help. Thank you!

Chip Calhoun
Digital Archivist
Niels Bohr Library & Archives
American Institute of Physics
One Physics Ellipse
College Park, MD  20740
301-209-3180
https://www.aip.org/history-programs/niels-bohr-library

Re: Need help installing scoring-depth plugin

Julien Nioche-4
You don't need to install scoring-depth. It's just that the term 'depth' in
the old crawl class has been replaced by 'rounds', which is more accurate.

The equivalent of the command you used to run should be the following, where the final 1 is the number of rounds:
bin/crawl phfaws crawl 1
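For reference, each round of bin/crawl is, roughly, one generate/fetch/parse/updatedb cycle over the crawl database (the script also runs further steps such as link inversion and deduplication, omitted here). Below is a minimal sketch of a seeds-only, single-round crawl run by hand with the standard Nutch 1.x subcommands, which lets -topN be passed to generate directly as in 1.4. It assumes a local Nutch 1.x runtime; the function name crawl_one_round and the NUTCH override variable are hypothetical, not part of Nutch.

```shell
# Hypothetical helper: inject the seeds, then run one generate/fetch/parse/
# updatedb round with an explicit -topN. Assumes a Nutch 1.x local runtime;
# set NUTCH to override the default bin/nutch path.
crawl_one_round() {
  local seeds="$1" crawl="$2" topn="$3"
  local nutch="${NUTCH:-bin/nutch}"
  local segment

  # Add the seed URLs to the crawl database.
  "$nutch" inject "$crawl/crawldb" "$seeds" || return 1

  # Put at most $topn URLs that are due for fetching into a new segment.
  "$nutch" generate "$crawl/crawldb" "$crawl/segments" -topN "$topn" || return 1

  # Segment directories are named by timestamp, so the newest sorts last.
  segment=$(ls -d "$crawl"/segments/* 2>/dev/null | tail -n 1)
  [ -n "$segment" ] && [ -d "$segment" ] || return 1

  # Download the pages, then parse them.
  "$nutch" fetch "$segment" || return 1
  "$nutch" parse "$segment" || return 1

  # Write the fetch status (and any newly found links) back into the crawl database.
  "$nutch" updatedb "$crawl/crawldb" "$segment" || return 1
}
```

Run from the Nutch runtime directory, `crawl_one_round phfaws crawl 50000` would approximate the old `-depth 1 -topN 50000` crawl, minus the extra steps bin/crawl adds.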

The value for topN now needs to be set in the crawl script itself; see sizeFetchlist in https://github.com/apache/nutch/blob/master/src/bin/crawl#L117
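Since sizeFetchlist is a plain shell variable in bin/crawl, one way to get the effect of -topN 50000 is to keep a copy of the script with that assignment rewritten. This is a minimal sketch, assuming the assignment begins its own line as `sizeFetchlist=...`; the helper name make_crawl_with_topn and the copy name crawl-topn are hypothetical.

```shell
# Hypothetical helper: write a copy of the Nutch crawl script whose
# per-round fetch list size (the old -topN) is fixed to the given number.
make_crawl_with_topn() {
  local src="$1" dst="$2" topn="$3"

  # Accept digits only, since the value is pasted into the script.
  case "$topn" in
    ''|*[!0-9]*) echo "topN must be a whole number: $topn" >&2; return 1 ;;
  esac

  # Replace the value of the "sizeFetchlist=" assignment, keeping any
  # leading indentation; every other line is copied unchanged.
  sed "s/^\([[:space:]]*\)sizeFetchlist=.*/\1sizeFetchlist=${topn}/" "$src" > "$dst" || return 1
  chmod +x "$dst"
}
```

For example, `make_crawl_with_topn bin/crawl bin/crawl-topn 50000` followed by `bin/crawl-topn phfaws crawl 1`. Editing bin/crawl in place works just as well; a separate copy simply keeps the stock script untouched across upgrades. Keeping the copy in the same bin/ directory is safest, as the script appears to call bin/nutch relative to its own location.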

HTH

Julien

On 31 January 2017 at 16:49, Chip Calhoun <[hidden email]> wrote:

> I'm upgrading from Nutch 1.4 to Nutch 1.12. I limit this crawl to my
> seeds, so my 1.4 command was:
> bin/nutch crawl phfaws -dir crawl -depth 1 -topN 50000
>
> My understanding is that the "crawl" command is deprecated, "-depth" went
> with it, and I need to install the scoring-depth plugin. I'm new to adding
> plugins. The instructions at https://wiki.apache.org/nutch/AboutPlugins
> give a sample command, but I don't know what the official PluginRepository
> for this plugin is and the sample link for the HtmlParser plugin is dead.
>
> I'll appreciate any help. Thank you!
>
> Chip Calhoun
> Digital Archivist
> Niels Bohr Library & Archives
> American Institute of Physics
> One Physics Ellipse
> College Park, MD  20740
> 301-209-3180
> https://www.aip.org/history-programs/niels-bohr-library
>
>


--

*Open Source Solutions for Text Engineering*

http://www.digitalpebble.com
http://digitalpebble.blogspot.com/
#digitalpebble <http://twitter.com/digitalpebble>

RE: Need help installing scoring-depth plugin

Chip Calhoun
Thank you Julien! That's exactly what I needed.

Chip

-----Original Message-----
From: Julien Nioche [mailto:[hidden email]]
Sent: Tuesday, January 31, 2017 1:09 PM
To: [hidden email]
Subject: Re: Need help installing scoring-depth plugin

You don't need to install scoring-depth. It's just that the term 'depth' in the old crawl class has been replaced by 'rounds', which is more accurate.

The equivalent of the command you used to call should be *bin/crawl phfaws crawl **1 *

The value for topN needs setting in the crawl scrip, see sizeFetchlist in [ https://github.com/apache/nutch/blob/master/src/bin/crawl#L117]

HTH

Julien

On 31 January 2017 at 16:49, Chip Calhoun <[hidden email]> wrote:

> I'm upgrading from Nutch 1.4 to Nutch 1.12. I limit this crawl to my
> seeds, so my 1.4 command was:
> bin/nutch crawl phfaws -dir crawl -depth 1 -topN 50000
>
> My understanding is that the "crawl" command is deprecated, "-depth"
> went with it, and I need to install the scoring-depth plugin. I'm new
> to adding plugins. The instructions at
> https://wiki.apache.org/nutch/AboutPlugins
> give a sample command, but I don't know what the official
> PluginRepository for this plugin is and the sample link for the HtmlParser plugin is dead.
>
> I'll appreciate any help. Thank you!
>
> Chip Calhoun
> Digital Archivist
> Niels Bohr Library & Archives
> American Institute of Physics
> One Physics Ellipse
> College Park, MD  20740
> 301-209-3180
> https://www.aip.org/history-programs/niels-bohr-library
>
>


--

*Open Source Solutions for Text Engineering*

http://www.digitalpebble.com
http://digitalpebble.blogspot.com/
#digitalpebble <http://twitter.com/digitalpebble>
Loading...