"WritingPluginExample-0.8" by RicardoJMendez

I'm interested in building a Nutch plugin, but I'm having trouble
getting the example "recommended" plugin to work. I followed all of
the steps in http://wiki.apache.org/nutch/WritingPluginExample-0%2e9.
After running the top-level ant, I confirmed that
build/plugins/recommended contained the plugin.xml and jar file for
the 'recommended' plugin. I then tried crawling a single page from
a local webserver that contains the test content (with the
"recommended" meta tag) from the example. The page got crawled and
indexed, and I can search for it, but I see no evidence of any
rank boosting on the "explain" search link, and
NUTCHDIR/logs/hadoop.log shows no indication that the
recommended filter got loaded by the crawl.

If anyone has suggestions I'd appreciate hearing them.

Also, a couple of things I notice that I didn't understand and/or
looked odd from the example wiki page:

1. The section on "Getting Ant to Compile Your Plugin" says to
add this line to NUTCHDIR/src/plugin/build.xml:
<ant dir="reccomended" target="deploy" />

There's an extra "c" in there (typo).  (I fixed my local copy before
I ran the crawl; telling you in case you want to update the wiki; I
don't want to edit it myself until I have actually gotten it working...)

2. In the section on "Getting Nutch to Use Your Plugin" it said to
add a regex to include the id of the plugin, using the example:

But the <description> just above this part says you need to at least
include the nutch-extensionpoints plugin (which is not present in
this line).  I notice from the wiki edit history you used to have the
nutch-extensionpoints plugin in there and removed it, so I'm not sure
which way it's supposed to be -- what's correct?

(I tried it both with and without the nutch-extensionpoints and
neither way worked for me.)

  - Mike Schwartz