protocol-interactiveselenium Custom Handler



Craig Tataryn
Hello, I would like to create my own Custom Handler for
protocol-interactiveselenium.

Reading the code [1], I see that when setting the config:

<property>
  <name>interactiveselenium.handlers</name>
  <value>NewCustomHandler,DefaultHandler</value>
  <description></description>
</property>

the "NewCustomHandler" would be loaded from the classpath as the class
org.apache.nutch.protocol.interactiveselenium.handlers.NewCustomHandler.
However, my question is: how do I get Nutch to incorporate my new .jar file
containing NewCustomHandler?

I've written protocol and indexer plugins before, but this seems a bit
different. An example of a custom handler that someone has written would be
great.

Thanks,

Craig.

[1] https://github.com/apache/nutch/blob/ea862f45b83177b41aebad9c18b900936d43a19a/src/plugin/protocol-interactiveselenium/src/java/org/apache/nutch/protocol/interactiveselenium/HttpResponse.java#L364

Re: protocol-interactiveselenium Custom Handler

Sebastian Nagel
Hi Craig,

If you're building Nutch from the git repo or from the source package,
the easiest way is to put the file NewCustomHandler.java into
  src/plugin/protocol-interactiveselenium/src/java/.../handlers/
and run
  ant runtime
to compile and package Nutch, including your custom handler.

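After `ant runtime`, the handler still has to be enabled in the crawl configuration. A minimal sketch for `conf/nutch-site.xml`, assuming the new class is `org.apache.nutch.protocol.interactiveselenium.handlers.NewCustomHandler` and implements the same handler interface as the existing classes in that directory. The `plugin.includes` value is illustrative only: swap `protocol-http` for `protocol-interactiveselenium` in your own list rather than copying this one.

```xml
<?xml version="1.0"?>
<configuration>
  <!-- Illustrative plugin list (assumption): the point is that
       protocol-interactiveselenium replaces protocol-http;
       keep the other plugins from your existing value. -->
  <property>
    <name>plugin.includes</name>
    <value>protocol-interactiveselenium|urlfilter-regex|parse-(html|tika)|index-(basic|anchor)|scoring-opic|urlnormalizer-(pass|regex|basic)</value>
  </property>

  <!-- Simple class names, looked up in the package
       org.apache.nutch.protocol.interactiveselenium.handlers -->
  <property>
    <name>interactiveselenium.handlers</name>
    <value>NewCustomHandler,DefaultHandler</value>
  </property>
</configuration>
```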
Using a jar isn't as simple, mostly because of the classpath encapsulation
of Nutch plugins.

1. add your jar as a dependency to
    src/plugin/protocol-interactiveselenium/ivy.xml

2. register the file name of the jar in
    src/plugin/protocol-interactiveselenium/plugin.xml
   as
    <library name="xyz.jar"/>

3. build Nutch, see above
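Steps 1 and 2 could look roughly like the fragments below. The Maven coordinates `com.example:my-selenium-handlers:1.0` and the jar name `my-selenium-handlers-1.0.jar` are hypothetical placeholders; the library name has to match the file Ivy actually retrieves for the plugin.

```xml
<!-- src/plugin/protocol-interactiveselenium/ivy.xml
     Add inside the existing <dependencies> element.
     Hypothetical coordinates: replace with those of your jar. -->
<dependency org="com.example" name="my-selenium-handlers" rev="1.0"/>

<!-- src/plugin/protocol-interactiveselenium/plugin.xml
     Add inside the existing <runtime> element, next to the
     library entries that are already there.
     Hypothetical file name: must match the retrieved jar. -->
<library name="my-selenium-handlers-1.0.jar"/>
```

Two consequences of the classpath encapsulation, as far as I understand Nutch's plugin loading: the plugin's class loader only sees the libraries declared in plugin.xml, so step 2 is not optional; and because handler names are resolved inside the package `org.apache.nutch.protocol.interactiveselenium.handlers`, the handler class in your jar has to be declared in that package as well.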


Of course, Ivy must be able to fetch the jar from one of
the repositories listed in
  ivy/ivysettings.xml

However, it's possible to add your local Maven repository (cache) by adding:

  <property name="maven2.pattern.local"
    value="${user.home}/.m2/repository/[organisation]/[module]/[revision]/[module]-[revision](-[classifier]).[ext]"
    override="false" />
  ...
  <resolvers>
    ...
    <filesystem name="maven2-local" m2compatible="true">
      <artifact pattern="${maven2.pattern.local}"/>
      <ivy pattern="${maven2.pattern.local}"/>
    </filesystem>
    ...
  </resolvers>

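Declaring the filesystem resolver may not be enough on its own: Ivy resolves through its default resolver, so the new resolver most likely also has to be referenced from the chain that serves as the default. A sketch, assuming that chain is named `default` in your copy of ivy/ivysettings.xml (check the actual name, and keep the chain's existing attributes and entries); placing the `<filesystem>` declaration above the chain is the safe order.

```xml
<resolvers>
  ...
  <!-- Assumed chain name; keep its existing attributes. -->
  <chain name="default">
    <!-- Look in the local Maven repository (~/.m2) first. -->
    <resolver ref="maven2-local"/>
    <!-- The existing <resolver ref="..."/> entries stay below. -->
    ...
  </chain>
</resolvers>
```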
> An example of a custom handler that someone has written would be great.

There are some handler implementations in
  src/plugin/protocol-interactiveselenium/src/java/.../handlers/
I've never used them, but at first glance they look "custom",
if only because one file name includes a typo. :)
If you have time, please open a Jira issue at
  https://issues.apache.org/jira/projects/NUTCH
to fix the naming.

Thanks,
Sebastian


(In reply to Craig Tataryn's message of 6/25/20 1:18 AM.)