[jira] [Commented] (NUTCH-2429) Fix Plugin System to allow protocol plugins to bundle their URLStreamHandlers


JIRA jira@apache.org

    [ https://issues.apache.org/jira/browse/NUTCH-2429?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16193258#comment-16193258 ]

ASF GitHub Bot commented on NUTCH-2429:
---------------------------------------

lewismc commented on a change in pull request #222: NUTCH-2429 Fix Plugin System to allow protocol plugins to bundle their URLStreamHandlers
URL: https://github.com/apache/nutch/pull/222#discussion_r143002669
 
 

 ##########
 File path: src/java/org/apache/nutch/plugin/PluginRepository.java
 ##########
 @@ -521,4 +532,105 @@ public static void main(String[] args) throws Exception {
     System.arraycopy(args, 2, subargs, 0, subargs.length);
     m.invoke(null, new Object[] { subargs });
   }
+
+  /**
+   * Registers this PluginRepository to be invoked whenever URLs have to be
+   * parsed. This allows to check the registered protocol plugins for uncommon
+   * protocols.
+   */
+  private void registerURLStreamHandlerFactory() {
+    org.apache.nutch.plugin.URLStreamHandlerFactory.getInstance().registerPluginRepository(this);
+  }
+
+  /**
+   * Invoked whenever a java.net.URL needs to be instantiated. Tries to find a
+   * suitable extension and allow it to provide a URLStreamHandler. This is done
+   * by several attempts:
+   * <ul>
+   * <li>Find a protocol plugin that implements the desired protocol. If found,
+   * instantiate it so eventually the plugin can install a URLStreamHandler
+   * through a static hook.</li>
+   * <li>If the plugin specifies a URLStreamHandler in its <tt>plugin.xml</tt>,
+   * return an instance of this URLStreamHandler. Example:
+   *
+   * <pre>
+   *  ...
+   *  &lt;implementation id="org.apache.nutch.protocol.foo.Foo" class="my.foo.Foo"&gt;
+   *      &lt;parameter name="protocolName" value="foo"/&gt;
+   *      &lt;parameter name="urlStreamHandler" value="my.foo.Handler"/&gt;
+   *  &lt;/implementation&gt;
+   *  ...
+   * </pre>
+   *
+   * </li>
+   * <li>if all else fails, return null. This will fallback to the JVM's method
+   * of evaluating the system property <tt>java.protocol.handler.pkgs</tt>.</li>
+   * </ul>
+   *
+   * @return the URLStreamHandler found, or null.
+   * @see java.net.URL
+   */
+  public URLStreamHandler createURLStreamHandler(String protocol) {
+    LOG.debug("createURLStreamHandler(" + protocol + ")");
+
+    if (fExtensionPoints != null) {
+      ExtensionPoint ep = fExtensionPoints
+          .get("org.apache.nutch.protocol.Protocol");
+      if (ep != null) {
+        Extension[] extensions = ep.getExtensions();
+        for (Extension extension : extensions) {
+          String p = extension.getAttribute("protocolName");
+          LOG.trace("Found " + p);
+          if (p.equals(protocol)) {
+            LOG.debug("suitable " + p);
+
+            // instantiate the plugin. This allows it to execute a static hook,
+            // if present
+            // TODO: only do this if not done already
+            Object extinst = null;
+            try {
+              extinst = extension.getExtensionInstance();
+              LOG.debug("instantiated " + extinst.getClass().getName());
+            } catch (Exception e) {
+              LOG.warn("Could not instantiate " + extension.getId(), e);
+            }
+
+            // return the handler here, if possible
+            String handlerClass = extension.getAttribute("urlStreamHandler");
+            LOG.debug("urlStreamHandler=" + handlerClass);
+            if (handlerClass != null) {
+              // instantiate the handler and return it
+              ClassLoader cl = this.getClass().getClassLoader(); // the nutch
+                                                                 // classloader
+              LOG.trace("Using nutch classloader " + cl);
+              if (extinst != null) {
+                cl = extinst.getClass().getClassLoader(); // the extension's
+                                                          // classloader
+                LOG.trace("Using extension classloader " + cl);
+              }
+
+              try {
+                Class clazz = cl.loadClass(handlerClass);
+                return (URLStreamHandler) clazz.newInstance();
+              } catch (Exception e) {
+                LOG.error("Could not instantiate protocol " + protocol
 
 Review comment:
   Please use parameterized logging.
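
   As a rough illustration of what this asks for: the message template stays
   constant and the protocol name is passed as a parameter instead of being
   concatenated into the string. The sketch below uses java.util.logging only
   so that it runs without extra dependencies. The class and method names are
   hypothetical and not part of Nutch. If LOG in PluginRepository is an SLF4J
   logger (an assumption here), the equivalent would be a "{}" placeholder
   with the exception passed as the last argument.

```java
import java.util.logging.Level;
import java.util.logging.LogRecord;
import java.util.logging.Logger;
import java.util.logging.SimpleFormatter;

// Hypothetical helper class for illustration; not part of Nutch.
final class ParameterizedLoggingSketch {
  private static final Logger LOG =
      Logger.getLogger(ParameterizedLoggingSketch.class.getName());

  // Builds a log record whose message is a constant template. The protocol
  // is supplied as parameter {0} rather than concatenated into the message.
  static LogRecord failureRecord(String protocol, Exception cause) {
    LogRecord record =
        new LogRecord(Level.SEVERE, "Could not instantiate protocol {0}");
    record.setParameters(new Object[] { protocol });
    record.setThrown(cause);
    return record;
  }

  // Renders the template the way a java.util.logging formatter would.
  static String render(LogRecord record) {
    return new SimpleFormatter().formatMessage(record);
  }

  public static void main(String[] args) {
    // "smb" and the exception are placeholder values for the demonstration.
    LogRecord record =
        failureRecord("smb", new ClassNotFoundException("my.foo.Handler"));
    // The template is only expanded if a handler actually publishes the record.
    LOG.log(record);
    System.out.println(render(record));
  }
}
```

   With SLF4J (assumed, see above) the call from the diff would read roughly
   LOG.error("Could not instantiate protocol {}", protocol, e); where the
   trailing exception argument is an assumption, since the original line is
   cut off in this mail.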
 
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[hidden email]


> Fix Plugin System to allow protocol plugins to bundle their URLStreamHandlers
> -----------------------------------------------------------------------------
>
>                 Key: NUTCH-2429
>                 URL: https://issues.apache.org/jira/browse/NUTCH-2429
>             Project: Nutch
>          Issue Type: Improvement
>          Components: commoncrawl
>    Affects Versions: 1.14
>         Environment: Tested on both Nutch 1.13 and 1.14 in Ubuntu Linux with OpenJDK 1.8.
>            Reporter: Hiran Chaudhuri
>             Fix For: 1.14
>
>
> While trying to use the protocol-smb plugin (which is not part of the Nutch distribution), I realized that four steps are needed to use a protocol plugin successfully:
> 1 - put the artifact into the plugins directory
> 2 - modify the Nutch configuration files to allow smb:// URLs and add the plugin to the list of loaded plugins
> 3 - extract jcifs.jar and place it on the system classpath
> 4 - run Nutch with the correct system property
> While steps 1 and 2 seem obvious, steps 3 and 4 require knowledge of plugin internals, which does not feel right for Nutch and plugin users. Moreover, jcifs.jar would then exist twice on the classpath, which could cause further problems at runtime.
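
For readers unfamiliar with the JVM mechanism behind steps 3 and 4, the following is a minimal sketch of how a java.net.URLStreamHandlerFactory can resolve an uncommon protocol such as smb without the system property. It is not the code from the pull request: HandlerFactorySketch, NoOpHandler, register and parse are hypothetical names, the registry is a plain map rather than the Nutch plugin repository, and the handler deliberately cannot open connections.

```java
import java.io.IOException;
import java.net.MalformedURLException;
import java.net.URL;
import java.net.URLConnection;
import java.net.URLStreamHandler;
import java.net.URLStreamHandlerFactory;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical factory for illustration; not the class from the pull request.
final class HandlerFactorySketch implements URLStreamHandlerFactory {
  // Stand-in for a lookup in the plugin repository (assumption).
  private final Map<String, URLStreamHandler> handlers = new ConcurrentHashMap<>();

  void register(String protocol, URLStreamHandler handler) {
    handlers.put(protocol, handler);
  }

  // Returning null lets the JVM continue with its own lookup, which
  // includes the java.protocol.handler.pkgs system property.
  @Override
  public URLStreamHandler createURLStreamHandler(String protocol) {
    return handlers.get(protocol);
  }

  // Handler that is sufficient for parsing URLs but cannot open connections.
  static final class NoOpHandler extends URLStreamHandler {
    @Override
    protected URLConnection openConnection(URL u) throws IOException {
      throw new IOException("sketch only: cannot connect to " + u);
    }
  }

  // Parses a URL with an explicit handler, without touching JVM-wide state.
  static URL parse(String spec, URLStreamHandler handler) {
    try {
      return new URL(null, spec, handler);
    } catch (MalformedURLException e) {
      throw new IllegalArgumentException(e);
    }
  }

  public static void main(String[] args) throws Exception {
    HandlerFactorySketch factory = new HandlerFactorySketch();
    factory.register("smb", new NoOpHandler());
    // The JVM accepts at most one factory; a second call throws an Error.
    URL.setURLStreamHandlerFactory(factory);
    // Without the factory, this constructor rejects the unknown protocol.
    URL url = new URL("smb://fileserver/share/report.txt");
    System.out.println(url.getProtocol() + " -> " + url.getHost());
  }
}
```

Because the factory is consulted before the JVM's default lookup, a handler supplied this way needs neither a jar on the system classpath nor a system property, which is the situation the issue aims for.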



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)