[jira] [Commented] (NUTCH-2429) Fix Plugin System to allow protocol plugins to bundle their URLStreamHandlers


JIRA jira@apache.org

    [ https://issues.apache.org/jira/browse/NUTCH-2429?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16193262#comment-16193262 ]

ASF GitHub Bot commented on NUTCH-2429:
---------------------------------------

lewismc commented on a change in pull request #222: NUTCH-2429 Fix Plugin System to allow protocol plugins to bundle their URLStreamHandlers
URL: https://github.com/apache/nutch/pull/222#discussion_r143003621
 
 

 ##########
 File path: src/java/org/apache/nutch/plugin/URLStreamHandlerFactory.java
 ##########
 @@ -0,0 +1,85 @@
+package org.apache.nutch.plugin;
+
+import java.lang.ref.WeakReference;
+import java.net.URL;
+import java.net.URLStreamHandler;
+import java.util.ArrayList;
+
+import org.mortbay.log.Log;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+public class URLStreamHandlerFactory
+    implements java.net.URLStreamHandlerFactory {
+  
+  protected static final Logger LOG = LoggerFactory
+      .getLogger(URLStreamHandlerFactory.class);
+  
+  /** The singleton instance. */
+  private static URLStreamHandlerFactory instance;
+  
+  /** Here we register all PluginRepositories. */
+  private ArrayList<WeakReference<PluginRepository>> prs;
+  
+  static {
+    instance = new URLStreamHandlerFactory();
+    URL.setURLStreamHandlerFactory(instance);
+    LOG.info("Registered URLStreamHandlerFactory with the JVM.");
+  }
+  
+  private URLStreamHandlerFactory() {
+    LOG.debug("URLStreamHandlerFactory()");
+    prs = new ArrayList<>();
+  }
+
+  /** Return the singleton instance of this class. */
+  public static URLStreamHandlerFactory getInstance() {
+    LOG.debug("getInstance()");
+    return instance;
+  }
+  
+  /** Use this method once a new PluginRepository was created to register it.
+   *
+   * @param pr The PluginRepository to be registered.
+   */
+  public void registerPluginRepository(PluginRepository pr) {
+    LOG.debug("registerPluginRepository(...)");
+    prs.add(new WeakReference<PluginRepository>(pr));
+    
+    removeInvalidRefs();
+  }
+
+  @Override
+  public URLStreamHandler createURLStreamHandler(String protocol) {
+    LOG.debug("createURLStreamHandler("+protocol+")");
+    
+    removeInvalidRefs();
+    
+    // find the 'correct' PluginRepository. For now we simply take the first.
+    // then ask it to return the URLStreamHandler
+
+    for(WeakReference<PluginRepository> ref: prs) {
+      PluginRepository pr = ref.get();
+      if(pr != null) {
+        // found PluginRepository. Let's get the URLStreamHandler...
+        return pr.createURLStreamHandler(protocol);
+      }
+    }
+    return null;
+  }
+
+  /** Maintains the list of PluginRepositories by
+   * removing the references whose referents have been
+   * garbage collected meanwhile.
+   */
+  private void removeInvalidRefs() {
+    LOG.debug("removeInvalidRefs()");
+    ArrayList<WeakReference<PluginRepository>> copy = new ArrayList<>(prs);
+    for(WeakReference<PluginRepository> ref: copy) {
+      if(ref.get() == null) {
+        prs.remove(ref);
+      }
+    }
+    LOG.debug("removed "+(copy.size()-prs.size())+" references, remaining "+prs.size());
 
 Review comment:
   Please use parameterized logging here (SLF4J `{}` placeholders) instead of string concatenation.
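
To illustrate the point of the review comment: with SLF4J the flagged line could presumably become `LOG.debug("removed {} references, remaining {}", copy.size() - prs.size(), prs.size());`. The sketch below shows why that form is cheaper when DEBUG is off. It is only an illustration: `MiniLogger`, `fill` and `toStringCallsWhenDisabled` are hypothetical stand-ins so the example runs on the JDK alone, and they are neither SLF4J nor part of Nutch.

```java
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical demo; MiniLogger only mimics SLF4J-style {} placeholders. */
class ParameterizedLoggingSketch {

  /** Tiny stand-in for a logger whose DEBUG level is switched on or off. */
  static class MiniLogger {
    final boolean debugEnabled;
    final StringBuilder out = new StringBuilder();

    MiniLogger(boolean debugEnabled) {
      this.debugEnabled = debugEnabled;
    }

    /** Eager style: the caller has already built the full message. */
    void debug(String msg) {
      if (debugEnabled) out.append(msg).append('\n');
    }

    /** Parameterized style: placeholders are filled only if DEBUG is on. */
    void debug(String pattern, Object... args) {
      if (debugEnabled) out.append(fill(pattern, args)).append('\n');
    }
  }

  /** Replaces each "{}" in the pattern with the next argument, left to right. */
  static String fill(String pattern, Object... args) {
    StringBuilder sb = new StringBuilder();
    int from = 0;
    for (Object arg : args) {
      int at = pattern.indexOf("{}", from);
      if (at < 0) break;
      sb.append(pattern, from, at).append(arg);
      from = at + 2;
    }
    return sb.append(pattern.substring(from)).toString();
  }

  /** Counts toString() calls on a logged argument while DEBUG is disabled. */
  static int toStringCallsWhenDisabled(boolean parameterized) {
    AtomicInteger calls = new AtomicInteger();
    Object costly = new Object() {
      @Override
      public String toString() {
        calls.incrementAndGet();
        return "42";
      }
    };
    MiniLogger log = new MiniLogger(false);
    if (parameterized) {
      log.debug("remaining {}", costly);   // argument is never formatted
    } else {
      log.debug("remaining " + costly);    // message is built before the call
    }
    return calls.get();
  }

  public static void main(String[] args) {
    System.out.println(fill("removed {} references, remaining {}", 2, 3));
    System.out.println("concatenation: " + toStringCallsWhenDisabled(false));
    System.out.println("parameterized: " + toStringCallsWhenDisabled(true));
  }
}
```

With concatenation the message string is assembled on every call, even if it is then discarded; with placeholders the formatting work is skipped whenever the level is disabled.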
 


> Fix Plugin System to allow protocol plugins to bundle their URLStreamHandlers
> -----------------------------------------------------------------------------
>
>                 Key: NUTCH-2429
>                 URL: https://issues.apache.org/jira/browse/NUTCH-2429
>             Project: Nutch
>          Issue Type: Improvement
>          Components: commoncrawl
>    Affects Versions: 1.14
>         Environment: Tested on both Nutch 1.13 and 1.14 in Ubuntu Linux with OpenJDK 1.8.
>            Reporter: Hiran Chaudhuri
>             Fix For: 1.14
>
>
> While trying to use the protocol-smb plugin (which is not part of the Nutch distribution), I realized that four steps are needed to use a protocol plugin successfully:
> 1 - put the artifact into the plugins directory
> 2 - modify the Nutch configuration files to allow smb:// URLs and add the plugin to the list of loaded plugins
> 3 - extract jcifs.jar and place it on the system classpath
> 4 - run Nutch with the correct system property
> While steps 1 and 2 seem obvious, steps 3 and 4 require knowledge of plugin internals, which does not feel right to ask of Nutch and plugin users. Moreover, jcifs.jar would then exist twice on the classpath, which could cause further problems at runtime.
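
For context on steps 3 and 4: the JDK maps a URL scheme to a `URLStreamHandler` either through a JVM-wide `URLStreamHandlerFactory` or, on Java 8, through handler classes located via a system property (presumably `java.protocol.handler.pkgs`; the report does not name it). The following is a minimal sketch of the factory route, under the assumption that a plugin-supplied handler is available. `SchemeHandlerSketch`, `StubSmbHandler` and `StubFactory` are hypothetical names, and the stub handler stands in for the real protocol-smb code.

```java
import java.io.IOException;
import java.net.MalformedURLException;
import java.net.URL;
import java.net.URLConnection;
import java.net.URLStreamHandler;
import java.net.URLStreamHandlerFactory;

/** Hypothetical demo class; not part of Nutch or the protocol-smb plugin. */
class SchemeHandlerSketch {

  /** Stand-in handler: lets smb:// URLs be parsed but cannot open them. */
  static class StubSmbHandler extends URLStreamHandler {
    @Override
    protected URLConnection openConnection(URL u) throws IOException {
      throw new IOException("stub handler: no real SMB support");
    }
  }

  /** Handles "smb" only; returning null lets the JVM use its own handlers. */
  static class StubFactory implements URLStreamHandlerFactory {
    @Override
    public URLStreamHandler createURLStreamHandler(String protocol) {
      return "smb".equals(protocol) ? new StubSmbHandler() : null;
    }
  }

  /** Reports whether the JVM can currently construct a URL from the spec. */
  static boolean canParse(String spec) {
    try {
      new URL(spec);
      return true;
    } catch (MalformedURLException e) {
      return false;
    }
  }

  public static void main(String[] args) {
    String spec = "smb://host/share/file.txt";
    System.out.println("before registration: " + canParse(spec));
    // May be called at most once per JVM; a second call throws an Error.
    URL.setURLStreamHandlerFactory(new StubFactory());
    System.out.println("after registration:  " + canParse(spec));
  }
}
```

Because the factory can be set only once per JVM, the class in the diff above installs a single instance in a static initializer and delegates each lookup to a registered `PluginRepository`.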


