[jira] [Work started] (NUTCH-2579) Fetcher to use parsed URL to call ProtocolFactory.getProtocol(url)


JIRA jira@apache.org

     [ https://issues.apache.org/jira/browse/NUTCH-2579?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Work on NUTCH-2579 started by Sebastian Nagel.
----------------------------------------------

> Fetcher to use parsed URL to call ProtocolFactory.getProtocol(url)
> ------------------------------------------------------------------
>
>                 Key: NUTCH-2579
>                 URL: https://issues.apache.org/jira/browse/NUTCH-2579
>             Project: Nutch
>          Issue Type: Improvement
>          Components: fetcher, protocol
>    Affects Versions: 1.14
>            Reporter: Sebastian Nagel
>            Assignee: Sebastian Nagel
>            Priority: Minor
>             Fix For: 1.15
>
>
> The call to ProtocolFactory.getProtocol(url) is synchronized, so threads in a multi-threaded fetcher have to wait for the lock. The method takes the URL string, although it would be more efficient to pass the parsed URL already held in the FetchItem: the lock could then be released sooner. In addition, parsing the URL itself takes a lock in the URL stream handler lookup:
> {noformat}
> "FetcherThread" #37 daemon prio=5 os_prio=0 tid=0x00007f21edea2000 nid=0x5c20 waiting for monitor entry [0x00007f21bacb4000]
>    java.lang.Thread.State: BLOCKED (on object monitor)
>         at java.util.Hashtable.get(Hashtable.java:363)
>         - waiting to lock <0x00000005e01b5840> (a java.util.Hashtable)
>         at java.net.URL.getURLStreamHandler(URL.java:1135)
>         at java.net.URL.<init>(URL.java:599)
>         at java.net.URL.<init>(URL.java:490)
>         at java.net.URL.<init>(URL.java:439)
>         at org.apache.nutch.protocol.ProtocolFactory.getProtocol(ProtocolFactory.java:74)
>         - locked <0x00000005fc5f4fb8> (a org.apache.nutch.protocol.ProtocolFactory)
>         at org.apache.nutch.fetcher.FetcherThread.run(FetcherThread.java:299)
> {noformat}
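
The idea in the description can be sketched roughly as follows. This is a minimal illustration only, not the actual Nutch code or the eventual patch: the classes FetchItem, ProtocolFactory and Protocol below are simplified, hypothetical stand-ins that share only their names with the Nutch classes, and the per-scheme cache and the loader function are assumptions made for the example. The point it shows is that the URL is parsed once when the item is created, outside the factory lock, so the synchronized method only does a map lookup.

```java
import java.net.MalformedURLException;
import java.net.URL;
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

public class ProtocolLookupSketch {

    /** Hypothetical stand-in for a protocol plugin; not Nutch's Protocol interface. */
    interface Protocol {
        String scheme();
    }

    /** Hypothetical stand-in for a fetch queue item that keeps its parsed URL. */
    static final class FetchItem {
        final String urlString;
        final URL parsedUrl;

        FetchItem(String urlString) throws MalformedURLException {
            this.urlString = urlString;
            // Parsed once here, without holding the factory lock.
            this.parsedUrl = new URL(urlString);
        }
    }

    /** Hypothetical factory that caches one Protocol instance per URL scheme. */
    static final class ProtocolFactory {
        private final Map<String, Protocol> cache = new HashMap<>();
        private final Function<String, Protocol> loader;

        ProtocolFactory(Function<String, Protocol> loader) {
            this.loader = loader;
        }

        /**
         * Still synchronized, as in the issue, but it receives the parsed URL,
         * so no "new URL(String)" runs while the lock is held.
         */
        synchronized Protocol getProtocol(URL url) {
            return cache.computeIfAbsent(url.getProtocol(), loader);
        }
    }

    public static void main(String[] args) throws Exception {
        // The loader is an assumption: it just wraps the scheme name.
        ProtocolFactory factory = new ProtocolFactory(scheme -> () -> scheme);
        FetchItem item = new FetchItem("http://example.org/index.html");
        Protocol protocol = factory.getProtocol(item.parsedUrl);
        System.out.println(protocol.scheme());
    }
}
```

In this sketch the fetcher thread would call getProtocol(item.parsedUrl) instead of passing the URL string, which keeps the java.net.URL constructor (and its stream handler lookup) out of the critical section.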



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)