[jira] Created: (NUTCH-113) Disable permanent DNS-to-IP caching for JVM 1.4

[jira] Created: (NUTCH-113) Disable permanent DNS-to-IP caching for JVM 1.4

JIRA jira@apache.org
Disable permanent DNS-to-IP caching for JVM 1.4
-----------------------------------------------

         Key: NUTCH-113
         URL: http://issues.apache.org/jira/browse/NUTCH-113
     Project: Nutch
        Type: Improvement
    Versions: 0.8-dev, 0.7.2-dev    
    Reporter: Fuad Efendi
    Priority: Trivial


DNS-to-IP mappings may change during long crawls, but by default JVM 1.4 caches them forever.

Some related discussions at Jakarta-HttpClient-User
http://mail-archives.apache.org/mod_mbox/jakarta-httpclient-user/200506.mbox/%3c20050627022440.SVIL13442.lakermmtao05.cox.net@zeus%3e

http://java.sun.com/j2se/1.4.2/docs/guide/net/properties.html
   networkaddress.cache.ttl (default: -1)
   Specified in java.security to indicate the caching policy for successful name lookups from the name service. The value is specified as an integer to indicate the number of seconds to cache the successful lookup.
   A value of -1 indicates "cache forever".


We probably need this code in org.apache.nutch.fetcher.Fetcher:

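  // TTL for cached DNS lookups, in minutes (configurable, defaults to 120).
  private static final int FETCHER_DNS_TTL_MINUTES =
    NutchConf.get().getInt("fetcher.dns.ttl.minutes", 120);

  static {
    // networkaddress.cache.ttl is expressed in seconds, so convert from minutes.
    java.security.Security.setProperty("networkaddress.cache.ttl",
      String.valueOf(FETCHER_DNS_TTL_MINUTES * 60));
  }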
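
For anyone who wants to try the property outside of Nutch, here is a minimal standalone sketch. The class and method names (DnsCacheTtlDemo, setTtlSeconds) are made up for illustration and are not part of Nutch. It only sets the security property and reads it back. As far as I know the JVM reads this value when its address cache policy is first initialized, so it should be set before the first name lookup.

```java
import java.security.Security;

// Hypothetical demo class, not part of Nutch.
class DnsCacheTtlDemo {
    // Property name as documented in the JVM networking properties.
    static final String TTL_PROPERTY = "networkaddress.cache.ttl";

    /** Sets the TTL for successful lookups, in seconds, and returns the value read back. */
    static String setTtlSeconds(int seconds) {
        Security.setProperty(TTL_PROPERTY, Integer.toString(seconds));
        return Security.getProperty(TTL_PROPERTY);
    }

    public static void main(String[] args) {
        // 120 minutes expressed in seconds.
        System.out.println(TTL_PROPERTY + " = " + setTtlSeconds(120 * 60));
        // prints "networkaddress.cache.ttl = 7200"
    }
}
```

Reading the property back only confirms that it was stored. It does not prove that an already-initialized resolver cache picked it up.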


And a new property in nutch-default.xml:

<property>
  <name>fetcher.dns.ttl.minutes</name>
  <value>120</value>
  <description>Time-to-live of the DNS-to-IP cache, in minutes</description>
</property>
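
A rough sketch of how the configured minutes could be turned into the seconds value the JVM expects, using java.util.Properties as a stand-in for NutchConf so that it runs on its own. The class FetcherDnsTtl and the fallback rules (missing or unparsable value falls back to 120, a negative value maps to -1, i.e. "cache forever") are my assumptions, not existing Nutch behaviour.

```java
import java.util.Properties;

// Hypothetical helper, not part of Nutch. Properties stands in for NutchConf.
class FetcherDnsTtl {
    static final String KEY = "fetcher.dns.ttl.minutes";
    static final int DEFAULT_MINUTES = 120;

    /** Returns the TTL in seconds for networkaddress.cache.ttl, or -1 for "cache forever". */
    static long ttlSeconds(Properties conf) {
        int minutes = DEFAULT_MINUTES;
        String raw = conf.getProperty(KEY);
        if (raw != null) {
            try {
                minutes = Integer.parseInt(raw.trim());
            } catch (NumberFormatException e) {
                // Assumption: an unparsable value falls back to the default.
                minutes = DEFAULT_MINUTES;
            }
        }
        if (minutes < 0) {
            // Assumption: any negative setting means "cache forever".
            return -1L;
        }
        // Multiply as long to avoid int overflow for very large settings.
        return minutes * 60L;
    }

    public static void main(String[] args) {
        Properties conf = new Properties();
        conf.setProperty(KEY, "30");
        System.out.println(ttlSeconds(conf)); // prints 1800
    }
}
```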


--
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators:
   http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see:
   http://www.atlassian.com/software/jira

[jira] Updated: (NUTCH-113) Disable permanent DNS-to-IP caching for JVM 1.4

JIRA jira@apache.org
     [ http://issues.apache.org/jira/browse/NUTCH-113?page=all ]

Fuad Efendi updated NUTCH-113:
------------------------------

    Component: fetcher


why is segslice so slow?

em-13
segslice usually processes only 200-300 records/sec on my machine, which is top of
the line and quite fast for everything else.
Is it just copying the segments minus the last part, or is some processing
required for each record?

Any advice on how it can be optimized?