Log Error Stack - Re: Nutch Fetch - HttpException : Connect Exception : Invalid Argument

Jon Shoberg

From: src/java/net/nutch/fetcher/Fetcher.java

   Any suggestions on where to look for the logging of this stack trace,
related to the message below?  I have to be missing something small here
(perhaps a lack of coffee).  "LOG.info" displays to stdout by default.
Where does/can "LOG.log" write to?

   private void logError(String url, FetchListEntry fle, Throwable t) {
     LOG.info("fetch of " + url + " failed with: " + t);
     LOG.log(Level.FINE, "stack", t);            // stack trace
     synchronized (Fetcher.this) {               // record failure
       errors++;
     }
   }
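
A minimal sketch of one way to make the FINE-level stack trace visible, assuming LOG is a plain java.util.logging.Logger running under the JDK's default configuration (in which both loggers and the console handler stop at INFO, so FINE records are silently dropped).  The class name, logger name, and enableFine helper are hypothetical and are not part of Nutch; if Nutch installs its own handlers or formatter, the details may differ.

```java
import java.net.ConnectException;
import java.util.logging.ConsoleHandler;
import java.util.logging.Handler;
import java.util.logging.Level;
import java.util.logging.Logger;

public class FineLoggingSketch {
    // Hypothetical logger name, used only for this illustration.
    private static final Logger LOG = Logger.getLogger("sketch.fetcher");

    /** Lowers the threshold on the logger and on a dedicated console handler. */
    static Handler enableFine(Logger logger) {
        Handler handler = new ConsoleHandler();  // ConsoleHandler writes to System.err
        handler.setLevel(Level.FINE);            // handler threshold defaults to INFO
        logger.setLevel(Level.FINE);             // logger otherwise inherits the root level
        logger.addHandler(handler);
        logger.setUseParentHandlers(false);      // avoid printing INFO records twice
        return handler;
    }

    public static void main(String[] args) {
        Throwable t = new ConnectException("Invalid argument");

        // Under the default configuration this record is discarded.
        LOG.log(Level.FINE, "stack", t);

        enableFine(LOG);

        // Now the message and the full stack trace reach the console.
        LOG.log(Level.FINE, "stack", t);
    }
}
```

The same effect can usually be had without code changes by pointing the JVM at a logging properties file with -Djava.util.logging.config.file and setting both .level and java.util.logging.ConsoleHandler.level to FINE there.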


> When following the whole-web crawling strategy outlined in the tutorial,
> the following error occurs.  I'd say probably 50% of the output
> from the fetch is this error.  Has anyone else seen this?  There are a
> few thousand URLs loaded via nutch inject.  I can understand possibly
> getting a few errors, but when I hand-check the URLs for which this
> happens, they respond fine.
>
> I checked the URL file list and there are no extraneous characters.
>
> Error: (example.com is not the real URL)
>
>  050719 221355 fetch of http://example.com/ failed with:
> net.nutch.protocol.http.HttpException: java.net.ConnectException:
> Invalid argument
>
> The Script:
>
> #!/bin/bash
> rm -rf db
> rm -rf segments
> mkdir db
> mkdir segments
> bin/nutch admin db -create
> bin/nutch inject db -urlfile urls
> bin/nutch generate db segments
> s=`ls -d segments/2* | tail -1`
> echo Segment is $s
> bin/nutch fetch $s   # <-- ERROR ERROR ERROR
> bin/nutch updatedb db $s
> bin/nutch analyze db 5
> bin/nutch index $s