[jira] Created: (NUTCH-189) Injection infinite loop


Hudson (Jira)
Injection infinite loop
-----------------------

         Key: NUTCH-189
         URL: http://issues.apache.org/jira/browse/NUTCH-189
     Project: Nutch
        Type: Bug
 Environment: Linux
    Reporter: Andy Liu
    Priority: Minor


If you inject the crawldb with a URL file that doesn't end with a line feed, the job enters an infinite loop. The map progress goes negative and keeps falling:

060104 160950 Running job: job_7uku5w
060104 160952  map 0%
060104 160954  map 50%
060104 160957  map -2631%
060104 160959  map -259756%
060104 161002  map -538552%
060104 161006  map -818413%
060104 161009  map -1098421%
060104 161011  map -1377851%
060104 161014  map -1657718%
060104 161018  map -1939534%
060104 161021  map -2218515%
060104 161023  map -2588212%
060104 161026  map -2868787%
060104 161030  map -3147637%
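
For illustration only, here is a minimal sketch of a line reader that stops cleanly at end of input even when the last line has no terminating line feed. It is not the Nutch or Hadoop code and not the attached patch; the class and method names (LineReaderSketch, readLine, readUrls) are hypothetical. The point it shows is that end-of-file has to be treated as a line terminator, and that a reader must report "no more data" once nothing is left instead of trying again.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch, not the actual Nutch/Hadoop reader.
class LineReaderSketch {

    /**
     * Reads bytes up to the next '\n' or end of stream, whichever comes first.
     * Returns the line without its terminator, or null if the stream was
     * already exhausted, so callers always have a clear stop condition.
     */
    static String readLine(InputStream in) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        boolean sawAnyByte = false;
        int b;
        while ((b = in.read()) != -1) {
            sawAnyByte = true;
            if (b == '\n') {
                break; // normal end of line
            }
            buf.write(b);
        }
        if (!sawAnyByte) {
            return null; // nothing left: signal end of input
        }
        // Reached either '\n' or EOF; an unterminated last line is still a line.
        return new String(buf.toByteArray(), StandardCharsets.UTF_8);
    }

    /** Collects all non-empty lines, e.g. the URLs of a seed file. */
    static List<String> readUrls(byte[] data) throws IOException {
        List<String> urls = new ArrayList<>();
        try (InputStream in = new ByteArrayInputStream(data)) {
            String line;
            while ((line = readLine(in)) != null) {
                if (!line.isEmpty()) {
                    urls.add(line);
                }
            }
        }
        return urls;
    }

    public static void main(String[] args) throws IOException {
        // Second URL deliberately has no trailing line feed.
        byte[] seed = "http://a.example/\nhttp://b.example/"
                .getBytes(StandardCharsets.UTF_8);
        System.out.println(readUrls(seed));
    }
}
```

Until a fix is in place, making sure the URL file ends with a line feed avoids the condition described above.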


--
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators:
   http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see:
   http://www.atlassian.com/software/jira

[jira] Commented: (NUTCH-189) Injection infinite loop

Hudson (Jira)
    [ http://issues.apache.org/jira/browse/NUTCH-189?page=comments#action_12364270 ]

Bryan Pendleton commented on NUTCH-189:
---------------------------------------

I think this is caused by an issue similar to one I've been running into in my own code, though I'm not testing crawling, so I can't be sure.

I'll attach a patch that fixes my issue. If it turns out not to fix both, I'll report mine separately.

[jira] Updated: (NUTCH-189) Injection infinite loop

Hudson (Jira)
     [ http://issues.apache.org/jira/browse/NUTCH-189?page=all ]

Bryan Pendleton updated NUTCH-189:
----------------------------------

    Attachment: textinputformat.patch.txt
