[jira] Created: (NUTCH-147) nutch map reduce does not work in windows map reduce runs in a loop

JIRA jira@apache.org
nutch map reduce does not work in windows map reduce runs in a loop
-------------------------------------------------------------------

         Key: NUTCH-147
         URL: http://issues.apache.org/jira/browse/NUTCH-147
     Project: Nutch
        Type: Bug
  Components: indexer  
    Versions: 0.8-dev    
 Environment: Windows XP Professional
    Reporter: raghavendra prabhu
    Priority: Blocker


Description

The crawl starts and I can see the initial messages.

Then the MapReduce process starts and keeps running in a loop.

I do not see the same problem on Linux (on Linux it works perfectly).
Below is the loop I run into:

clustering.OnlineClusterer)
051222 182058   Nutch Indexing Filter (org.apache.nutch.indexer.IndexingFilter)
051222 182058   Nutch Content Parser (org.apache.nutch.parse.Parser)
051222 182058   Ontology Model Loader (org.apache.nutch.ontology.Ontology)
051222 182058   Nutch Analysis (org.apache.nutch.analysis.NutchAnalyzer)
051222 182058   Nutch Query Filter (org.apache.nutch.searcher.QueryFilter)
051222 182058 found resource crawl-urlfilter.txt at file:/G:/trunklatest/conf/crawl-urlfilter.txt
051222 182058 crawl\url.txt:0+25
051222 182059 crawl\url.txt:0+25
051222 182059  map -521216%
051222 182100 crawl\url.txt:0+25
051222 182100  map -1107496%
051222 182101 crawl\url.txt:0+25
051222 182101  map -1678544%
051222 182102 crawl\url.txt:0+25
051222 182102  map -2265900%
051222 182103 crawl\url.txt:0+25
051222 182103  map -2849416%
051222 182104 crawl\url.txt:0+25
051222 182104  map -3422908%
051222 182105 crawl\url.txt:0+25

The same output keeps repeating.
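The log has a recognizable loop signature: the same input split is reported again every second, while the map progress figure is negative and keeps falling. The short Python sketch below is only an illustration of how to check an excerpt for that pattern. It assumes that lines such as `crawl\url.txt:0+25` name a file split as `path:start+length` (bytes 0 to 25 of `crawl\url.txt`); `scan`, `SPLIT_RE` and `PROGRESS_RE` are hypothetical names, not part of Nutch.

```python
import re

# Excerpt of the reported log (startup lines omitted).
LOG = r"""051222 182058 crawl\url.txt:0+25
051222 182059 crawl\url.txt:0+25
051222 182059  map -521216%
051222 182100 crawl\url.txt:0+25
051222 182100  map -1107496%
051222 182101 crawl\url.txt:0+25
051222 182101  map -1678544%
051222 182102 crawl\url.txt:0+25
051222 182102  map -2265900%
051222 182103 crawl\url.txt:0+25
051222 182103  map -2849416%
051222 182104 crawl\url.txt:0+25
051222 182104  map -3422908%
051222 182105 crawl\url.txt:0+25"""

# Assumed split format: <date> <time> <path>:<start>+<length>
SPLIT_RE = re.compile(r"^\d{6} \d{6} (?P<path>\S+):(?P<start>\d+)\+(?P<length>\d+)$")
# Progress lines: <date> <time>  map <percent>%
PROGRESS_RE = re.compile(r"^\d{6} \d{6}\s+map (?P<pct>-?\d+)%$")


def scan(log):
    """Return every split line seen and every map progress value, in order."""
    splits, progress = [], []
    for line in log.splitlines():
        m = SPLIT_RE.match(line)
        if m:
            splits.append((m["path"], int(m["start"]), int(m["length"])))
            continue
        m = PROGRESS_RE.match(line)
        if m:
            progress.append(int(m["pct"]))
    return splits, progress


splits, progress = scan(LOG)
# The same split is re-read several times and progress never goes positive.
looping = len(splits) > len(set(splits)) and all(p < 0 for p in progress)
# Each reported progress value is lower than the previous one.
falling = all(b < a for a, b in zip(progress, progress[1:]))
print(sorted(set(splits)))
print(progress)
print("loop signature:", looping and falling)
```

A healthy map task over a 25-byte split would report a single split line and progress values between 0% and 100%; here every check points the other way.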

--
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators:
   http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see:
   http://www.atlassian.com/software/jira


[jira] Commented: (NUTCH-147) nutch map reduce does not work in windows map reduce runs in a loop

JIRA jira@apache.org
    [ http://issues.apache.org/jira/browse/NUTCH-147?page=comments#action_12361198 ]

raghavendra prabhu commented on NUTCH-147:
------------------------------------------

Is this issue caused by the need to run the crawl under Cygwin on Windows?

Version 0.7.1 had no such dependency.

Can anyone confirm?


[jira] Closed: (NUTCH-147) nutch map reduce does not work in windows map reduce runs in a loop

JIRA jira@apache.org
     [ http://issues.apache.org/jira/browse/NUTCH-147?page=all ]
     
Piotr Kosiorowski closed NUTCH-147:
-----------------------------------

    Resolution: Invalid

The Cygwin requirement on Windows is listed in the Nutch tutorial. Please reopen if the problem persists when running from a Cygwin environment.
