[jira] Created: (NUTCH-552) Upgrade Nutch to Hadoop 0.14.x


[jira] Created: (NUTCH-552) Upgrade Nutch to Hadoop 0.14.x

JIRA jira@apache.org
Upgrade Nutch to Hadoop 0.14.x
------------------------------

                 Key: NUTCH-552
                 URL: https://issues.apache.org/jira/browse/NUTCH-552
             Project: Nutch
          Issue Type: Improvement
    Affects Versions: 1.0.0
            Reporter: Andrzej Bialecki
            Assignee: Andrzej Bialecki
             Fix For: 1.0.0


Upgrade Nutch to Hadoop 0.14.x .

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (NUTCH-552) Upgrade Nutch to Hadoop 0.15.x

JIRA jira@apache.org

     [ https://issues.apache.org/jira/browse/NUTCH-552?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dennis Kubes updated NUTCH-552:
-------------------------------

    Description: Upgrade Nutch to Hadoop 0.15.x .  (was: Upgrade Nutch to Hadoop 0.14.x .)
        Summary: Upgrade Nutch to Hadoop 0.15.x  (was: Upgrade Nutch to Hadoop 0.14.x)

This will include updating the hadoop jar in lib to the hadoop-0.15.x jar and updating the code to handle hadoop api changes.

> Upgrade Nutch to Hadoop 0.15.x
> ------------------------------
>
>                 Key: NUTCH-552
>                 URL: https://issues.apache.org/jira/browse/NUTCH-552
>             Project: Nutch
>          Issue Type: Improvement
>    Affects Versions: 1.0.0
>            Reporter: Andrzej Bialecki
>            Assignee: Andrzej Bialecki
>             Fix For: 1.0.0
>
>
> Upgrade Nutch to Hadoop 0.15.x .


[jira] Updated: (NUTCH-552) Upgrade Nutch to Hadoop 0.15.x

JIRA jira@apache.org
In reply to this post by JIRA jira@apache.org

     [ https://issues.apache.org/jira/browse/NUTCH-552?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dennis Kubes updated NUTCH-552:
-------------------------------

    Attachment: NUTCH-552-1.patch

Includes api changes for upgrading to Hadoop-0.15.  This does not include a new hadoop jar.

> Upgrade Nutch to Hadoop 0.15.x
> ------------------------------
>
>                 Key: NUTCH-552
>                 URL: https://issues.apache.org/jira/browse/NUTCH-552
>             Project: Nutch
>          Issue Type: Improvement
>    Affects Versions: 1.0.0
>            Reporter: Andrzej Bialecki
>            Assignee: Andrzej Bialecki
>             Fix For: 1.0.0
>
>         Attachments: NUTCH-552-1.patch
>
>
> Upgrade Nutch to Hadoop 0.15.x .


[jira] Commented: (NUTCH-552) Upgrade Nutch to Hadoop 0.15.x

JIRA jira@apache.org
In reply to this post by JIRA jira@apache.org

    [ https://issues.apache.org/jira/browse/NUTCH-552?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12538313 ]

Doğacan Güney commented on NUTCH-552:
-------------------------------------

I am wondering if the lock stuff removed from FsDirectory will be problematic for lucene indexes on DFS.

Also,

+    FileSystem fs = new JobClient(new NutchJob(getConf())).getFs();

Since we are already changing the code, we may as well do,

       FileSystem fs = FileSystem.get(getConf());

And finally, why did you add a stringifyException code to StringUtil? AFAIK, there already is one in hadoop.

> Upgrade Nutch to Hadoop 0.15.x
> ------------------------------
>
>                 Key: NUTCH-552
>                 URL: https://issues.apache.org/jira/browse/NUTCH-552
>             Project: Nutch
>          Issue Type: Improvement
>    Affects Versions: 1.0.0
>            Reporter: Andrzej Bialecki
>            Assignee: Andrzej Bialecki
>             Fix For: 1.0.0
>
>         Attachments: NUTCH-552-1.patch
>
>
> Upgrade Nutch to Hadoop 0.15.x .


[jira] Commented: (NUTCH-552) Upgrade Nutch to Hadoop 0.15.x

JIRA jira@apache.org
In reply to this post by JIRA jira@apache.org

    [ https://issues.apache.org/jira/browse/NUTCH-552?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12538511 ]

Dennis Kubes commented on NUTCH-552:
------------------------------------

I don't think it will be problematic for indexes, but we don't run our indexes on DFS; we run them on local file systems.  We haven't had a problem with this setup and have been running it for a while in production.  Either way, the locking isn't supported anymore and needs to be removed.

I am ok with changing to FileSystem fs = FileSystem.get(getConf());

As for the stringify, I was trying not to make it dependent upon the Hadoop utility codebase.  AFAIK there is nowhere else in nutch that references the hadoop util.  If we are ok with making it dependent then we can use the Hadoop version.  This is simply a straight pull from the hadoop version pasted into the nutch utils.
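
For readers unfamiliar with the helper under discussion: a stringifyException-style utility usually just renders a Throwable's stack trace into a String so it can be logged or passed along. The following is a minimal sketch using only JDK classes. The class name StackTraceText is hypothetical, and this is not claimed to be the exact code in the patch or in Hadoop.

```java
import java.io.PrintWriter;
import java.io.StringWriter;

// Hypothetical class name, for illustration only.
final class StackTraceText {
    private StackTraceText() {}

    /** Renders the throwable's full stack trace as a single String. */
    static String stringifyException(Throwable e) {
        StringWriter buffer = new StringWriter();
        PrintWriter writer = new PrintWriter(buffer);
        // printStackTrace writes the exception line followed by one line per frame.
        e.printStackTrace(writer);
        writer.close();
        return buffer.toString();
    }

    public static void main(String[] args) {
        try {
            throw new IllegalStateException("boom");
        } catch (IllegalStateException ex) {
            // Prints the exception class, message, and stack frames.
            System.out.println(stringifyException(ex));
        }
    }
}
```

Because the helper depends only on java.io, copying it into a project's own utilities (as described above) avoids a dependency on another library's util package, at the cost of a duplicated method.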

> Upgrade Nutch to Hadoop 0.15.x
> ------------------------------
>
>                 Key: NUTCH-552
>                 URL: https://issues.apache.org/jira/browse/NUTCH-552
>             Project: Nutch
>          Issue Type: Improvement
>    Affects Versions: 1.0.0
>            Reporter: Andrzej Bialecki
>            Assignee: Andrzej Bialecki
>             Fix For: 1.0.0
>
>         Attachments: NUTCH-552-1.patch
>
>
> Upgrade Nutch to Hadoop 0.15.x .


[jira] Commented: (NUTCH-552) Upgrade Nutch to Hadoop 0.15.x

JIRA jira@apache.org
In reply to this post by JIRA jira@apache.org

    [ https://issues.apache.org/jira/browse/NUTCH-552?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12538563 ]

Doğacan Güney commented on NUTCH-552:
-------------------------------------

> As for the stringify. I was trying not to make it dependent upon the Hadoop utility codebase. AFAIK there is
> nowhere else in nutch that references the hadoop util. If we are ok with making it dependent then we can use
> the Hadoop version. This is simply a straight pull from the hadoop version pasted into the nutch utils.

Actually, hadoop's stringifyException is used in current codebase (more than 20 usages). Anyway, this is a minor thing, so however you want to take this forward is fine with me.

> Upgrade Nutch to Hadoop 0.15.x
> ------------------------------
>
>                 Key: NUTCH-552
>                 URL: https://issues.apache.org/jira/browse/NUTCH-552
>             Project: Nutch
>          Issue Type: Improvement
>    Affects Versions: 1.0.0
>            Reporter: Andrzej Bialecki
>            Assignee: Andrzej Bialecki
>             Fix For: 1.0.0
>
>         Attachments: NUTCH-552-1.patch
>
>
> Upgrade Nutch to Hadoop 0.15.x .


[jira] Commented: (NUTCH-552) Upgrade Nutch to Hadoop 0.15.x

JIRA jira@apache.org
In reply to this post by JIRA jira@apache.org

    [ https://issues.apache.org/jira/browse/NUTCH-552?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12538875 ]

Dennis Kubes commented on NUTCH-552:
------------------------------------

I don't know why my earlier search didn't pick it up; my search today did.  Well, I am moving to a Linux-based dev environment, so there are bound to be a few hiccups :).  If stringify is used throughout the codebase I would rather be consistent, as that was my intention before.  Looks like hadoop is close to their 0.15 release.  When they release I will make a build and attach the jar.  Will attach updated patch shortly.

> Upgrade Nutch to Hadoop 0.15.x
> ------------------------------
>
>                 Key: NUTCH-552
>                 URL: https://issues.apache.org/jira/browse/NUTCH-552
>             Project: Nutch
>          Issue Type: Improvement
>    Affects Versions: 1.0.0
>            Reporter: Andrzej Bialecki
>            Assignee: Andrzej Bialecki
>             Fix For: 1.0.0
>
>         Attachments: NUTCH-552-1.patch
>
>
> Upgrade Nutch to Hadoop 0.15.x .


[jira] Updated: (NUTCH-552) Upgrade Nutch to Hadoop 0.15.x

JIRA jira@apache.org
In reply to this post by JIRA jira@apache.org

     [ https://issues.apache.org/jira/browse/NUTCH-552?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dennis Kubes updated NUTCH-552:
-------------------------------

    Attachment: NUTCH-552-2.patch

Requested changes.

> Upgrade Nutch to Hadoop 0.15.x
> ------------------------------
>
>                 Key: NUTCH-552
>                 URL: https://issues.apache.org/jira/browse/NUTCH-552
>             Project: Nutch
>          Issue Type: Improvement
>    Affects Versions: 1.0.0
>            Reporter: Andrzej Bialecki
>            Assignee: Andrzej Bialecki
>             Fix For: 1.0.0
>
>         Attachments: NUTCH-552-1.patch, NUTCH-552-2.patch
>
>
> Upgrade Nutch to Hadoop 0.15.x .


[jira] Commented: (NUTCH-552) Upgrade Nutch to Hadoop 0.15.x

JIRA jira@apache.org
In reply to this post by JIRA jira@apache.org

    [ https://issues.apache.org/jira/browse/NUTCH-552?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12538897 ]

Doğacan Güney commented on NUTCH-552:
-------------------------------------

Btw, I think we need to update native libraries we ship as well. I am not sure how we did it before (do we just copy them from hadoop or re-build them?)

> Upgrade Nutch to Hadoop 0.15.x
> ------------------------------
>
>                 Key: NUTCH-552
>                 URL: https://issues.apache.org/jira/browse/NUTCH-552
>             Project: Nutch
>          Issue Type: Improvement
>    Affects Versions: 1.0.0
>            Reporter: Andrzej Bialecki
>            Assignee: Andrzej Bialecki
>             Fix For: 1.0.0
>
>         Attachments: NUTCH-552-1.patch, NUTCH-552-2.patch
>
>
> Upgrade Nutch to Hadoop 0.15.x .


[jira] Commented: (NUTCH-552) Upgrade Nutch to Hadoop 0.15.x

JIRA jira@apache.org
In reply to this post by JIRA jira@apache.org

    [ https://issues.apache.org/jira/browse/NUTCH-552?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12539028 ]

Andrzej Bialecki  commented on NUTCH-552:
-----------------------------------------

We definitely need to do this, things would crash & burn otherwise. Since we are using official releases, last time I did an upgrade I simply copied the relevant files from the official release package.

> Upgrade Nutch to Hadoop 0.15.x
> ------------------------------
>
>                 Key: NUTCH-552
>                 URL: https://issues.apache.org/jira/browse/NUTCH-552
>             Project: Nutch
>          Issue Type: Improvement
>    Affects Versions: 1.0.0
>            Reporter: Andrzej Bialecki
>            Assignee: Andrzej Bialecki
>             Fix For: 1.0.0
>
>         Attachments: NUTCH-552-1.patch, NUTCH-552-2.patch
>
>
> Upgrade Nutch to Hadoop 0.15.x .


[jira] Updated: (NUTCH-552) Upgrade Nutch to Hadoop 0.15.x

JIRA jira@apache.org
In reply to this post by JIRA jira@apache.org

     [ https://issues.apache.org/jira/browse/NUTCH-552?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dennis Kubes updated NUTCH-552:
-------------------------------

    Attachment: NUTCH-552-3.patch

New patch.  Fixes problems caused by path handling changes in hadoop that affect local file handling for index searches and, more generally, search servers.

> Upgrade Nutch to Hadoop 0.15.x
> ------------------------------
>
>                 Key: NUTCH-552
>                 URL: https://issues.apache.org/jira/browse/NUTCH-552
>             Project: Nutch
>          Issue Type: Improvement
>    Affects Versions: 1.0.0
>            Reporter: Andrzej Bialecki
>            Assignee: Andrzej Bialecki
>             Fix For: 1.0.0
>
>         Attachments: NUTCH-552-1.patch, NUTCH-552-2.patch, NUTCH-552-3.patch
>
>
> Upgrade Nutch to Hadoop 0.15.x .


[jira] Reopened: (NUTCH-552) Upgrade Nutch to Hadoop 0.15.x

JIRA jira@apache.org
In reply to this post by JIRA jira@apache.org

     [ https://issues.apache.org/jira/browse/NUTCH-552?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dennis Kubes reopened NUTCH-552:
--------------------------------

      Assignee: Dennis Kubes  (was: Andrzej Bialecki )

There is an issue with the changes made to IndexSearcher to fix path handling.  The changes work on Windows but not on Linux.  I am currently testing a patch.

> Upgrade Nutch to Hadoop 0.15.x
> ------------------------------
>
>                 Key: NUTCH-552
>                 URL: https://issues.apache.org/jira/browse/NUTCH-552
>             Project: Nutch
>          Issue Type: Improvement
>    Affects Versions: 1.0.0
>            Reporter: Andrzej Bialecki
>            Assignee: Dennis Kubes
>             Fix For: 1.0.0
>
>         Attachments: NUTCH-552-1.patch, NUTCH-552-2.patch, NUTCH-552-3.patch, NUTCH-552-4.patch
>
>
> Upgrade Nutch to Hadoop 0.15.x .


[jira] Updated: (NUTCH-552) Upgrade Nutch to Hadoop 0.15.x

JIRA jira@apache.org
In reply to this post by JIRA jira@apache.org

     [ https://issues.apache.org/jira/browse/NUTCH-552?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dennis Kubes updated NUTCH-552:
-------------------------------

    Attachment: NUTCH-552.1-1.patch

Only fixes for IndexSearcher.  Now uses Path.makeQualified to build the correct path to the index.

> Upgrade Nutch to Hadoop 0.15.x
> ------------------------------
>
>                 Key: NUTCH-552
>                 URL: https://issues.apache.org/jira/browse/NUTCH-552
>             Project: Nutch
>          Issue Type: Improvement
>    Affects Versions: 1.0.0
>            Reporter: Andrzej Bialecki
>            Assignee: Dennis Kubes
>             Fix For: 1.0.0
>
>         Attachments: NUTCH-552-1.patch, NUTCH-552-2.patch, NUTCH-552-3.patch, NUTCH-552-4.patch, NUTCH-552.1-1.patch
>
>
> Upgrade Nutch to Hadoop 0.15.x .


[jira] Closed: (NUTCH-552) Upgrade Nutch to Hadoop 0.15.x

JIRA jira@apache.org
In reply to this post by JIRA jira@apache.org

     [ https://issues.apache.org/jira/browse/NUTCH-552?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dennis Kubes closed NUTCH-552.
------------------------------

    Resolution: Fixed

This has now been fixed and committed.

> Upgrade Nutch to Hadoop 0.15.x
> ------------------------------
>
>                 Key: NUTCH-552
>                 URL: https://issues.apache.org/jira/browse/NUTCH-552
>             Project: Nutch
>          Issue Type: Improvement
>    Affects Versions: 1.0.0
>            Reporter: Andrzej Bialecki
>            Assignee: Dennis Kubes
>             Fix For: 1.0.0
>
>         Attachments: NUTCH-552-1.patch, NUTCH-552-2.patch, NUTCH-552-3.patch, NUTCH-552-4.patch, NUTCH-552.1-1.patch
>
>
> Upgrade Nutch to Hadoop 0.15.x .


[jira] Commented: (NUTCH-552) Upgrade Nutch to Hadoop 0.15.x

JIRA jira@apache.org
In reply to this post by JIRA jira@apache.org

    [ https://issues.apache.org/jira/browse/NUTCH-552?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12543194 ]

Hudson commented on NUTCH-552:
------------------------------

Integrated in Nutch-Nightly #268 (See [http://lucene.zones.apache.org:8080/hudson/job/Nutch-Nightly/268/])

> Upgrade Nutch to Hadoop 0.15.x
> ------------------------------
>
>                 Key: NUTCH-552
>                 URL: https://issues.apache.org/jira/browse/NUTCH-552
>             Project: Nutch
>          Issue Type: Improvement
>    Affects Versions: 1.0.0
>            Reporter: Andrzej Bialecki
>            Assignee: Dennis Kubes
>             Fix For: 1.0.0
>
>         Attachments: NUTCH-552-1.patch, NUTCH-552-2.patch, NUTCH-552-3.patch, NUTCH-552-4.patch, NUTCH-552.1-1.patch
>
>
> Upgrade Nutch to Hadoop 0.15.x .
