Nutch with Hadoop 3.x version

Gajalakshmi G
Hi all,



I am using Nutch 2.3.1 with gora-core version 0.6 (I have also tried Gora jars up to version 0.9). With these versions I cannot successfully crawl a site on Hadoop 3.x. Does the latest Nutch 2.4 release run on Hadoop 3.x? Do we need to make any specific changes to get Nutch 2.4 running on Hadoop 3.x?



Thanks & Regards,

Gajalakshmi.G

Assistant Consultant

Tata Consultancy Services
Mailto: [hidden email]<https://mail.tcs.com/owa/redir.aspx?C=15cf4bf65eff4bdab465e0a2dd682f11&URL=mailto%3agajalakshmi.g%40tcs.com>
=====-----=====-----=====
Notice: The information contained in this e-mail
message and/or attachments to it may contain
confidential or privileged information. If you are
not the intended recipient, any dissemination, use,
review, distribution, printing or copying of the
information contained in this e-mail message
and/or attachments to it are strictly prohibited. If
you have received this communication in error,
please notify us by reply e-mail or telephone and
immediately and permanently delete the message
and any attachments. Thank you


Re: Nutch with Hadoop 3.x version

Shashanka Balakuntala
Hi,

I would like to point out that Nutch 2.x is no longer under active
maintenance or development and has been retired. That said, you can
follow the steps below to upgrade 2.x to run on Hadoop 3.x:

1. Navigate to the root directory of your local Nutch 2.x clone. If you
only have a binary release, clone the 2.x branch of the repository
(https://github.com/apache/nutch/tree/2.x) and navigate to that directory.
2. Open ivy/ivy.xml in the editor of your choice.
3. After line 49 you will find the <!-- Hadoop Dependencies --> comment.
Change the Hadoop dependencies there to the 3.x version you want. For
reference, the 1.x branch has already been updated to Hadoop 3.1.3 and
works well there. The pull request that ported 1.x to Hadoop 3.1.3 shows
the changes that were made:
<https://github.com/apache/nutch/pull/507/files>
4. Make sure to change all of the Hadoop dependencies in the ivy.xml file.
5. Go back to the root directory and build the project using "ant
runtime".
6. You will find the runnable nutch script in ./runtime/local/bin/nutch.
More information on building the project from source is here:
<https://cwiki.apache.org/confluence/display/NUTCH/Nutch2Tutorial>

These changes should make Nutch use Hadoop 3.x.
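
As a rough illustration of steps 3 and 4, the version bump could also be
scripted instead of edited by hand. The Python sketch below is only an
assumption about how that might look: it assumes the Hadoop entries in
ivy.xml are <dependency> elements with org="org.apache.hadoop" and a rev
attribute. The helper name bump_hadoop_rev, the sample XML, and the
version numbers in it are made up for illustration, so check them against
your actual ivy/ivy.xml.

```python
import re

# Matches a whole <dependency ...> opening tag. Quoted attribute values are
# skipped as a unit so that a value such as conf="*->default" does not end
# the match early at its embedded ">".
DEPENDENCY_TAG = re.compile(r'<dependency\b(?:[^>"]|"[^"]*")*>')
REV_ATTR = re.compile(r'\brev="[^"]*"')


def bump_hadoop_rev(ivy_xml, new_rev):
    """Return ivy_xml with rev set to new_rev on every org.apache.hadoop dependency."""

    def rewrite(match):
        tag = match.group(0)
        if 'org="org.apache.hadoop"' not in tag:
            return tag  # leave non-Hadoop dependencies (e.g. Gora) untouched
        return REV_ATTR.sub('rev="{}"'.format(new_rev), tag)

    return DEPENDENCY_TAG.sub(rewrite, ivy_xml)


# Hypothetical excerpt, NOT copied from the real file; the real ivy/ivy.xml
# lists more modules and may use different versions.
SAMPLE = """<!-- Hadoop Dependencies -->
<dependency org="org.apache.hadoop" name="hadoop-common" rev="2.5.2" conf="*->default"/>
<dependency org="org.apache.hadoop" name="hadoop-hdfs" rev="2.5.2" conf="*->default"/>
<dependency org="org.apache.gora" name="gora-core" rev="0.6.1" conf="*->default"/>
"""

if __name__ == "__main__":
    print(bump_hadoop_rev(SAMPLE, "3.1.3"))
```

The sketch works on the raw text rather than parsing the XML, so comments
and formatting in ivy.xml are left as they are. After writing the result
back to ivy/ivy.xml, the build in step 5 ("ant runtime") is still needed.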

*Regards*
  Shashanka Balakuntala Srinivasa



On Tue, Jun 23, 2020 at 12:08 PM Gajalakshmi G
<[hidden email]> wrote:

> Hi all,
>
>
>
> I am using Nutch 2.3.1 with gora-core version 0.6 (I have also tried Gora
> jars up to version 0.9). With these versions I cannot successfully crawl a
> site on Hadoop 3.x. Does the latest Nutch 2.4 release run on Hadoop 3.x? Do
> we need to make any specific changes to get Nutch 2.4 running on Hadoop
> 3.x?

Re: Nutch with Hadoop 3.x version

Gajalakshmi G
Hi,

Thanks for the suggestion. I have already changed the Hadoop dependencies to a 3.x version in my Nutch 2.3.1 ivy file. However, I am using gora-core.jar (version 0.9) to store the crawl data, and it is compatible only with Hadoop 2.x, so my crawl does not complete. Are there any suggestions for making this combination of Nutch 2.x and gora-core.jar work on Hadoop 3.x?


Thanks & Regards,

Gajalakshmi.G


________________________________
From: Shashanka Balakuntala <[hidden email]>
Sent: Tuesday, June 23, 2020 2:02 PM
To: [hidden email] <[hidden email]>
Subject: Re: Nutch with Hadoop 3.x version

Hi,

I would like to point out that Nutch 2.x is no longer under active
maintenance or development and has been retired. That said, you can
follow the steps below to upgrade 2.x to run on Hadoop 3.x:

1. Navigate to the root directory of your local Nutch 2.x clone. If you
only have a binary release, clone the 2.x branch of the repository
(https://github.com/apache/nutch/tree/2.x) and navigate to that directory.
2. Open ivy/ivy.xml in the editor of your choice.
3. After line 49 you will find the <!-- Hadoop Dependencies --> comment.
Change the Hadoop dependencies there to the 3.x version you want. For
reference, the 1.x branch has already been updated to Hadoop 3.1.3 and
works well there. The pull request that ported 1.x to Hadoop 3.1.3 shows
the changes that were made:
<https://github.com/apache/nutch/pull/507/files>
4. Make sure to change all of the Hadoop dependencies in the ivy.xml file.
5. Go back to the root directory and build the project using "ant
runtime".
6. You will find the runnable nutch script in ./runtime/local/bin/nutch.
More information on building the project from source is here:
<https://cwiki.apache.org/confluence/display/NUTCH/Nutch2Tutorial>

These changes should make Nutch use Hadoop 3.x.

*Regards*
  Shashanka Balakuntala Srinivasa




Re: Nutch with Hadoop 3.x version

Shashanka Balakuntala
Hi,
There is a Gora issue [1] open for exactly this change. I will look into
it and, if it is a minor fix, will try to fix it as well. Please keep
following the issue for updates.

[1] - https://issues.apache.org/jira/browse/GORA-537

On Tue, 23 Jun 2020, 16:39 Gajalakshmi G, <[hidden email]>
wrote:

> Hi,
>
> Thanks for the suggestion. I have already changed the Hadoop dependencies
> to a 3.x version in my Nutch 2.3.1 ivy file. However, I am using
> gora-core.jar (version 0.9) to store the crawl data, and it is compatible
> only with Hadoop 2.x, so my crawl does not complete. Are there any
> suggestions for making this combination of Nutch 2.x and gora-core.jar
> work on Hadoop 3.x?

Re: Nutch with Hadoop 3.x version

Gajalakshmi G
Hi all,

Is there any update on Gora support for Hadoop 3.x?


Thanks & Regards,

Gajalakshmi.G

Assistant Consultant

Tata Consultancy Services

________________________________
From: Shashanka Balakuntala <[hidden email]>
Sent: Tuesday, June 23, 2020 6:11 PM
To: [hidden email] <[hidden email]>
Subject: Re: Nutch with Hadoop 3.x version

Hi,
There is a Gora issue [1] open for exactly this change. I will look into
it and, if it is a minor fix, will try to fix it as well. Please keep
following the issue for updates.

[1] - https://issues.apache.org/jira/browse/GORA-537
