Request for Review


Request for Review

lewis john mcgibbney-2
Hi user@ and dev@,

As part of the Nutch Google Summer of Code effort this year, Omkar Reddy and I have been working persistently throughout the summer months on the Hadoop MapReduce API upgrade, i.e. NUTCH-2375 "Upgrade the code base from org.apache.hadoop.mapred to org.apache.hadoop.mapreduce" [0].
We believe this code is now stable and ready for widespread community review. It is a large patch, so the more eyes we can get on it, the better. Upgrading the MapReduce API usage in Nutch is long overdue, so this will be a significant addition to the Nutch project.

The proposed pull request can be found at [1]. Please report any outcomes back to the issue tracker at [0].

Thank you
Lewis

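N.B. The official version of Apache Hadoop supported by the Nutch master branch at this time is 2.7.2.

[0] https://issues.apache.org/jira/projects/NUTCH/issues/NUTCH-2375
[1] https://github.com/apache/nutch/pull/188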
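For reviewers working through a patch of this size, one quick completeness check is to look for leftover imports from the old org.apache.hadoop.mapred package. The sketch below is a hypothetical helper (the class name OldApiImportScanner is made up and is not part of Nutch); it assumes the sources are plain .java files under a directory passed as the first argument. A hit is only a prompt for closer review, not necessarily a bug, since some code may still legitimately use old-API classes.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

/**
 * Hypothetical review helper: lists Java files that still import classes from
 * the old org.apache.hadoop.mapred package. Imports from
 * org.apache.hadoop.mapreduce are not matched, because the prefix below ends
 * with "mapred." (dot included). Static imports are not checked in this sketch.
 */
public class OldApiImportScanner {

  static final String OLD_API_PREFIX = "import org.apache.hadoop.mapred.";

  /** Returns the trimmed old-API import lines found in the given source lines. */
  public static List<String> oldApiImports(List<String> lines) {
    List<String> hits = new ArrayList<>();
    for (String line : lines) {
      String trimmed = line.trim();
      if (trimmed.startsWith(OLD_API_PREFIX)) {
        hits.add(trimmed);
      }
    }
    return hits;
  }

  public static void main(String[] args) throws IOException {
    // Source root to scan; defaults to the current directory.
    Path root = Paths.get(args.length > 0 ? args[0] : ".");
    try (Stream<Path> files = Files.walk(root)) {
      List<Path> javaFiles = files
          .filter(p -> p.toString().endsWith(".java"))
          .collect(Collectors.toList());
      for (Path file : javaFiles) {
        for (String hit : oldApiImports(Files.readAllLines(file))) {
          System.out.println(file + ": " + hit);
        }
      }
    }
  }
}
```

Compile it with javac OldApiImportScanner.java and run java OldApiImportScanner followed by the source directory you want to check.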


Re: Request for Review

Sebastian Nagel
Hi,

thanks, Omkar, for your work!

I just wanted to start testing, but it looks like the pull request is lost.

Thanks,
Sebastian

On 09/06/2017 10:57 PM, lewis john mcgibbney wrote:

> Hi user@ and dev@,
>
> As part of the Nutch Google Summer of Code effort this year, Omkar Reddy and I have been working
> persistently throughout the summer months on the Hadoop MapReduce API upgrade e.g. NUTCH-2375
> Upgrade the code base from org.apache.hadoop.mapred to org.apache.hadoop.mapreduce [0].
> We believe we are now at a stage where this code is stable and should be opened for widespread
> community review. It is a large patch, so the more eyes we can get on this the better. Upgrading
> MapReduce API usage in Nutch is long overdue so this will be a significant addition to the Nutch
> project.
>
> The proposed pull request can be found at [1]. Please report any outcomes back to the issue tracker
> at [1].
>
> Thank you
> Lewis
>
> N.B. Please note that the official version of Apache Hadoop supported by Nutch master branch at this
> time is 2.7.2.
>
> [0] https://issues.apache.org/jira/projects/NUTCH/issues/NUTCH-2375
> [1] https://github.com/apache/nutch/pull/188
>
> --
> http://home.apache.org/~lewismc/
> @hectorMcSpector
> http://www.linkedin.com/in/lmcgibbney


Re: Request for Review

Omkar Reddy-2

Hi Sebastian,

While squashing the pull request, something went wrong and the commits were deleted. I will open a new pull request and keep you posted in this thread.

Thanks,
~Omkar

> On 10-Sep-2017, at 11:01 PM, Sebastian Nagel <[hidden email]> wrote:
>
> Hi,
>
> thanks, Omkar for your work!
>
> Just wanted to start testing, but looks like the pull request is lost.
>
> Thanks,
> Sebastian

Re: Request for Review

kenneth mcfarland
Nice work, Omkar! Thumbs up from a fellow student.

On Sep 10, 2017 10:37 AM, "Omkar Reddy" <[hidden email]> wrote:

> Hi Sebastian,
>
> While squashing the pull request there was some mistake and the commits were deleted. I will send a new pull request and keep you posted in this thread.
>
> Thanks,
> ~Omkar


Re: Request for Review

Omkar Reddy-2
Hi,

Kenneth, thank you for your appreciation. Please participate in the code review; as Lewis said, the more eyes we get on this, the better.

Sebastian, please find the new pull request here [0]. The code is stable, with "ant clean runtime test" passing successfully. This is my first time submitting a Java patch at this scale, so please feel free to make any suggestions.

Everyone is welcome to test this code and review it.

Thanks,
Omkar

[0] https://github.com/apache/nutch/pull/221


On 11 September 2017 at 00:03, kenneth mcfarland <[hidden email]> wrote:
> Nice work Omkar, thumbs up from a fellow student.


Re: Request for Review

Sebastian Nagel
Hi,

Here is a short status of testing from my side:

- successfully ran a small test crawl in local mode
  (only inject plus a few generate-fetch-parse-update cycles)

- crawling in distributed mode (on a Hadoop cluster) fails;
  the generator does not generate fetch lists:
    17/09/14 13:56:09 WARN crawl.Generator: Generator: 0 records selected for fetching, exiting ...

  I've retried the generator with the current master: the problem is definitely related to the
  current NUTCH-2375 branch/PR. As far as I can see, this is due to configuration variables not
  being set properly, so changes are requested.
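The thread does not say which configuration variables were affected. As background only, one documented pitfall of the org.apache.hadoop.mapreduce API that produces this kind of "configuration not set" symptom is that Job.getInstance(Configuration) makes its own copy of the Configuration, so values set on the original object after the job is created never reach the tasks. The sketch below models that behaviour with hypothetical stand-in classes (Conf and Job here are not the Hadoop classes), so it runs without Hadoop on the classpath; the property name is also made up. It is not a claim about what is wrong in the NUTCH-2375 branch.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Dependency-free model (NOT Hadoop code) of the copy-on-create behaviour of
 * org.apache.hadoop.mapreduce.Job.getInstance(Configuration).
 */
public class ConfCopyPitfall {

  /** Hypothetical stand-in for org.apache.hadoop.conf.Configuration. */
  static final class Conf {
    private final Map<String, String> props = new HashMap<>();

    Conf() {}

    /** Copy constructor: the new Conf gets its own map with the same entries. */
    Conf(Conf other) {
      props.putAll(other.props);
    }

    void set(String key, String value) {
      props.put(key, value);
    }

    String get(String key, String defaultValue) {
      return props.getOrDefault(key, defaultValue);
    }
  }

  /** Hypothetical stand-in for org.apache.hadoop.mapreduce.Job. */
  static final class Job {
    private final Conf conf;

    private Job(Conf conf) {
      this.conf = new Conf(conf); // keeps a private copy, like Job.getInstance
    }

    static Job getInstance(Conf conf) {
      return new Job(conf);
    }

    Conf getConfiguration() {
      return conf;
    }
  }

  static final String KEY = "example.generator.property"; // made-up key

  /** Buggy order: the value is set on the original conf after the job copied it. */
  public static String buggy() {
    Conf conf = new Conf();
    Job job = Job.getInstance(conf);
    conf.set(KEY, "42"); // only changes the original; the job never sees it
    return job.getConfiguration().get(KEY, "unset");
  }

  /** Fixed order: the value is set on the job's own configuration. */
  public static String fixed() {
    Conf conf = new Conf();
    Job job = Job.getInstance(conf);
    job.getConfiguration().set(KEY, "42");
    return job.getConfiguration().get(KEY, "unset");
  }

  public static void main(String[] args) {
    System.out.println("buggy: " + buggy());
    System.out.println("fixed: " + fixed());
  }
}
```

With the real Hadoop classes, the usual remedy is to set properties on the Configuration before calling Job.getInstance(conf), or on job.getConfiguration() afterwards.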


Best,
Sebastian



On 09/11/2017 08:06 AM, Omkar Reddy wrote:

> Hi,
>
> Kenneth thank you for your appreciation. Please participate in the code review. As Lewis said the
> more eyes we get on this the better.
>
> Sebastian please find the pull request here [0]. The code is stable with "ant clean runtime test"
> passing successfully. This is my first experience submitting a java patch at this scale. Please feel
> free to provide any suggestion.
>
> Everyone is welcome to test this code and review it.
>
> Thanks,
> Omkar
>
> [0] https://github.com/apache/nutch/pull/221 