Nutch 1.0 vs. 1.1 vs. 1.2


jeffersonzhou
Hi,

I would like to ask for your opinions on how Nutch 1.0, 1.1 and 1.2
compare. I have been using 1.0 for a while without much difficulty.
When I started using 1.1 and 1.2, I found that quite a lot had changed,
and, more importantly, neither version works as smoothly out of the box
as I expected.

I am wondering whether it is worth migrating to the newer versions.

Thanks.


Re: Nutch 1.0 vs. 1.1 vs. 1.2

scohen
If I remember correctly, Nutch 1.0 used Hadoop 0.19, while Nutch 1.1 and 1.2
use Hadoop 0.20. If you look at the Hadoop documentation, you will see that
the configuration files changed between 0.19 and 0.20. Hadoop 0.19 used
hadoop-default.xml and hadoop-site.xml, while 0.20 split those files into
three pairs: core-default.xml/core-site.xml, mapred-default.xml/mapred-site.xml,
and hdfs-default.xml/hdfs-site.xml. If you try to use the old conf files with
Nutch 1.2, you will get errors.

Thanks,
Steve Cohen

(In reply to jeff, Sun, Dec 5, 2010 at 12:55 AM)
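For anyone carrying an old Hadoop 0.19 hadoop-site.xml over to Nutch 1.1 or 1.2, the split described above can be sketched in Python. This is a minimal sketch under stated assumptions, not a Nutch or Hadoop tool: it assigns each property by name prefix (dfs.* to hdfs-site.xml, mapred.* to mapred-site.xml, everything else to core-site.xml), which is a rough heuristic rather than an official mapping, and the helper names (target_file, split_hadoop_site, write_site_files) and the sample values are hypothetical. Check the generated files against the Hadoop 0.20 documentation before relying on them.

```python
from __future__ import annotations

import tempfile
import xml.etree.ElementTree as ET
from pathlib import Path

# ASSUMPTION: a rough prefix heuristic, not an official Hadoop mapping table.
# "dfs.*" -> hdfs-site.xml, "mapred.*" -> mapred-site.xml,
# everything else (fs.*, hadoop.*, io.*, ...) -> core-site.xml.
PREFIX_TARGETS = (("dfs.", "hdfs-site.xml"), ("mapred.", "mapred-site.xml"))
DEFAULT_TARGET = "core-site.xml"
SITE_FILES = (DEFAULT_TARGET, "hdfs-site.xml", "mapred-site.xml")


def target_file(name: str) -> str:
    """Return the 0.20-style site file that a 0.19 property name most likely belongs in."""
    for prefix, filename in PREFIX_TARGETS:
        if name.startswith(prefix):
            return filename
    return DEFAULT_TARGET


def split_hadoop_site(xml_text: str) -> dict[str, ET.Element]:
    """Distribute the <property> entries of a hadoop-site.xml over three <configuration> roots."""
    root = ET.fromstring(xml_text)
    outputs = {filename: ET.Element("configuration") for filename in SITE_FILES}
    for prop in root.findall("property"):
        name = (prop.findtext("name") or "").strip()
        outputs[target_file(name)].append(prop)
    return outputs


def write_site_files(outputs: dict[str, ET.Element], conf_dir: str | Path) -> list[Path]:
    """Write each <configuration> root to conf_dir/<filename> and return the paths written."""
    written = []
    for filename, conf in outputs.items():
        path = Path(conf_dir) / filename
        ET.ElementTree(conf).write(path, encoding="utf-8", xml_declaration=True)
        written.append(path)
    return written


# Made-up example input; the property names themselves are standard Hadoop ones.
SAMPLE = """<?xml version="1.0"?>
<configuration>
  <property><name>fs.default.name</name><value>hdfs://localhost:9000</value></property>
  <property><name>dfs.replication</name><value>1</value></property>
  <property><name>mapred.job.tracker</name><value>localhost:9001</value></property>
</configuration>"""

if __name__ == "__main__":
    # Write the three files to a throwaway directory and list what landed where.
    with tempfile.TemporaryDirectory() as conf_dir:
        for path in write_site_files(split_hadoop_site(SAMPLE), conf_dir):
            names = [p.findtext("name") for p in ET.parse(path).getroot().findall("property")]
            print(path.name, names)
```

The prefix rule was chosen because it needs no lookup table; a more careful version could instead check each property name against the keys in the core-default.xml, hdfs-default.xml and mapred-default.xml that ship with Hadoop 0.20.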

Re: Nutch 1.0 vs. 1.1 vs. 1.2

jeffersonzhou
Thanks, Steven.

I haven't thought that far ahead yet. :) I am more focused on the crawling
and parsing functionality of 1.0, 1.1 and 1.2. To be more specific, I
implemented an HTML parser in 1.0 that works properly. When I try 1.1 or
1.2, I have to deal with Tika parser issues, which cause a lot of
headaches.

That said, I may need distributed crawling later. Do you know why the
configuration files were changed, and what the benefits are?

Thanks


(In reply to Steve Cohen, Sun, 2010-12-05 at 11:15 -0500)