Scoring-similarity plugin for Nutch 2.3.1

Scoring-similarity plugin for Nutch 2.3.1

Gajanan Watkar
Hi all,
I am using Nutch 2.3.1 with HBase 1.2.3 as the storage backend on top of a
Hadoop 2.5.2 cluster in *deploy mode*, with crawled data being indexed to
Solr 6.5.1.

I want to add *focused crawling capabilities to Nutch 2.3.1* similar to
those provided by the *scoring-similarity plugin for Nutch 1.x*.

Can somebody guide me on this?
Is there *something already available* which I could not trace?
Can the *scoring-similarity plugin* for Nutch 1.x be *modified* to run with
Nutch 2.3.1? If yes, how?

-Gajanan

Re: Scoring-similarity plugin for Nutch 2.3.1

Sebastian Nagel-2
Hi Gajanan,

> Can the *scoring-similarity plugin* for Nutch 1.x be *modified* to run with
> nutch 2.3.1? if yes, how?

Eventually, yes.  Have a look at the differences of another scoring filter plugin
between 1.x and 2.x, and try to apply those to scoring-similarity.
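
For orientation, the scoring logic itself is small and does not depend on the Nutch version. A minimal sketch, assuming a cosine similarity over plain term counts between a reference ("goal") text and a page's text; the class and method names are illustrative and are not taken from the plugin's source:

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Illustrative only: not the plugin's actual code.
final class SimilaritySketch {

    // Count lower-cased alphanumeric tokens in the text.
    public static Map<String, Integer> termFrequencies(String text) {
        Map<String, Integer> tf = new HashMap<>();
        for (String token : text.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (!token.isEmpty()) {
                tf.merge(token, 1, Integer::sum);
            }
        }
        return tf;
    }

    // Euclidean length of a term-count vector.
    private static double norm(Map<String, Integer> v) {
        double sum = 0.0;
        for (int count : v.values()) {
            sum += (double) count * count;
        }
        return Math.sqrt(sum);
    }

    // Cosine of the angle between two term-count vectors; 0.0 if either is empty.
    public static double cosine(Map<String, Integer> a, Map<String, Integer> b) {
        double dot = 0.0;
        for (Map.Entry<String, Integer> e : a.entrySet()) {
            Integer other = b.get(e.getKey());
            if (other != null) {
                dot += (double) e.getValue() * other;
            }
        }
        double na = norm(a);
        double nb = norm(b);
        return (na == 0.0 || nb == 0.0) ? 0.0 : dot / (na * nb);
    }

    public static void main(String[] args) {
        Map<String, Integer> goal = termFrequencies("focused crawling with Apache Nutch");
        Map<String, Integer> page = termFrequencies("Apache Nutch is a web crawler");
        System.out.println(cosine(goal, page));
    }
}
```

Keeping this part free of Nutch types means only the thin filter class around it has to change between 1.x and 2.x.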

> Can somebody guide me on this?

There is currently no Nutch committer actively working on 2.x - just compare the commit history on
the master and 2.x branches.

Sebastian



On 6/28/19 12:46 PM, Gajanan Watkar wrote:

> Hi all,
> I am using Nutch 2.3.1 with Hbase-1.2.3 as storage backend on top of
> Hadoop-2.5.2 cluster in *deploy mode* with crawled data being indexed to
> solr-6.5.1.
>
> I want to add *focussed crawling capabilities to nutch 2.3.1* similar to
> one provided by *scoring-similarity plugin for nutch 1.x*.
>
> Can somebody guide me on this?
> Is there *something already available* which I could not trace?
> Can the *scoring-similarity plugin* for Nutch 1.x be *modified* to run with
> nutch 2.3.1? if yes, how?
>
> -Gajanan
>


Re: Scoring-similarity plugin for Nutch 2.3.1

Gajanan Watkar
Thanks Sebastian.
I will look for the differences and try to modify the scoring-similarity
plugin for Nutch 1.x to work with 2.x.
At first glance, most of the changes seem to revolve around the move from
CrawlDatum in 1.x to WebPage in 2.x.
Hope it works.
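
One way to keep that change contained, sketched under assumptions: write the scoring step against a small version-neutral interface, so that only an adapter needs to know about CrawlDatum or WebPage. Everything below (`ScoredPage`, `FakePage`, `applySimilarity`) is a hypothetical stand-in, not a Nutch class, and multiplying the current score by the similarity is just an illustrative choice:

```java
import java.util.function.ToDoubleFunction;

// Illustrative only: none of these types exist in Nutch.
final class PortingSketch {

    // The minimum the scorer needs from a page, whatever record type backs it.
    interface ScoredPage {
        String text();
        float score();
        void setScore(float score);
    }

    // Stand-in for an adapter that would wrap a 1.x or 2.x page record.
    static final class FakePage implements ScoredPage {
        private final String text;
        private float score;

        FakePage(String text, float score) {
            this.text = text;
            this.score = score;
        }

        @Override public String text() { return text; }
        @Override public float score() { return score; }
        @Override public void setScore(float score) { this.score = score; }
    }

    // Version-neutral scoring step: scale the page score by its similarity.
    static void applySimilarity(ScoredPage page, ToDoubleFunction<String> similarity) {
        page.setScore(page.score() * (float) similarity.applyAsDouble(page.text()));
    }

    public static void main(String[] args) {
        FakePage page = new FakePage("Apache Nutch is a web crawler", 2.0f);
        applySimilarity(page, text -> text.contains("Nutch") ? 0.5 : 0.0);
        System.out.println(page.score());
    }
}
```

With this split, porting would mostly mean writing a second adapter rather than touching the scoring code.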

-Gajanan



On Fri, Jun 28, 2019 at 8:43 PM Sebastian Nagel
<[hidden email]> wrote:

> Hi Gajanan,
>
> > Can the *scoring-similarity plugin* for Nutch 1.x be *modified* to run with
> > nutch 2.3.1? if yes, how?
>
> Eventually, yes.  Have a look at the differences of another scoring filter plugin
> between 1.x and 2.x, and try to apply those to scoring-similarity.
>
> > Can somebody guide me on this?
>
> There is currently no Nutch committer actively working on 2.x - just compare the commit history on
> the master and 2.x branches.
>
> Sebastian