Solr 1.4 and Nutch 1.0 Integration


Solr 1.4 and Nutch 1.0 Integration

Dean Del Ponte-2
I'm new to Solr, but I'm interested in setting it up to act like a google
search appliance to crawl and index my website.

It's my understanding that nutch provides the web crawling but needs to be
integrated with Solr in order to get a google search appliance type
experience.

Two questions:

1.  Is the scenario I'm outlining above possible?
2.  If it is possible, where may I find documentation describing how to set
up a Solr/Nutch instance?

Thanks for your help,

Dean Del Ponte

Re: Solr 1.4 and Nutch 1.0 Integration

Christopher Bader-4
Dean,

I'm not sure what you mean by a "Google search appliance type experience",
but you don't need Solr to create a site-specific search engine.

Nutch and Lucene are enough.

Contact us if you need a Nutch/Lucene consultant.

CB


On Wed, Jun 16, 2010 at 1:17 PM, Dean Del Ponte <[hidden email]> wrote:

> I'm new to Solr, but I'm interested in setting it up to act like a google
> search appliance to crawl and index my website.
>
> It's my understanding that nutch provides the web crawling but needs to be
> integrated with Solr in order to get a google search appliance type
> experience.
>
> Two questions:
>
> 1.  Is the scenario I'm outlining above possible?
> 2.  If it is possible, where may I find documentation describing how to
> set up a Solr/Nutch instance?
>
> Thanks for your help,
>
> Dean Del Ponte
>

Re: Solr 1.4 and Nutch 1.0 Integration

Dean Del Ponte-2
Thanks for the offer, but no funding to hire consultants!

The google search appliance,
http://www.google.com/enterprise/search/mini.html, crawls your site, indexes
it and makes the content available for search queries.  I thought this could
be done with solr and nutch as well.

I'm under the impression the solr/nutch integration can be done, but it's
not easy.

On Wed, Jun 16, 2010 at 2:13 PM, Christopher Bader <[hidden email]> wrote:

> Dean,
>
> I'm not sure what you mean by a "Google search appliance type experience",
> but you don't need Solr to create a site-specific search engine.
>
> Nutch and Lucene are enough.
>
> Contact us if you need a Nutch/Lucene consultant.
>
> CB

Re: Solr 1.4 and Nutch 1.0 Integration

dmcole
At 2:51 PM -0500 6/16/10, Dean Del Ponte wrote:
>The google search appliance,
>http://www.google.com/enterprise/search/mini.html, crawls your site, indexes
>it and makes the content available for search queries.  I thought this could
>be done with solr and nutch as well.

These features are available in the standard Nutch distribution. You
do not need to add Solr to achieve this goal.

\dmc

--
*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+
    David M. Cole                                            [hidden email]
    Editor & Publisher, NewsInc. <http://newsinc.net>        V: (650) 557-2993
    Consultant: The Cole Group <http://colegroup.com/>       F: (650) 475-8479
*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+

RE: Solr 1.4 and Nutch 1.0 Integration

Brian Tingle
In reply to this post by Dean Del Ponte-2
I've had no problems with 'bin/nutch solrindex' once the schema.xml from
the nutch source gets installed into the solr config.  I use pysolr and
django.  The results seem better than stock nutch results, and pysolr
works with django pagination.  You can do a simple solr facet based on
hostname, and I've played a bit with writing a solr filter to add more
facet values into solr at index time.
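For anyone wanting to try the hostname facet without pysolr, here is a minimal sketch using only the Python standard library. It assumes Solr is at `http://localhost:8983/solr` and that the Nutch-supplied schema.xml defines a `host` field; check both against your own install. It only builds the `/select` URL and reads facet counts out of a `wt=json` response, so the parsing can be tried without a running server.

```python
from urllib.parse import urlencode

# Assumed Solr base URL; adjust to your installation.
SOLR_URL = "http://localhost:8983/solr"


def build_facet_query(q, facet_field="host", rows=10, start=0):
    """Build a /select URL that runs `q` and facets on `facet_field`.

    `start` and `rows` are Solr's standard paging parameters, which is
    what a paginator (e.g. Django's) would end up driving.
    """
    params = [
        ("q", q),
        ("start", start),
        ("rows", rows),
        ("wt", "json"),
        ("facet", "true"),
        ("facet.field", facet_field),
        ("facet.mincount", 1),  # hide hosts with zero matches
    ]
    return SOLR_URL + "/select?" + urlencode(params)


def parse_facet_counts(response, facet_field="host"):
    """Turn Solr's flat [value, count, value, count, ...] list into a dict."""
    flat = response["facet_counts"]["facet_fields"][facet_field]
    return dict(zip(flat[0::2], flat[1::2]))


if __name__ == "__main__":
    print(build_facet_query("nutch"))
    # Hypothetical response fragment in Solr's default JSON facet layout.
    sample = {"facet_counts": {"facet_fields": {
        "host": ["www.example.com", 42, "blog.example.com", 7]}}}
    print(parse_facet_counts(sample))  # {'www.example.com': 42, 'blog.example.com': 7}
```

Fetching the URL with `urllib.request.urlopen` and `json.load` would give you the `response` dict on a live server.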

I've tried this with Nutch 1.0 on Solr 1.3 and with Nutch 1.1-trunk (from
April?) on Solr 1.4, but never Nutch 1.0 with Solr 1.4.

I use the LucidWorks Solr distributions.

Here is a blog post on how to do this using nutch 1.0 and solr 1.3
http://www.lucidimagination.com/blog/2009/03/09/nutch-solr/
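Roughly, the workflow is a crawl followed by `bin/nutch solrindex`. Below is a small Python sketch that assembles those two command lines. It assumes it runs from the Nutch install directory, with seed URLs in `./urls`, crawl output in `./crawl`, and Solr at `http://127.0.0.1:8983/solr/`. The `solrindex` argument order (Solr URL, crawldb, linkdb, segments) follows the Nutch 1.0 usage as I understand it; run `bin/nutch solrindex` with no arguments to confirm the usage for your version. By default it only prints the commands.

```python
import glob
import os
import subprocess

# Assumptions: run from the Nutch install dir, seeds in ./urls,
# crawl output in ./crawl, Solr listening at SOLR_URL.
NUTCH = os.path.join("bin", "nutch")
SOLR_URL = "http://127.0.0.1:8983/solr/"


def crawl_cmd(seed_dir="urls", crawl_dir="crawl", depth=3, top_n=50):
    """Nutch 1.x one-shot crawl: inject seeds, then `depth` fetch rounds."""
    return [NUTCH, "crawl", seed_dir, "-dir", crawl_dir,
            "-depth", str(depth), "-topN", str(top_n)]


def solrindex_cmd(crawl_dir="crawl", segments=None, solr_url=SOLR_URL):
    """Push crawled segments into Solr with `bin/nutch solrindex`."""
    if segments is None:
        # Equivalent of the shell glob crawl/segments/*
        segments = sorted(glob.glob(os.path.join(crawl_dir, "segments", "*")))
    return [NUTCH, "solrindex", solr_url,
            os.path.join(crawl_dir, "crawldb"),
            os.path.join(crawl_dir, "linkdb")] + list(segments)


def run(dry_run=True):
    # Build each command just before running it, so the segment glob
    # in solrindex_cmd sees the segments the crawl step just wrote.
    for make_cmd in (crawl_cmd, solrindex_cmd):
        cmd = make_cmd()
        print(" ".join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    run(dry_run=True)
```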

Another advantage of using solr is that you can put other non-nutch
stuff into the solr index and integrate it into one search.
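To show what putting non-Nutch content into the same index can look like, here is a sketch that builds a Solr XML `<add>` message and posts it to the `/update` handler, using only the Python standard library. The field names (`id`, `url`, `title`, `content`) are assumed to exist in the Nutch-supplied schema.xml. Any other field has to be declared there (or match a dynamic field), and any field your schema marks as required must be supplied. The example document and the Solr URL are hypothetical.

```python
import xml.etree.ElementTree as ET
from urllib.request import Request, urlopen


def solr_add_xml(docs):
    """Build a Solr <add> update message from a list of field dicts."""
    add = ET.Element("add")
    for fields in docs:
        doc = ET.SubElement(add, "doc")
        for name, value in fields.items():
            field = ET.SubElement(doc, "field", name=name)
            field.text = str(value)  # ElementTree escapes &, <, > for us
    return ET.tostring(add, encoding="unicode")


def post_update(xml_body, solr_url="http://localhost:8983/solr"):
    """POST an XML update message to Solr's /update handler."""
    req = Request(solr_url + "/update",
                  data=xml_body.encode("utf-8"),
                  headers={"Content-Type": "text/xml; charset=utf-8"})
    with urlopen(req) as resp:
        return resp.read()


if __name__ == "__main__":
    # Hypothetical record from some other system (e.g. an intranet database).
    docs = [{
        "id": "intranet-db:1234",
        "url": "http://intranet.example.com/page/1234",
        "title": "Quarterly report",
        "content": "Text pulled from the database row.",
    }]
    print(solr_add_xml(docs))
    # Against a live server you would then run:
    # post_update(solr_add_xml(docs)); post_update("<commit/>")
```

Because these documents share the schema with the Nutch-crawled pages, a single query (and the hostname facet) covers both.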


|-----Original Message-----
|From: Dean Del Ponte [mailto:[hidden email]]
|Sent: Wednesday, June 16, 2010 12:51 PM
|To: [hidden email]
|Subject: Re: Solr 1.4 and Nutch 1.0 Integration
|
|Thanks for the offer, but no funding to hire consultants!
|
|The google search appliance,
|http://www.google.com/enterprise/search/mini.html, crawls your site, indexes
|it and makes the content available for search queries.  I thought this could
|be done with solr and nutch as well.
|
|I'm under the impression the solr/nutch integration can be done, but it's
|not easy.