solr index reusable with nutch?

Thorsten Scherler-3
Hi all,

is it possible to directly use the solr index in nutch?

My client is creating a portal search based on nutch. My project is part
of this portal as well, and at the moment I prefer to go with solr instead
of nutch since it is much better for my use case.

Now the question is whether the portal search engine could use the solr
index for my part of the portal.

Can somebody point me to related documentation?

TIA

salu2
--
thorsten

"Together we stand, divided we fall!"
Hey you (Pink Floyd)


Re: solr index reusable with nutch?

Otis Gospodnetic-2
Hi,

Solr should be able to search any Lucene index, not just those created by Solr itself, as long as you configure it properly via schema.xml.  Thus, you should be able to use Solr to search an index created by Nutch.  Haven't tried it.  It would be nice if you could contribute the configuration for doing this.

Otis

----- Original Message ----
From: Thorsten Scherler <[hidden email]>
To: [hidden email]
Sent: Wednesday, December 13, 2006 8:26:51 AM
Subject: solr index reusable with nutch?


Re: solr index reusable with nutch?

Thorsten Scherler-3
On Wed, 2006-12-13 at 07:45 -0800, Otis Gospodnetic wrote:
> Hi,
>
> Solr should be able to search any Lucene index,

ok, good to know. :)

So can I assume the same is true for nutch, meaning that the index solr
creates could be used by a nutch searcher?

>  not just those created by Solr itself, as long as you configure it properly via schema.xml.  

http://wiki.apache.org/solr/SchemaXml?highlight=%28schema%29

> Thus, you should be able to use Solr to search an index created by Nutch.

In my use case I need the reverse. Nutch searches the index created by
my solr application. The application is just one component in the portal
and the portal will provide a "global" search engine which should use
the index from solr.

>  Haven't tried it.  It would be nice if you could contribute the configuration for doing this.
>

Once I figure it out, I will keep you informed.

Thanks for the feedback.

salu2

> Otis
>

Re: solr index reusable with nutch?

Chris Hostetter-3

: In my use case I need the reverse. Nutch searches the index created by
: my solr application. The application is just one component in the portal
: and the portal will provide a "global" search engine which should use
: the index from solr.

If you have a compatible schema, then it should be possible ... but if
your goal is to make an index with a biz object specific schema and then
use it as a single collection/source in a nutch installation, that may not
work ... I'm not sure how flexible Nutch is about the indexes it can
handle: it's probably a question best asked on the Nutch user list.




-Hoss


Re: solr index reusable with nutch?

Thorsten Scherler-3
On Thu, 2006-12-14 at 11:14 -0800, Chris Hostetter wrote:
> : In my use case I need the reverse. Nutch searches the index created by
> : my solr application. The application is just one component in the portal
> : and the portal will provide a "global" search engine which should use
> : the index from solr.
>
> If you have a compatible schema, then it should be possible ... but if
> your goal is to make an index with a biz object specific schema and then
> use it as a single collection/source in a nutch installation, that may not
> work ...

Yeah, that makes sense.

> I'm not sure how flexible Nutch is about the indexes it can
> handle: it's probably a question best asked on the Nutch user list.
>

Yeah, you are right.

Thanks for the feedback.

salu2
--
thorsten


Re: solr index reusable with nutch?

Thorsten Scherler-3
On Thu, 2006-12-14 at 11:14 -0800, Chris Hostetter wrote:

> : In my use case I need the reverse. Nutch searches the index created by
> : my solr application. The application is just one component in the portal
> : and the portal will provide a "global" search engine which should use
> : the index from solr.
>
> If you have a compatible schema, then it should be possible ... but if
> your goal is to make an index with a biz object specific schema and then
> use it as a single collection/source in a nutch installation, that may not
> work ... I'm not sure how flexible Nutch is about the indexes it can
> handle: it's probably a question best asked on the Nutch user list.

I did some testing with nutch searching over a solr index. As Chris
said, a "compatible schema" is the only important point on this issue.

To put it in other words, by default nutch searches the <field name="content"/>
field and returns a few default fields. So if you are not keen to write
your own nutch plugin for your custom solr schema, just make sure that
you store your main text in the field name="content". You can further
enhance the integration by using the "nutch" names for other "important"
fields.

Furthermore, I have <field name="url"/> in my schema, and it is the only
field that I see in nutch's response:

sh bin/nutch org.apache.nutch.searcher.NutchBean presidencia
Total hits: 3
 0 null//2006/209/disposition/19923-a.html

 1 null//2006/209/disposition/20246-a.html

 2 null//2006/209/disposition/20034-a.html

This is good enough for my client and me since I can transform that
afterward. :)
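A minimal schema.xml sketch of the setup described in this message, assuming the "text" and "string" field types are declared in the <types> section as in the Solr example schema. The main text goes into a field named content, which nutch searches by default, and url is the field that shows up in the nutch hit list. The field types and the indexed/stored flags are assumptions, not taken from the test above. The analyzer behind "text" should also tokenize terms in a way that nutch's query side can match, otherwise some queries may miss.

```xml
<!-- Hypothetical excerpt of a Solr schema.xml; the "text" and "string"
     field types are assumed to be defined in <types> (Solr example schema). -->
<fields>
  <!-- Main body text: nutch searches the "content" field by default -->
  <field name="content" type="text" indexed="true" stored="true"/>
  <!-- Document URL: shown by nutch in its list of hits -->
  <field name="url" type="string" indexed="true" stored="true"/>
</fields>
```

Any other business-object fields can stay in the schema. Per this thread, nutch only relies on the names it knows, so fields it does not know about are presumably just ignored by the default searcher.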

Thanks Chris and Otis for your feedback.

salu2
