Nutch/Solr question


Nutch/Solr question

Bartosz Gadzimski
Hi,

I want to build site search for a few websites (mine and my friends'),
but without access to their database data. So I will crawl them with
Nutch, and then I have 2 options:
1. index the data into Solr
2. leave it in the Nutch index

I need help weighing the advantages and disadvantages of searching with
Solr vs. Nutch, because I don't know Solr (it's hard to get the big picture).

Each site is quite small, so Solr can hold it with no problems.
In Solr I probably can't use faceted search, range queries, etc.,
because I don't have the necessary data in the schema?
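
On the faceting point: Solr facets on whatever indexed fields the documents carry, so crawl-only data limits which facets are meaningful rather than ruling faceting out. A minimal sketch of a facet request, assuming a Solr instance at a hypothetical local URL and an indexed `site` field (the field name is an assumption about the schema used with Nutch; check your own schema.xml):

```python
from urllib.parse import urlencode

# Assumption: hypothetical local Solr endpoint; adjust host, port and path.
SOLR_SELECT = "http://localhost:8983/solr/select"


def facet_query_url(terms, facet_field="site"):
    """Build a Solr select URL that returns hits plus per-value facet counts.

    facet_field is assumed to be an indexed field in the schema.
    """
    params = [
        ("q", terms),                  # the user's search terms
        ("facet", "true"),             # switch faceting on
        ("facet.field", facet_field),  # count hits per value of this field
        ("facet.mincount", "1"),       # hide values with zero hits
        ("wt", "json"),                # ask for a JSON response
    ]
    return SOLR_SELECT + "?" + urlencode(params)


print(facet_query_url("open source search"))
```

The function only builds the URL; fetching it and reading the facet counts from the response is left out here.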

In Nutch I can run one search server and use site:domain to limit
results (like Google site search), or use multiple indexes (mentioned on
the mailing list), but what about Solr?
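
For the Solr side of this, one common approach is a single index with a filter query (`fq`) that restricts results to one site, much like site:domain. A minimal sketch, assuming a hypothetical local Solr URL and a `site` field populated during indexing (both are assumptions, not taken from this thread):

```python
from urllib.parse import urlencode

# Assumption: hypothetical local Solr endpoint; adjust host, port and path.
SOLR_SELECT = "http://localhost:8983/solr/select"


def site_search_url(terms, site):
    """Build a Solr select URL limited to one site via a filter query.

    Assumes the index has a "site" field holding the domain of each page.
    """
    params = [
        ("q", terms),                # the user's search terms
        ("fq", 'site:"%s"' % site),  # filter query: only this site's pages
        ("wt", "json"),              # ask for a JSON response
    ]
    return SOLR_SELECT + "?" + urlencode(params)


print(site_search_url("contact form", "example.com"))
```

Each website's search box would pass its own domain as `site`, so all sites can share one Solr index.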

Any input highly appreciated.

Thanks,
Bartosz

Re: Nutch/Solr question

Webmaster-330
Hi,

I have the same problem: I am using Nutch but thinking about using it
with Solr.
I have configured Solr itself and am now trying to configure Nutch to
work with it.

Like you, I have no previous experience with Solr, so I used a bunch of
tutorials.
I run both Windows XP and Ubuntu Linux on my system, and so far I have
only configured Nutch/Solr on XP.
I also run a server with Ubuntu, so I might want to configure
Solr/Nutch there as well.
I only crawl about 10 websites (much like you) and intend to use the
results as a search engine for friends and colleagues.
Like you, I want to know what works better: Nutch alone or in
combination with Solr.

These links really helped me out:
http://wiki.apache.org/nutch/GettingNutchRunningWithWindows
http://wiki.apache.org/nutch/GettingNutchRunningWithUbuntu
http://wiki.apache.org/nutch/RunningNutchAndSolr

We might be able to help each other out if you have more
questions/suggestions.



Re: Nutch/Solr question

Otis Gospodnetic-2-2
In reply to this post by Bartosz Gadzimski
Solr is just a search and indexing server.  It doesn't do crawling.  Nutch does the crawling and page parsing, and can index into Lucene or into a Solr server.

Nutch is a biggish beast, and if you just need to index a site or even a small set of them, you may have an easier time with Droids.

 Otis
--
Sematext is hiring -- http://sematext.com/about/jobs.html?mls
Lucene, Solr, Nutch, Katta, Hadoop, HBase, UIMA, NLP, NER, IR



----- Original Message ----

> From: Bartosz Gadzimski <[hidden email]>
> To: [hidden email]
> Sent: Wed, November 4, 2009 10:41:14 AM
> Subject: Nutch/Solr question