Distributed search on Nutch


Boris Lau-2
Hi everyone,

Does anybody have tips on how to split up the crawl data for
distributed search using "bin/nutch server"?

I am having a problem setting up distributed search in Nutch 0.9
using Hadoop.  I followed the documentation at
http://wiki.apache.org/nutch/NutchHadoopTutorial and the search server
works okay, but how do I split up my crawl data between
different nodes (currently experimenting with about 3) to utilise
parallel search when my current index is merged into one segment?

Thanks for the help in advance!
boris

p.s. Does anyone have more references on how to set up distributed
search in Nutch? The above link is one of the only resources I can
find.  Thanks!

Re: Distributed search on Nutch

Boris Lau-2
Am I wrong in saying that for distributed searching one would
need to split the existing crawled index into even chunks and
download them locally?

Otherwise, if the index was not created in nice chunks during
crawling, is there any interface I can look at to split an
existing index into chunks?

Does anybody have any info or references on this that I can
investigate further?

Any tips would be appreciated!

Thanks
boris
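
To make the "even chunks" idea concrete, here is a minimal planning sketch.
It assumes the crawl still has several segment directories that can be
dealt out to the nodes, which may not hold once everything has been merged.
It also assumes that the search front end reads a search-servers.txt file
with one "host port" pair per line, which is my reading of the tutorial.
The segment names, host names, port and function names are all hypothetical,
and the script only plans the split; it does not copy data or build any
index.

```python
from collections import defaultdict


def assign_segments(segments, nodes):
    """Deal segment names out to nodes round-robin, in sorted order."""
    if not nodes:
        raise ValueError("at least one node is required")
    plan = defaultdict(list)
    for i, segment in enumerate(sorted(segments)):
        plan[nodes[i % len(nodes)]].append(segment)
    return dict(plan)


def search_servers_lines(nodes, port):
    """Build 'host port' lines (assumed search-servers.txt layout)."""
    return [f"{host} {port}" for host in nodes]


# Hypothetical example values, not taken from a real crawl.
segments = ["20080320101500", "20080321093000", "20080322110000",
            "20080323120000"]
nodes = ["node1", "node2", "node3"]

for host, assigned in assign_segments(segments, nodes).items():
    print(host, assigned)
print("\n".join(search_servers_lines(nodes, 1234)))
```

Each node would then need its own index built over only the segments
assigned to it before "bin/nutch server" is started there.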


On Tue, Mar 25, 2008 at 1:46 PM, Boris Lau <[hidden email]> wrote:

> Hi everyone,
>
>  Does anybody have tips on how to split up the crawl data for
>  distributed search using "bin/nutch server"?
>
>  I am having a problem setting up distributed search in Nutch 0.9
>  using Hadoop.  I followed the documentation at
>  http://wiki.apache.org/nutch/NutchHadoopTutorial and the search server
>  works okay, but how do I split up my crawl data between
>  different nodes (currently experimenting with about 3) to utilise
>  parallel search when my current index is merged into one segment?
>
>  Thanks for the help in advance!
>  boris
>
>  p.s. Does anyone have more references on how to set up distributed
>  search in Nutch? The above link is one of the only resources I can
>  find.  Thanks!
>