Distributed Lucene Directory

Cedric Ho
Hi all,

I am wondering whether there is any implementation of
org.apache.lucene.store.Directory that can be distributed across
multiple machines with performance comparable to a local FSDirectory
index, or whether such an idea is feasible in the first place.

By comparable performance I mean that a 100G index distributed across 10
machines should achieve the same performance as a 10G index on a local
FSDirectory.

I know that optimization would be a problem for such a big index, but
would the partial optimization introduced in Lucene 2.3 help?

Any thoughts?

Regards,
Cedric Ho

---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

Re: Distributed Lucene Directory

Karl Wettin
On 31 Jan 2008, at 09:42, Cedric Ho wrote:

> I am wondering whether there is any implementation of
> org.apache.lucene.store.Directory that can be distributed across
> multiple machines with performance comparable to a local FSDirectory
> index, or whether such an idea is feasible in the first place.
>
> By comparable performance I mean that a 100G index distributed across 10
> machines should achieve the same performance as a 10G index on a local
> FSDirectory.

I have never used these and don't know their caveats, but perhaps a
combination of

<http://lucene.apache.org/java/2_3_0/api/org/apache/lucene/search/RemoteSearchable.html>

and

<http://lucene.apache.org/java/2_3_0/api/org/apache/lucene/search/ParallelMultiSearcher.html>

can help you?


   karl



Re: Distributed Lucene Directory

Cedric Ho
Yes, I am aware of RemoteSearchable and ParallelMultiSearcher, and I am
doing something similar now, i.e. splitting the index across multiple
machines.
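
For anyone following along, the split-index setup described above boils
down to scatter-gather: send the query to every partial index in parallel,
then merge the per-shard top hits into one ranked list. Below is a minimal
sketch in plain Java, not the Lucene API. `ShardedSearch`, `Shard`, and `Hit`
are hypothetical names made up for illustration, standing in for whatever
remote searcher each machine exposes. It also assumes that scores from
different shards are directly comparable, which generally requires the
shards to share term statistics.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Scatter-gather sketch: query all shards in parallel, merge their top hits. */
final class ShardedSearch {

    /** One scored hit; `shard` records which partial index produced it. */
    static final class Hit {
        final int shard;
        final int doc;
        final float score;

        Hit(int shard, int doc, float score) {
            this.shard = shard;
            this.doc = doc;
            this.score = score;
        }
    }

    /** Hypothetical stand-in for the searcher running on one machine. */
    interface Shard {
        List<Hit> search(String query, int topN) throws Exception;
    }

    /** Returns the best topN hits across all shards, highest score first. */
    static List<Hit> search(List<Shard> shards, String query, int topN) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(Math.max(1, shards.size()));
        try {
            // Scatter: one task per shard, all running concurrently.
            List<Future<List<Hit>>> pending = new ArrayList<>();
            for (Shard shard : shards) {
                Callable<List<Hit>> task = () -> shard.search(query, topN);
                pending.add(pool.submit(task));
            }

            // Gather: wait for every shard and collect its hits.
            List<Hit> merged = new ArrayList<>();
            for (Future<List<Hit>> result : pending) {
                merged.addAll(result.get());
            }

            // Merge: sort by descending score; break ties by shard, then doc id.
            merged.sort(Comparator.comparingDouble((Hit h) -> h.score).reversed()
                    .thenComparingInt(h -> h.shard)
                    .thenComparingInt(h -> h.doc));
            return new ArrayList<>(merged.subList(0, Math.min(topN, merged.size())));
        } finally {
            pool.shutdown();
        }
    }
}
```

Each shard only needs to return its own topN hits, since no hit outside a
shard's local top N can appear in the global top N. A real deployment would
replace each `Shard` with a remote call and would have to handle timeouts
and failed machines, which this sketch ignores.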

But managing such a set of indexes is not trivial, especially when you
need to add redundancy for reliability and update frequently.

I bumped into this a while ago:

http://www.kimchy.org/compasslucene-and-datagrids/

Also, I've heard there is a Directory implementation for HDFS, but it is
unfortunately very slow, which makes me wonder whether this type of
approach is practical (i.e. good performance, easy index updates,
optimization that doesn't take too long, etc.).

Cedric



Re: Distributed Lucene Directory

Mark Miller-3

Cedric Ho wrote:
>
> But managing such a set of indexes is not trivial, especially when you
> need to add redundancy for reliability and update frequently.
>
Agreed. Apparently the Solr guys are working on this now. Certainly not
trivial to do right. You might want to check out that work.

I want to start a project for this functionality myself soon - but just
with Lucene. Personally, I think the only way to go is to use Jini, but
I am waiting for the first release of Apache River before getting
started (*very* soon I hope). That gets you through the 8 fallacies of
distributed computing with almost no work right off the bat:
self-discovery, leasing, redundancy, etc. with minimal effort. Hopefully I
will be able to recruit some help with this. From what I can tell, there
is a lot of roll-your-own for this type of thing out there... it would be
nice to focus some work on a system that can be used by all.

- Mark

Re: Distributed Lucene Directory

Cedric Ho
On Feb 1, 2008 9:47 AM, Mark Miller <[hidden email]> wrote:
>
> Cedric Ho wrote:
> >
> > But managing such a set of indexes is not trivial, especially when you
> > need to add redundancy for reliability and update frequently.
> >
> Agreed. Apparently the Solr guys are working on this now. Certainly not
> trivial to do right. You might want to check out that work.

Do you mean SOLR-303, Distributed Search over HTTP? It seems
quite complicated and is not ready for use yet.
As for Solr, let's just say it provides a lot of great functionality
that I don't need, while a lot of the functionality I do need is not
there. So I eventually stuck with Lucene only.

>
> I want to start a project for this functionality myself soon - but just
> with Lucene. Personally, I think the only way to go is to use Jini, but
> I am waiting for the first release of Apache River before getting
> started (*very* soon I hope). That gets you through the 8 fallacies of
> distributed computing with almost no work right off the bat:
> self-discovery, leasing, redundancy, etc. with minimal effort. Hopefully I
> will be able to recruit some help with this. From what I can tell, there
> is a lot of roll-your-own for this type of thing out there... it would be
> nice to focus some work on a system that can be used by all.

I don't know much about Jini. But I'd be willing to help if you need any. =)

