Lucene in clustered environment (Tomcat)

Lucene in clustered environment (Tomcat)

ben-91
Hi

I would like to use Lucene in a clustered environment. What are the
things that I should consider and do?

I would like to use the same ordinary index storage for all the nodes
in the cluster. Is that possible?

Thanks,
Ben

---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

Re: Lucene in clustered environment (Tomcat)

Nader Henein
IMHO, the issues that you need to consider are:

    * Atomicity of updates and deletes if you are using multiple indexes
      on multiple machines (the case if your cluster is over a wide network)
    * Scheduled comparison of the indexes against the core data, and
      sanitization (intensive)

This all depends on the volume of change on your index and on
whether you'll be using a memory-resident index or an FS index.

This should start the ball rolling. We've been using Lucene successfully
on a distributed cluster for a while now, and as long as you're aware of
some basic NDS limitations/constraints you should be fine.

Hope this helps

Nader Henein

--

Nader S. Henein
Senior Applications Architect

Bayt.com

Re: Lucene in clustered environment (Tomcat)

ben-91
My cluster is on a single machine and I am using an FS index.

I have already integrated Lucene into my web application for use in a
non-clustered environment. I don't know what I need to do to make it
work in a clustered environment.

Thanks,
Ben

On 6/7/05, Nader Henein <[hidden email]> wrote:

> This all depends on what the volume of change is on your index and
> whether you'll be using a Memory resident index or an FS index.

Re: Lucene in clustered environment (Tomcat)

Nader Henein
When you say your cluster is on a single machine, do you mean that you
have multiple webservers on the same machine, all of which search a
single Lucene index? If that's the case, your solution is simple, as
long as you persist to a single DB and then designate one of your
servers (or even another server) to update/delete the index.

Do you use Lucene as your persistent store, or do you have a DB back
there? And what is your current update/delete strategy? Real-time
inserts from the webservers directly into the index will not work,
because you can't have multiple writers. Updating a dirty flag on rows
that need to be indexed/deleted, or using a separate table for this
task, and then batching your updates would be ideal. If you're using
server-specific scheduling, I strongly recommend Quartz; it's rock
solid and really versatile.
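
A minimal sketch of the dirty-flag and batching idea above, under stated
assumptions: an in-memory queue stands in for the dirty-flag column or
pending-changes table in the DB, and the class, enum and method names
(SingleWriterBatchSketch, Op, markDirty, drainBatch) are made up for
illustration. The actual Lucene write is left as a placeholder comment.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

/** Sketch: many web nodes record changes, one designated node applies them in batches. */
final class SingleWriterBatchSketch {

    /** Kind of change recorded against a row (hypothetical names). */
    enum Op { INDEX, DELETE }

    /** One pending change: the row id and what to do with it. */
    static final class Change {
        final long rowId;
        final Op op;
        Change(long rowId, Op op) { this.rowId = rowId; this.op = op; }
    }

    // Stand-in for the dirty-flag column or pending-changes table in the DB.
    private final BlockingQueue<Change> pending = new LinkedBlockingQueue<>();

    /** Called by any web node: record the change, never touch the index. */
    void markDirty(long rowId, Op op) {
        pending.add(new Change(rowId, op));
    }

    /**
     * Called only by the designated indexer node. Drains everything pending,
     * keeps the last operation per row, and returns the batch to apply.
     */
    List<Change> drainBatch() {
        List<Change> drained = new ArrayList<>();
        pending.drainTo(drained);
        Map<Long, Change> lastPerRow = new LinkedHashMap<>();
        for (Change c : drained) {
            lastPerRow.put(c.rowId, c); // a later change for the same row wins
        }
        return new ArrayList<>(lastPerRow.values());
    }

    public static void main(String[] args) {
        SingleWriterBatchSketch log = new SingleWriterBatchSketch();
        log.markDirty(1L, Op.INDEX);
        log.markDirty(2L, Op.INDEX);
        log.markDirty(1L, Op.DELETE);
        for (Change c : log.drainBatch()) {
            // Placeholder: the real indexer would use the single index writer here.
            System.out.println(c.op + " row " + c.rowId);
        }
    }
}
```

The point of the design is that webservers only ever call markDirty, so
the index itself still has exactly one writer.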

My two cents.

Nader Henein


Re: Lucene in clustered environment (Tomcat)

ben-91
> When you say your cluster is on a single machine, do you mean that you have multiple webservers on the same machine all of which search a single Lucene index?

Yes, this is my case.

> Do you use Lucene as your persistent store or do you have a DB back there?

I use Lucene to search for data stored in a PostgreSQL server.

> what is your current update/delete strategy because real time inserts from the webservers directly to the index will not work because you can't have multiple writers.

I have to do this in real time. What are the available solutions? My
application can already do batch updates/deletes to a Lucene
index, but I would like to do this in real time.

One solution I am considering is to have each cluster node keep its own
index and to use parallel search, but this would make my application
even more complex.

> I strongly recommend Quartz, it's rock solid and really versatile.

I am using Quartz; it is really great and supports clustering.

Thanks,
Ben


Re: Lucene in clustered environment (Tomcat)

Nader Henein
I realize I've already asked you this question, but do you need 100%
real time? You could run the batches every 2 minutes. As for parallel
search, unless you really need it, it's overkill in this case; a
communal index will serve you well and will be much easier to
maintain. You have to weigh requirements against complexity and debug
time.
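
As a rough sketch of what a batch every 2 minutes could look like, here
is one way to do it with the JDK's ScheduledExecutorService (a Quartz job
would serve the same purpose). The class name PeriodicBatchSketch and its
methods are invented for illustration, and the job body is a placeholder
for the real index update.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

/** Sketch: run one index batch at a fixed delay on a single thread. */
final class PeriodicBatchSketch {

    private final ScheduledExecutorService scheduler =
            Executors.newSingleThreadScheduledExecutor();
    private final AtomicInteger runs = new AtomicInteger();

    /**
     * Schedules batchJob to run repeatedly. The delay is measured from the end
     * of one run to the start of the next, so two batches never overlap.
     */
    void start(Runnable batchJob, long delay, TimeUnit unit) {
        scheduler.scheduleWithFixedDelay(() -> {
            try {
                batchJob.run();
            } catch (RuntimeException e) {
                // An uncaught exception would cancel all future runs, so report and continue.
                System.err.println("batch failed: " + e);
            } finally {
                runs.incrementAndGet();
            }
        }, 0, delay, unit);
    }

    /** Waits until at least n runs have finished or the timeout elapses. */
    boolean awaitRuns(int n, long timeoutMillis) {
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMillis);
        while (runs.get() < n) {
            if (System.nanoTime() - deadline > 0) {
                return false;
            }
            try {
                Thread.sleep(5);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;
            }
        }
        return true;
    }

    void stop() {
        scheduler.shutdownNow();
    }

    public static void main(String[] args) {
        PeriodicBatchSketch batches = new PeriodicBatchSketch();
        // Placeholder job: the real one would apply pending updates/deletes to the index.
        batches.start(() -> System.out.println("applying pending changes"), 2, TimeUnit.MINUTES);
        batches.awaitRuns(1, 1000); // let the first run happen, then end the demo
        batches.stop();
    }
}
```

With a fixed delay, the worst-case staleness of the index is roughly the
delay plus the time one batch takes.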

Nader Henein

Re: Lucene in clustered environment (Tomcat)

ben-91
How about using JavaGroups to notify other nodes in the cluster about
the changes?

Essentially, each node would have the same index stored in a different
location. When one node updates/deletes a record, the other nodes would
get a notification about the change and update their indexes
accordingly. With this method, I don't have to modify my Lucene code; I
just need to add code to notify the other nodes. I believe this
method also scales better.
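
To make the notification idea concrete, here is a small in-process
sketch. It does not use the JavaGroups API: the Channel class below is a
hypothetical stand-in for a group channel, a map stands in for each
node's Lucene index, and all names are invented for illustration.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CopyOnWriteArrayList;

/** Sketch: every node keeps its own index copy and applies changes broadcast by the others. */
final class IndexReplicationSketch {

    /** A change notice sent to every node; text == null means "delete". */
    static final class ChangeNotice {
        final String docId;
        final String text;
        ChangeNotice(String docId, String text) { this.docId = docId; this.text = text; }
    }

    /** In-process stand-in for the group channel (hypothetical, not the JavaGroups API). */
    static final class Channel {
        private final List<Node> members = new CopyOnWriteArrayList<>();

        void join(Node node) { members.add(node); }

        void broadcast(ChangeNotice notice) {
            for (Node member : members) {
                member.onNotice(notice); // the sender receives its own notice too
            }
        }
    }

    /** One cluster node with a private index copy (a map stands in for the Lucene index). */
    static final class Node {
        private final Map<String, String> localIndex = new ConcurrentHashMap<>();
        private final Channel channel;

        Node(Channel channel) { this.channel = channel; }

        /** Application code calls these; the change reaches all nodes through the channel. */
        void update(String docId, String text) { channel.broadcast(new ChangeNotice(docId, text)); }
        void delete(String docId) { channel.broadcast(new ChangeNotice(docId, null)); }

        /** Each node is the only writer of its own index copy. */
        void onNotice(ChangeNotice notice) {
            if (notice.text == null) {
                localIndex.remove(notice.docId);
            } else {
                localIndex.put(notice.docId, notice.text);
            }
        }

        String lookup(String docId) { return localIndex.get(docId); }
    }

    public static void main(String[] args) {
        Channel channel = new Channel();
        Node a = new Node(channel);
        Node b = new Node(channel);
        channel.join(a);
        channel.join(b);
        a.update("42", "clustered lucene");
        System.out.println(b.lookup("42"));
        b.delete("42");
        System.out.println(a.lookup("42"));
    }
}
```

A node that is down when a notice is sent misses that change, so some
scheduled comparison of each index against the database is probably
still needed with this approach.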

Cheers,
Ben


Re: Lucene in clustered environment (Tomcat)

ben-91
Wouldn't it defeat the purpose of clustering if you have a single
server to manage a single index? What would happen if this server
failed?

Cheers,
Ben

On 6/8/05, Ben <[hidden email]> wrote:

> How about using JavaGroups to notify other nodes in the cluster about
> the changes?

Re: Lucene in clustered environment (Tomcat)

Nader Henein
With all of your servers on one machine, a single memory failure takes the whole thing down. But you're right: we keep an independent Lucene index next to each of our webservers, one per machine. All of them are updated from a central location, driven by an application that reads our persistent store (an Oracle database) and creates XML files, which are then copied to each of the Lucene servers and indexed. If the central utility fails, the backup kicks in; at worst, the indices are out of date for as long as it takes to point the webservers to the Oracle standby.

I wrote a preliminary paper about Lucene strategies in a clustered environment (I will send it to you separately, because the mailing list doesn't allow attachments). It is about 6 months old; I've come a long way since, and I'm finalizing a newer version that I hope to publish as a solid case study for anyone out there taking that step. Once again, this paper is old, but it should get you going.
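
A minimal sketch of the central fan-out described above, under stated assumptions: the class names (IndexChange, NodeIndex, CentralIndexUpdater) are hypothetical, and a plain Map stands in for each node's Lucene index so the example runs without Lucene, XML files, or a database. It only illustrates the shape of "one central utility builds a batch, every node applies the same batch"; it is not the setup actually used in the post.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** One change exported from the persistent store (hypothetical shape). */
final class IndexChange {
    final String docId;
    final String body; // null means "delete this document"

    IndexChange(String docId, String body) {
        this.docId = docId;
        this.body = body;
    }
}

/** Stand-in for the Lucene index that sits next to one web server. */
final class NodeIndex {
    final String name;
    final Map<String, String> docs = new LinkedHashMap<>();

    NodeIndex(String name) {
        this.name = name;
    }

    /** Applies a whole batch in order; on a real node this would be the only writer. */
    void apply(List<IndexChange> batch) {
        for (IndexChange change : batch) {
            if (change.body == null) {
                docs.remove(change.docId);
            } else {
                docs.put(change.docId, change.body); // add or replace
            }
        }
    }
}

/** Central utility: pushes the same batch to every registered node. */
final class CentralIndexUpdater {
    private final List<NodeIndex> nodes = new ArrayList<>();

    void register(NodeIndex node) {
        nodes.add(node);
    }

    /** Returns the names of nodes that failed, so the caller can retry them later. */
    List<String> distribute(List<IndexChange> batch) {
        List<String> failed = new ArrayList<>();
        for (NodeIndex node : nodes) {
            try {
                node.apply(batch);
            } catch (RuntimeException e) {
                failed.add(node.name); // this node stays stale until the next run
            }
        }
        return failed;
    }
}
```

Because every node receives the identical batch in the same order, the indices converge as long as each batch is eventually applied everywhere; a node that misses a batch is merely out of date, not corrupt.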

Nader Henein



Ben wrote:

>Wouldn't it defeat the purpose of clustering if you have a single
>server to manage a single index? What would happen if this server
>failed?
>
>Cheers,
>Ben
>
>On 6/8/05, Ben <[hidden email]> wrote:
>  
>
>>How about using JavaGroups to notify other nodes in the cluster about
>>the changes?
>>
>>Essentially, each node has the same index stored in a different
>>location. When one node updates/deletes a record, other nodes will get
>>a notification about the changes and update their index accordingly?
>>By using this method, I don't have to modify my Lucene code, I just
>>need to add additional code to notify other nodes. I believe this
>>method also scales better.
>>
>>Cheers,
>>Ben
>>
>>
>
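
Ben's quoted suggestion (each node keeps its own copy of the index and notifies the others of every update or delete) could look roughly like the sketch below. Assumptions are labeled loudly: ChangeEvent, IndexChangeListener, ChangeBus and ClusterNode are hypothetical names, the bus is an in-memory stand-in for whatever group-messaging library carries the notifications, and a Map stands in for each node's Lucene index.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Message a node broadcasts when a record changes (hypothetical shape). */
final class ChangeEvent {
    final String docId;
    final String body; // null means "delete"

    ChangeEvent(String docId, String body) {
        this.docId = docId;
        this.body = body;
    }
}

/** Callback invoked on every member of the group, including the sender. */
interface IndexChangeListener {
    void onChange(ChangeEvent event);
}

/** In-memory stand-in for the cluster channel; delivery order equals send order. */
final class ChangeBus {
    private final List<IndexChangeListener> members = new ArrayList<>();

    void join(IndexChangeListener member) {
        members.add(member);
    }

    void broadcast(ChangeEvent event) {
        for (IndexChangeListener member : members) {
            member.onChange(event);
        }
    }
}

/** One web server with its own local index copy. */
final class ClusterNode implements IndexChangeListener {
    final Map<String, String> localIndex = new HashMap<>();
    private final ChangeBus bus;

    ClusterNode(ChangeBus bus) {
        this.bus = bus;
        bus.join(this);
    }

    /** The node does not write locally first; it applies its own event on receipt. */
    void save(String docId, String body) {
        bus.broadcast(new ChangeEvent(docId, body));
    }

    void delete(String docId) {
        bus.broadcast(new ChangeEvent(docId, null));
    }

    @Override
    public void onChange(ChangeEvent event) {
        if (event.body == null) {
            localIndex.remove(event.docId);
        } else {
            localIndex.put(event.docId, event.body);
        }
    }
}
```

The sketch sidesteps the hard parts of this approach: a node that is down when an event is sent misses it, so some periodic comparison against the database is still needed to bring stale indices back in line.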

--
Nader S. Henein
Senior Applications Developer

Bayt.com




