MultiFieldQueryParser doesn't properly filter out documents when the query string specifies to exclude certain terms


Scott Sellman
I am not sure if this is a problem with Lucene or if I am building my
Query object improperly.  It seems to me, when performing a search that
should exclude certain terms, MultiFieldQueryParser doesn't filter out
documents when it should.  Consider the following example to clarify
what I am talking about.

 

Say the index contains a document with two fields: title and
description. The value stored in the title field is "chocolate shoes"
and the value in description is "hazardous candy".  

 

If I pass in the query "+chocolate -hazardous", then the
document mentioned above IS returned as a result.  I did a little
investigation and noticed that the document is filtered out of the
search results only if the term "hazardous" exists in every single
field covered by the MultiFieldQueryParser.

 

Other queries such as "+chocolate" or "+chocolate +hazardous" seem to
work fine.  

 

One note: I did notice the following text in the FAQ section of the
lucene website: "Also MultiFieldQueryParser builds queries that
sometimes behave unexpectedly, namely for AND queries: it requires alls
terms to appear in all field. This is not what one typically wants, for
example in a search over "title" and "body" fields (Lucene 1.9 fixes
this problem)." - it seems some problems have been noticed in the
past; perhaps not all use cases were fixed.

 

I am currently using Lucene 2.0.0.

 

Here is the code I am using to build the Query object:

BooleanQuery q = new BooleanQuery();

String[] fields = new String[]{ "name", "description" };

Query keywordQuery = MultiFieldQueryParser.parse(keywords, fields,
        new BooleanClause.Occur[]{ BooleanClause.Occur.SHOULD,
                                   BooleanClause.Occur.SHOULD },
        new StandardAnalyzer());

q.add(keywordQuery, BooleanClause.Occur.MUST);

 

 

Any help or suggestions are appreciated,

Scott Sellman

 


Re: MultiFieldQueryParser doesn't properly filter out documents when the query string specifies to exclude certain terms

Daniel Naber-5
On Tuesday 19 December 2006 23:05, Scott Sellman wrote:

> new BooleanClause.Occur[]{BooleanClause.Occur.SHOULD, BooleanClause.Occur.SHOULD}

Why do you explicitly specify these operators?

> q.add(keywordQuery, BooleanClause.Occur.MUST);

You seem to wrap a query in another BooleanQuery. As long as keywordQuery
is the only query, that doesn't seem to make sense. Please try using the
MultiFieldQueryParser's constructor, not the static method. I think that
might fix your problem.

Regards
 Daniel

--
http://www.danielnaber.de

---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]


RE: MultiFieldQueryParser doesn't properly filter out documents when the query string specifies to exclude certain terms

Adam Fleming
In reply to this post by Scott Sellman

Hello Gentlemen (+Ladies?),

I'm integrating Lucene into a Spring web-app, and have found a plethora of great web + print resources to make the integration quick and seamless.  One thing that I have been hard-pressed to find is a good solution for rebuilding the index on a regular basis.  

I'm curious if you know of a best practice (or have found something that works for you personally) for rebuilding a Lucene index w/o service interruptions.  The assumptions are a Spring IoC container w/ an IndexFactory bean.  I have the project configured to work with both FSDirectory and RAMDirectory implementations.  If you don't know Spring, feel free to ignore the details - I'll adapt your comments to my code :)

So far I tried rebuilding the index on a regular schedule, but foolishly only added duplicate documents to an existing index.  

Things I have considered are
 - Using two index directories, and rebuilding one while the other is
   in use + switching when the rebuilt index is ready.  This would
   cause the app to alternate between two indexes.  
 - Using a single index, and iterating over the index entirely,
   deleting documents 1 by 1 and re-adding them with fresh data
 - Using a single index, and deleting ALL the documents at once
   and then adding them all back as quickly as possible.


All of my proposed ideas seem to fly in the face of Lucene's simplicity, and I would be very thankful to be pointed in the right direction.


Happy Holidays and  a big Thank You to the active list users,


Adam Fleming



Re: MultiFieldQueryParser doesn't properly filter out documents when the query string specifies to exclude certain terms

Erick Erickson
My first question is how many documents would you be deleting on a pass for
option 2? If it's 10 documents out of 10,000, I'd consider just deleting
them and re-adding (see IndexModifier).

Personally, if possible, I prefer your first option: building a completely
new index and switching between them. This is especially useful if something
catastrophic happens to the index as you build it and it winds up being
unusable (power failures *do* happen). You can keep using your old index and
be happy.

Another question is how quickly the index builds and how soon your users
require up-to-date data.

And remember that no matter what, you must re-open your searcher to see the
updates.

I'd be really reluctant to remove all the items and re-build the index for
several reasons...
1> You wouldn't get the new data being added until you closed/reopened your
searcher.
2> The documents you deleted wouldn't be "gone" until you closed/reopened
your searcher.
3> In the interim, your users wouldn't have access to much of anything....

Best
Erick


RE: MultiFieldQueryParser doesn't properly filter out documents when the query string specifies to exclude certain terms

Scott Sellman
In reply to this post by Daniel Naber-5
> Please try using the MultiFieldQueryParser's constructor, not the static method. I think that might fix your problem.

Yes, after creating a new MultiFieldQueryParser and calling its parse(String query) method, my search executed as expected.
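For reference, the instance-based version looks roughly like this (a sketch only - assuming Lucene 2.0.x and the field names from my earlier post; the instance parser expands each clause across all fields before applying +/-, so "-hazardous" excludes a document if the term appears in either field):

```java
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.queryParser.MultiFieldQueryParser;
import org.apache.lucene.search.Query;

// Sketch only: the instance parser expands each clause across all
// fields first, producing roughly
//   +(name:chocolate description:chocolate)
//   -(name:hazardous description:hazardous)
MultiFieldQueryParser parser = new MultiFieldQueryParser(
        new String[]{ "name", "description" }, new StandardAnalyzer());
Query keywordQuery = parser.parse("+chocolate -hazardous");
```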

Thanks for your help!
Scott

>> BooleanClause.Occur[]{BooleanClause.Occur.SHOULD, BooleanClause.Occur.SHOULD}

>Why do you explicitly specify these operators?

I am using the parse(String query, String[] fields, BooleanClause.Occur[] flags, Analyzer analyzer) method as opposed to parse(String[] queries, String[] fields, Analyzer analyzer).  They seem to have the same result.

>> q.add(keywordQuery, BooleanClause.Occur.MUST);

>You seem to wrap a query in another BooleanQuery. As long as keywordQuery
>is the only query that doesn't seem to make sense.

I am adding additional Query objects later on in my code.


Re: Rebuilding index on a regular basis

Scott Sellman
In reply to this post by Erick Erickson
Note: I have changed the title of this thread to match its content

I am currently facing a similar issue.  I am dealing with a large index
that is constantly used and needs to be updated on a daily basis.  For
fear of corruption I would rather rebuild the index each time,
performing tests against it before using it.  However, the problem I am
having is swapping in the new index without causing a service
interruption.  As long as queries are being made against the index I am
running into locking issues with the index files, preventing me from
putting the new index in place. Any suggestions?

Thanks,
Scott


Re: Rebuilding index on a regular basis

Erick Erickson
Why not switch where the searchers look rather than copy the index and
restart? That is, your searcher is pointing at index1, and you build the new
one in a new dir (index2). On some signal, your server closes the searcher
pointing to index1 and opens one pointing to index2 and uses that until
tomorrow, when you do the opposite.

You could even warm up the searcher after you open it but before you start
searching with it if you wanted.

Or, if you are using Linux, say, your index directory could be a symlink and
your process would be
1> build/test the new index
2> shut down the server
3> switch the symlink to point at the new index directory
4> start the server.

You'd still have a small interruption for your users, but we're probably
talking 2 seconds plus however long it takes you to stop/start your
server.....

Erick



Re: Rebuilding index on a regular basis

Patrek
In reply to this post by Scott Sellman
Hi,

How about this:

1) Copy the files that make up your index into a new folder.
2) Update your index in that new folder (forcing the lock if necessary; old locks
will not be valid).
3) When the update is complete, close your readers and open them on the new
index.
4) Copy the fresh index files back to the previous location for the next round,
where you won't need the initial copy to a fresh folder.

That way, you won't have to reindex all your documents (assuming only a
small subset needs updating) and will be able to switch to a more up-to-date
index more easily and more often.

Patrick



RE: Rebuilding index on a regular basis

Adam Fleming
In reply to this post by Scott Sellman

Hi Erick,

Thanks for the suggestion of using 2 indexes.  The number of documents is small - about 2000 - and the index builds quickly - about 3s from a database.  I am currently trying to rebuild every 2 minutes, but could probably reduce that to every 5.  It could be as long as every 10 minutes, but that's about the limit.

Thanks,

Adam





RE: Rebuilding index on a regular basis

Adam Fleming
In reply to this post by Scott Sellman

Hi Patrick,

Thanks for the thoughtful responses.  I am not a pro with Searchers yet, but it seems like closing + opening searchers would still result in a small period of unserviceability.  I would also like to stick to the Directory API so that I can keep the option to use FS or RAM based indexes.  

I think a slight extension to this idea may really do the trick.  It's as follows:

1. Create Index A + Reader A.
2. set CurrentReader = A
3. After time interval T, build Index B + Reader B.
4. set CurrentReader = B
5. After time interval T, rebuild Index A + Reopen A
6. set CurrentReader = A

etc.

The advantage here being that the '=' operation is atomic and indivisible - the currentReader variable always points to a valid and up-to-date index.  Although this system doesn't GUARANTEE there won't be a service interruption, in practice, if T is long enough, there shouldn't be a problem.
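A Lucene-free sketch of what I mean (illustrative names only - a plain String stands in for whatever reader/searcher type the app uses): publishing the current reader through a java.util.concurrent.atomic.AtomicReference makes each switch in steps 2, 4, and 6 a single atomic write that request threads observe immediately.

```java
import java.util.concurrent.atomic.AtomicReference;

// Illustrative sketch only, not Lucene API: requests read the reference
// once per query, so each request sees either the old or the new index,
// never a half-switched state.
public class ReaderSwap {
    static final AtomicReference<String> currentReader =
            new AtomicReference<String>("indexA");

    static String search(String query) {
        String reader = currentReader.get(); // one atomic read per request
        return reader + ":" + query;
    }

    public static void main(String[] args) {
        System.out.println(search("chocolate")); // served by index A
        currentReader.set("indexB");             // the atomic '=' switch
        System.out.println(search("chocolate")); // served by index B
    }
}
```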


Thoughts?  I'm curious if this solution reflects a misunderstanding about the way Lucene works.


Thanks,

Adam





Re: Rebuilding index on a regular basis

Erick Erickson
Adam:

I think you're worrying about the wrong thing. There is no "period of
unserviceability" to worry about in closing/reopening a searcher - if, by
"searcher", you mean the Lucene IndexSearcher/Reader. If you're
talking about shutting down your service, that's another story.

What you *do* have to think about is whether your service is single or
multi-threaded. If it's single-threaded, somewhere in your process, you have
something like

get request
service request
return response

So as part of, say, the service request part, you close/open a searcher. No
service unavailability here, perhaps a slight delay if you haven't warmed up
your searcher.

If you're multi-threaded, you probably have to make sure that all your
threads are waiting around when you close/open your searcher, since you
*should* be using a single (static) searcher across all your threads. Again,
no unserviceability, but perhaps a small delay depending upon how long it
takes all the threads to finish servicing their requests.....

All that said, it's less important than the fact that when you do reopen
your searcher, you'll get a slow response for that request only as the
searcher builds up its caches. You can avoid this by having your searcher
execute a query or two before you switch it in.

But I wouldn't worry about this until you demonstrate to your satisfaction
that there's actually a problem. Go with "the simplest thing that could
possibly work", analyze any problems, and fix it up. As you say, this index
really isn't very big. Not if it builds in 3 seconds. I rather doubt that
you have to do anything at all fancy to get satisfactory performance. But
what do I know <G>?

Ditto for worrying about FS or RAM based searcher. Don't bother trying the
RAM solution (as it's going to be more complex) until you know an FS based
index won't work. Especially since the FS based index largely *is* a RAM
based index when you consider caching. NOTE: when you're measuring things,
ignore the time it takes to service the *first* request since that'll be
misleading.

Anyway, hope all this helps
Erick

>