Controlling index file name


021336
We use Lucene to create simple data stores that we deploy with our application.  Our application also supports auto-updating, and we refresh these data stores monthly.  Because Lucene computes the names of the index files, each update deploys new files while the old files stay behind and needlessly take up space.

The options I see to resolve this are:
- After each update, check all the data store locations and delete CFS files that have a date earlier than the segments files, or
- Create the index files with a consistent name so that we overwrite existing files with new files during each auto-update.
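
A minimal sketch of the first option, using only java.io and no Java 5 features (so it should also compile on Java 1.4). The class name, the method name, and the marginMillis parameter are hypothetical. The margin is an assumption added here: Lucene writes a segment's data before the segments file that references it, so live .cfs files may also carry a slightly earlier timestamp than the segments file, and a plain "earlier than" test could remove them. It also assumes the updater does not preserve old timestamps on the files it deploys.

```java
import java.io.File;

public class StaleIndexFileCleaner {

    /**
     * Deletes .cfs files in indexDir whose last-modified time is more than
     * marginMillis older than the newest "segments*" file.
     * Returns the number of files deleted.
     */
    public static int deleteStaleCfsFiles(File indexDir, long marginMillis) {
        File[] files = indexDir.listFiles();
        if (files == null) {
            return 0; // not a directory, or it could not be read
        }

        // Timestamp of the newest file whose name starts with "segments".
        long newestSegments = -1L;
        for (int i = 0; i < files.length; i++) {
            File f = files[i];
            if (f.isFile() && f.getName().startsWith("segments")) {
                newestSegments = Math.max(newestSegments, f.lastModified());
            }
        }
        if (newestSegments < 0) {
            return 0; // no segments file found: leave everything untouched
        }

        // Remove only .cfs files that are clearly older than that timestamp.
        long cutoff = newestSegments - marginMillis;
        int deleted = 0;
        for (int i = 0; i < files.length; i++) {
            File f = files[i];
            if (f.isFile() && f.getName().endsWith(".cfs")
                    && f.lastModified() < cutoff && f.delete()) {
                deleted++;
            }
        }
        return deleted;
    }
}
```

With a monthly refresh, a margin of about one day (24L * 60 * 60 * 1000) would keep the files of the current index while removing last month's leftovers. This is a timestamp heuristic, not a check of which files the index actually references.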

I found one thread discussing this using "AddIndexes" but I could not follow it through to implementation.  Also, we are using Java 1.4 not 1.5.

Any suggestions?

Re: Controlling index file name

Bhavin Pandya
I also faced the same problem in the past.
But in my case the index size was not an issue, so I maintained two folders,
"newindex" and "oldindex", and swapped them at every update.
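
One possible reading of this two-folder approach, as a sketch: the refreshed index is unpacked into one folder while the application keeps using the other, and the two names are then exchanged. The class name, the method, and the temporary name "swap.tmp" are hypothetical, and the code assumes both folders live under the same parent directory on the same file system.

```java
import java.io.File;
import java.io.IOException;

public class IndexFolderSwap {

    /** Exchanges the "newindex" and "oldindex" folders under parent. */
    public static void swap(File parent) throws IOException {
        File newIndex = new File(parent, "newindex");
        File oldIndex = new File(parent, "oldindex");
        File temp = new File(parent, "swap.tmp"); // hypothetical scratch name

        if (temp.exists()) {
            throw new IOException("Leftover from an earlier swap: " + temp);
        }
        // Three renames: old -> temp, new -> old, temp -> new.
        rename(oldIndex, temp);
        rename(newIndex, oldIndex);
        rename(temp, newIndex);
    }

    private static void rename(File from, File to) throws IOException {
        // File.renameTo reports failure through its return value only.
        if (!from.renameTo(to)) {
            throw new IOException("Could not rename " + from + " to " + to);
        }
    }
}
```

The three renames are not atomic, and on some platforms a rename fails while files in the folder are open, so any searcher on either folder would presumably need to be closed before calling swap. Because the stale folder is replaced as a whole at the next update, leftover files do not accumulate, at the cost of keeping two copies on disk.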

-Bhavin pandya

----- Original Message -----
From: "021336" <[hidden email]>
To: <[hidden email]>
Sent: Tuesday, April 01, 2008 9:44 PM
Subject: Controlling index file name


> --
> View this message in context:
> http://www.nabble.com/Controlling-index-file-name-tp16418709p16418709.html
> Sent from the Lucene - Java Users mailing list archive at Nabble.com.
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [hidden email]
> For additional commands, e-mail: [hidden email]

Re: Controlling index file name

Anshum
Hi,

I guess (though I'm not quite sure) that you are looking for a way to
index incrementally, i.e. to update an existing index; there is a lot
of information available on that.
What I would suggest is deleting the outdated documents from the current
index using deleteDocuments
(http://lucene.apache.org/java/2_3_1/api/org/apache/lucene/index/IndexWriter.html#deleteDocuments(org.apache.lucene.index.Term))
and then adding the same (but updated) docs back into the index. You
could periodically merge the indexes (using Lucene's index merge) so
that the incremental index does not grow to a huge size.
To search the indexes, you could use a MultiSearcher over the incremental
index and the base index until they are merged.
It is important to mention that for this you would need to design some
way to look up and compile a list of modified docs (to be newly indexed
or reindexed) and pass it to Lucene while indexing.
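
A rough sketch of one way to compile that list, with no Lucene calls involved: keep a snapshot of document id to version stamp from the previous run and compare it with the current one. The class, the method names, and the choice of a Long stamp (for example a last-modified time or a checksum) are all assumptions for illustration. The code uses generics for readability; on Java 1.4 the same logic would need raw collections and casts.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

public class ChangedDocFinder {

    /** Ids that are new, or whose stamp differs from the previous run. */
    public static List<String> toReindex(Map<String, Long> previous,
                                         Map<String, Long> current) {
        List<String> ids = new ArrayList<String>();
        for (Map.Entry<String, Long> e : current.entrySet()) {
            Long before = previous.get(e.getKey());
            if (before == null || !before.equals(e.getValue())) {
                ids.add(e.getKey());
            }
        }
        return ids;
    }

    /** Ids that existed in the previous run but are no longer present. */
    public static List<String> toDelete(Map<String, Long> previous,
                                        Map<String, Long> current) {
        List<String> ids = new ArrayList<String>();
        for (String id : previous.keySet()) {
            if (!current.containsKey(id)) {
                ids.add(id);
            }
        }
        return ids;
    }
}
```

Each id from toReindex would then be removed with deleteDocuments(Term) and its updated document added again, and each id from toDelete would only be removed. This presumes every document was indexed with a unique, untokenized id field, which the thread does not confirm.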

--
Anshum


On Thu, 2008-04-03 at 18:27 +0530, Bhavin Pandya wrote:

> I also faced the same problem in the past.
> But in my case the index size was not an issue, so I maintained two folders,
> "newindex" and "oldindex", and swapped them at every update.
>
> -Bhavin pandya