Updating Lucene Index with Unstored fields

Updating Lucene Index with Unstored fields

philipc
Hi,

I'm trying to add a new field to all the documents in a Lucene index.
After searching around, I found that the only way to do an update
is to retrieve the old document, update it, delete it, and then re-add
it to the index.

However, this only preserves the stored fields.
I've lost all the unstored fields from the documents.
Is there any way to keep the unstored fields as well?

Or is there any way to work around the problem,
i.e., any way to export the entire index to a CSV file,
update the CSV, and then import it back?

 - Philip

Re: Updating Lucene Index with Unstored fields

Andrzej Białecki-2
Here's an idea: create an index consisting of documents with just this
field, adding documents in exactly the same order as they are in the
other index. Then use ParallelReader to access both indexes at the same
time - ParallelReader will present a merged view of both indexes. You
can also use IndexWriter.addIndexes() to create a merged index.
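
To illustrate the alignment requirement above, here is a toy model in plain Java. It is only a sketch: it uses ordinary collections instead of Lucene classes, and the names ParallelViewSketch, mergedView and doc are hypothetical. It shows why the second index has to hold its documents in exactly the same order as the first: the merged view pairs entries purely by position (document number).

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Toy model only: plain Java collections stand in for two Lucene indexes.
// No Lucene classes are used; all names here are hypothetical.
class ParallelViewSketch {

    // Merge two "indexes" whose entries are aligned by position (document number).
    // Entry i of the result holds the fields of mainIndex[i] plus those of extraIndex[i].
    static List<Map<String, String>> mergedView(List<Map<String, String>> mainIndex,
                                                List<Map<String, String>> extraIndex) {
        // Pairing by position only makes sense if both sides hold the same
        // number of documents in the same order.
        if (mainIndex.size() != extraIndex.size()) {
            throw new IllegalArgumentException("indexes must contain the same number of documents");
        }
        List<Map<String, String>> merged = new ArrayList<>();
        for (int docId = 0; docId < mainIndex.size(); docId++) {
            Map<String, String> doc = new LinkedHashMap<>(mainIndex.get(docId));
            doc.putAll(extraIndex.get(docId)); // the new field joins the existing ones
            merged.add(doc);
        }
        return merged;
    }

    // Helper that builds a one-field document.
    static Map<String, String> doc(String field, String value) {
        Map<String, String> d = new LinkedHashMap<>();
        d.put(field, value);
        return d;
    }

    public static void main(String[] args) {
        List<Map<String, String>> mainIndex = new ArrayList<>();
        mainIndex.add(doc("title", "first"));
        mainIndex.add(doc("title", "second"));

        List<Map<String, String>> extraIndex = new ArrayList<>();
        extraIndex.add(doc("category", "a"));
        extraIndex.add(doc("category", "b"));

        // Each merged document carries both the old and the new field.
        for (Map<String, String> d : mergedView(mainIndex, extraIndex)) {
            System.out.println(d);
        }
    }
}
```

If a document were skipped or reordered while building the second list, every later entry would be paired with the wrong document, which is why the order matters so much in this approach.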


--
Best regards,
Andrzej Bialecki     <><
  ___. ___ ___ ___ _ _   __________________________________
[__ || __|__/|__||\/|  Information Retrieval, Semantic Web
___|||__||  \|  ||  |  Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com


---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]


Re: Updating Lucene Index with Unstored fields

philipc
Thanks for your quick reply.
I'm trying to use your method, but I'm running into a NullPointerException in IndexWriter.addIndexes().

Code sample (isearcher is an IndexSearcher; newValues is an IndexWriter backed by a RAMDirectory):

    <snip>
    // Combine the existing index with the index that holds the new field.
    ParallelReader preader = new ParallelReader();
    preader.add(isearcher.getIndexReader());
    preader.add(new IndexSearcher(newValues.getDirectory()).getIndexReader());

    // Print every merged document.
    int numdoc = preader.numDocs();
    for (int i = 0; i < numdoc; i++) {
        Document d = preader.document(i);
        System.out.println(d.toString());
    }

    // Write the merged view into the target index.
    writer.addIndexes(new IndexReader[]{preader});
    <snip>
This code works fine up to the addIndexes line:
it prints the merged index properly,
but addIndexes throws a NullPointerException.



java.lang.NullPointerException
        at org.apache.lucene.index.ParallelReader$ParallelTermPositions.seek(ParallelReader.java:358)
        at org.apache.lucene.index.ParallelReader$ParallelTermDocs.seek(ParallelReader.java:320)
        at org.apache.lucene.index.SegmentMerger.appendPostings(SegmentMerger.java:327)
        at org.apache.lucene.index.SegmentMerger.mergeTermInfo(SegmentMerger.java:298)
        at org.apache.lucene.index.SegmentMerger.mergeTermInfos(SegmentMerger.java:272)
        at org.apache.lucene.index.SegmentMerger.mergeTerms(SegmentMerger.java:236)
        at org.apache.lucene.index.SegmentMerger.merge(SegmentMerger.java:89)
        at org.apache.lucene.index.IndexWriter.addIndexes(IndexWriter.java:605)

I was using Lucene 1.9.1, but there is a known bug for this, so I updated to Lucene 2.0.0.
The result is still the same.

Thanks in advance,
  Philip


