Updating documents in index with some fields not stored

Chun Wei Ho
I would like to update some values in my large index. I understand
that to do so I have to delete and re-insert each document being
changed. However, I have some large fields that are unstored (indexed
only; and no, these are not the fields I want to change), which means
I can't easily re-insert the documents. I would like to find out:

(1) Is it possible to just add an index for a field to an existing
index? My change is that I have a field that is stored but not
indexed, and I would now like to index it. If that can be done, it
would be much more convenient than deleting and re-inserting every
document.

(2) I understand Luke is able to reconstruct a field so that the
document can be re-inserted. Can someone give me a hint on how it's
done, and whether it's potentially too time-consuming for a large
index (up to a million docs and too many terms to count)?

Thanks a lot. Any help would be much appreciated.

---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

Re: Updating documents in index with some fields not stored

Andrzej Białecki-2
Chun Wei Ho wrote:
> (2) I understand Luke is able to reconstruct a field so that the
> document can be re-inserted. Can someone give me a hint on how it's
> done, and whether it's potentially too time-consuming for a large
> index (up to a million docs and too many terms to count)?

Luke simply iterates over all terms in the index, collects those that
occur in the selected document together with their positions, and then
builds an array of terms, inserting each one at its correct position.
If there are gaps in the positions, it inserts nulls.

For a large index with many terms this could take a long time (an
hour?). Whether it's a viable option for you depends on the value you
put on that document's data, and how often you need to do this ...
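
The term iteration described above can be sketched roughly as follows. This is not Luke's code and it does not use the Lucene API: the index is modelled as a hypothetical in-memory map (term -> document id -> positions), and the names PostingsReconstructor, toyIndex and reconstruct are made up for illustration. It also shows why the cost grows with the number of distinct terms: every term has to be visited, even those that never occur in the selected document.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

final class PostingsReconstructor {

    // Hypothetical stand-in for an inverted index on one field:
    // term -> (document id -> positions of the term in that document).
    static Map<String, Map<Integer, int[]>> toyIndex() {
        Map<String, Map<Integer, int[]>> index = new HashMap<>();
        index.put("quick", Map.of(0, new int[] {1}, 1, new int[] {0}));
        index.put("brown", Map.of(0, new int[] {2}));
        index.put("fox", Map.of(0, new int[] {4}, 1, new int[] {1, 3}));
        return index;
    }

    // Visit every term in the index, keep only the postings for docId,
    // and place each term at its recorded positions. Slots that no term
    // claims (position gaps) stay null.
    static String[] reconstruct(Map<String, Map<Integer, int[]>> index, int docId) {
        // First pass: find the highest position used by this document.
        int maxPos = -1;
        for (Map<Integer, int[]> postings : index.values()) {
            int[] positions = postings.get(docId);
            if (positions == null) {
                continue; // term does not occur in this document
            }
            for (int p : positions) {
                maxPos = Math.max(maxPos, p);
            }
        }

        // Second pass: drop each term into its slots.
        String[] tokens = new String[maxPos + 1];
        for (Map.Entry<String, Map<Integer, int[]>> entry : index.entrySet()) {
            int[] positions = entry.getValue().get(docId);
            if (positions == null) {
                continue;
            }
            for (int p : positions) {
                tokens[p] = entry.getKey();
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        // Document 0 has no term at positions 0 and 3, so those slots are null.
        System.out.println(Arrays.toString(reconstruct(toyIndex(), 0)));
    }
}
```

In this toy data, document 0 has gaps at positions 0 and 3 (for example where an analyzer dropped a stop word), which is where the nulls end up.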

--
Best regards,
Andrzej Bialecki     <><
 ___. ___ ___ ___ _ _   __________________________________
[__ || __|__/|__||\/|  Information Retrieval, Semantic Web
___|||__||  \|  ||  |  Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com



Re: Updating documents in index with some fields not stored

Andrzej Białecki-2
Chun Wei Ho wrote:
> (2) I understand Luke is able to reconstruct a field so that the
> document can be re-inserted. Can someone give me a hint on how it's
> done, and whether it's potentially too time-consuming for a large
> index (up to a million docs and too many terms to count)?

Ah, I forgot to mention: when that function in Luke was written, term
vector support was still a bit shaky. Nowadays, if the index was created
with term vectors that include positions, I would rather use that
information to reconstruct the doc. This should be very quick.
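
A minimal sketch of the term-vector route, again without the Lucene API. It assumes the term vector of one document and one field has already been read into a plain map (term -> positions); the names TermVectorReconstructor, reconstruct and joinTokens are hypothetical. The point is that only the document's own terms are touched, so the work no longer depends on how many terms the whole index holds.

```java
import java.util.Map;
import java.util.StringJoiner;

final class TermVectorReconstructor {

    // Rebuild the token sequence of a single document field from its
    // term vector: term -> positions at which the term occurs.
    static String[] reconstruct(Map<String, int[]> termVector) {
        int maxPos = -1;
        for (int[] positions : termVector.values()) {
            for (int p : positions) {
                maxPos = Math.max(maxPos, p);
            }
        }
        String[] tokens = new String[maxPos + 1];
        for (Map.Entry<String, int[]> entry : termVector.entrySet()) {
            for (int p : entry.getValue()) {
                tokens[p] = entry.getKey();
            }
        }
        return tokens; // position gaps remain null
    }

    // Join the tokens into text that could be analyzed again on re-insert.
    // Assumption: gaps are simply skipped, so original position gaps are lost.
    static String joinTokens(String[] tokens) {
        StringJoiner joiner = new StringJoiner(" ");
        for (String token : tokens) {
            if (token != null) {
                joiner.add(token);
            }
        }
        return joiner.toString();
    }

    public static void main(String[] args) {
        // Hypothetical term vector for one document.
        Map<String, int[]> termVector = Map.of(
                "lucene", new int[] {0, 3},
                "index", new int[] {1},
                "update", new int[] {4});
        System.out.println(joinTokens(reconstruct(termVector)));
    }
}
```

Note that the rebuilt text consists of analyzed terms (lowercased, stemmed, and so on, depending on the analyzer used at index time), not the original field value.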

--
Best regards,
Andrzej Bialecki     <><
 ___. ___ ___ ___ _ _   __________________________________
[__ || __|__/|__||\/|  Information Retrieval, Semantic Web
___|||__||  \|  ||  |  Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com


