How to retrieve the document by document ID?


David-317
Hi all:
       How does Lucene give each document an ID when the document is added,
and how do we retrieve a document by document ID? I'd appreciate your help!
--
David

Re: How to retrieve the document by document ID?

Otis Gospodnetic-2
David, please look at the Javadoc for IndexReader.  I believe the API is reader.document(int), where reader is an instance of IndexReader.

Otis

----- Original Message ----
From: David <[hidden email]>
To: [hidden email]
Sent: Friday, January 12, 2007 3:10:42 AM
Subject: How to retrieve the document by document ID?

Hi all:
       How do Lucene give each document an ID  when the document is added
and  How do we retrieve a document  by document ID?  appreciate your help!
--
David




---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]


Re: How to retrieve the document by document ID?

David-317
Thanks. How does Lucene give each document an ID when the document is added?
Is the document ID unchanged until the document is deleted?

2007/1/12, Otis Gospodnetic <[hidden email]>:

>
> David, please look at the Javadoc for IndexReader.  I believe the API is
> reader.document(int), where reader is an instance of IndexReader.
>
> Otis

--
David

Re: How to retrieve the document by document ID?

Doron Cohen
David <[hidden email]> wrote on 14/01/2007 20:08:05:

> Thanks. How does Lucene give each document an ID when the document is added?
> Is the document ID unchanged until the document is deleted?
>

Not exactly.

When the first doc is added, it is assigned id 0.
Next one assigned id 1, etc.
When a doc is deleted, it is first only marked as such.
So if there are 10 docs they have ids 0 to 9.

Now docs 2 and 4 are deleted; there is no change in ids.
Next doc added is assigned id 10.

Now if/when the segment containing the deleted docs is merged, all info on
those docs is really removed, and docids are modified to remove any holes
in the numbering - result is: 0 docs with ids 0 to 8. Now, next doc added
gets id 9.

Btw, segments are merged either as a result of an explicit call to optimize(),
or implicitly following addDocument() or indexWriter.close() (depending on
Lucene's merge policy).

Docids are therefore internal, with unstable values.
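
The numbering described above can be sketched with a small toy model. This is not Lucene code: the class DocIdModel and its methods (add, delete, get, merge, maxDoc) are made-up names for illustration, and the sketch assumes a single in-memory list rather than real segments. It only mimics the three behaviours mentioned: ids are handed out sequentially, a delete only marks the doc, and a merge drops marked docs and renumbers the rest.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model of docid numbering as described in this thread; not Lucene code. */
public class DocIdModel {
    // Slot i holds the payload of the document whose current id is i.
    private final List<String> docs = new ArrayList<>();
    // deleted.get(i) is true when doc i is marked deleted but not yet merged away.
    private final List<Boolean> deleted = new ArrayList<>();

    /** Adds a document and returns the id it was assigned (the next free number). */
    public int add(String payload) {
        docs.add(payload);
        deleted.add(false);
        return docs.size() - 1;
    }

    /** Marks a document as deleted; the ids of all other documents stay the same. */
    public void delete(int id) {
        deleted.set(id, true);
    }

    /** Returns the payload for an id, or null if that id is marked deleted. */
    public String get(int id) {
        return deleted.get(id) ? null : docs.get(id);
    }

    /** Simulates a merge: drops deleted docs and renumbers the rest without holes. */
    public void merge() {
        List<String> kept = new ArrayList<>();
        for (int i = 0; i < docs.size(); i++) {
            if (!deleted.get(i)) {
                kept.add(docs.get(i));
            }
        }
        docs.clear();
        docs.addAll(kept);
        deleted.clear();
        for (int i = 0; i < kept.size(); i++) {
            deleted.add(false);
        }
    }

    /** Number of id slots in use, including docs that are only marked deleted. */
    public int maxDoc() {
        return docs.size();
    }

    public static void main(String[] args) {
        DocIdModel index = new DocIdModel();
        for (int i = 0; i < 10; i++) {
            index.add("doc" + i);                 // assigned ids 0..9
        }
        index.delete(2);
        index.delete(4);
        int next = index.add("doc10");            // holes are not reused before a merge
        System.out.println("id before merge: " + next);
        index.merge();
        System.out.println("slots after merge: " + index.maxDoc());
        System.out.println("payload now at id 2: " + index.get(2));
        System.out.println("next id: " + index.add("doc11"));
    }
}
```

In this model the doc that used to be number 3 answers to id 2 after the merge, which is why a docid should not be stored and reused as a permanent key.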

See also the FAQ - http://wiki.apache.org/jakarta-lucene/LuceneFAQ
Especially "When is it possible for document IDs to change?"




Re: How to retrieve the document by document ID?

Doron Cohen
Doron Cohen/Haifa/IBM@IBMIL wrote on 14/01/2007 23:04:27:

> Now if/when the segment containing the deleted docs is merged, all info on
> those docs is really removed, and docids are modified to remove any holes
> in the numbering - result is: 0 docs with ids 0 to 8. Now, next doc added

That should be 9 docs (not 0...)

> gets id 9.

Also take a look at http://lucene.apache.org/java/docs/fileformats.html,
in particular the "Document Numbers" section.

