Document ids in Lucene index

Document ids in Lucene index

wojtek hury
Hi all,
I am wondering whether "holes" are possible in the set of document ids
in an index. More specifically: can there exist an integer i
between 0 and IndexReader.maxDoc() such that
reader.document(i) == null
and
reader.isDeleted(i) == false
?

Regards,
wojtek

---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]


Re: Document ids in Lucene index

hossman

: I am wondering whether "holes" are possible in the set of document ids
: in an index. More specifically: can there exist an integer i
: between 0 and IndexReader.maxDoc() such that
: reader.document(i) == null
: and
: reader.isDeleted(i) == false
: ?

That should not ever happen ... if it does, I would consider it a bug
until someone smarter than me explained why it isn't.

(minor nit: document(i) won't ever return null, if you call it on a
deleted docId you'll get an IllegalArgumentException)
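
As a rough illustration of the nit above: because document(i) throws on a deleted id instead of returning null, code that looks up ids directly would check isDeleted(i) first. The sketch below is only a model of that behaviour. `DocStore` and `GuardedLookup` are hypothetical classes written for this example; they are not part of Lucene and only mimic the three calls mentioned in this thread.

```java
import java.util.BitSet;

/** Hypothetical in-memory stand-in for an index reader; NOT a Lucene class. */
class DocStore {
    private final String[] docs;                 // doc id -> stored content
    private final BitSet deleted = new BitSet(); // ids marked as deleted

    DocStore(String... docs) { this.docs = docs; }

    /** One greater than the largest possible doc id. */
    int maxDoc() { return docs.length; }

    void delete(int id) { deleted.set(id); }

    boolean isDeleted(int id) { return deleted.get(id); }

    /** Models the behaviour described above: throws for a deleted id, never returns null. */
    String document(int id) {
        if (deleted.get(id)) {
            throw new IllegalArgumentException("attempt to access a deleted document: " + id);
        }
        return docs[id];
    }
}

class GuardedLookup {
    /** Checks isDeleted() before document(), so a deleted id yields null instead of an exception. */
    static String fetchOrNull(DocStore store, int id) {
        return store.isDeleted(id) ? null : store.document(id);
    }
}
```

With a store built as `new DocStore("a", "b", "c")` and id 1 deleted, `fetchOrNull(store, 1)` returns null, while calling `store.document(1)` directly throws.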


-Hoss



Re: Document ids in Lucene index

wojtek hury
Thank you for the answer. So I can safely iterate over the index
documents using this algorithm (I don't want to use
MatchAllDocsQuery):

- check maxDoc()
- iterate from 0 to maxDoc() and process doc if it is not deleted

Am I right?
Best,
wojtek

2008/4/12, Chris Hostetter <[hidden email]>:

> That should not ever happen ... if it does, I would consider it a bug
> until someone smarter than me explained why it isn't.
>
> (minor nit: document(i) won't ever return null, if you call it on a
> deleted docId you'll get an IllegalArgumentException)
>
> -Hoss


Re: Document ids in Lucene index

Otis Gospodnetic-2
In reply to this post by wojtek hury
Wojtek, yes, that's how you can loop through all docs in the index.

Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch

----- Original Message ----
From: Wojtek H <[hidden email]>
To: [hidden email]
Sent: Sunday, April 13, 2008 1:38:35 PM
Subject: Re: Document ids in Lucene index

Thank you for the answer. So I can safely iterate over the index
documents using this algorithm (I don't want to use
MatchAllDocsQuery):

- check maxDoc()
- iterate from 0 to maxDoc() and process doc if it is not deleted

Am I right?
Best,
wojtek



Re: Document ids in Lucene index

hossman
In reply to this post by wojtek hury

: - check maxDoc()
: - iterate from 0 to maxDoc() and process doc if it is not deleted

For the record: that is exactly what MatchAllDocsQuery does ... except
that you have an off-by-one error (maxDoc returns 1 more than the
largest possible document number, so the last id to check is maxDoc()-1).
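
To make the bound concrete, here is a minimal sketch of the manual loop with a strict `<` comparison. `DocIdSource`, `DocIdLoop`, and `inMemory` are hypothetical names invented for this example; the interface only mirrors the two calls discussed in this thread (maxDoc and isDeleted) and is not the real IndexReader.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;

/** Hypothetical minimal view of a reader; NOT the Lucene IndexReader API. */
interface DocIdSource {
    int maxDoc();               // one greater than the largest possible doc id
    boolean isDeleted(int id);
}

class DocIdLoop {
    /** Collects every non-deleted id in the half-open range [0, maxDoc). */
    static List<Integer> liveDocIds(DocIdSource source) {
        List<Integer> live = new ArrayList<>();
        int max = source.maxDoc();
        for (int id = 0; id < max; id++) {   // strictly less than: maxDoc itself is not a valid id
            if (!source.isDeleted(id)) {
                live.add(id);
            }
        }
        return live;
    }

    /** Builds an in-memory source with the given size and deleted ids (illustration only). */
    static DocIdSource inMemory(int maxDoc, int... deletedIds) {
        BitSet deleted = new BitSet();
        for (int id : deletedIds) {
            deleted.set(id);
        }
        return new DocIdSource() {
            public int maxDoc() { return maxDoc; }
            public boolean isDeleted(int id) { return deleted.get(id); }
        };
    }
}
```

For a source with maxDoc 5 and ids 1 and 3 deleted, `DocIdLoop.liveDocIds(DocIdLoop.inMemory(5, 1, 3))` visits ids 0 through 4 and returns [0, 2, 4].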

Even if you don't want the Query API, just use MatchAllDocsQuery to handle
the details for you and save yourself some code...

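  // Lucene 2.x-era API; MatchAllDocsQuery and Scorer are in org.apache.lucene.search
  Scorer allDocs = new MatchAllDocsQuery().weight(searcher).scorer(reader);
  while (allDocs.next()) {        // advances to the next non-deleted doc
    int doc = allDocs.doc();      // current document number
    // ... do stuff with doc ...
  }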



-Hoss

