IndexReader.deleteDocument in Lucene 3.6

Nikolay Zamosenchuk
Hi everyone. We use the IndexReader.deleteDocuments(Term) method to
delete documents because it returns the number of documents deleted,
and we must know for sure whether any documents were removed. In
Lucene 3.6, however, this method is final and can no longer be
overridden in our codebase. IndexWriter.deleteDocuments(..) is not
final and could be used in our project, but it returns no value, so we
can't tell whether any documents were deleted. In short:
IndexReader.deleteDocuments(Term) is final but returns the number of
deletions performed, while IndexWriter.deleteDocuments(..) is not
final but returns nothing. Our functionality requires both overriding
and a result value.

Can anyone please suggest how to solve this issue? We could simply run
a term query first, but that seems very inefficient.

--
Best regards, Nikolay Zamosenchuk

---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

Re: IndexReader.deleteDocument in Lucene 3.6

Yonik Seeley-2-2
On Fri, May 25, 2012 at 5:23 AM, Nikolay Zamosenchuk
<[hidden email]> wrote:
> IndexWriter.deleteDocuments(..) is not final
> but returns nothing.

Deleted terms are buffered for good performance, so at the time of the
IndexWriter.deleteDocuments(Term) call we don't yet know how many
documents match the term.

> Can anyone please suggest how to solve this issue? We could simply run
> a term query first, but that seems very inefficient.

You could switch to an asynchronous design and use a custom query that
keeps track of how many (or which) documents it matched.

-Yonik
http://lucidimagination.com
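
For readers following the thread, here is a minimal sketch of the two points above: why a buffered delete cannot return a count at call time, and how an asynchronous design can report the count later. It is a toy model in plain Java, not Lucene code and not the custom query Yonik describes; `ToyIndex`, `onDeletesApplied`, and every other name in it are hypothetical.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BiConsumer;

/** Toy model only: NOT Lucene code. All names here are hypothetical. */
public class BufferedDeleteSketch {

    /** A tiny "index": document id -> value of its key field. */
    static final class ToyIndex {
        private final Map<Integer, String> liveDocs = new LinkedHashMap<>();
        private final List<String> bufferedDeleteTerms = new ArrayList<>();
        private final BiConsumer<String, Integer> onDeletesApplied;
        private int nextDocId = 0;

        ToyIndex(BiConsumer<String, Integer> onDeletesApplied) {
            this.onDeletesApplied = onDeletesApplied;
        }

        void addDocument(String key) {
            liveDocs.put(nextDocId++, key);
        }

        /** Only records the term; nothing is matched yet, so no count can be returned. */
        void deleteDocuments(String key) {
            bufferedDeleteTerms.add(key);
        }

        /** Resolves the buffered terms against the documents and reports each count. */
        void commit() {
            for (String key : bufferedDeleteTerms) {
                int deleted = 0;
                Iterator<Map.Entry<Integer, String>> it = liveDocs.entrySet().iterator();
                while (it.hasNext()) {
                    if (it.next().getValue().equals(key)) {
                        it.remove();
                        deleted++;
                    }
                }
                // The caller learns the count here, after the fact.
                onDeletesApplied.accept(key, deleted);
            }
            bufferedDeleteTerms.clear();
        }

        int liveCount() {
            return liveDocs.size();
        }
    }

    public static void main(String[] args) {
        Map<String, Integer> report = new LinkedHashMap<>();
        ToyIndex index = new ToyIndex(report::put);
        index.addDocument("a");
        index.addDocument("a");
        index.addDocument("b");
        index.deleteDocuments("a");
        index.deleteDocuments("missing");
        System.out.println("before commit: " + report); // before commit: {}
        index.commit();
        System.out.println("after commit: " + report);  // after commit: {a=2, missing=0}
    }
}
```

The callback is the asynchronous part: the code that requests the delete and the code that learns whether anything matched run at different times, so any logic that depends on the count has to live in (or be triggered by) the callback.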





RE: IndexReader.deleteDocument in Lucene 3.6

Edward W. Rouse
To ensure deletion, I use a while loop with a counter (to prevent an
endless loop if there's a problem):

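    Term term = this.createIdTerm(id);
    int count = 0;
    // Declared locally here; the original snippet did not show its declaration.
    boolean failed = false;
    // Retry until the document can no longer be read back from the index.
    while (readDocument(indexName, id) != null)
    {
      count++;
      log.debug("deleting document " + id + " from index " + indexName);
      writer.deleteDocuments(term);
      writer.commit();
      // Give up after more than 10 attempts to avoid an endless loop.
      if (count > 10)
      {
        failed = true;
        break;
      }
    }
    if (failed)
      throw new DeleteFailedException("Failed to delete document " + id
          + " from index " + indexName);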

And readDocument does this:

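    IndexReader reader = this.getReader(indexName);
    Document doc = null;
    // termDocs skips deleted documents, so a hit means the document is still live.
    TermDocs td = reader.termDocs(this.createIdTerm(id));
    try
    {
      if (td.next())
      {
        int d = td.doc();
        doc = reader.document(d);
      }
    }
    finally
    {
      td.close();
    }
    this.returnReader(reader);
    return doc;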

Because IndexReader.termDocs doesn't return deleted documents,
readDocument returns null once the deletion has succeeded.

I'm not even sure if I need to make the writer.commit() call, but the load
for us is small enough that performance isn't an issue. If performance does
become an issue I might need to tweak this a bit, but it does ensure that a
deletion is successful or it throws an exception.
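
The same delete-and-verify pattern can be pulled out into a small helper. This is only a sketch in plain Java with no Lucene dependency; `deleteUntilGone`, `stillPresent`, and `deleteAction` are hypothetical names, and in the loop above they would correspond to the readDocument check and the deleteDocuments plus commit calls. A side benefit is that the return value says whether anything was there to delete: 0 attempts means the document was already absent.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;
import java.util.function.BooleanSupplier;

/** Generic sketch of a bounded delete-and-verify loop; all names are hypothetical. */
public class BoundedDeleteRetry {

    /**
     * Runs deleteAction while stillPresent reports true, at most maxAttempts times.
     * Returns the number of attempts made; throws if the item is still present
     * after maxAttempts attempts.
     */
    static int deleteUntilGone(BooleanSupplier stillPresent, Runnable deleteAction, int maxAttempts) {
        int attempts = 0;
        while (stillPresent.getAsBoolean()) {
            if (attempts >= maxAttempts) {
                throw new IllegalStateException("still present after " + attempts + " attempts");
            }
            deleteAction.run();
            attempts++;
        }
        return attempts;
    }

    public static void main(String[] args) {
        // A HashSet stands in for the index in this sketch.
        Set<String> store = new HashSet<>(Arrays.asList("doc-1"));
        int attempts = deleteUntilGone(() -> store.contains("doc-1"), () -> store.remove("doc-1"), 10);
        System.out.println("attempts: " + attempts); // attempts: 1
    }
}
```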

> -----Original Message-----
> From: [hidden email] [mailto:[hidden email]] On Behalf Of Yonik
> Seeley
> Sent: Friday, May 25, 2012 12:40 PM
> To: [hidden email]
> Subject: Re: IndexReader.deleteDocument in Lucene 3.6
>
> On Fri, May 25, 2012 at 5:23 AM, Nikolay Zamosenchuk
> <[hidden email]> wrote:
> > IndexWriter.deleteDocuments(..) is not final
> > but returns nothing.
>
> Deleted terms are buffered for good performance, so at the time of the
> IndexWriter.deleteDocuments(Term) call we don't yet know how many
> documents match the term.
>
> > Can anyone please suggest how to solve this issue? We could simply run
> > a term query first, but that seems very inefficient.
>
> You could switch to an asynchronous design and use a custom query that
> keeps track of how many (or which) documents it matched.
>
> -Yonik
> http://lucidimagination.com