Deleting Documents

Deleting Documents

SEAN MCELROY
Hello,
   
  I'm having difficulty deleting documents from an index. I am using Lucene 2.3.1.
   
  The program that I have created recursively searches a directory and indexes the documents that it finds. The first thing I do is open the index for writing:
   
  writer = new IndexWriter(indexDir,analyzer);
   
  I then search the directory for certain types of files: text, pdf, doc, etc. I have a basic algorithm that creates a unique id for each document and checks the index to see if this file exists. If the file exists, I compare the date the file was last updated against the date it was last indexed. If the file has been updated since it was last indexed, I try to remove the file and re-index it. I can successfully retrieve existing documents from the index using the file id, but I cannot remove a file. This is the code I use to remove the file:
   
  private void removeDocument(IndexWriter writer, File file) throws CorruptIndexException, IOException
  {
      IndexReader indexReader = IndexReader.open(writer.getDirectory());
      indexReader.deleteDocuments(new Term(IDHConstants.FILE_ID, FileNameUtil.getFileId(file)));
      indexReader.close();
  }
   
  This code doesn't work. I have tried IndexWriter.updateDocument and it also does not work. Is it because I have an IndexWriter open when I try to delete the document?
   
  All help welcome.
   
  Regards,
 
Sean
   
   
   

Re: Deleting Documents

Daniel Naber-10
On Tuesday, 17 June 2008, SEAN MCELROY wrote:

>   This code doesn't work. I have tried IndexWriter.updateDocument and it
> also does not work. Is it because I have an IndexWriter open when I try
> to delete the document?

IndexWriter now has its own delete method, deleteDocuments(); you should
try to use that. You also need to re-open any readers to actually see your
deletions. BTW, what you're trying to do is part of the Lucene demo code.
It's a bit complicated, but the implementation in the demo code should be
very efficient, I think.

Regards
 Daniel

--
http://www.danielnaber.de
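
The point about re-opening readers can be illustrated with a small sketch. This is a simplified in-memory model, not Lucene's API: ToyIndex, ToyReader and ToyWriter are hypothetical names invented for the illustration, and the model assumes only that a reader is a point-in-time snapshot taken when it is opened.

```java
import java.util.HashMap;
import java.util.Map;

/** Toy stand-in for an index (hypothetical, not Lucene): maps a unique file id to document text. */
class ToyIndex {
    final Map<String, String> committed = new HashMap<>();

    /** Opening a reader copies the current state, so the reader is a point-in-time snapshot. */
    ToyReader openReader() {
        return new ToyReader(new HashMap<>(committed));
    }
}

/** Toy reader: only ever sees the snapshot it was opened on. */
class ToyReader {
    private final Map<String, String> snapshot;

    ToyReader(Map<String, String> snapshot) {
        this.snapshot = snapshot;
    }

    boolean contains(String fileId) {
        return snapshot.containsKey(fileId);
    }
}

/** Toy writer: adds and deletes documents by their unique file id. */
class ToyWriter {
    private final ToyIndex index;

    ToyWriter(ToyIndex index) {
        this.index = index;
    }

    void addDocument(String fileId, String text) {
        index.committed.put(fileId, text);
    }

    void deleteDocuments(String fileId) {
        index.committed.remove(fileId);
    }
}

class SnapshotDemo {
    public static void main(String[] args) {
        ToyIndex index = new ToyIndex();
        ToyWriter writer = new ToyWriter(index);
        writer.addDocument("a.txt", "hello");

        ToyReader oldReader = index.openReader();  // opened before the delete
        writer.deleteDocuments("a.txt");
        ToyReader newReader = index.openReader();  // re-opened after the delete

        System.out.println(oldReader.contains("a.txt")); // true: stale snapshot
        System.out.println(newReader.contains("a.txt")); // false: sees the deletion
    }
}
```

In this model a search through the old reader still finds the document, which can look as though the delete "did not work" when it actually did.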

Re: Deleting Documents

SEAN MCELROY
Thanks.



Re: Deleting Documents

Devashish
In reply to this post by SEAN MCELROY
On Tue, 2008-06-17 at 16:03 +0530, SEAN MCELROY wrote:

>   This code doesn't work. I have tried IndexWriter.updateDocument and it also does not work. Is it because I have an IndexWriter open when I try to delete the document?

Yes, it is because you have the IndexWriter open while you are trying to
delete using the IndexReader. You should close the IndexWriter first and
then delete using the IndexReader, or else use the deleteDocuments() method
of the IndexWriter itself.

--
Devashish <[hidden email]>
Naukri Tech
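
The conflict described in this reply can be sketched with a toy model of a single write lock. None of this is Lucene's API: ToyDirectory, LockingWriter and LockingReader are hypothetical names, and the sketch assumes only that a writer takes the lock when it is opened and that a reader needs the same lock before it can delete.

```java
/** Toy directory (hypothetical, not Lucene) that allows only one modifier at a time. */
class ToyDirectory {
    private Object lockHolder; // null when nobody holds the write lock

    void acquireWriteLock(Object who) {
        if (lockHolder != null && lockHolder != who) {
            throw new IllegalStateException("write lock already held");
        }
        lockHolder = who;
    }

    void releaseWriteLock(Object who) {
        if (lockHolder == who) {
            lockHolder = null;
        }
    }
}

/** Toy writer: holds the write lock from open until close. */
class LockingWriter {
    private final ToyDirectory dir;

    LockingWriter(ToyDirectory dir) {
        this.dir = dir;
        dir.acquireWriteLock(this);
    }

    void close() {
        dir.releaseWriteLock(this);
    }
}

/** Toy reader: reading needs no lock, but deleting modifies the index and does. */
class LockingReader {
    private final ToyDirectory dir;

    LockingReader(ToyDirectory dir) {
        this.dir = dir;
    }

    void deleteDocument(String fileId) {
        dir.acquireWriteLock(this); // fails while a writer is still open
        // the deletion itself is omitted in this sketch
    }

    void close() {
        dir.releaseWriteLock(this);
    }
}

class LockDemo {
    public static void main(String[] args) {
        ToyDirectory dir = new ToyDirectory();
        LockingWriter writer = new LockingWriter(dir);
        LockingReader reader = new LockingReader(dir);
        try {
            reader.deleteDocument("a.txt");
        } catch (IllegalStateException e) {
            System.out.println("delete refused: " + e.getMessage());
        }
        writer.close();                 // release the lock first
        reader.deleteDocument("a.txt"); // now the reader can take it
        reader.close();
    }
}
```

Either ordering the two phases (close the writer, then delete through the reader) or doing both the delete and the add through the writer avoids the contention in this model.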


Re: Deleting Documents

SEAN MCELROY
I have changed my code to use the delete function of the writer and this doesn't work either. Here's my code:
   
  private void indexDocument(IndexWriter writer, Document document, File file) throws CorruptIndexException, IOException
  {
      writer.deleteDocuments(new Term(IDHConstants.FILE_ID, FileNameUtil.getFileId(file)));
      document.add(new Field(IDHConstants.DATE_LAST_INDEX, DateUtil.formatDate(new Date()), Field.Store.YES, Field.Index.TOKENIZED));
      writer.addDocument(document);
  }
 




Re: Deleting Documents

hossman
In reply to this post by Devashish

: >   I'm having difficulty deleting documents from an index. I am using lucene 2.3.1

Since this discussion is specifically about using the Lucene Java API, it
should be taking place on the java-user@lucene mailing list -- you'll get
a lot more helpful answers there from other users of the same API.

        http://lucene.apache.org/java/docs/mailinglists.html


-Hoss


Re: Deleting Documents

Devashish
In reply to this post by SEAN MCELROY

I guess you should ask on the java-user list of the Lucene mailing
lists.
The suggestion I can give offhand is to use the reader (IndexReader)
for deleting documents and the writer to add documents, and to open
the writer only after closing the reader. A reader that deletes
documents and a writer both need the index's write lock, so they
cannot both be open for modification at the same time.
    Also, your index may not contain any document whose file ID term
matches the one you are trying to delete, which would explain why you
see no change in the index even after calling
writer.deleteDocuments().

Regards,
Devashish

--
Devashish <[hidden email]>
Naukri Tech
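
The last suggestion, that the delete term may not match any document, can be checked by counting how many documents a delete actually removes. The sketch below is a toy term index, not Lucene's API; ToyTermIndex and its methods are hypothetical. It also shows one possible cause that the thread does not confirm: if the id field was split into word terms at indexing time, an exact-term delete on the full id finds nothing.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.Locale;

/** Toy term index (hypothetical, not Lucene): each document is the list of terms stored for its id field. */
class ToyTermIndex {
    private final List<List<String>> docs = new ArrayList<>();

    /** Index the id as one exact term, the way an untokenized field would behave. */
    void addUntokenized(String fileId) {
        docs.add(Arrays.asList(fileId));
    }

    /** Index the id split into lowercase word terms, roughly what an analyzer might produce. */
    void addTokenized(String fileId) {
        docs.add(Arrays.asList(fileId.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")));
    }

    /** Delete every document containing exactly this term and return how many were removed. */
    int deleteDocuments(String term) {
        int deleted = 0;
        for (Iterator<List<String>> it = docs.iterator(); it.hasNext();) {
            if (it.next().contains(term)) {
                it.remove();
                deleted++;
            }
        }
        return deleted;
    }

    int size() {
        return docs.size();
    }
}

class TermMatchDemo {
    public static void main(String[] args) {
        ToyTermIndex index = new ToyTermIndex();
        index.addTokenized("docs/My Report.txt");
        index.addUntokenized("docs/notes.txt");

        // The full id is not one of the stored terms, so nothing is deleted.
        System.out.println(index.deleteDocuments("docs/My Report.txt")); // 0
        // The id was stored as a single exact term, so the delete matches.
        System.out.println(index.deleteDocuments("docs/notes.txt"));     // 1
    }
}
```

A count of zero points at the term (field name, exact value, or how the field was indexed) rather than at the delete call itself.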