remove duplicate when merging indexes

hari2303
Hello all,

This is my situation: I have multiple indexes (for example index1, index2, index3, ...) that I have to update every night. If I open my IndexWriter with create=false (since I want to update the existing index), duplicate documents get appended to the existing indexes. Can anyone help? How do I remove duplicate documents when updating an existing index?

Re: remove duplicate when merging indexes

Simon Willnauer
You need some kind of unique ID for your documents, like a primary key in an RDB.
If you have something like that, you can call
IndexWriter#updateDocument(uniqueIDTerm, document). This will delete
the old document and add the new one.

simon

On Tue, Nov 10, 2009 at 10:05 AM, m.harig <[hidden email]> wrote:

> View this message in context: http://old.nabble.com/remove-duplicate-when-merging-indexes-tp26280244p26280244.html
> Sent from the Lucene - Java Users mailing list archive at Nabble.com.

---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]


Re: remove duplicate when merging indexes

hari2303
Thanks, Simon.

How do I get the unique ID? Will it be added to the index?



Re: remove duplicate when merging indexes

Simon Willnauer
On Tue, Nov 10, 2009 at 10:22 AM, m.harig <[hidden email]> wrote:
>
> Thanks simon
>
>    How I do get the unique ID ? will it be added to the index?
There is no such thing built into Lucene. You need to generate your
own unique ID. Make sure you do NOT use the document ID, as it is
volatile and is likely to change once you modify your index.

simon
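
One way to read "generate your own unique ID" is to derive it from something that already identifies the record outside the index. If the data has a primary key, that key can be used directly. The sketch below is only an illustration under an assumption that is not in the thread: each record has a stable source key (a URL here, purely hypothetical), and the ID is computed from it with java.util.UUID so that every nightly run produces the same ID for the same record. A loop counter only works as an ID if every run visits the records in the same order.

```java
import java.nio.charset.StandardCharsets;
import java.util.UUID;

public class StableIds {
    // Derives the same ID every time from a key that identifies the source record.
    // The "source key" (URL, file path, database key) is an assumption for this sketch.
    public static String idFor(String sourceKey) {
        byte[] bytes = sourceKey.getBytes(StandardCharsets.UTF_8);
        // nameUUIDFromBytes is deterministic: equal input bytes give an equal UUID.
        return UUID.nameUUIDFromBytes(bytes).toString();
    }

    public static void main(String[] args) {
        String first = idFor("http://example.com/articles/42");
        String again = idFor("http://example.com/articles/42");
        String other = idFor("http://example.com/articles/43");
        System.out.println(first.equals(again)); // same record, same ID
        System.out.println(first.equals(other)); // different record, different ID
    }
}
```

The resulting string would then be stored in the document's own ID field and used to build the term passed to updateDocument.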


Re: remove duplicate when merging indexes

hari2303
Thanks, Simon.

This is my code:

   doc.add(new Field("id", "" + i, Field.Store.YES, Field.Index.NOT_ANALYZED));
   doc.add(new Field("title", indexForm.getTitle(), Field.Store.YES, Field.Index.ANALYZED));
   doc.add(new Field("contents", indexForm.getContent(), Field.Store.YES, Field.Index.ANALYZED));
   writer.updateDocument(new Term("id"), doc);

But there is still no change. What am I doing wrong?

Re: remove duplicate when merging indexes

Ian Lea
In reply to this post by Simon Willnauer
Try updateDocument(new Term("id", ""+i), doc).

See javadocs for Term constructors.



--
Ian.
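
Why the two-argument Term matters: updateDocument deletes the documents that contain the given term and then adds the new one, so the term has to name both the field and the value to match. As far as the Term javadocs of that era describe it, the one-argument constructor sets only the field name and leaves the text empty, so new Term("id") matches no stored ID, and new Term(""+i) uses the number as a field name (worth confirming against the javadocs for your Lucene version). The sketch below is not Lucene code. It is a small in-memory model, with a hypothetical class name, that imitates the delete-then-add behaviour to show how an empty term text lets duplicates pile up.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class UpdateByTermModel {
    // Each "document" is a map of field name -> value. This only models the idea.
    private final List<Map<String, String>> docs = new ArrayList<>();

    // Delete every document whose `field` holds exactly `text`, then add the new one.
    public void updateDocument(String field, String text, Map<String, String> doc) {
        docs.removeIf(d -> text.equals(d.get(field)));
        docs.add(doc);
    }

    public int size() {
        return docs.size();
    }

    public static Map<String, String> doc(String id, String title) {
        Map<String, String> d = new HashMap<>();
        d.put("id", id);
        d.put("title", title);
        return d;
    }

    public static void main(String[] args) {
        UpdateByTermModel broken = new UpdateByTermModel();
        // Like new Term("id"): right field, empty text. Nothing matches, nothing is deleted.
        broken.updateDocument("id", "", doc("7", "first run"));
        broken.updateDocument("id", "", doc("7", "second run"));
        System.out.println(broken.size()); // 2: the duplicate stays

        UpdateByTermModel fixed = new UpdateByTermModel();
        // Like new Term("id", "7"): field and value given, so the old copy is replaced.
        fixed.updateDocument("id", "7", doc("7", "first run"));
        fixed.updateDocument("id", "7", doc("7", "second run"));
        System.out.println(fixed.size()); // 1: the old document was removed first
    }
}
```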


On Tue, Nov 10, 2009 at 9:47 AM, m.harig <[hidden email]> wrote:

>
> Thanks again
>
> this is my code ,
>
>  doc.add(new Field("id",""+i,Field.Store.YES,Field.Index.NOT_ANALYZED));
>
>  doc.add(new Field("title", indexForm.getTitle(), Field.Store.YES,
>                                        Field.Index.ANALYZED));
>
>  doc.add(new Field("contents", indexForm.getContent(),
>                                        Field.Store.YES, Field.Index.ANALYZED));
>
>  writer.updateDocument(new Term(""+i), doc);
>
> no changes still .. Am i doing wrong??? help me

Re: remove duplicate when merging indexes

Simon Willnauer
Ian got it :)

simon

On Tue, Nov 10, 2009 at 10:58 AM, Ian Lea <[hidden email]> wrote:

> Try updateDocument(new Term("id", ""+i), doc).
>
> See javadocs for Term constructors.

Re: remove duplicate when merging indexes

hari2303
In reply to this post by Ian Lea
Thanks, Ian. It works, thanks a lot.