Merging Solr index

Lance Norskog
Hi-
 
http://wiki.apache.org/solr/MergingSolrIndexes recommends using the
Lucene contributed app IndexMergeTool to merge two Solr indexes. What
happens if both indexes have records with the same unique key? Will they
both go into the new index?
 
Is the implementation of unique IDs in the Solr java or in Lucene? If it
is in Solr, how would I hack up a Solr IndexMergeTool?
 
Cheers,
 
Lance Norskog
 

Re: Merging Solr index

Yonik Seeley-2
On Fri, Apr 4, 2008 at 6:26 PM, Norskog, Lance <[hidden email]> wrote:
>  http://wiki.apache.org/solr/MergingSolrIndexes recommends using the
>  Lucene contributed app IndexMergeTool to merge two Solr indexes. What
>  happens if both indexes have records with the same unique key? Will they
>  both go into the new index?

Yes.

>  Is the implementation of unique IDs in the Solr java or in Lucene?

Both.  It was originally just in Solr, but Lucene now has an implementation.
Neither implementation will prevent this as both just remember
documents (in memory) that were added and then periodically delete
older documents with the same id.

-Yonik

RE: Merging Solr index

Lance Norskog
Thanks!

I have learned Solr as a power user and written a couple of simple
filters. I'm not a Lucene heavy. Where is this in Lucene?  Is it the
default? I don't remember Lucene having the notion of a unique id
(primary key).

In this merge code, with the latest Lucene 2.3, will the duplicates in
solr/data1 override the records in solr/data0? Or the other way around?

How do I add the new Lucene implementation?

            try {
                  // Open the existing index in solr/data0 (create = false).
                  IndexWriter writer = new IndexWriter(new File("solr/data0/index"),
                              new StandardAnalyzer(), false);
                  Directory[] dirs = new Directory[]{
                              FSDirectory.getDirectory(new File("solr/data1/index"))};
                  System.out.println(writer);
                  // Merge solr/data1 into solr/data0.
                  writer.addIndexes(dirs);
                  writer.close();
            } catch (Exception e) {
                  e.printStackTrace();
            }

Thanks,

Lance Norskog



RE: Merging Solr index

hossman

: I have learned Solr as a power user and written a couple of simple
: filters. I'm not a Lucene heavy. Where is this in Lucene?  Is it the
: default? I don't remember Lucene having the notion of a unique id
: (primary key).

I can't answer that question (because Yonik's answer surprised me too), but
as for this one...

: In this merge code, with the latest Lucene 2.3, will the duplicates in
: solr/data1 override the records in solr/data0? Or the other way around?

Neither.  Duplicate overwriting is done when adding individual documents;
when merging two indexes this logic doesn't come into play.

The easiest way I can think of to deal with this would be:
  1) merge the indexes (using the existing IndexMergeTool)
  2) iterate over a TermEnum for the uniqueKey field.
  3) if any term has a docFreq > 1, delete all but the lowest (or
     highest) docid (depending on what order you merged the indexes in)
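
The selection logic in steps 2 and 3 can be sketched in plain Java. This is
only a model, not Lucene code: the merged index is represented as a list
whose position is the docid and whose value is the uniqueKey, and the class
and method names (MergedIndexDedup, docsToDelete) are made up for
illustration. It also assumes that documents from the index merged first
end up with the lower docids. Against a real index you would walk the
TermEnum for the uniqueKey field and delete by docid instead.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model: keys.get(docId) is the uniqueKey of document docId
// in the merged index. No Lucene classes are used here.
class MergedIndexDedup {

    // Returns the docids to delete so that every key survives exactly once.
    // keepHighest = true keeps the copy with the highest docid,
    // keepHighest = false keeps the copy with the lowest docid.
    static List<Integer> docsToDelete(List<String> keys, boolean keepHighest) {
        Map<String, Integer> keeper = new HashMap<>();
        for (int docId = 0; docId < keys.size(); docId++) {
            String key = keys.get(docId);
            // Later docids overwrite earlier ones only when keeping the highest.
            if (keepHighest || !keeper.containsKey(key)) {
                keeper.put(key, docId);
            }
        }
        List<Integer> doomed = new ArrayList<>();
        for (int docId = 0; docId < keys.size(); docId++) {
            // Any document that is not the chosen keeper for its key is a duplicate.
            if (keeper.get(keys.get(docId)) != docId) {
                doomed.add(docId);
            }
        }
        return doomed;
    }

    public static void main(String[] args) {
        // docids 0-2 stand for solr/data0, docids 3-5 for solr/data1.
        List<String> merged = Arrays.asList("a", "b", "c", "b", "d", "a");
        System.out.println(docsToDelete(merged, true));   // prints [0, 1]
        System.out.println(docsToDelete(merged, false));  // prints [3, 5]
    }
}
```

Keeping the highest docid corresponds to letting the index merged last win;
keeping the lowest lets the index merged first win.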

BTW: Would you mind updating that wiki page with some more details based
on your experience once you get it working?


-Hoss

Re: Merging Solr index

Yonik Seeley-2
In reply to this post by Lance Norskog
On Sat, Apr 5, 2008 at 6:27 PM, Norskog, Lance <[hidden email]> wrote:
> Where is this in Lucene?  Is it the
>  default? I don't remember Lucene having the notion of a unique id
>  (primary key).

It hasn't been around too long.
IndexWriter.updateDocument(Term term, Document doc)

>  In this merge code, with the latest Lucene 2.3, will the duplicates in
>  solr/data1 override the records in solr/data0? Or the other way around?

Neither.  Duplicates will not be removed in either case.
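
A toy model of why this is so, in plain Java. It is a sketch, not Lucene
code: ToyIndex is a made-up class that stores only uniqueKey values. Its
updateDocument mimics the delete-then-add behaviour of
IndexWriter.updateDocument(Term, Document), and its addIndexes simply
appends the other index, which is the assumption that matches the answers
in this thread.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for an index: one uniqueKey value per document.
class ToyIndex {
    final List<String> keys = new ArrayList<>();

    // Delete every document with this key, then add the new one.
    void updateDocument(String key) {
        keys.removeIf(k -> k.equals(key));
        keys.add(key);
    }

    // Merging appends the other index's documents; no key check happens.
    void addIndexes(ToyIndex other) {
        keys.addAll(other.keys);
    }

    public static void main(String[] args) {
        ToyIndex data0 = new ToyIndex();
        data0.updateDocument("a");
        data0.updateDocument("b");
        data0.updateDocument("a");   // replaces the first "a"

        ToyIndex data1 = new ToyIndex();
        data1.updateDocument("a");
        data1.updateDocument("c");

        data0.addIndexes(data1);
        System.out.println(data0.keys);  // prints [b, a, a, c]
    }
}
```

Each index is duplicate-free on its own, but after the merge the key "a"
appears twice, once from each source.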

-Yonik