Solr live at Netflix

Solr live at Netflix

Walter Underwood, Netflix
Here at Netflix, we switched over our site search to Solr two weeks ago.
We've seen zero problems with the server. We average 1.2 million
queries/day on a 250K item index. We're running four Solr servers
with simple round-robin HTTP load-sharing.
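
For a sense of scale, 1.2 million queries/day averages out to roughly 14 queries/second overall, or about 3.5 per server across four servers (peak traffic is presumably higher). The post does not say how the round-robin is implemented; the following is only a minimal sketch of the idea, with a hypothetical class name and placeholder host names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

/** Illustrative round-robin selector; not taken from the Netflix setup. */
public class RoundRobin {
    private final List<String> servers;
    private final AtomicInteger next = new AtomicInteger();

    public RoundRobin(List<String> servers) {
        if (servers.isEmpty()) {
            throw new IllegalArgumentException("need at least one server");
        }
        this.servers = new ArrayList<>(servers);
    }

    /** Returns the next server in strict rotation; safe to call from many threads. */
    public String pick() {
        // floorMod keeps the index non-negative even after the counter overflows
        int i = Math.floorMod(next.getAndIncrement(), servers.size());
        return servers.get(i);
    }

    /** Average queries per second implied by a daily total (86,400 seconds per day). */
    public static double averageQps(long queriesPerDay) {
        return queriesPerDay / 86_400.0;
    }

    public static void main(String[] args) {
        // Placeholder host names, assumed for illustration only
        RoundRobin rr = new RoundRobin(Arrays.asList(
                "http://solr1.example.com:8983/solr",
                "http://solr2.example.com:8983/solr",
                "http://solr3.example.com:8983/solr",
                "http://solr4.example.com:8983/solr"));
        for (int i = 0; i < 5; i++) {
            System.out.println(rr.pick());
        }
        System.out.printf("%.1f queries/sec on average%n", averageQps(1_200_000));
    }
}
```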

This is all on 1.1. I've been too busy tuning to upgrade.

Thanks everyone, this is a great piece of software.

wunder
--
Walter Underwood
Search Guy, Netflix

Re: Solr live at Netflix

hossman

: Here at Netflix, we switched over our site search to Solr two weeks ago.

That's great Walter ... could I persuade you to add a few notes about this
to...

http://wiki.apache.org/solr/PublicServers
http://wiki.apache.org/solr/SolrPerformanceData


-Hoss

Re: Solr live at Netflix

Walter Underwood, Netflix
I think Chris Harris is doing that. I'll check it and touch it up
afterwards. Avoid race conditions. --wunder


On 10/2/07 4:26 PM, "Chris Hostetter" <[hidden email]> wrote:

>
> : Here at Netflix, we switched over our site search to Solr two weeks ago.
>
> That's great Walter ... could I persuade you to add a few notes about this
> to...
>
> http://wiki.apache.org/solr/PublicServers
> http://wiki.apache.org/solr/SolrPerformanceData
>
>
> -Hoss
>

Re: Solr live at Netflix

TomSolrList
In reply to this post by Walter Underwood, Netflix
Nice!

And there seem to be some improvements. For example, "Gamers" and "Gamera"
no longer stem to the same word :-)

Tom

On 10/2/07, Walter Underwood <[hidden email]> wrote:

>
> Here at Netflix, we switched over our site search to Solr two weeks ago.
> We've seen zero problems with the server. We average 1.2 million
> queries/day on a 250K item index. We're running four Solr servers
> with simple round-robin HTTP load-sharing.
>
> This is all on 1.1. I've been too busy tuning to upgrade.
>
> Thanks everyone, this is a great piece of software.
>
> wunder
> --
> Walter Underwood
> Search Guy, Netflix
>
>

question about bi-gram analysis on query

Keene, David
In reply to this post by Walter Underwood, Netflix
Hey guys,

I'm trying to index a field in Chinese using the CJKTokenizer, and I'm finding that my searches on the index are not working at all.  The index is created properly (looking with Luke), and when I search against it with Luke the data comes back as I would expect.  Also, when I use the analysis page of solr admin, the result is what I would expect.  On an actual search though, nothing is found.

Here are the relevant snippets from my confs:

<fieldtype name="text_zh" class="solr.TextField">
  <analyzer>
    <tokenizer class="org.apache.solr.analysis.ja.CJKTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.TrimFilterFactory"/>
  </analyzer>
</fieldtype>

...

<field name="text" type="text_zh" indexed="true" stored="false" multiValued="true"/>


So if I send in
美聯社
it correctly creates 2 tokens
美聯  聯社  

And if I do a search in Luke and the solr analysis page for 美聯, I get a hit.  But on the actual search, I don't.

Also, I've noticed that the parsed query on luke is:
text:"美聯 聯社"
and in solr it is:
text:"美聯 聯社 "
I noticed there is an extra space in the solr parsed query.  I don't know if that makes a difference.
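
To make the expected tokens concrete, here is a minimal sketch of overlapping bigram tokenization. It is a simplified stand-in, not the Lucene CJKTokenizer itself, and the class and method names are hypothetical. For the three characters 美聯社 it yields 美聯 and 聯社; for the two-character query 美聯 it yields the single token 美聯, which is why a hit would be expected. The string literals use Unicode escapes (U+7F8E, U+806F, U+793E) so the source compiles regardless of file encoding.

```java
import java.util.ArrayList;
import java.util.List;

/** Simplified bigram splitter; an illustration, not the Lucene implementation. */
public class Bigrams {

    /** Splits text into overlapping two-character tokens, counting by code point. */
    public static List<String> bigrams(String text) {
        int[] cps = text.codePoints().toArray();
        List<String> out = new ArrayList<>();
        if (cps.length == 1) {
            // Assumption for this sketch: a lone character becomes its own token
            out.add(text);
            return out;
        }
        for (int i = 0; i + 1 < cps.length; i++) {
            out.add(new String(cps, i, 2));
        }
        return out;
    }

    public static void main(String[] args) {
        String indexed = "\u7F8E\u806F\u793E"; // the three-character field value
        String query = "\u7F8E\u806F";         // the two-character search term
        List<String> indexTokens = bigrams(indexed);
        List<String> queryTokens = bigrams(query);
        System.out.println("index tokens: " + indexTokens.size());
        System.out.println("query tokens: " + queryTokens.size());
        System.out.println("query token is in index: "
                + indexTokens.containsAll(queryTokens));
    }
}
```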

I'm really at a loss.  Does anyone know why I don’t get search hits back?

Thanks,
Dave Keene
 

Re: Solr live at Netflix

Norberto Meijome-2
In reply to this post by Walter Underwood, Netflix
On Tue, 02 Oct 2007 15:26:33 -0700
Walter Underwood <[hidden email]> wrote:

> Here at Netflix, we switched over our site search to Solr two weeks ago.
> We've seen zero problems with the server. We average 1.2 million
> queries/day on a 250K item index. We're running four Solr servers
> with simple round-robin HTTP load-sharing.

Hi Walter,
Would you mind sharing hardware specs, OS, index size, VM settings, and OS-specific tunings?

unless that will be added to the wiki... :)

thanks in advance,
B

_________________________
{Beto|Norberto|Numard} Meijome

"Have the courage to take your own thoughts
seriously, for they will shape you."
   Albert Einstein

I speak for myself, not my employer. Contents may be hot. Slippery when wet. Reading disclaimers makes you go blind. Writing them is worse. You have been Warned.

Re: Solr live at Netflix

Matthew Runo
Yes. Congratulations on your launch. I'd love some sort of case study; I
think Solr could really benefit from a good "here's our schema, here's
the site, this is the type of server/JVM, etc." sort of thing.

The example app is fine and all, but a real life example with a site  
that uses facets like this would really make it easier to get up and  
running with a non-trivial installation.

+--------------------------------------------------------+
  | Matthew Runo
  | Zappos Development
  | [hidden email]
  | 702-943-7833
+--------------------------------------------------------+


On Oct 2, 2007, at 5:53 PM, Norberto Meijome wrote:

> On Tue, 02 Oct 2007 15:26:33 -0700
> Walter Underwood <[hidden email]> wrote:
>
>> Here at Netflix, we switched over our site search to Solr two  
>> weeks ago.
>> We've seen zero problems with the server. We average 1.2 million
>> queries/day on a 250K item index. We're running four Solr servers
>> with simple round-robin HTTP load-sharing.
>
> Hi Walter,
> would you mind sharing hardware specs, OS, index size, VM settings,  
> OS specific tunings ?
>
> unless that will be added to the wiki... :)
>
> thanks in advance,
> B
>
> _________________________
> {Beto|Norberto|Numard} Meijome
>
> "Have the courage to take your own thoughts
> seriously, for they will shape you."
>    Albert Einstein
>
> I speak for myself, not my employer. Contents may be hot. Slippery  
> when wet. Reading disclaimers makes you go blind. Writing them is  
> worse. You have been Warned.
>

Re: Solr live at Netflix

Otis Gospodnetic-2
In reply to this post by Walter Underwood, Netflix
I'm curious about this one.  I'm assuming Porter stemmer would stem Gamers and Gamera to the same stem (Game?).  If the stems are different, which stemmer are you using?  A smarter custom morphological stemmer?

Thanks,
Otis

----- Original Message ----
From: Tom Hill <[hidden email]>
To: [hidden email]
Sent: Tuesday, October 2, 2007 8:16:18 PM
Subject: Re: Solr live at Netflix

Nice!

And there seem to be some improvements. For example, "Gamers" and "Gamera"
no longer stem to the same word :-)

Tom

RE: Solr live at Netflix

Wagner,Harry
Otis,
Take a look at KStem:
http://ciir.cs.umass.edu/cgi-bin/downloads/downloads.cgi
It's less aggressive than Porter.  I modified the Lucene version to work
with Solr, but don't know if it was adopted into the Solr source.  Let me
know if you are interested and I'll send you a jar file.

Cheers!
harry

-----Original Message-----
From: Otis Gospodnetic [mailto:[hidden email]]
Sent: Thursday, October 04, 2007 10:36 AM
To: [hidden email]
Subject: Re: Solr live at Netflix

I'm curious about this one.  I'm assuming Porter stemmer would stem
Gamers and Gamera to the same stem (Game?).  If the stems are different,
which stemmer are you using?  A smarter custom morphological stemmer?

Thanks,
Otis

Re: Solr live at Netflix

Walter Underwood, Netflix
In reply to this post by Otis Gospodnetic-2
Gamera and Gamers do not stem to the same word, but the old Netflix
engine did conflate those two words. The Metaphones for those are
KMR and KMRS, respectively, and the old engine did fuzzy matching
on Metaphones, something I don't recommend. It also matched "skiing"
to "sings".
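
The thread does not say how the old engine's fuzzy matching worked. As a minimal sketch, assuming it was something like an edit-distance threshold on the Metaphone keys, the two keys quoted above (KMR and KMRS) are one edit apart, so a threshold of 1 would conflate them. The class and method names are hypothetical.

```java
/** Illustrative fuzzy comparison of phonetic keys; not the old engine's code. */
public class FuzzyKeys {

    /** Classic Levenshtein distance: insertions, deletions, substitutions. */
    public static int editDistance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] cur = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j; // distance from the empty prefix of a
        }
        for (int i = 1; i <= a.length(); i++) {
            cur[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                cur[j] = Math.min(Math.min(cur[j - 1] + 1, prev[j] + 1),
                                  prev[j - 1] + cost);
            }
            int[] tmp = prev; // reuse the two rows instead of a full matrix
            prev = cur;
            cur = tmp;
        }
        return prev[b.length()];
    }

    /** True when the two keys are within maxEdits of each other. */
    public static boolean fuzzyMatch(String key1, String key2, int maxEdits) {
        return editDistance(key1, key2) <= maxEdits;
    }

    public static void main(String[] args) {
        System.out.println(editDistance("KMR", "KMRS"));  // 1
        System.out.println(fuzzyMatch("KMR", "KMRS", 1)); // true
        System.out.println(fuzzyMatch("KMR", "KMRS", 0)); // false
    }
}
```

With exact matching on the keys (a threshold of 0) the two titles stay distinct, which is the behaviour the stemmer-based setup gives.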

wunder

On 10/4/07 7:35 AM, "Otis Gospodnetic" <[hidden email]> wrote:

> I'm curious about this one.  I'm assuming Porter stemmer would stem Gamers and
> Gamera to the same stem (Game?).  If the stems are different, which stemmer
> are you using?  A smarter custom morphological stemmer?
>
> Thanks,
> Otis

RE: question about bi-gram analysis on query

T. Kuro Kurosaka
In reply to this post by Keene, David
Hello David,
> And if I do a search in Luke and the solr analysis page
> for 美聯, I get a hit.  But on the actual search, I don't.

I think you need to tell us what you mean by "actual search"
and your code that interfaces with Solr.

-kuro

RE: question about bi-gram analysis on query

Keene, David
Hi,

Thanks for responding.  I should have been clearer.

By "actual search" I meant hitting the search demo page on the solr admin page.  So I get no results on this query:

/solr/select/?q=%E7%BE%8E%E8%81%AF&version=2.2&start=0&rows=10&indent=on

But the same query (with the data in my index) on the analysis page shows me a hit (and the same search in Luke gets me a hit too).
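
The `q` parameter above is the UTF-8 percent-encoding of 美聯 (U+7F8E, U+806F). One thing worth ruling out (an assumption; nothing in this thread confirms it is the cause) is the servlet container decoding the query string with a charset other than UTF-8, in which case the term Solr sees would no longer be the two characters that were sent. A minimal sketch of that mismatch using only the JDK, with hypothetical class and method names:

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.net.URLEncoder;

/** Shows how the same percent-encoded bytes decode under two charsets. */
public class QueryEncoding {

    /** Percent-encodes a query term as UTF-8. */
    public static String encodeUtf8(String term) {
        try {
            return URLEncoder.encode(term, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException(e);
        }
    }

    /** Decodes a percent-encoded value using the named charset. */
    public static String decode(String encoded, String charset) {
        try {
            return URLDecoder.decode(encoded, charset);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        String term = "\u7F8E\u806F"; // the two-character query from the post
        String encoded = encodeUtf8(term);
        System.out.println(encoded);                                    // %E7%BE%8E%E8%81%AF
        System.out.println(decode(encoded, "UTF-8").equals(term));      // true
        System.out.println(decode(encoded, "ISO-8859-1").equals(term)); // false
        System.out.println(decode(encoded, "ISO-8859-1").length());     // 6
    }
}
```

Decoded as ISO-8859-1, the six bytes become six unrelated characters, so no bigram in the index could match them.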

I've tried this on 1.1, 1.2 and nightly as of yesterday. I assume that I am missing something really obvious.

-Dave


-----Original Message-----
From: Teruhiko Kurosaka [mailto:[hidden email]]
Sent: Thursday, October 04, 2007 12:44 PM
To: Keene, David
Cc: [hidden email]
Subject: RE: question about bi-gram analysis on query

Hello David,
> And if I do a search in Luke and the solr analysis page
> for 美聯, I get a hit.  But on the actual search, I don't.

I think you need to tell us what you mean by "actual search"
and your code that interfaces with Solr.

-kuro

Re: question about bi-gram analysis on query

Otis Gospodnetic-2
In reply to this post by Keene, David
Dave,

Have you tried using ....&debugQuery=true ? :)

Otis
 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
Simpy -- http://www.simpy.com/  -  Tag  -  Search  -  Share

----- Original Message ----
From: "Keene, David" <[hidden email]>
To: Teruhiko Kurosaka <[hidden email]>
Cc: [hidden email]
Sent: Thursday, October 4, 2007 4:44:59 PM
Subject: RE: question about bi-gram analysis on query

Hi,

Thanks for responding.  I should have been clearer..

By "actual search" I meant hitting the search demo page on the solr admin page.  So I get no results on this query:

/solr/select/?q=%E7%BE%8E%E8%81%AF&version=2.2&start=0&rows=10&indent=on

But the same query (with the data in my index) on the analysis page shows me a hit (and the same search in Luke gets me a hit too).

I've tried this on 1.1, 1.2 and nightly as of yesterday. I assume that I am missing something really obvious..

-Dave


RE: question about bi-gram analysis on query

Keene, David
Hi Otis!

Yes, I've run the query through debugQuery=yes.  I posted the difference between the debug output and the Luke parsed query in my original post.  Here's a snippet:

>>>>>>>>
Also, I've noticed that the parsed query on luke is:
text:"美聯 聯社"
and in solr it is:
text:"美聯 聯社 "
I noticed there is an extra space in the solr parsed query.  I don't know if that makes a difference.
>>>>>>>>>

Like I said, the space in there is the only difference between my parsed query in Luke and the debug query output.  I even stepped through the response in solr with eclipse, and confirmed that the parsed query was tokenized properly (bigram), but had that extra space in there.

On the analysis page, I see hits come up in the text fine, but nothing on a search from the main page.

Thanks,
Dave


-----Original Message-----
From: Otis Gospodnetic [mailto:[hidden email]]
Sent: Friday, October 05, 2007 11:51 PM
To: [hidden email]
Subject: Re: question about bi-gram analysis on query

Dave,

Have you tried using ....&debugQuery=true ? :)

Otis
 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
Simpy -- http://www.simpy.com/  -  Tag  -  Search  -  Share

Merging multiple Solr Indexes

Ycrux
Hi guys!

Is there a simple way (or command line tool)
to merge different Solr indexes (located on different machines)
into one ?

cheers
Y.

Re: Merging multiple Solr Indexes

hossman

: Is there a simple way (or command line tool)
: to merge different Solr indexes (located on different machines)
: into one ?

Solr doesn't have anything like this built in, but there is a command-line
Java class in Lucene that can do this:
org.apache.lucene.misc.IndexMergeTool, in the miscellaneous contrib.

PS...

http://people.apache.org/~hossman/#threadhijack

When starting a new discussion on a mailing list, please do not reply to
an existing message, instead start a fresh email.  Even if you change the
subject line of your email, other mail headers still track which thread
you replied to and your question is "hidden" in that thread and gets less
attention.   It makes following discussions in the mailing list archives
particularly difficult.
See Also:  http://en.wikipedia.org/wiki/Thread_hijacking



-Hoss

Re: Merging multiple Solr Indexes

Ycrux
Hi Chris!

Thanks for the pointer. After two silent days waiting for a reply,
I decided to implement a command-line tool for that. Works like a charm!
If anyone is interested, don't hesitate to ask.

cheers
Y.

Re: Merging multiple Solr Indexes

hossman

: Thanks for the pointer. After two silent days waiting for reply,
: I decided to  implement a command line for that. Works like a charm !!!

well, sometimes people just don't post because they don't know the
answer to something (better than 50 people posting "i don't know").

but a big part of the problem has to do with the issue i mentioned in the
PS.  When you post an off-topic reply to an existing thread (even if you
change the subject), many people don't notice it, because they've
already read some of the messages and decided the thread topic doesn't
interest them...


: > http://people.apache.org/~hossman/#threadhijack
: >
: > When starting a new discussion on a mailing list, please do not reply to an
: > existing message, instead start a fresh email.  Even if you change the
: > subject line of your email, other mail headers still track which thread you
: > replied to and your question is "hidden" in that thread and gets less
: > attention.   It makes following discussions in the mailing list archives
: > particularly difficult.
: > See Also:  http://en.wikipedia.org/wiki/Thread_hijacking



-Hoss

Re: Merging multiple Solr Indexes

jjlarrea
At 9:51 PM -0700 10/7/07, Chris Hostetter wrote:
>: Thanks for the pointer. After two silent days waiting for reply,
>: I decided to  implement a command line for that. Works like a charm !!!
>
>well, sometimes people just don't post because they don't know the
>answer to something (better than 50 people posting "i don't know").

And then there's the case of people who intend to respond and sometimes even take the time to start on a response, but then get sidetracked by other things, and by the time they get around to hitting "send" the point has already been mooted by other posters... there's so much useful and quick commentary on solr-user one has to be quick!

Thus in light of your having given up and written your own merger, and Hoss' mention of the built-in org.apache.lucene.misc.IndexMergeTool (which I wish I'd known about), I didn't send the following, which I'm only sending now in case someone else might find BeanShell a useful tool for rapid prototyping, and with apologies for cluttering the list with something which is technically now off-topic.

- J.J.

>At 1:23 PM +0200 10/6/07, Ycrux wrote:
>>Is there a simple way (or command line tool)
>>to merge different Solr indexes (located on different machines)
>>into one ?
>
>Perhaps someone else can think of a way to do this that capitalizes on Solr-specific features, but it can certainly be done using standard Lucene calls if the remote index is accessible as a filesystem mount, e.g. via NFS.
>
>I have a lot of little Lucene helper scripts written in BeanShell (available and documented at http://www.beanshell.org/) to save the bother of figuring out where to put the classes and get them in my classpath; while BSH is interpreted, since all of the work is done in Lucene code there's no performance issue.  All you need is a bsh.jar and a lucene.jar in the classpath.
>
>----- merge.bsh -----
>
>#!/usr/bin/java bsh.Interpreter
>
>import org.apache.lucene.analysis.standard.StandardAnalyzer;
>import org.apache.lucene.index.IndexReader;
>import org.apache.lucene.index.IndexWriter;
>
>
>if( bsh.args.length < 2 ) {
>        print( "Usage: Merge [-create] <dest-index> <src-index> [ <src-index2> ... ]" );
>        return(-1);
>}
>
>int argnum = 0;
>
>boolean create = false;
>
>if( "-create".equals(bsh.args[argnum]) ) {
>    create = true; ++argnum;
>}
>
>String dstName = bsh.args[argnum++];
>
>java.util.ArrayList readerList = new java.util.ArrayList();
>
>while( argnum < bsh.args.length ) {
>    String srcName=bsh.args[argnum++];
>    IndexReader reader = IndexReader.open(srcName);
>    print( srcName + ":\t" + reader.numDocs() + " documents");
>    readerList.add( reader );
>}
>
>IndexReader[] readerArray = new IndexReader[ readerList.size() ];
>for( int i = 0; i < readerArray.length; i++ )
>        readerArray[i] = (IndexReader)readerList.get(i);
>readerList = null;
>
>IndexWriter writer = null;
>
>try {
>        print( (create ? "Creating " : "Opening ") + dstName + " for merge");
>        writer = new IndexWriter(dstName, new StandardAnalyzer(), create);
>        if( readerArray.length > 0 ) {
>                t0 = System.currentTimeMillis();
>                c0 = writer.docCount();
>                print( dstName + ":\t" + c0 + " documents");
>                writer.addIndexes( readerArray );
>                t1 = System.currentTimeMillis();
>                c1 = writer.docCount();
>                print( "Index " + dstName + " went from " + c0 + " to " + c1 + " (" + (c1 - c0) + ") documents in "
>                        + (t1 - t0)/1000.0 + " sec" + " (i.e. " + ((t1 - t0) / ((c1 - c0)*1.0)) + " millisec each)" );
>        }
>}
>catch( Exception ex ) {
>        ex.printStackTrace();
>}
>finally {
>        if( writer != null )
>                writer.close();
>        for( int i = 0; i < readerArray.length; i++ )
>            if( readerArray[i] != null )
>                        readerArray[i].close();
>}
>
>----- /merge.bsh -----
>
>Also note that it is often much faster to merge n indexes into a new empty index (e.g. with the -create option) than to merge n-1 indexes into an existing index, due to the pre- and post-optimizations that addIndexes does.
>
>- J.J.

Re: Merging multiple Solr Indexes

Ycrux
Seems good. Thanks

cheers
Y.
