Solr vs Sphinx

wojtekpia
I came across this article praising Sphinx: http://www.theregister.co.uk/2009/05/08/dziuba_sphinx/. The article specifically mentions Solr as an 'aging' technology, and states that performance on Sphinx is 2x-4x faster than Solr. Has anyone compared Sphinx to Solr? Or used Sphinx in the past? I realize that you can't just say one is faster than the other because it depends so much on configuration, requirements, # docs, size of each doc, etc. I'm just looking for general observations. I've found other articles comparing Solr with Sphinx and most state that performance is similar between the two.

Thanks,

Wojtek

Re: Solr vs Sphinx

Yonik Seeley-2-2
It's probably the case that every search engine out there is faster
than Solr at one thing or another, and that Solr is faster or better
at some other things.

I prefer to spend my time improving Solr rather than engage in
benchmarking wars... and Solr 1.4 will have a ton of speed
improvements over Solr 1.3.

-Yonik
http://www.lucidimagination.com

Re: Solr vs Sphinx

Grant Ingersoll-2
In reply to this post by wojtekpia

On May 13, 2009, at 11:55 AM, wojtekpia wrote:

>
> I came across this article praising Sphinx:
> http://www.theregister.co.uk/2009/05/08/dziuba_sphinx/. The article
> specifically mentions Solr as an 'aging' technology,

Solr is the same age as Sphinx (2006), so if Solr is aging, then so is  
Sphinx.  But, hey aren't we all aging?  It sure beats not aging.  ;-)  
That being said, we are always open to suggestions and improvements.  
Lucene has seen a massive speedup on indexing that comes through in  
Solr in the past year (and it was fast before), and Solr 1.4 looks to  
be faster than 1.3 (and it was fast before, too.)  The Solr community  
is clearly interested in moving things forward and staying fresh, as  
is the Lucene community.

> and states that performance on Sphinx is 2x-4x faster than Solr. Has
> anyone compared Sphinx to Solr? Or used Sphinx in the past? I realize
> that you can't just say one is faster than the other because it
> depends so much on configuration, requirements, # docs, size of each
> doc, etc. I'm just looking for general observations. I've found other
> articles comparing Solr with Sphinx and most state that performance is
> similar between the two.

I can't speak to Sphinx, as I haven't used it.

As for performance tests, those are always apples and oranges.  If one  
camp does them, then the other camp says "You don't know how to use  
our product" and vice versa.  I think that applies here.  So, when you  
see things like "Internal tests show" that is always a red flag in my  
mind.  I've contacted others in the past who have done "comparisons"  
and after one round of emailing it was almost always clear that they  
didn't know what best practices are for any given product and thus  
were doing things sub-optimally.

One thing in the article that is worthwhile to consider is the fact
that some (most?) people would likely benefit from not removing
stopwords, as they can enhance phrase-based searching and thus improve
relevance.  Obviously, with Solr, it is easy to keep stopwords by
simply removing the StopFilterFactory from the analysis process and
then dealing with them appropriately at query time.  However, it is
likely the case that too many Solr users simply rely on the example
schema when it comes to setup instead of actively investigating what
the proper choices are for their situation.
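
To make the stopword point concrete, here is a minimal sketch in plain
Java. It is not Solr's analysis chain: the analyze() helper and the
stopword list are made up for illustration. It only shows that a phrase
made entirely of common words leaves nothing to match once stopwords
are stripped.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

public class StopwordPhraseDemo {
    // Hypothetical stopword list, for illustration only.
    static final Set<String> STOPWORDS =
        new HashSet<>(Arrays.asList("to", "be", "or", "not", "the", "a", "of"));

    // Lowercase the text, split on whitespace, and optionally drop stopwords.
    static List<String> analyze(String text, boolean removeStopwords) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).trim().split("\\s+")) {
            if (t.isEmpty()) continue;
            if (removeStopwords && STOPWORDS.contains(t)) continue;
            tokens.add(t);
        }
        return tokens;
    }

    public static void main(String[] args) {
        String phrase = "To be or not to be";
        // With stopword removal the phrase has no tokens left to match.
        System.out.println(analyze(phrase, true));
        // Without it, all six positions survive and a phrase query can use them.
        System.out.println(analyze(phrase, false));
    }
}
```

In a real deployment the equivalent choice is made in the field type's
analyzer definition rather than in application code.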

Finally, an old baseball saying comes to mind: "Pitchers only bother  
to throw at .300 hitters".  Solr is a pretty darn full featured search  
platform with a large and active community, a commercial friendly  
license, and it also performs quite well.

-Grant

Re: Solr vs Sphinx

tbenge
Our company has a large search deployment serving more than 50 M search
hits per day.

We've been leveraging Lucene for several years and have recently deployed
Solr for the distributed search feature.  We were hitting scaling limits
with Lucene due to our index size.

I did an evaluation of Sphinx and found Solr / Lucene to be more suitable
for our needs and much more flexible.  Performance in the Solr deployment
(especially with 1.4) has been better than expected.

Thanks to all the Solr developers for a great product.

Hopefully we'll have the opportunity to contribute to the project as it
moves forward.

Todd


Re: Solr vs Sphinx

Michael McCandless-2
In reply to this post by Grant Ingersoll-2
On Wed, May 13, 2009 at 12:33 PM, Grant Ingersoll <[hidden email]> wrote:
> I've contacted
> others in the past who have done "comparisons" and after one round of
> emailing it was almost always clear that they didn't know what best
> practices are for any given product and thus were doing things
> sub-optimally.

While I agree, one should properly match & tune all apps they are
testing (for a fair comparison), we in turn must set out-of-the-box
defaults (in Lucene and Solr) that get you as close to the "best
practices" as possible.

We don't always do that, and I think we should do better.

My most recent example of this is BooleanQuery's performance.  It
turns out, if you setAllowDocsOutOfOrder(true), it yields a sizable
performance gain (27% on my most recent test) for OR queries.

So why haven't we enabled this by default, already?  (As far as I can
tell it's functionally equivalent, as long as the Collector can accept
out-of-order docs, which our core collectors can).

We can't expect the "other camp" to discover that this obscure setting
must be set, to maximize Lucene's OR query performance.

Mike

Re: Solr vs Sphinx

Andrew Klochkov
The author has deleted this message.

Re: Solr vs Sphinx

Grant Ingersoll-2
In reply to this post by Michael McCandless-2
Totally agree on optimizing the out-of-the-box experience; it's just never
a one-size-fits-all thing.  And we have to be very careful about
micro-benchmarks driving these settings.  Currently, many of us use
Wikipedia, but that's just one doc set and I'd venture to say most
Solr users do not have docs that look anything like Wikipedia.  One of
the things the Open Relevance project
(http://wiki.apache.org/lucene-java/OpenRelevance, see the discussion on
[hidden email]) should aim to do is bring in a variety of test
collections, from lots of different genres.  This will help both with
relevance and with speed testing.

-Grant

--------------------------
Grant Ingersoll
http://www.lucidimagination.com/

Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids)  
using Solr/Lucene:
http://www.lucidimagination.com/search


Re: Solr vs Sphinx

Marvin Humphrey
In reply to this post by Michael McCandless-2
On Thu, May 14, 2009 at 06:47:01AM -0400, Michael McCandless wrote:
> While I agree, one should properly match & tune all apps they are
> testing (for a fair comparison), we in turn must set out-of-the-box
> defaults (in Lucene and Solr) that get you as close to the "best
> practices" as possible.

So, should Lucene use the non-compound file format by default because some
idiot's sloppy benchmarks might run a smidge faster, even though that will
cause many users to run out of file descriptors?

Anyone doing comparative benchmarking who doesn't submit their code to the
support list for the software under review is either a dolt or a propagandist.

Good benchmarking is extremely difficult, like all experimental science.  If
there isn't ample evidence that the benchmarker appreciates that, their tests
aren't worth a second thought.  If you don't avail yourself of the help of
experts when assembling your experiment, you are unserious.

Richard Feynman:

    "...if you're doing an experiment, you should report everything that you
    think might make it invalid - not only what you think is right about it:
    other causes that could possibly explain your results; and things you
    thought of that you've eliminated by some other experiment, and how they
    worked - to make sure the other fellow can tell they have been eliminated."

Marvin Humphrey


Re: Solr vs Sphinx

Michael McCandless-2
In reply to this post by Andrew Klochkov
On Thu, May 14, 2009 at 6:51 AM, Andrey Klochkov
<[hidden email]> wrote:

> Can you please point me to some information concerning allowDocsOutOfOrder?
> What's this at all?

There is this cryptic static setter (in Lucene):

  BooleanQuery.setAllowDocsOutOfOrder(boolean)

It defaults to false, which means BooleanScorer2 will always be used
to compute hits for a BooleanQuery.  When set to true, BooleanScorer
will instead be used, when possible.  BooleanScorer gets better
performance, but it collects docs out of order, which for some
external collectors might cause a problem.

All of Lucene's core collectors work fine with out-of-order collection
(but I'm not sure about Solr's collectors).
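
As a rough illustration of why out-of-order collection is harmless for
some collectors and not for others, here is a toy sketch in plain Java.
It does not use Lucene: Hit, topDocs() and arrivesInOrder() are
hypothetical names, and the real BooleanScorer/BooleanScorer2 logic is
far more involved. A top-N collector keyed on score (ties broken by doc
id) returns the same answer in any arrival order, while anything that
relies on increasing doc ids, such as delta encoding, would need the
in-order guarantee.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.PriorityQueue;

public class OutOfOrderCollectDemo {
    /** Toy hit: a doc id plus its score. Not a Lucene class. */
    static final class Hit {
        final int doc;
        final float score;
        Hit(int doc, float score) { this.doc = doc; this.score = score; }
    }

    /** Returns the top-n doc ids, best first; score ties go to the lower doc id. */
    static List<Integer> topDocs(List<Hit> hits, int n) {
        // Min-heap: the weakest retained hit sits at the head and is evicted first.
        PriorityQueue<Hit> weakestFirst = new PriorityQueue<>((a, b) -> {
            int byScore = Float.compare(a.score, b.score);
            return byScore != 0 ? byScore : Integer.compare(b.doc, a.doc);
        });
        for (Hit h : hits) {
            weakestFirst.add(h);
            if (weakestFirst.size() > n) weakestFirst.poll();
        }
        List<Integer> docs = new ArrayList<>();
        while (!weakestFirst.isEmpty()) docs.add(0, weakestFirst.poll().doc);
        return docs;
    }

    /** True if doc ids never decrease, which an order-dependent collector would require. */
    static boolean arrivesInOrder(List<Hit> hits) {
        for (int i = 1; i < hits.size(); i++) {
            if (hits.get(i).doc < hits.get(i - 1).doc) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        List<Hit> inOrder = Arrays.asList(
            new Hit(1, 0.5f), new Hit(2, 0.9f), new Hit(3, 0.7f), new Hit(4, 0.9f));
        List<Hit> shuffled = Arrays.asList(
            new Hit(3, 0.7f), new Hit(1, 0.5f), new Hit(4, 0.9f), new Hit(2, 0.9f));
        System.out.println(topDocs(inOrder, 2));      // same top-2 either way
        System.out.println(topDocs(shuffled, 2));
        System.out.println(arrivesInOrder(shuffled)); // false: ids go backwards
    }
}
```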

If you experiment with this, please post back with your results!

Mike

Re: Solr vs Sphinx

Gerald
In reply to this post by Yonik Seeley-2-2

Yonik Seeley-2 wrote:
> It's probably the case that every search engine out there is faster
> than Solr at one thing or another, and that Solr is faster or better
> at some other things.
>
> I prefer to spend my time improving Solr rather than engage in
> benchmarking wars... and Solr 1.4 will have a ton of speed
> improvements over Solr 1.3.
>
> -Yonik
> http://www.lucidimagination.com

Solr is very fast even with 1.3 and the developers have done an incredible job.

However, maybe the next Solr improvement should be the creation of a configuration manager and/or automated tuning tool.  I know that optimizing Solr performance can be time consuming and sometimes frustrating.


Re: Solr vs Sphinx

Michael McCandless-2
In reply to this post by Marvin Humphrey
On Thu, May 14, 2009 at 9:07 AM, Marvin Humphrey <[hidden email]> wrote:
> Richard Feynman:
>
>    "...if you're doing an experiment, you should report everything that you
>    think might make it invalid - not only what you think is right about it:
>    other causes that could possibly explain your results; and things you
>    thought of that you've eliminated by some other experiment, and how they
>    worked - to make sure the other fellow can tell they have been eliminated."

Excellent quote!

> So, should Lucene use the non-compound file format by default because some
> idiot's sloppy benchmarks might run a smidge faster, even though that will
> cause many users to run out of file descriptors?

No, I don't think we should change that default.

Nor (for example) can we switch to SweetSpotSimilarity by default,
even though it seems to improve relevance, because it requires
app-dependent configuration.

Nor should we set IndexWriter's RAM buffer to 1 GB.  Etc.

But when there is a choice that has near zero downside and improves
performance (like my example), we should make the switch.

Making IndexReader.open return a readOnly reader is another example
(... which we plan to do in 3.0).

Every time Lucene or Solr has a default built-in setting, we should
think carefully about how to set it.

> Anyone doing comparative benchmarking who doesn't submit their code to the
> support list for the software under review is either a dolt or a propagandist.
>
> Good benchmarking is extremely difficult, like all experimental science.  If
> there isn't ample evidence that the benchmarker appreciates that, their tests
> aren't worth a second thought.  If you don't avail yourself of the help of
> experts when assembling your experiment, you are unserious.

Agreed.

Mike

Re: Solr vs Sphinx

Mike Klaas
In reply to this post by Gerald

On 14-May-09, at 9:46 AM, gdeconto wrote:
>
> Solr is very fast even with 1.3 and the developers have done an
> incredible job.
>
> However, maybe the next Solr improvement should be the creation of a
> configuration manager and/or automated tuning tool.  I know that
> optimizing Solr performance can be time consuming and sometimes
> frustrating.

"Making Solr more self-service" has been a theme we have had and  
should strive to move toward.  In some respects, extreme  
configurability is a liability, if considerable tweaking and  
experimentation is needed to achieve optimum results.  You can't  
expect everyone to put in the investment to develop the expertise.

That said, it is very difficult to come up with appropriate
auto-tuning heuristics that don't fail.  It almost calls for a level
above Solr where you could hint at what you want to do with a field
(sort, facet, etc.) and have it generate the field definitions
appropriately.  The problem with such abstractions is that they are
invariably leaky, and thus diagnosing problems requires the same
expertise you would have needed without the abstraction in the first
place.

Getting this trade-off right is one of the central problems of  
computer science.

-Mike

Re: Solr vs Sphinx

Mark Miller-3
In reply to this post by Michael McCandless-2
Michael McCandless wrote:
> So why haven't we enabled this by default, already?
Why isn't Lucene done already :)

- Mark



Re: Solr vs Sphinx

Michael McCandless-2
On Thu, May 14, 2009 at 8:36 PM, Mark Miller <[hidden email]> wrote:
> Michael McCandless wrote:
>>
>> So why haven't we enabled this by default, already?
>
> Why isn't Lucene done already :)

I hear you :)

Mike

Re: Solr vs Sphinx

Mark Miller-3
In reply to this post by Mike Klaas
In the spirit of good defaults:

I think we should change the Solr highlighter to highlight phrase
queries by default, as well as prefix, range, and wildcard constant-score
queries. It's awkward to have to tell people they have to turn those on.
I'd certainly prefer to have to turn them off if I have some limitation
rather than on.
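
For readers who have to switch these on today, a small sketch of what
the request might look like. The parameter names hl.usePhraseHighlighter
and hl.highlightMultiTerm are Solr highlighting parameters; the
HighlightParams class, the example query, and the field name "body" are
assumptions made up for this illustration, and the code only builds a
query string without contacting a server.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.LinkedHashMap;
import java.util.Map;

public class HighlightParams {
    // URL-encode a parameter value; UTF-8 is always available, so the
    // checked exception is rethrown as an unchecked one.
    static String encode(String value) {
        try {
            return URLEncoder.encode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e);
        }
    }

    // Build a query string that explicitly enables phrase and multi-term
    // highlighting for one field (field name supplied by the caller).
    static String queryString(String userQuery, String field) {
        Map<String, String> params = new LinkedHashMap<>();
        params.put("q", userQuery);
        params.put("hl", "true");
        params.put("hl.fl", field);
        params.put("hl.usePhraseHighlighter", "true");
        params.put("hl.highlightMultiTerm", "true");

        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, String> e : params.entrySet()) {
            if (sb.length() > 0) sb.append('&');
            sb.append(e.getKey()).append('=').append(encode(e.getValue()));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // "body" is a hypothetical field name.
        System.out.println(queryString("\"search engine\"", "body"));
    }
}
```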

- Mark

Re: Solr vs Sphinx

Eric Pugh-4
Something that would be interesting is to share solr configs for  
various types of indexing tasks.  From a solr configuration aimed at  
indexing web pages to one doing large amounts of text to one that  
indexes specific structured data.  I could see those being posted on  
the wiki and helping folks who say "I want to do X, is there an  
example?".

I think most folks start with the example Solr install and tweak from  
there, which probably isn't the best path...

Eric

-----------------------------------------------------
Eric Pugh | Principal | OpenSource Connections, LLC | 434.466.1467 | http://www.opensourceconnections.com
Free/Busy: http://tinyurl.com/eric-cal

Re: Solr vs Sphinx

Matthew Runo
I agree regarding posting different types of files - because right now  
if you're just starting out with Solr, taking the sample files from  
the distro and going from there is the /only path/ =\

Thanks for your time!

Matthew Runo
Software Engineer, Zappos.com
[hidden email] - 702-943-7833


Re: Solr vs Sphinx

Fergus McMenemie
In reply to this post by Eric Pugh-4
>Something that would be interesting is to share solr configs for  
>various types of indexing tasks.  From a solr configuration aimed at  
>indexing web pages to one doing large amounts of text to one that  
>indexes specific structured data.  I could see those being posted on  
>the wiki and helping folks who say "I want to do X, is there an  
>example?".
>
>I think most folks start with the example Solr install and tweak from  
>there, which probably isn't the best path...
>
>Eric

Yep, a Solr "cookbook" with lots of different example recipes. However,
these would need to be very actively maintained to ensure they always
represented best practice. While using Cocoon I made extensive use
of the examples section of the Cocoon website. However, most of the
massive number of examples represent obsolete Cocoon practice, or
there were four or five examples doing the same thing in different
ways with no text explaining the pros/cons of the different approaches.
This held me back as a newcomer and gave a bad impression of Cocoon.

I was wondering about a performance hints page. I was caught by an
issue indexing CSV content where the use of &overwrite=false made
an almost 3x difference to my indexing speed. I still do not really
know why!
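
For reference, a sketch of how such a CSV load request might be
assembled. The overwrite, commit and stream.file parameters belong to
Solr's CSV update handler; the CsvUpdateUrl class, the host, and the
file path are hypothetical, and the code only builds the URL. One
plausible explanation for the speed-up, offered here as an assumption
rather than a measured result, is that with overwrite=false Solr no
longer has to look for and replace an existing document with the same
unique key for every row. That is only safe when the input is known to
contain no duplicates.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

public class CsvUpdateUrl {
    // Build the CSV update URL for a server-side file. solrBase and csvPath
    // are supplied by the caller; nothing is sent over the network here.
    static String build(String solrBase, String csvPath, boolean overwrite) {
        try {
            return solrBase + "/update/csv"
                + "?stream.file=" + URLEncoder.encode(csvPath, "UTF-8")
                + "&overwrite=" + overwrite
                + "&commit=true";
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is always available on a Java platform.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Hypothetical host and file path, for illustration only.
        System.out.println(build("http://localhost:8983/solr", "/tmp/docs.csv", false));
    }
}
```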

>
>On May 15, 2009, at 8:09 AM, Mark Miller wrote:
>
>> In the spirit of good defaults:
>>
>> I think we should change the Solr highlighter to highlight phrase  
>> queries by default, as well as prefix,range,wildcard constantscore  
>> queries. Its awkward to have to tell people you have to turn those  
>> on. I'd certainly prefer to have to turn them off if I have some  
>> limitation rather than on.

Yep, I agree: all whizzy new features should ideally be on by default
unless there is a significant performance penalty. It is not enough
to ship a default solrconfig.xml with the feature on; it has to
be on by default inside the code.
 
Fergus

Re: Solr vs Sphinx

sunnyShiny06
Hi guys,

I have been working with Solr for several months now; you really do provide quick answers, and you're very nice to work with.
But I've got a huge issue that I couldn't fix, even after a lot of posts.

My indexing takes one to two days to complete, for 8 GB of indexed data and 1.5M docs (OK, I have plenty of links in my table, but that is still a very long time).

Second, I have to run an update every 20 minutes, and each update represents maybe 20,000 docs. When I use replication I must replicate the whole new optimized index folder, because too much data has been updated, too many segments need to be generated, and the data has to be merged. So I lose my cache and my CPU goes mad.

And I can't get more than 20 requests/sec.




Re: Solr vs Sphinx

Otis Gospodnetic-2

Hi,

Could you please start a new thread?


Thanks,
Otis

