Coming back to search after some time... SOLR or Elastic for text search?

dc tech
I am a Solr fan and implemented it at our company over 10 years ago.
I moved away from that role, and in the meantime the new search team
implemented a proprietary (and expensive) NoSQL-style search engine. That
project did not go well, and now I am back on the project and reviewing the
technology stack.

Some of the team think that Elasticsearch could be a good option,
especially since we can easily get hosted versions with AWS, where we
already have all the contractual stuff sorted out.

While Solr definitely seems more advanced (LTR, streaming expressions,
graph, and all the knobs and dials for relevancy tuning), Elastic may be
sufficient for our needs. It does not seem to have LTR out of the box, but
the relevancy tuning knobs and dials seem to be similar to what Solr has.

The corpus size is not a challenge - we have about one million documents,
of which about half have full text, while the rest are simpler (e.g. company
directory etc.).
The query volumes are also quite low (max 5/second at peak).
We have already implemented the content ingestion and processing pipelines
in Python and Spark, so most of the data will be pushed in using APIs.
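
As a rough illustration of what "pushed in using APIs" could look like from a Python pipeline, here is a minimal sketch that only builds the request bodies: a JSON array for Solr's JSON update handler and newline-delimited JSON for Elasticsearch's `_bulk` endpoint. The function names, the index name and the `id`/`name` fields are hypothetical, chosen for illustration, and no HTTP call is made.

```python
import json


def solr_update_body(docs):
    """Serialize documents as a JSON array, the shape accepted by
    Solr's JSON update handler (POST .../update, application/json)."""
    return json.dumps(list(docs))


def es_bulk_body(docs, index):
    """Serialize documents as NDJSON for Elasticsearch's _bulk endpoint:
    one action line followed by one source line per document."""
    lines = []
    for doc in docs:
        # Hypothetical convention: each document carries its own "id" field.
        lines.append(json.dumps({"index": {"_index": index, "_id": doc["id"]}}))
        lines.append(json.dumps(doc))
    # The bulk body must end with a newline.
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    sample = [{"id": "1", "name": "Ada"}, {"id": "2", "name": "Bob"}]
    print(solr_update_body(sample))
    print(es_bulk_body(sample, "people"), end="")
```

Either body would then be sent with whatever HTTP client the pipeline already uses; at one million documents, batching a few thousand documents per request is usually plenty.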

I would really appreciate any guidance from the community !!

Re: Coming back to search after some time... SOLR or Elastic for text search?

Charlie Hull-3
On 15/01/2020 04:02, Dc Tech wrote:
> I am a Solr fan and implemented it at our company over 10 years ago.
> I moved away from that role, and in the meantime the new search team
> implemented a proprietary (and expensive) NoSQL-style search engine. That
> project did not go well, and now I am back on the project and reviewing the
> technology stack.
>
> Some of the team think that ElasticSearch could be a good option,
> especially since we can easily get hosted versions with AWS where we have
> all the contractual stuff sorted out.
You can, but you should be aware that:
1. Amazon's hosted Elasticsearch isn't great, often lags behind the
current version, doesn't allow plugins etc.
2. Amazon and Elastic are currently engaged in legal battles over who
is the most open sourcey, who allegedly copied code that was 'open' but
commercially licensed, and who would like to capture the hosted search
market... not sure how this will pan out (Google for details).
3. You can also buy fully hosted Solr from several places.
> While Solr definitely seems more advanced (LTR, streaming expressions,
> graph, and all the knobs and dials for relevancy tuning), Elastic may be
> sufficient for our needs. It does not seem to have LTR out of the box, but
> the relevancy tuning knobs and dials seem to be similar to what Solr has.
Yes, they're basically the same under the hood (unsurprising as they're
both based on Lucene). If you need LTR there's an ES plugin for that
(disclaimer, my new employer built and maintains it:
https://github.com/o19s/elasticsearch-learning-to-rank). I've lost track
of the number of times I've been asked 'Elasticsearch or Solr, which
should I choose?' and my current thoughts are:

1. Don't switch from one to the other for the sake of it.  Switching
search engines rarely addresses underlying issues (content quality, team
skills, relevance tuning methodology)
2. Elasticsearch is easier to get started with, but at some point you'll
need to learn how it all works
3. Solr is harder to get started with, but you'll know more about how it
all works earlier
4. Both can be used for most search projects, most features are the
same, both can scale.
5. Lots of Elasticsearch projects (and developers) are focused on logs,
which is often not really a 'search' project.

>
> The corpus size is not a challenge - we have about one million documents,
> of which about half have full text, while the rest are simpler (e.g. company
> directory etc.).
> The query volumes are also quite low (max 5/second at peak).
> We have already implemented the content ingestion and processing pipelines
> in Python and Spark, so most of the data will be pushed in using APIs.
>
> I would really appreciate any guidance from the community !!
>
Sounds like a pretty small setup to be honest, but as ever the devil is
in the details.

Cheers

Charlie

--
Charlie Hull
Flax - Open Source Enterprise Search (now part of OpenSourceConnections)

tel/fax: +44 (0)8700 118334
mobile:  +44 (0)7767 825828
web: www.o19s.com


Re: Coming back to search after some time... SOLR or Elastic for text search?

Jan Høydahl / Cominvent
Hi,

Choosing the Solr community mailing list to ask for advice on whether to choose ES - you already know what to expect, no?
More often than not the choice comes down to policy, standardization, what skills you have in house, etc., rather than ticking off feature checkboxes.
Sometimes company values may also drive a choice, e.g. Solr is 100% Apache and not open core, which may matter if you plan to get involved in the community and contribute features or patches.

However, if I were in your shoes as an architect evaluating the tech stack, and there was no clear choice based on the above, I’d do what projects normally do: ask what you really need from the engine. Maybe you have some features in your requirement list that make one a much better choice than the other. Or maybe after that exercise you are still wondering what to choose, in which case you just follow your gut feeling and make a choice :)

Jan


Re: Coming back to search after some time... SOLR or Elastic for text search?

dc tech
Thank you Jan and Charlie.

Regarding posting to the community about Elastic vs Solr: this is probably the most civil and helpful community that I have been a part of, and your answers have only reinforced that notion!

Thank you for your responses. I am glad to hear that both can do most of it, which was my gut feeling as well.

Charlie, to your point - the team probably feels that Elastic is easier to get started with, hence the preference, as well as the hosting options (with the caveats you noted). I agree with you completely that tech is not the real issue.

Jan, I agree with the points you made on team skills. With our previous proprietary engine, that was in fact the biggest issue: the engine was powerful enough and had good references, but we were not able to exploit it to good effect.

Thank you again.


Re: Coming back to search after some time... SOLR or Elastic for text search?

Walter Underwood
Elasticsearch is easier to set up the first time, but that should not be a deciding factor. Decide on features, not something you’ll do once.

ES has most configuration power at query time. Solr has most at index time. If every query is different, like log search, ES will be better. If queries only differ in the text, keeping filters, fields, and weighting the same, Solr is probably best.
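
To make the "queries only differ in the text" case concrete, here is a minimal sketch that builds a Solr `/select` query string in which the filters, fields and weights are fixed and only `q` changes. The field names, boost and filter are hypothetical; `defType`, `qf`, `fq`, `rows` and `q` are standard Solr request parameters. In a real deployment the fixed part could equally live as request handler defaults in solrconfig.xml rather than in client code.

```python
from urllib.parse import urlencode

# Hypothetical fixed relevance settings shared by every query.
FIXED_PARAMS = {
    "defType": "edismax",        # extended DisMax query parser
    "qf": "title^3 body",        # query fields with a boost on title
    "fq": "doc_type:article",    # filter query applied to all searches
    "rows": 10,
}


def solr_select_query(user_text):
    """Return a URL-encoded query string; only `q` varies per request."""
    params = dict(FIXED_PARAMS, q=user_text)
    return urlencode(params)


if __name__ == "__main__":
    print(solr_select_query("learning to rank"))
```

Two calls with different text produce query strings that are identical apart from the `q` value, which is the situation described above.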

For product or ecommerce search, Solr has richer features.

I would not base a choice solely on LTR. 90% of sites don’t need that and you’ll need a lot of production data (queries and clicks) before you can start using it.
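
To give a feel for why production data is a prerequisite, here is a deliberately naive sketch that turns a click log into per-(query, document) click-through rates, the kind of raw signal an LTR training set might start from. The log format, function name and impression threshold are assumptions for illustration; real judgment lists usually also correct for position bias, which this ignores.

```python
from collections import defaultdict


def click_through_rates(log, min_impressions=20):
    """Aggregate (query, doc_id, clicked) tuples into click-through rates.

    Pairs shown fewer than `min_impressions` times are dropped, since a
    rate computed from a handful of impressions is mostly noise.
    """
    shown = defaultdict(int)
    clicked = defaultdict(int)
    for query, doc_id, was_clicked in log:
        key = (query.strip().lower(), doc_id)  # light query normalization
        shown[key] += 1
        if was_clicked:
            clicked[key] += 1
    return {
        key: clicked[key] / count
        for key, count in shown.items()
        if count >= min_impressions
    }
```

At 5 queries per second at peak, spread over many distinct queries, it can take a while before enough pairs clear even a modest impression threshold.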

Finally, ES has had some pretty embarrassing issues with clustering. You might find it works great with a single host, then throws away updates when clustered in prod. Getting distributed fault tolerance right is very, very hard, which is why Solr uses Zookeeper.

wunder
Walter Underwood
[hidden email]
http://observer.wunderwood.org/  (my blog)


Re: Coming back to search after some time... SOLR or Elastic for text search?

Charlie Hull-3
In reply to this post by dc tech
On 15/01/2020 11:42, Dc Tech wrote:
> Thank you Jan and Charlie.
>
> I should say that in terms of posting to the community regarding Elastic vs Solr - this is probably the most civil and helpful community that I have been a part of - and your answers have only reinforced that  notion !!
>
> Thank you for your responses. I am glad to hear that both can do most of it, which was my gut feeling as well.
>
> Charlie, to your point - the team probably feels that Elastic  is easier to get started with hence the preference, as well as the hosting options (with the caveats you noted). Agree with you completely that tech is not the real issue.
>
> Jan,  agree with  the points you made on team skills.  On our previous proprietary engine - that was in fact the biggest issue - the engine was powerful enough and had good references.  However, we were not able to exploit it to good effect.

Hi again,

The dirty secret that few will voice is that...most search engines are
basically the same. Once you've worked on a search project you can apply
those skills to any future search engine. This is why I'm currently
focused on supporting the search team, not the search tech - how do you
learn and improve those relevance tuning skills, considering it's
really, really hard to recruit people with existing high-level search
skills (and if you can find them you probably can't afford them).

Cheers

Charlie


--
Charlie Hull
Flax - Open Source Enterprise Search, now part of OSC

tel/fax: +44 (0)8700 118334
mobile:  +44 (0)7767 825828
web: www.o19s.com


Re: Coming back to search after some time... SOLR or Elastic for text search?

Emir Arnautović
In reply to this post by dc tech
Hi Jan,
Here is a blog post related to this topic: https://sematext.com/blog/solr-vs-elasticsearch-differences/
It also contains links to other resources that might help you make a decision.

HTH,
Emir
--
Monitoring - Log Management - Alerting - Anomaly Detection
Solr & Elasticsearch Consulting Support Training - http://sematext.com/




Re: Coming back to search after some time... SOLR or Elastic for text search?

Nicolas Paris-2
In reply to this post by dc tech
> We have implemented the content ingestion and processing pipelines already
> in python and SPARK, so most of the data will be pushed in using APIs.

I use the spark-solr library in production and have looked at the ES
equivalent; the Solr connector looks much more advanced for both
loading and fetching data. In particular, the fetching part uses the Solr
export handler, which makes things incredibly fast. Also, spark-solr uses
the DataFrame API while ES looks to be stuck with the RDD API, AFAIK.

A good connector to Spark offers a lot of possibilities in terms of data
transformation and advanced machine learning features within the search
engine.


--
nicolas