solr example synonyms file

solr example synonyms file

Yonik Seeley-2-2
Just saw this tweet:
"Solr initial synonyms file has aaa pointing to aaaa. Inconvenient
when someone is searching for AAA related items."

I think I'll review that example synonyms file ;-)

-Yonik
http://www.lucidimagination.com

---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]


Re: solr example synonyms file

Robert Muir
On Tue, Nov 2, 2010 at 3:56 PM, Yonik Seeley <[hidden email]> wrote:
> Just saw this tweet:
> "Solr initial synonyms file has aaa pointing to aaaa. Inconvenient
> when someone is searching for AAA related items."
>
> I think I'll review that example synonyms file ;-)

In my opinion, something like grey/gray would be a better example...
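
To make the difference between the two kinds of rule concrete, here is a minimal sketch in Python (not Solr code). It assumes the usual synonyms.txt conventions: "a => b" is an explicit mapping that replaces the left side, and a comma-separated line such as "grey, gray" lists equivalent terms that expand to each other. The function names parse_synonyms and expand are hypothetical, only single-word terms are handled, and tokens are lowercased before lookup on the assumption that the analysis chain ignores case.

```python
def parse_synonyms(text):
    """Parse synonym rules into a dict: token -> list of replacement tokens."""
    mapping = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        if "=>" in line:
            # Explicit mapping: every left-hand term is replaced by the right side.
            left, right = line.split("=>", 1)
            targets = [t.strip() for t in right.split(",") if t.strip()]
            for src in (s.strip() for s in left.split(",")):
                if src:
                    mapping[src] = targets
        else:
            # Equivalent synonyms: each term expands to the whole group.
            group = [t.strip() for t in line.split(",") if t.strip()]
            for src in group:
                mapping[src] = group
    return mapping


def expand(tokens, mapping):
    """Apply single-word synonym rules to a token list (case-insensitive)."""
    out = []
    for tok in tokens:
        out.extend(mapping.get(tok.lower(), [tok]))
    return out


RULES = """
# explicit mapping: the left side is replaced by the right side
aaa => aaaa
# equivalent synonyms: either spelling matches the other
grey, gray
"""

mapping = parse_synonyms(RULES)
print(expand(["AAA", "batteries"], mapping))
print(expand(["grey", "paint"], mapping))
```

Under these assumptions a search for "AAA batteries" is silently rewritten to "aaaa batteries", which is the problem the tweet describes, while the grey/gray rule only widens the match.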


Re: solr example synonyms file

Lance Norskog-2
I just used One Fish Two Fish Red Fish Blue Fish but I think that has
license problems.
Also, the sample should include multi-word left-hand values because they work.

On Tue, Nov 2, 2010 at 1:00 PM, Robert Muir <[hidden email]> wrote:

> On Tue, Nov 2, 2010 at 3:56 PM, Yonik Seeley <[hidden email]> wrote:
>> Just saw this tweet:
>> "Solr initial synonyms file has aaa pointing to aaaa. Inconvenient
>> when someone is searching for AAA related items."
>>
>> I think I'll review that example synonyms file ;-)
>
> In my opinion, something like grey/gray would be a better example...

--
Lance Norskog
[hidden email]


Re: solr example synonyms file

Robert Muir
On Tue, Nov 2, 2010 at 9:50 PM, Lance Norskog <[hidden email]> wrote:
> I just used One Fish Two Fish Red Fish Blue Fish but I think that has
> license problems.
> Also, the sample should include multi-word left-hand values because they work.
>

I don't think we should do this... I suggest only using single-word
synonyms in the example, for performance reasons!

It doesn't really matter how rare they are: even "the quick brown fox"
=> something is terrible, because it's going to invoke SynonymFilter's
"slow path" for every single instance of "the".

I know some insist it's just an "example" and not defaults, but this
isn't true, else why did this email thread even come up? It's used as
"defaults", and we should keep it very fast.

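The cost argument above can be illustrated with a toy model in Python. This is not SynonymFilter's actual implementation; it only assumes, as the message describes, that any token matching the first word of a multi-word rule forces the filter to look ahead at the following tokens. The name count_lookaheads and the sample text are hypothetical.

```python
def count_lookaheads(tokens, rules):
    """Count tokens that would force a multi-token lookahead.

    rules maps a tuple of left-hand tokens to a replacement string.
    """
    # First tokens of every multi-word rule; hitting one means buffering ahead.
    multi_word_starts = {lhs[0] for lhs in rules if len(lhs) > 1}
    return sum(1 for tok in tokens if tok in multi_word_starts)


text = "the cat sat on the mat near the quick brown fox".split()

single_word_rules = {("grey",): "gray"}
multi_word_rules = {("the", "quick", "brown", "fox"): "something"}

print(count_lookaheads(text, single_word_rules))  # 0
print(count_lookaheads(text, multi_word_rules))   # 3
```

The multi-word rule matches only once in this text, yet every occurrence of "the" pays for the lookahead, so the rarity of the full phrase does not help.
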

Re: solr example synonyms file

Mark Miller-3
On 11/2/10 9:57 PM, Robert Muir wrote:

> On Tue, Nov 2, 2010 at 9:50 PM, Lance Norskog <[hidden email]> wrote:
>> I just used One Fish Two Fish Red Fish Blue Fish but I think that has
>> license problems.
>> Also, the sample should include multi-word left-hand values because they work.
>>
>
> I don't think we should do this... I suggest only using single-word
> synonyms in the example, for performance reasons!
>
> It doesn't really matter how rare they are: even "the quick brown fox"
> => something is terrible, because it's going to invoke SynonymFilter's
> "slow path" for every single instance of "the".
>
> I know some insist it's just an "example" and not defaults, but this
> isn't true, else why did this email thread even come up? It's used as
> "defaults", and we should keep it very fast.

We have discussed this before - there is always a nasty compromise when it
comes to example vs. default. Good for one is often not good for the
other. But like it or not, our example pretty much is the de facto
default, as you say.

As a reminder, in the past we have talked about doing both an example
with all the bells and whistles, and a performance config that you
should really start from. But we have not gotten there, obviously ;) It
adds some dev/maintenance overhead as well.

No real points, just chiming in with that.

- Mark


Re: solr example synonyms file

Robert Muir
On Tue, Nov 2, 2010 at 10:08 PM, Mark Miller <[hidden email]> wrote:

> On 11/2/10 9:57 PM, Robert Muir wrote:
>> On Tue, Nov 2, 2010 at 9:50 PM, Lance Norskog <[hidden email]> wrote:
>>> I just used One Fish Two Fish Red Fish Blue Fish but I think that has
>>> license problems.
>>> Also, the sample should include multi-word left-hand values because they work.
>>>
>>
>> I don't think we should do this... I suggest only using single-word
>> synonyms in the example, for performance reasons!
>>
>> It doesn't really matter how rare they are: even "the quick brown fox"
>> => something is terrible, because it's going to invoke SynonymFilter's
>> "slow path" for every single instance of "the".
>>
>> I know some insist it's just an "example" and not defaults, but this
>> isn't true, else why did this email thread even come up? It's used as
>> "defaults", and we should keep it very fast.
>
> We have discussed this before - there is always a nasty compromise when it
> comes to example vs. default. Good for one is often not good for the
> other. But like it or not, our example pretty much is the de facto
> default, as you say.
>
> As a reminder, in the past we have talked about doing both an example
> with all the bells and whistles, and a performance config that you
> should really start from. But we have not gotten there, obviously ;) It
> adds some dev/maintenance overhead as well.
>
> No real points, just chiming in with that.
>

Another idea I started for textTight; happy to try to wrap it up and
contribute if there is interest.
But this is really only applicable to 'textTight', since its stemming
etc. isn't insane like 'text'.
I generated the following with a mix of automatic and manual methods
from 2+2lemma.txt (http://wordlist.sourceforge.net/, public domain/BSD).
I'm sure other people must suffer through similar tuning...
Here are just some examples.

Sample synonyms for textTight, built only from variant spellings
(mostly British <-> US):
barbeque => barbecue
blonde => blond
conventionalising => conventionalizing
convertor => converter
conveyers => conveyors
...

Sample stemmer corrections for textTight, the plural-only stemmer (via
StemmerOverrideFilter):
errata      erratum
news        news
radii       radius
cavalrymen  cavalryman
...
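
As a rough illustration of how such a correction list interacts with a plural-only stemmer, here is a minimal Python sketch. It is not Lucene code: naive_plural_stem is a hypothetical stand-in that only strips a trailing "s", and the override lookup models the general idea that a dictionary entry wins and the token bypasses the stemmer. The four entries are the ones listed above.

```python
OVERRIDES = {
    "errata": "erratum",
    "news": "news",        # identity entry: protects the word from stemming
    "radii": "radius",
    "cavalrymen": "cavalryman",
}


def naive_plural_stem(token):
    """Hypothetical stand-in for a plural-only stemmer: strip a trailing 's'."""
    if len(token) > 3 and token.endswith("s") and not token.endswith("ss"):
        return token[:-1]
    return token


def stem(token, overrides=OVERRIDES):
    """Return the override if one exists, otherwise fall back to the stemmer."""
    if token in overrides:
        return overrides[token]
    return naive_plural_stem(token)


for word in ["news", "radii", "converters", "glass"]:
    print(word, "->", stem(word))
```

Without the override, the toy stemmer would turn "news" into "new" and leave "radii" untouched; the dictionary fixes both cases while regular plurals such as "converters" still go through the stemmer.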
