synonyms


Lucas F. A. Teixeira-2
Hello all,

We are having some problems using Solr synonyms. Suppose I define a synonym,
for example:

refrigerador,geladeira

If I search for "refrigerador", I get all results for
"refrigerador", for "geladeira", and for the flexed words of what I've
typed (refrigerador, refrigerado, refrigeração, etc.). But I don't get
results for the flexed words of the synonym that I've defined
(geladeira), for example "gelado", "gelo", etc.


Do you guys know how I can solve this issue?

Thanks all!

[]s,

Lucas

Re: synonyms

hossman
: If I search for "refrigerador", I get all results for "refrigerador",
: for "geladeira", and for the flexed words of what I've typed
: (refrigerador, refrigerado, refrigeração, etc.). But I don't get results
: for the flexed words of the synonym that I've defined (geladeira), for
: example "gelado", "gelo", etc.

I'm not sure what "flexed" means ... it looks like you are referring to
other words with a common stem.

If you use the SynonymFilter before you use your stemming filter, it
should work fine.



-Hoss
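
The ordering point above can be sketched in a few lines of Python. This is a toy model, not Solr code: the synonym table, the `toy_stem` function, and its suffix list are all made up for illustration and say nothing about how SynonymFilter or any real Portuguese stemmer works internally. It only shows why synonyms injected before stemming get stemmed too, while synonyms looked up after stemming can be missed because the stemmed token no longer matches the synonym entry.

```python
# Toy analysis chain: NOT Solr code, only an illustration of filter order.
SYNONYMS = {"refrigerador": ["geladeira"], "geladeira": ["refrigerador"]}

# Hypothetical suffix list, tried in this order; a real stemmer is far more careful.
SUFFIXES = ("eira", "or", "o", "a")


def toy_stem(token):
    """Strip the first matching suffix, keeping at least three characters."""
    for suffix in SUFFIXES:
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def expand_synonyms(tokens):
    """Emit each token followed by any synonyms listed for it."""
    out = []
    for token in tokens:
        out.append(token)
        out.extend(SYNONYMS.get(token, []))
    return out


def synonyms_then_stem(tokens):
    """Synonym expansion first, so the injected synonyms are stemmed as well."""
    return [toy_stem(t) for t in expand_synonyms(tokens)]


def stem_then_synonyms(tokens):
    """Stemming first, so the synonym lookup sees stems instead of full words."""
    return expand_synonyms([toy_stem(t) for t in tokens])


print(synonyms_then_stem(["refrigerador"]))  # ['refrigerad', 'gelad']
print(stem_then_synonyms(["refrigerador"]))  # ['refrigerad']
print(toy_stem("gelado"))                    # 'gelad'
```

With the synonym step first, a query for "refrigerador" also produces the stem of "geladeira", which in this toy happens to equal the stem of "gelado". With the stemmer first, the lookup key is already "refrigerad" and the synonym entry is never found.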

Re: synonyms

Erick Erickson
In reply to this post by Lucas F. A. Teixeira-2
Your problem might be solved (from memory, so check it) by using an
index-time filter that collapses flexed (accented, etc.?) characters.
See ISOLatin1AccentFilter.

Best
Erick

On Tue, Mar 25, 2008 at 1:56 PM, Lucas F. A. Teixeira <
[hidden email]> wrote:

> Hello all,
>
> We are having some problems using Solr synonyms. Suppose I define a synonym,
> for example:
>
> refrigerador,geladeira
>
> If I search for "refrigerador", I get all results for
> "refrigerador", for "geladeira", and for the flexed words of what I've
> typed (refrigerador, refrigerado, refrigeração, etc.). But I don't get
> results for the flexed words of the synonym that I've defined
> (geladeira), for example "gelado", "gelo", etc.
>
>
> Do you guys know how I can solve this issue?
>
> Thanks all!
>
> []s,
>
> Lucas
>
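
To make the accent-collapsing idea concrete, here is a minimal Python sketch. It uses Unicode decomposition from the standard library as a stand-in, which is an assumption for illustration and should not be read as how the Lucene filter is implemented. The function name `fold_accents` is hypothetical.

```python
import unicodedata


def fold_accents(text):
    """Decompose characters (NFD) and drop the combining marks."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


print(fold_accents("refrigeração"))  # refrigeracao
```

Folding of this kind only makes "refrigeração" and "refrigeracao" match each other. It does not relate words that share a stem, so it is separate from the stemming question.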

Re: synonyms

Lucas F. A. Teixeira-2
Thanks Erick,

But it's already being used :-(

still looking for something.... :-)

Thank you!

[]s,

Lucas


Re: synonyms

Erick Erickson
Hmmmm. Could you provide some more examples? I'm having a hard
time figuring out what's going into the index, what you're searching
on and what you're getting...

In particular what filters are you using for *both* indexing and queries...

Best
Erick

On Fri, Mar 28, 2008 at 1:33 PM, Lucas F. A. Teixeira <
[hidden email]> wrote:

> Thanks Erick,
>
> But it's already being used :-(
>
> still looking for something.... :-)
>
> Thank you!
>
> []s,
>
> Lucas

RE: synonyms

Lance Norskog-2
Lucas-

Your examples are Portuguese and Spanish. You might find a Spanish-language
stemmer that follows the very rigid conjugation in Spanish (and I'm assuming
in Portuguese as well). Spanish follows conjugation rules that embed much
more semantics than English, so a huge number of synonyms can be stemmed to
the same word.

Lance

-----Original Message-----
From: Erick Erickson [mailto:[hidden email]]
Sent: Friday, March 28, 2008 11:13 AM
To: [hidden email]
Subject: Re: synonyms

Hmmmm. Could you provide some more examples? I'm having a hard time figuring
out what's going into the index, what you're searching on and what you're
getting...

In particular what filters are you using for *both* indexing and queries...

Best
Erick


Re: synonyms

Leonardo Santagada

On 28/03/2008, at 16:28, Lance Norskog wrote:

> Lucas-
>
> Your examples are Portuguese and Spanish. You might find a
> Spanish-language stemmer that follows the very rigid conjugation in
> Spanish (and I'm assuming in Portuguese as well). Spanish follows
> conjugation rules that embed much more semantics than English, so a
> huge number of synonyms can be stemmed to the same word.

Well, his examples are in Brazilian Portuguese, not Spanish, and the
biggest problem is that a Spanish stemmer is not going to work. I
haven't found a pt_BR stemmer; have I overlooked something?

--
Leonardo Santagada





Re: synonyms

Christian Vogler-3
On Friday 28 March 2008 21:44:29 Leonardo Santagada wrote:
> Well, his examples are in Brazilian Portuguese, not Spanish, and the
> biggest problem is that a Spanish stemmer is not going to work. I
> haven't found a pt_BR stemmer; have I overlooked something?

Try the Snowball Porter filter factory. The algorithm is specified in plain
text files, so adding new stemmers to the codebase is pretty easy. The hard
part is finding a good specification of the algorithm for Brazilian
Portuguese.

A Google search reveals some references to Brazilian Portuguese versions of
the Porter algorithm. Maybe one of these is suitably unencumbered for
implementation and distribution as free software.

As a last resort, there is already a Snowball Porter stemmer for Portuguese in
the Solr codebase. However, I do not know how suitable it would be for
adaptation to Brazilian Portuguese, as I know zilch about the variant spoken
in Portugal.

Best regards
- Christian

Re: synonyms

Lucas F. A. Teixeira-2
In reply to this post by Leonardo Santagada
Hello All,

We've implemented a PortugueseSteemer. We want to make it available to
everyone. Where can I commit it?


[]s,

Lucas


Re: synonyms

hossman

: We've implemented a PortugueseSteemer. We want to make it available to
: everyone. Where can I commit it?

If it's just a Stemmer that has no Solr dependencies (or a Stemmer built
as a TokenFilter) the best thing to do is contribute it to the
Lucene-Java project...

http://wiki.apache.org/lucene-java/HowToContribute

...the best place for it to live would be in the contrib/analysis package.

Make sure to note in the issue how it is different from/better than the
existing stemmers for Portuguese so people can differentiate it.



-Hoss


Re: synonyms

Otis Gospodnetic-2
In reply to this post by Lucas F. A. Teixeira-2
Concretely, it would be good to know how your Portuguese Stemmer is different/better than Porter's.

Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch

----- Original Message ----
From: Chris Hostetter <[hidden email]>
To: [hidden email]
Sent: Monday, March 31, 2008 6:17:48 PM
Subject: Re: synonyms


: We've implemented a PortugueseSteemer. We want to make it available to
: everyone. Where can I commit it?

If it's just a Stemmer that has no Solr dependencies (or a Stemmer built
as a TokenFilter) the best thing to do is contribute it to the
Lucene-Java project...

http://wiki.apache.org/lucene-java/HowToContribute

...the best place for it to live would be in the contrib/analysis package.

Make sure to note in the issue how it is different from/better than the
existing stemmers for Portuguese so people can differentiate it.



-Hoss
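
One rough way to answer the "how is it different" question is to run both stemmers over the same word list and report only the words on which they disagree. The Python sketch below assumes each stemmer can be called as a plain function from word to stem. The two functions `strip_final_vowel` and `strip_or_suffix` are hypothetical stand-ins, not the contributed stemmer and not Porter's.

```python
def stemmer_disagreements(words, stem_a, stem_b):
    """Return (word, stem_a(word), stem_b(word)) for words the stemmers treat differently."""
    return [(w, stem_a(w), stem_b(w)) for w in words if stem_a(w) != stem_b(w)]


# Hypothetical stand-ins for the two stemmers being compared.
def strip_final_vowel(word):
    return word[:-1] if word[-1:] in ("a", "o") else word


def strip_or_suffix(word):
    return word[:-2] if word.endswith("or") else word


sample = ["gel", "gelado", "refrigerador"]
report = stemmer_disagreements(sample, strip_final_vowel, strip_or_suffix)
for word, a, b in report:
    print(word, a, b)
```

A list like this, produced from a representative Portuguese vocabulary, would give reviewers something specific to look at in the issue.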