Ignore accent in a request

Ignore accent in a request

SAUNIER Maxence
Hello,

How can I ignore accents in query results?

Request: http://*****:8983/solr/***/select?defType=dismax&q=je+suis+avarié&qf=title%5e20+subject%5e15+category%5e1+content%5e0.5&mm=757

I want to get documents containing either avarié or avarie.

I have added this to my schema:

  {
    "name": "string",
    "positionIncrementGap": "100",
    "analyzer": {
      "filters": [
        {
          "class": "solr.LowerCaseFilterFactory"
        },
        {
          "class": "solr.ASCIIFoldingFilterFactory"
        },
        {
          "class": "solr.EdgeNGramFilterFactory",
          "minGramSize": "3",
          "maxGramSize": "50"
        }
      ],
      "tokenizer": {
        "class": "solr.KeywordTokenizerFactory"
      }
    },
    "stored": true,
    "indexed": true,
    "sortMissingLast": true,
    "class": "solr.TextField"
  },

But it is not working.

Thanks.

Re: Ignore accent in a request

Erick Erickson
Exactly _how_ is it "not working"?

Try building your parameters _up_ rather than starting with a lot, e.g.
select?defType=dismax&q=je suis avarié&qf=title
^^ assumes you expect a match on title. Then:
select?defType=dismax&q=je suis avarié&qf=title subject

etc.

Because mm=757 looks really wrong. From the docs:
Defines the minimum number of clauses that must match, regardless of
how many clauses there are in total.

edismax is used much more than dismax as it's more flexible, but
that's not germane here.

Finally, try adding &debug=query to the URL to see exactly how the
query is parsed.
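
As a rough sketch of that step-by-step approach, the request can be built up one field at a time with debug=query appended. The host `localhost` and the core name `mycore` below are placeholders, not values from this thread; Python's standard library takes care of URL-encoding the accented character.

```python
from urllib.parse import urlencode

# Placeholder endpoint: host and core name are assumptions for illustration.
BASE = "http://localhost:8983/solr/mycore/select"

def build_debug_url(query, fields):
    """Return a dismax select URL restricted to `fields`, with debug=query."""
    params = {
        "defType": "dismax",
        "q": query,
        "qf": " ".join(fields),   # qf takes a space-separated field list
        "debug": "query",         # asks Solr to report how the query was parsed
    }
    return BASE + "?" + urlencode(params)

# Start with one field, then add fields back one at a time.
for fields in (["title"], ["title", "subject"]):
    print(build_debug_url("je suis avarié", fields))
```

Comparing the parsed query between the two runs shows which added parameter changes the result.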

Best,
Erick

On Mon, Feb 4, 2019 at 9:09 AM SAUNIER Maxence <[hidden email]> wrote:


RE: Ignore accent in a request

SAUNIER Maxence
Hello,

Thanks for your answer.

I tested:

select?defType=dismax&q=je suis avarié&qf=content
90,000 results

select?defType=dismax&q=je suis avarie&qf=content
60,000 results

With avarié, I don't find documents containing avarie, and with avarie, I don't find documents containing avarié.

I want to find all 150,000 documents containing either avarié or avarie.

Thanks

-----Original Message-----
From: Erick Erickson <[hidden email]>
Sent: Thursday, February 7, 2019 19:37
To: solr-user <[hidden email]>
Subject: Re: Ignore accent in a request


RE: Ignore accent in a request

Gopesh Sharma
We fixed this type of issue with synonyms, by adding SynonymFilterFactory (before Solr 7).

-----Original Message-----
From: SAUNIER Maxence <[hidden email]>
Sent: Friday, February 8, 2019 3:36 PM
To: [hidden email]
Subject: RE: Ignore accent in a request


Re: Ignore accent in a request

elisabeth benoit
Hello,

We use Solr 7 with

<charFilter class="solr.MappingCharFilterFactory"
mapping="mapping-ISOLatin1Accent.txt"/>

where mapping-ISOLatin1Accent.txt contains lines like:

# À => A
"\u00C0" => "A"

# Á => A
"\u00C1" => "A"

# Â => A
"\u00C2" => "A"

# Ã => A
"\u00C3" => "A"

# Ä => A
"\u00C4" => "A"

# Å => A
"\u00C5" => "A"

# Ā Ă Ą => A
"\u0100" => "A"
"\u0102" => "A"
"\u0104" => "A"

# Æ => AE
"\u00C6" => "AE"

# Ç => C
"\u00C7" => "C"

# é => e
"\u00E9" => "e"

Best regards,
Elisabeth
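
To make the effect of the mapping excerpt above concrete, here is a minimal Python sketch that applies the same character substitutions to a string. It only imitates the idea of a mapping char filter (characters are rewritten before tokenizing) and is not how Solr implements it; the table holds just the entries quoted in this message.

```python
# Entries copied from the mapping excerpt above; the real file has many more.
MAPPING = {
    "\u00C0": "A", "\u00C1": "A", "\u00C2": "A", "\u00C3": "A",
    "\u00C4": "A", "\u00C5": "A", "\u0100": "A", "\u0102": "A",
    "\u0104": "A", "\u00C6": "AE", "\u00C7": "C", "\u00E9": "e",
}

# str.maketrans accepts single-character keys mapped to replacement strings.
TABLE = str.maketrans(MAPPING)

def fold(text):
    """Rewrite mapped characters, as a char filter would before tokenizing."""
    return text.translate(TABLE)

print(fold("je suis avarié"))  # je suis avarie
print(fold("je suis avarie"))  # je suis avarie
```

Because both spellings end up as the same string, applying the same folding to indexed text and to queries lets either form match the other.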

On Fri, Feb 8, 2019 at 11:18, Gopesh Sharma <[hidden email]> wrote:


RE: Ignore accent in a request

SAUNIER Maxence
Thank you!

-----Original Message-----
From: elisabeth benoit <[hidden email]>
Sent: Friday, February 8, 2019 14:12
To: [hidden email]
Subject: Re: Ignore accent in a request


Re: Ignore accent in a request

Erick Erickson
Elisabeth's suggestion is spot on for the accent.

One other thing I noticed. You are using
KeywordTokenizerFactory combined with
EdgeNGramFilterFactory. This implies that you
can't search for individual _words_, only
prefix queries, i.e.
je
je s
je su
je sui
je suis

You can't search for "suis" for instance.

Basically, this is an efficient way to search for
anything starting with a prefix of three or more
letters, at the expense of index size. You might be
better off just using wildcards (restricted to prefixes
of at least three letters, though).

This is perfectly valid, I'm mostly asking if it's
your intent.
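
A small Python sketch of what that analyzer chain would index may help here. It is only an emulation under the settings quoted in this thread (whole value kept as one token, minGramSize 3, maxGramSize 50), not Solr code. With a minimum size of 3, the shortest gram is three characters long, so it includes the space after "je".

```python
def edge_ngrams(token, min_gram=3, max_gram=50):
    """Return the leading substrings of `token`, min_gram to max_gram characters long."""
    longest = min(len(token), max_gram)
    return [token[:n] for n in range(min_gram, longest + 1)]

# KeywordTokenizerFactory keeps the whole field value as a single token,
# so every gram starts at the first character of the value.
grams = edge_ngrams("je suis avarie")

print(grams[:3])            # ['je ', 'je s', 'je su']
print("je suis" in grams)   # True
print("suis" in grams)      # False
```

Nothing in the list starts in the middle of the value, which is why a word such as "suis" has no matching term.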

Best,
Erick

On Fri, Feb 8, 2019 at 9:35 AM SAUNIER Maxence <[hidden email]> wrote:


RE: Ignore accent in a request

SAUNIER Maxence
For the charFilter, do I need to reindex all documents?

-----Original Message-----
From: Erick Erickson <[hidden email]>
Sent: Friday, February 8, 2019 18:03
To: solr-user <[hidden email]>
Subject: Re: Ignore accent in a request


Re: Ignore accent in a request

elisabeth benoit
Yes, you do.

And use the char filter at both index and query time.
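
One way to get the char filter at both index and query time is to declare it in a single analyzer, which Solr then uses for both. The Python sketch below only builds a Schema API style JSON payload for such a field type; the name `text_folded` and the choice of tokenizer are assumptions for illustration, not settings from this thread.

```python
import json

# The mapping file name is the one mentioned in this thread.
char_filter = {
    "class": "solr.MappingCharFilterFactory",
    "mapping": "mapping-ISOLatin1Accent.txt",
}

field_type = {
    "name": "text_folded",          # hypothetical name, not from the thread
    "class": "solr.TextField",
    "positionIncrementGap": "100",
    # A single "analyzer" (rather than separate indexAnalyzer/queryAnalyzer
    # entries) is applied at both index time and query time.
    "analyzer": {
        "charFilters": [char_filter],
        "tokenizer": {"class": "solr.StandardTokenizerFactory"},  # assumed choice
        "filters": [{"class": "solr.LowerCaseFilterFactory"}],
    },
}

payload = json.dumps({"add-field-type": field_type}, indent=2)
print(payload)
```

Documents indexed before the change still hold the unfolded terms, which is why the reindex is needed.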

On Fri, Feb 8, 2019 at 19:20, SAUNIER Maxence <[hidden email]> wrote:


Re: Ignore accent in a request

Ere Maijala
Please note that mapping characters works well for a small set of
characters, but if you want full Unicode normalization, take a look at
the ICUFoldingFilter:
https://lucene.apache.org/solr/guide/6_6/filter-descriptions.html#FilterDescriptions-ICUFoldingFilter
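
To illustrate the difference from a hand-written mapping table, here is a minimal Python sketch that strips accents through Unicode decomposition, so it handles characters nobody listed explicitly. It is only an approximation of the idea and is not what ICUFoldingFilter does internally; that filter performs further folding beyond accent removal.

```python
import unicodedata

def strip_accents(text):
    """Decompose characters (NFD) and drop the combining marks."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

for word in ("avarié", "garçon", "Ångström", "naïve"):
    print(word, "->", strip_accents(word))
```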

--Ere

elisabeth benoit wrote on 8.2.2019 at 22:47:


--
Ere Maijala
Kansalliskirjasto / The National Library of Finland

Re: Ignore accent in a request

elisabeth benoit
Thanks for the hint. We've been using the char filter for full unidecode
normalization. Is the ICUFoldingFilter supposed to be faster? Or just
simpler to use?

On Mon, Feb 11, 2019 at 09:58, Ere Maijala <[hidden email]> wrote:

> Please note that mapping characters works well for a small set of
> characters, but if you want full UNICODE normalization, take a look at
> the ICUFoldingFilter:
>
> https://lucene.apache.org/solr/guide/6_6/filter-descriptions.html#FilterDescriptions-ICUFoldingFilter
>
> --Ere
>
> --
> Ere Maijala
> Kansalliskirjasto / The National Library of Finland
>

Re: Ignore accent in a request

Ere Maijala
I'm not brave enough to try a char filter with such a large mapping table, so I
can't really comment on whether it is faster. I gave up on the char filter after
running into some trouble handling Cyrillic letters. At least ICUFoldingFilter
is really simple to use, and with more recent Solr versions you can also
use it with MappingCharFilter if necessary by defining a filter that
leaves given characters alone (see
https://lucene.apache.org/solr/guide/7_6/filter-descriptions.html#FilterDescriptions-ICUFoldingFilter
for up-to-date documentation; it replaces the previous link I posted).
Here's the real-life configuration we use:

https://github.com/NatLibFi/finna-solr/blob/master/vufind/biblio/conf/schema.xml#L6
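
For readers who just want the shape of it, here is a minimal sketch of a field type built around ICUFoldingFilter. It is not copied from the configuration linked above: the field type name text_folded and the choice of StandardTokenizerFactory are assumptions for illustration, and the character set in the filter attribute is only an example of characters to leave alone. The ICU filters live in Solr's analysis-extras contrib, so the ICU jars need to be on the classpath.

```xml
<!-- Sketch only: "text_folded" is a hypothetical name, not taken from this thread. -->
<fieldType name="text_folded" class="solr.TextField" positionIncrementGap="100">
  <!-- A single <analyzer> with no type attribute is used at both index and query time. -->
  <analyzer>
    <!-- Assumption: split on words so that individual terms such as "suis" can match. -->
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <!-- Folds accents and case, so "avarié" and "avarie" produce the same token.
         The optional filter attribute is a Unicode set: characters outside the set
         (here å, ä, ö and their capitals) are left untouched.
         Drop the attribute to fold everything. -->
    <filter class="solr.ICUFoldingFilterFactory" filter="[^åäöÅÄÖ]"/>
  </analyzer>
</fieldType>
```

As with the char filter approach discussed earlier in the thread, existing documents have to be reindexed after switching a field to a type like this, because the folding must be applied to the indexed tokens as well as to the query.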

--Ere

elisabeth benoit wrote on 11.2.2019 at 11:37:

> Thanks for the hint. We've been using the char filter for full unidecode
> normalization. Is the ICUFoldingFilter supposed to be faster? Or just
> simpler to use?
>

--
Ere Maijala
Kansalliskirjasto / The National Library of Finland