Wildcard search not working

Ribeaud, Christian (Ext)
Hi,

What could cause wildcard searches with the Lucene query parser NOT to work?

We are using Solr 5.4.1. From the admin console, I search a specific core for the term 'roche' and get, for instance, two matches, which is fine. I would expect at least as many matches for 'r?che', but I get zero. The same happens with 'r*che'. 'roch?' does not work either, but 'roch*' does.

Switching on debug mode produces the following output:

"debug": {
    "rawquerystring": "roch?",
    "querystring": "roch?",
    "parsedquery": "text:roch?",
    "parsedquery_toString": "text:roch?",
    "explain": {},
    "QParser": "LuceneQParser",
...

Any idea? Thanks and cheers,

christian

Re: Wildcard search not working

Ahmet Arslan
Hi Christian,

The query r?che may not return at least as many matches as roche, depending on your analysis chain.
The difference is that roche is analyzed but r?che is not. Wildcard queries are executed against the indexed (analyzed) terms.
For example, if roche is indexed as roch, the query r?che won't match it.

Please see : https://wiki.apache.org/solr/MultitermQueryAnalysis

Ahmet
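
To make Ahmet's point concrete, here is a minimal Python sketch of the mismatch. It is not Solr code: `toy_stem` is a made-up stand-in for the German stemmer (it only drops a trailing 'e', which is the stemming the thread assumes for 'roche'), and `fnmatchcase` from the standard library stands in for Lucene's wildcard matching, where `?` matches exactly one character and `*` matches any run of characters.

```python
from fnmatch import fnmatchcase


def toy_stem(token):
    """Hypothetical stand-in for the stemmer: lowercase, drop a trailing 'e'."""
    token = token.lower()
    return token[:-1] if token.endswith("e") else token


# Index time: the document text goes through the analysis chain,
# so only the stemmed form is stored.
indexed_terms = {toy_stem("Roche")}


def term_query(text):
    # A plain term is analyzed (stemmed) before it is looked up.
    return toy_stem(text) in indexed_terms


def wildcard_query(pattern):
    # A wildcard pattern skips the stemmer and is matched
    # against the indexed terms as they are stored.
    return any(fnmatchcase(term, pattern) for term in indexed_terms)


results = {
    "roche": term_query("roche"),
    "r?che": wildcard_query("r?che"),
    "r*che": wildcard_query("r*che"),
    "roch?": wildcard_query("roch?"),
    "roch*": wildcard_query("roch*"),
}

for query, matched in results.items():
    print(query, "->", "match" if matched else "no match")
```

Under these assumptions the sketch reproduces what Christian saw: only the plain term and 'roch*' match, because the only term in the index is 'roch'.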



RE: Wildcard search not working

Ribeaud, Christian (Ext)
Hi Ahmet,

Many thanks for your reply. I had a look at the URL you pointed out but, honestly, I have to admit that I did not fully understand it.
Let's be a bit more concrete. Here is the schema snippet for the corresponding field:

...
<field name="title" type="text_de" indexed="true" stored="true" required="false" multiValued="false" />

<!-- German -->
<fieldType name="text_de" class="solr.TextField" positionIncrementGap="100">
    <analyzer>
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="lang/stopwords_de.txt" format="snowball" />
        <filter class="solr.GermanNormalizationFilterFactory"/>
        <filter class="solr.GermanLightStemFilterFactory"/>
        <!-- less aggressive: <filter class="solr.GermanMinimalStemFilterFactory"/> -->
        <!-- more aggressive: <filter class="solr.SnowballPorterFilterFactory" language="German2"/> -->
    </analyzer>
</fieldType>
...

What is wrong with this schema? Or rather, what should I change to make wildcard searches work correctly?

Many thanks for your time. Cheers,

christian
--
Christian Ribeaud
Software Engineer (External)
NIBR / WSJ-310.5.17
Novartis Campus
CH-4056 Basel


Re: Wildcard search not working

Malcolm Upayavira Holmes
You have a stemming filter in your analysis chain. Go to the analysis
tab, select the 'text' field, and put "Roche" into both boxes. Click
analyse. I bet you will see Roch, not Roche, because of the
stemming filter shown below.

That's what Ahmet shrewdly identified above.

Upayavira

On Thu, 11 Aug 2016, at 08:31 PM, Ribeaud, Christian (Ext) wrote:

> ...
> <field name="title" type="text_de" indexed="true" stored="true" required="false" multiValued="false" />
>
> <!-- German -->
> <fieldType name="text_de" class="solr.TextField" positionIncrementGap="100">
>     <analyzer>
>         <tokenizer class="solr.StandardTokenizerFactory"/>
>         <filter class="solr.LowerCaseFilterFactory"/>
>         <filter class="solr.StopFilterFactory" ignoreCase="true" words="lang/stopwords_de.txt" format="snowball" />
>         <filter class="solr.GermanNormalizationFilterFactory"/>
>         <filter class="solr.GermanLightStemFilterFactory"/>
>         <!-- less aggressive: <filter class="solr.GermanMinimalStemFilterFactory"/> -->
>         <!-- more aggressive: <filter class="solr.SnowballPorterFilterFactory" language="German2"/> -->
>     </analyzer>
> </fieldType>
> ...

Re: Wildcard search not working

Ahmet Arslan
Hi Christian,

Please add the following filter before (i.e. above) the stemmer:
<filter class="solr.KeywordRepeatFilterFactory"/>

Plus, you may want to add:

<analyzer type="multiterm">
  <tokenizer class="solr.KeywordTokenizerFactory" />
  <filter class="solr.LowerCaseFilterFactory"/>
  <filter class="solr.GermanNormalizationFilterFactory"/>
</analyzer>

Ahmet
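
A rough Python sketch of what the KeywordRepeatFilterFactory suggestion changes, under simplifying assumptions: `toy_stem` is a hypothetical stand-in for the German stemmer, the multiterm analyzer is reduced to lowercasing, and `fnmatchcase` approximates Lucene wildcard matching. The idea it illustrates is that the repeat filter emits each token twice, once marked as a keyword that the stemmer leaves alone, so both the original and the stemmed form end up in the index.

```python
from fnmatch import fnmatchcase


def toy_stem(token):
    """Hypothetical stand-in for the stemmer: drop a trailing 'e'."""
    return token[:-1] if token.endswith("e") else token


def index_terms(text, keyword_repeat):
    """Return the set of terms a (much simplified) analysis chain would index."""
    terms = set()
    for token in text.lower().split():
        if keyword_repeat:
            terms.add(token)        # keyword copy, left unstemmed
        terms.add(toy_stem(token))  # regular copy, stemmed
    return terms


def wildcard_match(pattern, terms):
    # Multiterm analysis is reduced to lowercasing in this sketch.
    pattern = pattern.lower()
    return any(fnmatchcase(term, pattern) for term in terms)


patterns = ["r?che", "r*che", "roch?", "roch*", "R?CHE"]

without_repeat = index_terms("Roche", keyword_repeat=False)
with_repeat = index_terms("Roche", keyword_repeat=True)

before = {p: wildcard_match(p, without_repeat) for p in patterns}
after = {p: wildcard_match(p, with_repeat) for p in patterns}

print("indexed without repeat:", sorted(without_repeat), before)
print("indexed with repeat:   ", sorted(with_repeat), after)
```

With the unstemmed 'roche' in the index, all of the wildcard patterns from the original question find it. In a real schema, RemoveDuplicatesTokenFilterFactory is often placed after the stemmer so that tokens the stemmer leaves unchanged are not indexed twice; that filter is not mentioned in this thread.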



RE: Wildcard search not working

Ribeaud, Christian (Ext)
Hi Ahmet, Hi Upayavira,

OK, it seems that I have to dive a bit deeper into Solr filters and tokenizers. I've just realized that my command of them is too limited.
Thanks a lot to both of you for the help so far. Cheers and have a nice day,

christian
