query with @ and *

Mannott, Birgit
Hi,

I have a problem when searching on email addresses.
@ seems to be handled as a special character, but I can't find anything about it in the documentation.

This is my test data:
[hidden email]
[hidden email]

Searching for test* returns both, OK.
Searching for [hidden email] returns the correct one, OK.
Searching for test returns both, which I didn't expect, but that's OK.
Searching for test@one* returns nothing, and that's the problem.

Escaping the @ character doesn't change this.
It seems that every query containing both @ and * returns no results.

Does anyone have an idea how to change this?

Thanks,
Birgit






Re: query with @ and *

Atita Arora
Hi,

Can you give us a little information about the query parser you are using in
your handler?

Thanks,
Ati



Re: query with @ and *

Shawn Heisey-2
In reply to this post by Mannott, Birgit
On 9/14/2017 5:06 AM, Mannott, Birgit wrote:
> I have a problem when searching on email addresses.
> @ seems to be handled as a special character but I don't find anything about it in the documentation.
>
> This is my test data
> [hidden email]
> [hidden email]

Chances are that you have analysis defined on this field, and that the
analysis includes a tokenizer or tokenizer/filter combination that
splits on punctuation.  This means that for both entries, you have
three terms.  For the first one, those terms are test, one, and com.
For the second one, they are test, two, and com.  The rest of what I'm
writing assumes that this is the case.

> searching for test* results both, ok.

This matches the term "test" in both entries.

> searching for [hidden email] results the correct one, ok.

Query analysis probably splits the same way index analysis does, so the 
actual search is for all three terms.

> searching for test results both, what I didn't expect but it's ok.

In this case, it matches the simple term "test" that's in the index on
both documents.

> searching for test@one* results none and that's the problem.

When you include wildcards in a query, most query analysis is skipped,
so it's looking for the literal text "test@one" followed by any
characters.  Because the index analysis removed the @ character and
split the text around it into separate terms, this will not match any
of the terms in the index.
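
To make the mismatch concrete, here is a minimal Python sketch of that
behavior.  It is not Solr code: index_terms, term_search, and
wildcard_search are hypothetical helpers, the regex split is only a
stand-in for whatever tokenizer is actually configured, and the two
sample addresses are assumed from the terms listed above because the
originals are hidden in the archive.

```python
import re
from fnmatch import fnmatchcase


def index_terms(text):
    """Hypothetical analyzer: split on punctuation, then lowercase."""
    return [t.lower() for t in re.split(r"[^A-Za-z0-9]+", text) if t]


# Assumed sample data; the real addresses are hidden in the archive.
docs = {1: "test@one.com", 2: "test@two.com"}
index = {doc_id: set(index_terms(text)) for doc_id, text in docs.items()}


def term_search(query):
    """Non-wildcard query: analyzed like the indexed text, all terms required."""
    wanted = index_terms(query)
    return sorted(d for d, terms in index.items()
                  if all(t in terms for t in wanted))


def wildcard_search(pattern):
    """Wildcard query: tokenization is skipped, so the whole pattern is
    compared against each individual indexed term."""
    return sorted(d for d, terms in index.items()
                  if any(fnmatchcase(t, pattern) for t in terms))


print(wildcard_search("test*"))       # [1, 2]
print(term_search("test@one.com"))    # [1]
print(term_search("test"))            # [1, 2]
print(wildcard_search("test@one*"))   # []
```

No indexed term contains "@", so a pattern that includes it cannot
match anything, no matter how it is escaped.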

Wildcards, while they do work in many cases, are often not the correct
way to do queries.

Thanks,
Shawn


Re: query with @ and *

Susheel Kumar-3
You may want to use the UAX29URLEmailTokenizerFactory tokenizer in your
analysis chain.
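
That tokenizer keeps each email address as a single token instead of
splitting it at the @ and the dots.  The Python sketch below only
imitates that idea to show why the wildcard query would then match.
The EMAIL regex and the helper names are hypothetical simplifications,
not the tokenizer's real rules, and the sample addresses are assumed
because the originals are hidden in the archive.

```python
import re
from fnmatch import fnmatchcase

# Deliberately simplified patterns, for illustration only.
EMAIL = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
WORD = re.compile(r"[A-Za-z0-9]+")


def email_aware_terms(text):
    """Keep each email address whole; split the remaining text into words."""
    terms, pos = [], 0
    for match in EMAIL.finditer(text):
        terms += WORD.findall(text[pos:match.start()])
        terms.append(match.group())
        pos = match.end()
    terms += WORD.findall(text[pos:])
    return [t.lower() for t in terms]


# Assumed sample data; the real addresses are hidden in the archive.
docs = {1: "test@one.com", 2: "test@two.com"}
index = {doc_id: set(email_aware_terms(text)) for doc_id, text in docs.items()}


def wildcard_search(pattern):
    """Compare the raw pattern against each indexed term."""
    return sorted(d for d, terms in index.items()
                  if any(fnmatchcase(t, pattern) for t in terms))


print(email_aware_terms("test@one.com"))  # ['test@one.com']
print(wildcard_search("test@one*"))       # [1]
print(wildcard_search("test*"))           # [1, 2]
```

The trade-off in this sketch is that a plain search for test no longer
matches either document, because test is not an indexed term by itself
any more.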

Thanks,
Susheel



Re: query with @ and *

Erick Erickson
See: https://lucidworks.com/2011/11/29/whats-with-lowercasing-wildcard-multiterm-queries-in-solr/

It discusses the general problem of particular filters being able to
cope with wildcards or not. Generally any filter that could
potentially produce more than one output token per input token is
skipped when wildcards are encountered.
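
As a rough illustration of that rule (a Python sketch, not Solr's
implementation), the chain below tags each step with whether it is
safe for wildcard terms.  The step names, the CHAIN list, and the
analyze function are all hypothetical.

```python
import re


def split_on_punctuation(tokens):
    """Can turn one input token into several, so unsafe for wildcards."""
    out = []
    for token in tokens:
        out += [part for part in re.split(r"[^A-Za-z0-9]+", token) if part]
    return out


def lowercase(tokens):
    """Always one output token per input token, so safe for wildcards."""
    return [token.lower() for token in tokens]


# (step, safe_for_wildcards) pairs; a hypothetical analysis chain.
CHAIN = [(split_on_punctuation, False), (lowercase, True)]


def analyze(text, wildcard=False):
    """Run the chain, skipping unsafe steps when the term has a wildcard."""
    tokens = [text]
    for step, safe_for_wildcards in CHAIN:
        if wildcard and not safe_for_wildcards:
            continue
        tokens = step(tokens)
    return tokens


print(analyze("Test@One.com"))              # ['test', 'one', 'com']
print(analyze("Test@One*", wildcard=True))  # ['test@one*']
```

The wildcard term is lowercased but never split, which is why it ends
up being compared against index terms that were split.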

Best,
Erick
