[Q] Ref Guide - What is Multi-Term Expansion?

Paras Lehana
Hi Community,

In Ref Guide 8.3's Understanding Analyzers subsection *Analysis for
Multi-Term Expansion*
<https://lucene.apache.org/solr/guide/8_3/analyzers.html#analysis-for-multi-term-expansion>,
the text talks about multi-term expansion and the explicit use of *analyzer
type="multiterm"*.

I could not understand what exactly multi-term expansion is and what the
use cases for "multiterm" are. *[Q1]*

--
--
Regards,

*Paras Lehana* [65871]
Development Engineer, Auto-Suggest,
IndiaMART Intermesh Ltd.

8th Floor, Tower A, Advant-Navis Business Park, Sector 142,
Noida, UP, IN - 201303

Mob.: +91-9560911996
Work: 01203916600 | Extn:  *8173*

--
IMPORTANT: 
NEVER share your IndiaMART OTP/ Password with anyone.

Re: [Q] Ref Guide - What is Multi-Term Expansion?

Alexandre Rafalovitch
It's mentioned in the opening paragraph: "Prefix, Wildcard, Regex, etc."

So, if you search for "abc*", it expands to all the terms that start with
"abc". Not every analysis component can handle that situation, because it
produces a lot of terms in the same position. So, not all analyzers are
safe to use there, and normally the multiterm analyzer is just an
automatically built subset of the safe ones.

I mark them with "(multi)" in my resource (very out of date, but still
useful): http://www.solr-start.com/info/analyzers/

Regards,
   Alex.

On Wed, 6 Nov 2019 at 21:19, Paras Lehana <[hidden email]> wrote:

> I could not understand what exactly is multi-term expansion and what are
> the use cases for using "multiterm". *[Q1]*

Re: [Q] Ref Guide - What is Multi-Term Expansion?

Erick Erickson
Say you want to search for “run*”. That should match “run”, “runner”, “running”, “runs”, etc. One term -> many terms == multiterm expansion. Conceptually, the search becomes (run OR runner OR running OR runs): all the terms actually found in the index that have the prefix “run”.
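
To make the "one term -> many" step concrete, here is a minimal Python sketch of the idea, not Lucene's actual implementation. It assumes the index's term dictionary can be modelled as a sorted list of strings (`index_terms` is hypothetical sample data). Because all terms sharing a prefix form one contiguous run in sorted order, a binary search finds the first candidate and the scan stops at the first term that no longer matches.

```python
import bisect


def expand_prefix(sorted_terms, prefix):
    """Return every indexed term that starts with `prefix`.

    `sorted_terms` stands in for the index's term dictionary; it must be
    sorted so that all matches form one contiguous run.
    """
    start = bisect.bisect_left(sorted_terms, prefix)
    matches = []
    for term in sorted_terms[start:]:
        if not term.startswith(prefix):
            break  # sorted order: once a term stops matching, none after it can
        matches.append(term)
    return matches


# A toy term dictionary (hypothetical data, already lowercased and sorted).
index_terms = sorted(["rule", "run", "runner", "running", "runs", "rust"])
print(" OR ".join(expand_prefix(index_terms, "run")))
# prints: run OR runner OR running OR runs
```

Every term returned ends up in the rewritten query, which is why a short prefix against a large index can expand into a very large number of terms.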

My advice would be to ignore it completely. It's an expert-level option that came about because we got really tired of explaining that wildcards didn't use to have _any_ analysis done, so searching for “Run*” would not match “run” because of the case difference.
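
A toy illustration of the case problem, assuming a hypothetical index-time chain of whitespace tokenizing plus lowercasing. The unanalysed pattern "Run*" finds nothing; lowercasing the pattern first makes it agree with the index. Lowercasing is the kind of filter that is safe to run on a wildcard pattern, since it works character by character and leaves `*` alone. This is only a sketch of the idea, not how Solr wires it up internally.

```python
import fnmatch


def index_analyze(text):
    # Hypothetical index-time chain: whitespace tokenizer + lowercase filter.
    return [tok.lower() for tok in text.split()]


def wildcard_match(pattern, terms):
    # fnmatchcase treats '*' as "any sequence" and never folds case itself.
    return [t for t in terms if fnmatch.fnmatchcase(t, pattern)]


indexed = index_analyze("Run Runner running")  # ['run', 'runner', 'running']

# Old behaviour: the pattern is used verbatim, so "Run*" matches nothing.
print(wildcard_match("Run*", indexed))           # []
# A multiterm chain that only lowercases brings the pattern in line with the index.
print(wildcard_match("Run*".lower(), indexed))   # ['run', 'runner', 'running']
```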

Best,
Erick

> On Nov 6, 2019, at 7:21 AM, Alexandre Rafalovitch <[hidden email]> wrote:
>
> It mentions it in the start  paragraph "Prefix, Wildcard, Regex, etc."

Re: [Q] Ref Guide - What is Multi-Term Expansion?

Paras Lehana
Thank you so much, Erick and Alex!

It's strange that I couldn't understand it, given that wildcards were the
first things we used in queries when I migrated Auto-Suggest to Solr. I
remember how often we ran into stemming not working on partial user queries
(servicin* not matching services). We started using "partialQuery OR
partialQuery*", but that scored exact matches higher. We experimented with
many more options, like KeywordRepeat, before finally moving to EdgeNGrams.
Erick's articles helped so much along the way!

Anyway, the definition is clear to me now. For future reference, here is a
summary of the text:

Many filters won't work with multi-term expansion (the expansion of one term
into many because of a wildcard or regex, for example run* -> run, running,
runner), and those filters pass the input through unchanged. If you want to
specify how your chain behaves for multi-term queries, additionally define
tokenizers/filters in an <analyzer type="multiterm"> section.
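
A toy model of this summary, under stated assumptions: each filter carries a hypothetical `multiterm_aware` flag, and when a wildcard/regex pattern is analysed the chain skips any filter without it, so that filter leaves the input unchanged. The filters here (`str.lower` and a deliberately crude suffix stripper) are illustrative only, not Solr's classes. It also reproduces the "servicin* vs services" mismatch described above: the indexed words are stemmed to "servic", while the pattern keeps "servicin*", which no longer shares that prefix.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Filter:
    name: str
    apply: Callable[[str], str]
    multiterm_aware: bool  # hypothetical flag: safe to run on a wildcard/regex pattern?


def crude_stem(term: str) -> str:
    # Toy suffix stripper, NOT a real stemmer; it only stands in for an unsafe filter.
    for suffix in ("ing", "es", "s"):
        if term.endswith(suffix) and len(term) > len(suffix) + 2:
            return term[: -len(suffix)]
    return term


query_chain: List[Filter] = [
    Filter("lowercase", str.lower, multiterm_aware=True),
    Filter("stem", crude_stem, multiterm_aware=False),
]


def analyze(term: str, chain: List[Filter], multiterm: bool) -> str:
    for f in chain:
        if multiterm and not f.multiterm_aware:
            continue  # not multi-term aware: the pattern passes through unchanged
        term = f.apply(term)
    return term


print(analyze("Services", query_chain, multiterm=False))   # servic
print(analyze("Servicing", query_chain, multiterm=False))  # servic
print(analyze("Servicin*", query_chain, multiterm=True))   # servicin*
```

Defining an explicit `<analyzer type="multiterm">` is, in this model, the same as hand-picking which filters run on the pattern instead of relying on the automatic flag-based subset.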

Here is a nice list of analyzers, with the ones supporting multi-term
expansion marked (credit to Alexandre Rafalovitch):
http://www.solr-start.com/info/analyzers/



On Wed, 6 Nov 2019 at 19:04, Erick Erickson <[hidden email]> wrote:

> Say you want to search for “run*”. That should match “run”, “runner”,
> “running”, “runs” etc. one term->many == multiterm expansion. Conceptually,
> the search becomes (run OR runner OR running OR runs), all terms actually
> found in the index that have the prefix “run”.