Anyone have experience with Query Auto-Suggestor?

Audrey Lorberfeld - Audrey.Lorberfeld@ibm.com
Hi All,

We plan to incorporate query autocomplete functionality into our search engine (like this: https://lucene.apache.org/solr/guide/8_1/suggester.html). I was wondering whether anyone has personal experience with this component and would like to share it. Basically, we are just looking for some best practices from more experienced Solr admins so that we have a starting point for launching this in our beta.

Thank you!

Best,
Audrey

Re: Anyone have experience with Query Auto-Suggestor?

David Hastings
I've used this quite a bit. My biggest piece of advice is to choose a field
that you know is clean, with well-defined terms/words. You don't want an
autocomplete that has a massive dictionary; it will also make the
start/reload times pretty slow.

Re: Re: Anyone have experience with Query Auto-Suggestor?

Audrey Lorberfeld - Audrey.Lorberfeld@ibm.com
David,

Thank you, that is useful. So, would you recommend using a (clean) field over an external dictionary file? We have lots of "top queries" and measure their nDCG. A thought was to programmatically generate an external file where the weight per query term (or phrase) == its nDCG. Bad idea?

Best,
Audrey
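
A minimal Python sketch of the external-file idea. It assumes the file would be read by Solr's FileDictionaryFactory, which expects one suggestion per line with an optional weight after a field delimiter (a tab by default). The sample queries, their nDCG values, the SCALE factor, and the function names are all hypothetical. It also assumes that suggester weights are handled as whole numbers, so a raw nDCG between 0 and 1 is scaled to an integer instead of being written as a float.

```python
from pathlib import Path
from typing import Dict, List

SCALE = 1000  # hypothetical: maps nDCG in [0, 1] to an integer weight in [0, 1000]


def to_weight(ndcg: float, scale: int = SCALE) -> int:
    """Clamp nDCG to [0, 1] and scale it to an integer weight."""
    clamped = min(max(ndcg, 0.0), 1.0)
    return round(clamped * scale)


def build_dictionary_lines(query_ndcg: Dict[str, float]) -> List[str]:
    """Return 'query<TAB>weight' lines, highest weight first."""
    rows = []
    for query, ndcg in query_ndcg.items():
        # Collapse whitespace so a stray tab or newline cannot break the format.
        cleaned = " ".join(query.split())
        if cleaned:
            rows.append((cleaned, to_weight(ndcg)))
    rows.sort(key=lambda row: (-row[1], row[0]))
    return [f"{query}\t{weight}" for query, weight in rows]


def write_dictionary(query_ndcg: Dict[str, float], path: str) -> int:
    """Write the dictionary file and return the number of entries written."""
    lines = build_dictionary_lines(query_ndcg)
    Path(path).write_text("\n".join(lines) + "\n", encoding="utf-8")
    return len(lines)


if __name__ == "__main__":
    sample = {"bluepages": 0.92, "blue pages": 0.41, "travel  expenses": 0.77}
    for line in build_dictionary_lines(sample):
        print(line)
```

Sorting the lines is only for readability; the weight column, not the line order, is what a weighted lookup would be expected to use.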

Re: Re: Anyone have experience with Query Auto-Suggestor?

David Hastings
Not a bad idea at all. However, I've never used an external file before, just
a field in the index, so it's not an area I'm familiar with.

Re: Re: Anyone have experience with Query Auto-Suggestor?

Alessandro Benedetti
I have been working extensively on query autocompletion. These blog posts
should be helpful to you:

https://sease.io/2015/07/solr-you-complete-me.html
https://sease.io/2018/06/apache-lucene-blendedinfixsuggester-how-it-works-bugs-and-improvements.html

Your idea of using search quality evaluation to drive the autocompletion is
interesting.
How do you currently calculate the NDCG for a query? What's your ground
truth?
Using that approach, you will favour query completions that your search
engine is able to process better, not necessarily those closer to the user
intent. Still, it could work.

We should differentiate here between the suggester dictionary (where the
suggestions come from; in your case it could be your extracted data) and
the kind of suggestion (which in your case could be the free-text suggester
lookup).

Cheers
--------------------------
Alessandro Benedetti
Search Consultant, R&D Software Engineer, Director
www.sease.io


Re: Re: Anyone have experience with Query Auto-Suggestor?

Erik Hatcher-4
In reply to this post by Audrey Lorberfeld - Audrey.Lorberfeld@ibm.com
It's a great idea. Then index that file into a separate, lean collection of just the suggestions, with the weight as another field on those documents, and use that weight to rank them at query time with standard /select queries. (This separate suggest collection would also have appropriate tokenization, such as n-gramming, to match partial words as the user types.)

        Erik
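
A rough, in-memory Python sketch of the approach described above; it is not Solr code. Each suggestion is stored with a weight, its tokens are expanded into edge n-grams at index time, and a lookup matches whatever the user has typed so far and sorts the matches by weight. The class name, sample suggestions, and weights are made up for illustration. In Solr the same effect would come from the field's analysis chain plus a sort on the weight field.

```python
from collections import defaultdict
from typing import Dict, List, Set


def edge_ngrams(token: str, min_len: int = 1) -> List[str]:
    """All prefixes of token, from min_len characters up to the full token."""
    return [token[:i] for i in range(min_len, len(token) + 1)]


class SuggestIndex:
    """Toy stand-in for a lean suggest collection (hypothetical name)."""

    def __init__(self) -> None:
        self.weights: Dict[str, int] = {}
        self.postings: Dict[str, Set[str]] = defaultdict(set)

    def add(self, suggestion: str, weight: int) -> None:
        self.weights[suggestion] = weight
        for token in suggestion.lower().split():
            for gram in edge_ngrams(token):
                self.postings[gram].add(suggestion)

    def suggest(self, typed: str, limit: int = 5) -> List[str]:
        terms = typed.lower().split()
        if not terms:
            return []
        # Every typed term must match the start of some token (AND semantics).
        matches = set.intersection(*(self.postings.get(t, set()) for t in terms))
        # Highest weight first; ties broken alphabetically for stable output.
        ranked = sorted(matches, key=lambda s: (-self.weights[s], s))
        return ranked[:limit]


if __name__ == "__main__":
    index = SuggestIndex()
    index.add("bluepages", 920)
    index.add("blue pages", 410)
    index.add("travel expenses", 770)
    print(index.suggest("blu"))
    print(index.suggest("pag"))
```

Expanding prefixes at index time keeps the query side trivial: a lookup is a plain term match, which is roughly why an n-grammed collection can be served with ordinary /select queries.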


Re: Re: Re: Anyone have experience with Query Auto-Suggestor?

Audrey Lorberfeld - Audrey.Lorberfeld@ibm.com
Erik,

Thank you! Yes, that's exactly how we were thinking of architecting it. And our ML engineer suggested something else for the suggestion weights, actually -- to build a model that would programmatically update the weights based on those suggestions' live clicks @ position k, etc. Pretty cool idea...



Re: Re: Re: Anyone have experience with Query Auto-Suggestor?

Audrey Lorberfeld - Audrey.Lorberfeld@ibm.com
In reply to this post by Alessandro Benedetti
Hi Alessandro,

I'm so happy there is someone who's done extensive work with QAC here!

Right now, we measure nDCG via a Dynamic Bayesian Network (DBN). To break it down:
- We use a DBN model to generate a "score" for each query_url pair.
- We then plug that score into a mathematical formula we found in a research paper (happy to share the paper if you're interested) for assigning labels 0-4.
- We then cross-reference the scored and labeled query_url pairs with 1k of our system's top queries and 1k of our system's random queries.
- We use that dataset as our ground truth.
- We then query the system in real time each day for those 2k queries, label them, and compare those labels with our ground truth to get our system's nDCG.

I hope that makes sense! Lots of steps.
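
For readers less familiar with the metric, the final comparison step can be sketched in Python. The sketch uses the common exponential-gain form of DCG, (2^label - 1) / log2(rank + 1), normalized by the ideal ordering of the ground-truth labels. The thread does not say which nDCG variant is actually in use, so the formula choice, the function names, and the data shapes are assumptions.

```python
import math
from typing import Dict, List


def dcg(labels: List[int]) -> float:
    """Discounted cumulative gain for labels listed in ranked order."""
    return sum((2 ** label - 1) / math.log2(rank + 1)
               for rank, label in enumerate(labels, start=1))


def ndcg_at_k(ranked_urls: List[str], truth: Dict[str, int], k: int = 10) -> float:
    """nDCG@k for one query; URLs missing from the ground truth count as label 0."""
    gains = [truth.get(url, 0) for url in ranked_urls[:k]]
    ideal = sorted(truth.values(), reverse=True)[:k]
    ideal_dcg = dcg(ideal)
    return dcg(gains) / ideal_dcg if ideal_dcg > 0 else 0.0


def mean_ndcg(results: Dict[str, List[str]],
              truth: Dict[str, Dict[str, int]],
              k: int = 10) -> float:
    """Average nDCG@k over all queries in results."""
    scores = [ndcg_at_k(urls, truth.get(query, {}), k)
              for query, urls in results.items()]
    return sum(scores) / len(scores) if scores else 0.0
```

Here `truth` maps each query to its labeled URLs (labels 0-4), and `results` maps each query to the URLs the system returned that day, in ranked order.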

For reasons of computational overhead, we are pretty committed to using an external file and a separate Solr core for our suggestions. We are also planning to use the Suggester to add a little human nudge towards "successful" queries. I'm not sure whether that's what the Suggester is really meant to do, but we are using it less as a naïve prefix-matcher and more as a query-suggestion tool. So, if we know that the query "blue pages" is less successful than the query "bluepages" (assuming we can identify the user's intent with this query), we will not show suggestions that match "blue pages"; instead, we will show suggestions that match "bluepages." Sort of like a query rewrite, except with fuzzy prefix matching rather than the introduction of synonyms/expansions.
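
One way to read the "blue pages" / "bluepages" example is as a deduplication step over the suggestion list: group candidates that differ only in spacing and letter case, and keep the variant with the higher success score. The Python sketch below does just that. The grouping rule, the score values, and the function names are assumptions for illustration; real variants (typos, plurals) would need a looser grouping key.

```python
from typing import Dict, List


def variant_key(query: str) -> str:
    """Group queries that differ only in whitespace and letter case."""
    return "".join(query.lower().split())


def best_variants(success: Dict[str, float]) -> Dict[str, str]:
    """Map each variant group to its most successful query."""
    best: Dict[str, str] = {}
    for query, score in success.items():
        key = variant_key(query)
        current = best.get(key)
        if current is None or score > success[current]:
            best[key] = query
    return best


def filter_suggestions(candidates: List[str], success: Dict[str, float]) -> List[str]:
    """Drop candidates that have a more successful variant; keep the original order."""
    winners = best_variants(success)
    return [c for c in candidates if winners.get(variant_key(c), c) == c]


if __name__ == "__main__":
    scores = {"blue pages": 0.41, "bluepages": 0.92, "blue pages login": 0.30}
    print(filter_suggestions(["blue pages", "bluepages", "blue pages login"], scores))
```

Candidates with no scored variant pass through unchanged, so the filter only ever removes a suggestion when a better-scoring spelling of the same query is known.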

What we are concerned with currently is how to define a "successful" query. We have things like abandonment rate, dwell time, etc., but if you have any advice on more ways to identify successful queries, that'd be great. We want to stay away from defining success as "popularity," since that will just create a closed language system where people only query popular queries, and those queries stay popular only because people are querying them (assuming people click on the suggestions, of course).

Let me know your thoughts!

Re: Re: Re: Anyone have experience with Query Auto-Suggestor?

Lucky Sharma
In reply to this post by Audrey Lorberfeld - Audrey.Lorberfeld@ibm.com
Hi Audrey,
As suggested by Erik, you can index the data into a separate collection.
Instead of adding weights in the documents, you can also use LTR within
Solr to rerank on the features.

Regards,
Lucky Sharma

Re: Re: Re: Anyone have experience with Query Auto-Suggestor?

David Hastings
This is a really cool idea! My only concern is that edge-case searches,
where a user knows exactly what they want to find, would be autocompleted
into something that happens to be more "successful" rather than what they
were looking for. For example, say I want to know the legal implications of
Jay-Z's "99 Problems". Most of the autocompletes, I imagine, would be for
the lyrics of the song, or links to the video or Jay-Z himself, when what
I'm looking for is a line-by-line analysis of the song itself and how it
relates to the Fourth Amendment:
http://pdf.textfiles.com/academics/lj56-2_mason_article.pdf

But in general this is a really clever idea, especially in the retail
arena. However, I suspect your use case is more in research, and after
years of dealing with lawyers and librarians, I've found they tend not to
like having their searches intercepted. They know what they're looking for,
and they tend to get mad if you assume they don't :)

Re: Re: Re: Anyone have experience with Query Auto-Suggestor?

Lucky Sharma
In reply to this post by Lucky Sharma
Hi Audrey,
As suggested by Erik, you can index the data into a separate collection.
Instead of adding weights in the documents, you can also use LTR (Learning
to Rank) within Solr to rerank the documents.
Also, to increase relevance within the autosuggestion and to capture the
positional context of the user's input for multi-token keywords, you can
use bigrams/trigrams to generate edge n-grams.



Regards,
Lucky Sharma
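
A small Python sketch of that last suggestion, under the assumption that "bigrams/trigrams" means word shingles. It builds shingles of one to three words from each suggestion, then takes edge n-grams (prefixes) of every shingle, so that a multi-token prefix such as "query au" matches only in the right word order. In Solr this would normally be done by analysis filters rather than by hand; the function names and the sample text are hypothetical.

```python
from typing import List, Set


def shingles(text: str, max_size: int = 3) -> List[str]:
    """Word n-grams of 1..max_size consecutive tokens, lowercased."""
    tokens = text.lower().split()
    out = []
    for size in range(1, max_size + 1):
        for start in range(len(tokens) - size + 1):
            out.append(" ".join(tokens[start:start + size]))
    return out


def edge_ngrams(term: str, min_len: int = 2) -> List[str]:
    """All prefixes of term, from min_len characters up to the full term."""
    return [term[:i] for i in range(min_len, len(term) + 1)]


def index_terms(text: str) -> Set[str]:
    """Prefixes of every shingle: the terms a typed prefix is matched against."""
    terms: Set[str] = set()
    for shingle in shingles(text):
        terms.update(edge_ngrams(shingle))
    return terms


if __name__ == "__main__":
    terms = index_terms("Query Auto Suggester")
    print("query au" in terms)
    print("suggester query" in terms)
```

The trade-off is index size: every extra shingle length multiplies the number of stored prefixes, which ties back to the earlier advice in this thread about keeping the suggestion dictionary small.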

Re: Re: Re: Re: Anyone have experience with Query Auto-Suggestor?

Audrey Lorberfeld - Audrey.Lorberfeld@ibm.com
In reply to this post by David Hastings
David,

True! But we are hoping that these are seen purely as suggestions, and that people who know exactly what they want to type or are looking for will simply ignore the dropdown options.

On 1/24/20, 10:03 AM, "David Hastings" <[hidden email]> wrote:

    This is a really cool idea!  My only concern is that the edge case
    searches, where a user knows exactly what they want to find, would be
    autocomplete into something that happens to be more "successful" rather
    than what they were looking for.  for example, i want to know the legal
    implications of jay z's 99 problems.   most of the autocompletes i imagine
    would be for the lyrics for the song, or links to the video or jay z
    himself, when what im looking for is a line by line analysis of the song
    itself and how it relates to the fourth amendment:
    https://urldefense.proofpoint.com/v2/url?u=http-3A__pdf.textfiles.com_academics_lj56-2D2-5Fmason-5Farticle.pdf&d=DwIFaQ&c=jf_iaSHvJObTbx-siA1ZOg&r=_8ViuZIeSRdQjONA8yHWPZIBlhj291HU3JpNIx5a55M&m=CPAGySYcW7hCqtFtjaThX2vIAhcKEMHHhYpqtqHkx-Q&s=XEyh7ewstUTlEuyKcYHaTU1vHMYA2-Db_nIYnl89yw4&e= 
   
    But in general this is a really clever idea, especially in the retail
    arena.  However i suspect your use case is more in research, and after
    years of dealing with lawyers and librarians, they tend to not like having
    their searches intercepted, they know what they're looking for and they
    tend to get mad if you assume they dont :)
   
    On Fri, Jan 24, 2020 at 9:59 AM Lucky Sharma <[hidden email]> wrote:
   
    > Hi Audrey,
    > As suggested by Erik, you can index the data into a separate collection.
    > Instead of adding weights in the document, you can also use LTR
    > within Solr to rerank on the features.
    >
    > Regards,
    > Lucky Sharma
    >
    > On Fri, 24 Jan, 2020, 8:01 pm Audrey Lorberfeld -
    > [hidden email],
    > <[hidden email]> wrote:
    >
    > > Erik,
    > >
    > > Thank you! Yes, that's exactly how we were thinking of architecting it.
    > > And our ML engineer suggested something else for the suggestion weights,
    > > actually -- to build a model that would programmatically update the
    > > weights based on those suggestions' live clicks @ position k, etc.
    > > Pretty cool idea...
    > >
    > >
    > >
    > > On 1/23/20, 2:26 PM, "Erik Hatcher" <[hidden email]> wrote:
    > >
    > >     It's a great idea.   And then index that file into a separate lean
    > >     collection of just the suggestions, along with the weight as another
    > >     field on those documents, to use for ranking them at query time with
    > >     standard /select queries.  (this separate suggest collection would
    > >     also have appropriate tokenization to match the partial words as the
    > >     user types, like ngramming)
    > >
    > >         Erik
    > >
    > >
    > >     > On Jan 20, 2020, at 11:54 AM, Audrey Lorberfeld -
    > >     > [hidden email] <[hidden email]> wrote:
    > >     >
    > >     > David,
    > >     >
    > >     > Thank you, that is useful. So, would you recommend using a (clean)
    > >     > field over an external dictionary file? We have lots of "top queries"
    > >     > and measure their nDCG. A thought was to programmatically generate an
    > >     > external file where the weight per query term (or phrase) == its nDCG.
    > >     > Bad idea?
    > >     >
    > >     > Best,
    > >     > Audrey
    > >     >
    > >     > On 1/20/20, 11:51 AM, "David Hastings" <[hidden email]> wrote:
    > >     >
    > >     >    Ive used this quite a bit, my biggest piece of advice is to choose a field
    > >     >    that you know is clean, with well defined terms/words, you dont want an
    > >     >    autocomplete that has a massive dictionary, also it will make the
    > >     >    start/reload times pretty slow
    > >     >
    > >     >    On Mon, Jan 20, 2020 at 11:47 AM Audrey Lorberfeld -
    > >     >    [hidden email] <[hidden email]> wrote:
    > >     >
    > >     >> Hi All,
    > >     >>
    > >     >> We plan to incorporate a query autocomplete functionality into our search
    > >     >> engine (like this: https://lucene.apache.org/solr/guide/8_1/suggester.html
    > >     >> ). And I was wondering if anyone has personal experience with this
    > >     >> component and would like to share? Basically, we are just looking for some
    > >     >> best practices from more experienced Solr admins so that we have a starting
    > >     >> place to launch this in our beta.
    > >     >>
    > >     >> Thank you!
    > >     >>
    > >     >> Best,
    > >     >> Audrey
    > >     >>
    > >     >
    > >     >
    > >
    > >
    > >
    > >
    >
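
For readers following the thread, here is a minimal sketch of the approach Erik describes above: a lean index of suggestions only, matched on partial words via edge n-grams and ranked by a stored weight. It is an in-memory stand-in written in plain Python, not Solr itself; the class name `SuggestIndex`, the helper `edge_ngrams`, and the example suggestions and weights are all hypothetical.

```python
from collections import defaultdict


def edge_ngrams(token, min_len=1, max_len=20):
    """Return the leading prefixes of a token, e.g. 'sol' -> ['s', 'so', 'sol']."""
    return [token[:n] for n in range(min_len, min(len(token), max_len) + 1)]


class SuggestIndex:
    """In-memory stand-in for a lean suggestion collection (hypothetical, not Solr)."""

    def __init__(self):
        self.weights = {}                 # suggestion text -> weight
        self.postings = defaultdict(set)  # edge n-gram -> suggestions containing it

    def add(self, suggestion, weight):
        self.weights[suggestion] = weight
        for token in suggestion.lower().split():
            for gram in edge_ngrams(token):
                self.postings[gram].add(suggestion)

    def suggest(self, typed, limit=5):
        tokens = typed.lower().split()
        if not tokens:
            return []
        # Every typed token must be a prefix of some word in the suggestion.
        matches = set.intersection(*(self.postings.get(t, set()) for t in tokens))
        # Rank by the stored weight, highest first; break ties alphabetically.
        return sorted(matches, key=lambda s: (-self.weights[s], s))[:limit]


# Made-up suggestions and weights, for illustration only.
index = SuggestIndex()
index.add("managerial accounting", 0.9)
index.add("machine learning", 0.7)
index.add("marketing analytics", 0.8)
print(index.suggest("ma"))
print(index.suggest("acc"))
```

In a real deployment the n-gramming would presumably happen in the field's index-time analysis chain and the ranking in a sort or boost on the weight field of a normal /select query; the sketch only shows the matching and ranking logic.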
   


Re: Anyone have experience with Query Auto-Suggestor?

Walter Underwood
In reply to this post by Lucky Sharma
Click-based weights are vulnerable to spamming. Some of us fondly remember when
Google was showing Microsoft as the first hit for “evil empire” thanks to a click attack.

For our ecommerce search, we use the actual titles of books weighted by order volume.
Decorated titles are reduced to a base title, so “Managerial Accounting: Student Value Edition”
becomes just “Managerial Accounting”. Showing all the variations is the job of the
real results page.

wunder
Walter Underwood
[hidden email]
http://observer.wunderwood.org/  (my blog)
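
A rough sketch of the weighting Walter describes above, assuming that a "decorated" title can be reduced to its base title by cutting at the first colon. That rule, the function names, and the order counts are illustrative assumptions; the thread does not say how the real reduction is done.

```python
from collections import Counter


def base_title(title):
    """Drop the decoration after the first colon (an assumed, simplistic rule)."""
    return title.split(":", 1)[0].strip()


def suggestion_weights(orders):
    """Sum order counts per base title; orders is an iterable of (title, count)."""
    totals = Counter()
    for title, count in orders:
        totals[base_title(title)] += count
    return totals


# Made-up order volumes, for illustration only.
orders = [
    ("Managerial Accounting: Student Value Edition", 120),
    ("Managerial Accounting", 300),
    ("Managerial Accounting: Global Edition", 45),
    ("Organic Chemistry", 210),
]
for title, weight in suggestion_weights(orders).most_common():
    print(title, weight)
```

All editions collapse into one suggestion whose weight is the combined order volume, so the dropdown stays short and the variations are left to the results page.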

> On Jan 24, 2020, at 7:07 AM, Lucky Sharma <[hidden email]> wrote:
>
> Hi Audrey,
> As suggested by Erik, you can index the data into a separate collection.
> Instead of adding weights in the document, you can also use
> LTR (Learning to Rank) within Solr to rerank the documents.
> Also, to improve relevance within the autosuggestion and capture the
> positional context of the user in the case of multi-token keywords, you can
> use bigrams/trigrams to generate edge n-grams.
>
>
>
> Regards,
> Lucky Sharma
>


Re: Re: Anyone have experience with Query Auto-Suggestor?

Audrey Lorberfeld - Audrey.Lorberfeld@ibm.com
Oh, great! Thank you, this is helpful!

On 1/24/20, 6:43 PM, "Walter Underwood" <[hidden email]> wrote:

    Click-based weights are vulnerable to spamming. Some of us fondly remember when
    Google was showing Microsoft as the first hit for “evil empire” thanks to a click attack.
   
    For our ecommerce search, we use the actual titles of books weighted by order volume.
    Decorated titles are reduced to a base title, so “Managerial Accounting: Student Value Edition”
    becomes just “Managerial Accounting”. Showing all the variations is the job of the
    real results page.
   
    wunder
    Walter Underwood
    [hidden email]
    http://observer.wunderwood.org/  (my blog)
   


Re: Re: Re: Anyone have experience with Query Auto-Suggestor?

Audrey Lorberfeld - Audrey.Lorberfeld@ibm.com
In reply to this post by Erik Hatcher-4
Hi all, reviving this thread.

For those of you who use an external file for your suggestions, how do you decide from your query logs what suggestions to include? We are just starting out with some exploratory analysis of clicks, dwell times, etc., and would love to hear any advice from the community.

Thanks!

Best,
Audrey

On 1/23/20, 2:26 PM, "Erik Hatcher" <[hidden email]> wrote:

    It's a great idea.   And then index that file into a separate lean collection of just the suggestions, along with the weight as another field on those documents, to use for ranking them at query time with standard /select queries.  (this separate suggest collection would also have appropriate tokenization to match the partial words as the user types, like ngramming)
   
    Erik
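
As a starting point for the external file discussed in this thread, the sketch below writes one suggestion per line with its nDCG as the weight. It assumes a tab-separated "query, weight" layout, which is believed to be the default for Solr's file-based suggester dictionary but should be checked against the Suggester page for your version. The function name, the `min_ndcg` cutoff, and the example values are hypothetical.

```python
import io


def write_suggestion_file(out, query_ndcg, min_ndcg=0.5):
    """Write 'query<TAB>weight' lines, best nDCG first, skipping weak queries."""
    kept = 0
    for query, ndcg in sorted(query_ndcg.items(), key=lambda kv: (-kv[1], kv[0])):
        if ndcg < min_ndcg:
            continue
        # Tabs or newlines inside a query would corrupt the line format.
        clean = " ".join(query.split())
        out.write(f"{clean}\t{ndcg:.3f}\n")
        kept += 1
    return kept


# Made-up example values, not real measurements.
query_ndcg = {"solr suggester": 0.91, "edge ngram": 0.78, "asdf": 0.12}
buffer = io.StringIO()
written = write_suggestion_file(buffer, query_ndcg)
print(buffer.getvalue(), end="")
```

The cutoff is one simple way to keep the dictionary small and clean, in line with the earlier advice in this thread about avoiding a massive dictionary; a frequency threshold on the query logs could be applied the same way.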
   
   