Autocomplete terms from the middle of name/description of a Doc

Ugo Matrangolo
Hi,

I'm working on making our autocomplete engine a bit smarter.

The current implementation is basic facet-based autocompletion, as described
in the 'SOLR 3 Enterprise Search' book: we use all the typed tokens except
the last one to build a facet.prefix query on an autocomplete facet field
that we build at index time.

This allows us to have something like:

'espress' --> '*espress*o machine', '*espress*o maker', etc

We want something like:

'espress' -> '*espress*o machine', '*espress*o maker', 'kMix *espress*o
maker'

Note that the last suggestion cannot be obtained by querying with
facet.prefix as we do now. What we need is a way to find the string
'espress' in the middle of the name/description of our products.
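A small in-memory simulation makes the gap concrete. It assumes the autocomplete facet field holds each whole title as a single lowercased value, so facet.prefix can only match from the start of a title. The sample titles and the helper names `facetPrefixOnWholeValues` and `wordPrefixAnywhere` are hypothetical, and no Solr call is involved.

```javascript
// Simulates facet.prefix on a field whose facet values are whole, lowercased titles.
function facetPrefixOnWholeValues(values, prefix) {
  // facet.prefix keeps only facet values that start with the prefix.
  return values.filter((value) => value.startsWith(prefix));
}

// The desired behavior: titles containing any word that starts with the prefix.
function wordPrefixAnywhere(values, prefix) {
  return values.filter((value) =>
    value.split(/\s+/).some((word) => word.startsWith(prefix))
  );
}

const titles = ["espresso machine", "espresso maker", "kmix espresso maker"];

console.log(facetPrefixOnWholeValues(titles, "espress"));
// → ["espresso machine", "espresso maker"]
console.log(wordPrefixAnywhere(titles, "espress"));
// → ["espresso machine", "espresso maker", "kmix espresso maker"]
```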

Any suggestions?

Cheers,
Ugo

Re: Autocomplete terms from the middle of name/description of a Doc

Chantal Ackermann-2
Hi Ugo,

You can use facet.prefix on a tokenized field instead of a String field.

Example:
<field name="product" type="string" … />
<field name="product_tokens" type="text_split" … /><!-- use e.g. WhitespaceTokenizer or WordDelimiter and others, see example schema.xml that comes with SOLR -->

facet.prefix on "product" will only return hits that match the start of the single token stored in that field.
As "product_tokens" contains the value of "product" tokenized in a fashion that suits you, it can contain multiple tokens. facet.prefix on "product_tokens" will return hits that match *any* of these tokens - which is what you want.
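A rough model of this, assuming `product_tokens` uses a whitespace tokenizer plus lowercasing: faceting on a tokenized field counts individual indexed terms, so facet.prefix returns matching terms with their document counts rather than whole product names. The helper names and sample products are hypothetical; this is plain JavaScript, not a Solr API.

```javascript
// Builds the term -> document-count map that faceting on a tokenized field would see.
function tokenFacetCounts(titles) {
  const counts = new Map();
  for (const title of titles) {
    // Whitespace tokenizer + lowercase filter; each term counted once per document.
    const terms = new Set(title.toLowerCase().split(/\s+/).filter(Boolean));
    for (const term of terms) {
      counts.set(term, (counts.get(term) || 0) + 1);
    }
  }
  return counts;
}

// facet.prefix on the tokenized field: matching terms, sorted by term (like facet.sort=index).
function tokenFacetPrefix(titles, prefix) {
  return [...tokenFacetCounts(titles)]
    .filter(([term]) => term.startsWith(prefix))
    .sort(([a], [b]) => (a < b ? -1 : a > b ? 1 : 0));
}

const products = ["Espresso machine", "Espresso maker", "kMix Espresso maker"];
console.log(tokenFacetPrefix(products, "espress")); // → [["espresso", 3]]
console.log(tokenFacetPrefix(products, "ma"));      // → [["machine", 1], ["maker", 2]]
```

The only match for "espress" is the term "espresso", not the title "kMix Espresso maker", which is the concern raised in the next reply.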

Chantal

On 25.07.2012 at 15:29, Ugo Matrangolo wrote:

Re: Autocomplete terms from the middle of name/description of a Doc

Ugo Matrangolo
Hi,

Thank you for the suggestion. However, I don't think this is going to
work.

Suppose I have a product with title='kMix Espresso maker'. If I tokenize
this and put the result in product_tokens, I should get
'[kMix][Espresso][maker]'.

If I now search with facet.field='product_tokens' and
facet.prefix='espresso', I would get only 'espresso', while I want 'kMix
Espresso maker'.

Is that correct?

Cheers,
Ugo.

On Wed, Jul 25, 2012 at 3:11 PM, Chantal Ackermann wrote:

Re: Autocomplete terms from the middle of name/description of a Doc

Chantal Ackermann-2

> Suppose I have a product with a title='kMix Espresso maker'. If I tokenize
> this and put the result in product_tokens I should get
> '[kMix][Espresso][maker]'.
>
> If now I try to search with facet.field='product_tokens' and
> facet.prefix='espresso' I should get only 'espresso' while I want 'kMix
> Espresso maker'.

Yes, you are probably right. I did use this approach at some point; your remark made me check my code again.
In the end I was using n-grams.

(facet.prefix on tokenized fields might work in certain circumstances where you can get the actual value from the string field (or its facet) in parallel.)

This is the jQuery UI autocomplete widget instantiation:

        $(function() {
            $("#qterm").autocomplete({
                minLength: 1,
                source: function(request, response) {
                    // Escape backslashes and double quotes so the typed text
                    // can be wrapped safely in a phrase query.
                    var term = request.term.replace(/(["\\])/g, "\\$1");
                    jQuery.ajax({
                        url: "/solr/select",
                        dataType: "json",
                        data: {
                            q: "title_ngrams:\"" + term + "\"", // match word parts via the ngram field
                            rows: 0,                            // only facets are needed, no documents
                            facet: "true",
                            "facet.field": "title",             // facet on the untokenized string field
                            "facet.mincount": 1,
                            "facet.sort": "index",
                            "facet.limit": 10,
                            "fq": "end_date:[NOW TO *]",        // only documents whose end_date is now or later
                            wt: "json"
                        },
                        success: function(data) {
                            // Facet values arrive as a flat list: [value1, count1, value2, count2, ...]
                            var facets = data.facet_counts.facet_fields.title;
                            var result = [];
                            for (var i = 0; i < facets.length; i += 2) {
                                result.push(facets[i]);
                            }
                            response(result);
                        }
                    });
                }
            });
        });
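The request and response handling in that widget can be pulled out into two small pure functions, which makes them testable outside the browser. This is a sketch assuming the same field names as above (`title_ngrams`, `title`, `end_date`); `buildSuggestParams` and `parseFacetList` are hypothetical names. Solr's JSON response writer returns facet values as a flat array alternating value and count (the default `json.nl=flat`), which is why the loop above steps by two.

```javascript
// Builds the query parameters sent to /solr/select for one autocomplete request.
function buildSuggestParams(term, limit = 10) {
  // Escape backslashes and double quotes so the term is safe inside a phrase query.
  const escaped = term.replace(/(["\\])/g, "\\$1");
  return {
    q: `title_ngrams:"${escaped}"`,
    rows: 0,
    facet: "true",
    "facet.field": "title",
    "facet.mincount": 1,
    "facet.sort": "index",
    "facet.limit": limit,
    fq: "end_date:[NOW TO *]",
    wt: "json",
  };
}

// Converts the flat facet list [value1, count1, value2, count2, ...]
// into {value, label} items that jQuery UI autocomplete accepts.
function parseFacetList(flat) {
  const items = [];
  for (let i = 0; i + 1 < flat.length; i += 2) {
    items.push({ value: flat[i], label: `${flat[i]} (${flat[i + 1]})` });
  }
  return items;
}

const params = buildSuggestParams('12" espress');
console.log(params.q); // → title_ngrams:"12\" espress"
console.log(parseFacetList(["kMix Espresso maker", 3, "Espresso machine", 1]));
```

Showing the count in `label` while keeping the bare title in `value` is a design choice: the widget displays the label but inserts the value into the input.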

And here is the field type "ngram" used for "title_ngrams". "title" is a plain string field.

                <!-- NGram configuration for searching for wordparts without the use of wildcards.
                        This is for suggesting search terms e.g. sourcing an autocomplete widget. -->
                <fieldType name="ngram" class="solr.TextField">
                        <analyzer type="index">
                                <tokenizer class="solr.KeywordTokenizerFactory" />
                                <filter class="solr.LengthFilterFactory" min="1" max="500" />
                                <filter class="solr.TrimFilterFactory" />
                                <filter class="solr.ISOLatin1AccentFilterFactory" />
                                <filter class="solr.WordDelimiterFilterFactory" splitOnCaseChange="1"
                                 splitOnNumerics="1" stemEnglishPossessive="1" generateWordParts="1"
                                 generateNumberParts="1" catenateAll="1" preserveOriginal="1" />
                                <filter class="solr.LowerCaseFilterFactory" />
                                <filter class="solr.EdgeNGramFilterFactory" minGramSize="2" maxGramSize="15" side="front"/>
                                <filter class="solr.RemoveDuplicatesTokenFilterFactory" />
                        </analyzer>
                        <analyzer type="query">
                                <tokenizer class="solr.KeywordTokenizerFactory" />
                                <filter class="solr.TrimFilterFactory" />
                                <filter class="solr.ISOLatin1AccentFilterFactory" />
                                <filter class="solr.WordDelimiterFilterFactory" splitOnCaseChange="1"
                                 splitOnNumerics="1" stemEnglishPossessive="1" generateWordParts="1"
                                 generateNumberParts="1" catenateAll="0" preserveOriginal="1" />
                                <filter class="solr.LowerCaseFilterFactory" />
                        </analyzer>
                </fieldType>
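Why this works: at index time the whole title is kept as one token, split into word parts, lowercased, and each part is expanded into its leading n-grams, so a query for any word prefix of 2 to 15 characters matches a stored gram. The sketch below approximates that index-time chain in JavaScript. It is a simplification, not Solr's implementation: the regular-expression split only imitates WordDelimiterFilterFactory, accent folding uses Unicode normalization instead of ISOLatin1AccentFilterFactory, and possessive stemming is not modeled. `indexGrams` is a hypothetical name.

```javascript
// Approximates the index-time chain of the "ngram" field type (not exact Solr behavior).
function indexGrams(title, minGram = 2, maxGram = 15) {
  // KeywordTokenizer + TrimFilter: the whole title is a single token.
  const original = title.trim();
  // Rough stand-in for accent folding: strip combining diacritics.
  const folded = original.normalize("NFD").replace(/[\u0300-\u036f]/g, "");
  // Rough stand-in for WordDelimiterFilter: split on non-alphanumerics,
  // on lower->upper case changes and on letter/digit boundaries.
  const parts = folded
    .split(/[^A-Za-z0-9]+/)
    .flatMap((chunk) =>
      chunk.split(/(?<=[a-z])(?=[A-Z])|(?<=[A-Za-z])(?=[0-9])|(?<=[0-9])(?=[A-Za-z])/)
    )
    .filter(Boolean);
  // preserveOriginal="1" keeps the whole value; catenateAll="1" adds all parts joined.
  const tokens = [folded, ...parts, parts.join("")].map((t) => t.toLowerCase());
  // EdgeNGramFilter (side="front") + RemoveDuplicates (the Set drops repeats).
  const grams = new Set();
  for (const token of tokens) {
    for (let n = minGram; n <= Math.min(maxGram, token.length); n++) {
      grams.add(token.slice(0, n));
    }
  }
  return grams;
}

const grams = indexGrams("kMix Espresso maker");
console.log(grams.has("espress")); // → true  (word in the middle of the title)
console.log(grams.has("mak"));     // → true
console.log(grams.has("spress"));  // → false (edge n-grams only match from the start of a part)
```

The query analyzer does not build n-grams, so the typed "espress" stays a single lowercased token and simply has to equal one of these stored grams.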

Hope this one gets you going…
Chantal

Re: Autocomplete terms from the middle of name/description of a Doc

Rajinimaski
Hi,

   One approach is to use facet.prefix results for prefix-based suggestions.
To suggest names that match in the middle of a document's text, index the
name field with a whitespace tokenizer and an edge n-gram filter, search
that field with the typed prefix, and return only the name/title field via
fl. Then concatenate both: the facet.prefix results and the fields of the
documents returned by that search.
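The concatenation step could look like the sketch below: facet.prefix suggestions come first, then names from matching documents that are not already in the list, capped at a limit. The function name, the case-insensitive de-duplication and the ordering are assumptions, not part of the original suggestion.

```javascript
// Merges facet.prefix suggestions with names taken from matched documents.
function mergeSuggestions(facetValues, docNames, limit = 10) {
  const seen = new Set();
  const merged = [];
  for (const name of [...facetValues, ...docNames]) {
    if (merged.length >= limit) break;
    const key = name.toLowerCase(); // treat "LCD Monitor" and "lcd monitor" as one suggestion
    if (!seen.has(key)) {
      seen.add(key);
      merged.push(name);
    }
  }
  return merged;
}

console.log(
  mergeSuggestions(["lcd monitor", "lcd tv"], ["Samsung LCD TV", "LCD Monitor"])
);
// → ["lcd monitor", "lcd tv", "Samsung LCD TV"]
```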

Example: the user types "lcd". The query could be:
q=name_edgramed:lcd&fl=name_edgramed&facet=true&facet.prefix=lcd
(with facet.field set to your existing autocomplete facet field).
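Written out with `URLSearchParams`, such a request might be built as in the sketch below. It assumes the edge n-gram field is named `name_edgramed` (as in the example) and is stored, so `fl` returns the original name; `product_autocomplete` is a hypothetical stand-in for the existing facet field, and `facet=true` is required for facet.prefix to take effect. The prefix is not escaped here.

```javascript
// Builds a combined request: documents matching the edge n-gram field,
// plus facet.prefix suggestions from a separate facet field.
function buildCombinedQuery(prefix, facetField = "product_autocomplete") {
  const params = new URLSearchParams({
    q: `name_edgramed:${prefix}`, // field:value, not field=value
    fl: "name_edgramed",          // return only the (stored) name
    rows: "10",
    facet: "true",
    "facet.field": facetField,    // hypothetical name for the existing autocomplete facet field
    "facet.prefix": prefix,
    wt: "json",
  });
  return `/solr/select?${params.toString()}`;
}

console.log(buildCombinedQuery("lcd"));
```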

You will get the documents matching this keyword as well as the facet
results for this prefix.

--Rajani

On Thu, Jul 26, 2012 at 12:21 AM, Chantal Ackermann wrote: