Implementing phrase autopop up

Darniz
Hello all,
Let me first explain what I am trying to do.
I have articles with titles, for example:
<doc>
<str name="title">Car Insurance for Teenage Drivers</str>
</doc>

<doc>
<str name="title">A Total Loss? </str>
</doc>
If a user begins to type "car insu", I want the autocomplete popup to show the entire phrase.
There are two ways to implement this.
The first is to use the TermsComponent; the other is to use a field whose field type uses the solr.EdgeNGramFilterFactory filter.

I started with the TermsComponent: I declared a terms request handler and ran the following query:

http://localhost:8080/solr/terms?terms.fl=title&terms.prefix=car
The issue is that it does not return the entire phrase; it gives me back results like car, caravan, carbon. I know that terms.prefix will only give me results where the title starts with "car". On top of this, I also want titles that contain a word like "car" somewhere in the middle to show up in the popup, much like Google, where the word does not have to be at the beginning of the title.
The question is whether the TermsComponent is a good candidate, or whether I should use a custom field, say autoPopupText, whose field type is configured with the necessary filters including EdgeNGramFilterFactory, copy the title into autoPopupText, and use that field to power the popup.

The other thing is that EdgeNGramFilterFactory works at index time: when you index a document you need to know which fields you want to copy to the autoPopupText field, whereas with the TermsComponent you can decide at query time which fields to fetch autocomplete suggestions from.

Any idea which is best, and why the TermsComponent is not giving me the entire phrase, as I mentioned earlier?
FYI, my title field is of type text.
Thanks
darniz

Re: Implementing phrase autopop up

Shalin Shekhar Mangar
On Tue, Nov 24, 2009 at 10:12 AM, darniz <[hidden email]> wrote:

> Any idea which is best, and why the TermsComponent is not giving me the
> entire phrase, as I mentioned earlier?
> FYI
> my title field is of type text.
>


You are using a tokenized field type with TermsComponent, so each word
in your phrase gets indexed as a separate token. You should use a
non-tokenized type (such as a string type) with TermsComponent. However,
this will only let you search by prefix and not by words in the middle of
the phrase.

Your best bet here would be to use EdgeNGramFilterFactory. If your index is
very large, you can consider doing a prefix search on shingles too.
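
For illustration, the non-tokenized option could look roughly like the schema fragment below. This is only a sketch: the field name title_exact is made up for the example, and it assumes the schema defines the usual "string" type (solr.StrField) and that the source field is called title.

```xml
<!-- Hypothetical field name. "string" is assumed to be the untokenized
     solr.StrField type, so each title is indexed as a single term. -->
<field name="title_exact" type="string" indexed="true" stored="false"/>

<!-- Copy the raw title into the untokenized field at index time. -->
<copyField source="title" dest="title_exact"/>
```

A terms request such as /solr/terms?terms.fl=title_exact&terms.prefix=Car should then return whole titles rather than single words. Since a string field is not analyzed, the prefix is case-sensitive, so terms.prefix=car would not match a title that starts with "Car".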

--
Regards,
Shalin Shekhar Mangar.

Re: Implementing phrase autopop up

Darniz
Thanks for your input.
You made a valid point: if we use a field of type text for autocomplete, it won't work, because the value goes through the tokenizer.
So it looks like for my use case I need a field that uses n-grams and is filled by copying. Here is what I did.

I created a field the same way the Lucid blog describes.

<field name="autocomp" type="edgytext" indexed="true" stored="true" omitNorms="true" omitTermFreqAndPositions="true"/>

with the following field type configuration:

<fieldType name="edgytext" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EdgeNGramFilterFactory" minGramSize="1" maxGramSize="25"/>
  </analyzer>

  <analyzer type="query">
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
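
The copy step is mentioned above but not shown. A minimal sketch of it, assuming the source field is title and the destination is the autocomp field declared above:

```xml
<!-- Assumes "title" is the source field; copies its raw value into
     "autocomp" at index time, where the edgytext analyzer is applied. -->
<copyField source="title" dest="autocomp"/>
```

Documents have to be reindexed after adding a copyField before the new field is populated.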

Now when I query, I get the correct phrases. For example, if I search for
autocomp:"how to" I get all the correct phrases, like

How to find a car
How to find a mechanic
How to choose the right insurance company

etc., which is good.

Now I have two questions.
1) Is it necessary to put the query in quotes? My gut feeling is yes, since without quotes I get phrases beginning with "How" followed by other words, like "How can" etc.

2) If I search for a single word, for example choose, I get nothing.
I was expecting to see a result, since the word "choose" appears in the phrase
How to choose the right insurance company

I might look more at the documentation, but do you have any advice?

darniz

Re: Implementing phrase autopop up

Darniz
Can anybody tell me whether it is possible to display a phrase when a word within that phrase is matched?

darniz

Re: Implementing phrase autopop up

Shalin Shekhar Mangar
On Tue, Nov 24, 2009 at 11:58 PM, darniz <[hidden email]> wrote:

>
>
> I created a field the same way the Lucid blog describes.
>
> <field name="autocomp" type="edgytext" indexed="true" stored="true"
> omitNorms="true" omitTermFreqAndPositions="true"/>
>
> with the following field type configuration:
>
> <fieldType name="edgytext" class="solr.TextField"
> positionIncrementGap="100">
>
> <analyzer type="index">
> <tokenizer class="solr.KeywordTokenizerFactory"/>
> <filter class="solr.LowerCaseFilterFactory"/>
> <filter class="solr.EdgeNGramFilterFactory" minGramSize="1"
> maxGramSize="25"/>
> </analyzer>
>
> <analyzer type="query">
> <tokenizer class="solr.KeywordTokenizerFactory"/>
> <filter class="solr.LowerCaseFilterFactory"/>
> </analyzer>
> </fieldType>
>
> Now when I query, I get the correct phrases. For example, if I search for
> autocomp:"how to" I get all the correct phrases, like
>
> How to find a car
> How to find a mechanic
> How to choose the right insurance company
>
> etc., which is good.
>
> Now I have two questions.
> 1) Is it necessary to put the query in quotes? My gut feeling is yes, since
> without quotes I get phrases beginning with "How" followed by other words,
> like "How can" etc.
>

Yes, since we want to do phrase searches on n-grams.



> 2) If I search for a single word, for example choose, I get nothing.
> I was expecting to see a result, since the word "choose" appears in the
> phrase
> How to choose the right insurance company
>
> I might look more at the documentation, but do you have any advice?
>
>
EdgeNGram creates n-grams only from the starting or the ending edge, so you
can't match words in the middle of a phrase. Try using NGramFilterFactory
instead.
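
As a rough sketch of that suggestion, the field type could keep the same analyzer chain and swap only the edge filter. The type name ngramtext is made up for the example, and the gram sizes are simply carried over from the edgytext definition as an assumption; they are not a recommendation.

```xml
<!-- Hypothetical field type name; same chain as "edgytext" with the
     edge filter replaced by solr.NGramFilterFactory. -->
<fieldType name="ngramtext" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <!-- Keep the whole title as one token, then lowercase it. -->
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- Emit substrings starting at every position, not only at the edge. -->
    <filter class="solr.NGramFilterFactory" minGramSize="1" maxGramSize="25"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```

With a field of this type, a query for choose should match "How to choose the right insurance company", because "choose" is one of the indexed substrings. The trade-offs are a much larger index than with edge n-grams, and queries longer than maxGramSize characters will not match.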

--
Regards,
Shalin Shekhar Mangar.