split on white space and then EdgeNGramFilterFactory

Rajinimaski
Hi,

I want to split on whitespace and then apply EdgeNGramFilterFactory.

Example: a field in a document contains the text "smart phone, i24
xpress exchange offer, 500 dollars". Each word should be indexed with its
edge n-grams:

smart: s, sm, sma, smar, smart
phone: p, ph, pho, phon, phone
i24: i, i2, i24
xpress: x, xp, xpr, xpre, xpres, xpress

and so on.

If I search for "xpres", this document should match.

Which field type supports this?

I tried the one below, but it did not meet this requirement.

<fieldType name="text" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.EdgeNGramFilterFactory" minGramSize="1" maxGramSize="25"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

Any suggestions?

Thanks,
Rajani
Re: split on white space and then EdgeNGramFilterFactory

Jack Krupansky-2
Apply the n-gram filter only at index time: add a separate query-time
analyzer to that field type, without the n-gram filter. Otherwise the query
term itself is expanded into edge n-grams, so a search for "xpres" would
also match on "x", "xp", and so on.

Also, add &debugQuery=true to your query request to see which Lucene query
is generated.

And use the Solr admin analysis page to validate both index-time and
query-time analysis of your terms.

-- Jack Krupansky
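
The suggestion above could look roughly like the following schema.xml fragment. This is a minimal sketch, not the configuration actually used in the thread: the type name "text_prefix" is a hypothetical choice, and placing the lowercase filter before the n-gram filter is an assumption (the original attempt had it after; both orders produce lowercase grams).

```xml
<!-- Sketch only: "text_prefix" is a hypothetical name, not from the thread. -->
<fieldType name="text_prefix" class="solr.TextField" positionIncrementGap="100">
  <!-- Index time: split on whitespace, lowercase, then emit edge n-grams. -->
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EdgeNGramFilterFactory" minGramSize="1" maxGramSize="25"/>
  </analyzer>
  <!-- Query time: same tokenizer and lowercasing, but no n-gram filter, -->
  <!-- so "xpres" stays a single term and matches the indexed gram "xpres". -->
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```

Note that the whitespace tokenizer keeps punctuation attached to words, so "phone," is indexed with its trailing comma as the final gram; the shorter gram "phone" is still produced, so a search for "phone" should match. Documents need to be reindexed after the field type changes.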

-----Original Message-----
From: Rajani Maski
Sent: Thursday, August 02, 2012 7:26 AM
To: [hidden email]
Subject: split on white space and then EdgeNGramFilterFactory


Re: split on white space and then EdgeNGramFilterFactory

Rajinimaski
Yes, this works. Thank you.


Regards
Rajani

On Thu, Aug 2, 2012 at 6:04 PM, Jack Krupansky <[hidden email]> wrote:

> Only do the ngram filter at index time. So, add a query-time analyzer to
> that field type but without the ngram filter.
>
> Also, add &debugQuery to your query request to see what Lucene query is
> generated.
>
> And, use the Solr admin analyzer to validate both index-time and
> query-time analysis of your terms.
>
> -- Jack Krupansky