Type of auto suggest feature

Rudenko, Artur
Hi,
I am quite new to Solr and I am interested in implementing a kind of automatic term suggestion (not autocomplete) feature based on the user's query.
A user builds a query (on multiple fields), and I am trying to help them refine it by suggesting additional terms based on the current query.
The suggestions should include synonyms and different word forms (query: close, result: closed, closing) as well as other "interesting" terms and phrases related to that search (it is hard to define what "interesting" means).

The queries are performed on a text field of about 1000 words, on document sets of about 20-50M.

So far I have come up with a solution that uses the Suggester component over the 1000-word text field (through a copy field), as shown below, and I am trying to find out how to add more "interesting" terms and phrases based on the text field.


<field name="text_total_shingle_synonyms" type="text_total_shingle_synonyms" indexed="true" stored="true" termVectors="true" termOffsets="true" termPositions="true" required="false" multiValued="true" />

<copyField source="text_en_total" dest="text_total_shingle_synonyms"/>

<fieldType name="text_total_shingle_synonyms" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <!-- Lowercase all tokens (no stop word filter is configured here). -->
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EnglishPossessiveFilterFactory"/>
    <filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt"/>
    <filter class="solr.ShingleFilterFactory" maxShingleSize="4"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms_suggest.txt" ignoreCase="true" expand="false"/> <!-- in example it is set to false, we have it as true -->
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EnglishPossessiveFilterFactory"/>
    <filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt"/>
    <!-- <filter class="solr.PorterStemFilterFactory"/> -->
  </analyzer>
</fieldType>


Thanks,
Artur Rudenko



Re: Type of auto suggest feature

Paras Lehana
Hey Artur,

If I have understood correctly, you want to suggest terms related to the
query. It would help if you also described the use case. In any case,
please go through the following points:

   1. Keep the different forms of a word as separate documents so that each
   can be suggested ("closed", "close" and "closing" should be different
   docs). Use stemming (Snowball Porter Stemmer Filter
   <https://lucene.apache.org/solr/guide/8_3/filter-descriptions.html#snowball-porter-stemmer-filter>)
   so that docs with different forms can be matched by the same query.
   2. The "interesting" terms are probably related terms in your case, which
   can be handled with the Synonym filter factory. Again, the related terms
   should be in separate documents. Add all the related words to the
   synonyms file, separated by commas.
   3. Will your queries contain only single terms? If not, how do you want
   to handle queries with multiple terms? This may require a few more
   analyzers and some query tweaking.
   4. If you also want to suggest terms for partial words (to suggest
   "closing" when the query is "clo"), use Edge NGrams
   <https://lucene.apache.org/solr/guide/8_3/tokenizers.html#edge-n-gram-tokenizer>.
   Use the Standard Tokenizer
   <https://lucene.apache.org/solr/guide/8_3/tokenizers.html#Tokenizers-StandardTokenizer>
   to split words. What do you want to achieve with the Shingle filter
   factory?
   5. I think all of the above can be handled without the Suggester
   component. In any case, keep exploring different approaches.
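
Points 1, 2 and 4 above could be sketched in the schema roughly as follows.
This is only a sketch under assumptions: the field and type names (term,
term_stem, term_prefix, text_term_stem, text_term_prefix) are made up for
illustration, a "string" field type is assumed to exist as in the default
schema, the gram sizes are arbitrary, and synonyms_suggest.txt is the file
name from your config. It also uses the Edge N-Gram filter after the Standard
Tokenizer rather than the Edge N-Gram tokenizer linked above, which is one
possible way to combine the two.

```xml
<!-- Sketch only: every field and type name below is hypothetical. -->

<!-- One suggestion term per document, stored as-is for display. -->
<field name="term" type="string" indexed="true" stored="true"/>
<field name="term_stem" type="text_term_stem" indexed="true" stored="false"/>
<field name="term_prefix" type="text_term_prefix" indexed="true" stored="false"/>
<copyField source="term" dest="term_stem"/>
<copyField source="term" dest="term_prefix"/>

<!-- Points 1 and 2: stemming lets the query "close" match the docs
     "closed" and "closing"; query-time synonyms pull in related terms. -->
<fieldType name="text_term_stem" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.SnowballPorterFilterFactory" language="English"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- expand="true" so that a query term is expanded to every entry on
         its comma-separated line in the synonyms file. -->
    <filter class="solr.SynonymGraphFilterFactory" synonyms="synonyms_suggest.txt" ignoreCase="true" expand="true"/>
    <filter class="solr.SnowballPorterFilterFactory" language="English"/>
  </analyzer>
</fieldType>

<!-- Point 4: edge n-grams at index time only, so the partial query "clo"
     matches the doc "closing". -->
<fieldType name="text_term_prefix" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EdgeNGramFilterFactory" minGramSize="2" maxGramSize="15"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```

With something like this, a regular query against term_stem (for whole
words) or term_prefix (for partial input) would return the stored "term"
values as suggestions, without going through the Suggester component.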

Please let me know if you have any questions.

On Sun, 24 Nov 2019 at 19:11, Rudenko, Artur <[hidden email]>
wrote:

> Hi,
> I am quite new to Solr and I am interested in implementing a kind of
> automatic term suggestion (not autocomplete) feature based on the user's
> query.
> A user builds a query (on multiple fields), and I am trying to help them
> refine it by suggesting additional terms based on the current query.
> The suggestions should include synonyms and different word forms
> (query: close, result: closed, closing) as well as other "interesting"
> terms and phrases related to that search (it is hard to define what
> "interesting" means).


--
--
Regards,

*Paras Lehana* [65871]