Solr 6.5 autosuggest suggests misspelt words and unwanted words


Sri Sirisha Vallabhaneni
Hi,

My data is uncurated: it contains *cuss words* and *misspelt words* such as
*neeeed* instead of *need*. We are using an auto-suggest/auto-complete
feature that relies heavily on the indexed data to recommend suggestions as
the user types a query. We currently use a stop-word list of cuss words to
keep a check on what is recommended to the user, and this list may grow very
large over time. Is there a clean way to:

1. eliminate cuss words entirely from suggestions
2. avoid suggesting misspelt words at all

Thanks and Regards,
Sri

Re: Solr 6.5 autosuggest suggests misspelt words and unwanted words

Alessandro Benedetti
Hi,
you should curate your data; that is fundamental to a healthy search
solution. But let's see what you can do anyway:

1) Curate a dictionary of such bad words and then configure the analysis
chain to skip them.
2) Have you tried different dictionary implementations? I would assume that
each misspelled word has a low document frequency. You could use the
HighFrequencyDictionaryFactory [1], which can leave out terms that fall
below a document-frequency threshold, and see how it goes.
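
As a rough sketch of both points, assuming a dedicated suggestion field: the
field type below filters a bad-word list at analysis time, so those terms
never reach the index that the suggester reads from. The names text_suggest,
suggest_text and badwords.txt are hypothetical placeholders, not anything
from the original setup.

```xml
<!-- schema: hypothetical field type for the suggestion source field -->
<fieldType name="text_suggest" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- badwords.txt: one unwanted word per line, kept in the config directory -->
    <filter class="solr.StopFilterFactory" words="badwords.txt" ignoreCase="true"/>
  </analyzer>
</fieldType>

<!-- hypothetical field that feeds the suggester -->
<field name="suggest_text" type="text_suggest" indexed="true" stored="false" multiValued="true"/>
```

The suggester can then build its dictionary from that field and leave out
rare terms. The threshold is the minimum fraction of documents a term must
appear in; the 0.01 here is only an illustrative guess and would need tuning
against the real data, as would the choice of lookup implementation.

```xml
<!-- solrconfig.xml: sketch of a suggester over the filtered field -->
<searchComponent name="suggest" class="solr.SuggestComponent">
  <lst name="suggester">
    <str name="name">cleanSuggester</str>
    <str name="lookupImpl">FSTLookupFactory</str>
    <str name="dictionaryImpl">HighFrequencyDictionaryFactory</str>
    <str name="field">suggest_text</str>
    <!-- assumed value: skip terms found in fewer than 1% of documents -->
    <float name="threshold">0.01</float>
    <str name="buildOnStartup">false</str>
  </lst>
</searchComponent>
```

A frequency threshold only helps if misspellings really are rare; a
misspelling that shows up often in the data would still be suggested. After
changing the word list or the threshold, the documents need reindexing and
the suggester needs rebuilding for the change to show up.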


[1]
https://lucene.apache.org/solr/guide/7_3/suggester.html#highfrequencydictionaryfactory



-----
---------------
Alessandro Benedetti
Search Consultant, R&D Software Engineer, Director
Sease Ltd. - www.sease.io
--
Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html