Restrict search on term/phrase count in document.

Restrict search on term/phrase count in document.

Modassar Ather-2
Hi,

Is there a way to restrict a search to documents in which a term/phrase
occurs at least n times?
For example, find the documents that contain a term/phrase 5 or more times.

The "Terms component" seems to provide a way, but I am not sure how it will
work for complex queries.
Please note that the Solr version I am using is 6.5.1. Kindly provide your
inputs.

Best,
Modassar

Re: Restrict search on term/phrase count in document.

Alexandre Rafalovitch
That is kind of unusual. What is the business issue you are trying to
solve? Perhaps there is a different way to look at this problem.

Regards,
     Alex

On Mon, Nov 5, 2018, 5:20 AM Modassar Ather <[hidden email]> wrote:


Re: Restrict search on term/phrase count in document.

Alessandro Benedetti
In reply to this post by Modassar Ather-2
I agree with Alexandre, it seems suspicious.
Anyway, if you want to filter on how many times a single term occurs in a
document, you could use the function range query parser:

https://lucene.apache.org/solr/guide/6_6/other-parsers.html#OtherParsers-FunctionRangeQueryParser

And these functions:

termfreq
Returns the number of times the term appears in the field for that document.
termfreq(text,'memory')

tf
Term frequency; returns the term frequency factor for the given term, using
the Similarity for the field. The tf-idf value increases proportionally to
the number of times a word appears in the document, but is offset by the
frequency of the word across the collection, which helps to control for the
fact that some words are generally more common than others. See also idf.
tf(text,'solr')
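
As a rough sketch of how the two pieces above might fit together, the frange parser can wrap termfreq in a filter query so that only documents where the term occurs at least n times are kept. The helper names, the base URL and the collection name "mycollection" below are assumptions made for illustration; the field "text" and the term "memory" come from the termfreq example. It covers a single term only and does not address phrases.

```python
from urllib.parse import urlencode


def min_termfreq_filter(field, term, min_count):
    """Build an frange filter string: termfreq(field, term) >= min_count.

    Hypothetical helper. Assumes `term` is a single simple token with no
    quote characters, so no escaping is attempted.
    """
    if "'" in term or '"' in term:
        raise ValueError("term must not contain quote characters")
    # l is the lower bound of the function range (inclusive by default);
    # leaving out u means there is no upper bound.
    return "{!frange l=%d}termfreq(%s,'%s')" % (min_count, field, term)


def build_select_url(base_url, query, filters):
    """Assemble a /select URL with one fq parameter per filter (hypothetical helper)."""
    params = [("q", query)] + [("fq", f) for f in filters]
    return base_url + "/select?" + urlencode(params)


# Assumed collection URL; adjust to your own setup.
fq = min_termfreq_filter("text", "memory", 5)
url = build_select_url("http://localhost:8983/solr/mycollection", "memory", [fq])
print(fq)
print(url)
```

Using it as an fq keeps the main query (and its scoring) unchanged, so a complex q can stay as it is while the filter only removes documents below the threshold.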

Cheers



-----
---------------
Alessandro Benedetti
Search Consultant, R&D Software Engineer, Director
Sease Ltd. - www.sease.io
--
Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html

Re: Restrict search on term/phrase count in document.

Modassar Ather-2
Thanks for your replies.

The requirement is basically to exclude documents that match but contain the
term or phrase only a few times, maybe one or two matches.
The user is interested in documents where the matched term/phrase occurs
more than a certain number of times.
This can be a valid feature/requirement.

Best,
Modassar

On Mon, Nov 19, 2018 at 10:55 PM Alessandro Benedetti <[hidden email]>
wrote:
