
omitTermFreq only?

Jibo John
Hello,

I was wondering if there is a way to omit only the term frequency in Solr?

omitTermFreqAndPositions="true" wouldn't work for us, since we need the positions to support phrase queries.
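For context on why positions matter here: a phrase query only matches where the terms sit at adjacent positions. The JDK-only toy model below builds per-document position lists and checks a two-term phrase. The class and method names are hypothetical, and this is not how Lucene's PhraseQuery is implemented internally. It also shows that a term's frequency in a document is simply the length of its position list.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy positional index for ONE document (hypothetical names, not Lucene API).
class PhraseMatch {
    // Builds term -> sorted list of 0-based token positions.
    static Map<String, List<Integer>> positions(String[] tokens) {
        Map<String, List<Integer>> index = new HashMap<>();
        for (int pos = 0; pos < tokens.length; pos++) {
            index.computeIfAbsent(tokens[pos], t -> new ArrayList<>()).add(pos);
        }
        return index;
    }

    // True if the phrase "a b" occurs: some position p of a has b at p + 1.
    static boolean matchesPhrase(Map<String, List<Integer>> index, String a, String b) {
        List<Integer> pa = index.getOrDefault(a, List.of());
        List<Integer> pb = index.getOrDefault(b, List.of());
        int i = 0, j = 0;
        // Merge-style walk over both sorted position lists.
        while (i < pa.size() && j < pb.size()) {
            int want = pa.get(i) + 1;
            int have = pb.get(j);
            if (have == want) return true;
            if (have < want) j++; else i++;
        }
        return false;
    }

    public static void main(String[] args) {
        Map<String, List<Integer>> index = positions("red fox and brown dog".split(" "));
        System.out.println(matchesPhrase(index, "brown", "dog"));
        System.out.println(matchesPhrase(index, "red", "dog"));
        // Without positions only term presence/frequency is known, so the
        // adjacency check above cannot be done. Frequency = list length.
        System.out.println(index.get("fox").size());
    }
}
```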

Thanks,
-Jibo

Re: omitTermFreq only?

Markus Jelsma-2
A dirty hack is to return 1.0f for each tf > 0. It's just a couple of lines of code
in a custom Similarity class.
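The hack amounts to flattening the tf factor. As I understand the Lucene 3.x API, DefaultSimilarity computes tf as the square root of the raw frequency, and a subclass can override tf(float freq) to return 1.0f whenever freq > 0. To keep it runnable without Lucene on the classpath, the sketch below only compares the two curves in plain Java; the class and method names are hypothetical.

```java
// JDK-only comparison of tf curves (hypothetical names, not the Lucene classes).
class FlatTf {
    // Same curve DefaultSimilarity uses for tf: sqrt(freq).
    static float defaultTf(float freq) {
        return (float) Math.sqrt(freq);
    }

    // The "dirty hack": any number of occurrences scores like exactly one.
    static float flatTf(float freq) {
        return freq > 0 ? 1.0f : 0.0f;
    }

    public static void main(String[] args) {
        for (int freq : new int[] {0, 1, 4, 9}) {
            System.out.println(freq + " -> default=" + defaultTf(freq)
                    + ", flat=" + flatTf(freq));
        }
    }
}
```

Note that this changes scoring only: the frequencies are still written to the index.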

Re: omitTermFreq only?

Jibo John
Sorry, I should have made the objective clear. The goal is to reduce the index size by not storing term frequencies in the index (in the .frq segment files).

After exploring a bit more, I realized that LUCENE-2048 now allows omitPositions. Similarly, I'm looking for an omitFrequency option.

Thanks,
-Jibo


On Jul 13, 2011, at 1:34 PM, Markus Jelsma wrote:

> A dirty hack is to return 1.0f for each tf > 0. Just a couple of lines code
> for a custom similarity class.

Re: omitTermFreq only?

Chris Hostetter-3

: Sorry I should have made the objectives clear. The goal is to reduce the
: index size by avoiding TermFrequency stored in the index (in .frq
: segment files).

Hmmm... why?

you're talking about eliminating a single (compressed) int per term, and
yet you want positions which take up a lot more space (at a minimum, even
if each term only appears once in a single document, that's already as
much space as the frequencies)

on anything except a toy index, eliminating freq while keeping positions
(if it were possible) is unlikely to even noticeably affect the index size.
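To put rough numbers on this, here is a toy byte count for one term's postings. It assumes the Lucene 3.x .frq/.prx layout as I understand it from the file-formats docs: VInts with 7 bits per byte; a .frq entry of DocDelta*2+1 when freq is 1, otherwise DocDelta*2 followed by a Freq VInt; plain DocDelta when term frequencies are omitted; and one PositionDelta VInt per occurrence in .prx (no payloads). Skip lists, norms, and the term dictionary are ignored, and the class and method names are hypothetical.

```java
// Toy postings-size model under an ASSUMED Lucene 3.x .frq/.prx layout.
class PostingsSize {
    // Bytes a VInt needs for a non-negative int (7 payload bits per byte).
    static int vIntSize(int v) {
        int n = 1;
        while ((v & ~0x7F) != 0) {
            v >>>= 7;
            n++;
        }
        return n;
    }

    // .frq with freqs: freq == 1 is folded into the low bit of DocDelta.
    static int frqWithFreqs(int[] docs, int[][] positions) {
        int bytes = 0, prev = 0;
        for (int d = 0; d < docs.length; d++) {
            int delta = docs[d] - prev;
            prev = docs[d];
            int freq = positions[d].length;
            if (freq == 1) {
                bytes += vIntSize(delta * 2 + 1);
            } else {
                bytes += vIntSize(delta * 2) + vIntSize(freq);
            }
        }
        return bytes;
    }

    // .frq if only DocDelta were written (assumed omit-tf layout).
    static int frqDocsOnly(int[] docs) {
        int bytes = 0, prev = 0;
        for (int doc : docs) {
            bytes += vIntSize(doc - prev);
            prev = doc;
        }
        return bytes;
    }

    // .prx: one PositionDelta VInt per occurrence; deltas restart in each doc.
    static int prxBytes(int[][] positions) {
        int bytes = 0;
        for (int[] doc : positions) {
            int prev = 0;
            for (int p : doc) {
                bytes += vIntSize(p - prev);
                prev = p;
            }
        }
        return bytes;
    }

    public static void main(String[] args) {
        int[] docs = {3, 10, 200};                          // made-up doc IDs
        int[][] positions = {{5}, {2, 40}, {1, 7, 300}};    // made-up positions
        System.out.println("frq with freqs: " + frqWithFreqs(docs, positions) + " bytes");
        System.out.println("frq docs only:  " + frqDocsOnly(docs) + " bytes");
        System.out.println("prx positions:  " + prxBytes(positions) + " bytes");
    }
}
```

On this made-up postings list, dropping freqs would save 2 of 13 postings bytes. Also, a reader decodes exactly freq position deltas per document, so in this layout the positions could not be read back without the frequency.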

what is the motivation for your objective?  

If your main motivation is to just "to reduce index size", then perhaps
tell us more about your configuration/use cases and maybe we can offer
alternative suggestions.

http://people.apache.org/~hossman/#xyproblem
XY Problem

Your question appears to be an "XY Problem" ... that is: you are dealing
with "X", you are assuming "Y" will help you, and you are asking about "Y"
without giving more details about the "X" so that we can understand the
full issue.  Perhaps the best solution doesn't involve "Y" at all?
See Also: http://www.perlmonks.org/index.pl?node_id=542341


-Hoss

Re: omitTermFreq only?

entdeveloper
In reply to this post by Markus Jelsma-2
I know I'm kind of reopening a closed thread, but I now have the same requirement: omit term frequencies only, while keeping the ability to run phrase queries on a field.

Thing is, having a custom Similarity and setting tf=1.0f will turn off term frequencies globally, which is not what I need; I'd like to do it per field.

For the sake of simplicity, I'm using the dismax parser with qf=name^10 description^5 body^1 and pf=name description body. I'd like to turn off tf for the name field but leave it on for description and body, while keeping positions on all of them so that phrase queries work.

Unfortunately, setting omitTermFreqAndPositions="true" on the name field also turns off phrase matching on name. Are there any tricks for doing this? I've considered a custom Similarity plus a copyField of name (name_phrase) that keeps term frequencies and positions, using only name_phrase in pf instead of name, but that won't really work either. I also tried omitTermFreqAndPositions="true" together with omitPositions="false", but that's an invalid combination.


Markus Jelsma-2 wrote:
> A dirty hack is to return 1.0f for each tf > 0. Just a couple of lines code
> for a custom similarity class.

Re: omitTermFreq only?

iorixxx
> Thing is, having a custom Similarity and setting tf=1.0f
> will turn off term
> frequencies globally, which is not what I need; I'd like to
> do it per field.

I think it is possible to use different similarities for different fields: https://issues.apache.org/jira/browse/SOLR-2338
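SOLR-2338 targets Solr 4, where a field type can declare its own Similarity; on the Lucene side, 4.x provides PerFieldSimilarityWrapper, whose get(String name) returns the Similarity to use for a given field. The JDK-only sketch below models just the dispatch idea for the case above: a flat tf for name and the usual sqrt tf for description and body. It is not the Lucene or Solr API, and the class and method names are hypothetical.

```java
import java.util.Map;
import java.util.Set;
import java.util.function.DoubleUnaryOperator;

// Hypothetical per-field tf dispatch (models the idea, not the Lucene API).
class PerFieldTf {
    // Fields whose tf should be flattened, e.g. just "name".
    private final Set<String> flatFields;

    PerFieldTf(Set<String> flatFields) {
        this.flatFields = flatFields;
    }

    // Picks the tf function for a field, like a per-field wrapper's get(field).
    DoubleUnaryOperator tfFor(String field) {
        if (flatFields.contains(field)) {
            return freq -> freq > 0 ? 1.0 : 0.0;   // flat tf
        }
        return Math::sqrt;                          // sqrt tf, as in the default
    }

    public static void main(String[] args) {
        PerFieldTf tf = new PerFieldTf(Set.of("name"));
        // Made-up raw frequencies of one query term in each field of a document.
        Map<String, Integer> freqs = Map.of("name", 4, "description", 4, "body", 9);
        for (Map.Entry<String, Integer> e : freqs.entrySet()) {
            double value = tf.tfFor(e.getKey()).applyAsDouble(e.getValue());
            System.out.println(e.getKey() + ": tf=" + value);
        }
    }
}
```

Positions are untouched by this choice, so phrase queries on name would keep working; only the tf contribution to the score changes per field.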

Re: omitTermFreq only?

entdeveloper
iorixxx wrote:
> > Thing is, having a custom Similarity and setting tf=1.0f
> > will turn off term
> > frequencies globally, which is not what I need; I'd like to
> > do it per field.
>
> I think, it is possible to use different similarities for different fields. https://issues.apache.org/jira/browse/SOLR-2338

Ahh... guess I'll have to wait for Solr 4.

Re: omitTermFreq only?

anasalkouz
Unfortunately, in Solr 4 all this omitTermFreqAndPositions stuff probably doesn't work either.

Re: omitTermFreq only?

Jack Krupansky-2
Please clarify. Be specific, with details.

-- Jack Krupansky

-----Original Message-----
From: anasalkouz
Sent: Wednesday, November 07, 2012 5:19 AM
To: [hidden email]
Subject: Re: omitTermFreq only?

Unfortunately, in Solr 4 all this omitTermFreqAndPositions not working
probably



--
View this message in context:
http://lucene.472066.n3.nabble.com/omitTermFreq-only-tp3167128p4018733.html
Sent from the Solr - User mailing list archive at Nabble.com.
