Using different field when overriding computeNorm

Tsvika Rabkin
Hi,

I would like to override DefaultSimilarity's computeNorm so that it works
with a different field than the one being queried.

Here is the DefaultSimilarity implementation:

  @Override
  public float computeNorm(String field, FieldInvertState state) {
    final int numTerms;
    if (discountOverlaps)
      numTerms = state.getLength() - state.getNumOverlap();
    else
      numTerms = state.getLength();
    return state.getBoost() * ((float) (1.0 / Math.sqrt(numTerms)));
  }
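
For reference, the arithmetic in that method can be reproduced without Lucene. This is only a sketch: NormSketch and its plain int/float parameters are hypothetical stand-ins for the values read from FieldInvertState (getLength(), getNumOverlap(), getBoost()), not part of any Lucene API.

```java
// Hypothetical, Lucene-free stand-in for the computeNorm arithmetic shown above.
final class NormSketch {

    // length, numOverlap and boost mirror what FieldInvertState reports for a field.
    static float computeNorm(int length, int numOverlap, float boost, boolean discountOverlaps) {
        // Optionally ignore overlapping tokens (position increment 0) when counting terms.
        final int numTerms = discountOverlaps ? length - numOverlap : length;
        return boost * ((float) (1.0 / Math.sqrt(numTerms)));
    }

    public static void main(String[] args) {
        System.out.println(computeNorm(4, 0, 1f, true));  // 4 terms, no boost: 0.5
        System.out.println(computeNorm(5, 1, 2f, true));  // 5 tokens, 1 overlap, boost 2: 1.0
        System.out.println(computeNorm(5, 1, 1f, false)); // overlaps counted: 1/sqrt(5)
    }
}
```

The point of the sketch is that the norm depends only on the length and boost of the field being inverted, which is why using another field's statistics needs some extra mechanism.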

Any ideas how to do that?

Thanks,

Tsvika

---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

RE: Using different field when overriding computeNorm

Ryan Aylward
I have had to do similar things to other methods of Similarity. In my example, I wanted to have different behavior for the tf() method for each field. The tf method does not include a field parameter as an input to it. The only solution I could come up with was to add a thread local to set the field and then check the thread local within the tf function. Here's the tf function...

    public float tf(float freq) {
        // Get the value of the thread local...
        String field = FieldThreadLocal.getField();

        if ("fieldA".equals(field)) {
            // always return 1 for field A
            return 1;
        } else {
            // otherwise, use the normal tf function
            return super.tf(freq);
        }
    }

tf() is used during scoring so I had to override the TermQuery (and TermWeight and TermScorer) to be able to set and clear the thread local at the appropriate times. This is a pretty ugly hack, but I couldn't find another way to make this work.
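
The FieldThreadLocal helper itself is not shown in this message, so the following is only a guess at its shape, written without any Lucene dependency so it can run on its own. FieldAwareTf, defaultTf() and scoreWithField() are hypothetical names: defaultTf() stands in for super.tf(freq) (DefaultSimilarity uses sqrt(freq)), and scoreWithField() stands in for what a custom scorer would do around each scoring call.

```java
// Hypothetical helper: holds the name of the field currently being scored, per thread.
final class FieldThreadLocal {
    private static final ThreadLocal<String> FIELD = new ThreadLocal<>();

    private FieldThreadLocal() {}

    static void setField(String field) { FIELD.set(field); }

    static String getField() { return FIELD.get(); }

    // Must be called after scoring so a stale field never leaks into later queries.
    static void clear() { FIELD.remove(); }
}

// Stand-in for a Similarity subclass; not a Lucene class.
final class FieldAwareTf {

    // Assumed default behaviour, standing in for super.tf(freq).
    static float defaultTf(float freq) {
        return (float) Math.sqrt(freq);
    }

    static float tf(float freq) {
        if ("fieldA".equals(FieldThreadLocal.getField())) {
            return 1f; // always 1 for field A
        }
        return defaultTf(freq);
    }

    // What the overridden scorer would do: set the field, score, always clear.
    static float scoreWithField(String field, float freq) {
        FieldThreadLocal.setField(field);
        try {
            return tf(freq);
        } finally {
            FieldThreadLocal.clear();
        }
    }

    public static void main(String[] args) {
        System.out.println(scoreWithField("fieldA", 9f)); // 1.0
        System.out.println(scoreWithField("fieldB", 9f)); // 3.0
    }
}
```

The try/finally is the important part of the sketch: if the thread local is not cleared, a pooled search thread would carry the previous field into the next query.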

computeNorm() is calculated at index creation time rather than at search time, but you could try to do something similar.

Would be curious if other people had a better suggestion as to how to do this.

Re: Using different field when overriding computeNorm

Robert Muir
On Tue, Feb 1, 2011 at 1:51 PM, Ryan Aylward <[hidden email]> wrote:
> I have had to do similar things to other methods of Similarity. In my example, I wanted to have different behavior for the tf() method for each field. The tf method does not include a field parameter as an input to it. The only solution I could come up

In Lucene's trunk, Similarity can now be controlled on a per-field
basis; see https://issues.apache.org/jira/browse/LUCENE-2236

The only exceptions are things like coord() which apply to e.g.
BooleanQuery (which might span multiple fields) and remain top-level
in the new SimilarityProvider.
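
To illustrate the split described above (per-field scoring, with coord() staying top-level), here is a rough plain-Java sketch. It is not the trunk API: FieldScoring, PerFieldScoring, register() and get() are hypothetical names chosen for the example, and the coord() formula is the overlap / maxOverlap ratio assumed from DefaultSimilarity.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical per-field scoring hook; only tf() is modelled here.
interface FieldScoring {
    float tf(float freq);
}

// Hypothetical provider: looks up scoring by field name, keeps coord() global.
final class PerFieldScoring {
    private final Map<String, FieldScoring> byField = new HashMap<>();
    private final FieldScoring fallback;

    PerFieldScoring(FieldScoring fallback) {
        this.fallback = fallback;
    }

    PerFieldScoring register(String field, FieldScoring scoring) {
        byField.put(field, scoring);
        return this;
    }

    // Per-field lookup, falling back to the default scoring for unknown fields.
    FieldScoring get(String field) {
        return byField.getOrDefault(field, fallback);
    }

    // Stays top-level: a BooleanQuery can span several fields, so no single field owns it.
    float coord(int overlap, int maxOverlap) {
        return overlap / (float) maxOverlap;
    }

    public static void main(String[] args) {
        PerFieldScoring provider = new PerFieldScoring(freq -> (float) Math.sqrt(freq))
                .register("fieldA", freq -> 1f);
        System.out.println(provider.get("fieldA").tf(9f)); // 1.0
        System.out.println(provider.get("fieldB").tf(9f)); // 3.0
        System.out.println(provider.coord(1, 2));          // 0.5
    }
}
```

With a lookup like this, the field name reaches the scoring code directly, so no thread local is needed.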

RE: Using different field when overriding computeNorm

Ryan Aylward
This is great. Is there a target date for the 4.0 release?

Re: Using different field when overriding computeNorm

Robert Muir
On Thu, Feb 3, 2011 at 3:27 PM, Ryan Aylward <[hidden email]> wrote:
> This is great. Is there a target of when 4.0 will be released?
>

Unfortunately I think it's quite a ways away: there are branches for
major features such as per-document payloads, realtime search, modern
index compression algorithms, and a variety of other exciting things
in the works. As far as releases go, currently we are working towards
release 3.1, which is the next stable minor release upgrade from 3.0.

It might be technically possible to backport this feature (per-field
similarity) to the 3.x codebase while still keeping backwards
compatibility, but I'm worried about breaking backwards compatibility
in subtle ways due to some gremlins in the code... we fixed most of
these gremlins in trunk, but they are still present (though deprecated) in
3.1 (example: https://issues.apache.org/jira/browse/LUCENE-2828).

So, at the moment having this feature be something that has to wait
until 4.0 is the safest option in my opinion... but I feel your pain
here when trying to customize the scoring system...
