Modifying norms...


escher2k
I want to modify the norms to only include values between 0 and 100. Currently I have a custom implementation of the default Similarity. Is it sufficient to override the encodeNorm and decodeNorm methods from the base implementation in my custom Similarity class? Please let me know if there are any performance implications of doing this.
Re: Modifying norms...

Chris Hostetter-3

: I want to modify the norms to only include values between 0 and 100.
: Currently, I have a custom implementation of the default similarity. Is it
: sufficient to override the encodeNorm and decodeNorm methods from the base
: implementation in my custom Similarity class ? Please let me know if there
: are any performance implications to this.

those methods are static, so it's not possible to override them.  if you
are not using doc or field boosts, overriding lengthNorm is suitable for
your goal.

-Hoss
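A minimal sketch of the kind of override Hoss is suggesting, assuming the Lucene 2.x Similarity API; the RangedSimilarity class name and the 100/sqrt(numTerms) mapping are purely illustrative choices, not anything prescribed in this thread:

import org.apache.lucene.search.DefaultSimilarity;

// Sketch only: keep lengthNorm values inside the 0..100 range.
// The 100/sqrt(numTerms) mapping is an arbitrary example.
public class RangedSimilarity extends DefaultSimilarity {
  public float lengthNorm(String fieldName, int numTerms) {
    if (numTerms <= 0) {
      return 100f;
    }
    return (float) Math.min(100.0, 100.0 / Math.sqrt(numTerms));
  }
}

Note that whatever lengthNorm returns is still passed through encodeNorm before it is stored, so it ends up quantized to one of 256 representable values.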



Re: Modifying norms...

escher2k
Thanks Hoss. Suppose I go ahead and modify Similarity.java from
static {
    for (int i = 0; i < 256; i++)
      NORM_TABLE[i] = SmallFloat.byte315ToFloat((byte)i);
  }

TO
static {
    for (int i = 0; i < 256; i++)
      NORM_TABLE[i] = i * 100.0f / 256.0f;  // float literals needed for this to compile
  }
 

Should this work?

Thanks.

P.S. This is a very custom implementation. For the specific problem that I have, the lengthNorm is set to 1 (independent of numTerms).
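For context on the table being modified above: in the Lucene 2.x Similarity source, NORM_TABLE only feeds the decode side, roughly as below (paraphrased; check the exact source for your version), while the encode side goes through SmallFloat directly:

public static float decodeNorm(byte b) {
  return NORM_TABLE[b & 0xFF];          // byte -> float, reads the table above
}

public static byte encodeNorm(float f) {
  return SmallFloat.floatToByte315(f);  // float -> byte, does not consult NORM_TABLE
}

So replacing only the static initializer changes how stored norm bytes are interpreted at search time, not how floats are turned into bytes at index time; the two sides would need to be kept consistent.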

Re: Modifying norms...

Chris Hostetter-3

: Thanks Hoss. Suppose, I go ahead and modify Similarity.java from
        ...
: Should this work ?

it depends on your definition of "work" ... if that code does what you want it to do, then yes: it will do what you want it to do.

: P.S. This is a very custom implementation. For the specific problem that I
: have, the lengthNorm
: is set to 1 (independent of numTerms).

if your length norm is always 1, why do you care what the norm values are?
are you using document and field boosts? ... if "no" then none of this
should matter.  if "yes" then why not just change the boost values you use
to get the behavior you want instead of modifying the encoding mechanism?




-Hoss
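A small sketch of the alternative Hoss mentions, assuming the Lucene 2.x document API (the field name, value, and the 1.5f boost are made-up examples): set the boost you want on the document at index time instead of changing the norm encoding.

import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;

public class BoostExample {
  // Apply the desired ranking factor as a document boost at index time
  // rather than modifying Similarity's norm encoding.
  public static Document makeDoc(String text) {
    Document doc = new Document();
    doc.add(new Field("body", text, Field.Store.YES, Field.Index.TOKENIZED));
    doc.setBoost(1.5f);
    return doc;
  }
}

Whatever boost is set here still passes through the standard 8-bit norm encoding when it is stored.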



Re: Modifying norms...

escher2k
Essentially what I am trying to do is boost every document by a certain factor, so that
the boost is between 1.0 and 2.0. After this, we are trying to do a search across multiple fields
and have a computation based purely on tf. Example -
if (field1)
  tf = some function
else if (field2)
  tf = some other function
...

Now the boost is getting rounded to 1.0, 1.25, 1.5 or 2.0 due to how the norm is stored, whereas I want more precision (e.g. 1.31, 1.45 etc.). The boost is used for ranking documents.

Thanks.
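To make the rounding described above concrete, here is a rough sketch using the static encode/decode pair on the Lucene 2.x Similarity class (exact values depend on the Lucene version):

import org.apache.lucene.search.Similarity;

public class NormRoundingDemo {
  public static void main(String[] args) {
    // Encode each boost into the single norm byte and decode it back,
    // showing which value actually survives the 8-bit encoding.
    float[] boosts = {1.0f, 1.31f, 1.45f, 1.9f};
    for (float b : boosts) {
      byte encoded = Similarity.encodeNorm(b);
      System.out.println(b + " is stored as " + Similarity.decodeNorm(encoded));
    }
  }
}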

Re: Modifying norms...

Chris Hostetter-3

: Essentially what I am trying to do is boost every document by a certain
: factor, so that
: the boost is between 1.0 and 2.0. After this, I we are trying to do a search
: across multiple fields
: and have a computation based purely on tf. Example -

it sounds like you are trying to place too much stock in the precise score
values you get back from a query.  if it's really important to you I
would suggest playing with the boost values you use and your tf/idf
functions so they work with the current boost/norm encoding instead of
trying to change how the norms are encoded.  that way you won't have to
worry about hacking the static encoding funcs in Similarity.


-Hoss
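One way to read that suggestion, as a sketch (the helper name is made up): snap each desired boost through an encode/decode round trip so the value you index is exactly a value the 8-bit norm encoding can store.

import org.apache.lucene.search.Similarity;

public class BoostSnapping {
  // Hypothetical helper: returns the value the 8-bit norm encoding will
  // actually store for 'desired', so indexed and search-time values agree.
  public static float storableBoost(float desired) {
    return Similarity.decodeNorm(Similarity.encodeNorm(desired));
  }
}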

