field length normalization

Muneeb Ali
Hi,

In my schema, the document title field has omitNorms="false", which, if I am not wrong, causes the length of titles to be counted in the scoring.

But when I query with "word1 word2 word3", the top two documents have titles that contain these words plus other words, whereas the document whose title has exactly and only these query words comes in third place. I don't know why.

Setting omitNorms to false should bring the titles with the exact words to the top, shouldn't it?

Also, when I debugged the query, I noticed that all three top documents have the same score. Shouldn't the scores differ, since the titles have different lengths?

Thanks very much.
-A

Re: field length normalization

Siddhant Goel
Did you reindex after setting omitNorms to false? I'm not sure whether or
not it is needed, but it makes sense.

On Thu, Mar 11, 2010 at 5:34 PM, muneeb <[hidden email]> wrote:

> [...]


--
- Siddhant

Re: field length normalization

Muneeb Ali

Siddhant Goel wrote:
> Did you reindex after setting omitNorms to false? I'm not sure whether or
> not it is needed, but it makes sense.

Yes, I deleted the old index and reindexed it.
One more fact: the title lengths are all less than 10. I am not sure whether Solr has pre-set values for length normalization, because for titles with 3 as well as 4 terms the fieldNorm comes up as 0.5 (in the debugQuery section).
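
For readers who want to look at the same fieldNorm values: they appear in the score explanation that Solr returns when debugQuery is enabled. A minimal sketch of building such a request URL follows. The host, port, handler path, and the field name "title" are assumptions for illustration and are not details given in this thread.

```python
from urllib.parse import urlencode

# Assumed location of a local Solr select handler; adjust to your deployment.
SOLR_SELECT_URL = "http://localhost:8983/solr/select"


def build_debug_query_url(query: str, base_url: str = SOLR_SELECT_URL) -> str:
    """Return a select URL that asks Solr to include score explanations."""
    params = {
        "q": query,
        "fl": "title,score",   # assumed field name "title"
        "debugQuery": "true",  # request the per-document scoring breakdown
    }
    return base_url + "?" + urlencode(params)


url = build_debug_query_url("title:(word1 word2 word3)")
print(url)
```

The fieldNorm factor shows up inside each document's entry in the explain section of the response.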


Re: field length normalization

Jay Hill
The fieldNorm is computed like this: fieldNorm = lengthNorm * documentBoost
* documentFieldBoosts

and the lengthNorm is: lengthNorm  =  1/(numTermsInField)**.5
[note that the value is encoded as a single byte, so there is some precision
loss]

So the values are not pre-set for the lengthNorm, but for some term counts the
encoded lengthNorm value winds up being the same because of the precision loss.
Here is a list of lengthNorm values for 1 to 10 term fields:

# of terms    lengthNorm
    1         1.0
    2         0.625
    3         0.5
    4         0.5
    5         0.4375
    6         0.375
    7         0.375
    8         0.3125
    9         0.3125
   10         0.3125

That's why, in your example, the lengthNorm for 3 and 4 is the same.
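
The table can be reproduced with a short sketch. It assumes the single-byte encoding is the truncating mini-float implemented by Lucene's SmallFloat.floatToByte315 and byte315ToFloat; the Python port below was written for illustration and is not code taken from Lucene or Solr.

```python
import math
import struct


def float_to_byte315(value: float) -> int:
    """Encode a float into one byte, truncating the low mantissa bits."""
    bits = struct.unpack(">i", struct.pack(">f", value))[0]  # float32 bit pattern
    small = bits >> 21
    zero_point = (63 - 15) << 3
    if small <= zero_point:
        return 0 if bits <= 0 else 1   # underflow: zero or smallest positive code
    if small >= zero_point + 0x100:
        return 255                     # overflow: largest code
    return small - zero_point


def byte315_to_float(code: int) -> float:
    """Decode a byte produced by float_to_byte315 back into a float."""
    if code == 0:
        return 0.0
    bits = ((code & 0xFF) << 21) + ((63 - 15) << 24)
    return struct.unpack(">f", struct.pack(">i", bits))[0]


def stored_length_norm(num_terms: int) -> float:
    """lengthNorm = 1/sqrt(numTermsInField), after the one-byte round trip."""
    return byte315_to_float(float_to_byte315(1.0 / math.sqrt(num_terms)))


table = [stored_length_norm(n) for n in range(1, 11)]
for n, norm in enumerate(table, start=1):
    print(f"{n:>5}    {norm}")
```

Under that assumption, 1/sqrt(3) (about 0.577) and 1/sqrt(4) (exactly 0.5) both truncate to 0.5, which matches the identical fieldNorm seen for 3-term and 4-term titles.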

-Jay
http://www.lucidimagination.com





On Thu, Mar 11, 2010 at 9:50 AM, muneeb <[hidden email]> wrote:

> [...]

Re: field length normalization

Muneeb Ali
Ah, I see.
Thanks very much, Jay, for your explanation. It really helped a lot.

I guess I have to deal with this in some other way, since I am working with short titles and I really want short titles to appear at top. Can you suggest anything to bring titles with length 3 to appear before titles with length 4 (given they have similar scores)?

Thanks,


Re: field length normalization

Lance Norskog-2
You need to change your similarity object to be more sensitive at the
short end. This is a patch about how to do this:

http://issues.apache.org/jira/browse/LUCENE-2187

It involves Lucene coding.
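
To illustrate what "more sensitive at the short end" could mean, here is a rough sketch comparing the byte encoding that reproduces the values reported earlier in this thread with one that keeps more mantissa bits. The second parameter set (5 mantissa bits, zero exponent 2) is an assumption chosen only for illustration; it is not taken from the LUCENE-2187 patch, and the function is a hypothetical Python helper rather than a Lucene API.

```python
import math
import struct


def quantize(value: float, mantissa_bits: int, zero_exp: int) -> float:
    """Round-trip a float through a one-byte mini-float with truncation."""
    bits = struct.unpack(">i", struct.pack(">f", value))[0]  # float32 bit pattern
    shift = 24 - mantissa_bits
    zero_point = (63 - zero_exp) << mantissa_bits
    small = bits >> shift
    if small <= zero_point:
        code = 0 if bits <= 0 else 1   # underflow
    elif small >= zero_point + 0x100:
        code = 255                     # overflow
    else:
        code = small - zero_point
    if code == 0:
        return 0.0
    decoded = (code << shift) + ((63 - zero_exp) << 24)
    return struct.unpack(">f", struct.pack(">i", decoded))[0]


def length_norm(num_terms: int) -> float:
    return 1.0 / math.sqrt(num_terms)


# Coarse encoding: 3-term and 4-term fields collapse to the same value.
coarse = [quantize(length_norm(n), 3, 15) for n in (3, 4)]
# Finer encoding (illustrative parameters): the two lengths stay distinct.
fine = [quantize(length_norm(n), 5, 2) for n in (3, 4)]
print(coarse, fine)
```

The trade-off in this sketch is range: spending more of the byte on mantissa precision leaves fewer bits for the exponent, so very long fields or large boosts are represented less well.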

On Fri, Mar 12, 2010 at 3:19 AM, muneeb <[hidden email]> wrote:

> [...]



--
Lance Norskog
[hidden email]