problems calculating norms

problems calculating norms

Karl Wettin-3
My norm values do not equal the norms stored for the same document and
field in a Directory. I don't know why. They differ only slightly, but
enough to change the order of very similar hits.

I only add one value to the field, so there is no division to compute a
mean, or anything like that, going on in my code. Could that be it?

This is what I do:

if (eField.getKey().isIndexed && !eField.getKey().omitNorms) {
  float boost = eDocument.getKey().getBoost();
  boost *= eField.getKey().boost;
  float norm = eField.getKey().boost
      * similarity.lengthNorm(eField.getKey().fieldName, eField.getValue().size());
  byte encoded = Similarity.encodeNorm(norm);
  boostByFieldNameAndDocumentNumber.get(eField.getKey().fieldName)
      [eDocument.getKey().getDocumentNumber()] = encoded;
}
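
The snippet above computes a combined boost (document boost times field boost) but then multiplies only the field boost into the norm. Whether that matters depends on the document boost in use, so the following is only a sketch of the arithmetic, not Lucene code. It assumes lengthNorm is 1 / sqrt(number of terms), as in DefaultSimilarity; the class and method names are hypothetical, and the byte-encoding step is left out.

```java
// Hypothetical stand-alone sketch of the norm arithmetic; not Lucene code.
final class NormSketch {

    // Assumption: mirrors DefaultSimilarity.lengthNorm, i.e. 1 / sqrt(numTerms).
    static float lengthNorm(int numTerms) {
        return (float) (1.0 / Math.sqrt(numTerms));
    }

    // Norm that multiplies in both the document boost and the field boost.
    static float normWithDocumentBoost(float documentBoost, float fieldBoost, int numTerms) {
        return documentBoost * fieldBoost * lengthNorm(numTerms);
    }

    // Norm that uses the field boost only, as the posted snippet does.
    static float normWithFieldBoostOnly(float fieldBoost, int numTerms) {
        return fieldBoost * lengthNorm(numTerms);
    }

    public static void main(String[] args) {
        int numTerms = 4;
        // With a document boost of 1.0 the two variants agree.
        System.out.println(normWithDocumentBoost(1.0f, 1.0f, numTerms)); // 0.5
        System.out.println(normWithFieldBoostOnly(1.0f, numTerms));      // 0.5
        // With any other document boost they diverge.
        System.out.println(normWithDocumentBoost(1.5f, 1.0f, numTerms)); // 0.75
    }
}
```

If every document is indexed with the default boost of 1.0, the two variants give the same value and the discrepancy must come from somewhere else.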


Any ideas?


---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]


Re: problems calculating norms

Chris Hostetter-3

I'm really confused by your example ... I'm assuming eField is a
Map.Entry, and eField.getKey() is returning a FieldInfo (although I'm not
sure why there's no explicit cast in your code) ... but what is the return
type of "eField.getValue()"?

Without understanding what that object is, I can only speculate about what
.size() is returning ... but I can think of two possible reasons why
you would have a very small discrepancy...

1) perhaps you have an off-by-one error, and eField.getValue().size() is
returning one fewer than the actual number of terms?

2) if the problem doesn't happen with all docs, perhaps you are forgetting
to take into account the maxFieldLength DocumentWriter uses?
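
Both possibilities change the term count that is fed into lengthNorm. A rough sketch of the effect follows, assuming lengthNorm is 1 / sqrt(number of terms) as in DefaultSimilarity and assuming a field-length limit of 10000 terms at which counting simply stops. The class, the constant, and the method names are hypothetical, and the exact cut-off behaviour of DocumentWriter is not reproduced here.

```java
// Hypothetical stand-alone sketch; not Lucene code.
final class FieldLengthSketch {

    // Assumption: the limit in effect is 10000 terms per field.
    static final int ASSUMED_MAX_FIELD_LENGTH = 10000;

    // Assumption: mirrors DefaultSimilarity.lengthNorm, i.e. 1 / sqrt(numTerms).
    static float lengthNorm(int numTerms) {
        return (float) (1.0 / Math.sqrt(numTerms));
    }

    // Number of terms that count toward the norm once the limit is applied.
    static int indexedLength(int tokenCount, int maxFieldLength) {
        return Math.min(tokenCount, maxFieldLength);
    }

    public static void main(String[] args) {
        // 1) Off by one: 99 terms counted where 100 were indexed.
        System.out.println("99 terms:  " + lengthNorm(99));
        System.out.println("100 terms: " + lengthNorm(100));

        // 2) Truncation: a 12000-token field counted in full vs. capped at the limit.
        int tokens = 12000;
        System.out.println("uncapped:  " + lengthNorm(tokens));
        System.out.println("capped:    "
                + lengthNorm(indexedLength(tokens, ASSUMED_MAX_FIELD_LENGTH)));
    }
}
```

An off-by-one count shifts the norm slightly for every document, while truncation only affects fields longer than the limit, which is why the second case would not show up in all documents.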




: Date: Thu, 11 May 2006 17:43:30 +0200
: From: karl wettin <[hidden email]>
: Reply-To: [hidden email]
: To: [hidden email]
: Subject: problems calculating norms



-Hoss




Re: problems calculating norms

Karl Wettin-3
On Thu, 2006-05-11 at 11:15 -0700, Chris Hostetter wrote:
> I'm really confused by your example ...

Sorry. I just pasted it in to show that I do the same kind of
calculation as the DocumentWriter.

> I'm assuming eField is a Map.Entry, and eField.getKey() is returning a
> FieldInfo (although I'm not sure why there's no explicit cast in your
> code) ...

Java 1.5 generics, so no cast is needed.

> but what is the return type of "eField.getValue()" ?

eField is an entry of a Map<FieldSetting, LinkedList<Token>>, so
getValue() returns a LinkedList<Token>. FieldSetting is my way of
normalizing the settings of Field.name per Document.

> Without understanding what that object is, I can only speculate about what
> .size() is returning ... but I can think of two possible reasons why
> you would have a very small discrepancy...
>
> 1) perhaps you have an off-by-one error, and eField.getValue().size() is
> returning one fewer than the actual number of terms?
>
> 2) if the problem doesn't happen with all docs, perhaps you are forgetting
> to take into account the maxFieldLength DocumentWriter uses?

I'm afraid it's none of the above.


