Change norm encoding

Benjamin Heilbrunn
Hi,

I've got a problem concerning the encoding of norms.
I want to use int values (0-255) instead of bytes that are interpreted as floats.

In my own Similarity class, which I use for indexing and searching, I
implemented the static methods encodeNorm, decodeNorm and
getNormDecoder.
But they are static, and the encoding of norms happens in
NormsWriterPerField.finish() with the following lines of code:

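      // computeNorm is called on the configured Similarity instance...
      final float norm = docState.similarity.computeNorm(fieldInfo.name, fieldState);
      // ...but encodeNorm is a static call on the Similarity base class
      norms[upto] = Similarity.encodeNorm(norm);
      docIDs[upto] = docState.docID;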

So my implementation is only used for the computation of norm values,
not for the encoding.
Is there a reason why norm encoding and decoding are hardwired to the
implementation in Similarity?
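
For readers unfamiliar with why the subclass methods are skipped: in Java, a static method in a subclass hides the superclass method rather than overriding it, and a call written as `Similarity.encodeNorm(...)` is bound at compile time to the class named at the call site. The sketch below reproduces that call shape with stand-in classes. `BaseSimilarity`, `MySimilarity` and the toy arithmetic are hypothetical and are not Lucene code.

```java
// Stand-ins for Lucene's classes; only the call shape mirrors the excerpt above.
class BaseSimilarity {
    // Static: bound at compile time to the class named at the call site.
    static byte encodeNorm(float f) {
        return (byte) (f * 10);
    }

    // Instance method: dispatched on the runtime type of the object.
    float computeNorm(int numTerms) {
        return 1.0f;
    }
}

class MySimilarity extends BaseSimilarity {
    // This hides BaseSimilarity.encodeNorm; it does not override it.
    static byte encodeNorm(float f) {
        return (byte) (f * 100);
    }

    @Override
    float computeNorm(int numTerms) {
        return 0.5f;
    }
}

final class StaticHidingDemo {
    // Same shape as the indexing code: virtual computeNorm, static encodeNorm.
    static byte writeNorm(BaseSimilarity similarity, int numTerms) {
        float norm = similarity.computeNorm(numTerms); // uses MySimilarity
        return BaseSimilarity.encodeNorm(norm);        // always the base encoding
    }

    public static void main(String[] args) {
        System.out.println(writeNorm(new MySimilarity(), 10)); // prints 5
        System.out.println(MySimilarity.encodeNorm(0.5f));     // prints 50
    }
}
```

The custom `computeNorm` result (0.5) is used, but it is encoded by the base class, which is the behaviour described above.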

And is there an elegant way to bypass this behaviour, other than
implementing a mapper which maps every int between 0 and 255 to a
float value out of Similarity.NORM_TABLE before encoding?
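
A rough sketch of that mapper workaround, for illustration only. It assumes the hardwired encoding is the 8-bit "3 mantissa bits, 5 exponent bits" small-float scheme; the two conversion methods below are re-created here as stand-ins so the example runs without Lucene on the classpath, and the class name `NormIntMapper` and method `intToNormFloat` are hypothetical.

```java
final class NormIntMapper {
    // Stand-in for the hardwired float-to-byte encoding (assumed scheme:
    // 3 mantissa bits, 5 exponent bits, zero exponent point at 15).
    static byte floatToByte315(float f) {
        int bits = Float.floatToRawIntBits(f);
        int smallfloat = bits >> (24 - 3);
        if (smallfloat <= ((63 - 15) << 3)) {
            return (bits <= 0) ? (byte) 0 : (byte) 1; // underflow
        }
        if (smallfloat >= ((63 - 15) << 3) + 0x100) {
            return -1;                                // overflow: byte 255
        }
        return (byte) (smallfloat - ((63 - 15) << 3));
    }

    // Stand-in for the matching byte-to-float decoding.
    static float byte315ToFloat(byte b) {
        if (b == 0) {
            return 0.0f;
        }
        int bits = (b & 0xff) << (24 - 3);
        bits += (63 - 15) << 24;
        return Float.intBitsToFloat(bits);
    }

    // Decode table: entry i is the float that the encoder turns back into byte i.
    static final float[] NORM_TABLE = new float[256];
    static {
        for (int i = 0; i < 256; i++) {
            NORM_TABLE[i] = byte315ToFloat((byte) i);
        }
    }

    // The mapper: pick the float whose encoded byte equals the wanted int.
    static float intToNormFloat(int value) {
        if (value < 0 || value > 255) {
            throw new IllegalArgumentException("norm must be in 0..255: " + value);
        }
        return NORM_TABLE[value];
    }

    public static void main(String[] args) {
        for (int v = 0; v < 256; v++) {
            int stored = floatToByte315(intToNormFloat(v)) & 0xff;
            if (stored != v) {
                throw new IllegalStateException("round trip failed for " + v);
            }
        }
        System.out.println("all 256 values round-trip");
    }
}
```

A custom computeNorm could return `intToNormFloat(myValue)` so that the stored byte equals `myValue`. Decoding at search time would still yield the table float rather than the int, so the search side would need the reverse mapping, which is part of why this feels inelegant.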


Benjamin

---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

Re: Change norm encoding

Michael McCandless-2
On Mon, Nov 9, 2009 at 11:04 AM, Benjamin Heilbrunn <[hidden email]> wrote:

> I've got a problem concerning the encoding of norms.
> I want to use int values (0-255) instead of bytes that are interpreted as floats.
>
> In my own Similarity class, which I use for indexing and searching, I
> implemented the static methods encodeNorm, decodeNorm and
> getNormDecoder.
> But they are static, and the encoding of norms happens in
> NormsWriterPerField.finish() with the following lines of code:
>
>      final float norm = docState.similarity.computeNorm(fieldInfo.name, fieldState);
>      norms[upto] = Similarity.encodeNorm(norm);
>      docIDs[upto] = docState.docID;
>
> So my implementation is only used for the computation of norm values,
> not for the encoding.
> Is there a reason why norm encoding and decoding are hardwired to the
> implementation in Similarity?

I don't think there's a particular reason... this is just how it has
always been.  I think making it more extensible would be good!

> And is there an elegant way to bypass this behaviour, other than
> implementing a mapper which maps every int between 0 and 255 to a
> float value out of Similarity.NORM_TABLE before encoding?

I think a patch is needed to allow the Similarity instance (not the
static methods) to provide the mapping and the decode table.  Various
queries call the decode, so you'd need to fix those too... wanna cough
up a patch?
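
A minimal sketch of what instance-level encoding could look like, under stated assumptions: the class and method names below (`PluggableSimilarity`, `IntNormSimilarity`, `buildDecodeTable`) are hypothetical and are not taken from any Lucene release or patch. The point is only that the writer asks the instance to encode, and that scorers get their decode table from the same instance.

```java
// Hypothetical base class: encode/decode are instance methods, so subclasses
// can really override them.
abstract class PluggableSimilarity {
    abstract float computeNorm(int numTerms);

    abstract byte encodeNorm(float norm);

    abstract float decodeNorm(byte stored);

    // Decode table built from the instance, for code that decodes per document.
    final float[] buildDecodeTable() {
        float[] table = new float[256];
        for (int i = 0; i < 256; i++) {
            table[i] = decodeNorm((byte) i);
        }
        return table;
    }
}

// Stores plain ints 0..255 in the norm byte, as asked for in this thread.
class IntNormSimilarity extends PluggableSimilarity {
    @Override
    float computeNorm(int numTerms) {
        return Math.min(numTerms, 255); // toy norm: capped term count
    }

    @Override
    byte encodeNorm(float norm) {
        int v = Math.max(0, Math.min(255, Math.round(norm)));
        return (byte) v;
    }

    @Override
    float decodeNorm(byte stored) {
        return stored & 0xFF; // read the byte as unsigned
    }
}

final class InstanceEncodingDemo {
    // The writer now encodes through the instance instead of a static call.
    static byte writeNorm(PluggableSimilarity similarity, int numTerms) {
        return similarity.encodeNorm(similarity.computeNorm(numTerms));
    }

    public static void main(String[] args) {
        PluggableSimilarity sim = new IntNormSimilarity();
        byte stored = writeNorm(sim, 42);
        System.out.println(sim.decodeNorm(stored));       // prints 42.0
        System.out.println(sim.buildDecodeTable()[200]);  // prints 200.0
    }
}
```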

Mike

Re: Change norm encoding

Benjamin Heilbrunn
Hi Mike,

Thanks for your reply.
After making my post I found this (without taking a deeper look):

http://issues.apache.org/jira/browse/LUCENE-1260

Looks like a solution for that problem.
Why wasn't it applied to Lucene?

Benjamin

Re: Change norm encoding

Michael McCandless-2
On Mon, Nov 9, 2009 at 12:19 PM, Benjamin Heilbrunn <[hidden email]> wrote:

> After making my post I found this (without taking a deeper look):
>
> http://issues.apache.org/jira/browse/LUCENE-1260
>
> Looks like a solution for that problem.

Indeed, the most recent patch there looks almost exactly like what
you're proposing.  I guess the earlier versions of the patch were a
bigger change (but I haven't looked that closely).

> Why wasn't it applied to Lucene?

I guess it sort of fizzled out from lack of attention?  Sometimes that
happens!  And then something, like your interest here, comes along and
revives it :)

Mike

Re: Change norm encoding

Benjamin Heilbrunn
Hi,

I applied http://issues.apache.org/jira/secure/attachment/12411342/Lucene-1260.patch
That's exactly what I was looking for.

The problem is that from now on I'm on a patched version, and I'm not
very happy with breaking compatibility with the "original" jars...
So is there a chance that this patch becomes part of Lucene's upcoming changes?


Benjamin

Re: Change norm encoding

Michael McCandless-2
Well, assuming there are no objections to the current approach, and
performance checks out, I'll try to get this into 3.1...

Mike

On Tue, Nov 10, 2009 at 4:33 AM, Benjamin Heilbrunn <[hidden email]> wrote:

> Hi,
>
> I applied http://issues.apache.org/jira/secure/attachment/12411342/Lucene-1260.patch
> That's exactly what I was looking for.
>
> The problem is that from now on I'm on a patched version, and I'm not
> very happy with breaking compatibility with the "original" jars...
> So is there a chance that this patch becomes part of Lucene's upcoming changes?
>
>
> Benjamin