[jira] Created: (LUCENE-1260) Norm codec strategy in Similarity

[jira] Created: (LUCENE-1260) Norm codec strategy in Similarity

Tim Allison (Jira)
Norm codec strategy in Similarity
---------------------------------

                 Key: LUCENE-1260
                 URL: https://issues.apache.org/jira/browse/LUCENE-1260
             Project: Lucene - Java
          Issue Type: Improvement
          Components: Search
    Affects Versions: 2.3.1
            Reporter: Karl Wettin


The fixed span and resolution of the 8-bit norms codec might not fit all applications.

My use case requires that the range 100f-250f be discretized into 60 bins instead of the default... 10?
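
To get a feel for how coarse a one-byte encoding is over this range, here is a minimal sketch that collects the distinct values that the whole numbers from 100 to 250 collapse to. It assumes the default codec is a truncating one-byte float format with a zero exponent of 15; that is my understanding of the encoding of that era, not something stated in this issue. The class and method names are made up for illustration.

```java
import java.util.TreeSet;

// Illustration only: NormResolutionDemo and its methods are hypothetical names.
// The bit layout below is an ASSUMPTION about the default 8-bit norm encoding.
class NormResolutionDemo {

    // Offset that maps the float exponent onto the byte's exponent range.
    private static final int FZERO = (63 - 15) << 3;

    // Encode a float into one byte by truncating its bit pattern.
    static byte encode(float f) {
        int bits = Float.floatToRawIntBits(f);
        int small = bits >> (24 - 3);
        if (small <= FZERO) {
            // Zero and negatives map to 0; positive underflow maps to the smallest code.
            return (bits <= 0) ? (byte) 0 : (byte) 1;
        }
        if (small >= FZERO + 0x100) {
            return (byte) -1; // overflow maps to the largest code
        }
        return (byte) (small - FZERO);
    }

    // Decode a byte back into the float it represents.
    static float decode(byte b) {
        if (b == 0) {
            return 0.0f;
        }
        int bits = (b & 0xff) << (24 - 3);
        bits += (63 - 15) << 24;
        return Float.intBitsToFloat(bits);
    }

    // Distinct decoded values that the whole numbers in [lo, hi] collapse to.
    static TreeSet<Float> bins(int lo, int hi) {
        TreeSet<Float> out = new TreeSet<>();
        for (int v = lo; v <= hi; v++) {
            out.add(decode(encode((float) v)));
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(bins(100, 250));
    }
}
```

Under that assumption the range falls into only a handful of bins, which is the limitation the issue describes.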


--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]


[jira] Updated: (LUCENE-1260) Norm codec strategy in Similarity

Tim Allison (Jira)

     [ https://issues.apache.org/jira/browse/LUCENE-1260?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Karl Wettin updated LUCENE-1260:
--------------------------------

    Attachment: LUCENE-1260.txt


 * Similarity#getNormCodec()
 * Similarity#setNormCodec(NormCodec)
 * Similarity$NormCodec
 * Similarity$DefaultNormCodec
 * Similarity$SimpleNormCodec (binary-searches over a sorted float[])

I also deprecated Similarity#getNormsTable() and replaced the only use of it I could find, in TermScorer. I could not spot any performance or other problems with that change.
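
For readers without the attachment, the following is a rough sketch of what a table-driven codec along these lines could look like. The interface shape, the method names encodeNorm/decodeNorm on the codec, and the linear() helper are assumptions for illustration; the actual patch may differ.

```java
import java.util.Arrays;

// Hypothetical shape of the strategy interface; not taken from the patch.
interface NormCodec {
    byte encodeNorm(float f);
    float decodeNorm(byte b);
}

// Table-driven codec: a byte is an index into a sorted float[] of at most 256 values.
class SimpleNormCodec implements NormCodec {
    private final float[] table;

    // The caller must pass the values in ascending order.
    SimpleNormCodec(float[] sortedValues) {
        if (sortedValues.length == 0 || sortedValues.length > 256) {
            throw new IllegalArgumentException("need 1..256 values");
        }
        this.table = sortedValues.clone();
    }

    // Binary search for the table entry closest to f.
    public byte encodeNorm(float f) {
        int idx = Arrays.binarySearch(table, f);
        if (idx < 0) {
            int ins = -idx - 1; // index of the first entry greater than f
            if (ins == 0) {
                idx = 0;
            } else if (ins == table.length) {
                idx = table.length - 1;
            } else {
                idx = (f - table[ins - 1] <= table[ins] - f) ? ins - 1 : ins;
            }
        }
        return (byte) idx;
    }

    // Look the value up again; bytes beyond the table size are not valid codes.
    public float decodeNorm(byte b) {
        return table[b & 0xff];
    }

    // Evenly spaced table, for example 60 steps between 100f and 250f.
    static SimpleNormCodec linear(float min, float max, int steps) {
        if (steps < 2) {
            throw new IllegalArgumentException("need at least 2 steps");
        }
        float[] t = new float[steps];
        for (int i = 0; i < steps; i++) {
            t[i] = min + (max - min) * i / (steps - 1);
        }
        return new SimpleNormCodec(t);
    }
}
```

With SimpleNormCodec.linear(100f, 250f, 60), values in the range from the issue description are stored with an error of at most half a step (about 1.3), and values outside the range clamp to the nearest end of the table.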


[jira] Commented: (LUCENE-1260) Norm codec strategy in Similarity

Tim Allison (Jira)
In reply to this post by Tim Allison (Jira)

    [ https://issues.apache.org/jira/browse/LUCENE-1260?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12586588#action_12586588 ]

Karl Wettin commented on LUCENE-1260:
-------------------------------------

I suppose it would be possible to implement a NormCodec that listens to encodeNorm(float) while one builds a subset of the index, in order to find all the norm resolution sweet spots for that corpus using some appropriate algorithm. Mean shift?

Perhaps it would even be possible to compress it down to n bags from the start and then allow it to grow in case new documents with other norm requirements are added to the store.
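
One way to picture the "listening" idea is sketched below: record every norm seen during a sample indexing pass, then derive a sorted lookup table from the observed distribution. Note that this swaps in plain quantile binning for the mean shift idea floated above, purely because it is short; NormSampler, observe() and buildTable() are hypothetical names, not part of any patch.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical helper: records norms during a sample pass and derives a table.
class NormSampler {
    private final List<Float> seen = new ArrayList<>();

    // Call this wherever the sample pass would encode a norm.
    void observe(float norm) {
        seen.add(norm);
    }

    // Build a sorted table of at most maxBins distinct values using quantiles
    // (NOT mean shift; a simpler stand-in for illustration).
    float[] buildTable(int maxBins) {
        if (seen.isEmpty() || maxBins < 1 || maxBins > 256) {
            throw new IllegalStateException("need samples and 1..256 bins");
        }
        List<Float> sorted = new ArrayList<>(seen);
        Collections.sort(sorted);
        List<Float> picked = new ArrayList<>();
        for (int i = 0; i < maxBins; i++) {
            // Middle element of the i-th of maxBins equally populated slices.
            int pos = (int) (((2L * i + 1) * sorted.size()) / (2L * maxBins));
            float v = sorted.get(pos);
            if (picked.isEmpty() || picked.get(picked.size() - 1) != v) {
                picked.add(v); // skip duplicates so the table stays strictly ascending
            }
        }
        float[] table = new float[picked.size()];
        for (int i = 0; i < table.length; i++) {
            table[i] = picked.get(i);
        }
        return table;
    }
}
```

The resulting array is sorted, so it could feed a table-based codec that binary-searches it. Growing the table later, as suggested above, would additionally require re-encoding the norms that are already stored, which this sketch does not attempt.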

I haven't thought too much about it yet, but it seems to me that norm codec has more to do with the physical store (Directory) than Similarity and should perhaps be moved there instead? I have no idea how, but I also want to move it to the instance scope so I can have multiple indices with unique norm span/resolutions created from the same classloader.


[jira] Commented: (LUCENE-1260) Norm codec strategy in Similarity

Tim Allison (Jira)
In reply to this post by Tim Allison (Jira)

    [ https://issues.apache.org/jira/browse/LUCENE-1260?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12586954#action_12586954 ]

Hoss Man commented on LUCENE-1260:
----------------------------------

bq. I haven't thought too much about it yet, but it seems to me that norm codec has more to do with the physical store (Directory) than Similarity and should perhaps be moved there instead?

As long as the norm remains a fixed size (1 byte) then it doesn't really matter whether it's tied to Similarity or the store itself -- it would be nice if the Index could tell you which normDecoder to use, but it's not any more unreasonable to expect the application to keep track of this (if it's not the default encoding) since applications already have to keep track of things like which Analyzer is "compatible" with querying this index.

If we want norms to be more flexible, so that apps can pick not only the encoding but also the size... then things get more interesting, but it's still feasible to say "if you customize this, you have to make your reading apps and your writing apps smart enough to know about your customization."

bq. I also want to move it to the instance scope so I can have multiple indices with unique norm span/resolutions created from the same classloader.

I agree, it's a good goal.



[jira] Commented: (LUCENE-1260) Norm codec strategy in Similarity

Tim Allison (Jira)
In reply to this post by Tim Allison (Jira)

    [ https://issues.apache.org/jira/browse/LUCENE-1260?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12587290#action_12587290 ]

Karl Wettin commented on LUCENE-1260:
-------------------------------------

{quote}
As long as the norm remains a fixed size (1 byte) then it doesn't really matter whether it's tied to Similarity or the store itself - it would be nice if the Index could tell you which normDecoder to use, but it's not any more unreasonable to expect the application to keep track of this (if it's not the default encoding) since applications already have to keep track of things like which Analyzer is "compatible" with querying this index.

If we want norms to be more flexible, so that apps can pick not only the encoding but also the size... then things get more interesting, but it's still feasible to say "if you customize this, you have to make your reading apps and your writing apps smart enough to know about your customization."
{quote}

I like the idea of an index that is completely self-aware of norm encoding, what payloads mean, etc.

{quote}
I also want to move it to the instance scope so I can have multiple indices with unique norm span/resolutions created from the same classloader.
{quote}

My use case is really about document boost and not normalization.

So another solution to this is to introduce a (variable bit sized?) document boost file and completely separate it from the norms instead of, as now, having normalization and document boost baked together as the same thing. I think there would then be no need to touch the norms encoding, since the default resolution is good enough for /normalization/. As I see it, this would fix several caveats with norms.




[jira] Commented: (LUCENE-1260) Norm codec strategy in Similarity

Tim Allison (Jira)
In reply to this post by Tim Allison (Jira)

    [ https://issues.apache.org/jira/browse/LUCENE-1260?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12587435#action_12587435 ]

Hoss Man commented on LUCENE-1260:
----------------------------------

bq. My use case is really about document boost and not normalization.

bq. So another solution to this is to introduce a (variable bit sized?) document boost file and completely separate it from the norms instead...

1) "norms" is a vague term.  currently "lengthNorm" is folded in with "field boosts" and "doc boosts" to form a generic "fieldNorm" ... I assumed you were interested in a more general way to improve the resolution of "fieldNorm"

2) your description of general purpose variable sized document boosting sounds exactly like LUCENE-1231 ... in the long run utilities using LUCENE-1231 (or something like it) to replace "field boosts" and "length norms" might make the most sense as a way to eliminate the current static Norm encoding and put more flexibility in the hands of users


[jira] Commented: (LUCENE-1260) Norm codec strategy in Similarity

Tim Allison (Jira)
In reply to this post by Tim Allison (Jira)

    [ https://issues.apache.org/jira/browse/LUCENE-1260?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12587445#action_12587445 ]

Karl Wettin commented on LUCENE-1260:
-------------------------------------

{quote}
1) "norms" is a vague term. currently "lengthNorm" is folded in with "field boosts" and "doc boosts" to form a generic "fieldNorm" ... I assumed you were interested in a more general way to improve the resolution of "fieldNorm"
{quote}

I still am but mainly because it is the simplest and only way to get better document boost resolution at the moment.





[jira] Commented: (LUCENE-1260) Norm codec strategy in Similarity

Tim Allison (Jira)
In reply to this post by Tim Allison (Jira)

    [ https://issues.apache.org/jira/browse/LUCENE-1260?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12587446#action_12587446 ]

Karl Wettin commented on LUCENE-1260:
-------------------------------------

I notice there is a tyop in the patch. And there is no test case for SimpleNormCodec. I'll come up with that too.


[jira] Updated: (LUCENE-1260) Norm codec strategy in Similarity

Tim Allison (Jira)
In reply to this post by Tim Allison (Jira)

     [ https://issues.apache.org/jira/browse/LUCENE-1260?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Karl Wettin updated LUCENE-1260:
--------------------------------

    Attachment: LUCENE-1260.txt

Fixed some typos and added some tests. Perhaps it needs new javadocs too?


[jira] Commented: (LUCENE-1260) Norm codec strategy in Similarity

Tim Allison (Jira)
In reply to this post by Tim Allison (Jira)

    [ https://issues.apache.org/jira/browse/LUCENE-1260?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12587954#action_12587954 ]

Karl Wettin commented on LUCENE-1260:
-------------------------------------

This is a retroactive ASL blessing of the patch posted 11/Apr/08 06:01 AM


Re: [jira] Commented: (LUCENE-1260) Norm codec strategy in Similarity

hossman
In reply to this post by Tim Allison (Jira)

: I still am but mainly because it is the simplest and only way to get
: better document boost resolution at the moment.

I would argue that using a FieldScoreQuery is the easiest way to get
better document boost resolution ... but it doesn't change the fact that
support for more flexible norm encoding is worthwhile. (but as i said: i
suspect column stride fields may be a suitable replacement for the built
in norm support down the road)


-Hoss



[jira] Updated: (LUCENE-1260) Norm codec strategy in Similarity

Tim Allison (Jira)
In reply to this post by Tim Allison (Jira)

     [ https://issues.apache.org/jira/browse/LUCENE-1260?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Karl Wettin updated LUCENE-1260:
--------------------------------

    Attachment: LUCENE-1260.txt

New patch additionally includes:

 * Lots of javadocs with warnings
 * Similarity#readNormCodec(Directory):NormCodec
 * Similarity#writeNormCodec(Directory, NormCodec)



[jira] Commented: (LUCENE-1260) Norm codec strategy in Similarity

Tim Allison (Jira)
In reply to this post by Tim Allison (Jira)

    [ https://issues.apache.org/jira/browse/LUCENE-1260?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12592203#action_12592203 ]

Karl Wettin commented on LUCENE-1260:
-------------------------------------

I think I've taken this as far as it can go without refactoring it out of the static scope.


[jira] Commented: (LUCENE-1260) Norm codec strategy in Similarity

Tim Allison (Jira)
In reply to this post by Tim Allison (Jira)

    [ https://issues.apache.org/jira/browse/LUCENE-1260?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12625157#action_12625157 ]

Karl Wettin commented on LUCENE-1260:
-------------------------------------

I'd like to see this committed in 2.4, but I don't have core access.


[jira] Commented: (LUCENE-1260) Norm codec strategy in Similarity

Tim Allison (Jira)
In reply to this post by Tim Allison (Jira)

    [ https://issues.apache.org/jira/browse/LUCENE-1260?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12625170#action_12625170 ]

Yonik Seeley commented on LUCENE-1260:
--------------------------------------

This solves a particular usecase nicely, but is it really generic enough and durable enough to put in core?
This essentially adds a new file into the index, but it's not really part of the index.  It wouldn't work with any possible upcoming similarity-per-field to give different NormCodecs per field, and it requires the user to handle their own management of the file (using lucene addIndexes to copy from one place to another won't grab this file, etc).


[jira] Commented: (LUCENE-1260) Norm codec strategy in Similarity

Tim Allison (Jira)
In reply to this post by Tim Allison (Jira)

    [ https://issues.apache.org/jira/browse/LUCENE-1260?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12625223#action_12625223 ]

Karl Wettin commented on LUCENE-1260:
-------------------------------------

The file is just something secondary I added on "request"; personally I use a hardcoded codec. All the patch does is provide a simple way to change the current static norm translation table.


[jira] Commented: (LUCENE-1260) Norm codec strategy in Similarity

Tim Allison (Jira)
In reply to this post by Tim Allison (Jira)

    [ https://issues.apache.org/jira/browse/LUCENE-1260?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12706604#action_12706604 ]

Johan Kindgren commented on LUCENE-1260:
----------------------------------------

Wouldn't the simplest solution be to refactor out the static methods, replace them with instance methods and remove the getNormDecoder method? This would enable a pluggable behavior without introducing a new Codec.
It would cause minor changes to 11 classes in the core, and would also rid the code of static stuff.

As described in LUCENE-1261.


[jira] Commented: (LUCENE-1260) Norm codec strategy in Similarity

Tim Allison (Jira)
In reply to this post by Tim Allison (Jira)

    [ https://issues.apache.org/jira/browse/LUCENE-1260?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12715492#action_12715492 ]

Karl Wettin commented on LUCENE-1260:
-------------------------------------

bq. Wouldn't the simplest solution be to refactor out the static methods, replace them with instance methods and remove the getNormDecoder method? This would enable a pluggable behavior without introducing a new Codec.

Hi Johan,

feel free to post a patch!




[jira] Updated: (LUCENE-1260) Norm codec strategy in Similarity

Tim Allison (Jira)
In reply to this post by Tim Allison (Jira)

     [ https://issues.apache.org/jira/browse/LUCENE-1260?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Johan Kindgren updated LUCENE-1260:
-----------------------------------

    Attachment: Lucene-1260.patch

Removed 'static' keyword to enable a pluggable behavior for encoding/decoding norms. Our business-case for this is to fix scoring when using NGrams. If a word is split into three parts, the norm for these parts would then become ~0.3125 (don't remember exactly) in the current implementation. A search for the exact same word would then generate a score of less than 1.0. With a pluggable norm-calculation, we could use a norm-table with values 0-100 and get a better scoring.

Minor changes in 11 core-classes and some tests. Also minor changes in analyzers, instantiated, memory and miscellaneous.
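
To make the instance-method approach concrete, here is a small sketch with norm coding as overridable methods on a Similarity-like class. Everything in it is an assumption for illustration: PluggableSimilarity is a stand-in rather than the real Lucene class, the default encoding is my understanding of the one-byte float format of that era, and the linear table over [0, 1] is my own choice, not the 0-100 table mentioned above.

```java
// Hypothetical stand-in for Similarity: norm coding is done by overridable
// instance methods rather than static ones. Not the actual Lucene class.
class PluggableSimilarity {
    private static final int FZERO = (63 - 15) << 3;

    // ASSUMED default: one-byte float encoding obtained by truncating float bits.
    public byte encodeNorm(float f) {
        int bits = Float.floatToRawIntBits(f);
        int small = bits >> (24 - 3);
        if (small <= FZERO) {
            return (bits <= 0) ? (byte) 0 : (byte) 1;
        }
        if (small >= FZERO + 0x100) {
            return (byte) -1;
        }
        return (byte) (small - FZERO);
    }

    public float decodeNorm(byte b) {
        if (b == 0) {
            return 0.0f;
        }
        return Float.intBitsToFloat(((b & 0xff) << (24 - 3)) + ((63 - 15) << 24));
    }
}

// Subclass that plugs in a linear table: 256 evenly spaced steps over [0, 1].
class LinearNormSimilarity extends PluggableSimilarity {
    @Override
    public byte encodeNorm(float f) {
        float clamped = Math.max(0f, Math.min(1f, f));
        return (byte) Math.round(clamped * 255f);
    }

    @Override
    public float decodeNorm(byte b) {
        return (b & 0xff) / 255f;
    }
}

class NGramNormDemo {
    public static void main(String[] args) {
        // Length norm for a three-term field, assuming norm = 1/sqrt(numTerms).
        float norm = (float) (1.0 / Math.sqrt(3));
        PluggableSimilarity[] sims = { new PluggableSimilarity(), new LinearNormSimilarity() };
        for (PluggableSimilarity s : sims) {
            System.out.println(s.getClass().getSimpleName() + ": "
                    + s.decodeNorm(s.encodeNorm(norm)));
        }
    }
}
```

Because the methods are not static, each index can be opened with its own Similarity instance and therefore its own norm table, at the cost that readers and writers must agree on which subclass to use.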


[jira] Issue Comment Edited: (LUCENE-1260) Norm codec strategy in Similarity

Tim Allison (Jira)
In reply to this post by Tim Allison (Jira)

    [ https://issues.apache.org/jira/browse/LUCENE-1260?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12722374#action_12722374 ]

Johan Kindgren edited comment on LUCENE-1260 at 6/21/09 8:54 AM:
-----------------------------------------------------------------

Removed 'static' keyword to enable a pluggable behavior for encoding/decoding norms. Our business-case for this is to fix scoring when using NGrams. If a word is split into three parts, the norm for these parts would then become ~0.3125 (don't remember exactly) in the current implementation. A search for the exact same word would then generate a score of less than 1.0. With a pluggable norm-calculation, we could use a norm-table with values 0-100 and get a better scoring.

Minor changes in 11 core-classes and some tests. Also minor changes in analyzers, instantiated, memory and miscellaneous.

