Changing the Scoring api


Changing the Scoring api

Marcus Falck
Hi everyone,

I want to override the default scoring for queries that contain the OR
operator.

For example, say I have the following headlines in my index:

"Sun sues Microsoft"
"Microsoft want to buy Tiscali"
".NU domain sues Microsoft"
"The sun is shining"
"Sun brings antitrust suit against Microsoft"

Those documents have been boosted in descending order ("Sun sues Microsoft"
has a higher calculated norm value than "Sun brings antitrust suit against
Microsoft").

The Similarity class I use makes the norm values exactly equal to the
boost values (I have even modified the norm to be a float so that I won't
lose precision).

 

If I perform a search for: Microsoft OR Sun

the top-ranked results will almost certainly be:

Sun sues Microsoft
Sun brings antitrust suit against Microsoft
....

I just want the documents returned like this:

"Sun sues Microsoft"
"Microsoft want to buy Tiscali"
".NU domain sues Microsoft"
"The sun is shining"
"Sun brings antitrust suit against Microsoft"

I have to get this to work since I'm indexing news material and the
customers are only interested in the newest articles (so the date of
the article is used as the boost factor).
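
The ordering asked for above can be sketched as a toy model in plain Java with no Lucene dependency. Every name in it (BoostOnlyRanking, Doc, sample, rank) and the boost values 5 to 1 are assumptions made for illustration; only the five headlines come from the post. A document qualifies if it contains at least one query term, and the ranking then looks at the boost alone.

```java
// Toy model, not Lucene code: rank matching headlines by boost alone.
class BoostOnlyRanking {

    // A headline with a hypothetical boost (a newer article gets a larger boost).
    static final class Doc {
        final String headline;
        final float boost;

        Doc(String headline, float boost) {
            this.headline = headline;
            this.boost = boost;
        }
    }

    // The five headlines from the post; the boost values are made up.
    static Doc[] sample() {
        return new Doc[] {
            new Doc("The sun is shining", 2f),
            new Doc("Sun sues Microsoft", 5f),
            new Doc("Sun brings antitrust suit against Microsoft", 1f),
            new Doc(".NU domain sues Microsoft", 3f),
            new Doc("Microsoft want to buy Tiscali", 4f),
        };
    }

    // True if the headline contains at least one of the terms as a whole word.
    static boolean matchesAny(Doc doc, String[] terms) {
        String[] words = doc.headline.toLowerCase(java.util.Locale.ROOT).split("\\W+");
        for (String word : words) {
            for (String term : terms) {
                if (word.equals(term.toLowerCase(java.util.Locale.ROOT))) {
                    return true;
                }
            }
        }
        return false;
    }

    // Matching headlines ordered by boost, highest first.
    // How many of the terms matched plays no role in the order.
    static String[] rank(Doc[] docs, String[] terms) {
        java.util.List<Doc> hits = new java.util.ArrayList<Doc>();
        for (Doc doc : docs) {
            if (matchesAny(doc, terms)) {
                hits.add(doc);
            }
        }
        hits.sort((a, b) -> Float.compare(b.boost, a.boost));
        String[] ranked = new String[hits.size()];
        for (int i = 0; i < ranked.length; i++) {
            ranked[i] = hits.get(i).headline;
        }
        return ranked;
    }

    public static void main(String[] args) {
        for (String headline : rank(sample(), new String[] {"microsoft", "sun"})) {
            System.out.println(headline);
        }
    }
}
```

With these made-up boosts, a document that matches both terms but has the lowest boost ends up last, which is the behaviour requested for Microsoft OR Sun.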

 

Any ideas? My rank changes to Lucene work as expected for the AND
operator and for single-term queries.

/
Regards
Marcus Falck

 

 

 


Re: Changing the Scoring api

Chris Hostetter-3

: I want to override the default scoring when it comes to queries
: containing the OR operator.

this message seems to be an exact repost of your question from last friday
... was there something wrong with the suggestions i included in my reply
to it?

http://www.nabble.com/Changing-the-Scoring-api-for-OR-parameters-tf2237565.html



-Hoss


---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]


SV: Changing the Scoring api

Marcus Falck
In reply to this post by Marcus Falck
Hi Hoss,

No, there wasn't anything wrong with your suggestions, except that they had landed in my junk mail for some reason. Stupid Outlook.

I haven't had a chance to test all of your suggestions yet, but I had already implemented my own Similarity class with coord fixed to 1, and it doesn't work as expected.


/
Marcus



SV: Changing the Scoring api

Marcus Falck
In reply to this post by Marcus Falck
However, BooleanQuery's disableCoord does seem to have an effect.
But I still have the problem when I construct queries with wildcards.


/
Marcus



Re: SV: Changing the Scoring api

Chris Hostetter-3

: However the BooleanQuery's disableCoord seems to make effect.
: But I still have the problem when I'm constructing queries with wildcards.

really? ... that's strange, WildcardQuery uses the disableCoord feature of
BooleanQuery.  Do you have an example of what you mean?

: already had implemented my own similarity class that has the coord fixed
: to 1. And it doesn't work as expected.

are you setting your Similarity as the default on your IndexSearcher prior
to executing your Queries?


-Hoss




SV: SV: Changing the Scoring api

Marcus Falck
In reply to this post by Marcus Falck
It didn't really work for boolean queries either. I thought it was working for a few hours, but to my big disappointment I realized that this was not the case.

I'm using two IndexReaders (RAM and FS) and one MultiReader, and I create one IndexSearcher by passing the MultiReader as the constructor argument.

I set the similarity to my own Similarity class using the SetSimilarity method on the searcher.

This is the source for the Similarity class I'm using:

    // Similarity that returns 1 for every factor, so that only the norm
    // (which holds the boost) is left to influence a term's score.
    public class BoostOnlySimilarity : Similarity
    {
        /// <summary>Always 1 (DefaultSimilarity uses <code>1/sqrt(numTerms)</code>). </summary>
        public override float LengthNorm(System.String fieldName, int numTerms)
        {
            return 1;
        }

        /// <summary>Always 1 (DefaultSimilarity uses <code>1/sqrt(sumOfSquaredWeights)</code>). </summary>
        public override float QueryNorm(float sumOfSquaredWeights)
        {
            // Deal with the multiple terms issue
            return 1;
            //return (float)(1.0 / sumOfSquaredWeights);
        }

        /// <summary>Always 1 (DefaultSimilarity uses <code>sqrt(freq)</code>). </summary>
        public override float Tf(float freq)
        {
            return 1;
        }

        /// <summary>Always 1 (DefaultSimilarity uses <code>1 / (distance + 1)</code>). </summary>
        public override float SloppyFreq(int distance)
        {
            return 1;
        }

        /// <summary>Always 1, whatever the term. </summary>
        public override float Idf(Lucene.Net.Index.Term term, Searcher searcher)
        {
            return 1;
        }

        /// <summary>Always 1, whatever the document frequency. </summary>
        public override float Idf(int docFreq, int numDocs)
        {
            return 1;
        }

        /// <summary>Always 1 (DefaultSimilarity uses <code>overlap / maxOverlap</code>). </summary>
        public override float Coord(int overlap, int maxOverlap)
        {
            return 1;
        }

    }


/
Marcus




SV: SV: Changing the Scoring api

Marcus Falck
In reply to this post by Marcus Falck
Example:

Enter query:
AllText:Microsoft
score: 0,01476238 2002-02-19 05:09:00(1000022578) Qwest pins recovery hopes on long-distance
score: 0,01476227 2002-02-19 05:07:00(1000022547) <B>Microsoft</B> ordered to let states see Windows code

Enter query:
AllText:Microsoft OR AllText:IBM
score: 0,02949772 2002-02-19 01:07:00(1000022129) Massive debt and messy books
score: 0,02949705 2002-02-19 01:01:00(1000022033) Alberner Mythos

As you can see, the score is 0,014x for the search containing one term (which also equals the norm value for those documents),

and 0,029x for the docs that contain both terms, which appears to be norm * 2.
How do I get rid of that * 2?
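
The doubling is what one would expect if each matching clause of the OR query contributes its own score and the parent adds them up. The following is a toy model in plain Java, not Lucene code: the class and method names are made up, the per-clause formula is a simplified form of the classic tf * idf^2 * queryNorm * norm product, and the norm value 0.01474886 is simply half of the 0,02949772 shown above.

```java
// Toy model, not Lucene code: why two matching clauses give norm * 2.
class SummedClauseScore {

    // Simplified score of one matching term clause.
    static float clauseScore(float tf, float idf, float queryNorm, float norm) {
        return tf * idf * idf * queryNorm * norm;
    }

    // Score of an OR query for one document: the clause scores are summed,
    // then multiplied by coord. Every Similarity factor is fixed to 1 here,
    // as in BoostOnlySimilarity, so each clause contributes exactly the norm.
    static float orScore(int matchingClauses, float norm) {
        float sum = 0f;
        for (int i = 0; i < matchingClauses; i++) {
            sum += clauseScore(1f, 1f, 1f, norm);
        }
        float coord = 1f;
        return coord * sum;
    }

    public static void main(String[] args) {
        float norm = 0.01474886f; // assumed norm of the top hit
        System.out.println("one matching term:  " + orScore(1, norm));
        System.out.println("two matching terms: " + orScore(2, norm));
    }
}
```

In this model no Similarity setting can remove the factor of 2, because the addition happens outside the Similarity class.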

/
Marcus





Re: SV: SV: Changing the Scoring api

Chris Hostetter-3
In reply to this post by Marcus Falck

1) This is not java.  Since it's not java, i can't even begin to guess
   what odd eccentricities might exist in whatever lucene port you are
   using.
2) If this *were* java then it wouldn't work the way you want it to, since
   you have the tf function returning "1" regardless of the frequency ...
   this method needs to return "0" sometimes, otherwise *everything* is a
   match
3) based on #2, and the results you describe, i'm guessing your similarity
   isn't being used at all -- check that: add a System.out.println("BOO")
   (or whatever the equivalent is in this language) to all of these methods, i
   don't think you'll ever see that output -- start with figuring out why
   your similarity isn't being used before you try to fix it any more.
4) once you are sure your Similarity is being used, then take a look at
   the IndexSearcher.explain methods -- they are your best friend in
   tweaking scoring information.





-Hoss




Re: SV: SV: Changing the Scoring api

Doron Cohen
I think it is not possible, by modifying Similarity alone, to make the total
score depend only on the document boosts (which is the original request in
this discussion).

This is because a higher-level scorer always sums the scores of "its"
sub-scorers - is this right...?  If so, there are probably two options:
  - using Solr's FunctionQuery, as suggested earlier in this thread, and
    maintaining the desired score in a dedicated field rather than in the
    doc boost. This is perhaps the better approach, because the doc boost
    was not intended for this use (it resides in only 1 byte, together with
    other factors, etc.)
  - writing a dedicated (top-level) scorer - and query, and weight - that
    does not sum its sub-scorers.
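
The first option can be pictured with a toy model in plain Java. This is not Solr's FunctionQuery and not Lucene code; the class FieldValueScore, its method and the sample values are hypothetical. The idea it illustrates is that the score is read from a per-document value (for example one derived from the article date) and does not depend on how many clauses matched.

```java
// Toy model, not Solr or Lucene code: the score comes from a per-document value.
class FieldValueScore {

    // One value per document id, e.g. derived from the article date.
    private final float[] valueByDoc;

    FieldValueScore(float[] valueByDoc) {
        this.valueByDoc = valueByDoc;
    }

    // A document that matches at least one clause gets its stored value;
    // the number of matching clauses does not change the result.
    float score(int docId, int matchingClauses) {
        if (matchingClauses <= 0) {
            return 0f;
        }
        return valueByDoc[docId];
    }

    public static void main(String[] args) {
        // Hypothetical values for three documents.
        FieldValueScore scorer = new FieldValueScore(new float[] {0.5f, 0.9f, 0.1f});
        System.out.println(scorer.score(1, 1));
        System.out.println(scorer.score(1, 2));
    }
}
```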

Chris Hostetter <[hidden email]> wrote on 13/09/2006 11:06:20:
>
> 1) This is not java.  Since it's not java, i can't even begin to guess
>    what odd eccentricities might exist in whatever lucene port you are
>    using.
> 2) If this *were* java then it wouldn't work the way you want it to, since
>    you have the tf function returning "1" regardless of the frequency ...
>    this method needs to return "0" sometimes, otherwise *everything* is a
>    match

I agree that tf='always 1' is odd for scoring computations, but since tf
is computed only for documents that contain the processed term (walking
termDocs), tf would never actually return 0 anyway. An idf='always 1' seems
to me harsher on the scoring process - if it is modified this way, it should
be set to a much smaller value.



SV: SV: SV: Changing the Scoring api

Marcus Falck
In reply to this post by Marcus Falck
Yeah Hoss, you are right, this isn't Java, it's the .NET port. But I have to ask on this mailing list since it has a lot of people with much more insight into Lucene than the .NET user list.

And I have a hard time believing that they wouldn't have ported the scoring parts correctly.

First off:
I know that my Similarity class is used, since a change in it immediately affects the calculated score for queries.

Second:
I don't think my problems are related to the Similarity class. I think Doron is correct when he says that the problem is in the summation of the sub-scorers.

Third:
I haven't looked much at the FunctionQuery in Solr since I can't find any good documentation for it.
But if I write a function for a field, don't the field values have to be in the field cache for this function to be applied? Since I'm dealing with a lot of data, this would severely affect the overall performance.
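
A rough way to size that concern, as a back-of-the-envelope sketch in plain Java: it assumes the cached values are held as one 4-byte float per document, which the thread does not confirm, and it ignores any per-array overhead. The class name and the document counts are made up for illustration.

```java
// Rough estimate only: memory for one 4-byte float per document.
class FieldCacheEstimate {

    // Bytes needed to hold one float for each of numDocs documents.
    static long floatArrayBytes(long numDocs) {
        return numDocs * 4L;
    }

    public static void main(String[] args) {
        long[] docCounts = {1000000L, 10000000L, 100000000L};
        for (long numDocs : docCounts) {
            double mib = floatArrayBytes(numDocs) / (1024.0 * 1024.0);
            System.out.println(numDocs + " docs: about " + Math.round(mib) + " MiB");
        }
    }
}
```

Under that assumption the cost grows linearly with the number of documents and is paid once per cached field, so whether it is acceptable depends on the index size and the available heap.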

/
Marcus







SV: SV: SV: Changing the Scoring api

Marcus Falck
In reply to this post by Marcus Falck
Apparently some modifications to the DisjunctionSumScorer class give me exactly what I'm looking for. So it was possible =)
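
The post does not say what was changed in DisjunctionSumScorer. One change that would have this effect, sketched here as a toy model in plain Java rather than as the actual patch, is to combine the sub-scores of the matching clauses with a maximum instead of a sum. The class and method names are hypothetical, and the scores are assumed to be non-negative.

```java
// Toy model, not the actual patch: two ways to combine sub-scores of an OR query.
class DisjunctionCombiners {

    // Default behaviour described in the thread: add up the sub-scores.
    static float sum(float[] subScores) {
        float total = 0f;
        for (float s : subScores) {
            total += s;
        }
        return total;
    }

    // Alternative: keep only the largest sub-score (assumes scores >= 0).
    static float max(float[] subScores) {
        float best = 0f;
        for (float s : subScores) {
            best = Math.max(best, s);
        }
        return best;
    }

    public static void main(String[] args) {
        float norm = 0.01474886f; // assumed norm; each matching clause scores exactly this
        float[] twoMatches = {norm, norm};
        System.out.println("sum: " + sum(twoMatches));
        System.out.println("max: " + max(twoMatches));
    }
}
```

When every clause scores exactly the norm, as with BoostOnlySimilarity, the maximum equals the norm no matter how many clauses match, so documents end up ordered by boost alone.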



-----Ursprungligt meddelande-----
Från: Marcus Falck [mailto:[hidden email]]
Skickat: den 14 september 2006 09:56
Till: [hidden email]
Ämne: SV: SV: SV: Changing the Scoring api

Yeah Hoss you are right this isn't java it's the .NET port. But I have to ask at this mail list since it contains a lot of people with a lot more insight in lucene then on the .NET user list.

And I have a hard time to believe that they wouldn't have ported the scoring parts correctly.  

First off:
I know that my Similarity class is used. Since a change in it immediately affects the calculated score for queries.

Second off:
I don't think my problems are related to the similarity class. I think Doron is correct when he says that the problems are in the summation of the subscorers.

Third off:
I haven't looked as much at the FunctionQuery in solr since I can't find any good documentation for it.
But if I write a function for a field don't the field values have to be in the field cache for applying this function? And since I'm dealing with a lot of data this will severely affect the overall performance.

/
Marcus





-----Ursprungligt meddelande-----
Från: Doron Cohen [mailto:[hidden email]]
Skickat: den 13 september 2006 21:03
Till: [hidden email]
Ämne: Re: SV: SV: Changing the Scoring api

I think it is not possible, by only modifying Similarity, to make the total
score only count for documents boosts (which is the original request in
this discussion).

This is because a higher level scorer always sums the scores of "its"
sub-scorers - is this right...?  if so there are probably two options:
  - using Solr's FunctionQuery as suggested earlier in this thread,
maintaining the desired score in a dedicated field rather than in doc boost
- this is perhaps a better approach, because the doc boost was not intended
for this use (it only resides in 1 byte together with other factors, etc.)
  - writing a dedicated (top level) scorer - and query, and weight - that
would not sum on subscorers.

Chris Hostetter <[hidden email]> wrote on 13/09/2006 11:06:20:
>
> 1) This is not java.  Since it's not java, i can't even begin to guess
>    what odd excentricities might exist in whatever lucene port you are
>    using.
> 2) If this *were* java then it wouldn't work th way you want it to, since
>    you have the tf function returning "1" regardless of the frequency ...
>    this method needs to return "0" sometimes, otherwise *everything* is a
>    match

I agree that tf='always 1' is weird on scoring computations, but since tf
is checked only for documents that contain the processed term (walking
termDocs), actually tf would never return 0. An idf='always 1' seems to me
more harsh for the scoring process - if it is this way modified, it should
be set a much smaller value.

> 3) based on #2, and the results you describe, i'm guessing your similarity
>    isn't being used at all -- check that, add a System.out.println("BOO")
>    (or whatever the corollary is in this language) to all of these
>    methods, i don't think you'll ever see that output -- start with
>    figuring out why your similarity isn't being used before you try to
>    fix it any more.
> 4) once you are sure your Similarity is being used, then take a look at
>    the IndexSearcher.explain methods -- they are your best friend in
>    tweaking scoring information.
>
>
> : Date: Wed, 13 Sep 2006 14:42:13 +0200
> : From: Marcus Falck <[hidden email]>
> : Reply-To: [hidden email]
> : To: [hidden email]
> : Subject: SV: SV: Changing the Scoring api
> :
> : It didn't really work for boolean queries either. I thought it was
> : working for some hours but to my big disappointment I realized that
> : this was not the case.
> :
> : I'm using two IndexReaders (RAM and FS) and one MultiReader, and
> : creating one IndexSearcher by passing the MultiReader as the
> : constructor argument.
> :
> : I set the similarity class to my own similarity class using the
> : SetSimilarity method on the searcher.
> :
> : This is the source for the similarity class I'm using:
> :
> :     public class BoostOnlySimilarity : Similarity
> :     {
> :         /// <summary>Implemented as <code>1/sqrt(numTerms)</code>.</summary>
> :         public override float LengthNorm(System.String fieldName, int numTerms)
> :         {
> :             return 1;
> :         }
> :
> :         /// <summary>Implemented as <code>1/sqrt(sumOfSquaredWeights)</code>. </summary>
> :         public override float QueryNorm(float sumOfSquaredWeights)
> :         {
> :             // Deal with the multiple terms issue
> :             return 1;
> :             //return (float)(1.0 / sumOfSquaredWeights); // return 1;
> :         }
> :
> :         /// <summary>Implemented as <code>sqrt(freq)</code>. </summary>
> :         public override float Tf(float freq)
> :         {
> :             return 1;
> :         }
> :
> :         /// <summary>Implemented as <code>1 / (distance + 1)</code>. </summary>
> :         public override float SloppyFreq(int distance)
> :         {
> :             return 1;
> :         }
> :
> :         public override float Idf(Lucene.Net.Index.Term term, Searcher searcher)
> :         {
> :             return 1;
> :         }
> :         public override float Ldf(int docFreq, int numDocs)
> :         {
> :             return 1;
> :         }
> :         /// <summary>Implemented as <code>overlap / maxOverlap</code>. </summary>
> :         public override float Coord(int overlap, int maxOverlap)
> :         {
> :             return 1;
> :         }
> :
> :     }
> :
> :
> : /
> : Marcus
> :
> :
> : -----Original Message-----
> : From: Chris Hostetter [mailto:[hidden email]]
> : Sent: 12 September 2006 17:20
> : To: [hidden email]
> : Subject: Re: SV: Changing the Scoring api
> :
> :
> : : However the BooleanQuery's disableCoord seems to have an effect.
> : : But I still have the problem when I'm constructing queries with
> : : wildcards.
> :
> : really? ... that's strange, WildcardQuery uses the disableCoord feature
> : of BooleanQuery.  Do you have an example of what you mean?
> :
> : : I already had implemented my own similarity class that has the coord
> : : fixed to 1. And it doesn't work as expected.
> :
> : are you setting your Similarity as the default on your IndexSearcher
> : prior to executing your Queries?
> :
> :
> : -Hoss
> :
> :
> : ---------------------------------------------------------------------
> : To unsubscribe, e-mail: [hidden email]
> : For additional commands, e-mail: [hidden email]
>
>
>
> -Hoss

Re: SV: SV: SV: Changing the Scoring api

Chris Hostetter-3
In reply to this post by Marcus Falck

I obviously misunderstood your goal ... my reading of your question was
that you wanted the sum of the scores of individual terms (based on the tf
and idf) to matter, and you wanted the field norm values of the docs to be
taken into account (for "date boosting" purposes), but you did not want
documents to be 'penalized' for matching only some (but not all) of the
terms.  If you don't want the tf or idf to be taken into account either,
so that *only* the sums of the field norms are used, that can
still be done by overriding the corresponding Similarity class methods
without modifying any Scorers ... and if you really don't want the sum
(just the max) that can be done using DisjunctionMaxQuery instead of
BooleanQuery ... but i'm just speculating now because, as i said, i
obviously misunderstood your question, so what exactly do you want, in
concrete terms?
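
To make the sum-versus-max distinction concrete, here is a small plain-Java sketch (not the Lucene classes themselves). It assumes each matching clause contributes the same field norm for a given document, and it models a DisjunctionMaxQuery-style combination as the maximum clause score plus a tie-breaker multiplied by the remaining scores; with a tie-breaker of 0 that leaves just the norm. The class and method names are made up for illustration.

```java
final class MaxVsSumSketch {

    // BooleanQuery-style combination: add up every matching clause.
    static float sum(float[] clauseScores) {
        float total = 0f;
        for (float s : clauseScores) {
            total += s;
        }
        return total;
    }

    // DisjunctionMax-style combination: best clause, plus a fraction
    // (the tie-breaker) of the scores of the other clauses.
    static float disjunctionMax(float[] clauseScores, float tieBreaker) {
        float max = 0f;
        float total = 0f;
        for (float s : clauseScores) {
            max = Math.max(max, s);
            total += s;
        }
        return max + tieBreaker * (total - max);
    }

    public static void main(String[] args) {
        // A document with norm 0.6 that matches both terms of "Microsoft OR Sun".
        float[] twoMatches = {0.6f, 0.6f};
        // A document with norm 0.9 that matches only one term.
        float[] oneMatch = {0.9f};

        System.out.println("sum:    " + sum(twoMatches) + " vs " + sum(oneMatch));
        System.out.println("dismax: " + disjunctionMax(twoMatches, 0f)
            + " vs " + disjunctionMax(oneMatch, 0f));
    }
}
```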

: Yeah Hoss you are right this isn't java it's the .NET port. But I have
: to ask at this mail list since it contains a lot of people with a lot
: more insight into lucene than on the .NET user list.

Nothing personal, but that's the worst justification i've ever heard.
The Lucene.Net community is never going to grow/thrive if people don't
participate in it.  Looking at the archives, it doesn't appear you ever
attempted to post this (or any other) question to either of the Lucene.Net
mailing lists, so how can you say that you *have* to ask the Java Users
list in order to reach people with insight?  How do you know what kinds of
insight the Lucene.Net subscribers have? How do you expect the Lucene.Net
community as a whole to gain insight if no one participates?


: And I have a hard time believing that they wouldn't have ported the
: scoring parts correctly.

I wasn't suggesting it wouldn't be ported properly, i was pointing out
that different languages (and different ports of APIs) have different
nuances.  The first thing i thought when looking at your Similarity class
was that it wasn't getting used at all because all of your method names
started with Capital letters -- it seemed like a very simple mistake for a
novice Java programmer to make.

: I haven't looked as much at the FunctionQuery in solr since I can't find
: any good documentation for it.  But if I write a function for a field
: don't the field values have to be in the field cache for applying this
: function? And since I'm dealing with a lot of data this will severely
: affect the overall performance.

1) if you have suggestions for improving the FunctionQuery javadocs, i'm
all ears ... it's not always easy for people who work with things daily to
realize how the documentation can be viewed as lacking by people less
familiar with it.  For me, i see that FunctionQueries are built from
ValueSources, and that the classes which implement ValueSource are
FieldCacheSource, LinearFloatFunction, etc... and just go from there.

2) having the values you want to compute functions on in the FieldCache
has no significantly greater impact on the performance of a query than the
fieldNorms you are currently using: in both cases there is an array with
one entry per doc; the only difference is that fieldNorms are stored in a
byte[], and the FieldCache uses either int[] or float[] -- but you already
said you modified your fieldNorms to be float[] didn't you? ... so the
performance of FunctionQuery shouldn't be any different -- just easier to
maintain.
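
As a rough illustration of point 2, the sketch below scores documents from a per-document float[] (a stand-in for what a field cache holds) through a linear function, slope * value + intercept. It is plain Java and only mirrors the idea; the class, the method names and the sample values are hypothetical and are not Solr's FunctionQuery or ValueSource API.

```java
import java.util.Arrays;

final class FieldValueScoreSketch {

    // Linear function of a field value, in the spirit of a linear value source.
    static float linear(float value, float slope, float intercept) {
        return slope * value + intercept;
    }

    // One array lookup per document, the same access pattern as reading a norm.
    static float[] scoreAll(float[] fieldValues, float slope, float intercept) {
        float[] scores = new float[fieldValues.length];
        for (int doc = 0; doc < fieldValues.length; doc++) {
            scores[doc] = linear(fieldValues[doc], slope, intercept);
        }
        return scores;
    }

    public static void main(String[] args) {
        // Made-up recency values, one entry per document, indexed by doc id.
        float[] recency = {5f, 4f, 3f, 2f, 1f};
        System.out.println(Arrays.toString(scoreAll(recency, 1f, 0f)));
    }
}
```

The score then depends only on the stored field value, whichever query terms matched, and the value can be changed without touching the norms.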



-Hoss

