Document boost not as expected...

escher2k
I am implementing a document boost at indexing time. I read a posting that
seemed to indicate that omitNorms="false" is needed to retain the document boost for retrieval.
After I set that, I am still not able to get back the boost I originally put in. Instead,
I get 1.25 as the score for all the documents retrieved.

Example:
Input
<doc boost="1.33">
<field name="uniq_id">3557_183970_10179</field>
<field name="login_name">user1</field>
<field name="show_all_flag">Y</field>
</doc>

Schema.xml
    <fieldtype name="stringB" class="solr.StrField" sortMissingLast="true" omitNorms="false"/>
    <field name="show_all_flag" type="stringB" indexed="true" stored="true"/>

Output for (http://testing:12002/solr/select/?qt=dismax&q=Y&qf=show_all_flag&fl=score,login_name)
<doc>
<float name="score">1.25</float>
<str name="login_name">5webdesign</str>
</doc>

I am not quite sure how the score changed from 1.33 to 1.25. I have modified the custom similarity, but I don't have an explanation for the change.


Re: Document boost not as expected...

Mike Klaas
Have you looked at the score explanation debug data?  The document
boost is folded into the fieldNorm, so it is also scaled by the
lengthNorm.  Further, at query time the term idf and the queryNorm
come into play.

You shouldn't expect that the document boost will be returned as the
document score (although you should expect it to affect it).
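
As a rough illustration of the point above, here is a minimal Java sketch of how the pre-storage fieldNorm can be thought of as a product of the boosts and the lengthNorm. The class and method names (FieldNormSketch, fieldNorm) are hypothetical and not part of Lucene or Solr; the lengthNorm formula is the default 1/sqrt(numTerms) quoted later in this thread.

```java
public class FieldNormSketch {
    // Default length normalization as quoted in this thread: 1 / sqrt(numTerms).
    static float lengthNorm(int numTerms) {
        return (float) (1.0 / Math.sqrt(numTerms));
    }

    // Hypothetical helper: the value that is computed at index time,
    // before it is stored in the index.
    static float fieldNorm(float docBoost, float fieldBoost, int numTerms) {
        return docBoost * fieldBoost * lengthNorm(numTerms);
    }

    public static void main(String[] args) {
        // A one-term field: the lengthNorm is 1.0, so only the boosts matter.
        System.out.println(fieldNorm(1.33f, 1.0f, 1));
        // A four-term field: the same boost is halved by the lengthNorm.
        System.out.println(fieldNorm(1.33f, 1.0f, 4));
    }
}
```

The score then multiplies this norm with tf, idf and the query-side factors, which is why the boost shows up in the score only indirectly.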

-Mike

Re: Document boost not as expected...

Chris Hostetter-3
In reply to this post by escher2k

Ditto everything Mike said, but i'm also curious what Similarity changes
you made ... without knowing what that code looks like, all bets are off
in terms of anyone being able to help you understand the scores you are
seeing.

: I am not quite sure how the score changed from 1.33 to 1.25. I am not quite
: sure how this might have happened - I have modified the custom similarity
: but I don't quite have an explanation of how the score changed.


-Hoss

Re: Document boost not as expected...

escher2k
Chris,
   Earlier I was trying to modify the Similarity computation to make it field dependent (we are trying to change tf based on the field). Now, I have reverted the custom computation so that the default Similarity is used. For testing, I boosted a single field in one doc.

<doc boost="1.33">
<field name="show_all_flag" boost="2.0">Y</field>
...
</doc>


This is what I see in the explain -
2.5 = (MATCH) sum of:
  2.5 = (MATCH) fieldWeight(show_all_flag:Y in 17), product of:
    1.0 = tf(termFreq(show_all_flag:Y)=1)
    1.0 = idf(docFreq=36239)
    2.5 = fieldNorm(field=show_all_flag, doc=17)


Again, I fail to understand where the multiplication by 1.25 comes from (score (2.5) = field_boost (2.0) * 1.25 ??).

Thanks.

Re: Document boost not as expected...

Mike Klaas
On 3/28/07, escher2k <[hidden email]> wrote:

> Again, I fail to understand where it is doing a multiplication by 1.25
> (score (2.5) = field_boost (2.0) * 1.25 ??).

As I said above, lengthNorm is also multiplied in.  This will depend
on your custom Similarity and on what value(s) you have in the field.

-Mike

Re: Document boost not as expected...

escher2k
Mike,
   I am not doing anything custom for this test. I am assuming that the Default Similarity is used.
Surprisingly, if I remove the document level boost (set to 1.0) and just have a field level boost, the result
seems to be correct.

Re: Document boost not as expected...

Mike Klaas
On 3/28/07, escher2k <[hidden email]> wrote:
>
> Mike,
>    I am not doing anything custom for this test. I am assuming that the
> Default Similarity is used.
> Surprisingly, if I remove the document level boost (set to 1.0) and just
> have a field level boost, the result
> seems to be correct.

Another detail that I forgot to mention is that fieldNorms are encoded
into one-byte floats, so you can experience severe rounding errors.
The possible values are:

0       0.0
1       5.820766E-10
2       6.9849193E-10
3       8.1490725E-10
4       9.313226E-10
5       1.1641532E-9
6       1.3969839E-9
7       1.6298145E-9
8       1.8626451E-9
9       2.3283064E-9
10      2.7939677E-9
11      3.259629E-9
12      3.7252903E-9
13      4.656613E-9
14      5.5879354E-9
15      6.519258E-9
16      7.4505806E-9
17      9.313226E-9
18      1.1175871E-8
19      1.3038516E-8
20      1.4901161E-8
21      1.8626451E-8
22      2.2351742E-8
23      2.6077032E-8
24      2.9802322E-8
25      3.7252903E-8
26      4.4703484E-8
27      5.2154064E-8
28      5.9604645E-8
29      7.4505806E-8
30      8.940697E-8
31      1.0430813E-7
32      1.1920929E-7
33      1.4901161E-7
34      1.7881393E-7
35      2.0861626E-7
36      2.3841858E-7
37      2.9802322E-7
38      3.5762787E-7
39      4.172325E-7
40      4.7683716E-7
41      5.9604645E-7
42      7.1525574E-7
43      8.34465E-7
44      9.536743E-7
45      1.1920929E-6
46      1.4305115E-6
47      1.66893E-6
48      1.9073486E-6
49      2.3841858E-6
50      2.861023E-6
51      3.33786E-6
52      3.8146973E-6
53      4.7683716E-6
54      5.722046E-6
55      6.67572E-6
56      7.6293945E-6
57      9.536743E-6
58      1.1444092E-5
59      1.335144E-5
60      1.5258789E-5
61      1.9073486E-5
62      2.2888184E-5
63      2.670288E-5
64      3.0517578E-5
65      3.8146973E-5
66      4.5776367E-5
67      5.340576E-5
68      6.1035156E-5
69      7.6293945E-5
70      9.1552734E-5
71      1.0681152E-4
72      1.2207031E-4
73      1.5258789E-4
74      1.8310547E-4
75      2.1362305E-4
76      2.4414062E-4
77      3.0517578E-4
78      3.6621094E-4
79      4.272461E-4
80      4.8828125E-4
81      6.1035156E-4
82      7.324219E-4
83      8.544922E-4
84      9.765625E-4
85      0.0012207031
86      0.0014648438
87      0.0017089844
88      0.001953125
89      0.0024414062
90      0.0029296875
91      0.0034179688
92      0.00390625
93      0.0048828125
94      0.005859375
95      0.0068359375
96      0.0078125
97      0.009765625
98      0.01171875
99      0.013671875
100     0.015625
101     0.01953125
102     0.0234375
103     0.02734375
104     0.03125
105     0.0390625
106     0.046875
107     0.0546875
108     0.0625
109     0.078125
110     0.09375
111     0.109375
112     0.125
113     0.15625
114     0.1875
115     0.21875
116     0.25
117     0.3125
118     0.375
119     0.4375
120     0.5
121     0.625
122     0.75
123     0.875
124     1.0
125     1.25
126     1.5
127     1.75
128     2.0
129     2.5
130     3.0
131     3.5
132     4.0
133     5.0
134     6.0
135     7.0
136     8.0
137     10.0
138     12.0
139     14.0
140     16.0
141     20.0
142     24.0
143     28.0
144     32.0
145     40.0
146     48.0
147     56.0
148     64.0
149     80.0
150     96.0
151     112.0
152     128.0
153     160.0
154     192.0
155     224.0
156     256.0
157     320.0
158     384.0
159     448.0
160     512.0
161     640.0
162     768.0
163     896.0
164     1024.0
165     1280.0
166     1536.0
167     1792.0
168     2048.0
169     2560.0
170     3072.0
171     3584.0
172     4096.0
173     5120.0
174     6144.0
175     7168.0
176     8192.0
177     10240.0
178     12288.0
179     14336.0
180     16384.0
181     20480.0
182     24576.0
183     28672.0
184     32768.0
185     40960.0
186     49152.0
187     57344.0
188     65536.0
189     81920.0
190     98304.0
191     114688.0
192     131072.0
193     163840.0
194     196608.0
195     229376.0
196     262144.0
197     327680.0
198     393216.0
199     458752.0
200     524288.0
201     655360.0
202     786432.0
203     917504.0
204     1048576.0
205     1310720.0
206     1572864.0
207     1835008.0
208     2097152.0
209     2621440.0
210     3145728.0
211     3670016.0
212     4194304.0
213     5242880.0
214     6291456.0
215     7340032.0
216     8388608.0
217     1.048576E7
218     1.2582912E7
219     1.4680064E7
220     1.6777216E7
221     2.097152E7
222     2.5165824E7
223     2.9360128E7
224     3.3554432E7
225     4.194304E7
226     5.0331648E7
227     5.8720256E7
228     6.7108864E7
229     8.388608E7
230     1.00663296E8
231     1.17440512E8
232     1.34217728E8
233     1.6777216E8
234     2.01326592E8
235     2.34881024E8
236     2.68435456E8
237     3.3554432E8
238     4.02653184E8
239     4.69762048E8
240     5.3687091E8
241     6.7108864E8
242     8.0530637E8
243     9.395241E8
244     1.07374182E9
245     1.34217728E9
246     1.61061274E9
247     1.87904819E9
248     2.14748365E9
249     2.68435456E9
250     3.22122547E9
251     3.75809638E9
252     4.2949673E9
253     5.3687091E9
254     6.4424509E9
255     7.5161928E9
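
To make the rounding concrete, here is a small Java sketch of a one-byte encode/decode that reproduces the table above (for example 124 -> 1.0, 125 -> 1.25, 129 -> 2.5). It is written from the bit layout the table implies and is meant as an illustration, not as the exact code path of any particular Lucene version; the class name NormByteDemo and the method names are made up for this example.

```java
public class NormByteDemo {
    private static final int ZERO_POINT = (63 - 15) << 3; // offset of byte 0 in the shifted float bits

    // Encode: keep only the leading bits of the IEEE-754 float.
    // The lower 21 bits are dropped, so values are truncated, not rounded.
    static byte floatToByte(float f) {
        int bits = Float.floatToRawIntBits(f);
        int small = bits >> 21;
        if (small <= ZERO_POINT) {
            return (bits <= 0) ? (byte) 0 : (byte) 1; // zero/negative -> 0, underflow -> smallest step
        }
        if (small >= ZERO_POINT + 0x100) {
            return (byte) -1; // overflow -> largest step (byte 255)
        }
        return (byte) (small - ZERO_POINT);
    }

    // Decode: rebuild a float from the stored byte.
    static float byteToFloat(byte b) {
        if (b == 0) {
            return 0.0f;
        }
        int bits = (b & 0xff) << 21;
        bits += (63 - 15) << 24;
        return Float.intBitsToFloat(bits);
    }

    // What a norm looks like after being written to and read back from the index.
    static float roundTrip(float norm) {
        return byteToFloat(floatToByte(norm));
    }

    public static void main(String[] args) {
        System.out.println(roundTrip(1.33f));        // document boost only
        System.out.println(roundTrip(1.33f * 2.0f)); // document boost times field boost
    }
}
```

Under these assumptions a boost of 1.33 comes back as 1.25, and 1.33 * 2.0 = 2.66 comes back as 2.5, which matches the scores reported earlier in the thread. A field boost of exactly 2.0 with a document boost of 1.0 survives unchanged because 2.0 is one of the representable values.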

Re: Document boost not as expected...

escher2k
Thanks for the reply Mike. I think that was what was causing the issue. I discovered the effect after I
bumped up the numbers a bit. Here's what I see now.

            Index-time boost    My custom Similarity    Default Similarity
Doc 1       133226.63           131072                  121359
Doc 2       123194.06           114688                  106189


The difference between the two results is because my custom Similarity ignores the lengthNorm (I changed it
from (float)(1.0 / Math.sqrt(numTerms)) to 1.0f). Thanks once again.
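
For readers following along, here is a hedged Java sketch of the change described above. It compares the default lengthNorm with a constant 1.0f, and approximates the one-byte storage by clearing the low 21 bits of the float, which only holds for positive values inside the representable range. The class and method names are hypothetical; a field boost of 1.0 is assumed.

```java
public class LengthNormSketch {
    // Default behaviour quoted above: longer fields get a smaller norm.
    static float defaultLengthNorm(int numTerms) {
        return (float) (1.0 / Math.sqrt(numTerms));
    }

    // The custom variant described above: field length is ignored.
    static float constantLengthNorm(int numTerms) {
        return 1.0f;
    }

    // Approximation of the one-byte storage for in-range positive norms:
    // keep the leading bits of the float and drop the rest (truncation).
    static float truncate(float norm) {
        return Float.intBitsToFloat(Float.floatToRawIntBits(norm) & 0xFFE00000);
    }

    public static void main(String[] args) {
        float boost1 = 133226.63f;
        float boost2 = 123194.06f;
        // With a constant lengthNorm the stored norm depends only on the boost.
        System.out.println(truncate(boost1 * constantLengthNorm(3)));
        System.out.println(truncate(boost2 * constantLengthNorm(3)));
        // With the default lengthNorm a four-term field halves the boost first.
        System.out.println(truncate(boost1 * defaultLengthNorm(4)));
    }
}
```

With the constant lengthNorm, the two boosts truncate to 131072 and 114688, the values in the "My custom Similarity" column. The sketch does not try to reproduce the "Default Similarity" column, since the field lengths behind those numbers are not given in the thread.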
