Copying dynamic fields into default text field messing up fieldNorm?


Yu-shan Fung
Hi All,

I'm trying to create an index of documents in which each document is
associated with a set of related keywords, each with an individual boost
value that I compute externally.

eg:
Document Title: Democrats
  related keywords:
    liberal: 4.0
    politics: 1.5
    obama: 2.0
    etc. (hundreds of related keywords)

Since boosts in Solr are per field rather than per field instance, I am trying
to get around this by creating a dynamic field for each related keyword and
setting its boost accordingly. So that the document can be found by
searching for the related keywords, I have the schema set up to copy these
related keyword fields into the default text field.

But when I query any of these related keywords, I get back fieldNorms with
the max value:

  1.5409492E10 = (MATCH) weight(text:liberal in 11), product of:
    0.8608541 = queryWeight(text:liberal), product of:
      1.6840147 = idf(docFreq=109, maxDocs=218)
      0.51119155 = queryNorm
    1.79002368E10 = (MATCH) fieldWeight(text:liberal in 11), product of:
      1.4142135 = tf(termFreq(text:liberal)=2)
      1.6840147 = idf(docFreq=109, maxDocs=218)
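
The explain output above does not show a fieldNorm line, but the value can be backed out from the numbers that are shown, since fieldWeight is the product of tf, idf and fieldNorm. A minimal Python sketch of that arithmetic (the variable names are illustrative, not Solr's):

```python
# Values copied from the explain output for text:liberal in doc 11.
field_weight = 1.79002368e10
tf = 1.4142135
idf = 1.6840147

# fieldWeight = tf * idf * fieldNorm, so solve for fieldNorm.
implied_field_norm = field_weight / (tf * idf)
print(f"implied fieldNorm = {implied_field_norm:.4e}")
```

The result is on the order of 7.5e9, far above the single-digit values seen with copyField turned off.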

According to this email exchange between Koji and Mat Brown,

http://www.mail-archive.com/solr-user@.../msg23759.html

The boost values from copyFields shouldn't be accumulated into the boost for
the text field. Can anyone else verify this? It seems to go against what
I'm observing. When I turn off copyField, the fieldNorm goes back to normal
(in the single-digit range).

Any idea what could be causing this? I'm running Solr 1.4 in case that
matters.

Any pointers/advice would be greatly appreciated! Thanks,
Yu-Shan

Re: Copying dynamic fields into default text field messing up fieldNorm?

Jan Høydahl / Cominvent
This sounds like an ideal use case for payloads. You could attach a boost value to each term in your "keywords" field.
See http://www.lucidimagination.com/blog/2009/08/05/getting-started-with-payloads/
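
As a rough illustration of the payload idea, each keyword can be written as a token with its boost attached after a delimiter, which a payload-aware analyzer then splits off at index time. The sketch below only builds such a field value in Python; the `|` delimiter and the function name are assumptions for illustration, and the analyzer and query-side configuration are not shown:

```python
def to_payload_field(keyword_boosts, delimiter="|"):
    """Join keywords into one whitespace-separated string of keyword<delimiter>boost tokens."""
    tokens = []
    for keyword, boost in keyword_boosts.items():
        # Whitespace or the delimiter inside a keyword would break tokenization.
        if delimiter in keyword or any(ch.isspace() for ch in keyword):
            raise ValueError(f"unsupported keyword: {keyword!r}")
        tokens.append(f"{keyword}{delimiter}{boost}")
    return " ".join(tokens)


related = {"liberal": 4.0, "politics": 1.5, "obama": 2.0}
print(to_payload_field(related))  # liberal|4.0 politics|1.5 obama|2.0
```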

Another common workaround is to create, say, 9 multi-valued fields with boosts 0.5, 1.0, 1.5, 2.0, 4.0, 8.0, 16.0, 32.0 and 64.0, and index each keyword into the kw field whose boost is nearest to the one you want. For your example, that could be:
kw05=, kw10=, kw15=politics;politicians, kw20=obama;barack, kw40=liberal, kw80= .....
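
A minimal Python sketch of the bucketing step, assuming the boost levels listed above and a hypothetical naming scheme in which the field name is "kw" plus ten times the boost (so 1.5 becomes kw15). Ties go to the lower bucket:

```python
from collections import defaultdict

# Boost levels of the pre-defined multi-valued fields.
BUCKETS = [0.5, 1.0, 1.5, 2.0, 4.0, 8.0, 16.0, 32.0, 64.0]


def bucket_field(boost):
    """Return the name of the kw field whose boost is nearest to `boost`."""
    nearest = min(BUCKETS, key=lambda b: abs(b - boost))
    return f"kw{int(nearest * 10):02d}"


def bucket_keywords(keyword_boosts):
    """Group keywords by the kw field they should be indexed into."""
    fields = defaultdict(list)
    for keyword, boost in keyword_boosts.items():
        fields[bucket_field(boost)].append(keyword)
    return dict(fields)


print(bucket_keywords({"liberal": 4.0, "politics": 1.5, "obama": 2.0}))
# {'kw40': ['liberal'], 'kw15': ['politics'], 'kw20': ['obama']}
```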

--
Jan Høydahl  - search architect
Cominvent AS - www.cominvent.com

On 10. feb. 2010, at 03.07, Yu-Shan Fung wrote:

> Hi All,
>
> I'm trying to create an index of documents in which each document is
> associated with a set of related keywords, each with an individual boost
> value that I compute externally.
>
> eg:
> Document Title: Democrats
>  related keywords:
>    liberal: 4.0
>    politics: 1.5
>    obama: 2.0
>    etc. (hundreds of related keywords)
>
> Since boosts in Solr are per field rather than per field instance, I am trying
> to get around this by creating a dynamic field for each related keyword and
> setting its boost accordingly. So that the document can be found by
> searching for the related keywords, I have the schema set up to copy these
> related keyword fields into the default text field.
>
> But when I query any of these related keywords, I get back fieldNorms with
> the max value:
>
>  1.5409492E10 = (MATCH) weight(text:liberal in 11), product of:
>    0.8608541 = queryWeight(text:liberal), product of:
>      1.6840147 = idf(docFreq=109, maxDocs=218)
>      0.51119155 = queryNorm
>    1.79002368E10 = (MATCH) fieldWeight(text:liberal in 11), product of:
>      1.4142135 = tf(termFreq(text:liberal)=2)
>      1.6840147 = idf(docFreq=109, maxDocs=218)
>
> According to this email exchange between Koji and Mat Brown,
>
> http://www.mail-archive.com/solr-user@.../msg23759.html
>
> The boost values from copyFields shouldn't be accumulated into the boost for
> the text field. Can anyone else verify this? It seems to go against what
> I'm observing. When I turn off copyField, the fieldNorm goes back to normal
> (in the single-digit range).
>
> Any idea what could be causing this? I'm running Solr 1.4 in case that
> matters.
>
> Any pointers/advice would be greatly appreciated! Thanks,
> Yu-Shan


Re: Copying dynamic fields into default text field messing up fieldNorm?

hossman
In reply to this post by Yu-shan Fung

: According to this email exchange between Koji and Mat Brown,
:
: http://www.mail-archive.com/solr-user@.../msg23759.html
:
: The boost values from copyFields shouldn't be accumulated into the boost for
: the text field. Can anyone else verify this? It seems to go against what

I believe Koji was mistaken. Looking at DocumentBuilder.toDocument, the
boosts have been propagated to copyField destinations since that method
was added in 2007 (initially it didn't deal with copyFields at all,
but once that was fixed it copied the boosts as well).
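
To see how that propagation could produce the huge fieldNorm reported in the original post, here is a simplified Python model. It assumes that the boosts of all values landing in the same destination field are multiplied together, and it ignores length normalization and the lossy encoding of norms; the function name is hypothetical:

```python
from math import prod


def destination_boost(source_boosts, doc_boost=1.0):
    """Model the boost accumulated on a copyField destination (assumed multiplicative)."""
    return doc_boost * prod(source_boosts)


# The three keywords from the example document.
example = {"liberal": 4.0, "politics": 1.5, "obama": 2.0}
print(destination_boost(example.values()))  # 12.0

# With hundreds of boosted keywords the product grows far beyond any sensible norm.
many = [2.0] * 200
print(destination_boost(many) > 7.5e9)  # True
```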


-Hoss


Re: Copying dynamic fields into default text field messing up fieldNorm?

Koji Sekiguchi
Chris Hostetter wrote:

> : According to this email exchange between Koji and Mat Brown,
> :
> : http://www.mail-archive.com/solr-user@.../msg23759.html
> :
> : The boost values from copyFields shouldn't be accumulated into the boost for
> : the text field. Can anyone else verify this? It seems to go against what
>
> I believe Koji was mistaken. Looking at DocumentBuilder.toDocument, the
> boosts have been propagated to copyField destinations since that method
> was added in 2007 (initially it didn't deal with copyFields at all,
> but once that was fixed it copied the boosts as well).
>
>  
Hmm, I didn't know that. Thanks for correcting me.
But is propagating the boost a good idea? What is the use case for it?

Koji

--
http://www.rondhuit.com/en/


Re: Copying dynamic fields into default text field messing up fieldNorm?

hossman

: > I believe Koji was mistaken. Looking at DocumentBuilder.toDocument, the
: > boosts have been propagated to copyField destinations since that method was
: > added in 2007 (initially it didn't deal with copyFields at all, but once
: > that was fixed it copied the boosts as well).
        ...
: Hmm, I didn't know that. Thanks for correcting me.
: But is propagating the boost a good idea? What is the use case for it?

No clue, to either question ... I have no opinion on whether or not it
makes sense, I'm just telling you what I see in the code.


-Hoss


Re: Copying dynamic fields into default text field messing up fieldNorm?

Yu-shan Fung
I'll take a stab. IMHO, it doesn't make much sense to propagate the boost,
and here's why:

For the typical use case, copyField is used to add other "searchable" fields
into the default "text" field for Standard queries. Say we are copying the
ModelNumber field into the text field, and we have a boost of 5.0 on the
ModelNumber field. That means any document with a ModelNumber value
would have the extra boost of 5.0 multiplied into the boost of the "text"
field, for ALL terms in "text", whereas documents with no ModelNumber would
get no such benefit, completely skewing the results.

This would only make sense if boosts were per field instance and not per
field, but we know that's not the case.
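
The skew can be illustrated with a small Python sketch. It uses the classic tf = sqrt(freq) and lengthNorm = 1/sqrt(numTerms) formulas and treats fieldNorm as boost times lengthNorm, ignoring the lossy norm encoding; the numbers (an idf of 1.5, 100 terms in "text") are made up for illustration:

```python
import math


def field_weight(term_freq, idf, num_terms, field_boost):
    """Simplified fieldWeight = tf * idf * fieldNorm for one term in one field."""
    tf = math.sqrt(term_freq)
    field_norm = field_boost * (1.0 / math.sqrt(num_terms))
    return tf * idf * field_norm


# Two documents with identical "text" content, matching a term unrelated to ModelNumber.
with_model_number = field_weight(term_freq=1, idf=1.5, num_terms=100, field_boost=5.0)
without_model_number = field_weight(term_freq=1, idf=1.5, num_terms=100, field_boost=1.0)

ratio = with_model_number / without_model_number
print(f"document with ModelNumber scores {ratio:.1f}x higher on the same term")
```

Under these assumptions the document that happens to have a ModelNumber outscores the other by the full boost factor, even though the matched term has nothing to do with ModelNumber.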

Am I making sense?
Yu-Shan


On Tue, Feb 16, 2010 at 10:54 PM, Chris Hostetter
<[hidden email]> wrote:

>
> : > I believe Koji was mistaken. Looking at DocumentBuilder.toDocument, the
> : > boosts have been propagated to copyField destinations since that method was
> : > added in 2007 (initially it didn't deal with copyFields at all, but once
> : > that was fixed it copied the boosts as well).
>        ...
> : Hmm, I didn't know that. Thanks for correcting me.
> : But is propagating the boost a good idea? What is the use case for it?
>
> No clue, to either question ... I have no opinion on whether or not it
> makes sense, I'm just telling you what I see in the code.
>
>
> -Hoss
>
>


--
“When nothing seems to help, I go look at a stonecutter hammering away at
his rock perhaps a hundred times without as much as a crack showing in it.
Yet at the hundred and first blow it will split in two, and I know it was
not that blow that did it, but all that had gone before.” — Jacob Riis