Deprecated setBoost method


baris.kazar
Hi,-

I saw this in the Field class docs and am trying to figure out the following
note:

setBoost(float boost)
Deprecated.
Index-time boosts are deprecated, please index index-time scoring
factors into a doc value field and combine them with the score at query
time using eg. FunctionScoreQuery.

I appreciate this note. Is there an example of this? I wish the docs
gave a simple example to help further.

https://lucene.apache.org/core/6_6_0//core/org/apache/lucene/document/Field.html

vs

https://lucene.apache.org/core/7_7_2/core/org/apache/lucene/document/Field.html

Best regards



---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]


Re: Index-time boosting: Deprecated setBoost method

baris.kazar
It looks like index-time (field) boosting is no longer possible as of Lucene
version 7.7.2. I previously used BoostQuery at search time for boosting
in another case, and this now seems to be the only boosting option in Lucene.

Best regards


On 10/18/19 10:01 AM, [hidden email] wrote:


RE: Index-time boosting: Deprecated setBoost method

Uwe Schindler
Hi,

That's not true. You can do index-time boosting, but you need to do it using a separate field. You just index a numeric docvalues field (which may contain a long or float value per document). Later, you wrap your query with a FunctionScoreQuery (e.g., using the JavaScript function syntax of the expressions module). This lets you compile a JavaScript function that calculates the final score from the score returned by the inner query, combined with the docvalues that were indexed per document.
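Here is a minimal sketch of this approach. It assumes Lucene 7.x with the lucene-core, lucene-queries and lucene-expressions modules on the classpath. The field names (`id`, `body`, `popularity`) and the formula `_score * ln(2 + popularity)` are made up for illustration.

The sketch writes the per-document factor into a `NumericDocValuesField` with an `IndexWriter`. At search time it wraps the query in a `FunctionScoreQuery`, whose values come from a compiled expression:

```java
import java.nio.file.Files;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.NumericDocValuesField;
import org.apache.lucene.document.StringField;
import org.apache.lucene.document.TextField;
import org.apache.lucene.expressions.Expression;
import org.apache.lucene.expressions.SimpleBindings;
import org.apache.lucene.expressions.js.JavascriptCompiler;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.Term;
import org.apache.lucene.queries.function.FunctionScoreQuery;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.SortField;
import org.apache.lucene.search.TermQuery;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;

public class DocValuesBoostExample {

  // Index time: keep the per-document scoring factor in a numeric doc values
  // field instead of calling the removed Field.setBoost().
  static void addDoc(IndexWriter writer, String id, String body, long popularity)
      throws Exception {
    Document doc = new Document();
    doc.add(new StringField("id", id, Field.Store.YES));
    doc.add(new TextField("body", body, Field.Store.NO));
    doc.add(new NumericDocValuesField("popularity", popularity)); // hypothetical factor
    writer.addDocument(doc);
  }

  // Builds a two-document index and returns the stored ids of the hits, best first.
  static String[] search(boolean applyPopularity) throws Exception {
    try (Directory dir = FSDirectory.open(Files.createTempDirectory("docvalues-boost"))) {
      try (IndexWriter writer = new IndexWriter(dir, new IndexWriterConfig(new StandardAnalyzer()))) {
        // Identical text, so the inner query gives both documents the same score.
        addDoc(writer, "a", "lucene boosting", 1L);
        addDoc(writer, "b", "lucene boosting", 100L);
      }
      try (DirectoryReader reader = DirectoryReader.open(dir)) {
        IndexSearcher searcher = new IndexSearcher(reader);
        Query query = new TermQuery(new Term("body", "lucene"));
        if (applyPopularity) {
          // Query time: combine the relevance score with the indexed factor.
          Expression expr = JavascriptCompiler.compile("_score * ln(2 + popularity)");
          SimpleBindings bindings = new SimpleBindings();
          bindings.add(new SortField("_score", SortField.Type.SCORE));
          bindings.add(new SortField("popularity", SortField.Type.LONG));
          query = new FunctionScoreQuery(query, expr.getDoubleValuesSource(bindings));
        }
        TopDocs top = searcher.search(query, 10);
        String[] ids = new String[top.scoreDocs.length];
        for (int i = 0; i < ids.length; i++) {
          ids[i] = searcher.doc(top.scoreDocs[i].doc).get("id");
        }
        return ids;
      }
    }
  }

  public static void main(String[] args) throws Exception {
    // Without the function the scores tie and document order decides.
    System.out.println("plain:   " + String.join(",", search(false)));
    // With it, the document with the larger popularity value should rank first.
    System.out.println("boosted: " + String.join(",", search(true)));
  }
}
```

Both documents contain the same text, so without the function their scores tie. With the function, document `b` (popularity 100) should rank ahead of `a` (popularity 1). The same wrapping works around any query, including one produced by a query parser.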

Uwe

-----
Uwe Schindler
Achterdiek 19, D-28357 Bremen
https://www.thetaphi.de
eMail: [hidden email]

> -----Original Message-----
> From: [hidden email] <[hidden email]>
> Sent: Friday, October 18, 2019 5:28 PM
> To: [hidden email]
> Cc: [hidden email]
> Subject: Re: Index-time boosting: Deprecated setBoost method

Re: Index-time boosting: Deprecated setBoost method

baris.kazar
Uwe,-

Thanks very much for the reply.

Is there a working example for this? Is it mentioned in the Lucene
Javadocs or any other docs so that I can look at it?


This methodology seems to discourage using index-time boosting.

The previous setBoost method call was fine and easy to use.

Did it have performance issues, and is that why it was deprecated?


It would also be nice to see FunctionScoreQuery used with MultiFieldQueryParser,
since MultiFieldQueryParser already takes a boosts map in its constructor.

Maybe it is not needed with MultiFieldQueryParser.


Best regards


On 10/18/19 1:28 PM, Uwe Schindler wrote:


RE: Index-time boosting: Deprecated setBoost method

Uwe Schindler
Hi,

> Is there a working example for this? Is it mentioned in the Lucene
> Javadocs or any other docs so that I can look at it?

To index the docvalues, see NumericDocValuesField (it can be added to documents like indexed or stored fields). You may have used them for sorting already.

> This methodology seems to discourage using index-time boosting.

Not really. Many people use this all the time; it's one of the killer features of both Solr and Elasticsearch. The problem was how the old setBoost() worked (it did not work correctly, see below).

> The previous setBoost method call was fine and easy to use.
> Did it have performance issues, and is that why it was deprecated?

No, it was deprecated for several reasons. First, setBoost was not doing what users expected. Internally, the boost value was just multiplied into the document's norm factor (which is internally also a docvalues field). Norm factors are very imprecise floats stored in a single byte, so precision is poor. If you put a boost into it and the length norm was already consuming all the bits, the boosting was very coarse. Second, the boost could only be multiplied in. Most users want to do things like record click counts in the index and then boost by, for example, the logarithm of that count or some other function. If the boost is just multiplied into the length norm, you have no flexibility at all.
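To make the precision problem concrete, here is a self-contained plain-Java sketch with no Lucene dependency. As we understand it, it reimplements the one-byte float encoding that Lucene 6.x used for norms (`SmallFloat.floatToByte315` / `byte315ToFloat`). The sketch applies that encoding to a norm of `boost / sqrt(fieldLength)`, which we assume is how the classic and BM25 similarities computed it. The class and method names are our own; treat this as an illustration rather than Lucene's code:

```java
public class NormPrecisionDemo {

  // Mirrors SmallFloat.floatToByte315: keep the sign, the exponent and the top
  // mantissa bits of the float and drop the rest (truncation, not rounding).
  static byte floatToByte315(float f) {
    int bits = Float.floatToRawIntBits(f);
    int smallfloat = bits >> (24 - 3);
    int fzero = (63 - 15) << 3;
    if (smallfloat <= fzero) {
      // Zero and negative values map to 0; positive underflow maps to 1.
      return (bits <= 0) ? (byte) 0 : (byte) 1;
    }
    if (smallfloat >= fzero + 0x100) {
      return (byte) -1; // overflow maps to the largest encodable value
    }
    return (byte) (smallfloat - fzero);
  }

  // Mirrors SmallFloat.byte315ToFloat.
  static float byte315ToFloat(byte b) {
    if (b == 0) {
      return 0.0f;
    }
    int bits = (b & 0xff) << (24 - 3);
    bits += (63 - 15) << 24;
    return Float.intBitsToFloat(bits);
  }

  // Assumed 6.x-style norm: boost / sqrt(fieldLength), squeezed into one byte.
  static float storedNorm(float boost, int fieldLength) {
    return byte315ToFloat(floatToByte315(boost / (float) Math.sqrt(fieldLength)));
  }

  public static void main(String[] args) {
    for (float boost : new float[] {1.0f, 1.1f, 1.2f, 1.3f, 2.0f}) {
      System.out.printf("boost %.1f -> stored norm %s%n", boost, storedNorm(boost, 4));
    }
  }
}
```

For a four-term field, boosts of 1.0, 1.1 and 1.2 all come back as the same stored norm of 0.5. Only 1.3 moves it, to 0.625, which shows the coarseness described above.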

In addition, you can have several docvalues fields and use their values in a function (e.g., one field with the click count and another with the product price). You can then combine click count and price (each of which can be modified independently during index updates) so that products with a lower price and a higher click count are boosted up.

This is what you can do with the expressions module. You just give it a function.
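The click-count/price idea could look roughly like the sketch below, again assuming Lucene 7.x (core, queries, expressions). The field names `clicks` and `price`, the document ids, and the formula `_score * ln(2 + clicks) / sqrt(1 + price)` are illustrative choices, not anything from the thread.

The sketch also uses `IndexWriter.updateNumericDocValue` to change one document's click count in place. That is one way the factors can be modified independently of the indexed text:

```java
import java.nio.file.Files;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.NumericDocValuesField;
import org.apache.lucene.document.StringField;
import org.apache.lucene.document.TextField;
import org.apache.lucene.expressions.Expression;
import org.apache.lucene.expressions.SimpleBindings;
import org.apache.lucene.expressions.js.JavascriptCompiler;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.Term;
import org.apache.lucene.queries.function.FunctionScoreQuery;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.SortField;
import org.apache.lucene.search.TermQuery;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;

public class ClicksAndPriceExample {

  static void addProduct(IndexWriter writer, String id, long clicks, long price) throws Exception {
    Document doc = new Document();
    doc.add(new StringField("id", id, Field.Store.YES));
    doc.add(new TextField("body", "red shoes", Field.Store.NO));
    doc.add(new NumericDocValuesField("clicks", clicks)); // hypothetical field
    doc.add(new NumericDocValuesField("price", price));   // hypothetical field
    writer.addDocument(doc);
  }

  // Opens a near-real-time reader so that doc values updates are visible.
  static String topHit(IndexWriter writer, Query query) throws Exception {
    try (DirectoryReader reader = DirectoryReader.open(writer)) {
      IndexSearcher searcher = new IndexSearcher(reader);
      TopDocs top = searcher.search(query, 1);
      return searcher.doc(top.scoreDocs[0].doc).get("id");
    }
  }

  // Returns the top hit before and after the click count of "a" is updated.
  static String[] run() throws Exception {
    // More clicks raise the score; a higher price lowers it.
    Expression expr = JavascriptCompiler.compile("_score * ln(2 + clicks) / sqrt(1 + price)");
    SimpleBindings bindings = new SimpleBindings();
    bindings.add(new SortField("_score", SortField.Type.SCORE));
    bindings.add(new SortField("clicks", SortField.Type.LONG));
    bindings.add(new SortField("price", SortField.Type.LONG));
    Query query = new FunctionScoreQuery(
        new TermQuery(new Term("body", "shoes")), expr.getDoubleValuesSource(bindings));

    try (Directory dir = FSDirectory.open(Files.createTempDirectory("clicks-price"));
         IndexWriter writer = new IndexWriter(dir, new IndexWriterConfig(new StandardAnalyzer()))) {
      addProduct(writer, "a", 10, 100); // expensive
      addProduct(writer, "b", 10, 5);   // cheap, same clicks
      String before = topHit(writer, query);
      // Change one factor in place, without re-adding the document.
      writer.updateNumericDocValue(new Term("id", "a"), "clicks", 100_000L);
      String after = topHit(writer, query);
      return new String[] {before, after};
    }
  }

  public static void main(String[] args) throws Exception {
    String[] result = run();
    System.out.println("before update: " + result[0] + ", after update: " + result[1]);
  }
}
```

With equal text and equal clicks, the cheaper product `b` should win at first. Once the click count of `a` is raised, `a` should overtake it without being re-indexed.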

Here is an example; the second snippet uses a FunctionScoreQuery that modifies the score based on the function and the given docvalues:
https://lucene.apache.org/core/7_7_2/expressions/org/apache/lucene/expressions/Expression.html

> It would also be nice to see FunctionScoreQuery used with MultiFieldQueryParser,
> since MultiFieldQueryParser already takes a boosts map in its constructor.

The boosts in the query parser are applied per field at query time (to give each field a different weight). Index-time boosting is per document, so you can combine both.

> Maybe it is not needed with MultiFieldQueryParser.

You use MultiFieldQueryParser to adjust the weights of the fields (e.g., title versus body). The parsed query is then wrapped with an expression that modifies the score per document according to the docvalues.
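Here is a sketch of combining the two kinds of boosts. It assumes Lucene 7.x with the queryparser module in addition to core, queries and expressions. Per-field weights go into the boosts map of `MultiFieldQueryParser`. The parsed query is then wrapped in a `FunctionScoreQuery` that folds in a per-document `popularity` doc value. The fields, weights and formula are hypothetical:

```java
import java.nio.file.Files;
import java.util.HashMap;
import java.util.Map;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.NumericDocValuesField;
import org.apache.lucene.document.StringField;
import org.apache.lucene.document.TextField;
import org.apache.lucene.expressions.Expression;
import org.apache.lucene.expressions.SimpleBindings;
import org.apache.lucene.expressions.js.JavascriptCompiler;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.queries.function.FunctionScoreQuery;
import org.apache.lucene.queryparser.classic.MultiFieldQueryParser;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.SortField;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;

public class MultiFieldBoostExample {

  static void addDoc(IndexWriter writer, String id, String title, String body, long popularity)
      throws Exception {
    Document doc = new Document();
    doc.add(new StringField("id", id, Field.Store.YES));
    doc.add(new TextField("title", title, Field.Store.NO));
    doc.add(new TextField("body", body, Field.Store.NO));
    doc.add(new NumericDocValuesField("popularity", popularity));
    writer.addDocument(doc);
  }

  // Returns the id of the best hit for "lucene", with or without the per-document factor.
  static String topHit(boolean applyPopularity) throws Exception {
    try (Directory dir = FSDirectory.open(Files.createTempDirectory("multifield-boost"))) {
      try (IndexWriter writer = new IndexWriter(dir, new IndexWriterConfig(new StandardAnalyzer()))) {
        addDoc(writer, "t", "lucene", "other words", 1L);       // match in the title
        addDoc(writer, "d", "other words", "lucene", 100_000L); // match in the body, very popular
      }
      // Query time, per field: title matches weigh five times as much as body matches.
      Map<String, Float> fieldBoosts = new HashMap<>();
      fieldBoosts.put("title", 5.0f);
      fieldBoosts.put("body", 1.0f);
      MultiFieldQueryParser parser = new MultiFieldQueryParser(
          new String[] {"title", "body"}, new StandardAnalyzer(), fieldBoosts);
      Query query = parser.parse("lucene");

      if (applyPopularity) {
        // Per document: wrap the parsed query so the indexed factor is folded in.
        Expression expr = JavascriptCompiler.compile("_score * ln(2 + popularity)");
        SimpleBindings bindings = new SimpleBindings();
        bindings.add(new SortField("_score", SortField.Type.SCORE));
        bindings.add(new SortField("popularity", SortField.Type.LONG));
        query = new FunctionScoreQuery(query, expr.getDoubleValuesSource(bindings));
      }
      try (DirectoryReader reader = DirectoryReader.open(dir)) {
        IndexSearcher searcher = new IndexSearcher(reader);
        TopDocs top = searcher.search(query, 1);
        return searcher.doc(top.scoreDocs[0].doc).get("id");
      }
    }
  }

  public static void main(String[] args) throws Exception {
    System.out.println("field boosts only:       " + topHit(false));
    System.out.println("field boosts + function: " + topHit(true));
  }
}
```

With field boosts alone, the title match `t` should rank first. Once the popularity factor is applied, the much more popular body match `d` should overtake it.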

Uwe


Re: Index-time boosting: Deprecated setBoost method

baris.kazar
Uwe,-

Thanks. If possible, I am looking for a pure Java way to do
index-time boosting.

This example looks like a search-time boosting example:

https://urldefense.proofpoint.com/v2/url?u=https-3A__lucene.apache.org_core_7-5F7-5F2_expressions_org_apache_lucene_expressions_Expression.html&d=DwIFaQ&c=RoP1YumCXCgaWHvlZYR8PZh8Bv7qIrMUB65eapI_JnE&r=nlG5z5NcNdIbQAiX-BKNeyLlULCbaezrgocEvPhQkl4&m=6m6i5zZXPZNP6DyVv_xG4vXnVTPEdfKLeLSvGjEXbyw&s=B5_kGwRIbAoGqL0-SVR9r3t78E5XUuzLT37TeyV-bv8&e= 


Best regards

On 10/18/19 2:31 PM, Uwe Schindler wrote:


Re: Index-time boosting: Deprecated setBoost method

Uwe Schindler
Sorry, I was imprecise. It's a mix of both. The factors are stored per document in the index (this is why I called it index-time). At query time, the expression uses those indexed values and folds them into the score.

What's your problem with that approach?

Uwe

On October 18, 2019 6:50:40 PM UTC, [hidden email] wrote:

>Uwe,-
>
> Thanks, if possible i am looking for a pure Java methodology to do the
>
>index time boosting.
>
>This example looks like a search time boosting example:
>
>https://urldefense.proofpoint.com/v2/url?u=https-3A__lucene.apache.org_core_7-5F7-5F2_expressions_org_apache_lucene_expressions_Expression.html&d=DwIFaQ&c=RoP1YumCXCgaWHvlZYR8PZh8Bv7qIrMUB65eapI_JnE&r=nlG5z5NcNdIbQAiX-BKNeyLlULCbaezrgocEvPhQkl4&m=6m6i5zZXPZNP6DyVv_xG4vXnVTPEdfKLeLSvGjEXbyw&s=B5_kGwRIbAoGqL0-SVR9r3t78E5XUuzLT37TeyV-bv8&e=
>
>
>
>Best regards
>
>On 10/18/19 2:31 PM, Uwe Schindler wrote:
>> Hi,
>>
>>> Is there a working example for this? Is this mentioned in the Lucene
>>> Javadocs or any other docs so that i can look it?
>> To index the docvalues, see NumericDocValuesField (it can be added to
>documents like indexed or stored fields). You may have used them for
>sorting already.
>>
>>> this methodology seems sort of like discouraging using index time
>boosting.
>> Not really. Many use this all the time. It's one of the killer
>features of both Solr and Elasticsearch. The problem was how the
>Document.setBoost()worked (it did not work correctly, see below).
>>
>>> Previous setBoost method call was fine and easy to use.
>>> Did it have some performance issues and then is that why it was
>deprecated?
>> No the reason for deprecating this was for several reasons: setBoost
>was not doing what the user had expected. Internally the boost value
>was just multiplied into the document norm factor (which is internally
>also a docvalues field). The norm factors are only very inprecise
>floats stored in a byte, so precision is not well. If you put some
>values into it and the length norm was already consuming all bits, the
>boosting was very coarse. It was also only multiplied into and most
>users want to do some stuff like record click counts in the index and
>then boost for example with the logarithm or some other function. If
>the boost is just multiplied into the length norm you have no
>flexibility at all.
>>
>> In addition you can have several docvalues fields and use their
>values in a function (e.g. one field with click count and another one
>with product price). After that you can combine click count and price
>(which can be modified indipenently during index updates) and change
>boost to boost lower price and higher click count up.
>>
>> This is what you can do with the expressions module. You just give it
>a function.
>>
>> Here is an example, the second example is using a FunctionScoreQuery
>that modifies the score based on the function and the given docvalues:
>>
>https://urldefense.proofpoint.com/v2/url?u=https-3A__lucene.apache.org_core_7-5F7-5F2_expressions_org_apache_lucene_expressions_Expression.html&d=DwIFaQ&c=RoP1YumCXCgaWHvlZYR8PZh8Bv7qIrMUB65eapI_JnE&r=nlG5z5NcNdIbQAiX-BKNeyLlULCbaezrgocEvPhQkl4&m=6m6i5zZXPZNP6DyVv_xG4vXnVTPEdfKLeLSvGjEXbyw&s=B5_kGwRIbAoGqL0-SVR9r3t78E5XUuzLT37TeyV-bv8&e=
>>
>>> FunctionScoreQuery usage with MultiFieldQueryParser would also be
>nice
>>> where
>>>
>>> MultiFieldQuery already has boosts field to do this in its
>constructor.
>> The boots in the query parser are applied for fields during query
>time (to have a different weight per field). Index time boosting is per
>document. So you can combine both.
>>
>>> Maybe it is not needed with MultiFieldQueryParser.
>> You use MultiFieldQueryParser to adjust weights of the fields (e.g.
>title versus body). The parsed query is then wrapped with an expression
>that modifies the score per document according to the docvalues.
>>
>> Uwe
>>
>>> On 10/18/19 1:28 PM, Uwe Schindler wrote:
>>>
>>>> Hi,
>>>>
>>>> that's not true. You can do index time boosting, but you need to do
>that
>>> using a separate field. You just index a numeric docvalues field
>(which may
>>> contain a long or float value per document). Later you wrap your
>query with
>>> some FunctionScoreQuery (e.g., use the Javascript function query
>syntax in
>>> the expressions module). This allows you to compile a javascript
>function
>>> that calculated the final score based on the score returned by the
>inner query
>>> and combines them with docvalues that were indexed per document.
>>>> Uwe
>>>>
>>>> -----
>>>> Uwe Schindler
>>>> Achterdiek 19, D-28357 Bremen
>>>> https://urldefense.proofpoint.com/v2/url?u=https-
>>> 3A__www.thetaphi.de&d=DwIFaQ&c=RoP1YumCXCgaWHvlZYR8PZh8Bv7qIr
>>> MUB65eapI_JnE&r=nlG5z5NcNdIbQAiX-
>>> BKNeyLlULCbaezrgocEvPhQkl4&m=6rVk8db2H8dAcjS3WCWmAPd08C7JQCvZ
>>> 8W80yE9L5xY&s=zgKmnmP9gLG4DlEnAfDdtBMEzPXtHNVYojxXIKEnQgs&e=
>>>> eMail: [hidden email]
>>>>
>>>>> -----Original Message-----
>>>>> From: [hidden email] <[hidden email]>
>>>>> Sent: Friday, October 18, 2019 5:28 PM
>>>>> To: [hidden email]
>>>>> Cc: [hidden email]
>>>>> Subject: Re: Index-time boosting: Deprecated setBoost method
>>>>>
>>>>> It looks like index-time boosting (field) is not possible since
>Lucene
>>>>> version 7.7.2 and
>>>>>
>>>>> i was using before for another case the BoostQuery at search time
>for
>>>>> boosting and
>>>>>
>>>>> this seems to be the only boosting option now in Lucene.
>>>>>
>>>>> Best regards
>>>>>
>>>>>
>>>>> On 10/18/19 10:01 AM, [hidden email] wrote:
>>>>>> Hi,-
>>>>>>
>>>>>> i saw this in the Field class docs and i am figuring out the
>following
>>>>>> note in the docs:
>>>>>>
>>>>>> setBoost(float boost)
>>>>>> Deprecated.
>>>>>> Index-time boosts are deprecated, please index index-time scoring
>>>>>> factors into a doc value field and combine them with the score at
>>>>>> query time using eg. FunctionScoreQuery.
>>>>>>
>>>>>> I appreciate this note. Is there an example about this? I wish
>docs
>>>>>> would give a simple example to further help.
>>>>>>
>>>>>>
>>>>> https://urldefense.proofpoint.com/v2/url?u=https-
>>> 3A__lucene.apache.org_core_6-5F6-
>>> 5F0__core_org_apache_lucene_document_&d=DwIFaQ&c=RoP1YumCXCga
>>> WHvlZYR8PZh8Bv7qIrMUB65eapI_JnE&r=nlG5z5NcNdIbQAiX-
>>> BKNeyLlULCbaezrgocEvPhQkl4&m=6rVk8db2H8dAcjS3WCWmAPd08C7JQCvZ
>>> 8W80yE9L5xY&s=rIVbw3_TGEwpaet5ibCeYze6vSDUiPhwOzlV0z484fM&e=
>>>>> Field.html
>>>>>> vs
>>>>>>
>>>>>>
>>>>> https://urldefense.proofpoint.com/v2/url?u=https-
>>> 3A__lucene.apache.org_core_7-5F7-
>>> 5F2_core_org_apache_lucene_document_F&d=DwIFaQ&c=RoP1YumCXCgaW
>>> HvlZYR8PZh8Bv7qIrMUB65eapI_JnE&r=nlG5z5NcNdIbQAiX-
>>> BKNeyLlULCbaezrgocEvPhQkl4&m=6rVk8db2H8dAcjS3WCWmAPd08C7JQCvZ
>>> 8W80yE9L5xY&s=yt1toHHZQBqd3qKpWeSzywGJhy928Q5qaEO4v9Lj3vg&e=
>>>>> ield.html
>>>>>> Best regards
>>>>>>
>>>>>>
>>>>>
>---------------------------------------------------------------------
>>>>> To unsubscribe, e-mail: [hidden email]
>>>>> For additional commands, e-mail: [hidden email]
>>>>
>---------------------------------------------------------------------
>>>> To unsubscribe, e-mail: [hidden email]
>>>> For additional commands, e-mail: [hidden email]
>>>>
>>>
>---------------------------------------------------------------------
>>> To unsubscribe, e-mail: [hidden email]
>>> For additional commands, e-mail: [hidden email]
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: [hidden email]
>> For additional commands, e-mail: [hidden email]
>>
>
>---------------------------------------------------------------------
>To unsubscribe, e-mail: [hidden email]
>For additional commands, e-mail: [hidden email]

--
Uwe Schindler
Achterdiek 19, 28357 Bremen
https://www.thetaphi.de
Reply | Threaded
Open this post in threaded view
|

Re: Index-time boosting: Deprecated setBoost method

baris.kazar
Uwe,-

Two questions:

I guess this is applicable to TextField, too?

And I was expecting an IndexWriter object in the example for index-time
boosting.

Best regards


On 10/18/19 2:57 PM, Uwe Schindler wrote:

> Sorry I was imprecise. It's a mix of both. The factors are stored per document in index (this is why I called it index time). During query time the expression use the index time values to fold them into the query boost at query time.
>
> What's your problem with that approach?
>
> Uwe
> --
> Uwe Schindler
> Achterdiek 19, 28357 Bremen
> https://www.thetaphi.de

Re: Index-time boosting: Deprecated setBoost method

Uwe Schindler
Hi,

Read my original email! The index time values are written using NumericDocValuesField. The expressions docs also refer to that when the bindings are documented.

It's separate from the indexed data (TextField). Think of it like an additional numeric field in your database table with a factor in each row.
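Regarding where the IndexWriter comes in: below is a minimal sketch of the index-time half, assuming the Lucene 7.x API discussed in this thread with lucene-core on the classpath. The class name, the field names `id`, `body` and `boost`, and the temporary index directory are made up for illustration. The scoring factor is simply one more field on the document, written by the same IndexWriter as the TextField; the StoredField is optional and only useful if you also want to read the value back from search results.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.NumericDocValuesField;
import org.apache.lucene.document.StoredField;
import org.apache.lucene.document.StringField;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;

public class IndexBoostFactors {

  // Hypothetical document layout: searchable text plus a per-document scoring factor.
  static Document newDoc(String id, String body, long boost) {
    Document doc = new Document();
    doc.add(new StringField("id", id, Field.Store.YES));    // exact-match key, stored
    doc.add(new TextField("body", body, Field.Store.YES));  // analyzed text, indexed as usual
    doc.add(new NumericDocValuesField("boost", boost));     // the "extra column", read at query time
    doc.add(new StoredField("boost", boost));               // optional: only to retrieve the value
    return doc;
  }

  // Index time: the factor is written by the same IndexWriter as everything else.
  static int indexAndCount(Path indexPath) throws IOException {
    try (Directory dir = FSDirectory.open(indexPath);
         IndexWriter writer = new IndexWriter(dir, new IndexWriterConfig(new StandardAnalyzer()))) {
      writer.addDocument(newDoc("1", "index time boosting in lucene", 2L));
      writer.addDocument(newDoc("2", "query time boosting in lucene", 1L));
      writer.commit();
      try (DirectoryReader reader = DirectoryReader.open(dir)) {
        return reader.numDocs();
      }
    }
  }

  public static void main(String[] args) throws IOException {
    Path path = Files.createTempDirectory("boost-index");
    System.out.println(indexAndCount(path) + " documents indexed");
  }
}
```

Nothing about scoring happens here yet; the value only influences ranking once a query at search time reads the `boost` doc values.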

Uwe

On October 18, 2019 7:14:03 PM UTC, [hidden email] wrote:

>Uwe,-
>
>Two questions there:
>
>I guess this is applicable to TextField, too.
>
>And I was expecting an IndexWriter object in the example for index-time
>boosting.
>
>Best regards

--
Uwe Schindler
Achterdiek 19, 28357 Bremen
https://www.thetaphi.de
Re: Index-time boosting: Deprecated setBoost method

baris.kazar
Uwe,-

Can the example in the Expression Javadocs that you linked
https://lucene.apache.org/core/7_7_2/expressions/org/apache/lucene/expressions/Expression.html
be extended with the NumericDocValuesField part that has to be done at
indexing time, too?

I see now why you called this a mix of both (i.e., index-time and
search-time boosting).

I then need to include the query from this example, built on the _score
variable (and a doc-values field that I would call _boost in my case), in
my overall BooleanQuery.
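A sketch of what that could look like, assuming Lucene 7.x with the lucene-queries and lucene-expressions modules on the classpath: the whole BooleanQuery is wrapped once, and the expression `_score * _boost` multiplies its relevance score by a per-document value from a NumericDocValuesField named `_boost`. The class name, field names, documents and the 2.0 weight on field1 are hypothetical. Both documents contain identical text, so any difference in ranking comes from the doc value alone.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.text.ParseException;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.NumericDocValuesField;
import org.apache.lucene.document.StringField;
import org.apache.lucene.document.TextField;
import org.apache.lucene.expressions.Expression;
import org.apache.lucene.expressions.SimpleBindings;
import org.apache.lucene.expressions.js.JavascriptCompiler;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.Term;
import org.apache.lucene.queries.function.FunctionScoreQuery;
import org.apache.lucene.search.BooleanClause;
import org.apache.lucene.search.BooleanQuery;
import org.apache.lucene.search.BoostQuery;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.SortField;
import org.apache.lucene.search.TermQuery;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;

public class BooleanQueryWithDocBoost {

  // Wraps any query so that its score is multiplied by the per-document "_boost" doc value.
  static Query withDocBoost(Query inner) throws ParseException {
    Expression expr = JavascriptCompiler.compile("_score * _boost");
    SimpleBindings bindings = new SimpleBindings();
    bindings.add(new SortField("_score", SortField.Type.SCORE)); // relevance score of inner
    bindings.add(new SortField("_boost", SortField.Type.LONG));  // the NumericDocValuesField
    return new FunctionScoreQuery(inner, expr.getDoubleValuesSource(bindings));
  }

  static Document doc(String id, long boost) {
    Document d = new Document();
    d.add(new StringField("id", id, Field.Store.YES));
    d.add(new TextField("field1", "lucene boosting", Field.Store.NO));
    d.add(new TextField("field2", "lucene boosting", Field.Store.NO));
    d.add(new NumericDocValuesField("_boost", boost));
    return d;
  }

  // Returns the id of the best-scoring document.
  static String topId() throws IOException, ParseException {
    Path path = Files.createTempDirectory("bq-boost");
    try (Directory dir = FSDirectory.open(path)) {
      try (IndexWriter w = new IndexWriter(dir, new IndexWriterConfig(new StandardAnalyzer()))) {
        w.addDocument(doc("low", 1L));  // indexed first, so it would win a tie on doc id
        w.addDocument(doc("high", 2L));
      } // closing the writer commits by default
      try (DirectoryReader reader = DirectoryReader.open(dir)) {
        IndexSearcher searcher = new IndexSearcher(reader);
        // per-field weights stay inside the BooleanQuery (field1 counts twice)
        Query overall = new BooleanQuery.Builder()
            .add(new BoostQuery(new TermQuery(new Term("field1", "lucene")), 2.0f), BooleanClause.Occur.SHOULD)
            .add(new TermQuery(new Term("field2", "lucene")), BooleanClause.Occur.SHOULD)
            .build();
        TopDocs hits = searcher.search(withDocBoost(overall), 10);
        return searcher.doc(hits.scoreDocs[0].doc).get("id");
      }
    }
  }

  public static void main(String[] args) throws Exception {
    System.out.println(topId());
  }
}
```

The per-field weight and the per-document factor are independent here: the BoostQuery acts inside the BooleanQuery, and the doc-value factor is applied once on top. Documents without a `_boost` value will most likely end up with a score of 0 under this expression, so giving every document a value (e.g. 1) is the safer choice.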

I will now try to combine these pieces and post the result here for future reference.

Best regards


On 10/18/19 3:18 PM, Uwe Schindler wrote:

> Hi,
>
> Read my original email! The index time values are written using NumericDocValuesField. The expressions docs also refer to that when the bindings are documented.
>
> It's separate from the indexed data (TextField). Think of it like an additional numeric field in your database table with a factor in each row.
>
> Uwe
> --
> Uwe Schindler
> Achterdiek 19, 28357 Bremen
> https://www.thetaphi.de

Re: Index-time boosting: Deprecated setBoost method

baris.kazar
Hi,-

I would like to ask the following to make it clearer (for me at least). With the deprecated API, per-field index-time boosts were set like this:

Document doc = new Document();

Field f1 = new TextField("field1", "string1", Field.Store.YES);
doc.add(f1);
f1.setBoost(2.0f);

Field f2 = new TextField("field2", "string2", Field.Store.YES);
doc.add(f2);
f2.setBoost(1.0f);




With the new approach this turns into the following, where the _boost1
field is associated with field1 and the _boost2 field with field2.

In the indexing code:

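import org.apache.lucene.document.Field;
import org.apache.lucene.document.NumericDocValuesField;
import org.apache.lucene.document.TextField;

Field f1 = new TextField("field1", "string1", Field.Store.YES);
doc.add(f1);
// per-document boost for field1, kept in its own doc-values field
Field _boost1 = new NumericDocValuesField("_boost1", 2L);
doc.add(_boost1);

// If this boost value needs to be stored, a separate StoredField
// instance needs to be added as well
… (I will post this soon)

Field f2 = new TextField("field2", "string2", Field.Store.YES);
doc.add(f2);
// per-document boost for field2
Field _boost2 = new NumericDocValuesField("_boost2", 1L);
doc.add(_boost2);

// If this boost value needs to be stored, a separate StoredField
// instance needs to be added as well
… (I will post this soon)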


Now, in the searching code (i.e., at query time), do I still need the
FunctionScoreQuery, given that in this case the boost is just a constant
value rather than a function? Then again, a constant value can be seen as a
function that always returns the same value.


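import org.apache.lucene.expressions.Expression;
import org.apache.lucene.expressions.SimpleBindings;
import org.apache.lucene.expressions.js.JavascriptCompiler;
import org.apache.lucene.index.Term;
import org.apache.lucene.queries.function.FunctionScoreQuery;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.SortField;
import org.apache.lucene.search.TermQuery;

// multiply the relevance score of the wrapped query by the per-document _boost1 value
Expression expr = JavascriptCompiler.compile("_score * _boost1");

// SimpleBindings just maps variables to SortField instances
SimpleBindings bindings = new SimpleBindings();
bindings.add(new SortField("_score", SortField.Type.SCORE));  // score of the wrapped query
bindings.add(new SortField("_boost1", SortField.Type.LONG));  // the NumericDocValuesField

// create a query that matches based on field1:term_to_look_for but
// scores using expr
Query query = new FunctionScoreQuery(
    new TermQuery(new Term("field1", "term_to_look_for")),
    expr.getDoubleValuesSource(bindings));

searcher.search(query, 10);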


So, if the boost is a single constant value, do we need the JavaScript
expression part above?
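One way to explore this, sketched under assumptions: with Lucene 7.3 or later, `FunctionScoreQuery.boostByValue` multiplies a query's score by a `DoubleValuesSource`, and `DoubleValuesSource.fromLongField` reads a NumericDocValuesField directly, so a constant per-document boost may not need the JavaScript expression at all. Wrapping each field's clause with its own boost field (`_boost1` for field1, `_boost2` for field2) mirrors the per-field setBoost calls above, although it is not bit-for-bit equivalent to the old norm-based encoding. The class name, documents and the temporary directory are hypothetical.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.NumericDocValuesField;
import org.apache.lucene.document.StringField;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.Term;
import org.apache.lucene.queries.function.FunctionScoreQuery;
import org.apache.lucene.search.BooleanClause;
import org.apache.lucene.search.BooleanQuery;
import org.apache.lucene.search.DoubleValuesSource;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.TermQuery;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;

public class PerFieldDocBoost {

  // Same text in both fields; only the per-field boost values differ between documents.
  static Document doc(String id, long boost1, long boost2) {
    Document d = new Document();
    d.add(new StringField("id", id, Field.Store.YES));
    d.add(new TextField("field1", "lucene boosting", Field.Store.NO));
    d.add(new NumericDocValuesField("_boost1", boost1));
    d.add(new TextField("field2", "lucene boosting", Field.Store.NO));
    d.add(new NumericDocValuesField("_boost2", boost2));
    return d;
  }

  // Multiplies the score of one field's clause by that field's doc value; no expression needed.
  static Query boostedClause(String field, String term, String boostField) {
    Query inner = new TermQuery(new Term(field, term));
    return FunctionScoreQuery.boostByValue(inner, DoubleValuesSource.fromLongField(boostField));
  }

  static String topId() throws IOException {
    Path path = Files.createTempDirectory("per-field-boost");
    try (Directory dir = FSDirectory.open(path)) {
      try (IndexWriter w = new IndexWriter(dir, new IndexWriterConfig(new StandardAnalyzer()))) {
        w.addDocument(doc("low", 1L, 1L));   // indexed first, so it would win a tie
        w.addDocument(doc("high", 2L, 1L));  // field1 counts twice as much for this document
      }
      try (DirectoryReader reader = DirectoryReader.open(dir)) {
        IndexSearcher searcher = new IndexSearcher(reader);
        Query query = new BooleanQuery.Builder()
            .add(boostedClause("field1", "lucene", "_boost1"), BooleanClause.Occur.SHOULD)
            .add(boostedClause("field2", "lucene", "_boost2"), BooleanClause.Occur.SHOULD)
            .build();
        TopDocs hits = searcher.search(query, 10);
        return searcher.doc(hits.scoreDocs[0].doc).get("id");
      }
    }
  }

  public static void main(String[] args) throws IOException {
    System.out.println(topId());
  }
}
```

According to its Javadoc, boostByValue should leave a document's score unchanged when the document has no value in the boost field, which makes it more forgiving than a multiplying expression when only some documents carry a boost.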

Best regards


On 10/18/19 4:07 PM, [hidden email] wrote:

> Uwe,-
>
> Can the example in the Expression Javadocs that you linked
> https://lucene.apache.org/core/7_7_2/expressions/org/apache/lucene/expressions/Expression.html
> be extended with the NumericDocValuesField part that has to be done at
> indexing time, too?
>
> I see now why you called this a mix of both (i.e., index-time and
> search-time boosting).
>
> I then need to include the query from this example, built on the _score
> variable (and a doc-values field that I would call _boost in my case), in
> my overall BooleanQuery.
>
> I will now try to combine these pieces and post the result here for future reference.
>
> Best regards
>
>
> On 10/18/19 3:18 PM, Uwe Schindler wrote:
>> Hi,
>>
>> Read my original email! The index time values are written using
>> NumericDocValuesField. The expressions docs also refer to that when
>> the bindings are documented.
>>
>> It's separate from the indexed data (TextField). Think of it like an
>> additional numeric field in your database table with a factor in each
>> row.
>>
>> Uwe
>>
>> Am October 18, 2019 7:14:03 PM UTC schrieb [hidden email]:
>>> Uwe,-
>>>
>>> Two questions there:
>>>
>>> i guess this is applicable to TextField, too.
>>>
>>> And i was expecting a index writer object in the example for index time
>>>
>>> boosting.
>>>
>>> Best regards
>>>
>>>
>>> On 10/18/19 2:57 PM, Uwe Schindler wrote:
>>>> Sorry I was imprecise. It's a mix of both. The factors are stored per
>>> document in index (this is why I called it index time). During query
>>> time the expression use the index time values to fold them into the
>>> query boost at query time.
>>>> What's your problem with that approach?
>>>>
>>>> Uwe
>>>>
>>>> Am October 18, 2019 6:50:40 PM UTC schrieb [hidden email]:
>>>>> Uwe,-
>>>>>
>>>>>    Thanks, if possible i am looking for a pure Java methodology to do
>>> the
>>>>> index time boosting.
>>>>>
>>>>> This example looks like a search time boosting example:
>>>>>
>>>>>
>>> https://urldefense.proofpoint.com/v2/url?u=https-3A__lucene.apache.org_core_7-5F7-5F2_expressions_org_apache_lucene_expressions_Expression.html&d=DwIFaQ&c=RoP1YumCXCgaWHvlZYR8PZh8Bv7qIrMUB65eapI_JnE&r=nlG5z5NcNdIbQAiX-BKNeyLlULCbaezrgocEvPhQkl4&m=6m6i5zZXPZNP6DyVv_xG4vXnVTPEdfKLeLSvGjEXbyw&s=B5_kGwRIbAoGqL0-SVR9r3t78E5XUuzLT37TeyV-bv8&e= 
>>>
>>>>>
>>>>>
>>>>> Best regards
>>>>>
>>>>> On 10/18/19 2:31 PM, Uwe Schindler wrote:
>>>>>> Hi,
>>>>>>
>>>>>>> Is there a working example for this? Is this mentioned in the Lucene
>>>>>>> Javadocs or any other docs so that I can look it up?
>>>>>> To index the docvalues, see NumericDocValuesField (it can be added to
>>>>>> documents like indexed or stored fields). You may have used them for
>>>>>> sorting already.
>>>>>>> This methodology seems sort of like discouraging the use of index time
>>>>>>> boosting.
>>>>>> Not really. Many use this all the time. It's one of the killer
>>>>>> features of both Solr and Elasticsearch. The problem was how
>>>>>> Document.setBoost() worked (it did not work correctly, see below).
>>>>>>> The previous setBoost method call was fine and easy to use.
>>>>>>> Did it have some performance issues, and is that why it was
>>>>>>> deprecated?
>>>>>> No, it was deprecated for several reasons: setBoost was not doing
>>>>>> what the user expected. Internally the boost value was just
>>>>>> multiplied into the document norm factor (which is internally also a
>>>>>> docvalues field). The norm factors are only very imprecise floats
>>>>>> stored in a byte, so precision is poor. If you put some values into
>>>>>> it and the length norm was already consuming all bits, the boosting
>>>>>> was very coarse. It was also only multiplied in, while most users
>>>>>> want to do things like record click counts in the index and then
>>>>>> boost, for example, with the logarithm or some other function. If the
>>>>>> boost is just multiplied into the length norm you have no flexibility
>>>>>> at all.
>>>>>> In addition, you can have several docvalues fields and use their
>>>>>> values in a function (e.g. one field with click count and another one
>>>>>> with product price). You can then combine click count and price
>>>>>> (which can be modified independently during index updates) so that
>>>>>> documents with a lower price and a higher click count are boosted up.
>>>>>> This is what you can do with the expressions module. You just give it
>>>>>> a function.
>>>>>> Here is an example; the second example uses a FunctionScoreQuery
>>>>>> that modifies the score based on the function and the given
>>>>>> docvalues:
>>>>>> https://lucene.apache.org/core/7_7_2/expressions/org/apache/lucene/expressions/Expression.html
>>>>>>
>>>>>>> FunctionScoreQuery usage with MultiFieldQueryParser would also be
>>>>>>> nice, where MultiFieldQueryParser already has a boosts parameter to
>>>>>>> do this in its constructor.
>>>>>> The boosts in the query parser are applied to fields at query time
>>>>>> (to give each field a different weight). Index time boosting is per
>>>>>> document. So you can combine both.
>>>>>>> Maybe it is not needed with MultiFieldQueryParser.
>>>>>> You use MultiFieldQueryParser to adjust the weights of the fields (e.g.
>>>>>> title versus body). The parsed query is then wrapped with an expression
>>>>>> that modifies the score per document according to the docvalues.
>>>>>> Uwe
>>>>>>
>>>>>>> On 10/18/19 1:28 PM, Uwe Schindler wrote:
>>>>>>>
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> that's not true. You can do index time boosting, but you need to do
>>>>>>>> that using a separate field. You just index a numeric docvalues
>>>>>>>> field (which may contain a long or float value per document). Later
>>>>>>>> you wrap your query with some FunctionScoreQuery (e.g., use the
>>>>>>>> Javascript function query syntax in the expressions module). This
>>>>>>>> allows you to compile a Javascript function that calculates the
>>>>>>>> final score based on the score returned by the inner query and
>>>>>>>> combines it with docvalues that were indexed per document.
>>>>>>>> Uwe
>>>>>>>>
>>>>>>>> -----
>>>>>>>> Uwe Schindler
>>>>>>>> Achterdiek 19, D-28357 Bremen
>>>>>>>> https://www.thetaphi.de
>>>>>>>> eMail: [hidden email]
>>>>>>>>
>>>>>>>>> -----Original Message-----
>>>>>>>>> From: [hidden email] <[hidden email]>
>>>>>>>>> Sent: Friday, October 18, 2019 5:28 PM
>>>>>>>>> To: [hidden email]
>>>>>>>>> Cc: [hidden email]
>>>>>>>>> Subject: Re: Index-time boosting: Deprecated setBoost method
>>>> --
>>>> Uwe Schindler
>>>> Achterdiek 19, 28357 Bremen
>>>> https://www.thetaphi.de
>> --
>> Uwe Schindler
>> Achterdiek 19, 28357 Bremen
>> https://www.thetaphi.de
>>
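To make Uwe's point about norm precision concrete, here is a small stdlib-only sketch. It assumes the pre-7.0 behaviour he describes, where the index-time boost was multiplied into the length norm (modelled here as `boost / sqrt(fieldLength)`, roughly what the classic TF-IDF similarity did) and the result was kept with only 3 mantissa bits. It is a simplified model (plain truncation, no exponent clamping), not Lucene's actual `SmallFloat` code; `CoarseNormDemo` and its methods are hypothetical names.

```java
/**
 * Simplified model of pre-7.0 index-time boosting: the boost is folded into the
 * length norm and the product keeps only 3 of the float's 23 mantissa bits.
 * Assumption: this mimics the precision of a one-byte norm; it is not Lucene code.
 */
public final class CoarseNormDemo {

  private CoarseNormDemo() {}

  /** Keeps sign, exponent and the top 3 mantissa bits; truncates the rest. */
  static float keepThreeMantissaBits(float f) {
    int bits = Float.floatToIntBits(f);
    return Float.intBitsToFloat(bits & 0xFFF00000);
  }

  /** Old-style norm: boost multiplied into 1/sqrt(length), then quantized. */
  static float coarseNorm(float boost, int fieldLength) {
    return keepThreeMantissaBits(boost / (float) Math.sqrt(fieldLength));
  }

  public static void main(String[] args) {
    for (float boost : new float[] {1.0f, 1.1f, 1.2f, 2.0f}) {
      System.out.println("boost " + boost + " -> norm " + coarseNorm(boost, 4));
    }
    // boost 1.0 -> norm 0.5
    // boost 1.1 -> norm 0.5      (indistinguishable from no boost at all)
    // boost 1.2 -> norm 0.5625
    // boost 2.0 -> norm 1.0
  }
}
```

A boost of 1.1 disappears entirely, which is the kind of coarseness that motivated moving scoring factors into separate doc values fields.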


Re: Index-time boosting: Deprecated setBoost method

baris.kazar
Hi,-

Sorry about the missing parts in the previous post; please accept my
apologies for that.

I need to add a few more questions/corrections/additions to the
previous post:

The main question was: if the boost is a single constant value, do we need
the Javascript part below?



=== Indexing code snippet for Lucene version 6.6.0 and before ===

Document doc = new Document();

Field f1 = new TextField("field1", "string1", Field.Store.YES);
doc.add(f1);
f1.setBoost(2.0f);

Field f2 = new TextField("field2", "string2", Field.Store.YES);
doc.add(f2);
f2.setBoost(1.0f);

=== end of indexing code snippet for Lucene version 6.6.0 and before ===


This turns into the following, where the _boost1 field is associated with
field1 and the _boost2 field is associated with field2:


In indexing code:

=== beginning of indexing code snippet ===
Field f1 = new TextField("field1", "string1", Field.Store.YES);
doc.add(f1);

// the doc values field name must match the variable bound at query time
Field _boost1 = new NumericDocValuesField("_boost1", 2L);
doc.add(_boost1);

// If this boost value needs to be stored, a separate StoredField
// instance needs to be added as well
// … (I will post this soon)

Field f2 = new TextField("field2", "string2", Field.Store.YES);
doc.add(f2);

Field _boost2 = new NumericDocValuesField("_boost2", 1L);
doc.add(_boost2);

// If this boost value needs to be stored, a separate StoredField
// instance needs to be added as well
// … (I will post this soon)

=== end of indexing code snippet ===
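The indexing side can be sketched end to end with an IndexWriter as follows. This is a minimal sketch assuming Lucene 8.x/9.x jars on the classpath, an in-memory `ByteBuffersDirectory`, and the `_boost1` name from the snippet above; `BoostIndexingSketch` is a hypothetical class name. It indexes one document with a per-document doc values boost, optionally adds a `StoredField` under the same name (one common way to make the value retrievable), and reads the doc value back the way a query would at search time.

```java
import java.io.IOException;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.NumericDocValuesField;
import org.apache.lucene.document.StoredField;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.MultiDocValues;
import org.apache.lucene.index.NumericDocValues;
import org.apache.lucene.store.ByteBuffersDirectory;
import org.apache.lucene.store.Directory;

public class BoostIndexingSketch {

  /** Indexes one document whose per-document boost lives in the "_boost1" doc values field. */
  static Directory indexOneDocument() throws IOException {
    Directory dir = new ByteBuffersDirectory();
    IndexWriterConfig config = new IndexWriterConfig(new StandardAnalyzer());
    try (IndexWriter writer = new IndexWriter(dir, config)) {
      Document doc = new Document();
      doc.add(new TextField("field1", "string1", Field.Store.YES));
      // Column-oriented value that FunctionScoreQuery/expressions read at query time.
      doc.add(new NumericDocValuesField("_boost1", 2L));
      // Doc values are not returned with hits; a StoredField of the same name
      // is only needed if the value should also be retrievable.
      doc.add(new StoredField("_boost1", 2L));
      writer.addDocument(doc);
    } // closing the writer commits the document
    return dir;
  }

  /** Reads the doc value of document 0 back, as a scoring query would. */
  static long readBoost(Directory dir) throws IOException {
    try (DirectoryReader reader = DirectoryReader.open(dir)) {
      NumericDocValues values = MultiDocValues.getNumericValues(reader, "_boost1");
      return values.advanceExact(0) ? values.longValue() : -1L;
    }
  }

  public static void main(String[] args) throws IOException {
    System.out.println(readBoost(indexOneDocument())); // prints 2
  }
}
```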


Now, in the searching code (i.e., at query time), do I need the
FunctionScoreQuery, given that in this case

the boost is just a constant value and not a function? However, a constant
value can be argued to be a function that returns the same value all the time, too.


=== beginning of query time code snippet ===
Expression expr = JavascriptCompiler.compile("_boost1 + _boost2");

// SimpleBindings just maps variables to SortField instances
SimpleBindings bindings = new SimpleBindings();

// These have to be LONG type, I think, since NumericDocValuesField accepts
// "long" type only, am I right? Can this be DOUBLE type?
bindings.add(new SortField("_boost1", SortField.Type.LONG));

// same question here
bindings.add(new SortField("_boost2", SortField.Type.LONG));

// create a query that matches based on field1:term_to_look_for but
// scores using expr
Query query = new FunctionScoreQuery(
    new TermQuery(new Term("field1", "term_to_look_for")),
    expr.getDoubleValuesSource(bindings));

searcher.search(query, 10);

=== end of code snippet ===
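A complete, runnable version of this index-then-query flow is sketched below, assuming Lucene 8.x/9.x jars (core, queries, expressions) on the classpath; `DocValuesBoostExample`, the `id` field and the `_boost` field are hypothetical. Two points differ from the snippet above. First, the expression includes `_score`, so the per-document factor scales text relevance (much as the old setBoost multiplied into the score) instead of replacing it. Second, on the LONG vs. DOUBLE question: `NumericDocValuesField` only holds a long, but `FloatDocValuesField` (or `DoubleDocValuesField`) stores a floating-point value, which is read back with `DoubleValuesSource.fromFloatField` (or `fromDoubleField`).

```java
import java.io.IOException;
import java.text.ParseException;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.FloatDocValuesField;
import org.apache.lucene.document.StringField;
import org.apache.lucene.document.TextField;
import org.apache.lucene.expressions.Expression;
import org.apache.lucene.expressions.SimpleBindings;
import org.apache.lucene.expressions.js.JavascriptCompiler;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.Term;
import org.apache.lucene.queries.function.FunctionScoreQuery;
import org.apache.lucene.search.DoubleValuesSource;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.TermQuery;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.store.ByteBuffersDirectory;
import org.apache.lucene.store.Directory;

public class DocValuesBoostExample {

  /** Two documents with identical text; only their per-document "_boost" differs. */
  static Directory buildIndex() throws IOException {
    Directory dir = new ByteBuffersDirectory();
    try (IndexWriter writer = new IndexWriter(dir, new IndexWriterConfig(new StandardAnalyzer()))) {
      writer.addDocument(makeDoc("a", 1.0f));
      writer.addDocument(makeDoc("b", 2.0f));
    }
    return dir;
  }

  static Document makeDoc(String id, float boost) {
    Document doc = new Document();
    doc.add(new StringField("id", id, Field.Store.YES));
    doc.add(new TextField("field1", "string1", Field.Store.YES));
    // float-valued per-document factor, stored as doc values
    doc.add(new FloatDocValuesField("_boost", boost));
    return doc;
  }

  /** Runs field1:string1 and multiplies each hit's relevance score by its "_boost". */
  static String topHitId(Directory dir) throws IOException, ParseException {
    try (DirectoryReader reader = DirectoryReader.open(dir)) {
      IndexSearcher searcher = new IndexSearcher(reader);
      Expression expr = JavascriptCompiler.compile("_score * _boost");
      SimpleBindings bindings = new SimpleBindings();
      bindings.add("_score", DoubleValuesSource.SCORES);
      bindings.add("_boost", DoubleValuesSource.fromFloatField("_boost"));
      Query query = new FunctionScoreQuery(
          new TermQuery(new Term("field1", "string1")),
          expr.getDoubleValuesSource(bindings));
      TopDocs hits = searcher.search(query, 10);
      return searcher.doc(hits.scoreDocs[0].doc).get("id");
    }
  }

  public static void main(String[] args) throws Exception {
    // Both documents have the same text score, so the doc values factor decides.
    System.out.println(topHitId(buildIndex())); // prints b
  }
}
```

With an expression such as `_boost1 + _boost2` alone, every matching document would be ranked purely by its stored factors, and the text relevance of the match would be ignored.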


Best regards


On 10/21/19 11:05 AM, [hidden email] wrote:

> Hi,-
>
>  i would like to ask the following to make it clearer (for me at least):
>
> Document doc = new Document();
>
> Field f1 = new TextField("field1", "string1", Field.Store.YES);
> doc.add(f1);
> f1.setBoost(2.0f);
>
> Field f2 = new TextField("field2", "string2", Field.Store.YES);
> doc.add(f2);
> f2.setBoost(1.0f);
>
>
> This turns into this where _boost1 field is associated with field1 and
>
> _boost2 field is associated with field2 field:
>
>
> In Indexing code:
>
> Field f1 = new TextField("field1", "string1", Field.Store.YES);
>
> Field _boost1 = new NumericDocValuesField("field1", 2L);
> doc.add(_boost1);
>
> // If this boost value needs to be stored, a separate storedField
> // instance needs to be added as well
> // … ( i will post this soon)
>
> Field _boost2 = new NumericDocValuesField("field2", 1L);
> doc.add(_boost2);
>
> // If this boost value needs to be stored, a separate storedField
> // instance needs to be added as well
> // … ( i will post this soon)
>
>
> Now, in the searching code (i.e., at query time) should i need the
> FunctionScoreQuery because in this case
>
> the boost is just a constant value but not a function? However,
> constant value can be argued to be a function with the same value all
> the time, too.
>
>
> Expression expr = JavascriptCompiler.compile("_boost");
>
> // SimpleBindings just maps variables to SortField instances
> SimpleBindings bindings = new SimpleBindings();
>
> bindings.add(new SortField("_boost1", SortField.Type.SCORE));
>
> // create a query that matches based on body:contents but
> // scores using expr
> Query query = new FunctionScoreQuery(
>     new TermQuery(new Term("field1", "term_to_look_for")),
>     expr.getDoubleValuesSource(bindings));
>
> searcher.search(query, 10);
>
>
> So, if boost is a single constant value, do we need the Javascript
> part above?
>
> Best regards
>
>
> On 10/18/19 4:07 PM, [hidden email] wrote:
>> Uwe,-
>>
>>  can the doc example that You also gave
>> https://lucene.apache.org/core/7_7_2/expressions/org/apache/lucene/expressions/Expression.html
>> be extended with the NumericDocValuesField part that needs to be done
>> at indexing time, too?
>>
>> I see now why You meant that this is a mixed type of boosting (i.e.,
>> both indexing time and search time).
>>
>> I then need to include the query mentioned in this example on the
>> _score field (I would call it the _boost field in my case) in my
>> overall BooleanQuery.
>>
>> I will now try to combine these together and post here for future help.
>>
>> Best regards
>>
>>
>> On 10/18/19 3:18 PM, Uwe Schindler wrote:
>>> Hi,
>>>
>>> Read my original email! The index time values are written using
>>> NumericDocValuesField. The expressions docs also refer to that when
>>> the bindings are documented.
>>>
>>> It's separate from the indexed data (TextField). Think of it like an
>>> additional numeric field in your database table with a factor in
>>> each row.
>>>
>>> Uwe
>>>
>>> On October 18, 2019 7:14:03 PM UTC, [hidden email] wrote:
>>>> Uwe,-
>>>>
>>>> Two questions there:
>>>>
>>>> I guess this is applicable to TextField, too.
>>>>
>>>> And I was expecting an index writer object in the example for index
>>>> time boosting.
>>>>
>>>> Best regards
>>>>
>>>>
>>>> On 10/18/19 2:57 PM, Uwe Schindler wrote:
>>>>> Sorry, I was imprecise. It's a mix of both. The factors are stored per
>>>>> document in the index (this is why I called it index time). At query
>>>>> time the expression uses the index time values and folds them into
>>>>> the score.
>>>>> What's your problem with that approach?
>>>>>
>>>>> Uwe
>>>>>
>

RE: Index-time boosting: Deprecated setBoost method

Uwe Schindler
Hi,

Sorry, I don't fully understand what you intend to do. If the boost values per field are static and used with exactly the same value for every document, they are not needed at index time. You can just boost the field on the query side (e.g. using BoostQuery). Boosting every document with the same static values is an anti-pattern; that's something better suited for the query side, where you are more flexible.

If you need a different boost value per document, you can save that boost value in the index per document using a docvalues field (this consumes extra space, of course). Then you need an expression-based query on the query side (a FunctionScoreQuery using an Expression). Just because it looks like Javascript doesn't mean it's slow: the syntax is compiled to bytecode and directly included into the query execution as a dynamic Java class, so it's very fast.

In short:
- If you need a different boost factor per field name that's constant for all documents, apply it at query time with BoostQuery.
- If you have to boost specific documents (e.g., top selling products), index a numeric docvalues field per document. On the query side you can use different query types to modify the score of each result based on the docvalues field. That can be done with the expressions module (using compiled Javascript) or with another query in Lucene that operates on a ValueSource (e.g., FunctionQuery). The first one is easier to use for complex formulas.
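Both cases above can be sketched in a few lines, assuming Lucene 8.x/9.x jars (core, queries, expressions) on the classpath. The first method applies a constant per-field weight purely at query time, with no index change. The second wraps any query in a FunctionScoreQuery that scales each hit by a per-document doc values factor; `popularity` is a hypothetical `NumericDocValuesField`, and the formula `_score * (1 + ln(1 + popularity))` is only an illustration (the `1 +` keeps documents with zero popularity from dropping to a score of 0).

```java
import java.text.ParseException;

import org.apache.lucene.expressions.Expression;
import org.apache.lucene.expressions.SimpleBindings;
import org.apache.lucene.expressions.js.JavascriptCompiler;
import org.apache.lucene.index.Term;
import org.apache.lucene.queries.function.FunctionScoreQuery;
import org.apache.lucene.search.BooleanClause;
import org.apache.lucene.search.BooleanQuery;
import org.apache.lucene.search.BoostQuery;
import org.apache.lucene.search.DoubleValuesSource;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.TermQuery;

public class QueryTimeBoostSketch {

  /** Case 1: constant weight per field, applied only at query time. */
  static BooleanQuery fieldBoosted(String text) {
    return new BooleanQuery.Builder()
        // a match in field1 counts twice as much as a match in field2, for every document
        .add(new BoostQuery(new TermQuery(new Term("field1", text)), 2.0f), BooleanClause.Occur.SHOULD)
        .add(new TermQuery(new Term("field2", text)), BooleanClause.Occur.SHOULD)
        .build();
  }

  /** Case 2: additionally scale each hit by a per-document factor read from doc values. */
  static Query withPerDocumentBoost(Query inner) throws ParseException {
    Expression expr = JavascriptCompiler.compile("_score * (1 + ln(1 + popularity))");
    SimpleBindings bindings = new SimpleBindings();
    bindings.add("_score", DoubleValuesSource.SCORES);
    bindings.add("popularity", DoubleValuesSource.fromLongField("popularity"));
    return new FunctionScoreQuery(inner, expr.getDoubleValuesSource(bindings));
  }

  public static void main(String[] args) throws ParseException {
    System.out.println(withPerDocumentBoost(fieldBoosted("string1")));
  }
}
```

The same per-field weights can also come from `MultiFieldQueryParser`'s boosts map; the parsed query is then passed to the wrapping method in the same way.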

Uwe

-----
Uwe Schindler
Achterdiek 19, D-28357 Bremen
https://www.thetaphi.de
eMail: [hidden email]

> -----Original Message-----
> From: [hidden email] <[hidden email]>
> Sent: Monday, October 21, 2019 5:17 PM
> To: [hidden email]
> Cc: baris.kazar <[hidden email]>
> Subject: Re: Index-time boosting: Deprecated setBoost method
>
> Hi,-
>
> Sorry about the missing parts in previous post. please accept my
> apologies for that.
>
> i needed to add a few more questions/corrections/additions to the
> previous post:
>
> Main Question was: if boost is a single constant value, do we need the
> Javascript part below?
>
>
>
> === Indexing code snippet for Lucene version 6.6.0 and before===
>
> Document doc = new Document();
>
>
> 

Field  f1= new TextField("field1", "string1", Field.Store.YES);

>
> doc.add(f1); 
f1.setBoost(2.0f);


>
> Field f2 = new TextField("field2", "string2", Field.Store.YES);

>
> doc.add(f2);

>
> f2.setBoost(1.0f);


>
> === end of indexing code snippet for Lucene version 6.6.0 and before ===
>
>
> This turns into this where _boost1 field is associated with field1 and
>
> _boost2 field is associated with field2 field:
>
>
> In Indexing code:
>
> === begining of indexing code snippet ===
> Field  f1= new TextField("field1", "string1", Field.Store.YES);

>
> Field _boost1 = new NumericDocValuesField(“field1”, 2L);
> doc.add(_boost1);
>
> // If this boost value needs to be stored, a separate storedField
> instance needs to be added as well
> … ( i will post this soon)
>
> Field _boost2 = new NumericDocValuesField(“field2”, 1L);
> doc.add(_boost2);
>
> // If this boost value needs to be stored, a separate storedField
> instance needs to be added as well
> … ( i will post this soon)
>
> === end of indexing code snippet ===
>
>
> Now, in the searching code (i.e., at query time) should i need the
> FunctionScoreQuery because in this case
>
> the boost is just a constant value but not a function? However, constant
> value can be argued to be a function with the same value all the time, too.
>
>
> == begining of query time code snippet ===
> Expression expr = JavascriptCompiler.compile(“_boost1 + _boost2");
>
> 

// SimpleBindings just maps variables to SortField instances

>
> SimpleBindings bindings = new SimpleBindings();

>
> bindings.add(new SortField("_boost1", SortField.Type.LONG));
 
// These
> have to LONG type i think since NumericDocValuesField accepts "long"
> type only, am i right? Can this be DOUBLE type?
>
> bindings.add(new SortField("_boost2", SortField.Type.LONG));
 
// same
> question here
>
> // create a query that matches based on body:contents but

>
> // scores using expr

>
> Query query = new FunctionScoreQuery(

>
>      new TermQuery(new Term("field1", "term_to_look_for")),

>
> expr.getDoubleValuesSource(bindings));
>
> 
searcher.search(query, 10);
>
> === end of code snippet ===
>
>
> Best regards
>
>
> On 10/21/19 11:05 AM, [hidden email] wrote:
> > Hi,-
> >
> >  i would like to ask the following to make it clearer (for me at least):
> >
> > Document doc = new Document();
> >
> > 

Field  f1= new TextField("field1", "string1", Field.Store.YES);

> >
> > doc.add(f1); 
f1.setBoost(2.0f);


> >
> > Field f2 = new TextField("field2", "string2", Field.Store.YES);

> >
> > doc.add(f2);

> >
> > f2.setBoost(1.0f);


> >
> >
> > This turns into this where _boost1 field is associated with field1 and
> >
> > _boost2 field is associated with field2 field:
> >
> >
> > In Indexing code:
> >
> > Field  f1= new TextField("field1", "string1", Field.Store.YES);

> >
> > Field _boost1 = new NumericDocValuesField(“field1”, 2L);
> > doc.add(_boost1);
> >
> > // If this boost value needs to be stored, a separate storedField
> > instance needs to be added as well
> > … ( i will post this soon)
> >
> > Field _boost2 = new NumericDocValuesField(“field2”, 1L);
> > doc.add(_boost2);
> >
> > // If this boost value needs to be stored, a separate storedField
> > instance needs to be added as well
> > … ( i will post this soon)
> >
> >
> > Now, in the searching code (i.e., at query time) should i need the
> > FunctionScoreQuery because in this case
> >
> > the boost is just a constant value but not a function? However,
> > constant value can be argued to be a function with the same value all
> > the time, too.
> >
> >
> > Expression expr = JavascriptCompiler.compile(“_boost");
> >
> > 

// SimpleBindings just maps variables to SortField instances

> >
> > SimpleBindings bindings = new SimpleBindings();

> >
> > bindings.add(new SortField("_boost1", SortField.Type.SCORE));
 

> >
> > // create a query that matches based on body:contents but

> >
> > // scores using expr

> >
> > Query query = new FunctionScoreQuery(

> >
> >     new TermQuery(new Term("field1", "term_to_look_for")),

> >
> > expr.getDoubleValuesSource(bindings));
> >
> > 
searcher.search(query, 10);
> >
> >
> > So, if boost is a single constant value, do we need the Javascript
> > part above?
> >
> > Best regards
> >
> >
> > On 10/18/19 4:07 PM, [hidden email] wrote:
> >> Uwe,-
> >>
> >>  can this
> >> https://urldefense.proofpoint.com/v2/url?u=https-
> 3A__lucene.apache.org_core_7-5F7-
> 5F2_expressions_org_apache_lucene_expressions_Expression.html&d=DwID
> aQ&c=RoP1YumCXCgaWHvlZYR8PZh8Bv7qIrMUB65eapI_JnE&r=nlG5z5NcNdI
> bQAiX-
> BKNeyLlULCbaezrgocEvPhQkl4&m=MR2S9Z9HEge6s665mtGOFRHKGmuiVYkjp
> 4tXOciYl7A&s=tMCjb5H5KivfJsp-BfABonpjelgp6hn9cBg2GScCmic&e=
> >> doc example that You also gave be extended with NumericDocValuesField
> >> part that needs to be done at indexing time boosting, too?
> >>
> >> i see now why You meant that this is mixed type of boosting (i.e.,
> >> both indexing time and search time).
> >>
> >> I need then include this query mentioned in this example on these
> >> _score field (i would call it _boost field in my case) into my
> >> overall BooleanQuery.
> >>
> >> i will now try to combine these together and post here for future help.
> >>
> >> Best regards
> >>
> >>
> >> On 10/18/19 3:18 PM, Uwe Schindler wrote:
> >>> Hi,
> >>>
> >>> Read my original email! The index time values are written using
> >>> NumericDocValuesField. The expressions docs also refer to that when
> >>> the bindings are documented.
> >>>
> >>> It's separate from the indexed data (TextField). Think of it like an
> >>> additional numeric field in your database table with a factor in
> >>> each row.
> >>>
> >>> Uwe
> >>>
> >>> Am October 18, 2019 7:14:03 PM UTC schrieb [hidden email]:
> >>>> Uwe,-
> >>>>
> >>>> Two questions there:
> >>>>
> >>>> i guess this is applicable to TextField, too.
> >>>>
> >>>> And i was expecting a index writer object in the example for index
> >>>> time
> >>>>
> >>>> boosting.
> >>>>
> >>>> Best regards
> >>>>
> >>>>
> >>>> On 10/18/19 2:57 PM, Uwe Schindler wrote:
> >>>>> Sorry I was imprecise. It's a mix of both. The factors are stored per
> >>>> document in index (this is why I called it index time). During query
> >>>> time the expression use the index time values to fold them into the
> >>>> query boost at query time.
> >>>>> What's your problem with that approach?
> >>>>>
> >>>>> Uwe
> >>>>>
> >>>>> Am October 18, 2019 6:50:40 PM UTC schrieb [hidden email]:
> >>>>>> Uwe,-
> >>>>>>
> >>>>>> Thanks. If possible, I am looking for a pure Java approach to
> >>>>>> index-time boosting.
> >>>>>>
> >>>>>> This example looks like a search-time boosting example:
> >>>>>>
> >>>>>> https://lucene.apache.org/core/7_7_2/expressions/org/apache/lucene/expressions/Expression.html
> >>>>>>
> >>>>>> Best regards
> >>>>>>
> >>>>>> On 10/18/19 2:31 PM, Uwe Schindler wrote:
> >>>>>>> Hi,
> >>>>>>>
> >>>>>>>> Is there a working example for this? Is this mentioned in the
> >>>>>>>> Lucene Javadocs or any other docs so that I can look at it?
> >>>>>>> To index the docvalues, see NumericDocValuesField (it can be added
> >>>>>>> to documents like indexed or stored fields). You may have used them
> >>>>>>> for sorting already.
> >>>>>>>> This methodology seems to discourage using index-time boosting.
> >>>>>>> Not really. Many use this all the time. It's one of the killer
> >>>>>>> features of both Solr and Elasticsearch. The problem was how
> >>>>>>> Document.setBoost() worked (it did not work correctly, see below).
> >>>>>>>> The previous setBoost method call was fine and easy to use.
> >>>>>>>> Did it have some performance issues, and is that why it was
> >>>>>>>> deprecated?
> >>>>>>> No, it was deprecated for several reasons: setBoost was not doing
> >>>>>>> what users expected. Internally, the boost value was just multiplied
> >>>>>>> into the document's norm factor (which is internally also a
> >>>>>>> docvalues field). The norm factors are very imprecise floats stored
> >>>>>>> in a single byte, so precision is poor. If you put some values into
> >>>>>>> it and the length norm was already consuming all bits, the boosting
> >>>>>>> was very coarse. It could also only be multiplied in, whereas most
> >>>>>>> users want to do things like record click counts in the index and
> >>>>>>> then boost with, for example, the logarithm or some other function.
> >>>>>>> If the boost is just multiplied into the length norm, you have no
> >>>>>>> flexibility at all.
> >>>>>>> In addition, you can have several docvalues fields and use their
> >>>>>>> values in a function (e.g., one field with the click count and
> >>>>>>> another with the product price). You can then combine click count
> >>>>>>> and price (each of which can be modified independently during index
> >>>>>>> updates) and change the boost so that lower prices and higher click
> >>>>>>> counts are boosted up.
> >>>>>>> This is what you can do with the expressions module. You just give
> >>>>>>> it a function.
> >>>>>>> Here is an example; the second example uses a FunctionScoreQuery
> >>>>>>> that modifies the score based on the function and the given
> >>>>>>> docvalues:
> >>>>>>> https://lucene.apache.org/core/7_7_2/expressions/org/apache/lucene/expressions/Expression.html
> >>>>>>>> FunctionScoreQuery usage with MultiFieldQueryParser would also be
> >>>>>>>> nice, where MultiFieldQueryParser already has a boosts parameter
> >>>>>>>> in its constructor to do this.
> >>>>>>> The boosts in the query parser are applied to fields at query time
> >>>>>>> (to give each field a different weight). Index-time boosting is per
> >>>>>>> document. So you can combine both.
> >>>>>>>> Maybe it is not needed with MultiFieldQueryParser.
> >>>>>>> You use MultiFieldQueryParser to adjust the weights of the fields
> >>>>>>> (e.g., title versus body). The parsed query is then wrapped with an
> >>>>>>> expression that modifies the score per document according to the
> >>>>>>> docvalues.
> >>>>>>> Uwe
> >>>>>>>
> >>>>>>>> On 10/18/19 1:28 PM, Uwe Schindler wrote:
> >>>>>>>>
> >>>>>>>>> Hi,
> >>>>>>>>>
> >>>>>>>>> that's not true. You can do index-time boosting, but you need to do
> >>>>>>>>> it using a separate field. You just index a numeric docvalues field
> >>>>>>>>> (which may contain a long or float value per document). Later you
> >>>>>>>>> wrap your query with a FunctionScoreQuery (e.g., using the
> >>>>>>>>> JavaScript function query syntax in the expressions module). This
> >>>>>>>>> allows you to compile a JavaScript function that calculates the
> >>>>>>>>> final score from the score returned by the inner query combined
> >>>>>>>>> with docvalues that were indexed per document.
> >>>>>>>>> Uwe
> >>>>>>>>>
> >>>>>>>>> -----
> >>>>>>>>> Uwe Schindler
> >>>>>>>>> Achterdiek 19, D-28357 Bremen
> >>>>>>>>> https://www.thetaphi.de
> >>>>>>>>> eMail: [hidden email]
> >>>>>>>>>
> >>>>>>>>>> -----Original Message-----
> >>>>>>>>>> From: [hidden email] <[hidden email]>
> >>>>>>>>>> Sent: Friday, October 18, 2019 5:28 PM
> >>>>>>>>>> To: [hidden email]
> >>>>>>>>>> Cc: [hidden email]
> >>>>>>>>>> Subject: Re: Index-time boosting: Deprecated setBoost method
> >>>>>>>>>>
> >>>>>>>>>> It looks like index-time boosting (field) is not possible since
> >>>>>>>>>> Lucene version 7.7.2, and
> >>>>>>>>>>
> >>>>>>>>>> I was using before for another case the BoostQuery at search time
> >>>>>>>>>> for boosting, and
> >>>>>>>>>>
> >>>>>>>>>> this seems to be the only boosting option now in Lucene.
> >>>>>>>>>>
> >>>>>>>>>> Best regards
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>> On 10/18/19 10:01 AM, [hidden email] wrote:
> >>>>>>>>>>> Hi,-
> >>>>>>>>>>>
> >>>>>>>>>>> I saw this in the Field class docs and I am trying to figure out
> >>>>>>>>>>> the following note in the docs:
> >>>>>>>>>>>
> >>>>>>>>>>> setBoost(float boost)
> >>>>>>>>>>> Deprecated.
> >>>>>>>>>>> Index-time boosts are deprecated, please index index-time scoring
> >>>>>>>>>>> factors into a doc value field and combine them with the score at
> >>>>>>>>>>> query time using eg. FunctionScoreQuery.
> >>>>>>>>>>>
> >>>>>>>>>>> I appreciate this note. Is there an example about this? I wish the
> >>>>>>>>>>> docs would give a simple example to further help.
> >>>>>>>>>>>
> >>>>>>>>>>> https://lucene.apache.org/core/6_6_0//core/org/apache/lucene/document/Field.html
> >>>>>>>>>>>
> >>>>>>>>>>> vs
> >>>>>>>>>>>
> >>>>>>>>>>> https://lucene.apache.org/core/7_7_2/core/org/apache/lucene/document/Field.html
> >>>>>>>>>>>
> >>>>>>>>>>> Best regards
> >>>>> --
> >>>>> Uwe Schindler
> >>>>> Achterdiek 19, 28357 Bremen
> >>>>>
> >>>>> https://www.thetaphi.de
> >>> --
> >>> Uwe Schindler
> >>> Achterdiek 19, 28357 Bremen
> >>> https://www.thetaphi.de
> >>>

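The approach described in the quoted messages above (store a per-document factor in a numeric docvalues field at index time, then fold it into the score at query time) can be sketched end to end, including the IndexWriter side that was asked about. This is a minimal sketch under stated assumptions, not code from the thread: it assumes Lucene 8.x with the lucene-core and lucene-queries modules on the classpath (older versions keep StandardAnalyzer in lucene-analyzers-common), and the class name DocValuesBoostSketch, the field names id, body and boost, and the sample documents are all hypothetical. It uses FloatDocValuesField so the factor can be fractional, and FunctionScoreQuery.boostByValue, which multiplies the wrapped query's score by the per-document value.

```java
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.FloatDocValuesField;
import org.apache.lucene.document.StringField;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.Term;
import org.apache.lucene.queries.function.FunctionScoreQuery;
import org.apache.lucene.search.DoubleValuesSource;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.TermQuery;
import org.apache.lucene.store.ByteBuffersDirectory;
import org.apache.lucene.store.Directory;

import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

public class DocValuesBoostSketch {

    // Index-time side: each document carries its own boost factor in a
    // numeric docvalues field (hypothetical field name "boost").
    static void addDoc(IndexWriter writer, String id, String text, float boost) throws IOException {
        Document doc = new Document();
        doc.add(new StringField("id", id, Field.Store.YES));
        doc.add(new TextField("body", text, Field.Store.YES));
        doc.add(new FloatDocValuesField("boost", boost));
        writer.addDocument(doc);
    }

    // Returns document ids in ranking order for body:<term>, where each
    // document's relevance score is multiplied by its stored boost factor.
    public static List<String> rankedIds(String term) throws IOException {
        try (Directory dir = new ByteBuffersDirectory()) {
            try (IndexWriter writer = new IndexWriter(dir, new IndexWriterConfig(new StandardAnalyzer()))) {
                // Same text, so the plain relevance scores are equal and
                // only the per-document boost decides the order.
                addDoc(writer, "plain", "lucene boosting", 1.0f);
                addDoc(writer, "promoted", "lucene boosting", 3.0f);
            } // closing the writer commits the documents
            try (DirectoryReader reader = DirectoryReader.open(dir)) {
                IndexSearcher searcher = new IndexSearcher(reader);
                Query base = new TermQuery(new Term("body", term));
                // Query-time side: fold the index-time factor into the score.
                Query boosted = FunctionScoreQuery.boostByValue(
                        base, DoubleValuesSource.fromFloatField("boost"));
                List<String> ids = new ArrayList<>();
                for (ScoreDoc sd : searcher.search(boosted, 10).scoreDocs) {
                    ids.add(searcher.doc(sd.doc).get("id"));
                }
                return ids;
            }
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println(rankedIds("lucene")); // prints [promoted, plain]
    }
}
```

If whole-number factors are enough, the same pattern should work with NumericDocValuesField and DoubleValuesSource.fromLongField instead.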

Re: Index-time boosting: Deprecated setBoost method

baris.kazar
Hi,-

Thanks, I appreciate the discussion.

Let me ask this way instead; I think I gave too much information at once.

Currently I have this:

Field f1 = new TextField("field1", "string1", Field.Store.YES);
doc.add(f1);
f1.setBoost(2.0f);

Field f2 = new TextField("field2", "string2", Field.Store.YES);
doc.add(f2);
f2.setBoost(1.0f);

But this fails with Lucene 7.7.2 (Field.setBoost no longer exists there).

It is probably more efficient and more flexible to fix this by using
BoostQuery.

However, what would the fix be with index-time boosting? The code in my
previous post was trying to do that.

Best regards


On 10/21/19 12:34 PM, Uwe Schindler wrote:

> Hi,
>
> Sorry, I don't fully understand what you intend to do. If the boost values per field are static and used with exactly the same value for every document, they are not needed at index time. You can just boost the field on the query side (e.g., using BoostQuery). Boosting every document with the same static values is an anti-pattern; that's better suited to the query side, where you are more flexible.
>
> If you need a different boost value per document, you can save that boost value in the index per document using a docvalues field (this consumes extra space, of course). Then you need an expression-based query (e.g., a FunctionScoreQuery driven by an Expression) on the query side. Just because it looks like JavaScript doesn't mean it's slow: the syntax is compiled to bytecode and directly included in the query execution as a dynamic Java class, so it's very fast.
>
> In short:
> - If you need a different boost factor per field name that's constant for all documents, apply it at query time with BoostQuery.
> - If you have to boost specific documents (e.g., top-selling products), index a numeric docvalues field per document. On the query side you can use different query types to modify the score of each result based on the docvalues field. That can be done with the expressions module (using compiled JavaScript) or by another query in Lucene that operates on ValueSource (e.g., FunctionQuery). The first one is easier to use for complex formulas.
>
> Uwe
>
> -----
> Uwe Schindler
> Achterdiek 19, D-28357 Bremen
> https://www.thetaphi.de
> eMail: [hidden email]
>
>> -----Original Message-----
>> From: [hidden email] <[hidden email]>
>> Sent: Monday, October 21, 2019 5:17 PM
>> To: [hidden email]
>> Cc: baris.kazar <[hidden email]>
>> Subject: Re: Index-time boosting: Deprecated setBoost method
>>
>> Hi,-
>>
>> Sorry about the missing parts in my previous post. Please accept my
>> apologies for that.
>>
>> I needed to add a few more questions/corrections/additions to the
>> previous post:
>>
>> Main question: if the boost is a single constant value, do we need the
>> JavaScript part below?
>>
>>
>>
>> === Indexing code snippet for Lucene version 6.6.0 and before ===
>>
>> Document doc = new Document();
>>
>> Field f1 = new TextField("field1", "string1", Field.Store.YES);
>> doc.add(f1);
>> f1.setBoost(2.0f);
>>
>> Field f2 = new TextField("field2", "string2", Field.Store.YES);
>> doc.add(f2);
>> f2.setBoost(1.0f);
>>
>> === end of indexing code snippet for Lucene version 6.6.0 and before ===
>>
>>
>> This turns into the following, where the _boost1 field is associated with
>> field1 and the _boost2 field with field2:
>>
>>
>> In indexing code:
>>
>> === beginning of indexing code snippet ===
>> Field f1 = new TextField("field1", "string1", Field.Store.YES);
>> doc.add(f1);
>>
>> Field _boost1 = new NumericDocValuesField("_boost1", 2L);
>> doc.add(_boost1);
>>
>> // If this boost value needs to be stored, a separate StoredField
>> // instance needs to be added as well
>> … (I will post this soon)
>>
>> Field _boost2 = new NumericDocValuesField("_boost2", 1L);
>> doc.add(_boost2);
>>
>> // If this boost value needs to be stored, a separate StoredField
>> // instance needs to be added as well
>> … (I will post this soon)
>>
>> === end of indexing code snippet ===
>>
>>
>> Now, in the searching code (i.e., at query time), do I need the
>> FunctionScoreQuery, given that in this case the boost is just a constant
>> value and not a function? However, a constant value can be argued to be a
>> function that always returns the same value, too.
>>
>>
>> === beginning of query time code snippet ===
>> Expression expr = JavascriptCompiler.compile("_boost1 + _boost2");
>>
>> // SimpleBindings just maps variables to SortField instances
>> SimpleBindings bindings = new SimpleBindings();
>>
>> // These have to be of LONG type, I think, since NumericDocValuesField
>> // accepts "long" values only; am I right? Can this be DOUBLE type?
>> bindings.add(new SortField("_boost1", SortField.Type.LONG));
>> // same question here
>> bindings.add(new SortField("_boost2", SortField.Type.LONG));
>>
>> // create a query that matches based on field1:term_to_look_for but
>> // scores using expr
>> Query query = new FunctionScoreQuery(
>>     new TermQuery(new Term("field1", "term_to_look_for")),
>>     expr.getDoubleValuesSource(bindings));
>>
>> searcher.search(query, 10);
>>
>> === end of code snippet ===
>>
>>
>> Best regards
>>
>>
>> On 10/21/19 11:05 AM, [hidden email] wrote:
>>> Hi,-
>>>
>>> I would like to ask the following to make it clearer (for me at least):
>>>
>>> Document doc = new Document();
>>>
>>> Field f1 = new TextField("field1", "string1", Field.Store.YES);
>>> doc.add(f1);
>>> f1.setBoost(2.0f);
>>>
>>> Field f2 = new TextField("field2", "string2", Field.Store.YES);
>>> doc.add(f2);
>>> f2.setBoost(1.0f);
>>>
>>> This turns into the following, where the _boost1 field is associated
>>> with field1 and the _boost2 field with field2:
>>>
>>> In indexing code:
>>>
>>> Field f1 = new TextField("field1", "string1", Field.Store.YES);
>>> doc.add(f1);
>>>
>>> Field _boost1 = new NumericDocValuesField("_boost1", 2L);
>>> doc.add(_boost1);
>>>
>>> // If this boost value needs to be stored, a separate StoredField
>>> // instance needs to be added as well
>>> … (I will post this soon)
>>>
>>> Field _boost2 = new NumericDocValuesField("_boost2", 1L);
>>> doc.add(_boost2);
>>>
>>> // If this boost value needs to be stored, a separate StoredField
>>> // instance needs to be added as well
>>> … (I will post this soon)
>>>
>>> Now, in the searching code (i.e., at query time), do I need the
>>> FunctionScoreQuery, given that in this case the boost is just a constant
>>> value and not a function? However, a constant value can be argued to be
>>> a function that always returns the same value, too.
>>>
>>>
>>> Expression expr = JavascriptCompiler.compile("_boost1");
>>>
>>> // SimpleBindings just maps variables to SortField instances
>>> SimpleBindings bindings = new SimpleBindings();
>>> bindings.add(new SortField("_boost1", SortField.Type.LONG));
>>>
>>> // create a query that matches based on field1:term_to_look_for but
>>> // scores using expr
>>> Query query = new FunctionScoreQuery(
>>>     new TermQuery(new Term("field1", "term_to_look_for")),
>>>     expr.getDoubleValuesSource(bindings));
>>>
>>> searcher.search(query, 10);
>>>
>>>
>>> So, if the boost is a single constant value, do we need the JavaScript
>>> part above?
>>>
>>> Best regards

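One detail in the expression snippets quoted above is worth making explicit: FunctionScoreQuery scores each match with the value of its DoubleValuesSource, so an expression such as "_boost1 + _boost2" that never references _score replaces the relevance score of the wrapped TermQuery instead of adjusting it. The sketch below is one way to keep the relevance score and scale it by a stored factor. It assumes Lucene 8.x with lucene-core, lucene-queries and lucene-expressions on the classpath; the class name ExpressionBoostSketch, the id field and the sample values are hypothetical. It stores the factor with DoubleDocValuesField, which is one way to handle the LONG-versus-DOUBLE question, and binds variables with SimpleBindings.add(String, DoubleValuesSource) rather than the SortField form used in the thread.

```java
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.DoubleDocValuesField;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.StringField;
import org.apache.lucene.document.TextField;
import org.apache.lucene.expressions.Expression;
import org.apache.lucene.expressions.SimpleBindings;
import org.apache.lucene.expressions.js.JavascriptCompiler;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.Term;
import org.apache.lucene.queries.function.FunctionScoreQuery;
import org.apache.lucene.search.DoubleValuesSource;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.TermQuery;
import org.apache.lucene.store.ByteBuffersDirectory;
import org.apache.lucene.store.Directory;

import java.util.ArrayList;
import java.util.List;

public class ExpressionBoostSketch {

    static void addDoc(IndexWriter writer, String id, double boost) throws Exception {
        Document doc = new Document();
        doc.add(new StringField("id", id, Field.Store.YES));
        // Identical text in every document, so only _boost1 changes the order.
        doc.add(new TextField("field1", "string1 lucene", Field.Store.YES));
        // A fractional per-document factor stored as a double docvalue.
        doc.add(new DoubleDocValuesField("_boost1", boost));
        writer.addDocument(doc);
    }

    public static List<String> rankedIds() throws Exception {
        try (Directory dir = new ByteBuffersDirectory()) {
            try (IndexWriter writer = new IndexWriter(dir, new IndexWriterConfig(new StandardAnalyzer()))) {
                addDoc(writer, "a", 1.0);
                addDoc(writer, "b", 2.5);
            }
            // Keep the relevance score (_score) and scale it by the stored factor.
            Expression expr = JavascriptCompiler.compile("_score * _boost1");
            SimpleBindings bindings = new SimpleBindings();
            bindings.add("_score", DoubleValuesSource.SCORES);
            bindings.add("_boost1", DoubleValuesSource.fromDoubleField("_boost1"));

            Query query = new FunctionScoreQuery(
                    new TermQuery(new Term("field1", "lucene")),
                    expr.getDoubleValuesSource(bindings));

            try (DirectoryReader reader = DirectoryReader.open(dir)) {
                IndexSearcher searcher = new IndexSearcher(reader);
                List<String> ids = new ArrayList<>();
                for (ScoreDoc sd : searcher.search(query, 10).scoreDocs) {
                    ids.add(searcher.doc(sd.doc).get("id"));
                }
                return ids;
            }
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(rankedIds()); // prints [b, a]
    }
}
```

The expression is where the flexibility described in the thread comes in: a formula such as "_score * ln(2 + clicks)" could be bound the same way, given a hypothetical clicks docvalues field.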

RE: Index-time boosting: Deprecated setBoost method

Uwe Schindler
Hi,

As I said before, that is a misuse of index-time boosting. In addition, in previous versions it did not even work correctly: because of query normalization, it was normalized away anyway. On top of that, to change it you have to reindex.

What you intend to do is a typical use case for query-time boosting with BoostQuery. That is explained in almost every book about search, like those about Solr or Elasticsearch.

Most query parsers also allow you to add boost factors for fields, e.g., SimpleQueryParser (for humans who need a simple syntax without field names). There you pass a map of field names to boost factors.
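As a sketch of what this could mean for the setBoost(2.0f) / setBoost(1.0f) code quoted below, the per-field weights can move into the query, either as explicit BoostQuery clauses or through SimpleQueryParser's map of field names to weights. This assumes Lucene 8.x with lucene-core and lucene-queryparser on the classpath; the class name QueryTimeFieldBoostSketch, the id field and the two mirror-image sample documents are hypothetical, chosen so that without boosts both documents would score the same and only the field weight decides the order.

```java
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.StringField;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.Term;
import org.apache.lucene.queryparser.simple.SimpleQueryParser;
import org.apache.lucene.search.BooleanClause;
import org.apache.lucene.search.BooleanQuery;
import org.apache.lucene.search.BoostQuery;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.TermQuery;
import org.apache.lucene.store.ByteBuffersDirectory;
import org.apache.lucene.store.Directory;

import java.io.IOException;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class QueryTimeFieldBoostSketch {

    // Replaces f1.setBoost(2.0f) / f2.setBoost(1.0f): the weights live in the query.
    static Query boostQuery(String term) {
        return new BooleanQuery.Builder()
                .add(new BoostQuery(new TermQuery(new Term("field1", term)), 2.0f), BooleanClause.Occur.SHOULD)
                .add(new TermQuery(new Term("field2", term)), BooleanClause.Occur.SHOULD)
                .build();
    }

    // The same weights expressed through SimpleQueryParser's field-to-weight map.
    static Query simpleParserQuery(String text) {
        Map<String, Float> weights = new HashMap<>();
        weights.put("field1", 2.0f);
        weights.put("field2", 1.0f);
        return new SimpleQueryParser(new StandardAnalyzer(), weights).parse(text);
    }

    static void addDoc(IndexWriter writer, String id, String f1, String f2) throws IOException {
        Document doc = new Document();
        doc.add(new StringField("id", id, Field.Store.YES));
        doc.add(new TextField("field1", f1, Field.Store.YES));
        doc.add(new TextField("field2", f2, Field.Store.YES));
        writer.addDocument(doc);
    }

    static List<String> rankedIds(Directory dir, Query query) throws IOException {
        try (DirectoryReader reader = DirectoryReader.open(dir)) {
            IndexSearcher searcher = new IndexSearcher(reader);
            List<String> ids = new ArrayList<>();
            for (ScoreDoc sd : searcher.search(query, 10).scoreDocs) {
                ids.add(searcher.doc(sd.doc).get("id"));
            }
            return ids;
        }
    }

    // Returns the ranking for the BoostQuery version and the SimpleQueryParser version.
    public static List<List<String>> demo() throws IOException {
        try (Directory dir = new ByteBuffersDirectory()) {
            try (IndexWriter writer = new IndexWriter(dir, new IndexWriterConfig(new StandardAnalyzer()))) {
                // "lucene" is in field1 of one document and in field2 of the other.
                addDoc(writer, "inField1", "lucene", "search");
                addDoc(writer, "inField2", "search", "lucene");
            }
            List<List<String>> results = new ArrayList<>();
            results.add(rankedIds(dir, boostQuery("lucene")));
            results.add(rankedIds(dir, simpleParserQuery("lucene")));
            return results;
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println(demo()); // prints [[inField1, inField2], [inField1, inField2]]
    }
}
```

Because the weights are part of the query, changing them needs no reindexing, which is the flexibility argument made above.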

Uwe

-----
Uwe Schindler
Achterdiek 19, D-28357 Bremen
https://www.thetaphi.de
eMail: [hidden email]

> -----Original Message-----
> From: [hidden email] <[hidden email]>
> Sent: Monday, October 21, 2019 6:45 PM
> To: [hidden email]
> Cc: baris.kazar <[hidden email]>
> Subject: Re: Index-time boosting: Deprecated setBoost method
>
> Hi,-
>
> Thanks, I appreciate the discussion.
>
> Let me ask this way instead; I think I gave too much information at once.
>
> Currently I have this:
>
> Field f1 = new TextField("field1", "string1", Field.Store.YES);
> doc.add(f1);
> f1.setBoost(2.0f);
>
> Field f2 = new TextField("field2", "string2", Field.Store.YES);
> doc.add(f2);
> f2.setBoost(1.0f);
>
> But this fails with Lucene 7.7.2 (Field.setBoost no longer exists there).
>
> It is probably more efficient and more flexible to fix this by using
> BoostQuery.
>
> However, what would the fix be with index-time boosting? The code in my
> previous post was trying to do that.
>
> Best regards
>
>
> On 10/21/19 12:34 PM, Uwe Schindler wrote:
> > Hi,
> >
> > Sorry, I don't fully understand what you intend to do. If the boost values
> > per field are static and used with exactly the same value for every
> > document, they are not needed at index time. You can just boost the field
> > on the query side (e.g., using BoostQuery). Boosting every document with
> > the same static values is an anti-pattern; that's better suited to the
> > query side, where you are more flexible.
> >
> > If you need a different boost value per document, you can save that boost
> > value in the index per document using a docvalues field (this consumes
> > extra space, of course). Then you need an expression-based query (e.g., a
> > FunctionScoreQuery driven by an Expression) on the query side. Just because
> > it looks like JavaScript doesn't mean it's slow: the syntax is compiled to
> > bytecode and directly included in the query execution as a dynamic Java
> > class, so it's very fast.
> >
> > In short:
> > - If you need a different boost factor per field name that's constant for
> >   all documents, apply it at query time with BoostQuery.
> > - If you have to boost specific documents (e.g., top-selling products),
> >   index a numeric docvalues field per document. On the query side you can
> >   use different query types to modify the score of each result based on
> >   the docvalues field. That can be done with the expressions module (using
> >   compiled JavaScript) or by another query in Lucene that operates on
> >   ValueSource (e.g., FunctionQuery). The first one is easier to use for
> >   complex formulas.
> >
> > Uwe
> >
> > -----
> > Uwe Schindler
> > Achterdiek 19, D-28357 Bremen
> > https://www.thetaphi.de
> > eMail: [hidden email]
> >
> >> -----Original Message-----
> >> From: [hidden email] <[hidden email]>
> >> Sent: Monday, October 21, 2019 5:17 PM
> >> To: [hidden email]
> >> Cc: baris.kazar <[hidden email]>
> >> Subject: Re: Index-time boosting: Deprecated setBoost method
> >>
> >> Hi,-
> >>
> >> Sorry about the missing parts in my previous post; please accept my
> >> apologies for that.
> >>
> >> I need to add a few more questions/corrections/additions to the
> >> previous post:
> >>
> >> The main question was: if the boost is a single constant value, do we
> >> need the JavaScript part below?
> >>
> >>
> >>
> >> === Indexing code snippet for Lucene version 6.6.0 and before ===
> >>
> >> Document doc = new Document();
> >>
> >> Field f1 = new TextField("field1", "string1", Field.Store.YES);
> >> doc.add(f1);
> >> f1.setBoost(2.0f);
> >>
> >> Field f2 = new TextField("field2", "string2", Field.Store.YES);
> >> doc.add(f2);
> >> f2.setBoost(1.0f);
> >>
> >> === end of indexing code snippet for Lucene version 6.6.0 and before ===
> >>
> >>
> >> This turns into the following, where the _boost1 field is associated with
> >> field1 and the _boost2 field is associated with field2:
> >>
> >>
> >> In indexing code:
> >>
> >> === beginning of indexing code snippet ===
> >> Field f1 = new TextField("field1", "string1", Field.Store.YES);
> >> doc.add(f1);
> >>
> >> Field _boost1 = new NumericDocValuesField("_boost1", 2L);
> >> doc.add(_boost1);
> >>
> >> // If this boost value needs to be stored, a separate StoredField
> >> // instance needs to be added as well
> >> … (I will post this soon)
> >>
> >> Field _boost2 = new NumericDocValuesField("_boost2", 1L);
> >> doc.add(_boost2);
> >>
> >> // If this boost value needs to be stored, a separate StoredField
> >> // instance needs to be added as well
> >> … (I will post this soon)
> >>
> >> === end of indexing code snippet ===
> >>
> >>
> >> Now, in the searching code (i.e., at query time), do I need the
> >> FunctionScoreQuery, given that in this case the boost is just a constant
> >> value and not a function? However, a constant value can be argued to be a
> >> function with the same value all the time, too.
> >>
> >>
> >> === beginning of query time code snippet ===
> >> Expression expr = JavascriptCompiler.compile("_boost1 + _boost2");
> >>
> >> // SimpleBindings just maps variables to SortField instances
> >> SimpleBindings bindings = new SimpleBindings();
> >>
> >> // These have to be LONG type, I think, since NumericDocValuesField accepts
> >> // "long" values only, am I right? Can this be DOUBLE type?
> >> bindings.add(new SortField("_boost1", SortField.Type.LONG));
> >> // same question here
> >> bindings.add(new SortField("_boost2", SortField.Type.LONG));
> >>
> >> // create a query that matches based on body:contents but
> >> // scores using expr
> >> Query query = new FunctionScoreQuery(
> >>     new TermQuery(new Term("field1", "term_to_look_for")),
> >>     expr.getDoubleValuesSource(bindings));
> >>
> >> searcher.search(query, 10);
> >>
> >> === end of code snippet ===
> >>
> >>
> >> Best regards
> >>
> >>
> >> On 10/21/19 11:05 AM, [hidden email] wrote:
> >>> Hi,-
> >>>
> >>> I would like to ask the following to make it clearer (for me at least):
> >>>
> >>> Document doc = new Document();
> >>>
> >>> Field f1 = new TextField("field1", "string1", Field.Store.YES);
> >>> doc.add(f1);
> >>> f1.setBoost(2.0f);
> >>>
> >>> Field f2 = new TextField("field2", "string2", Field.Store.YES);
> >>> doc.add(f2);
> >>> f2.setBoost(1.0f);
> >>>
> >>>
> >>> This turns into the following, where the _boost1 field is associated with
> >>> field1 and the _boost2 field is associated with field2:
> >>>
> >>>
> >>> In indexing code:
> >>>
> >>> Field f1 = new TextField("field1", "string1", Field.Store.YES);
> >>> doc.add(f1);
> >>>
> >>> Field _boost1 = new NumericDocValuesField("_boost1", 2L);
> >>> doc.add(_boost1);
> >>>
> >>> // If this boost value needs to be stored, a separate StoredField
> >>> // instance needs to be added as well
> >>> … (I will post this soon)
> >>>
> >>> Field _boost2 = new NumericDocValuesField("_boost2", 1L);
> >>> doc.add(_boost2);
> >>>
> >>> // If this boost value needs to be stored, a separate StoredField
> >>> // instance needs to be added as well
> >>> … (I will post this soon)
> >>>
> >>>
> >>> Now, in the searching code (i.e., at query time), do I need the
> >>> FunctionScoreQuery, given that in this case the boost is just a constant
> >>> value and not a function? However, a constant value can be argued to be a
> >>> function with the same value all the time, too.
> >>>
> >>>
> >>> Expression expr = JavascriptCompiler.compile("_boost");
> >>>
> >>> // SimpleBindings just maps variables to SortField instances
> >>> SimpleBindings bindings = new SimpleBindings();
> >>> bindings.add(new SortField("_boost1", SortField.Type.SCORE));
> >>>
> >>> // create a query that matches based on body:contents but
> >>> // scores using expr
> >>> Query query = new FunctionScoreQuery(
> >>>     new TermQuery(new Term("field1", "term_to_look_for")),
> >>>     expr.getDoubleValuesSource(bindings));
> >>>
> >>> searcher.search(query, 10);
> >>>
> >>>
> >>> So, if the boost is a single constant value, do we need the JavaScript
> >>> part above?
> >>>
> >>> Best regards
> >>>
> >>>
> >>> On 10/18/19 4:07 PM, [hidden email] wrote:
> >>>> Uwe,-
> >>>>
> >>>> can this
> >>>> https://lucene.apache.org/core/7_7_2/expressions/org/apache/lucene/expressions/Expression.html
> >>>> doc example that you also gave be extended with the NumericDocValuesField
> >>>> part that needs to be done for index-time boosting, too?
> >>>>
> >>>> I see now why you meant that this is a mixed type of boosting (i.e.,
> >>>> both index time and search time).
> >>>>
> >>>> I then need to include the query mentioned in this example on these
> >>>> _score fields (I would call it a _boost field in my case) into my
> >>>> overall BooleanQuery.
> >>>>
> >>>> I will now try to combine these together and post here for future help.
> >>>>
> >>>> Best regards
> >>>>
> >>>>
> >>>> On 10/18/19 3:18 PM, Uwe Schindler wrote:
> >>>>> Hi,
> >>>>>
> >>>>> Read my original email! The index time values are written using
> >>>>> NumericDocValuesField. The expressions docs also refer to that when
> >>>>> the bindings are documented.
> >>>>>
> >>>>> It's separate from the indexed data (TextField). Think of it like an
> >>>>> additional numeric field in your database table with a factor in
> >>>>> each row.
> >>>>>
> >>>>> Uwe
> >>>>>
> >>>>> On October 18, 2019 7:14:03 PM UTC, [hidden email] wrote:
> >>>>>> Uwe,-
> >>>>>>
> >>>>>> Two questions there:
> >>>>>>
> >>>>>> I guess this is applicable to TextField, too.
> >>>>>>
> >>>>>> And I was expecting an IndexWriter object in the example for index-time
> >>>>>> boosting.
> >>>>>>
> >>>>>> Best regards
> >>>>>>
> >>>>>>
> >>>>>> On 10/18/19 2:57 PM, Uwe Schindler wrote:
> >>>>>>> Sorry, I was imprecise. It's a mix of both. The factors are stored per
> >>>>>>> document in the index (this is why I called it index time). At query
> >>>>>>> time, the expression uses the index-time values and folds them into the
> >>>>>>> query boost.
> >>>>>>> What's your problem with that approach?
> >>>>>>>
> >>>>>>> Uwe
> >>>>>>>
> >>>>>>> On October 18, 2019 6:50:40 PM UTC, [hidden email] wrote:
> >>>>>>>> Uwe,-
> >>>>>>>>
> >>>>>>>> Thanks. If possible, I am looking for a pure Java methodology to do
> >>>>>>>> the index-time boosting.
> >>>>>>>>
> >>>>>>>> This example looks like a search-time boosting example:
> >>>>>>>>
> >>>>>>>> https://lucene.apache.org/core/7_7_2/expressions/org/apache/lucene/expressions/Expression.html
> >>>>>>>>
> >>>>>>>> Best regards
> >>>>>>>>
> >>>>>>>> On 10/18/19 2:31 PM, Uwe Schindler wrote:
> >>>>>>>>> Hi,
> >>>>>>>>>
> >>>>>>>>>> Is there a working example for this? Is this mentioned in the Lucene
> >>>>>>>>>> Javadocs or any other docs so that I can look at it?
> >>>>>>>>> To index the docvalues, see NumericDocValuesField (it can be added to
> >>>>>>>>> documents like indexed or stored fields). You may have used them for
> >>>>>>>>> sorting already.
> >>>>>>>>>> this methodology seems sort of like discouraging using index time
> >>>>>>>>>> boosting.
> >>>>>>>>> Not really. Many use this all the time. It's one of the killer
> >>>>>>>>> features of both Solr and Elasticsearch. The problem was how
> >>>>>>>>> Document.setBoost() worked (it did not work correctly, see below).
> >>>>>>>>>> Previous setBoost method call was fine and easy to use.
> >>>>>>>>>> Did it have some performance issues and then is that why it was
> >>>>>>>>>> deprecated?
> >>>>>>>>> No, it was deprecated for several reasons: setBoost was not doing
> >>>>>>>>> what users expected. Internally the boost value was just multiplied
> >>>>>>>>> into the document norm factor (which is internally also a docvalues
> >>>>>>>>> field). The norm factors are only very imprecise floats stored in a
> >>>>>>>>> byte, so precision is poor. If you put some values into it and the
> >>>>>>>>> length norm was already consuming all bits, the boosting was very
> >>>>>>>>> coarse. It was also only multiplied in, while most users want to do
> >>>>>>>>> things like record click counts in the index and then boost, for
> >>>>>>>>> example, with the logarithm or some other function. If the boost is
> >>>>>>>>> just multiplied into the length norm, you have no flexibility at all.
> >>>>>>>>> In addition, you can have several docvalues fields and use their
> >>>>>>>>> values in a function (e.g. one field with click count and another one
> >>>>>>>>> with product price). Then you can combine click count and price
> >>>>>>>>> (which can be modified independently during index updates) and
> >>>>>>>>> change the boost to push lower prices and higher click counts up.
> >>>>>>>>> This is what you can do with the expressions module. You just give it
> >>>>>>>>> a function.
> >>>>>>>>> Here is an example; the second example uses a FunctionScoreQuery
> >>>>>>>>> that modifies the score based on the function and the given
> >>>>>>>>> docvalues:
> >>>>>>>>> https://lucene.apache.org/core/7_7_2/expressions/org/apache/lucene/expressions/Expression.html
> >>>>>>>>>> FunctionScoreQuery usage with MultiFieldQueryParser would also be
> >>>>>>>>>> nice, where
> >>>>>>>>>>
> >>>>>>>>>> MultiFieldQueryParser already has a boosts field to do this in its
> >>>>>>>>>> constructor.
> >>>>>>>>> The boosts in the query parser are applied to fields at query time
> >>>>>>>>> (to have a different weight per field). Index-time boosting is per
> >>>>>>>>> document. So you can combine both.
> >>>>>>>>>> Maybe it is not needed with MultiFieldQueryParser.
> >>>>>>>>> You use MultiFieldQueryParser to adjust weights of the fields (e.g.
> >>>>>>>>> title versus body). The parsed query is then wrapped with an
> >>>>>>>>> expression that modifies the score per document according to the
> >>>>>>>>> docvalues.
> >>>>>>>>> Uwe
> >>>>>>>>>
> >>>>>>>>>> On 10/18/19 1:28 PM, Uwe Schindler wrote:
> >>>>>>>>>>
> >>>>>>>>>>> Hi,
> >>>>>>>>>>>
> >>>>>>>>>>> that's not true. You can do index-time boosting, but you need to do
> >>>>>>>>>>> that using a separate field. You just index a numeric docvalues
> >>>>>>>>>>> field (which may contain a long or float value per document). Later
> >>>>>>>>>>> you wrap your query with some FunctionScoreQuery (e.g., use the
> >>>>>>>>>>> JavaScript function query syntax in the expressions module). This
> >>>>>>>>>>> allows you to compile a JavaScript function that calculates the
> >>>>>>>>>>> final score based on the score returned by the inner query and
> >>>>>>>>>>> combines it with docvalues that were indexed per document.
> >>>>>>>>>>> Uwe
> >>>>>>>>>>>
> >>>>>>>>>>> -----
> >>>>>>>>>>> Uwe Schindler
> >>>>>>>>>>> Achterdiek 19, D-28357 Bremen
> >>>>>>>>>>> https://www.thetaphi.de
> >>>>>>>>>>> eMail: [hidden email]
> >>>>>>>>>>>
> >>>>>>>>>>>> -----Original Message-----
> >>>>>>>>>>>> From: [hidden email] <[hidden email]>
> >>>>>>>>>>>> Sent: Friday, October 18, 2019 5:28 PM
> >>>>>>>>>>>> To: [hidden email]
> >>>>>>>>>>>> Cc: [hidden email]
> >>>>>>>>>>>> Subject: Re: Index-time boosting: Deprecated setBoost method
> >>>>>>>>>>>>
> >>>>>>>>>>>> It looks like index-time boosting (field) is not possible since
> >>>>>>>>>>>> Lucene version 7.7.2, and
> >>>>>>>>>>>>
> >>>>>>>>>>>> I was using the BoostQuery at search time for boosting before, for
> >>>>>>>>>>>> another case, and
> >>>>>>>>>>>>
> >>>>>>>>>>>> this seems to be the only boosting option now in Lucene.
> >>>>>>>>>>>>
> >>>>>>>>>>>> Best regards
> >>>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>>>>> On 10/18/19 10:01 AM, [hidden email] wrote:
> >>>>>>>>>>>>> Hi,-
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> I saw this in the Field class docs and I am trying to figure out
> >>>>>>>>>>>>> the following note in the docs:
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> setBoost(float boost)
> >>>>>>>>>>>>> Deprecated.
> >>>>>>>>>>>>> Index-time boosts are deprecated, please index index-time scoring
> >>>>>>>>>>>>> factors into a doc value field and combine them with the score at
> >>>>>>>>>>>>> query time using eg. FunctionScoreQuery.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> I appreciate this note. Is there an example about this? I wish the
> >>>>>>>>>>>>> docs would give a simple example to further help.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> https://lucene.apache.org/core/6_6_0//core/org/apache/lucene/document/Field.html
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> vs
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> https://lucene.apache.org/core/7_7_2/core/org/apache/lucene/document/Field.html
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> Best regards
> >>>>>>> --
> >>>>>>> Uwe Schindler
> >>>>>>> Achterdiek 19, 28357 Bremen
> >>>>>>> https://www.thetaphi.de
> >>>>> --
> >>>>> Uwe Schindler
> >>>>> Achterdiek 19, 28357 Bremen
> >>>>> https://www.thetaphi.de




Re: Index-time boosting: Deprecated setBoost method

baris.kazar
Hi,-

That is OK, and I can see that this case would be best served by BoostQuery; also, I don't have to use the lucene-expressions jar and its dependencies.

However, I am curious how to do this kind of field-based boosting at index time, even though I will prefer the query-time boosting methodology.

Best regards
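For the index-time side of this question, the approach described in the quoted replies below stores a per-document factor at index time and applies it at search time. In Lucene 7.x that would roughly be doc.add(new NumericDocValuesField("boost", 2L)) when indexing, and at search time JavascriptCompiler.compile("_score * boost") with SimpleBindings binding _score to SortField.Type.SCORE and boost to SortField.Type.LONG, passed to new FunctionScoreQuery(innerQuery, expr.getDoubleValuesSource(bindings)); the field name boost is hypothetical. An expression that does not mention _score, such as "_boost1 + _boost2", replaces the text relevance instead of scaling it. The plain-Java model below (not Lucene code) shows the effect on ranking, and also that a factor which is the same for every document leaves the ranking unchanged, which is why a constant per-field boost belongs at query time.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/**
 * Plain-Java model (not Lucene code) of per-document boosting: each hit
 * carries a factor that was stored per document at index time, and the
 * final score is relevance * factor, computed at search time.
 */
public class DocValuesBoostModel {

    /** One hit: document id, relevance score from the inner query, stored factor. */
    static final class Hit {
        final int docId;
        final double score;
        final long boost;

        Hit(int docId, double score, long boost) {
            this.docId = docId;
            this.score = score;
            this.boost = boost;
        }
    }

    /** Doc ids ordered by score * boost, highest first (ties keep input order). */
    static List<Integer> rankWithBoost(List<Hit> hits) {
        List<Hit> sorted = new ArrayList<>(hits);
        sorted.sort(Comparator.comparingDouble((Hit h) -> h.score * h.boost).reversed());
        List<Integer> ids = new ArrayList<>();
        for (Hit h : sorted) {
            ids.add(h.docId);
        }
        return ids;
    }

    public static void main(String[] args) {
        // Hypothetical hits; doc 1 is, say, a top-selling product stored with factor 2.
        List<Hit> perDoc = List.of(new Hit(0, 1.0, 1L), new Hit(1, 0.8, 2L), new Hit(2, 0.5, 1L));
        System.out.println("per-document factors: " + rankWithBoost(perDoc));

        // The same factor for every document: same order as by relevance alone.
        List<Hit> constant = List.of(new Hit(0, 1.0, 2L), new Hit(1, 0.8, 2L), new Hit(2, 0.5, 2L));
        System.out.println("constant factor: " + rankWithBoost(constant));
    }
}
```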


On 10/21/19 12:54 PM, Uwe Schindler wrote:

> Hi,
>
> As I said before, that is a misuse of index-time boosting. In addition, in previous versions it did not even work correctly, because query normalization normalized it away anyway. And on top of that, to change it you have to reindex.
>
> What you intend to do is a typical use case for query time boosting with BoostQuery. That is explained in almost every book about search, like those about Solr or Elasticsearch.
>
> Most query parsers also allow you to add boost factors for fields, e.g. SimpleQueryParser (for humans who need a simple syntax without field names). There you pass a list of fields together with their boost factors.
>
> Uwe
>
> -----
> Uwe Schindler
> Achterdiek 19, D-28357 Bremen
> https://www.thetaphi.de
> eMail: [hidden email]
>
>> -----Original Message-----
>> From: [hidden email] <[hidden email]>
>> Sent: Monday, October 21, 2019 6:45 PM
>> To: [hidden email]
>> Cc: baris.kazar <[hidden email]>
>> Subject: Re: Index-time boosting: Deprecated setBoost method
>>
>> Hi,-
>>
>> Thanks, and I appreciate the discussion.
>>
>> Let me ask it this way; I think I gave too much info at one time:
>>
>> Currently I have this:
>>
>> Field f1 = new TextField("field1", "string1", Field.Store.YES);
>> doc.add(f1);
>> f1.setBoost(2.0f);
>>
>> Field f2 = new TextField("field2", "string2", Field.Store.YES);
>> doc.add(f2);
>> f2.setBoost(1.0f);
>>
>>
>> But this fails with Lucene 7.7.2.
>>
>>
>> Probably it is more efficient and more flexible to fix this by using
>> BoostQuery.
>>
>> However, what could be the fix with index-time boosting? The code in my
>> previous post was trying to do that.
>>
>> Best regards
>>
>>
>> On 10/21/19 12:34 PM, Uwe Schindler wrote:
>>> Hi,
>>>
>>> sorry, I don't fully understand what you intend to do. If the boost values
>>> per field are static and used with exactly the same value for every
>>> document, they are not needed at index time. You can just boost the field
>>> on the query side (e.g. using BoostQuery). Boosting every document with the
>>> same static values is an anti-pattern; that's something better suited for
>>> the query side, as you are more flexible there.
>>>
>>> If you need a different boost value per document, you can save that boost
>>> value in the index per document using a docvalues field (this consumes
>>> extra space, of course). Then you need the expressions module on the query
>>> side. But just because it looks like JavaScript, it's not slow. The syntax
>>> is compiled to bytecode and directly included into the query execution as
>>> a dynamic Java class, so it's very fast.
>>>
>>> In short:
>>> - If you need to have a different boost factor per field name that's
>>>   constant for all documents, apply it at query time with BoostQuery.
>>> - If you have to boost specific documents (e.g., top selling products),
>>>   index a numeric docvalues field per document. On the query side you can
>>>   use different query types to modify the score of each result based on
>>>   the docvalues field. That can be done with the expressions module (using
>>>   compiled JavaScript) or by another query in Lucene that operates on
>>>   ValueSource (e.g., FunctionQuery). The first one is easier to use for
>>>   complex formulas.
>>>
>>> Uwe
>>>
>>> -----
>>> Uwe Schindler
>>> Achterdiek 19, D-28357 Bremen
>>> https://www.thetaphi.de
>>> eMail: [hidden email]
>>>
>>>> -----Original Message-----
>>>> From: [hidden email] <[hidden email]>
>>>> Sent: Monday, October 21, 2019 5:17 PM
>>>> To: [hidden email]
>>>> Cc: baris.kazar <[hidden email]>
>>>> Subject: Re: Index-time boosting: Deprecated setBoost method
>>>>
>>>> Hi,-
>>>>
>>>> Sorry about the missing parts in my previous post; please accept my
>>>> apologies for that.
>>>>
>>>> I need to add a few more questions/corrections/additions to the
>>>> previous post:
>>>>
>>>> The main question was: if the boost is a single constant value, do we
>>>> need the JavaScript part below?
>>>>
>>>>
>>>>
>>>> === Indexing code snippet for Lucene version 6.6.0 and before ===
>>>>
>>>> Document doc = new Document();
>>>>
>>>> Field f1 = new TextField("field1", "string1", Field.Store.YES);
>>>> doc.add(f1);
>>>> f1.setBoost(2.0f);
>>>>
>>>> Field f2 = new TextField("field2", "string2", Field.Store.YES);
>>>> doc.add(f2);
>>>> f2.setBoost(1.0f);
>>>>
>>>> === end of indexing code snippet for Lucene version 6.6.0 and before ===
>>>>
>>>>
>>>> This turns into the following, where the _boost1 field is associated with
>>>> field1 and the _boost2 field is associated with field2:
>>>>
>>>>
>>>> In indexing code:
>>>>
>>>> === beginning of indexing code snippet ===
>>>> Field f1 = new TextField("field1", "string1", Field.Store.YES);
>>>> doc.add(f1);
>>>>
>>>> Field _boost1 = new NumericDocValuesField("_boost1", 2L);
>>>> doc.add(_boost1);
>>>>
>>>> // If this boost value needs to be stored, a separate StoredField
>>>> // instance needs to be added as well
>>>> … (I will post this soon)
>>>>
>>>> Field _boost2 = new NumericDocValuesField("_boost2", 1L);
>>>> doc.add(_boost2);
>>>>
>>>> // If this boost value needs to be stored, a separate StoredField
>>>> // instance needs to be added as well
>>>> … (I will post this soon)
>>>>
>>>> === end of indexing code snippet ===
>>>>
>>>>
>>>> Now, in the searching code (i.e., at query time), do I need the
>>>> FunctionScoreQuery, given that in this case the boost is just a constant
>>>> value and not a function? However, a constant value can be argued to be a
>>>> function with the same value all the time, too.
>>>>
>>>>
>>>> === beginning of query time code snippet ===
>>>> Expression expr = JavascriptCompiler.compile("_boost1 + _boost2");
>>>>
>>>> // SimpleBindings just maps variables to SortField instances
>>>> SimpleBindings bindings = new SimpleBindings();
>>>>
>>>> // These have to be LONG type, I think, since NumericDocValuesField accepts
>>>> // "long" values only, am I right? Can this be DOUBLE type?
>>>> bindings.add(new SortField("_boost1", SortField.Type.LONG));
>>>> // same question here
>>>> bindings.add(new SortField("_boost2", SortField.Type.LONG));
>>>>
>>>> // create a query that matches based on body:contents but
>>>> // scores using expr
>>>> Query query = new FunctionScoreQuery(
>>>>     new TermQuery(new Term("field1", "term_to_look_for")),
>>>>     expr.getDoubleValuesSource(bindings));
>>>>
>>>> searcher.search(query, 10);
>>>>
>>>> === end of code snippet ===
>>>>
>>>>
>>>> Best regards
>>>>
>>>>
>>>> On 10/21/19 11:05 AM, [hidden email] wrote:
>>>>> Hi,-
>>>>>
>>>>> I would like to ask the following to make it clearer (for me at least):
>>>>>
>>>>> Document doc = new Document();
>>>>>
>>>>> Field f1 = new TextField("field1", "string1", Field.Store.YES);
>>>>> doc.add(f1);
>>>>> f1.setBoost(2.0f);
>>>>>
>>>>> Field f2 = new TextField("field2", "string2", Field.Store.YES);
>>>>> doc.add(f2);
>>>>> f2.setBoost(1.0f);
>>>>>
>>>>>
>>>>> This turns into the following, where the _boost1 field is associated with
>>>>> field1 and the _boost2 field is associated with field2:
>>>>>
>>>>>
>>>>> In indexing code:
>>>>>
>>>>> Field f1 = new TextField("field1", "string1", Field.Store.YES);
>>>>> doc.add(f1);
>>>>>
>>>>> Field _boost1 = new NumericDocValuesField("_boost1", 2L);
>>>>> doc.add(_boost1);
>>>>>
>>>>> // If this boost value needs to be stored, a separate StoredField
>>>>> // instance needs to be added as well
>>>>> … (I will post this soon)
>>>>>
>>>>> Field _boost2 = new NumericDocValuesField("_boost2", 1L);
>>>>> doc.add(_boost2);
>>>>>
>>>>> // If this boost value needs to be stored, a separate StoredField
>>>>> // instance needs to be added as well
>>>>> … (I will post this soon)
>>>>>
>>>>>
>>>>> Now, in the searching code (i.e., at query time), do I need the
>>>>> FunctionScoreQuery, given that in this case the boost is just a constant
>>>>> value and not a function? However, a constant value can be argued to be a
>>>>> function with the same value all the time, too.
>>>>>
>>>>>
>>>>> Expression expr = JavascriptCompiler.compile("_boost");
>>>>>
>>>>> // SimpleBindings just maps variables to SortField instances
>>>>> SimpleBindings bindings = new SimpleBindings();
>>>>> bindings.add(new SortField("_boost1", SortField.Type.SCORE));
>>>>>
>>>>> // create a query that matches based on body:contents but
>>>>> // scores using expr
>>>>> Query query = new FunctionScoreQuery(
>>>>>     new TermQuery(new Term("field1", "term_to_look_for")),
>>>>>     expr.getDoubleValuesSource(bindings));
>>>>>
>>>>> searcher.search(query, 10);
>>>>>
>>>>>
>>>>> So, if the boost is a single constant value, do we need the JavaScript
>>>>> part above?
>>>>>
>>>>> Best regards
>>>>>
>>>>>
>>>>> On 10/18/19 4:07 PM, [hidden email] wrote:
>>>>>> Uwe,-
>>>>>>
>>>>>> can this
>>>>>> https://lucene.apache.org/core/7_7_2/expressions/org/apache/lucene/expressions/Expression.html
>>>>>> doc example that you also gave be extended with the NumericDocValuesField
>>>>>> part that needs to be done for index-time boosting, too?
>>>>>>
>>>>>> I see now why you meant that this is a mixed type of boosting (i.e.,
>>>>>> both index time and search time).
>>>>>>
>>>>>> I then need to include the query mentioned in this example on these
>>>>>> _score fields (I would call it a _boost field in my case) into my
>>>>>> overall BooleanQuery.
>>>>>>
>>>>>> I will now try to combine these together and post here for future help.
>>>>>>
>>>>>> Best regards
>>>>>>
>>>>>>
>>>>>> On 10/18/19 3:18 PM, Uwe Schindler wrote:
>>>>>>> Hi,
>>>>>>>
>>>>>>> Read my original email! The index time values are written using
>>>>>>> NumericDocValuesField. The expressions docs also refer to that when
>>>>>>> the bindings are documented.
>>>>>>>
>>>>>>> It's separate from the indexed data (TextField). Think of it like an
>>>>>>> additional numeric field in your database table with a factor in
>>>>>>> each row.
>>>>>>>
>>>>>>> Uwe
>>>>>>>
>>>>>>> On October 18, 2019 7:14:03 PM UTC, [hidden email] wrote:
>>>>>>>> Uwe,-
>>>>>>>>
>>>>>>>> Two questions there:
>>>>>>>>
>>>>>>>> I guess this is applicable to TextField, too.
>>>>>>>>
>>>>>>>> And I was expecting an IndexWriter object in the example for index-time
>>>>>>>> boosting.
>>>>>>>>
>>>>>>>> Best regards
>>>>>>>>
>>>>>>>>
>>>>>>>>> Sorry, I was imprecise. It's a mix of both. The factors are stored per
>>>>>>>>> document in the index (this is why I called it index time). At query
>>>>>>>>> time, the expression uses the index-time values and folds them into
>>>>>>>>> the query score.
>>>>>>>>> What's your problem with that approach?
>>>>>>>>>
>>>>>>>>> Uwe
>>>>>>>>>
>>>>>>>>> On October 18, 2019 6:50:40 PM UTC, [hidden email] wrote:
>>>>>>>>>> Uwe,-
>>>>>>>>>>
>>>>>>>>>>      Thanks. If possible, I am looking for a pure Java methodology
>>>>>>>>>> to do the index-time boosting.
>>>>>>>>>>
>>>>>>>>>> This example looks like a search-time boosting example:
>>>>>>>>>>
>>>>>>>>>> https://lucene.apache.org/core/7_7_2/expressions/org/apache/lucene/expressions/Expression.html
>>>>>>>>>> Best regards
>>>>>>>>>>
>>>>>>>>>> On 10/18/19 2:31 PM, Uwe Schindler wrote:
>>>>>>>>>>> Hi,
>>>>>>>>>>>
>>>>>>>>>>>> Is there a working example for this? Is this mentioned in the Lucene
>>>>>>>>>>>> Javadocs or any other docs so that I can look at it?
>>>>>>>>>>> To index the docvalues, see NumericDocValuesField (it can be added to
>>>>>>>>>>> documents like indexed or stored fields). You may have used them for
>>>>>>>>>>> sorting already.
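Since an IndexWriter example was asked for, here is a minimal sketch of the indexing side. It assumes Lucene 7.5+ or 8.x (for ByteBuffersDirectory; RAMDirectory plays the same role on older versions) with StandardAnalyzer on the classpath (lucene-analyzers-common in 7.x). The field names "body" and "boost" are hypothetical. The TextField carries no boost any more; the factor lives in a separate NumericDocValuesField, plus an optional StoredField only if the value must be retrievable from hits.

```java
import java.io.IOException;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.NumericDocValuesField;
import org.apache.lucene.document.StoredField;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.store.ByteBuffersDirectory;
import org.apache.lucene.store.Directory;

public class IndexBoostExample {

  /** One document: searchable text plus a per-document boost factor in doc values. */
  static Document buildDocument(String body, long boost) {
    Document doc = new Document();
    // Analyzed, searchable text: the field itself no longer carries a boost.
    doc.add(new TextField("body", body, Field.Store.YES));
    // Per-document factor, read column-wise at query time (not searchable).
    doc.add(new NumericDocValuesField("boost", boost));
    // Doc values are not returned as stored fields, so add a StoredField
    // under the same name only if the value must also be retrievable.
    doc.add(new StoredField("boost", boost));
    return doc;
  }

  /** Writes the documents with an IndexWriter into an in-memory directory. */
  static Directory buildIndex(String[] bodies, long[] boosts) throws IOException {
    Directory dir = new ByteBuffersDirectory();
    IndexWriterConfig config = new IndexWriterConfig(new StandardAnalyzer());
    try (IndexWriter writer = new IndexWriter(dir, config)) {
      for (int i = 0; i < bodies.length; i++) {
        writer.addDocument(buildDocument(bodies[i], boosts[i]));
      }
    } // close() commits the added documents
    return dir;
  }

  public static void main(String[] args) throws IOException {
    Directory dir = buildIndex(new String[] {"lucene in action", "lucene basics"}, new long[] {3L, 1L});
    try (DirectoryReader reader = DirectoryReader.open(dir)) {
      System.out.println("documents indexed: " + reader.numDocs());
    }
  }
}
```

The boost value can later be changed per document without touching the text field, which is what the query-side examples in this thread rely on.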
>>>>>>>>>>>> This methodology seems to sort of discourage using index-time
>>>>>>>>>>>> boosting.
>>>>>>>>>>> Not really. Many use this all the time. It's one of the killer
>>>>>>>>>>> features of both Solr and Elasticsearch. The problem was how
>>>>>>>>>>> Document.setBoost() worked (it did not work correctly, see below).
>>>>>>>>>>>> The previous setBoost method call was fine and easy to use.
>>>>>>>>>>>> Did it have some performance issues, and is that why it was
>>>>>>>>>>>> deprecated?
>>>>>>>>>>> No, it was deprecated for several reasons: setBoost was not doing
>>>>>>>>>>> what the user expected. Internally the boost value was just
>>>>>>>>>>> multiplied into the document norm factor (which is internally also
>>>>>>>>>>> a docvalues field). The norm factors are only very imprecise floats
>>>>>>>>>>> stored in a byte, so precision is poor. If you put some values into
>>>>>>>>>>> it and the length norm was already consuming all bits, the boosting
>>>>>>>>>>> was very coarse. It was also only multiplied in, while most users
>>>>>>>>>>> want to do things like record click counts in the index and then
>>>>>>>>>>> boost, for example, with the logarithm or some other function. If
>>>>>>>>>>> the boost is just multiplied into the length norm, you have no
>>>>>>>>>>> flexibility at all.
>>>>>>>>>>> In addition, you can have several docvalues fields and use their
>>>>>>>>>>> values in a function (e.g. one field with the click count and
>>>>>>>>>>> another with the product price). You can then combine click count
>>>>>>>>>>> and price (which can be modified independently during index
>>>>>>>>>>> updates) so that products with a lower price and a higher click
>>>>>>>>>>> count are boosted up.
>>>>>>>>>>> This is what you can do with the expressions module. You just give
>>>>>>>>>>> it a function.
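A sketch of the click-count/price idea, assuming the lucene-expressions module with the 7.x bindings API (SimpleBindings.add(SortField); newer versions may prefer a different add overload) and hypothetical fields "clicks" (NumericDocValuesField) and "price" (DoubleDocValuesField). The formula is only an illustration, not a recommended ranking function. All three products share the same text, so their base scores are equal and the order comes purely from the doc values.

```java
import java.util.ArrayList;
import java.util.List;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.DoubleDocValuesField;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.NumericDocValuesField;
import org.apache.lucene.document.StringField;
import org.apache.lucene.document.TextField;
import org.apache.lucene.expressions.Expression;
import org.apache.lucene.expressions.SimpleBindings;
import org.apache.lucene.expressions.js.JavascriptCompiler;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.Term;
import org.apache.lucene.queries.function.FunctionScoreQuery;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.SortField;
import org.apache.lucene.search.TermQuery;
import org.apache.lucene.store.ByteBuffersDirectory;
import org.apache.lucene.store.Directory;

public class ClicksAndPriceBoost {

  static Document product(String id, long clicks, double price) {
    Document doc = new Document();
    doc.add(new StringField("id", id, Field.Store.YES));
    doc.add(new TextField("body", "phone", Field.Store.NO)); // identical text, identical base score
    doc.add(new NumericDocValuesField("clicks", clicks));
    doc.add(new DoubleDocValuesField("price", price));
    return doc;
  }

  /** Returns product ids in ranked order for body:phone, rescored by clicks and price. */
  static List<String> rank() throws Exception {
    Directory dir = new ByteBuffersDirectory();
    try (IndexWriter w = new IndexWriter(dir, new IndexWriterConfig(new StandardAnalyzer()))) {
      w.addDocument(product("A", 100, 99.0));
      w.addDocument(product("B", 10, 3.0));
      w.addDocument(product("C", 0, 0.0));
    }
    // Hypothetical formula: more clicks push a product up, a higher price pushes it down.
    Expression expr = JavascriptCompiler.compile("_score * ln(2 + clicks) / sqrt(1 + price)");
    SimpleBindings bindings = new SimpleBindings();
    bindings.add(new SortField("_score", SortField.Type.SCORE)); // relevance of the inner query
    bindings.add(new SortField("clicks", SortField.Type.LONG));  // NumericDocValuesField
    bindings.add(new SortField("price", SortField.Type.DOUBLE)); // DoubleDocValuesField
    Query query = new FunctionScoreQuery(
        new TermQuery(new Term("body", "phone")), expr.getDoubleValuesSource(bindings));

    List<String> ids = new ArrayList<>();
    try (DirectoryReader reader = DirectoryReader.open(dir)) {
      IndexSearcher searcher = new IndexSearcher(reader);
      for (ScoreDoc sd : searcher.search(query, 10).scoreDocs) {
        ids.add(searcher.doc(sd.doc).get("id"));
      }
    }
    return ids;
  }

  public static void main(String[] args) throws Exception {
    System.out.println(rank()); // [B, C, A]
  }
}
```

Because clicks and price are separate doc values fields, either one can be changed independently without re-analyzing any text.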
>>>>>>>>>>> Here is an example; the second example uses a FunctionScoreQuery
>>>>>>>>>>> that modifies the score based on the function and the given
>>>>>>>>>>> docvalues:
>>>>>>>>>>> https://lucene.apache.org/core/7_7_2/expressions/org/apache/lucene/expressions/Expression.html
>>>>>>>>>>>> An example of FunctionScoreQuery usage with MultiFieldQueryParser
>>>>>>>>>>>> would also be nice, since MultiFieldQueryParser already accepts
>>>>>>>>>>>> per-field boosts in its constructor.
>>>>>>>>>>> The boosts in the query parser are applied to fields at query time
>>>>>>>>>>> (to have a different weight per field). Index-time boosting is per
>>>>>>>>>>> document. So you can combine both.
>>>>>>>>>>>> Maybe it is not needed with MultiFieldQueryParser.
>>>>>>>>>>> You use MultiFieldQueryParser to adjust the weights of the fields
>>>>>>>>>>> (e.g. title versus body). The parsed query is then wrapped with an
>>>>>>>>>>> expression that modifies the score per document according to the
>>>>>>>>>>> docvalues.
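A sketch of combining both kinds of boosting as described: per-field weights passed to the MultiFieldQueryParser constructor, then the parsed query wrapped with an expression that multiplies in a per-document factor. It assumes lucene-queryparser and lucene-expressions (7.x-style bindings), and hypothetical fields "title", "body" and "boost". The two documents are built symmetrically so a title hit and a body hit have the same base score before weighting.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.NumericDocValuesField;
import org.apache.lucene.document.StringField;
import org.apache.lucene.document.TextField;
import org.apache.lucene.expressions.Expression;
import org.apache.lucene.expressions.SimpleBindings;
import org.apache.lucene.expressions.js.JavascriptCompiler;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.queries.function.FunctionScoreQuery;
import org.apache.lucene.queryparser.classic.MultiFieldQueryParser;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.SortField;
import org.apache.lucene.store.ByteBuffersDirectory;
import org.apache.lucene.store.Directory;

public class MultiFieldWithDocBoost {

  static Document doc(String id, String title, String body, long boost) {
    Document d = new Document();
    d.add(new StringField("id", id, Field.Store.YES));
    d.add(new TextField("title", title, Field.Store.NO));
    d.add(new TextField("body", body, Field.Store.NO));
    d.add(new NumericDocValuesField("boost", boost)); // per-document factor
    return d;
  }

  /** Ranks ids for "lucene"; optionally multiplies each score by the document's "boost". */
  static List<String> rank(boolean applyDocBoost) throws Exception {
    Directory dir = new ByteBuffersDirectory();
    try (IndexWriter w = new IndexWriter(dir, new IndexWriterConfig(new StandardAnalyzer()))) {
      w.addDocument(doc("t", "lucene", "other", 1L));  // hit in title, neutral factor
      w.addDocument(doc("b", "other", "lucene", 10L)); // hit in body, 10x factor
    }
    // Query-time field weights: a title match counts five times a body match.
    Map<String, Float> weights = new HashMap<>();
    weights.put("title", 5f);
    weights.put("body", 1f);
    MultiFieldQueryParser parser =
        new MultiFieldQueryParser(new String[] {"title", "body"}, new StandardAnalyzer(), weights);
    Query query = parser.parse("lucene");

    if (applyDocBoost) {
      // final score = field-weighted score * per-document factor
      Expression expr = JavascriptCompiler.compile("_score * boost");
      SimpleBindings bindings = new SimpleBindings();
      bindings.add(new SortField("_score", SortField.Type.SCORE));
      bindings.add(new SortField("boost", SortField.Type.LONG));
      query = new FunctionScoreQuery(query, expr.getDoubleValuesSource(bindings));
    }

    List<String> ids = new ArrayList<>();
    try (DirectoryReader reader = DirectoryReader.open(dir)) {
      IndexSearcher searcher = new IndexSearcher(reader);
      for (ScoreDoc sd : searcher.search(query, 10).scoreDocs) {
        ids.add(searcher.doc(sd.doc).get("id"));
      }
    }
    return ids;
  }

  public static void main(String[] args) throws Exception {
    System.out.println(rank(false)); // [t, b]: field weights only
    System.out.println(rank(true));  // [b, t]: 5x field weight vs. 10x document factor
  }
}
```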
>>>>>>>>>>> Uwe
>>>>>>>>>>>
>>>>>>>>>>>> On 10/18/19 1:28 PM, Uwe Schindler wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>>
>>>>>>>>>>>>> that's not true. You can do index-time boosting, but you need to do
>>>>>>>>>>>>> that using a separate field. You just index a numeric docvalues
>>>>>>>>>>>>> field (which may contain a long or float value per document). Later
>>>>>>>>>>>>> you wrap your query with some FunctionScoreQuery (e.g., use the
>>>>>>>>>>>>> Javascript function query syntax in the expressions module). This
>>>>>>>>>>>>> allows you to compile a Javascript function that calculates the
>>>>>>>>>>>>> final score based on the score returned by the inner query and
>>>>>>>>>>>>> combines it with docvalues that were indexed per document.
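For the "pure Java" route without the expressions module, a FunctionScoreQuery can also multiply the inner score by a doc values field directly. This sketch assumes a Lucene version that has FunctionScoreQuery.boostByValue (7.x releases from roughly 7.3 on, and 8.x) and hypothetical field names "body" and "boost". Both documents have identical text, so only the doc value decides the order.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.NumericDocValuesField;
import org.apache.lucene.document.StringField;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.Term;
import org.apache.lucene.queries.function.FunctionScoreQuery;
import org.apache.lucene.search.DoubleValuesSource;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.TermQuery;
import org.apache.lucene.store.ByteBuffersDirectory;
import org.apache.lucene.store.Directory;

public class PureJavaDocBoost {

  static Document doc(String id, long boost) {
    Document d = new Document();
    d.add(new StringField("id", id, Field.Store.YES));
    d.add(new TextField("body", "lucene", Field.Store.NO)); // same text, same base score
    d.add(new NumericDocValuesField("boost", boost));
    return d;
  }

  static List<String> rank() throws IOException {
    Directory dir = new ByteBuffersDirectory();
    try (IndexWriter w = new IndexWriter(dir, new IndexWriterConfig(new StandardAnalyzer()))) {
      w.addDocument(doc("a", 1L));
      w.addDocument(doc("b", 4L));
    }
    Query inner = new TermQuery(new Term("body", "lucene"));
    // score = inner score * value of the "boost" doc values field (no expressions jar needed)
    Query query = FunctionScoreQuery.boostByValue(inner, DoubleValuesSource.fromLongField("boost"));

    List<String> ids = new ArrayList<>();
    try (DirectoryReader reader = DirectoryReader.open(dir)) {
      IndexSearcher searcher = new IndexSearcher(reader);
      for (ScoreDoc sd : searcher.search(query, 10).scoreDocs) {
        ids.add(searcher.doc(sd.doc).get("id"));
      }
    }
    return ids;
  }

  public static void main(String[] args) throws IOException {
    System.out.println(rank()); // [b, a]
  }
}
```

This only covers plain multiplication; anything like a logarithm of a click count is where the expressions module becomes the easier option.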
>>>>>>>>>>>>> Uwe
>>>>>>>>>>>>>
>>>>>>>>>>>>> -----
>>>>>>>>>>>>> Uwe Schindler
>>>>>>>>>>>>> Achterdiek 19, D-28357 Bremen
>>>>>>>>>>>>> https://www.thetaphi.de
>>>>>>>>>>>>> eMail: [hidden email]
>>>>>>>>>>>>>
>>>>>>>>>>>>>> -----Original Message-----
>>>>>>>>>>>>>> From: [hidden email] <[hidden email]>
>>>>>>>>>>>>>> Sent: Friday, October 18, 2019 5:28 PM
>>>>>>>>>>>>>> To: [hidden email]
>>>>>>>>>>>>>> Cc: [hidden email]
>>>>>>>>>>>>>> Subject: Re: Index-time boosting: Deprecated setBoost method
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> It looks like index-time boosting (per field) is no longer possible
>>>>>>>>>>>>>> as of Lucene version 7.7.2.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I was using BoostQuery at search time for boosting before, for
>>>>>>>>>>>>>> another case, and
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> this seems to be the only boosting option now in Lucene.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Best regards
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On 10/18/19 10:01 AM, [hidden email] wrote:
>>>>>>>>>>>>>>> Hi,-
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I saw this in the Field class docs and I am trying to figure out the
>>>>>>>>>>>>>>> following note in the docs:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> setBoost(float boost)
>>>>>>>>>>>>>>> Deprecated.
>>>>>>>>>>>>>>> Index-time boosts are deprecated, please index index-time scoring
>>>>>>>>>>>>>>> factors into a doc value field and combine them with the score at
>>>>>>>>>>>>>>> query time using eg. FunctionScoreQuery.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I appreciate this note. Is there an example about this? I wish the
>>>>>>>>>>>>>>> docs would give a simple example to help further.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> https://lucene.apache.org/core/6_6_0//core/org/apache/lucene/document/Field.html
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> vs
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> https://lucene.apache.org/core/7_7_2/core/org/apache/lucene/document/Field.html
>>>>>>>>>>>>>>> Best regards
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>

RE: Index-time boosting: Deprecated setBoost method

Uwe Schindler
Hi Baris,

> That is OK, and I can see this case would be best served by BoostQuery, and
> also I don't have to use the Lucene expressions jar and its dependencies.
>
> However, I am curious how to do this kind of field-based boosting at
> index time, even though I will prefer the query-time boosting methodology.

The reason why it was deprecated is exactly the problem I mentioned before: it never did what the user expected. The boost factor given in the document's field was multiplied into the per-document norms. Unfortunately, at the same time, the query normalization was using query statistics and normalized the scores. As Lucene works per field, the same normalization is done per field, so a constant factor per field disappears. There was still some effect of index-time boosting if different documents had different values, but in your case they are all the same. I am not sure how your queries worked before, but the constant boost factors per field at index time definitely did not have the effect you were thinking of. Since the earliest versions of Lucene, boosting at query time has been the way to go to give fields different weights.
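The constant per-field factors from the old f1.setBoost(2.0f) / f2.setBoost(1.0f) code can be expressed at query time roughly as follows. This is a sketch assuming Lucene 7.5+ or 8.x; "field1", "field2" and "string1" come from the thread's example, while the "id" field and the second document are hypothetical. Each BoostQuery weights one field's clause of the BooleanQuery.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.StringField;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.BooleanClause;
import org.apache.lucene.search.BooleanQuery;
import org.apache.lucene.search.BoostQuery;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.TermQuery;
import org.apache.lucene.store.ByteBuffersDirectory;
import org.apache.lucene.store.Directory;

public class FieldWeightsAtQueryTime {

  static Document doc(String id, String field1, String field2) {
    Document d = new Document();
    d.add(new StringField("id", id, Field.Store.YES));
    // Plain fields: no setBoost(), the weights are applied at query time instead.
    d.add(new TextField("field1", field1, Field.Store.YES));
    d.add(new TextField("field2", field2, Field.Store.YES));
    return d;
  }

  /** Searches field1 and field2 for the term, weighting each field with a BoostQuery. */
  static List<String> rank(String term, float field1Boost, float field2Boost) throws IOException {
    Directory dir = new ByteBuffersDirectory();
    try (IndexWriter w = new IndexWriter(dir, new IndexWriterConfig(new StandardAnalyzer()))) {
      w.addDocument(doc("X", "string1", "other")); // term only in field1
      w.addDocument(doc("Y", "other", "string1")); // term only in field2
    }
    BooleanQuery.Builder builder = new BooleanQuery.Builder();
    builder.add(new BoostQuery(new TermQuery(new Term("field1", term)), field1Boost),
        BooleanClause.Occur.SHOULD);
    builder.add(new BoostQuery(new TermQuery(new Term("field2", term)), field2Boost),
        BooleanClause.Occur.SHOULD);
    Query query = builder.build();

    List<String> ids = new ArrayList<>();
    try (DirectoryReader reader = DirectoryReader.open(dir)) {
      IndexSearcher searcher = new IndexSearcher(reader);
      for (ScoreDoc sd : searcher.search(query, 10).scoreDocs) {
        ids.add(searcher.doc(sd.doc).get("id"));
      }
    }
    return ids;
  }

  public static void main(String[] args) throws IOException {
    System.out.println(rank("string1", 2f, 1f)); // [X, Y]
    System.out.println(rank("string1", 1f, 3f)); // [Y, X]
  }
}
```

Changing the weights only changes the query, so no reindexing is needed, which is the practical advantage over the old index-time factors.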

The new feature in Lucene is that you can now change the score per document using docvalues and apply it per document at query time. Previously this was also possible with Document/Field#setBoost, but the flexibility was missing (only multiplication, and limited precision). In addition, the normalization effects made the whole thing unreliable.
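To illustrate the flexibility point, here is a sketch in which a per-document factor is changed with IndexWriter.updateNumericDocValue without re-adding the document's text. It assumes Lucene 7.3+ or 8.x (for FunctionScoreQuery.boostByValue and ByteBuffersDirectory); the fields "id", "body" and "clicks" are hypothetical.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.NumericDocValuesField;
import org.apache.lucene.document.StringField;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.Term;
import org.apache.lucene.queries.function.FunctionScoreQuery;
import org.apache.lucene.search.DoubleValuesSource;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.TermQuery;
import org.apache.lucene.store.ByteBuffersDirectory;
import org.apache.lucene.store.Directory;

public class UpdateFactorWithoutReindexing {

  static Document doc(String id, long clicks) {
    Document d = new Document();
    d.add(new StringField("id", id, Field.Store.YES)); // unique key used for the update
    d.add(new TextField("body", "lucene", Field.Store.NO));
    d.add(new NumericDocValuesField("clicks", clicks));
    return d;
  }

  /** Ranks ids for body:lucene, multiplying each score by the "clicks" doc value. */
  static List<String> search(Directory dir) throws IOException {
    Query query = FunctionScoreQuery.boostByValue(
        new TermQuery(new Term("body", "lucene")), DoubleValuesSource.fromLongField("clicks"));
    List<String> ids = new ArrayList<>();
    try (DirectoryReader reader = DirectoryReader.open(dir)) {
      IndexSearcher searcher = new IndexSearcher(reader);
      for (ScoreDoc sd : searcher.search(query, 10).scoreDocs) {
        ids.add(searcher.doc(sd.doc).get("id"));
      }
    }
    return ids;
  }

  static List<String> rank(boolean bumpB) throws IOException {
    Directory dir = new ByteBuffersDirectory();
    try (IndexWriter w = new IndexWriter(dir, new IndexWriterConfig(new StandardAnalyzer()))) {
      w.addDocument(doc("a", 5L));
      w.addDocument(doc("b", 1L));
      w.commit();
      if (bumpB) {
        // Only the doc value of "b" changes; its text is not re-analyzed or re-added.
        w.updateNumericDocValue(new Term("id", "b"), "clicks", 50L);
      }
    } // close() commits the pending update
    return search(dir);
  }

  public static void main(String[] args) throws IOException {
    System.out.println(rank(false)); // [a, b]
    System.out.println(rank(true));  // [b, a]
  }
}
```

Multiplying by a raw count would zero the score of a document with 0 clicks; a formula such as ln(2 + clicks) in an expression avoids that.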

Uwe

> Best regards
>
>
> On 10/21/19 12:54 PM, Uwe Schindler wrote:
> > Hi,
> >
> > As I said before, that is a misuse of index-time boosting. In addition, in
> > previous versions it did not even work correctly, because query
> > normalization normalized it away anyway. And on top of that, to change it
> > you have to reindex.
> >
> > What you intend to do is a typical use case for query-time boosting with
> > BoostQuery. That is explained in almost every book about search, like those
> > about Solr or Elasticsearch.
> >
> > Most query parsers also allow you to add boost factors for fields, e.g.
> > SimpleQueryParser (for humans that need simple syntax without fields).
> > There you give a list of fields and boost factors.
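A sketch of the SimpleQueryParser variant, assuming lucene-queryparser and Lucene 7.5+ or 8.x. The weights map assigns a boost factor to each field name; the field names and sample documents are hypothetical, mirroring the field1/field2 example from this thread.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.StringField;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.queryparser.simple.SimpleQueryParser;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.store.ByteBuffersDirectory;
import org.apache.lucene.store.Directory;

public class SimpleParserFieldWeights {

  static Document doc(String id, String field1, String field2) {
    Document d = new Document();
    d.add(new StringField("id", id, Field.Store.YES));
    d.add(new TextField("field1", field1, Field.Store.NO));
    d.add(new TextField("field2", field2, Field.Store.NO));
    return d;
  }

  /** Parses user text against both fields, each with its own query-time weight. */
  static List<String> rank(String text, float w1, float w2) throws IOException {
    Directory dir = new ByteBuffersDirectory();
    try (IndexWriter w = new IndexWriter(dir, new IndexWriterConfig(new StandardAnalyzer()))) {
      w.addDocument(doc("X", "string1", "other")); // term only in field1
      w.addDocument(doc("Y", "other", "string1")); // term only in field2
    }
    Map<String, Float> weights = new HashMap<>();
    weights.put("field1", w1);
    weights.put("field2", w2);
    Query query = new SimpleQueryParser(new StandardAnalyzer(), weights).parse(text);

    List<String> ids = new ArrayList<>();
    try (DirectoryReader reader = DirectoryReader.open(dir)) {
      IndexSearcher searcher = new IndexSearcher(reader);
      for (ScoreDoc sd : searcher.search(query, 10).scoreDocs) {
        ids.add(searcher.doc(sd.doc).get("id"));
      }
    }
    return ids;
  }

  public static void main(String[] args) throws IOException {
    System.out.println(rank("string1", 2f, 1f)); // [X, Y]
  }
}
```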
> >
> > Uwe
> >
> >
> >> -----Original Message-----
> >> From: [hidden email] <[hidden email]>
> >> Sent: Monday, October 21, 2019 6:45 PM
> >> To: [hidden email]
> >> Cc: baris.kazar <[hidden email]>
> >> Subject: Re: Index-time boosting: Deprecated setBoost method
> >>
> >> Hi,-
> >>
> >> Thanks, and I appreciate the discussion.
> >>
> >> Let me ask it this way; I think I gave too much info at one time:
> >>
> >> Currently I have this:
> >>
> >>
> >> Field f1 = new TextField("field1", "string1", Field.Store.YES);
> >> doc.add(f1);
> >> f1.setBoost(2.0f);
> >>
> >> Field f2 = new TextField("field2", "string2", Field.Store.YES);
> >> doc.add(f2);
> >> f2.setBoost(1.0f);
> >>
> >>
> >> But this fails with Lucene 7.7.2.
> >>
> >>
> >> Probably it is more efficient and more flexible to fix this by using
> >> BoostQuery.
> >>
> >> However, what would the fix be with index-time boosting? The code in my
> >> previous post was trying to do that.
> >>
> >> Best regards
> >>
> >>
> >> On 10/21/19 12:34 PM, Uwe Schindler wrote:
> >>> Hi,
> >>>
> >>> sorry, I don't fully understand what you intend to do. If the boost values
> >>> per field are static and used with exactly the same value for every
> >>> document, they are not needed at index time. You can just boost the field
> >>> on the query side (e.g. using BoostQuery). Boosting every document with the
> >>> same static values is an anti-pattern; that's something better suited for
> >>> the query side, as you are more flexible there.
> >>> If you need a different boost value per document, you can save that boost
> >>> value in the index per document using a docvalues field (this consumes
> >>> extra space, of course). Then you need an expression-based
> >>> FunctionScoreQuery on the query side. But just because it looks like
> >>> Javascript, it's not slow. The syntax is compiled to bytecode and directly
> >>> included into the query execution as a dynamic Java class, so it's very
> >>> fast.
> >>> In short:
> >>> - If you need a different boost factor per field name that's constant
> >>> for all documents, apply it at query time with BoostQuery.
> >>> - If you have to boost specific documents (e.g., top-selling products),
> >>> index a numeric docvalues field per document. On the query side you can
> >>> use different query types to modify the score of each result based on the
> >>> docvalues field. That can be done with the expressions module (using
> >>> compiled Javascript) or by another query in Lucene that operates on
> >>> ValueSource (e.g., FunctionQuery). The first one is easier to use for
> >>> complex formulas.
> >>> Uwe
> >>>
> >>>
> >>>> -----Original Message-----
> >>>> From: [hidden email] <[hidden email]>
> >>>> Sent: Monday, October 21, 2019 5:17 PM
> >>>> To: [hidden email]
> >>>> Cc: baris.kazar <[hidden email]>
> >>>> Subject: Re: Index-time boosting: Deprecated setBoost method
> >>>>
> >>>> Hi,-
> >>>>
> >>>> Sorry about the missing parts in the previous post. Please accept my
> >>>> apologies for that.
> >>>>
> >>>> I needed to add a few more questions/corrections/additions to the
> >>>> previous post:
> >>>>
> >>>> The main question was: if the boost is a single constant value, do we
> >>>> need the Javascript part below?
> >>>>
> >>>>
> >>>>
> >>>> === Indexing code snippet for Lucene version 6.6.0 and before ===
> >>>>
> >>>> Document doc = new Document();
> >>>>
> >>>> Field f1 = new TextField("field1", "string1", Field.Store.YES);
> >>>> doc.add(f1);
> >>>> f1.setBoost(2.0f);
> >>>>
> >>>> Field f2 = new TextField("field2", "string2", Field.Store.YES);
> >>>> doc.add(f2);
> >>>> f2.setBoost(1.0f);
> >>>>
> >>>> === end of indexing code snippet for Lucene version 6.6.0 and before ===
> >>>>
> >>>>
> >>>> This turns into the following, where the _boost1 field is associated
> >>>> with field1 and the _boost2 field is associated with field2:
> >>>>
> >>>>
> >>>> In the indexing code:
> >>>>
> >>>> === beginning of indexing code snippet ===
> >>>> Field f1 = new TextField("field1", "string1", Field.Store.YES);
> >>>> doc.add(f1);
> >>>>
> >>>> // The doc values field name must match the name bound in SimpleBindings
> >>>> Field _boost1 = new NumericDocValuesField("_boost1", 2L);
> >>>> doc.add(_boost1);
> >>>>
> >>>> // If this boost value needs to be stored, a separate StoredField
> >>>> // instance needs to be added as well
> >>>> … (I will post this soon)
> >>>>
> >>>> Field f2 = new TextField("field2", "string2", Field.Store.YES);
> >>>> doc.add(f2);
> >>>>
> >>>> Field _boost2 = new NumericDocValuesField("_boost2", 1L);
> >>>> doc.add(_boost2);
> >>>>
> >>>> // If this boost value needs to be stored, a separate StoredField
> >>>> // instance needs to be added as well
> >>>> … (I will post this soon)
> >>>>
> >>>> === end of indexing code snippet ===
> >>>>
> >>>>
> >>>> Now, in the searching code (i.e., at query time), do I need the
> >>>> FunctionScoreQuery, given that in this case the boost is just a
> >>>> constant value and not a function? However, a constant value can be
> >>>> argued to be a function that returns the same value all the time, too.
> >>>>
> >>>>
> >>>> === beginning of query time code snippet ===
> >>>> Expression expr = JavascriptCompiler.compile("_boost1 + _boost2");
> >>>>
> >>>> // SimpleBindings just maps variables to SortField instances
> >>>> SimpleBindings bindings = new SimpleBindings();
> >>>>
> >>>> // These have to be LONG type, I think, since NumericDocValuesField
> >>>> // accepts "long" values only, am I right? Can this be DOUBLE type?
> >>>> bindings.add(new SortField("_boost1", SortField.Type.LONG));
> >>>>
> >>>> // same question here
> >>>> bindings.add(new SortField("_boost2", SortField.Type.LONG));
> >>>>
> >>>> // create a query that matches based on field1:term_to_look_for but
> >>>> // scores using expr
> >>>> Query query = new FunctionScoreQuery(
> >>>>     new TermQuery(new Term("field1", "term_to_look_for")),
> >>>>     expr.getDoubleValuesSource(bindings));
> >>>>
> >>>> searcher.search(query, 10);
> >>>>
> >>>> === end of code snippet ===
> >>>>
> >>>>
> >>>> Best regards
> >>>>
> >>>>
> >>>> On 10/21/19 11:05 AM, [hidden email] wrote:
> >>>>> Hi,-
> >>>>>
> >>>>> I would like to ask the following to make it clearer (for me at least):
> >>>>>
> >>>>> Document doc = new Document();
> >>>>>
> >>>>> Field f1 = new TextField("field1", "string1", Field.Store.YES);
> >>>>> doc.add(f1);
> >>>>> f1.setBoost(2.0f);
> >>>>>
> >>>>> Field f2 = new TextField("field2", "string2", Field.Store.YES);
> >>>>> doc.add(f2);
> >>>>> f2.setBoost(1.0f);
> >>>>>
> >>>>>
> >>>>> This turns into the following, where the _boost1 field is associated
> >>>>> with field1 and the _boost2 field is associated with field2:
> >>>>>
> >>>>>
> >>>>> In the indexing code:
> >>>>>
> >>>>> Field f1 = new TextField("field1", "string1", Field.Store.YES);
> >>>>> doc.add(f1);
> >>>>>
> >>>>> Field _boost1 = new NumericDocValuesField("_boost1", 2L);
> >>>>> doc.add(_boost1);
> >>>>>
> >>>>> // If this boost value needs to be stored, a separate StoredField
> >>>>> // instance needs to be added as well
> >>>>> … (I will post this soon)
> >>>>>
> >>>>> Field f2 = new TextField("field2", "string2", Field.Store.YES);
> >>>>> doc.add(f2);
> >>>>>
> >>>>> Field _boost2 = new NumericDocValuesField("_boost2", 1L);
> >>>>> doc.add(_boost2);
> >>>>>
> >>>>> // If this boost value needs to be stored, a separate StoredField
> >>>>> // instance needs to be added as well
> >>>>> … (I will post this soon)
> >>>>>
> >>>>>
> >>>>> Now, in the searching code (i.e., at query time), do I need the
> >>>>> FunctionScoreQuery, given that in this case the boost is just a
> >>>>> constant value and not a function? However, a constant value can be
> >>>>> argued to be a function that returns the same value all the time, too.
> >>>>>
> >>>>>
> >>>>> Expression expr = JavascriptCompiler.compile("_boost1");
> >>>>>
> >>>>> // SimpleBindings just maps variables to SortField instances;
> >>>>> // _boost1 is read from the NumericDocValuesField of the same name
> >>>>> SimpleBindings bindings = new SimpleBindings();
> >>>>> bindings.add(new SortField("_boost1", SortField.Type.LONG));
> >>>>>
> >>>>> // create a query that matches based on field1:term_to_look_for but
> >>>>> // scores using expr
> >>>>> Query query = new FunctionScoreQuery(
> >>>>>     new TermQuery(new Term("field1", "term_to_look_for")),
> >>>>>     expr.getDoubleValuesSource(bindings));
> >>>>>
> >>>>> searcher.search(query, 10);
> >>>>>
> >>>>>
> >>>>> So, if boost is a single constant value, do we need the Javascript
> >>>>> part above?
> >>>>>
> >>>>> Best regards
> >>>>>
> >>>>>
> >>>>> On 10/18/19 4:07 PM, [hidden email] wrote:
> >>>>>> Uwe,-
> >>>>>>
> >>>>>>    can this
> >>>>>> https://lucene.apache.org/core/7_7_2/expressions/org/apache/lucene/expressions/Expression.html
> >>>>>> doc example that you also gave be extended with the NumericDocValuesField
> >>>>>> part that is needed for index-time boosting, too?
> >>>>>>
> >>>>>> I see now why you said that this is a mixed type of boosting (i.e.,
> >>>>>> both index time and search time).
> >>>>>>
> >>>>>> Then I need to include the query from this example, on this _score
> >>>>>> field (I would call it the _boost field in my case), in my overall
> >>>>>> BooleanQuery.
> >>>>>>
> >>>>>> I will now try to combine these and post the result here for future reference.
> >>>>>>
> >>>>>> Best regards
> >>>>>>
> >>>>>>
> >>>>>> On 10/18/19 3:18 PM, Uwe Schindler wrote:
> >>>>>>> Hi,
> >>>>>>>
> >>>>>>> Read my original email! The index time values are written using
> >>>>>>> NumericDocValuesField. The expressions docs also refer to that when
> >>>>>>> the bindings are documented.
> >>>>>>>
> >>>>>>> It's separate from the indexed data (TextField). Think of it like an
> >>>>>>> additional numeric field in your database table with a factor in
> >>>>>>> each row.
> >>>>>>>
> >>>>>>> Uwe
> >>>>>>>
> >>>>>>> On October 18, 2019 7:14:03 PM UTC, [hidden email] wrote:
> >>>>>>>> Uwe,-
> >>>>>>>>
> >>>>>>>> Two questions there:
> >>>>>>>>
> >>>>>>>> I guess this is applicable to TextField, too.
> >>>>>>>>
> >>>>>>>> And I was expecting an IndexWriter object in the example for
> >>>>>>>> index-time boosting.
> >>>>>>>>
> >>>>>>>> Best regards
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> On 10/18/19 2:57 PM, Uwe Schindler wrote:
> >>>>>>>>> Sorry, I was imprecise. It's a mix of both. The factors are stored per
> >>>>>>>>> document in the index (this is why I called it index time). At query
> >>>>>>>>> time, the expression uses the index-time values and folds them into
> >>>>>>>>> the query score.
> >>>>>>>>> What's your problem with that approach?
> >>>>>>>>>
> >>>>>>>>> Uwe
> >>>>>>>>>
> >>>>>>>>> On October 18, 2019 6:50:40 PM UTC, [hidden email] wrote:
> >>>>>>>>>> Uwe,-
> >>>>>>>>>>
> >>>>>>>>>>      Thanks. If possible, I am looking for a pure Java methodology
> >>>>>>>>>> to do the index-time boosting.
> >>>>>>>>>>
> >>>>>>>>>> This example looks like a search-time boosting example:
> >>>>>>>>>>
> >>>>>>>>>> https://lucene.apache.org/core/7_7_2/expressions/org/apache/lucene/expressions/Expression.html
> >>>>>>>>>> Best regards
> >>>>>>>>>>
> >>>>>>>>>> On 10/18/19 2:31 PM, Uwe Schindler wrote:
> >>>>>>>>>>> Hi,
> >>>>>>>>>>>
> >>>>>>>>>>>> Is there a working example for this? Is this mentioned in the Lucene
> >>>>>>>>>>>> Javadocs or any other docs so that I can look at it?
> >>>>>>>>>>> To index the docvalues, see NumericDocValuesField (it can be added to
> >>>>>>>>>>> documents like indexed or stored fields). You may have used them for
> >>>>>>>>>>> sorting already.
> >>>>>>>>>>>> This methodology seems to sort of discourage using index-time
> >>>>>>>>>>>> boosting.
> >>>>>>>>>>> Not really. Many use this all the time. It's one of the killer
> >>>>>>>>>>> features of both Solr and Elasticsearch. The problem was how
> >>>>>>>>>>> Document.setBoost() worked (it did not work correctly, see below).
> >>>>>>>>>>>> The previous setBoost method call was fine and easy to use.
> >>>>>>>>>>>> Did it have some performance issues, and is that why it was
> >>>>>>>>>>>> deprecated?
> >>>>>>>>>>> No, it was deprecated for several reasons: setBoost was not doing
> >>>>>>>>>>> what the user expected. Internally the boost value was just
> >>>>>>>>>>> multiplied into the document norm factor (which is internally also
> >>>>>>>>>>> a docvalues field). The norm factors are only very imprecise floats
> >>>>>>>>>>> stored in a byte, so precision is poor. If you put some values into
> >>>>>>>>>>> it and the length norm was already consuming all bits, the boosting
> >>>>>>>>>>> was very coarse. It was also only multiplied in, while most users
> >>>>>>>>>>> want to do things like record click counts in the index and then
> >>>>>>>>>>> boost, for example, with the logarithm or some other function. If
> >>>>>>>>>>> the boost is just multiplied into the length norm, you have no
> >>>>>>>>>>> flexibility at all.
> >>>>>>>>>>> In addition, you can have several docvalues fields and use their
> >>>>>>>>>>> values in a function (e.g. one field with the click count and
> >>>>>>>>>>> another with the product price). You can then combine click count
> >>>>>>>>>>> and price (which can be modified independently during index
> >>>>>>>>>>> updates) so that products with a lower price and a higher click
> >>>>>>>>>>> count are boosted up.
> >>>>>>>>>>> This is what you can do with the expressions module. You just give
> >>>>>>>>>>> it a function.
> >>>>>>>>>>> Here is an example; the second example uses a FunctionScoreQuery
> >>>>>>>>>>> that modifies the score based on the function and the given
> >>>>>>>>>>> docvalues:
> >>>>>>>>>>> https://lucene.apache.org/core/7_7_2/expressions/org/apache/lucene/expressions/Expression.html
> >>>>>>>>>>>> An example of FunctionScoreQuery usage with MultiFieldQueryParser
> >>>>>>>>>>>> would also be nice, since MultiFieldQueryParser already accepts
> >>>>>>>>>>>> per-field boosts in its constructor.
> >>>>>>>>>>> The boosts in the query parser are applied to fields at query time
> >>>>>>>>>>> (to have a different weight per field). Index-time boosting is per
> >>>>>>>>>>> document. So you can combine both.
> >>>>>>>>>>>> Maybe it is not needed with MultiFieldQueryParser.
> >>>>>>>>>>> You use MultiFieldQueryParser to adjust the weights of the fields
> >>>>>>>>>>> (e.g. title versus body). The parsed query is then wrapped with an
> >>>>>>>>>>> expression that modifies the score per document according to the
> >>>>>>>>>>> docvalues.
> >>>>>>>>>>> Uwe
> >>>>>>>>>>>
> >>>>>>>>>>>> On 10/18/19 1:28 PM, Uwe Schindler wrote:
> >>>>>>>>>>>>
> >>>>>>>>>>>>> Hi,
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> that's not true. You can do index-time boosting, but you need to do
> >>>>>>>>>>>>> that using a separate field. You just index a numeric docvalues
> >>>>>>>>>>>>> field (which may contain a long or float value per document). Later
> >>>>>>>>>>>>> you wrap your query with some FunctionScoreQuery (e.g., use the
> >>>>>>>>>>>>> Javascript function query syntax in the expressions module). This
> >>>>>>>>>>>>> allows you to compile a Javascript function that calculates the
> >>>>>>>>>>>>> final score based on the score returned by the inner query and
> >>>>>>>>>>>>> combines it with docvalues that were indexed per document.
> >>>>>>>>>>>>> Uwe
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>> -----Original Message-----
> >>>>>>>>>>>>>> From: [hidden email] <[hidden email]>
> >>>>>>>>>>>>>> Sent: Friday, October 18, 2019 5:28 PM
> >>>>>>>>>>>>>> To: [hidden email]
> >>>>>>>>>>>>>> Cc: [hidden email]
> >>>>>>>>>>>>>> Subject: Re: Index-time boosting: Deprecated setBoost method
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> It looks like index-time boosting (per field) is no longer possible
> >>>>>>>>>>>>>> as of Lucene version 7.7.2.
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> I was using BoostQuery at search time for boosting before, for
> >>>>>>>>>>>>>> another case, and
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> this seems to be the only boosting option now in Lucene.
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> Best regards
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> On 10/18/19 10:01 AM, [hidden email] wrote:
> >>>>>>>>>>>>>>> Hi,-
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> i saw this in the Field class docs and i am figuring out the
> >>>>>>>>>> following
> >>>>>>>>>>>>>>> note in the docs:
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> setBoost(float boost)
> >>>>>>>>>>>>>>> Deprecated.
> >>>>>>>>>>>>>>> Index-time boosts are deprecated, please index index-
> time
> >>>>>>>> scoring
> >>>>>>>>>>>>>>> factors into a doc value field and combine them with the
> >> score
> >>>>>>>> at
> >>>>>>>>>>>>>>> query time using eg. FunctionScoreQuery.
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> I appreciate this note. Is there an example about this? I
> wish
> >>>>>>>>>> docs
> >>>>>>>>>>>>>>> would give a simple example to further help.
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>> https://urldefense.proofpoint.com/v2/url?u=https-
> >>>>>>>>>>>> 3A__lucene.apache.org_core_6-5F6-
> >>>>>>>>>>>>
> >>
> 5F0__core_org_apache_lucene_document_&d=DwIFaQ&c=RoP1YumCXCga
> >>>>>>>>>>>> WHvlZYR8PZh8Bv7qIrMUB65eapI_JnE&r=nlG5z5NcNdIbQAiX-
> >>>>>>>>>>>>
> >>
> BKNeyLlULCbaezrgocEvPhQkl4&m=6rVk8db2H8dAcjS3WCWmAPd08C7JQCvZ
> >>>>
> 8W80yE9L5xY&s=rIVbw3_TGEwpaet5ibCeYze6vSDUiPhwOzlV0z484fM&e=
> >>>>>>>>>>>>>> Field.html
> >>>>>>>>>>>>>>> vs
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>> https://urldefense.proofpoint.com/v2/url?u=https-
> >>>>>>>>>>>> 3A__lucene.apache.org_core_7-5F7-
> >>>>>>>>>>>>
> >>
> 5F2_core_org_apache_lucene_document_F&d=DwIFaQ&c=RoP1YumCXCgaW
> >>>>>>>>>>>> HvlZYR8PZh8Bv7qIrMUB65eapI_JnE&r=nlG5z5NcNdIbQAiX-
> >>>>>>>>>>>>
> >>
> BKNeyLlULCbaezrgocEvPhQkl4&m=6rVk8db2H8dAcjS3WCWmAPd08C7JQCvZ
> >>
> 8W80yE9L5xY&s=yt1toHHZQBqd3qKpWeSzywGJhy928Q5qaEO4v9Lj3vg&e=
> >>>>>>>>>>>>>> ield.html
> >>>>>>>>>>>>>>> Best regards
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>
> >>>>>>>> ---------------------------------------------------------------------
> >>>>>>>>>>>>>> To unsubscribe, e-mail: java-user-
> >>>> [hidden email]
> >>>>>>>>>>>>>> For additional commands, e-mail:
> >>>>>>>> [hidden email]
> >>>>>>>> ---------------------------------------------------------------------
> >>>>>>>>>>>>> To unsubscribe, e-mail: java-user-
> >>>> [hidden email]
> >>>>>>>>>>>>> For additional commands, e-mail: java-user-
> >>>> [hidden email]
> >>>>>>>> ---------------------------------------------------------------------
> >>>>>>>>>>>> To unsubscribe, e-mail: java-user-
> >> [hidden email]
> >>>>>>>>>>>> For additional commands, e-mail: java-user-
> >>>> [hidden email]
> >>>>>>>> ---------------------------------------------------------------------
> >>>>>>>>>>> To unsubscribe, e-mail: java-user-
> >> [hidden email]
> >>>>>>>>>>> For additional commands, e-mail: java-user-
> >>>> [hidden email]
> >>>>>>>> ---------------------------------------------------------------------
> >>>>>>>>>> To unsubscribe, e-mail: java-user-
> [hidden email]
> >>>>>>>>>> For additional commands, e-mail: java-user-
> >> [hidden email]
> >>>>>>>>> --
> >>>>>>>>> Uwe Schindler
> >>>>>>>>> Achterdiek 19, 28357 Bremen
> >>>>>>>>>
> >>>>>>>> https://urldefense.proofpoint.com/v2/url?u=https-
> >>
> 3A__www.thetaphi.de&d=DwIFaQ&c=RoP1YumCXCgaWHvlZYR8PZh8Bv7qIr
> >>>> MUB65eapI_JnE&r=nlG5z5NcNdIbQAiX-
> >>>>
> >>
> BKNeyLlULCbaezrgocEvPhQkl4&m=6ksT9ArMj83Yxf_GrxLNeJ4UFEeKdVdLK0Bl
> >>>> OT0d754&s=33f2nq9rOLI5pN9e_RYl_TiEKnP_f4WMZ__vqyz2bzo&e=
> >>>>>>>> ---------------------------------------------------------------------
> >>>>>>>> To unsubscribe, e-mail: [hidden email]
> >>>>>>>> For additional commands, e-mail: java-user-
> [hidden email]
> >>>>>>> --
> >>>>>>> Uwe Schindler
> >>>>>>> Achterdiek 19, 28357 Bremen
> >>>>>>> https://urldefense.proofpoint.com/v2/url?u=https-
> >>
> 3A__www.thetaphi.de&d=DwIFaQ&c=RoP1YumCXCgaWHvlZYR8PZh8Bv7qIr
> >>>> MUB65eapI_JnE&r=nlG5z5NcNdIbQAiX-
> >>>>
> >>
> BKNeyLlULCbaezrgocEvPhQkl4&m=owjI40OeLzt8gvPN44aTdndoiUel5E9Hqx1T
> >>>> EcoWk_Y&s=xbZedNkQXb5eQcw_K7lCOP7b5ToKJVZ1dCPY3hi836c&e=
> >>>>> ---------------------------------------------------------------------
> >>>>> To unsubscribe, e-mail: [hidden email]
> >>>>> For additional commands, e-mail: [hidden email]
> >>>>>
> >>>> ---------------------------------------------------------------------
> >>>> To unsubscribe, e-mail: [hidden email]
> >>>> For additional commands, e-mail: [hidden email]
> >>> ---------------------------------------------------------------------
> >>> To unsubscribe, e-mail: [hidden email]
> >>> For additional commands, e-mail: [hidden email]
> >>>
> >> ---------------------------------------------------------------------
> >> To unsubscribe, e-mail: [hidden email]
> >> For additional commands, e-mail: [hidden email]
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: [hidden email]
> > For additional commands, e-mail: [hidden email]
> >
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [hidden email]
> For additional commands, e-mail: [hidden email]


Re: Index-time boosting: Deprecated setBoost method

baris.kazar
Hi,-

Thanks.

Let's apply this to the following case:

QueryParser parser = new QueryParser("field1", analyzer);
parser.setPhraseSlop(2);
Query query = parser.parse("some string value here" + "*");
TopDocs hits = indexsearcherObject.search(query, 10);

Now I want to use BoostQuery:

QueryParser parser = new QueryParser("field1", analyzer);
parser.setPhraseSlop(2);
Query query = parser.parse("some string value here" + "*");

// BoostQuery takes the boost as a float, not as the String "2.0f"
BoostQuery bq = new BoostQuery(query, 2.0f);

TopDocs hits = indexsearcherObject.search(bq, 10);


Now how do I handle field2 with a boost value of 1.0f?

Before, this was done at index time.


The only way I can see is a BooleanQuery that combines the first BoostQuery (bq) with a second one (bq2) that I need to define for field2.

Is there any other way?

Best regards
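The BooleanQuery combination described in this message can be sketched as one SHOULD clause per field, each parsed from the same input, with field1 wrapped in a `BoostQuery` of 2.0 and field2 left at the default weight of 1.0. This is a minimal sketch, assuming Lucene 7.7.x with lucene-core and lucene-queryparser; the class and method names are made up for illustration, and the in-memory index holds only the single document from the earlier posts, indexed without any `setBoost` calls.

```java
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.queryparser.classic.ParseException;
import org.apache.lucene.queryparser.classic.QueryParser;
import org.apache.lucene.search.BooleanClause;
import org.apache.lucene.search.BooleanQuery;
import org.apache.lucene.search.BoostQuery;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.RAMDirectory;

public class PerFieldBoostQuery {

  // Parses the same user input against one field, with the settings from the post.
  static Query parseFor(String field, String input, Analyzer analyzer) throws ParseException {
    QueryParser parser = new QueryParser(field, analyzer);
    parser.setPhraseSlop(2);
    return parser.parse(input);
  }

  // Query-time replacement for f1.setBoost(2.0f) / f2.setBoost(1.0f).
  static Query build(String input, Analyzer analyzer) throws ParseException {
    return new BooleanQuery.Builder()
        .add(new BoostQuery(parseFor("field1", input, analyzer), 2.0f), BooleanClause.Occur.SHOULD)
        // A weight of 1.0 is the default, so field2 needs no BoostQuery wrapper.
        .add(parseFor("field2", input, analyzer), BooleanClause.Occur.SHOULD)
        .build();
  }

  // Indexes the document from the post and returns the number of hits for the input.
  static int countHits(String input) throws Exception {
    Analyzer analyzer = new StandardAnalyzer();
    Directory dir = new RAMDirectory();
    try (IndexWriter w = new IndexWriter(dir, new IndexWriterConfig(analyzer))) {
      Document doc = new Document();
      doc.add(new TextField("field1", "string1", Field.Store.YES));
      doc.add(new TextField("field2", "string2", Field.Store.YES));
      w.addDocument(doc);
    }
    try (DirectoryReader reader = DirectoryReader.open(dir)) {
      return new IndexSearcher(reader).search(build(input, analyzer), 10).scoreDocs.length;
    }
  }

  public static void main(String[] args) throws Exception {
    System.out.println(countHits("string1")); // matches through field1
    System.out.println(countHits("string2")); // matches through field2
  }
}
```

Since Lucene 7 dropped the coordination factor, the scores of the matching SHOULD clauses are simply added, so a match in field1 contributes twice what the same match would contribute without the boost. Changing the weights then only means changing the query, not reindexing.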



On 10/21/19 2:33 PM, Uwe Schindler wrote:

> Hi Boris,
>
>> That is OK, and I can see this case would be best served by BoostQuery;
>> also, I don't have to use the lucene-expressions jar and its dependencies.
>>
>> However, I am curious how to do this kind of field-based boosting at
>> index time, even though I will prefer the query-time boosting approach.
> The reason it was deprecated is exactly the problem I mentioned before: it never did what users expected. The boost factor given in the document's field was multiplied into the per-document norms. Unfortunately, at the same time, the query normalization used query statistics and normalized the scores. As Lucene works per field, the same normalization is done per field, so the constant per-field factor disappears. Index-time boosting still had some effect if different documents had different values, but in your case they are all the same. I am not sure how your queries worked before, but constant per-field boost factors at index time definitely did not have the effect you were expecting. Since the earliest versions of Lucene, boosting at query time has been the way to give fields different weights.
>
> The new feature in Lucene is that you can change the score per document using docvalues and apply that per document at query time. Previously this was also possible with Document/Field#setBoost, but it lacked flexibility (only multiplication, and limited precision). In addition, the normalization effects made the whole thing unreliable.
>
> Uwe
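For the per-document case described in the quoted reply (a factor stored in docvalues and applied at query time), a minimal sketch that avoids the expressions module: the factor is indexed in a `FloatDocValuesField` and multiplied into the score with `FunctionScoreQuery.boostByValue`. It assumes Lucene 7.7.x, where `boostByValue` is available, with lucene-core and lucene-queries on the classpath; `docBoost`, `body` and `id` are hypothetical field names.

```java
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.FloatDocValuesField;
import org.apache.lucene.document.StringField;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.Term;
import org.apache.lucene.queries.function.FunctionScoreQuery;
import org.apache.lucene.search.DoubleValuesSource;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.TermQuery;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.RAMDirectory;

public class PerDocumentBoost {

  static void add(IndexWriter w, String id, float docBoost) throws Exception {
    Document doc = new Document();
    doc.add(new StringField("id", id, Field.Store.YES));
    doc.add(new TextField("body", "same text in every document", Field.Store.NO));
    // Instead of doc/field setBoost(): a full float kept in its own doc-values field,
    // separate from the norms.
    doc.add(new FloatDocValuesField("docBoost", docBoost));
    w.addDocument(doc);
  }

  // Returns document ids in score order; each score is the TermQuery score times docBoost.
  static String[] rankedIds() throws Exception {
    Directory dir = new RAMDirectory();
    try (IndexWriter w = new IndexWriter(dir, new IndexWriterConfig(new StandardAnalyzer()))) {
      add(w, "normal", 1.0f);
      add(w, "promoted", 3.0f);
    }
    Query base = new TermQuery(new Term("body", "text"));
    Query boosted =
        FunctionScoreQuery.boostByValue(base, DoubleValuesSource.fromFloatField("docBoost"));
    try (DirectoryReader reader = DirectoryReader.open(dir)) {
      IndexSearcher searcher = new IndexSearcher(reader);
      ScoreDoc[] hits = searcher.search(boosted, 10).scoreDocs;
      String[] ids = new String[hits.length];
      for (int i = 0; i < hits.length; i++) {
        ids[i] = searcher.doc(hits[i].doc).get("id");
      }
      return ids;
    }
  }

  public static void main(String[] args) throws Exception {
    System.out.println(String.join(", ", rankedIds()));
  }
}
```

Both documents contain the same text, so the 3.0 factor alone should put `promoted` ahead of `normal`. Unlike the old setBoost, the value does not pass through the limited-precision norm, and it can later be changed with `IndexWriter.updateNumericDocValue` without reindexing the text (for a `FloatDocValuesField`, the long to pass is `Float.floatToRawIntBits(newBoost)`).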

---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

Reply | Threaded
Open this post in threaded view
|

Re: Index-time boosting: Deprecated setBoost method

Uwe Schindler
No. That's how you do it: a BooleanQuery with two SHOULD clauses.

Or use a different query parser that offers this out of the box.
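A minimal sketch of the two-SHOULD-clause approach, assuming Lucene 7.7.x or later (ByteBuffersDirectory appeared in 7.5) with lucene-core and lucene-queryparser on the classpath. The field names field1/field2 and the 2.0f/1.0f factors come from the question quoted below; the sample documents and the search text "lucene" are made up for illustration. Each field gets its own parsed query wrapped in a BoostQuery, which takes the place of the removed Field.setBoost calls:

```java
import java.io.IOException;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.queryparser.classic.ParseException;
import org.apache.lucene.queryparser.classic.QueryParser;
import org.apache.lucene.search.BooleanClause;
import org.apache.lucene.search.BooleanQuery;
import org.apache.lucene.search.BoostQuery;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.store.ByteBuffersDirectory;
import org.apache.lucene.store.Directory;

public class FieldBoostExample {

  /** Query-time replacement for f1.setBoost(2.0f) and f2.setBoost(1.0f). */
  static Query fieldBoostedQuery(Analyzer analyzer, String text) throws ParseException {
    Query q1 = new QueryParser("field1", analyzer).parse(text);
    Query q2 = new QueryParser("field2", analyzer).parse(text);
    return new BooleanQuery.Builder()
        .add(new BoostQuery(q1, 2.0f), BooleanClause.Occur.SHOULD)
        .add(new BoostQuery(q2, 1.0f), BooleanClause.Occur.SHOULD)
        .build();
  }

  static Document doc(String v1, String v2) {
    Document d = new Document();
    d.add(new TextField("field1", v1, Field.Store.YES));
    d.add(new TextField("field2", v2, Field.Store.YES));
    return d;
  }

  /** Indexes two sample documents and searches both fields for "lucene". */
  static TopDocs run() throws IOException, ParseException {
    Analyzer analyzer = new StandardAnalyzer();
    try (Directory dir = new ByteBuffersDirectory()) {
      try (IndexWriter writer = new IndexWriter(dir, new IndexWriterConfig(analyzer))) {
        writer.addDocument(doc("lucene search", "other text")); // doc 0: hit in field1
        writer.addDocument(doc("other text", "lucene search")); // doc 1: hit in field2
      }
      try (DirectoryReader reader = DirectoryReader.open(dir)) {
        return new IndexSearcher(reader).search(fieldBoostedQuery(analyzer, "lucene"), 10);
      }
    }
  }

  public static void main(String[] args) throws Exception {
    TopDocs hits = run();
    System.out.println("top doc: " + hits.scoreDocs[0].doc
        + ", score ratio: " + hits.scoreDocs[0].score / hits.scoreDocs[1].score);
  }
}
```

Because both sample documents have the same field lengths, the document that matches in field1 should score about twice as high as the one that matches in field2.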

Uwe

On October 21, 2019 7:16:01 PM UTC, [hidden email] wrote:

>Hi,-
>
>Thanks.
>
>Let's apply this to the following case:
>
>QueryParser parser = new QueryParser("field1", analyzer);
>parser.setPhraseSlop(2);
>Query query = parser.parse("some string value here" + "*");
>TopDocs hits = indexsearcherObject.search(query, 10);
>
>Now I want to use BoostQuery:
>
>QueryParser parser = new QueryParser("field1", analyzer);
>parser.setPhraseSlop(2);
>Query query = parser.parse("some string value here" + "*");
>
>BoostQuery bq = new BoostQuery(query, 2.0f);
>
>TopDocs hits = indexsearcherObject.search(bq, 10);
>
>
>Now how do I handle field2 with boost value 1.0f?
>
>Before, this was being done at index time.
>
>
>The only way I can see here is a BooleanQuery that combines the first
>BoostQuery object bq with another one, bq2, that I need to define for
>field2.
>
>Is there any other way?
>
>Best regards
>
>
>
>On 10/21/19 2:33 PM, Uwe Schindler wrote:
>> Hi Baris,
>>
>>> That is OK, and I can see this case would be best served by BoostQuery,
>>> and I also don't have to use the Lucene expressions jar and its
>>> dependencies.
>>>
>>> However, I am curious how to do this kind of field-based boosting at
>>> index time, even though I will prefer the query-time boosting
>>> methodology.
>> The reason why it was deprecated is exactly the problem I mentioned
>> before: it never did what the user expected. The boost factor given in
>> the document's field was multiplied into the per-document norms.
>> Unfortunately, at the same time, the query normalization was using query
>> statistics and normalized the scores. As Lucene works per field, the
>> same normalization is done per field, so the constant factor per field
>> disappears. There was still some effect of index-time boosting if
>> different documents had different values, but in your case they are all
>> the same. I am not sure how your queries worked before, but constant
>> per-field boost factors at index time definitely did not have the effect
>> you were thinking of. Since the earliest versions of Lucene, boosting at
>> query time has been the way to give fields different weights.
>>
>> What is new in Lucene is that you can change the score per document
>> using docvalues and apply it per document at query time. Previously this
>> was also possible with Document/Field#setBoost, but the flexibility was
>> missing (only multiplication, with limited precision). In addition, the
>> normalization effects made the whole thing unreliable.
>>
>> Uwe
>>
>>> Best regards
>>>
>>>
>>> On 10/21/19 12:54 PM, Uwe Schindler wrote:
>>>> Hi,
>>>>
>>>> As I said before, that is a misuse of index-time boosting. In addition,
>>>> in previous versions it did not even work correctly, because query
>>>> normalization normalized it away anyway. And on top of that, to change
>>>> it you have to reindex.
>>>> What you intend to do is a typical use case for query-time boosting
>>>> with BoostQuery. That is explained in almost every book about search,
>>>> like those about Solr or Elasticsearch.
>>>> Most query parsers also allow adding boost factors for fields, e.g.
>>>> SimpleQueryParser (for humans who need a simple syntax without fields).
>>>> There you give a list of fields and boost factors.
>>>> Uwe
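A minimal sketch of the SimpleQueryParser variant, assuming Lucene 7.7.x or later with lucene-queryparser on the classpath. The weights map plays the role of the old per-field setBoost values; field1/field2 and the 2.0f/1.0f factors mirror the question in this thread, while the sample documents are made up:

```java
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.queryparser.simple.SimpleQueryParser;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.store.ByteBuffersDirectory;
import org.apache.lucene.store.Directory;

public class SimpleParserWeightsExample {

  /** The field weights live in the parser, not in the index. */
  static Query weightedQuery(Analyzer analyzer, String userInput) {
    Map<String, Float> weights = new HashMap<>();
    weights.put("field1", 2.0f); // formerly f1.setBoost(2.0f)
    weights.put("field2", 1.0f); // formerly f2.setBoost(1.0f)
    return new SimpleQueryParser(analyzer, weights).parse(userInput);
  }

  static Document doc(String v1, String v2) {
    Document d = new Document();
    d.add(new TextField("field1", v1, Field.Store.YES));
    d.add(new TextField("field2", v2, Field.Store.YES));
    return d;
  }

  static TopDocs run() throws IOException {
    Analyzer analyzer = new StandardAnalyzer();
    try (Directory dir = new ByteBuffersDirectory()) {
      try (IndexWriter writer = new IndexWriter(dir, new IndexWriterConfig(analyzer))) {
        writer.addDocument(doc("lucene search", "other text")); // doc 0: hit in field1
        writer.addDocument(doc("other text", "lucene search")); // doc 1: hit in field2
      }
      try (DirectoryReader reader = DirectoryReader.open(dir)) {
        return new IndexSearcher(reader).search(weightedQuery(analyzer, "lucene"), 10);
      }
    }
  }

  public static void main(String[] args) throws IOException {
    TopDocs hits = run();
    System.out.println("top doc: " + hits.scoreDocs[0].doc);
  }
}
```

Changing a field's weight then only means changing the map; nothing has to be reindexed.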
>>>>> -----Original Message-----
>>>>> From: [hidden email] <[hidden email]>
>>>>> Sent: Monday, October 21, 2019 6:45 PM
>>>>> To: [hidden email]
>>>>> Cc: baris.kazar <[hidden email]>
>>>>> Subject: Re: Index-time boosting: Deprecated setBoost method
>>>>>
>>>>> Hi,-
>>>>>
>>>>> Thanks, and I appreciate the discussion.
>>>>>
>>>>> Let me please ask it this way; I think I gave too much info at one time:
>>>>>
>>>>> Currently I have this:
>>>>>
>>>>> Field f1 = new TextField("field1", "string1", Field.Store.YES);
>>>>> doc.add(f1);
>>>>> f1.setBoost(2.0f);
>>>>>
>>>>> Field f2 = new TextField("field2", "string2", Field.Store.YES);
>>>>> doc.add(f2);
>>>>> f2.setBoost(1.0f);
>>>>>
>>>>>
>>>>> But this fails with Lucene 7.7.2.
>>>>>
>>>>>
>>>>> It is probably more efficient and more flexible to fix this by using
>>>>> BoostQuery.
>>>>>
>>>>> However, what could be the fix with index-time boosting? The code in my
>>>>> previous post was trying to do that.
>>>>>
>>>>> Best regards
>>>>>
>>>>>
>>>>> On 10/21/19 12:34 PM, Uwe Schindler wrote:
>>>>>> Hi,
>>>>>>
>>>>>> Sorry, I don't fully understand what you intend to do. If the boost
>>>>>> values per field are static and used with exactly the same value for
>>>>>> every document, they are not needed at index time. You can just boost
>>>>>> the field on the query side (e.g. using BoostQuery). Boosting every
>>>>>> document with the same static values is an anti-pattern; that's
>>>>>> something better suited for the query side, as you are more flexible
>>>>>> there.
>>>>>> If you need a different boost value per document, you can save that
>>>>>> boost value in the index per document using a docvalues field (this
>>>>>> consumes extra space, of course). Then you need the expressions module
>>>>>> on the query side. But just because it looks like JavaScript doesn't
>>>>>> mean it's slow. The syntax is compiled to bytecode and directly
>>>>>> included into the query execution as a dynamic Java class, so it's
>>>>>> very fast.
>>>>>> In short:
>>>>>> - If you need a different boost factor per field name that's constant
>>>>>>   for all documents, apply it at query time with BoostQuery.
>>>>>> - If you have to boost specific documents (e.g., top-selling
>>>>>>   products), index a numeric docvalues field per document. On the
>>>>>>   query side you can use different query types to modify the score of
>>>>>>   each result based on the docvalues field. That can be done with the
>>>>>>   expressions module (using compiled JavaScript) or by another query in
>>>>>>   Lucene that operates on ValueSource (e.g., FunctionQuery). The first
>>>>>>   one is easier to use for complex formulas.
>>>>>> Uwe
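A minimal sketch of the second bullet without the expressions module, assuming Lucene 7.3 or later for FunctionScoreQuery.boostByValue (and 7.5+ for ByteBuffersDirectory), with lucene-queries on the classpath. The "popularity" field and its values are hypothetical. The factor is indexed once per document as numeric doc values and multiplied into the relevance score at query time:

```java
import java.io.IOException;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.NumericDocValuesField;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.Term;
import org.apache.lucene.queries.function.FunctionScoreQuery;
import org.apache.lucene.search.DoubleValuesSource;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.TermQuery;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.store.ByteBuffersDirectory;
import org.apache.lucene.store.Directory;

public class PerDocumentBoostExample {

  // Index time: the scoring factor goes into its own doc values column
  // instead of Field.setBoost(...). "popularity" is a hypothetical name.
  static Document doc(String text, long popularity) {
    Document d = new Document();
    d.add(new TextField("field1", text, Field.Store.YES));
    d.add(new NumericDocValuesField("popularity", popularity));
    return d;
  }

  // Query time: multiply the wrapped query's score by the per-document factor.
  static Query boostedQuery(String term) {
    Query base = new TermQuery(new Term("field1", term));
    return FunctionScoreQuery.boostByValue(base, DoubleValuesSource.fromLongField("popularity"));
  }

  static TopDocs run() throws IOException {
    try (Directory dir = new ByteBuffersDirectory()) {
      try (IndexWriter writer = new IndexWriter(dir, new IndexWriterConfig(new StandardAnalyzer()))) {
        writer.addDocument(doc("lucene search", 1L)); // doc 0
        writer.addDocument(doc("lucene search", 5L)); // doc 1: same text, five times as "popular"
      }
      try (DirectoryReader reader = DirectoryReader.open(dir)) {
        return new IndexSearcher(reader).search(boostedQuery("lucene"), 10);
      }
    }
  }

  public static void main(String[] args) throws IOException {
    TopDocs hits = run();
    System.out.println("top doc: " + hits.scoreDocs[0].doc);
  }
}
```

Both sample documents have the same text, so the one with popularity 5 should score five times higher. For a factor that is constant per field, a plain BoostQuery (first bullet) is all that is needed.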
>>>>>>
>>>>>>> -----Original Message-----
>>>>>>> From: [hidden email] <[hidden email]>
>>>>>>> Sent: Monday, October 21, 2019 5:17 PM
>>>>>>> To: [hidden email]
>>>>>>> Cc: baris.kazar <[hidden email]>
>>>>>>> Subject: Re: Index-time boosting: Deprecated setBoost method
>>>>>>>
>>>>>>> Hi,-
>>>>>>>
>>>>>>> Sorry about the missing parts in my previous post; please accept my
>>>>>>> apologies for that.
>>>>>>>
>>>>>>> I needed to add a few more questions/corrections/additions to the
>>>>>>> previous post:
>>>>>>>
>>>>>>> The main question was: if the boost is a single constant value, do we
>>>>>>> need the JavaScript part below?
>>>>>>>
>>>>>>> === Indexing code snippet for Lucene version 6.6.0 and before ===
>>>>>>>
>>>>>>> Document doc = new Document();
>>>>>>>
>>>>>>> Field f1 = new TextField("field1", "string1", Field.Store.YES);
>>>>>>> doc.add(f1);
>>>>>>> f1.setBoost(2.0f);
>>>>>>>
>>>>>>> Field f2 = new TextField("field2", "string2", Field.Store.YES);
>>>>>>> doc.add(f2);
>>>>>>> f2.setBoost(1.0f);
>>>>>>>
>>>>>>> === end of indexing code snippet for Lucene version 6.6.0 and before ===
>>>>>>>
>>>>>>> This turns into the following, where the _boost1 field is associated
>>>>>>> with field1 and the _boost2 field is associated with field2.
>>>>>>>
>>>>>>> In indexing code:
>>>>>>>
>>>>>>> === beginning of indexing code snippet ===
>>>>>>> Field f1 = new TextField("field1", "string1", Field.Store.YES);
>>>>>>> doc.add(f1);
>>>>>>>
>>>>>>> Field _boost1 = new NumericDocValuesField("_boost1", 2L);
>>>>>>> doc.add(_boost1);
>>>>>>>
>>>>>>> // If this boost value needs to be stored, a separate StoredField
>>>>>>> // instance needs to be added as well
>>>>>>> … (I will post this soon)
>>>>>>>
>>>>>>> Field _boost2 = new NumericDocValuesField("_boost2", 1L);
>>>>>>> doc.add(_boost2);
>>>>>>>
>>>>>>> // If this boost value needs to be stored, a separate StoredField
>>>>>>> // instance needs to be added as well
>>>>>>> … (I will post this soon)
>>>>>>>
>>>>>>> === end of indexing code snippet ===
>>>>>>>
>>>>>>>
>>>>>>> Now, in the searching code (i.e., at query time), do I need the
>>>>>>> FunctionScoreQuery, given that in this case the boost is just a
>>>>>>> constant value and not a function? However, a constant value can be
>>>>>>> argued to be a function with the same value all the time, too.
>>>>>>>
>>>>>>> === beginning of query time code snippet ===
>>>>>>> Expression expr = JavascriptCompiler.compile("_boost1 + _boost2");
>>>>>>>
>>>>>>> // SimpleBindings just maps variables to SortField instances
>>>>>>> SimpleBindings bindings = new SimpleBindings();
>>>>>>>
>>>>>>> bindings.add(new SortField("_boost1", SortField.Type.LONG));
>>>>>>> // These have to be LONG type, I think, since NumericDocValuesField
>>>>>>> // accepts "long" type only; am I right? Can this be DOUBLE type?
>>>>>>>
>>>>>>> bindings.add(new SortField("_boost2", SortField.Type.LONG));
>>>>>>> // same question here
>>>>>>>
>>>>>>> // create a query that matches based on field1:term_to_look_for but
>>>>>>> // scores using expr
>>>>>>> Query query = new FunctionScoreQuery(
>>>>>>>         new TermQuery(new Term("field1", "term_to_look_for")),
>>>>>>>         expr.getDoubleValuesSource(bindings));
>>>>>>>
>>>>>>> searcher.search(query, 10);
>>>>>>>
>>>>>>> === end of code snippet ===
>>>>>>>
>>>>>>>
>>>>>>> Best regards
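The LONG-versus-DOUBLE question in the snippet above can be illustrated with a minimal sketch, assuming Lucene 7.7.x or later with the lucene-expressions module (plus its ASM and ANTLR runtime dependencies) and lucene-queries on the classpath. NumericDocValuesField itself only takes a long, but FloatDocValuesField and DoubleDocValuesField encode floating-point values into numeric doc values, and DoubleValuesSource.fromFloatField/fromDoubleField (or SortField.Type.FLOAT/DOUBLE in the add(SortField) style) read them back. The sketch binds _score as well, so the stored factor scales the relevance score; an expression such as "_boost1 + _boost2" on its own replaces the relevance score entirely. It uses SimpleBindings.add(String, DoubleValuesSource) instead of the add(SortField) form from the thread; treat that binding method as an assumption to check against your Lucene version. The sample documents and factors are made up:

```java
import java.io.IOException;
import java.text.ParseException;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.FloatDocValuesField;
import org.apache.lucene.document.TextField;
import org.apache.lucene.expressions.Expression;
import org.apache.lucene.expressions.SimpleBindings;
import org.apache.lucene.expressions.js.JavascriptCompiler;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.Term;
import org.apache.lucene.queries.function.FunctionScoreQuery;
import org.apache.lucene.search.DoubleValuesSource;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.TermQuery;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.store.ByteBuffersDirectory;
import org.apache.lucene.store.Directory;

public class FloatBoostExpressionExample {

  static Document doc(String text, float boost) {
    Document d = new Document();
    d.add(new TextField("field1", text, Field.Store.YES));
    // Stores the float's bits as numeric doc values (NumericDocValuesField itself takes longs)
    d.add(new FloatDocValuesField("_boost1", boost));
    return d;
  }

  static Query expressionQuery(String term) throws ParseException {
    // "_score * _boost1" scales the relevance score; "_boost1" alone would replace it
    Expression expr = JavascriptCompiler.compile("_score * _boost1");
    SimpleBindings bindings = new SimpleBindings();
    bindings.add("_score", DoubleValuesSource.SCORES);
    bindings.add("_boost1", DoubleValuesSource.fromFloatField("_boost1"));
    return new FunctionScoreQuery(
        new TermQuery(new Term("field1", term)), expr.getDoubleValuesSource(bindings));
  }

  static TopDocs run() throws IOException, ParseException {
    try (Directory dir = new ByteBuffersDirectory()) {
      try (IndexWriter writer = new IndexWriter(dir, new IndexWriterConfig(new StandardAnalyzer()))) {
        writer.addDocument(doc("lucene search", 1.0f)); // doc 0
        writer.addDocument(doc("lucene search", 1.5f)); // doc 1: same text, larger factor
      }
      try (DirectoryReader reader = DirectoryReader.open(dir)) {
        return new IndexSearcher(reader).search(expressionQuery("lucene"), 10);
      }
    }
  }

  public static void main(String[] args) throws Exception {
    TopDocs hits = run();
    System.out.println("top doc: " + hits.scoreDocs[0].doc);
  }
}
```

If the factor really is one constant per field, the expression adds nothing over a BoostQuery; the doc values route only pays off when the factor differs between documents.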
>>>>>>>
>>>>>>>
>>>>>>> On 10/21/19 11:05 AM, [hidden email] wrote:
>>>>>>>> Hi,-
>>>>>>>>
>>>>>>>> I would like to ask the following to make it clearer (for me at least):
>>>>>>>>
>>>>>>>> Document doc = new Document();
>>>>>>>>
>>>>>>>> Field f1 = new TextField("field1", "string1", Field.Store.YES);
>>>>>>>> doc.add(f1);
>>>>>>>> f1.setBoost(2.0f);
>>>>>>>>
>>>>>>>> Field f2 = new TextField("field2", "string2", Field.Store.YES);
>>>>>>>> doc.add(f2);
>>>>>>>> f2.setBoost(1.0f);
>>>>>>>>
>>>>>>>>
>>>>>>>> This turns into the following, where the _boost1 field is associated
>>>>>>>> with field1 and the _boost2 field is associated with field2.
>>>>>>>>
>>>>>>>> In indexing code:
>>>>>>>>
>>>>>>>> Field f1 = new TextField("field1", "string1", Field.Store.YES);
>>>>>>>> doc.add(f1);
>>>>>>>>
>>>>>>>> Field _boost1 = new NumericDocValuesField("_boost1", 2L);
>>>>>>>> doc.add(_boost1);
>>>>>>>>
>>>>>>>> // If this boost value needs to be stored, a separate StoredField
>>>>>>>> // instance needs to be added as well
>>>>>>>> … (I will post this soon)
>>>>>>>>
>>>>>>>> Field _boost2 = new NumericDocValuesField("_boost2", 1L);
>>>>>>>> doc.add(_boost2);
>>>>>>>>
>>>>>>>> // If this boost value needs to be stored, a separate StoredField
>>>>>>>> // instance needs to be added as well
>>>>>>>> … (I will post this soon)
>>>>>>>>
>>>>>>>>
>>>>>>>> Now, in the searching code (i.e., at query time), do I need the
>>>>>>>> FunctionScoreQuery, given that in this case the boost is just a
>>>>>>>> constant value and not a function? However, a constant value can be
>>>>>>>> argued to be a function with the same value all the time, too.
>>>>>>>>
>>>>>>>>
>>>>>>>> Expression expr = JavascriptCompiler.compile("_boost1");
>>>>>>>>
>>>>>>>> // SimpleBindings just maps variables to SortField instances
>>>>>>>> SimpleBindings bindings = new SimpleBindings();
>>>>>>>> bindings.add(new SortField("_boost1", SortField.Type.LONG));
>>>>>>>>
>>>>>>>> // create a query that matches based on field1:term_to_look_for but
>>>>>>>> // scores using expr
>>>>>>>> Query query = new FunctionScoreQuery(
>>>>>>>>         new TermQuery(new Term("field1", "term_to_look_for")),
>>>>>>>>         expr.getDoubleValuesSource(bindings));
>>>>>>>>
>>>>>>>> searcher.search(query, 10);
>>>>>>>>
>>>>>>>>
>>>>>>>> So, if the boost is a single constant value, do we need the
>>>>>>>> JavaScript part above?
>>>>>>>>
>>>>>>>> Best regards
>>>>>>>>
>>>>>>>>
>>>>>>>> On 10/18/19 4:07 PM, [hidden email] wrote:
>>>>>>>>> Uwe,-
>>>>>>>>>
>>>>>>>>> Can this doc example that you also gave
>>>>>>>>> https://lucene.apache.org/core/7_7_2/expressions/org/apache/lucene/expressions/Expression.html
>>>>>>>>> be extended with the NumericDocValuesField part that needs to be done
>>>>>>>>> for index-time boosting, too?
>>>>>>>>>
>>>>>>>>> I see now why you meant that this is a mixed type of boosting (i.e.,
>>>>>>>>> both indexing time and search time).
>>>>>>>>>
>>>>>>>>> I then need to include the query mentioned in this example on these
>>>>>>>>> _score fields (I would call it the _boost field in my case) into my
>>>>>>>>> overall BooleanQuery.
>>>>>>>>>
>>>>>>>>> I will now try to combine these together and post here for future
>>>>>>>>> help.
>>>>>>>>> Best regards
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On 10/18/19 3:18 PM, Uwe Schindler wrote:
>>>>>>>>>> Hi,
>>>>>>>>>>
>>>>>>>>>> Read my original email! The index-time values are written using
>>>>>>>>>> NumericDocValuesField. The expressions docs also refer to that where
>>>>>>>>>> the bindings are documented.
>>>>>>>>>>
>>>>>>>>>> It's separate from the indexed data (TextField). Think of it like an
>>>>>>>>>> additional numeric field in your database table with a factor in
>>>>>>>>>> each row.
>>>>>>>>>>
>>>>>>>>>> Uwe
>>>>>>>>>>
>>>>>>>>>> On October 18, 2019 7:14:03 PM UTC, [hidden email] wrote:
>>>>>>>>>>> Uwe,-
>>>>>>>>>>>
>>>>>>>>>>> Two questions there:
>>>>>>>>>>>
>>>>>>>>>>> I guess this is applicable to TextField, too.
>>>>>>>>>>>
>>>>>>>>>>> And I was expecting an IndexWriter object in the example for
>>>>>>>>>>> index-time boosting.
>>>>>>>>>>>
>>>>>>>>>>> Best regards
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On 10/18/19 2:57 PM, Uwe Schindler wrote:
>>>>>>>>>>>> Sorry, I was imprecise. It's a mix of both. The factors are stored
>>>>>>>>>>>> per document in the index (this is why I called it index time).
>>>>>>>>>>>> At query time, the expression uses the index-time values to fold
>>>>>>>>>>>> them into the query boost.
>>>>>>>>>>>> What's your problem with that approach?
>>>>>>>>>>>>
>>>>>>>>>>>> Uwe
>>>>>>>>>>>>
>>>>>>>>>>>> On October 18, 2019 6:50:40 PM UTC, [hidden email] wrote:
>>>>>>>>>>>>> Uwe,-
>>>>>>>>>>>>>
>>>>>>>>>>>>> Thanks. If possible, I am looking for a pure Java methodology to
>>>>>>>>>>>>> do the index-time boosting.
>>>>>>>>>>>>>
>>>>>>>>>>>>> This example looks like a search-time boosting example:
>>>>>>>>>>>>>
>>>>>>>>>>>>> https://lucene.apache.org/core/7_7_2/expressions/org/apache/lucene/expressions/Expression.html
>>>>>>>>>>>>> Best regards
>>>>>>>>>>>>>
>>>>>>>>>>>>> On 10/18/19 2:31 PM, Uwe Schindler wrote:
>>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Is there a working example for this? Is this mentioned in the
>>>>>>>>>>>>>>> Lucene Javadocs or any other docs so that I can look at it?
>>>>>>>>>>>>>> To index the docvalues, see NumericDocValuesField (it can be
>>>>>>>>>>>>>> added to documents like indexed or stored fields). You may have
>>>>>>>>>>>>>> used them for sorting already.
>>>>>>>>>>>>>>> This methodology seems sort of like discouraging the use of
>>>>>>>>>>>>>>> index-time boosting.
>>>>>>>>>>>>>> Not really. Many use this all the time. It's one of the killer
>>>>>>>>>>>>>> features of both Solr and Elasticsearch. The problem was how
>>>>>>>>>>>>>> Document.setBoost() worked (it did not work correctly, see below).
>>>>>>>>>>>>>>> The previous setBoost method call was fine and easy to use.
>>>>>>>>>>>>>>> Did it have some performance issues, and is that why it was
>>>>>>>>>>>>>>> deprecated?
>>>>>>>>>>>>>> No, it was deprecated for several reasons: setBoost was not
>>>>>>>>>>>>>> doing what the user expected. Internally the boost value was
>>>>>>>>>>>>>> just multiplied into the document norm factor (which is
>>>>>>>>>>>>>> internally also a docvalues field). The norm factors are very
>>>>>>>>>>>>>> imprecise floats stored in a byte, so the precision is poor. If
>>>>>>>>>>>>>> you put some values into it and the length norm was already
>>>>>>>>>>>>>> consuming all bits, the boosting was very coarse. It was also
>>>>>>>>>>>>>> only multiplied in, and most users want to do things like record
>>>>>>>>>>>>>> click counts in the index and then boost, for example, with the
>>>>>>>>>>>>>> logarithm or some other function. If the boost is just
>>>>>>>>>>>>>> multiplied into the length norm, you have no flexibility at all.
>>>>>>>>>>>>>> In addition, you can have several docvalues fields and use their
>>>>>>>>>>>>>> values in a function (e.g. one field with the click count and
>>>>>>>>>>>>>> another one with the product price). Then you can combine click
>>>>>>>>>>>>>> count and price (which can be modified independently during
>>>>>>>>>>>>>> index updates) and change the boost so that lower prices and
>>>>>>>>>>>>>> higher click counts push a product up.
>>>>>>>>>>>>>> This is what you can do with the expressions module. You just
>>>>>>>>>>>>>> give it a function.
>>>>>>>>>>>>>> Here is an example; the second example uses a FunctionScoreQuery
>>>>>>>>>>>>>> that modifies the score based on the function and the given
>>>>>>>>>>>>>> docvalues:
>>>>>>>>>>>>>> https://lucene.apache.org/core/7_7_2/expressions/org/apache/lucene/expressions/Expression.html
>>>>>>>>>>>>>>> FunctionScoreQuery usage with MultiFieldQueryParser would also
>>>>>>>>>>>>>>> be nice, where MultiFieldQueryParser already has a boosts
>>>>>>>>>>>>>>> parameter for this in its constructor.
>>>>>>>>>>>>>> The boosts in the query parser are applied to fields at query
>>>>>>>>>>>>>> time (to have a different weight per field). Index-time boosting
>>>>>>>>>>>>>> is per document. So you can combine both.
>>>>>>>>>>>>>>> Maybe it is not needed with MultiFieldQueryParser.
>>>>>>>>>>>>>> You use MultiFieldQueryParser to adjust the weights of the fields
>>>>>>>>>>>>>> (e.g. title versus body). The parsed query is then wrapped with
>>>>>>>>>>>>>> an expression that modifies the score per document according to
>>>>>>>>>>>>>> the docvalues.
>>>>>>>>>>>>>> Uwe
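A minimal sketch of the combination described above (query-time field weights plus per-document docvalues folded in by an expression), assuming Lucene 7.7.x or later with lucene-queryparser, lucene-queries and lucene-expressions on the classpath, and assuming SimpleBindings.add(String, DoubleValuesSource) is available in your version. The field names (title, body, clicks, price), the weights and the formula _score * ln(2 + clicks) / sqrt(price) are hypothetical choices for illustration, not taken from the Lucene docs:

```java
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.DoubleDocValuesField;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.NumericDocValuesField;
import org.apache.lucene.document.TextField;
import org.apache.lucene.expressions.Expression;
import org.apache.lucene.expressions.SimpleBindings;
import org.apache.lucene.expressions.js.JavascriptCompiler;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.queries.function.FunctionScoreQuery;
import org.apache.lucene.queryparser.classic.MultiFieldQueryParser;
import org.apache.lucene.queryparser.classic.ParseException;
import org.apache.lucene.search.DoubleValuesSource;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.store.ByteBuffersDirectory;
import org.apache.lucene.store.Directory;

public class ClicksAndPriceExample {

  static Document product(String title, String body, long clicks, double price) {
    Document d = new Document();
    d.add(new TextField("title", title, Field.Store.YES));
    d.add(new TextField("body", body, Field.Store.YES));
    // Per-document factors as numeric doc values; these can be updated in place
    // (IndexWriter.updateNumericDocValue) without reindexing the text fields
    d.add(new NumericDocValuesField("clicks", clicks));
    d.add(new DoubleDocValuesField("price", price));
    return d;
  }

  static Query query(Analyzer analyzer, String text)
      throws ParseException, java.text.ParseException {
    // Query-time weights per field: title counts twice as much as body
    Map<String, Float> boosts = new HashMap<>();
    boosts.put("title", 2.0f);
    boosts.put("body", 1.0f);
    Query parsed = new MultiFieldQueryParser(new String[] {"title", "body"}, analyzer, boosts)
        .parse(text);

    // Per-document adjustment: more clicks push a product up, a higher price pushes it down
    Expression expr = JavascriptCompiler.compile("_score * ln(2 + clicks) / sqrt(price)");
    SimpleBindings bindings = new SimpleBindings();
    bindings.add("_score", DoubleValuesSource.SCORES);
    bindings.add("clicks", DoubleValuesSource.fromLongField("clicks"));
    bindings.add("price", DoubleValuesSource.fromDoubleField("price"));
    return new FunctionScoreQuery(parsed, expr.getDoubleValuesSource(bindings));
  }

  static TopDocs run() throws IOException, ParseException, java.text.ParseException {
    Analyzer analyzer = new StandardAnalyzer();
    try (Directory dir = new ByteBuffersDirectory()) {
      try (IndexWriter writer = new IndexWriter(dir, new IndexWriterConfig(analyzer))) {
        writer.addDocument(product("lucene in action", "search library", 10, 100.0)); // doc 0
        writer.addDocument(product("lucene in action", "search library", 10, 25.0));  // doc 1: cheaper
      }
      try (DirectoryReader reader = DirectoryReader.open(dir)) {
        return new IndexSearcher(reader).search(query(analyzer, "lucene"), 10);
      }
    }
  }

  public static void main(String[] args) throws Exception {
    TopDocs hits = run();
    System.out.println("top doc: " + hits.scoreDocs[0].doc);
  }
}
```

With equal text and equal click counts, the cheaper product should come out on top with twice the score, since sqrt(100) / sqrt(25) = 2. Using ln(2 + clicks) rather than ln(clicks) keeps the factor positive for products that have never been clicked.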
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On 10/18/19 1:28 PM, Uwe Schindler wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> That's not true. You can do index-time boosting, but you need to
>>>>>>>>>>>>>>>> do it using a separate field. You just index a numeric docvalues
>>>>>>>>>>>>>>>> field (which may contain a long or float value per document).
>>>>>>>>>>>>>>>> Later you wrap your query with a FunctionScoreQuery (e.g., use
>>>>>>>>>>>>>>>> the JavaScript function query syntax in the expressions module).
>>>>>>>>>>>>>>>> This allows you to compile a JavaScript function that calculates
>>>>>>>>>>>>>>>> the final score based on the score returned by the inner query
>>>>>>>>>>>>>>>> and combines it with docvalues that were indexed per document.
>>>>>>>>>>>>>>>> Uwe
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> -----Original Message-----
>>>>>>>>>>>>>>>>> From: [hidden email] <[hidden email]>
>>>>>>>>>>>>>>>>> Sent: Friday, October 18, 2019 5:28 PM
>>>>>>>>>>>>>>>>> To: [hidden email]
>>>>>>>>>>>>>>>>> Cc: [hidden email]
>>>>>>>>>>>>>>>>> Subject: Re: Index-time boosting: Deprecated setBoost method
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> It looks like index-time boosting (field) is not possible since
>>>>>>>>>>>>>>>>> Lucene version 7.7.2, and for another case I was previously
>>>>>>>>>>>>>>>>> using BoostQuery at search time for boosting, and this seems to
>>>>>>>>>>>>>>>>> be the only boosting option now in Lucene.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Best regards
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> On 10/18/19 10:01 AM, [hidden email] wrote:
>>>>>>>>>>>>>>>>>> Hi,-
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> I saw this in the Field class docs and I am trying to figure
>>>>>>>>>>>>>>>>>> out the following note in the docs:
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> setBoost(float boost)
>>>>>>>>>>>>>>>>>> Deprecated.
>>>>>>>>>>>>>>>>>> Index-time boosts are deprecated, please index index-time scoring
>>>>>>>>>>>>>>>>>> factors into a doc value field and combine them with the score at
>>>>>>>>>>>>>>>>>> query time using eg. FunctionScoreQuery.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> I appreciate this note. Is there an example of this? I wish the
>>>>>>>>>>>>>>>>>> docs would give a simple example to help further.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> https://lucene.apache.org/core/6_6_0//core/org/apache/lucene/document/Field.html
>>>>>>>>>>>>>>>>>> vs
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> https://lucene.apache.org/core/7_7_2/core/org/apache/lucene/document/Field.html
>>>>>>>>>>>>>>>>>> Best regards

--
Uwe Schindler
Achterdiek 19, 28357 Bremen
https://www.thetaphi.de