COST vs SCORE vs WEIGHT

Vadim Gindin
Hi,

1) What is the principal difference between COST, SCORE and WEIGHT?

2) Assume we have a BooleanQuery with 5 TermQuery subqueries that are
added as SHOULD clauses. Assume we have 5 fields and each subquery
searches in one field, as MultiFieldQueryParser would produce. In this
case the score of the BooleanQuery is the sum of the scores of all
subqueries. I expected that only the subqueries that found something
would be included, but in fact I get the sum over all subqueries. Why? How
can I implement the logic I need: the sum of only those subqueries that
found something? How can I check that?

Regards,
Vadim Gindin

Re: COST vs SCORE vs WEIGHT

Adrien Grand
Hi Vadim,

A Weight is the specialization of a query for a given index reader. It has
access to index statistics that will help compute scores for instance.

A Scorer is the specialization of a weight for a given segment. It can
iterate over matches and compute scores.

The cost of a scorer is the expected number of matching documents for this
scorer. It is useful for running operations in the optimal order.
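
As a hedged illustration of why a cost estimate helps, here is a toy model (not Lucene code). Each clause is a sorted set of doc IDs, its cost is assumed to be the size of that set, and an intersection starts from the cheapest clause so the candidate set is small from the first step. The class name CostOrderingSketch and its methods are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.TreeSet;

// Toy model, not Lucene code: a clause is a sorted set of matching doc IDs,
// and its "cost" is assumed to be the size of that set.
class CostOrderingSketch {

    // Intersects all clauses, starting from the cheapest one.
    static TreeSet<Integer> intersect(List<TreeSet<Integer>> clauses) {
        if (clauses.isEmpty()) {
            return new TreeSet<>();
        }
        // Sort a copy by cost (set size), cheapest first.
        List<TreeSet<Integer>> byCost = new ArrayList<>(clauses);
        byCost.sort((a, b) -> Integer.compare(a.size(), b.size()));

        // Lead with the cheapest clause and filter it by the others.
        TreeSet<Integer> result = new TreeSet<>(byCost.get(0));
        for (int i = 1; i < byCost.size(); i++) {
            result.retainAll(byCost.get(i));
        }
        return result;
    }

    public static void main(String[] args) {
        TreeSet<Integer> common = new TreeSet<>(Arrays.asList(0, 1, 2, 3, 4, 5, 6, 7, 8, 9));
        TreeSet<Integer> rare = new TreeSet<>(Arrays.asList(3, 7));
        System.out.println(intersect(Arrays.asList(common, rare))); // prints [3, 7]
    }
}
```

The result is the same whichever clause leads; only the amount of work changes, which is the kind of decision a cost estimate supports.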

Your observation of the behaviour of your BooleanQuery with SHOULD clauses
looks wrong: the score of the boolean query is the sum of the scores of the
matching sub queries.
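
The summing rule for SHOULD clauses can be sketched with a toy model (again not Lucene code, and all names here are hypothetical). A clause is assumed to match a fixed set of doc IDs and to contribute a constant score; a document's score is the sum over the clauses that match it, and documents matched by no clause are not hits.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Toy model, not Lucene code: each clause matches a fixed set of doc IDs and
// contributes a constant score to every document it matches.
class ShouldSumSketch {

    static final class Clause {
        final Set<Integer> matches;
        final float score;

        Clause(Set<Integer> matches, float score) {
            this.matches = matches;
            this.score = score;
        }
    }

    // Returns docId -> score for every document matched by at least one clause.
    static Map<Integer, Float> score(List<Clause> clauses, int maxDoc) {
        Map<Integer, Float> scores = new LinkedHashMap<>();
        for (int doc = 0; doc < maxDoc; doc++) {
            float sum = 0f;
            boolean matched = false;
            for (Clause c : clauses) {
                if (c.matches.contains(doc)) {
                    sum += c.score;   // only matching clauses contribute
                    matched = true;
                }
            }
            if (matched) {
                scores.put(doc, sum);
            }
        }
        return scores;
    }

    public static void main(String[] args) {
        List<Clause> clauses = Arrays.asList(
                new Clause(new HashSet<>(Arrays.asList(0, 1)), 5f),
                new Clause(new HashSet<>(Arrays.asList(1)), 3f),
                new Clause(new HashSet<Integer>(), 4f)); // matches nothing
        System.out.println(score(clauses, 3)); // prints {0=5.0, 1=8.0}
    }
}
```

In this sketch the third clause never contributes its 4 points because it matches no document, and doc 2 is absent from the result because no clause matches it.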

On Thu, Nov 30, 2017 at 16:39, Vadim Gindin <[hidden email]> wrote:

> Hi
>
> 1) What is the principal difference between COST vs SCORE vs WEIGHT
>
> 2) Assume we have BooleanQuery with 5 TermQuery subqueries that are
> included via SHOULD condition. Assume we have 5 fields and one subquery is
> need to search in one field. Some product of MultiFieldQueryParser. In this
> case the score of BooleanQuery is the sum of scores of each subquery. I
> expected that not all subqueries will be included but only those who
> founded something, but in fact there is a sum of all subqueries. Why? How
> to implement need logic: sum of those subqueries that found something? How
> to check that?
>
> Regards,
> Vadim Gindin
>

Re: COST vs SCORE vs WEIGHT

Vadim Gindin
Thanks Adrien!

1) Here is my code snippet:

// ConstTermQuery is my custom Query; the second argument is the constant score.
Query params_vendor = new ConstTermQuery(new Term("params_vendor", queryStr), 5f);
Query params_model = new ConstTermQuery(new Term("params_model", queryStr), 5f);
Query params_value = new ConstTermQuery(new Term("params_value", queryStr), 3f);
Query params_name = new ConstTermQuery(new Term("params_name", queryStr), 4f);

// All four clauses are optional, but at least one of them must match.
BooleanQuery bq = new BooleanQuery.Builder()
        .add(params_vendor, BooleanClause.Occur.SHOULD)
        .add(params_model, BooleanClause.Occur.SHOULD)
        .add(params_value, BooleanClause.Occur.SHOULD)
        .add(params_name, BooleanClause.Occur.SHOULD)
        .setMinimumNumberShouldMatch(1)
        .build();


ConstTermQuery here is my custom Query that creates its own Weight and then
its own Scorer. The scorer just returns the score passed to the constructor
(4 for "params_name"). The test index does not contain the fields
"params_name" and "params_value", yet the returned document score is 17
(5 + 5 + 3 + 4) for all records. Why?

2) A Scorer can iterate over matches, can't it?

I set up the iterator in the scorer's constructor as follows:

this.iterator = DocIdSetIterator.all(context.reader().maxDoc());

And then

public DocIdSetIterator iterator() {
    return iterator;
}

Is that a correct implementation? Are there other ways to implement it?

Thanks a lot for your response.

Regards,
Vadim Gindin

On Thu, Nov 30, 2017 at 8:56 PM, Adrien Grand <[hidden email]> wrote:

> Hi Vadim,
>
> A Weight is the specialization of a query for a given index reader. It has
> access to index statistics that will help compute scores for instance.
>
> A Scorer is the specialization of a weight for a given segment. It can
> iterate over matches and compute scores.
>
> The cost of a scorer is the expected number of matching documents for this
> scorer. It is useful in order to run operations in the optimal order
>
> Your observation of the behaviour of your BooleanQuery with SHOULD clauses
> looks wrong: the score of the boolean query is the sum of the scores of the
> matching sub queries.
>

Re: COST vs SCORE vs WEIGHT

Adrien Grand
Your implementation is going to match all documents, since
DocIdSetIterator.all(maxDoc) iterates over every document in the segment
regardless of the term. I believe you could do what you want with built-in
queries by doing

Query params_vendor = new BoostQuery(
        new ConstantScoreQuery(new TermQuery(new Term("params_vendor", queryStr))), 5f);

and similarly for other queries.
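
To make the 17 concrete, here is a small toy model (not Lucene code; every name in it is hypothetical). It assumes a 3-document segment where the term occurs in params_vendor for doc 0, in params_model for docs 0 and 2, and where the other two fields do not exist. It compares clauses driven by a match-all iterator with clauses driven by the term's own postings.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.IntPredicate;

// Toy model, not Lucene code. A clause is a "does this doc match?" predicate
// plus the constant score it contributes when it matches.
class MatchAllPitfallSketch {

    static final class Clause {
        final IntPredicate matches;
        final float score;

        Clause(IntPredicate matches, float score) {
            this.matches = matches;
            this.score = score;
        }
    }

    // Stand-in for an iterator over every doc ID in [0, maxDoc).
    static IntPredicate allDocs(int maxDoc) {
        return doc -> doc >= 0 && doc < maxDoc;
    }

    // Stand-in for a postings-driven iterator: only the listed docs contain the term.
    static IntPredicate postings(Integer... docs) {
        List<Integer> list = Arrays.asList(docs);
        return doc -> list.contains(doc);
    }

    // Sum of the scores of the matching clauses, or null if no clause matches.
    static Float score(List<Clause> clauses, int doc) {
        float sum = 0f;
        boolean matched = false;
        for (Clause c : clauses) {
            if (c.matches.test(doc)) {
                sum += c.score;
                matched = true;
            }
        }
        return matched ? Float.valueOf(sum) : null;
    }

    // Every clause iterates over all documents, as in the custom scorer above.
    static List<Clause> matchAllClauses(int maxDoc) {
        return Arrays.asList(
                new Clause(allDocs(maxDoc), 5f),  // params_vendor
                new Clause(allDocs(maxDoc), 5f),  // params_model
                new Clause(allDocs(maxDoc), 3f),  // params_value
                new Clause(allDocs(maxDoc), 4f)); // params_name
    }

    // Every clause iterates only over the docs that contain the term in its field.
    static List<Clause> postingsClauses() {
        return Arrays.asList(
                new Clause(postings(0), 5f),      // params_vendor
                new Clause(postings(0, 2), 5f),   // params_model
                new Clause(postings(), 3f),       // params_value: field absent
                new Clause(postings(), 4f));      // params_name: field absent
    }

    public static void main(String[] args) {
        int maxDoc = 3;
        for (int doc = 0; doc < maxDoc; doc++) {
            System.out.println("doc " + doc
                    + ": match-all=" + score(matchAllClauses(maxDoc), doc)
                    + ", postings=" + score(postingsClauses(), doc));
        }
    }
}
```

With the match-all clauses every document scores 5 + 5 + 3 + 4 = 17, which is the behaviour reported above. With the postings-driven clauses doc 0 scores 10, doc 2 scores 5, and doc 1 is not a hit at all.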

On Thu, Nov 30, 2017 at 17:58, Vadim Gindin <[hidden email]> wrote:

> Thanks Adrien!
>
> 1) Here is my code snippet:
>
> Query params_vendor = new ConstTermQuery(new Term("params_vendor",
> queryStr), 5f);
> Query params_model = new ConstTermQuery(new Term("params_model",
> queryStr), 5f);
> Query params_value = new ConstTermQuery(new Term("params_value",
> queryStr), 3f);
> Query param_name = new ConstTermQuery(new Term("params_name", queryStr),
> 4f);
>
> BooleanQuery bq = expected
>         .add(params_vendor, BooleanClause.Occur.SHOULD)
>         .add(params_model, BooleanClause.Occur.SHOULD)
>         .add(params_value, BooleanClause.Occur.SHOULD)
>         .add(param_name, BooleanClause.Occur.SHOULD)
>         .setMinimumNumberShouldMatch(1)
>         .build()
>
>
> ConstTermQuery here is my custom Query that creates own WEIGHT and then
> SCORE. Created score returns just specified score in constructor (4 for
> "params_name"). Testing index does not contain fields "param_name" and
> "param_value". But returned Doc.score is 17 for all records. Why?
>
> 2) Scorer can iterate over matches. Isn't it?
>
> I used iterator in scorer constructor as follows:
>
> this.iterator = DocIdSetIterator.all(context.reader().maxDoc());
>
> And then
>
> public DocIdSetIterator iterator() {
>     return iterator;
> }
>
> Is that a correct implementation? Are there other ways to implement it?
>
> Thanks a lot for your response
>
> Regards,
> Vadim Gindin
>