TermInSetQuery keep terms order in results

Nicola Buso
Hi,

I need to use TermInSetQuery, but I would like the results to be sorted
in the order of the terms I provide. Currently the results seem to come
back in index (document insertion) order.

Is this already implemented somewhere or do I need to implement a
CustomScoreQuery to calculate this score?

Cheers,


Nicola


--
Nicola Buso <[hidden email]>
EMBL-EBI

---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

Re: TermInSetQuery keep terms order in results

Nicola Buso
This is probably more a sorting problem than a matter of scoring
individual documents. Note that the order of the input terms is
calculated at runtime, in case someone is thinking about adding a sort
field at indexing time.

Nicola

On Mon, 2018-06-25 at 12:23 +0100, Nicola Buso wrote:

> Hi,
>
> I need to use the TermInSetQuery, but I would like to keep the
> sorting
> of the results based on the term set order provided. Currently seems
> using a index documents insertion order in the results.
>
> Is this already implemented somewhere or do I need to implement a
> CustomScoreQuery to calculate this score?
>
> Cheers,
>
>
> Nicola
>
>

RE: TermInSetQuery keep terms order in results

Uwe Schindler
In reply to this post by Nicola Buso
Hi,

TermInSetQuery is a so-called constant-score query. It is meant more as a filter, so you would need some "real" fulltext query in parallel. Think of the term-in-set query as something like the SQL "IN" operator. It can be used to pass lots of identifiers to filter results (e.g. when you apply access rights or group policies for users as a filter on your main query).

As it is a "set", which is by default unordered, the order of terms in the set is undefined. Internally TermInSetQuery reorders the terms to improve processing speed.

If you need scoring, use TermQuery clauses wrapped in a BooleanQuery. Then you can apply boosts to some terms to influence the order (e.g. give higher boosts to the term queries that come first), and run the query against a field without norms.

TermInSetQuery is fast because it neglects scoring and is just good at intersecting the terms dict with the given terms set.
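
A minimal sketch of the boosting idea above, under stated assumptions rather than a tested recipe. It only computes a boost per term from the term's position in the caller's list, so that earlier terms get larger boosts. The names PositionBoosts and boostsByPosition and the linear boost scheme are made up for illustration. With Lucene on the classpath, each entry would presumably be wrapped as a BoostQuery around a TermQuery and added as a SHOULD clause to a BooleanQuery.Builder; that part is left out so the snippet runs with the JDK alone.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class PositionBoosts {

    /**
     * Maps each term to a boost that decreases with its position in the
     * input list: the first term gets the largest boost, the last gets 1.
     * If a term occurs more than once, its first position wins.
     */
    static Map<String, Float> boostsByPosition(List<String> orderedTerms) {
        Map<String, Float> boosts = new LinkedHashMap<>();
        int n = orderedTerms.size();
        for (int i = 0; i < n; i++) {
            // Linear scheme (an assumption): n for position 0, down to 1.
            boosts.putIfAbsent(orderedTerms.get(i), (float) (n - i));
        }
        return boosts;
    }

    public static void main(String[] args) {
        // Hypothetical identifiers, already in the externally computed order.
        List<String> terms = Arrays.asList("id42", "id7", "id19");
        boostsByPosition(terms)
            .forEach((term, boost) -> System.out.println(term + " -> " + boost));
    }
}
```

Whether the final ranking follows these boosts exactly depends on the similarity in use and on float precision for very long term lists, so the linear scheme should be treated as an assumption to verify.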

Uwe

-----
Uwe Schindler
Achterdiek 19, D-28357 Bremen
http://www.thetaphi.de
eMail: [hidden email]

> -----Original Message-----
> From: Nicola Buso <[hidden email]>
> Sent: Monday, June 25, 2018 1:23 PM
> To: [hidden email]
> Subject: TermInSetQuery keep terms order in results
>
> Hi,
>
> I need to use the TermInSetQuery, but I would like to keep the sorting
> of the results based on the term set order provided. Currently seems
> using a index documents insertion order in the results.
>
> Is this already implemented somewhere or do I need to implement a
> CustomScoreQuery to calculate this score?
>
> Cheers,
>
>
> Nicola

Re: TermInSetQuery keep terms order in results

Nicola Buso
Hi Uwe,

thanks for the reply. TermInSetQuery covers most of my use case:
- thousands of term values (sometimes as many as 100,000)
- no need for scoring, because it's calculated elsewhere
- intersection with a normal full-text query for further filtering

If I use TermQuery clauses instead, do I risk hitting the
BooleanQuery.getMaxClauseCount() limit?

Cheers,


Nicola



On Mon, 2018-06-25 at 16:52 +0200, Uwe Schindler wrote:

> Hi,
>
> the TermInSetQuery is a so-called Constant Score Query. It is more
> meant as a filter, so you would need some "real" fulltext query in
> parallel. See the term-in-set query more like the SQL "IN" operator.
> It can be used to pass lots of identifiers to filter results (e.g.
> when you apply access rights or group policies for filtering users to
> your main query as a filter).
>
> As it is a "set", which is by default unordered, the order of terms
> in the set is undefined. Internally TermInSetQuery reorders the terms
> to improve processing speed.
>
> If you need scoring, use TermQuery wrapped by a BooleanQuery. Then
> you can apply some boosts to some terms to improve order (e.g. boost
> term queries coming first) and apply on a field without norms.
>
> TermInSetQuery is fast because it neglects scoring and is just good
> at intersecting the terms dict with the given terms set.
>
> Uwe

RE: TermInSetQuery keep terms order in results

Uwe Schindler
Hi Nicola,

if you sort it elsewhere, why do you care about the sort order here? What you see as the result is simple: since a constant-score query has nothing available for scoring, it returns the results in index order. That is intended. There is no way to change this "default" order for a TermInSetQuery because the information needed to do so is missing.

Uwe

-----
Uwe Schindler
Achterdiek 19, D-28357 Bremen
http://www.thetaphi.de
eMail: [hidden email]

> -----Original Message-----
> From: Nicola Buso <[hidden email]>
> Sent: Monday, June 25, 2018 5:09 PM
> To: Uwe Schindler <[hidden email]>; [hidden email]
> Subject: Re: TermInSetQuery keep terms order in results
>
> Hi Uwe,
>
> thanks for the reply. TermInSetQuery cover most of my use case:
> - thousands of term values (also 100,000)
> - no need for scoring, because it's calculated elsewhere
> - intersect with normal full text query for further filtering
>
> Using a TermQuery do I risk to hit the BooleanQuery.getMaxClauseCount()
> limit?
>
> Cheers,
>
>
> Nicola

Re: TermInSetQuery keep terms order in results

Nicola Buso
Hi Uwe,

as I said, the sort order is calculated elsewhere up front, and the
terms are provided to Lucene in that order (though in any case as an
unordered Set, as the query API requires).

I would like an API that keeps the input order. Otherwise I end up with
the usual problem that I can't re-order afterwards, because the results
are accessed page by page, which makes that operation impossible.
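
To illustrate the problem described above, here is a rough sketch in plain Java, with no Lucene calls. Re-ordering by input position only works if every matching term is available before the first page is cut. The names InputOrderPaging, rankOf and page are hypothetical, and the hits are represented simply by their term values.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class InputOrderPaging {

    /** Position of each term in the caller's list; the first occurrence wins. */
    static Map<String, Integer> rankOf(List<String> orderedTerms) {
        Map<String, Integer> rank = new HashMap<>();
        for (int i = 0; i < orderedTerms.size(); i++) {
            rank.putIfAbsent(orderedTerms.get(i), i);
        }
        return rank;
    }

    /**
     * Sorts ALL matched terms by their input rank and then cuts one page.
     * The full sort is the expensive part: a single page of hits delivered
     * in index order is not enough to know what belongs on page 0.
     */
    static List<String> page(List<String> matchedTerms, List<String> orderedTerms,
                             int pageIndex, int pageSize) {
        Map<String, Integer> rank = rankOf(orderedTerms);
        List<String> sorted = new ArrayList<>(matchedTerms);
        // Terms missing from the input list sort last.
        sorted.sort(Comparator.comparingInt(
            (String t) -> rank.getOrDefault(t, Integer.MAX_VALUE)));
        int from = Math.min(pageIndex * pageSize, sorted.size());
        int to = Math.min(from + pageSize, sorted.size());
        return new ArrayList<>(sorted.subList(from, to));
    }
}
```

With 100,000 terms this means collecting and sorting every hit for each request (or caching the sorted list), which is why an order-preserving query would be preferable.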


Nicola

On Mon, 2018-06-25 at 21:49 +0200, Uwe Schindler wrote:

> Hi Nicola,
>
> if you sort it elsewhere, why do you care about sort order then? What
> you see as result is simple: As there is nothing available for
> scoring a constant score query returns the results in index order.
> That's wanted. There is no way to change this "default" order for a
> TermInSetQuery because it's missing information.
>
> Uwe
>
--
Nicola Buso <[hidden email]>
EMBL-EBI

Re: TermInSetQuery keep terms order in results

Michael Sokolov-4
Since you have the terms ordered, why not index their ordinals, and then
sort by that?

On Mon, Jul 2, 2018, 6:16 AM Nicola Buso <[hidden email]> wrote:

> Hi Uwe,
>
> as said the sorting is calculated elsewhere upfront and the terms are
> provided to Lucene in the order calculated (in any case in an not
> ordered Set as by the query API).
>
> I would like an API to keep the input order otherwise I will end up on
> the usual problem that I can't re-order afterward because accessing the
> results in a paginated way will make impossible this operation.
>
>
> Nicola

Re: TermInSetQuery keep terms order in results

Nicola Buso
Hi Michael,

I have an index that contains the terms of the TermInSetQuery, but the
score provided at query time, represented by the order in a List of
terms, is not known at indexing time; it depends on other calculations
done at runtime. What do you mean by indexing the ordinals?

I was wondering whether I could wrap each TermQuery in a BoostQuery,
with the boost based on the ordinals I have, and build a disjunction
query over all the terms. How much slower than TermInSetQuery would
that be?


Nicola



On Mon, 2018-07-02 at 06:41 -0400, Michael Sokolov wrote:

> Since you have the terms ordered, why not index their ordinals, and
> then sort by that?
>
--
Nicola Buso <[hidden email]>
EMBL-EBI
