Index-time join ToParentBlockJoinQuery query produces incorrect result


ANDREI SOLODIN
Hello, I am trying to understand the requirements for properly using the index-time join. In my use case, I am trying to model a 1-N relationship where a parent document can have 0-N child documents. For now I am keeping my data very simple: each child has a single field. So my data currently looks like this:


Parent Doc      Children
--------------------------------------
id=id00000      none
id=id00001      program=P1
id=id00002      program=P1
                program=P2
id=id00003      none
id=id00004      program=P1
id=id00005      program=P1
                program=P2

So essentially I have 6 parent docs: doc 0 has no children, doc 1 has 1 child, doc 2 has 2 children, and so on.


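Certain queries are giving me incorrect results. For example:


// Parent filter that marks only the single parent id00003
BitSetProducer parentSet = new QueryBitSetProducer(new TermQuery(new Term("id", "id00003")));
// Join every child that has a "program" value up to its parent; scores are ignored
Query q = new ToParentBlockJoinQuery(new WildcardQuery(new Term("program", "*")), parentSet, ScoreMode.None);


This returns "id00003", which is unexpected because id00003 has no children.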


In my haste I opened a bug earlier (https://issues.apache.org/jira/browse/LUCENE-8902, sorry), and it was mentioned there that "child free is not supported". I take that to mean that each parent should have at least one child. So let's say I add a "default" child to each parent:


Parent Doc      Children
--------------------------------------
id=id00000      field1=val1
id=id00001      field1=val1
                program=P1
id=id00002      field1=val1
                program=P1
                program=P2
id=id00003      field1=val1
id=id00004      field1=val1
                program=P1
id=id00005      field1=val1
                program=P1
                program=P2


So now every parent has at least one child. That made no difference; I still get the same result. What am I doing wrong here?


Thanks

Re: Index-time join ToParentBlockJoinQuery query produces incorrect result

Mikhail Khludnev-2
On Wed, Jul 3, 2019 at 6:11 PM ANDREI SOLODIN <[hidden email]> wrote:

> This returns "id00003", which is unexpected.

Please check the ToParentBlockJoinQuery (ToPBJQ) javadoc. It's absolutely expected.

--
Sincerely yours
Mikhail Khludnev

Re: Index-time join ToParentBlockJoinQuery query produces incorrect result

ANDREI SOLODIN
Thanks Mikhail.


I read through the javadoc and thought I was satisfying all the preconditions. Obviously not :-) Is it this part that I am getting wrong: "At search time you provide a Filter identifying the parents, however this Filter must provide an BitSet per sub-reader."? (BitSet here is https://lucene.apache.org/core/8_1_1/core/org/apache/lucene/util/BitSet.html?is-external=true.) If so, given the data above, how do I properly create a parent query?


Re: Index-time join ToParentBlockJoinQuery query produces incorrect result

ANDREI SOLODIN
After looking through the unit tests, I got it working. The problem was that I thought the parent filter in the ToParentBlockJoinQuery could be used to select a subset of parents. It appears that the parent filter must select ALL parents, not a subset. This is not explained in the javadoc. If you want to select a subset of parents (independently of the child query), ToParentBlockJoinQuery cannot be used on its own; it has to be a clause in another query.
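
To see why the filter has to mark every parent, here is a minimal plain-JDK model of the lookup. It is not Lucene code. It assumes that the join resolves each matching child to the next set bit at or after it in the parent bit set, and that the first data set is indexed in id order with children placed before their parent. The names `BlockJoinModel`, `matching` and `joinToParents` are made up for illustration.

```java
import java.util.BitSet;

// Simplified stand-in for the block-join lookup; not the Lucene implementation.
class BlockJoinModel {
    // Assumed index order for the first data set: children first, parent last in each block.
    static final String[] DOCS = {
        "id00000",                                // doc 0: parent, no children
        "program=P1", "id00001",                  // docs 1-2
        "program=P1", "program=P2", "id00002",    // docs 3-5
        "id00003",                                // doc 6: parent, no children
        "program=P1", "id00004",                  // docs 7-8
        "program=P1", "program=P2", "id00005"     // docs 9-11
    };

    // Marks the docs whose text starts with the given prefix.
    static BitSet matching(String prefix) {
        BitSet bits = new BitSet(DOCS.length);
        for (int i = 0; i < DOCS.length; i++) {
            if (DOCS[i].startsWith(prefix)) bits.set(i);
        }
        return bits;
    }

    // Assumed join rule: each child hit maps to the next marked parent at or after it.
    static BitSet joinToParents(BitSet childHits, BitSet parentBits) {
        BitSet result = new BitSet(DOCS.length);
        for (int c = childHits.nextSetBit(0); c >= 0; c = childHits.nextSetBit(c + 1)) {
            int parent = parentBits.nextSetBit(c);
            if (parent >= 0) result.set(parent); // hits with no marked parent after them are dropped
        }
        return result;
    }

    public static void main(String[] args) {
        BitSet children = matching("program=");
        BitSet onlyId3 = matching("id00003");   // the filter from the original query
        BitSet allParents = matching("id");     // a filter that marks every parent
        System.out.println(joinToParents(children, onlyId3));     // {6}
        System.out.println(joinToParents(children, allParents));  // {2, 5, 8, 11}
    }
}
```

With only id00003 marked, the child at doc 1 lands on doc 6, which is the id00003 result reported at the start of the thread. With every parent marked, only the parents that really have a `program` child come back.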

It would be a nice enhancement to select all parents automatically. The parent is already required to be the last document in its block, so why do we need to provide a query for them?


Re: Index-time join ToParentBlockJoinQuery query produces incorrect result

Michael Sokolov-4
Well for one thing, you might have other documents in the index that
are neither parents nor children (in this particular relation). Also,
consider a nested hierarchy - how can we automatically figure out
which "generation" or "level" of parent to select?
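
The nested case can be illustrated with a toy model in plain JDK code. This is a sketch only: the three document types (`para`, `chapter`, `book`) are invented, and the lookup rule (a hit maps to the next marked document at or after it) is an assumption about how the join behaves, not Lucene source.

```java
import java.util.BitSet;

// Toy model of a three-level block; the doc types are invented for illustration.
class NestedLevelsModel {
    // One block in index order: deepest descendants first, top-level parent last.
    static final String[] TYPES = {
        "para", "para", "chapter",   // docs 0-2
        "para", "chapter",           // docs 3-4
        "book"                       // doc 5
    };

    // Marks every doc of the given type; this plays the role of the parent filter.
    static BitSet ofType(String type) {
        BitSet bits = new BitSet(TYPES.length);
        for (int i = 0; i < TYPES.length; i++) {
            if (TYPES[i].equals(type)) bits.set(i);
        }
        return bits;
    }

    // Assumed join rule: a hit maps to the next marked doc at or after it.
    static int parentOf(int childDoc, BitSet parentBits) {
        return parentBits.nextSetBit(childDoc);
    }

    public static void main(String[] args) {
        int hit = 0; // a matching "para" document
        System.out.println(parentOf(hit, ofType("chapter"))); // 2
        System.out.println(parentOf(hit, ofType("book")));    // 5
    }
}
```

The same hit resolves to a different ancestor depending on which documents the filter marks, so the level has to come from the caller.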


---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

Re: Index-time join ToParentBlockJoinQuery query produces incorrect result

ANDREI SOLODIN
So you are implying that the parent filter allows subsets. The code at https://github.com/apache/lucene-solr/blob/master/lucene/join/src/java/org/apache/lucene/search/join/CheckJoinIndex.java#L46 implies that a subset is not allowed. If I select a subset and invoke the checker, I get the IllegalStateException thrown there.
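
A subset filter tripping the checker can be modelled with plain JDK classes. The sketch below is not the CheckJoinIndex source. It assumes the check in question is that the last document of a segment must be marked as a parent, and it treats the first data set as a single segment of 12 documents with children indexed before their parent. `ParentFilterCheckModel` and `checkLastDocIsParent` are hypothetical names.

```java
import java.util.BitSet;

// Plain-JDK approximation of one sanity check; not the CheckJoinIndex source.
class ParentFilterCheckModel {
    // Assumed rule: in a segment with maxDoc documents, doc maxDoc - 1 must be a parent.
    static void checkLastDocIsParent(BitSet parentBits, int maxDoc) {
        if (maxDoc > 0 && !parentBits.get(maxDoc - 1)) {
            throw new IllegalStateException("last document of the segment is not marked as a parent");
        }
    }

    public static void main(String[] args) {
        int maxDoc = 12; // first data set: 6 parents + 6 children

        BitSet allParents = new BitSet(maxDoc);
        for (int doc : new int[] {0, 2, 5, 6, 8, 11}) allParents.set(doc);
        checkLastDocIsParent(allParents, maxDoc); // passes: doc 11 (id00005) is marked

        BitSet onlyId3 = new BitSet(maxDoc);
        onlyId3.set(6); // subset filter: just id00003
        try {
            checkLastDocIsParent(onlyId3, maxDoc);
        } catch (IllegalStateException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

Under that assumption, any filter that leaves out the final parent of a segment is rejected, which is consistent with a subset filter failing the check.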



Re: Index-time join ToParentBlockJoinQuery query produces incorrect result

Mikhail Khludnev-2
Andrei, it's not clear what the problem is, but if you need to join children
to parents and then select only a subset of the parents, you need to combine
the join with a parent filter. Some cases are explained at
https://lucene.apache.org/solr/guide/8_0/other-parsers.html#OtherParsers-BlockJoinParentQueryParser
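
As a rough illustration of "join first, then restrict the parents", here is a plain-JDK model rather than Lucene code. It assumes the join maps each child hit to the next marked parent at or after it, and it uses the doc ids the first data set would get if children are indexed before their parent. In Lucene terms this presumably corresponds to a BooleanQuery with the ToParentBlockJoinQuery (built over a filter that marks all parents) as one required clause and the id restriction as another; that wiring is an assumption, not something spelled out in this thread. All class and method names below are hypothetical.

```java
import java.util.BitSet;

// Plain-JDK model of "join first, then restrict the parents"; not Lucene code.
class JoinThenFilterModel {
    // Assumed join rule: each child hit maps to the next marked parent at or after it.
    static BitSet joinToParents(BitSet childHits, BitSet allParents) {
        BitSet result = new BitSet();
        for (int c = childHits.nextSetBit(0); c >= 0; c = childHits.nextSetBit(c + 1)) {
            int parent = allParents.nextSetBit(c);
            if (parent >= 0) result.set(parent);
        }
        return result;
    }

    // Small helper to build a bit set from doc ids.
    static BitSet bits(int... docs) {
        BitSet b = new BitSet();
        for (int d : docs) b.set(d);
        return b;
    }

    // The join sees every parent; the subset is applied afterwards as a separate clause.
    static BitSet joinThenFilter(BitSet childHits, BitSet allParents, BitSet wantedParents) {
        BitSet result = joinToParents(childHits, allParents);
        result.and(wantedParents);
        return result;
    }

    public static void main(String[] args) {
        BitSet allParents = bits(0, 2, 5, 6, 8, 11);      // id00000 .. id00005
        BitSet programChildren = bits(1, 3, 4, 7, 9, 10); // every program=* child

        System.out.println(joinThenFilter(programChildren, allParents, bits(6))); // {}  (id00003)
        System.out.println(joinThenFilter(programChildren, allParents, bits(5))); // {5} (id00002)
    }
}
```

Restricting to id00003 now yields nothing, since it has no `program` child, while restricting to id00002 returns it.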


--
Sincerely yours
Mikhail Khludnev