Streaming Expression intersect() behaviour


Christian Spitzlay
Hi,

I don’t seem to get the behaviour of the intersect() stream decorator.
I only ever get one doc from the left stream when I would have expected
more than one.

I constructed a test case that does not depend on my concrete index:

intersect(
cartesianProduct(tuple(fieldA=array(c,c,a,b,d,d)), fieldA, productSort="fieldA asc"),
cartesianProduct(tuple(fieldB=array(c,c,a,d,d)), fieldB, productSort="fieldB asc"),
on="fieldA=fieldB“
)


The result:

{
  "result-set": {
    "docs": [
      {
        "fieldA": "a"
      },
      {
        "EOF": true,
        "RESPONSE_TIME": 0
      }
    ]
  }
}


I would have expected all the docs from the left stream with fieldA values a, c, d
and only the docs with fieldA == b missing.  Do I have a fundamental misunderstanding?


Best regards
Christian Spitzlay



Re: Streaming Expression intersect() behaviour

Christian Spitzlay
Hi,

I noticed that my mail program broke the test case by replacing a double
quote with a different UTF-8 character.

Here is the test case again and I hope it will work this time:

intersect(
cartesianProduct(tuple(fieldA=array(a,b,c,c)), fieldA, productSort="fieldA asc"),
cartesianProduct(tuple(fieldB=array(a,c)), fieldB, productSort="fieldB asc"),
on="fieldA=fieldB"
)

I simplified it a bit, too. I still get one document with fieldA == a.
I would have expected three documents in the output, one with fieldA == a and two with fieldB == c.
Did I misunderstand the docs of the intersect decorator or have I come across a bug?
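
The expectation described above can be sketched in Python. This is only a model of the semantics the poster expects (keep every left tuple whose key also occurs in the right stream), not Solr's implementation; the function name `expected_intersect` and the dict-based tuples are assumptions made for illustration.

```python
def expected_intersect(left, right, left_field, right_field):
    """Emit every left tuple whose key also occurs in the right stream."""
    # Collect the keys present in the right stream.
    right_keys = {t[right_field] for t in right if right_field in t}
    # Keep left tuples (duplicates included) whose key is among them.
    return [t for t in left if t.get(left_field) in right_keys]


# Same data as the simplified test case: left = a,b,c,c and right = a,c.
left = [{"fieldA": v} for v in ["a", "b", "c", "c"]]
right = [{"fieldB": v} for v in ["a", "c"]]

result = expected_intersect(left, right, "fieldA", "fieldB")
print([t["fieldA"] for t in result])  # ['a', 'c', 'c']
```

Under this model the output contains three documents: one with fieldA == a and two with fieldA == c.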


Best regards,
Christian Spitzlay



> On 06.06.2018 at 10:18, Christian Spitzlay <[hidden email]> wrote:
>
> Hi,
>
> I don’t seem to get the behaviour of the intersect() stream decorator.
> I only ever get one doc from the left stream when I would have expected
> more than one.

Re: Streaming Expression intersect() behaviour

Joel Bernstein
Nice example!

I'll take a look at this today. I believe there was/is a bug with some
of the joins where the "on" parameter transposes the fields. It's
possible that is the case here as well.



Joel Bernstein
http://joelsolr.blogspot.com/

On Thu, Jun 7, 2018 at 5:34 AM, Christian Spitzlay <[hidden email]> wrote:

> Here is the test case again and I hope it will work this time:
>
> intersect(
> cartesianProduct(tuple(fieldA=array(a,b,c,c)), fieldA, productSort="fieldA asc"),
> cartesianProduct(tuple(fieldB=array(a,c)), fieldB, productSort="fieldB asc"),
> on="fieldA=fieldB"
> )

Re: Streaming Expression intersect() behaviour

Christian Spitzlay
In reply to this post by Christian Spitzlay


> On 07.06.2018 at 11:34, Christian Spitzlay <[hidden email]> wrote:
>
> intersect(
> cartesianProduct(tuple(fieldA=array(a,b,c,c)), fieldA, productSort="fieldA asc"),
> cartesianProduct(tuple(fieldB=array(a,c)), fieldB, productSort="fieldB asc"),
> on="fieldA=fieldB"
> )
>
> I simplified it a bit, too. I still get one document with fieldA == a.
> I would have expected three documents in the output, one with fieldA == a and two with fieldB == c.

That should have read "… and two with fieldA == c" of course.




Re: Streaming Expression intersect() behaviour

Joel Bernstein
This expression works as expected:

intersect(
cartesianProduct(tuple(fieldA=array(a,b,c,c)), fieldA, productSort="fieldA asc"),
cartesianProduct(tuple(fieldA=array(a,c)), fieldA, productSort="fieldA asc"),
on="fieldA"
)

And when you transpose the "on" fields like this:

intersect(
cartesianProduct(tuple(fieldA=array(a,b,c,c)), fieldA, productSort="fieldA asc"),
cartesianProduct(tuple(fieldB=array(a,c)), fieldB, productSort="fieldB asc"),
on="fieldB=fieldA"
)

It also works.

So yes, there is a bug where the fields are being transposed in the intersect
function's "on" parameter. The same issue was happening with joins and may
have been resolved. I'll do a little more research into this.







Joel Bernstein
http://joelsolr.blogspot.com/

On Thu, Jun 7, 2018 at 9:29 AM, Christian Spitzlay <[hidden email]> wrote:

> That should have read "… and two with fieldA == c" of course.

Re: Streaming Expression intersect() behaviour

Christian Spitzlay
Hi,


> On 08.06.2018 at 03:42, Joel Bernstein <[hidden email]> wrote:
>
> And when you transpose the "on" fields like this:
>
> intersect(
> cartesianProduct(tuple(fieldA=array(a,b,c,c)), fieldA, productSort="fieldA asc"),
> cartesianProduct(tuple(fieldB=array(a,c)), fieldB, productSort="fieldB asc"),
> on="fieldB=fieldA"
> )
>
> It also works.


No, IIUC this does not work correctly.

I had tried this before posting my original question.
That version emits the documents from the left stream
but does not filter out the document with fieldA == b.

This might be due to the fact that fieldB is not present in the left stream
and fieldA is not present in the right stream; it compares two
empty values (null?) and comes to the conclusion that they are equal.
Could that be the reason?
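
The hypothesis above can be illustrated with a small Python model. It is a sketch of the suspected effect only, not Solr's code: the function name `naive_intersect` is hypothetical, and the assumption that a missing field is read as a null value (None here) that compares equal to another null is exactly the part that has not been confirmed.

```python
def naive_intersect(left, right, left_field, right_field):
    """Keep left tuples that compare equal to some right tuple.

    Assumption: a field missing from a tuple is read as None,
    so two missing values compare as equal.
    """
    kept = []
    for l in left:
        if any(l.get(left_field) == r.get(right_field) for r in right):
            kept.append(l)
    return kept


left = [{"fieldA": v} for v in ["a", "b", "c", "c"]]
right = [{"fieldB": v} for v in ["a", "c"]]

# Models on="fieldB=fieldA": fieldB is looked up on the left stream and
# fieldA on the right stream, so both lookups yield None for every tuple.
transposed = naive_intersect(left, right, "fieldB", "fieldA")
print([t["fieldA"] for t in transposed])  # ['a', 'b', 'c', 'c']
```

In this model every left tuple passes, including the one with fieldA == b, which matches the behaviour observed with on="fieldB=fieldA". It does not explain why on="fieldA=fieldB" returns only a single document.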



> So yes, there is a bug where the fields are being transposed in the intersect
> function's "on" parameter. The same issue was happening with joins and may
> have been resolved. I'll do a little more research into this.

Thanks for your work on this!


Best regards
Christian Spitzlay






Re: Streaming Expression intersect() behaviour

Joel Bernstein
You're correct, after testing again the only way that this works correctly
appears to be:

intersect(
cartesianProduct(tuple(fieldA=array(a,b,c,c)), fieldA, productSort="fieldA asc"),
cartesianProduct(tuple(fieldA=array(a,c)), fieldA, productSort="fieldA asc"),
on="fieldA"
)

I suspect that there are only test cases that cover this scenario as well.
I'll create a jira issue for this.




Joel Bernstein
http://joelsolr.blogspot.com/

On Fri, Jun 8, 2018 at 3:41 AM, Christian Spitzlay <[hidden email]> wrote:

> No, IIUC this does not work correctly.
>
> I had tried this before posting my original question.
> That version emits the documents from the left stream
> but does not filter out the document with fieldA == b.

Re: Streaming Expression intersect() behaviour

Christian Spitzlay
As a temporary workaround until that issue is fixed
one could wrap the right stream with a select that renames the field:

intersect(
cartesianProduct(tuple(fieldA=array(a,b,c,c)), fieldA, productSort="fieldA asc"),
select(cartesianProduct(tuple(fieldB=array(a,c)), fieldB, productSort="fieldB asc"), fieldB as fieldA),
on=fieldA
)
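
The idea behind the workaround can be modelled in Python: rename the key field of the right stream first, then intersect on a single shared field name. This is a sketch under stated assumptions, not Solr's implementation; `rename_field` and `intersect_on` are hypothetical helpers, and `rename_field` only mirrors select(...) for tuples that carry a single field, as in the test case.

```python
def rename_field(stream, old, new):
    """Model of select(..., old as new): copy each tuple with one field renamed."""
    return [{(new if k == old else k): v for k, v in t.items()} for t in stream]


def intersect_on(left, right, field):
    """Keep left tuples whose value for `field` also occurs in the right stream."""
    right_keys = {t[field] for t in right if field in t}
    return [t for t in left if t.get(field) in right_keys]


left = [{"fieldA": v} for v in ["a", "b", "c", "c"]]
right = [{"fieldB": v} for v in ["a", "c"]]

# Rename fieldB to fieldA on the right side, then intersect on fieldA only.
renamed_right = rename_field(right, "fieldB", "fieldA")
result = intersect_on(left, renamed_right, "fieldA")
print([t["fieldA"] for t in result])  # ['a', 'c', 'c']
```

With both streams sharing the field name, the model no longer depends on how the two sides of the "on" parameter are assigned to the left and right streams.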



> On 08.06.2018 at 14:42, Joel Bernstein <[hidden email]> wrote:
>
> You're correct, after testing again the only way that this works correctly
> appears to be:
>
> intersect(
> cartesianProduct(tuple(fieldA=array(a,b,c,c)), fieldA, productSort="fieldA asc"),
> cartesianProduct(tuple(fieldA=array(a,c)), fieldA, productSort="fieldA asc"),
> on="fieldA"
> )
>
> I suspect that there are only test cases that cover this scenario as well.
> I'll create a jira issue for this.

Re: Streaming Expression intersect() behaviour

Joel Bernstein
Yes, I was going to suggest that as well.

Joel Bernstein
http://joelsolr.blogspot.com/

On Fri, Jun 8, 2018 at 9:20 AM, Christian Spitzlay <[hidden email]> wrote:

> As a temporary workaround until that issue is fixed
> one could wrap the right stream with a select that renames the field:
>
> intersect(
> cartesianProduct(tuple(fieldA=array(a,b,c,c)), fieldA, productSort="fieldA asc"),
> select(cartesianProduct(tuple(fieldB=array(a,c)), fieldB, productSort="fieldB asc"), fieldB as fieldA),
> on=fieldA
> )