Should ChildDocTransformerFactory's limit be local or global for deep-nested documents?

Alexandre Rafalovitch
I am indexing a deeply nested structure and am trying to return it
with fl=*,[child].

The top element is supposed to have 5 children, but only 4 are
returned. Two hours of debugging later, I realized that the "limit"
parameter defaults to 10 and that the 10 seems to count children at
ANY level, calculated depth-first. So it was far from obvious why
the children suddenly stopped showing up.

The documentation says:
> The maximum number of child documents to be returned per parent document.
> The default is `10`.

So, is that (all nested children counted against the limit) what we
actually mean? Or did we mean the maximum number of "immediate
children" for any specific document/level, and the code is wrong?

I can update the doc to clarify the results, but I don't know whether
I am looking at a bug or a feature.
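
To make the two readings concrete, here is a minimal toy model in Java. It is not Solr code: the `Doc` class, the method names, and the sample tree (5 children, the first three with 2 grandchildren each) are all assumptions made for illustration. It contrasts one limit shared depth-first across every level with a limit applied to the immediate children of each document.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model (hypothetical, not Solr code) of a global vs. per-level child limit. */
final class ChildLimitSketch {

    /** A nested document: an id plus its immediate children. */
    static final class Doc {
        final String id;
        final List<Doc> children = new ArrayList<>();
        Doc(String id) { this.id = id; }
        Doc add(Doc child) { children.add(child); return this; }
    }

    /** Global reading: one budget shared by descendants at every level, spent depth-first. */
    static Doc copyWithGlobalLimit(Doc root, int limit) {
        return copyGlobal(root, new int[] {limit});
    }

    private static Doc copyGlobal(Doc src, int[] remaining) {
        Doc out = new Doc(src.id);
        for (Doc child : src.children) {
            if (remaining[0] <= 0) break;   // budget exhausted: later siblings are dropped
            remaining[0]--;
            out.add(copyGlobal(child, remaining));
        }
        return out;
    }

    /** Per-level reading: each document keeps at most `limit` immediate children. */
    static Doc copyWithPerLevelLimit(Doc src, int limit) {
        Doc out = new Doc(src.id);
        int n = Math.min(limit, src.children.size());
        for (int i = 0; i < n; i++) {
            out.add(copyWithPerLevelLimit(src.children.get(i), limit));
        }
        return out;
    }

    /** Assumed sample data: 5 children, the first three with 2 grandchildren each. */
    static Doc sample() {
        Doc root = new Doc("root");
        for (int c = 1; c <= 5; c++) {
            Doc child = new Doc("child" + c);
            if (c <= 3) {
                child.add(new Doc("child" + c + "-a")).add(new Doc("child" + c + "-b"));
            }
            root.add(child);
        }
        return root;
    }

    public static void main(String[] args) {
        Doc root = sample();
        System.out.println("global limit:    " + copyWithGlobalLimit(root, 10).children.size());
        System.out.println("per-level limit: " + copyWithPerLevelLimit(root, 10).children.size());
    }
}
```

With a limit of 10 on this sample tree, the shared budget runs out after the fourth top-level child, so the fifth is dropped, while the per-level reading returns all five.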

Regards,
   Alex.

---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

Re: Should ChildDocTransformerFactory's limit be local or global for deep-nested documents?

David Smiley
I think that's a bug!  Good catch!

~ David Smiley
Apache Lucene/Solr Search Developer


In reply to Alexandre Rafalovitch's post of Thu, Oct 1, 2020 at 11:38 PM.

Re: Should ChildDocTransformerFactory's limit be local or global for deep-nested documents?

Bar Rotstein
Hey,
Was a ticket opened?

I'd gladly tackle that one if it hasn't been assigned yet.

Thanks in advance,
Bar

In reply to David Smiley's post of Fri, Oct 2, 2020 at 3:13 PM.

Re: Should ChildDocTransformerFactory's limit be local or global for deep-nested documents?

David Smiley
Glad to hear from you again Bar!
Also, FYI https://issues.apache.org/jira/browse/SOLR-14869 is a serious bug relating to child documents.  It returns deleted docs!

~ David Smiley
Apache Lucene/Solr Search Developer


In reply to Bar Rotstein's post of Sat, Oct 3, 2020 at 3:23 PM.

Re: Should ChildDocTransformerFactory's limit be local or global for deep-nested documents?

Alexandre Rafalovitch
In reply to this post by Bar Rotstein
I did not create a ticket (got distracted). Feel free to make one and
add me to watchers. I will be happy to test it with my dataset.

Thanks,
   Alex.

Re: Should ChildDocTransformerFactory's limit be local or global for deep-nested documents?

Bar Rotstein
In reply to this post by David Smiley
Hey David,
long time no speak.

I think I'll start working on SOLR-14869.

Do you have any tips that might enable me to tackle it a little faster?

Thanks,
Bar.

In reply to David Smiley's post of Sun, Oct 4, 2020 at 12:25 AM.

Re: Should ChildDocTransformerFactory's limit be local or global for deep-nested documents?

David Smiley
On Thu, Oct 8, 2020 at 9:13 AM Bar Rotstein <[hidden email]> wrote:
> I think I'll start working on SOLR-14869.
>
> Do you have any tips that might enable me to tackle it a little faster?

ChildDocTransformer loops over document IDs, which should all be in the same segment. Get the LeafReader for that segment and call getLiveDocs on it. Then, when the transformer loops over the IDs, check whether each doc is "live".
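
As a rough sketch of that check, the Java snippet below uses a plain `java.util.BitSet` as a stand-in for the live-docs bits so that it runs without Lucene on the classpath. The class name, the `keepLive` helper, and the sample IDs are hypothetical and are not taken from ChildDocTransformer. The `null` handling reflects that `LeafReader.getLiveDocs()` returns `null` when the segment has no deletions.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;

/** Hypothetical sketch of skipping deleted docs while looping over doc IDs. */
final class LiveDocsFilterSketch {

    /**
     * Keeps only the doc IDs whose bit is set in liveDocs.
     * liveDocs stands in for the bits a segment reader reports;
     * null is treated as "no deletions in this segment", so every ID is kept.
     */
    static List<Integer> keepLive(int[] segmentDocIds, BitSet liveDocs) {
        List<Integer> live = new ArrayList<>();
        for (int docId : segmentDocIds) {
            if (liveDocs == null || liveDocs.get(docId)) {
                live.add(docId);   // doc is live: safe to return as a child
            }
            // otherwise the doc was deleted and is skipped
        }
        return live;
    }

    public static void main(String[] args) {
        BitSet liveDocs = new BitSet();
        liveDocs.set(0, 5);   // docs 0..4 start out live
        liveDocs.clear(2);    // doc 2 has been deleted
        System.out.println(keepLive(new int[] {0, 1, 2, 3}, liveDocs));
    }
}
```

In Lucene itself the same test would go through the `Bits` object returned by `getLiveDocs`, using doc IDs that are local to that segment.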