[jira] [Commented] (SOLR-12519) Support Deeply Nested Docs In Child Documents Transformer

JIRA jira@apache.org

    [ https://issues.apache.org/jira/browse/SOLR-12519?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16540412#comment-16540412 ]

David Smiley commented on SOLR-12519:
-------------------------------------

I can see right away that the issue is that you are using the PathHierarchyTokenizer in both your index analyzer and your query analyzer.  It should be one or the other, as shown in the javadocs for PathHierarchyTokenizerFactory and also in the schema for the test of that factory.  If we use PathHierarchyTokenizerFactory at index time, then the query side would be a literal string/term match, and it would find the exact match with the trailing '/'.  Path tokenizing at index time allows fast descendant matches.

If we *also* want to use the same field for finding ancestors, then this may be a fun challenge but is probably doable.  The first thing that comes to mind would be the transformer (or whatever client) manually path tokenizing the input and ensuring each token has a trailing '/'; thus an ancestor query of "Books/NonFic/Science/" from the javadoc example would become a Boolean OR query for "Books/", "Books/NonFic/", and "Books/NonFic/Science/".
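To make the ancestor idea concrete, here is a rough client-side sketch in Python, not Solr code. The helper names `ancestor_terms` and `ancestor_query` and the field name `path` are made up for illustration. It also assumes the field's query analyzer leaves each quoted term untouched, so every term is matched literally.

```python
def ancestor_terms(path, delimiter="/"):
    """Return each ancestor prefix of `path` (and the path itself),
    every one ending with a trailing delimiter."""
    # Drop empty segments so a leading or trailing delimiter is harmless.
    segments = [s for s in path.split(delimiter) if s]
    terms = []
    prefix = ""
    for segment in segments:
        prefix += segment + delimiter
        terms.append(prefix)
    return terms


def ancestor_query(field, path, delimiter="/"):
    """Build a Boolean OR query string over the ancestor terms.

    `field` is a hypothetical field name; each term is double-quoted
    so that it is treated as a single literal term.
    """
    quoted = []
    for term in ancestor_terms(path, delimiter):
        # Escape backslashes and double quotes inside the quoted term.
        escaped = term.replace("\\", "\\\\").replace('"', '\\"')
        quoted.append('"%s"' % escaped)
    return "%s:(%s)" % (field, " OR ".join(quoted))


if __name__ == "__main__":
    print(ancestor_query("path", "Books/NonFic/Science/"))
```

For the javadoc example path this yields one quoted term per level, joined with OR, which is the shape of query described above.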

> Support Deeply Nested Docs In Child Documents Transformer
> ---------------------------------------------------------
>
>                 Key: SOLR-12519
>                 URL: https://issues.apache.org/jira/browse/SOLR-12519
>             Project: Solr
>          Issue Type: Sub-task
>      Security Level: Public(Default Security Level. Issues are Public)
>            Reporter: mosh
>            Priority: Major
>         Attachments: SOLR-12519-no-commit.patch
>
>
> As discussed in SOLR-12298, to make use of the meta-data fields in SOLR-12441, there needs to be a smarter child document transformer, which provides the ability to rebuild the original nested documents' structure.
>  In addition, I propose that the transformer also be able to return only part of the original hierarchy, to prevent unnecessary block join queries, e.g.:
> {code}  {"a": "b", "c": [ {"e": "f"}, {"e": "g"} , {"h": "i"} ]} {code}
>  In case my query is for all the children of "a:b" that contain the key "e", the query will be broken into two parts:
>  1. The parent query "a:b"
>  2. The child query "e:*".
> If the only children flag is on, the transformer will return the following documents:
>  {code}[ {"e": "f"}, {"e": "g"} ]{code}
> In case the flag is not turned on (perhaps the default state), the whole document hierarchy will be returned, containing only the matching children:
> {code}{"a": "b", "c": [ {"e": "f"}, {"e": "g"} ]}{code}
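The two outcomes described in the quoted issue text can be mimicked on plain Python dictionaries. This only illustrates the proposed behaviour and is not the transformer's implementation or any Solr API. The function name `filter_children` and its parameters are hypothetical, and the child query "e:*" is approximated as "the child has a key named e".

```python
def filter_children(doc, child_key, required_field, only_children=False):
    """Keep only the children under `child_key` that contain `required_field`.

    With only_children=True, return just the matching children;
    otherwise return the parent with its child list narrowed to the matches.
    """
    matching = [child for child in doc.get(child_key, [])
                if required_field in child]
    if only_children:
        return matching
    # Copy the parent's other fields, then attach the filtered children.
    result = {k: v for k, v in doc.items() if k != child_key}
    result[child_key] = matching
    return result


if __name__ == "__main__":
    doc = {"a": "b", "c": [{"e": "f"}, {"e": "g"}, {"h": "i"}]}
    print(filter_children(doc, "c", "e", only_children=True))
    print(filter_children(doc, "c", "e"))
```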


