
ChildDocTransformerFactory and returning only parents with children


David Kramer
Hi. We’re just ramping up a product search engine for our eCommerce site. This is all new development and we are slowly building up our Solr knowledge base, so thanks in advance for any guidance.

Our catalog (mostly shoes and apparel) has three nested objects: Products (title, description, etc.), Items (color, price, etc.), and SKUs (size, etc.). Since Solr doesn’t do documents nested three deep, the SKUs and items both get retrieved as children of products. That has not bitten us yet… Also, our search results page expects a list of Item objects, then groups them (rolls them up) by their parent object. Right now we are returning just the items, and that’s great, but we want to implement pagination of the products, so we need to return the items nested in products, then paginate on the products.

If I send ‘q=docType:Product description:Armour&fl=title, description,id,[child parentFilter="docType:Product" childFilter="docType:Item"]’ I get a nice list of products with items nested inside them. Woot.

The problem is, if we want to filter on item attributes, I get back products that have no matching children, which means we can’t paginate on the results if we remove those parents. For instance, if I send ‘q=docType:Product description:Armour&fl=title, description,id,[child parentFilter="docType:Product" childFilter="docType:Item AND price:49.99"]’, we get the products and their items nicely nested, and only items with a price of 49.99 are shown, but so are parents that have no matching items.

How can I build a query that will not return parents without children? I haven’t figured out a way to reference the children in the query.

Since we’re not in production yet, I can change lots of things here.  I would PREFER not to denormalize the documents into one document per SKU with all the item and product information too, as our catalog is quite large and that would lead to a huge import file and lots of duplicated content between documents in the index.  If that’s the only way, though, it is possible.

Thanks in advance.

Re: ChildDocTransformerFactory and returning only parents with children

Alexandre Rafalovitch
You should be able to nest things multiple levels deep. What happens
when you try?

For finding parents whose children satisfy some criteria, the
[child] result transformer probably kicks in a bit too late. You may
want to look into json.facets instead: search against the children,
then shift the domain up to the parents. After that, you can still use
the [child] transformer to get the expanded children (if you need them).
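
A rough sketch of what such a JSON Facet request might look like, written as a Python dict so it can be serialized and posted to a collection's query endpoint. The facet name "matching_parents" and the choice of "brand" as the facet field are assumptions made for illustration; the docType, price, and Product values come from the original post.

```python
import json


def build_parent_facet_request(child_query, parent_filter="docType:Product",
                               facet_field="brand"):
    """Search child documents, then facet over their parents.

    The "blockParent" domain change maps the matching children to their
    parent documents before the facet is computed. The facet_field is
    assumed to live on the parent (Product) documents.
    """
    return {
        "query": child_query,  # runs against the children (items)
        "limit": 0,            # only the facet counts are wanted here
        "facet": {
            "matching_parents": {  # hypothetical facet name
                "type": "terms",
                "field": facet_field,
                "domain": {"blockParent": parent_filter},
            }
        },
    }


request_body = build_parent_facet_request("docType:Item AND price:49.99")
print(json.dumps(request_body, indent=2))
```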

Regards,
   Alex.


----
http://www.solr-start.com/ - Resources for Solr users, new and experienced



Re: ChildDocTransformerFactory and returning only parents with children

David Kramer
I’ll be honest, I didn’t understand most of what you wrote (like I said, we’re just getting started with this). We will most certainly need to do faceted search in future iterations, so thanks for the “json.facets” reference. And I do understand that the ChildDocTransformer is really for controlling what gets output and not for finding or filtering rows.

Your answer started me thinking about solving different parts of the problem in different parts of the query.  I got something that works now:
       q=title:"Under Armour" OR description:"Under Armour"
       fq={!parent which=docType:Product}color:*Blue*
       fl=title, description, brand,id,[child parentFilter="docType:Product" childFilter="color:*Blue*"]
This does show me only Under Armour products with blue items, and returns just the blue items nested inside the products. That will work. There may be a more efficient/direct way of doing it, but at least we can move forward. Is this a good approach?
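
For anyone wanting to reuse this pattern, here is a minimal sketch that assembles the same three parameters in Python and adds the standard start/rows parameters for paginating on products. The function name and the page-size default are made up for illustration; it only builds the query string and does not contact a Solr server.

```python
from urllib.parse import urlencode, parse_qs


def build_product_query(brand, child_filter, page=0, page_size=20,
                        parent_filter="docType:Product"):
    """Build params that return only parents with at least one matching child.

    q  picks the candidate products,
    fq keeps only products that have a child matching child_filter,
    fl nests just those matching children inside each product.
    """
    return {
        "q": f'title:"{brand}" OR description:"{brand}"',
        "fq": f"{{!parent which={parent_filter}}}{child_filter}",
        "fl": ("title,description,brand,id,"
               f'[child parentFilter="{parent_filter}" '
               f'childFilter="{child_filter}"]'),
        "start": page * page_size,  # offset counted in products, not items
        "rows": page_size,
    }


params = build_product_query("Under Armour", "color:*Blue*", page=2)
query_string = urlencode(params)  # URL-encodes braces, quotes, and spaces
print(query_string)
```

Because the fq removes childless parents before the results are counted, start and rows here page over products that actually have matching items.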

With respect to multiple levels, it’s not that I’ve tried to query more than two nested documents deep and failed; it’s that I haven’t seen a single example of how to query more than two levels. The documentation and every example I found for ChildDocTransformer and Block Join just show parents and children. A few hours ago Mikhail graciously sent me a link off-list to an article that basically says grandchildren are children too, so you can search/filter on them as if they were children, and I understood most of it. Will have to dig into it more.
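
If the "grandchildren are children too" idea holds for our index, the same block-join filter should work against SKU fields. A small sketch under that assumption: the docType value "SKU" and the size value are hypothetical, and it presumes SKUs are indexed in the same block as their product.

```python
def parent_filter_for(descendant_query, parent_filter="docType:Product"):
    """Wrap a query on any descendant (item or SKU) in a {!parent} block join."""
    return f"{{!parent which={parent_filter}}}{descendant_query}"


def child_transformer(child_filter, parent_filter="docType:Product"):
    """Build the [child] transformer clause for the fl parameter."""
    return f'[child parentFilter="{parent_filter}" childFilter="{child_filter}"]'


# Assumed field/value names: docType:SKU and size:10 are illustrative only.
sku_query = "docType:SKU AND size:10"
params = {
    "q": "docType:Product description:Armour",
    "fq": parent_filter_for(sku_query),  # keep products with a matching SKU
    "fl": "title,description,id," + child_transformer(sku_query),
}

for name, value in params.items():
    print(f"{name}={value}")
```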

Thanks!
