best way to model 1-N

classic Classic list List threaded Threaded
8 messages Options
Reply | Threaded
Open this post in threaded view
|

best way to model 1-N

Joel Nylund
Hi,

I have one index so far which contains feeds.  I have been able to de-
normalize several tables and map this data onto the feed entity. There  
is one tricky problem that I need help on.

Feeds have 1 - many categories.

So Lets say we have Category1, Category2 and Category3

Feed 1 - is in Category 1
Feed 2 is in category2 and category3
Feed 3 is in category2
Feed 4 has no category

In the database this is modeled a a 1-N where category table has the  
mapping of feed to category

I need to be able to query , give me all the feeds in any given  
category.

How can I best model this in solr?

Seems like multiValued field might help, but how would I populate it,  
and would the query above work?.

thanks
Joel

Reply | Threaded
Open this post in threaded view
|

Re: best way to model 1-N

Avlesh Singh
>
> In the database this is modeled a a 1-N where category table has the
> mapping of feed to category
> I need to be able to query , give me all the feeds in any given category.
> How can I best model this in solr?
> Seems like multiValued field might help, but how would I populate it, and
> would the query above work?.
>
Yes you are right. A multivalued field for "categories" is the answer.

For populating in the index -

   1. If you use DIH to populate your indexes and your datasource is a
   database then you can use DIH's RegexTransformer on an aggregated list of
   categories. e.g. if your database query retruns "a,b,c,d" in a column called
   "db_categories", this is how you would put it in DIH's data-config file -
   <field column="db_categories" name="categories" splityBy="," />.
   2. If you "add" documents to Solr yourself  multiple values for the field
   can be specified as an array or list of values in the SolrInputDocument.

A multivalued field provides the same faceting and searching capabilites
like regular fields. There is no special syntax.

Cheers
Avlesh

On Fri, Oct 30, 2009 at 4:55 AM, Joel Nylund <[hidden email]> wrote:

> Hi,
>
> I have one index so far which contains feeds.  I have been able to
> de-normalize several tables and map this data onto the feed entity. There is
> one tricky problem that I need help on.
>
> Feeds have 1 - many categories.
>
> So Lets say we have Category1, Category2 and Category3
>
> Feed 1 - is in Category 1
> Feed 2 is in category2 and category3
> Feed 3 is in category2
> Feed 4 has no category
>
> In the database this is modeled a a 1-N where category table has the
> mapping of feed to category
>
> I need to be able to query , give me all the feeds in any given category.
>
> How can I best model this in solr?
>
> Seems like multiValued field might help, but how would I populate it, and
> would the query above work?.
>
> thanks
> Joel
>
>
Reply | Threaded
Open this post in threaded view
|

Re: best way to model 1-N

Joel Nylund
thanks, but im confused how I can aggregate across rows, I dont know  
of any easy way to get my db to return one row for all the categories  
(given the hint from your other email), I have split the category  
query into a separate entity, but its returning multiple rows, how do  
I combine multiple rows into 1 index entity?

thanks
Joel

On Oct 29, 2009, at 8:58 PM, Avlesh Singh wrote:

>>
>> In the database this is modeled a a 1-N where category table has the
>> mapping of feed to category
>> I need to be able to query , give me all the feeds in any given  
>> category.
>> How can I best model this in solr?
>> Seems like multiValued field might help, but how would I populate  
>> it, and
>> would the query above work?.
>>
> Yes you are right. A multivalued field for "categories" is the answer.
>
> For populating in the index -
>
>   1. If you use DIH to populate your indexes and your datasource is a
>   database then you can use DIH's RegexTransformer on an aggregated  
> list of
>   categories. e.g. if your database query retruns "a,b,c,d" in a  
> column called
>   "db_categories", this is how you would put it in DIH's data-config  
> file -
>   <field column="db_categories" name="categories" splityBy="," />.
>   2. If you "add" documents to Solr yourself  multiple values for  
> the field
>   can be specified as an array or list of values in the  
> SolrInputDocument.
>
> A multivalued field provides the same faceting and searching  
> capabilites
> like regular fields. There is no special syntax.
>
> Cheers
> Avlesh
>
> On Fri, Oct 30, 2009 at 4:55 AM, Joel Nylund <[hidden email]>  
> wrote:
>
>> Hi,
>>
>> I have one index so far which contains feeds.  I have been able to
>> de-normalize several tables and map this data onto the feed entity.  
>> There is
>> one tricky problem that I need help on.
>>
>> Feeds have 1 - many categories.
>>
>> So Lets say we have Category1, Category2 and Category3
>>
>> Feed 1 - is in Category 1
>> Feed 2 is in category2 and category3
>> Feed 3 is in category2
>> Feed 4 has no category
>>
>> In the database this is modeled a a 1-N where category table has the
>> mapping of feed to category
>>
>> I need to be able to query , give me all the feeds in any given  
>> category.
>>
>> How can I best model this in solr?
>>
>> Seems like multiValued field might help, but how would I populate  
>> it, and
>> would the query above work?.
>>
>> thanks
>> Joel
>>
>>

Reply | Threaded
Open this post in threaded view
|

Re: best way to model 1-N

Chantal Ackermann
That depends a bit on your database, but it is tricky and might not be
performant.

If you are more of a Java developer, you might prefer retrieving
mutliple rows per SOLR document from your dataSource (join on your
category and main table), and aggregate them in your custom
EntityProcessor. I got a far(!) better performance retrieving everything
in one query and doing the aggregation in Java. But this is, of course,
depending on your table structure and data.

Noble Paul helped me with the custom EntityProcessor, and it turned out
quite easy. Have a look at the thread with the heading from this mailing
list (SOLR-USER):
DataImportHandler / Import from DB : one data set comes in multiple rows

Cheers,
Chantal


Joel Nylund schrieb:

> thanks, but im confused how I can aggregate across rows, I dont know
> of any easy way to get my db to return one row for all the categories
> (given the hint from your other email), I have split the category
> query into a separate entity, but its returning multiple rows, how do
> I combine multiple rows into 1 index entity?
>
> thanks
> Joel
>
> On Oct 29, 2009, at 8:58 PM, Avlesh Singh wrote:
>
>>> In the database this is modeled a a 1-N where category table has the
>>> mapping of feed to category
>>> I need to be able to query , give me all the feeds in any given
>>> category.
>>> How can I best model this in solr?
>>> Seems like multiValued field might help, but how would I populate
>>> it, and
>>> would the query above work?.
>>>
>> Yes you are right. A multivalued field for "categories" is the answer.
>>
>> For populating in the index -
>>
>>   1. If you use DIH to populate your indexes and your datasource is a
>>   database then you can use DIH's RegexTransformer on an aggregated
>> list of
>>   categories. e.g. if your database query retruns "a,b,c,d" in a
>> column called
>>   "db_categories", this is how you would put it in DIH's data-config
>> file -
>>   <field column="db_categories" name="categories" splityBy="," />.
>>   2. If you "add" documents to Solr yourself  multiple values for
>> the field
>>   can be specified as an array or list of values in the
>> SolrInputDocument.
>>
>> A multivalued field provides the same faceting and searching
>> capabilites
>> like regular fields. There is no special syntax.
>>
>> Cheers
>> Avlesh
>>
>> On Fri, Oct 30, 2009 at 4:55 AM, Joel Nylund <[hidden email]>
>> wrote:
>>
>>> Hi,
>>>
>>> I have one index so far which contains feeds.  I have been able to
>>> de-normalize several tables and map this data onto the feed entity.
>>> There is
>>> one tricky problem that I need help on.
>>>
>>> Feeds have 1 - many categories.
>>>
>>> So Lets say we have Category1, Category2 and Category3
>>>
>>> Feed 1 - is in Category 1
>>> Feed 2 is in category2 and category3
>>> Feed 3 is in category2
>>> Feed 4 has no category
>>>
>>> In the database this is modeled a a 1-N where category table has the
>>> mapping of feed to category
>>>
>>> I need to be able to query , give me all the feeds in any given
>>> category.
>>>
>>> How can I best model this in solr?
>>>
>>> Seems like multiValued field might help, but how would I populate
>>> it, and
>>> would the query above work?.
>>>
>>> thanks
>>> Joel
>>>
>>>
>
Reply | Threaded
Open this post in threaded view
|

Re: best way to model 1-N

Joel Nylund
Thanks Chantal, I will keep that in mind for tuning,

for sql I figured  way to combine them into one row using concat, but  
I still seem to be having an issue splitting them:

Db now returns as one column categoryType:
TOPIC,LANGUAGE

but my solr result, if you note the item in categoryType  all seem to  
be within one str, I would expect it to be in multiple strings within  
the array, is this assumption wrong?

<doc>

<arr name="categoryType">
<str>TOPIC,LANGUAGE</str>
</arr>
<str name="id">40</str>
<str name="title">feed title</str>
</doc>


Here is my import:
   <document name="doc">
         <entity name="item"
        query="SELECT f.id, f.title
                FROM Feed f
            <field column="id" name="id" />
             <field column="title" name="title" />
          <entity name="category" query="select cfcr.feedId,  
group_concat(cfcr.categoryType) as categoryType
                                                from CFR cfcr
                                                where
                                                cfcr.feedId = '${item.id}' AND
                                                group by cfcr.feedId">
                            <field column="categoryType" name="categoryType"  
splityBy="," />
                    </entity>
                       
      </entity>

In schema:
        <field name="categoryType" type="text" indexed="true" stored="true"  
required="false" multiValued="true"/>
        <field name="categoryName" type="text" indexed="true" stored="true"  
required="false" multiValued="true"/>


what am I missing?

thanks
Joel


On Oct 30, 2009, at 10:00 AM, Chantal Ackermann wrote:

> That depends a bit on your database, but it is tricky and might not  
> be performant.
>
> If you are more of a Java developer, you might prefer retrieving  
> mutliple rows per SOLR document from your dataSource (join on your  
> category and main table), and aggregate them in your custom  
> EntityProcessor. I got a far(!) better performance retrieving  
> everything in one query and doing the aggregation in Java. But this  
> is, of course, depending on your table structure and data.
>
> Noble Paul helped me with the custom EntityProcessor, and it turned  
> out quite easy. Have a look at the thread with the heading from this  
> mailing list (SOLR-USER):
> DataImportHandler / Import from DB : one data set comes in multiple  
> rows
>
> Cheers,
> Chantal
>
>
> Joel Nylund schrieb:
>> thanks, but im confused how I can aggregate across rows, I dont know
>> of any easy way to get my db to return one row for all the categories
>> (given the hint from your other email), I have split the category
>> query into a separate entity, but its returning multiple rows, how do
>> I combine multiple rows into 1 index entity?
>> thanks
>> Joel
>> On Oct 29, 2009, at 8:58 PM, Avlesh Singh wrote:
>>>> In the database this is modeled a a 1-N where category table has  
>>>> the
>>>> mapping of feed to category
>>>> I need to be able to query , give me all the feeds in any given
>>>> category.
>>>> How can I best model this in solr?
>>>> Seems like multiValued field might help, but how would I populate
>>>> it, and
>>>> would the query above work?.
>>>>
>>> Yes you are right. A multivalued field for "categories" is the  
>>> answer.
>>>
>>> For populating in the index -
>>>
>>>  1. If you use DIH to populate your indexes and your datasource is a
>>>  database then you can use DIH's RegexTransformer on an aggregated
>>> list of
>>>  categories. e.g. if your database query retruns "a,b,c,d" in a
>>> column called
>>>  "db_categories", this is how you would put it in DIH's data-config
>>> file -
>>>  <field column="db_categories" name="categories" splityBy="," />.
>>>  2. If you "add" documents to Solr yourself  multiple values for
>>> the field
>>>  can be specified as an array or list of values in the
>>> SolrInputDocument.
>>>
>>> A multivalued field provides the same faceting and searching
>>> capabilites
>>> like regular fields. There is no special syntax.
>>>
>>> Cheers
>>> Avlesh
>>>
>>> On Fri, Oct 30, 2009 at 4:55 AM, Joel Nylund <[hidden email]>
>>> wrote:
>>>
>>>> Hi,
>>>>
>>>> I have one index so far which contains feeds.  I have been able to
>>>> de-normalize several tables and map this data onto the feed entity.
>>>> There is
>>>> one tricky problem that I need help on.
>>>>
>>>> Feeds have 1 - many categories.
>>>>
>>>> So Lets say we have Category1, Category2 and Category3
>>>>
>>>> Feed 1 - is in Category 1
>>>> Feed 2 is in category2 and category3
>>>> Feed 3 is in category2
>>>> Feed 4 has no category
>>>>
>>>> In the database this is modeled a a 1-N where category table has  
>>>> the
>>>> mapping of feed to category
>>>>
>>>> I need to be able to query , give me all the feeds in any given
>>>> category.
>>>>
>>>> How can I best model this in solr?
>>>>
>>>> Seems like multiValued field might help, but how would I populate
>>>> it, and
>>>> would the query above work?.
>>>>
>>>> thanks
>>>> Joel
>>>>
>>>>

Reply | Threaded
Open this post in threaded view
|

Re: best way to model 1-N

Chantal Ackermann
This looks all right to me, but I might be missing something.
Which version/build of SOLR are you using?

Chantal

Joel Nylund schrieb:

> Thanks Chantal, I will keep that in mind for tuning,
>
> for sql I figured  way to combine them into one row using concat, but
> I still seem to be having an issue splitting them:
>
> Db now returns as one column categoryType:
> TOPIC,LANGUAGE
>
> but my solr result, if you note the item in categoryType  all seem to
> be within one str, I would expect it to be in multiple strings within
> the array, is this assumption wrong?
>
> <doc>
> −
> <arr name="categoryType">
> <str>TOPIC,LANGUAGE</str>
> </arr>
> <str name="id">40</str>
> <str name="title">feed title</str>
> </doc>
>
>
> Here is my import:
>    <document name="doc">
>          <entity name="item"
>         query="SELECT f.id, f.title
>                 FROM Feed f
>             <field column="id" name="id" />
>              <field column="title" name="title" />
>                         <entity name="category" query="select cfcr.feedId,
> group_concat(cfcr.categoryType) as categoryType
>                                                 from CFR cfcr
>                                                 where
>                                                 cfcr.feedId = '${item.id}' AND
>                                                 group by cfcr.feedId">
>                                         <field column="categoryType" name="categoryType"
> splityBy="," />
>                     </entity>
>
>       </entity>
>
> In schema:
>         <field name="categoryType" type="text" indexed="true" stored="true"
> required="false" multiValued="true"/>
>         <field name="categoryName" type="text" indexed="true" stored="true"
> required="false" multiValued="true"/>
>
>
> what am I missing?
>
> thanks
> Joel
>
>
> On Oct 30, 2009, at 10:00 AM, Chantal Ackermann wrote:
>
>> That depends a bit on your database, but it is tricky and might not
>> be performant.
>>
>> If you are more of a Java developer, you might prefer retrieving
>> mutliple rows per SOLR document from your dataSource (join on your
>> category and main table), and aggregate them in your custom
>> EntityProcessor. I got a far(!) better performance retrieving
>> everything in one query and doing the aggregation in Java. But this
>> is, of course, depending on your table structure and data.
>>
>> Noble Paul helped me with the custom EntityProcessor, and it turned
>> out quite easy. Have a look at the thread with the heading from this
>> mailing list (SOLR-USER):
>> DataImportHandler / Import from DB : one data set comes in multiple
>> rows
>>
>> Cheers,
>> Chantal
>>
>>
>> Joel Nylund schrieb:
>>> thanks, but im confused how I can aggregate across rows, I dont know
>>> of any easy way to get my db to return one row for all the categories
>>> (given the hint from your other email), I have split the category
>>> query into a separate entity, but its returning multiple rows, how do
>>> I combine multiple rows into 1 index entity?
>>> thanks
>>> Joel
>>> On Oct 29, 2009, at 8:58 PM, Avlesh Singh wrote:
>>>>> In the database this is modeled a a 1-N where category table has
>>>>> the
>>>>> mapping of feed to category
>>>>> I need to be able to query , give me all the feeds in any given
>>>>> category.
>>>>> How can I best model this in solr?
>>>>> Seems like multiValued field might help, but how would I populate
>>>>> it, and
>>>>> would the query above work?.
>>>>>
>>>> Yes you are right. A multivalued field for "categories" is the
>>>> answer.
>>>>
>>>> For populating in the index -
>>>>
>>>>  1. If you use DIH to populate your indexes and your datasource is a
>>>>  database then you can use DIH's RegexTransformer on an aggregated
>>>> list of
>>>>  categories. e.g. if your database query retruns "a,b,c,d" in a
>>>> column called
>>>>  "db_categories", this is how you would put it in DIH's data-config
>>>> file -
>>>>  <field column="db_categories" name="categories" splityBy="," />.
>>>>  2. If you "add" documents to Solr yourself  multiple values for
>>>> the field
>>>>  can be specified as an array or list of values in the
>>>> SolrInputDocument.
>>>>
>>>> A multivalued field provides the same faceting and searching
>>>> capabilites
>>>> like regular fields. There is no special syntax.
>>>>
>>>> Cheers
>>>> Avlesh
>>>>
>>>> On Fri, Oct 30, 2009 at 4:55 AM, Joel Nylund <[hidden email]>
>>>> wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> I have one index so far which contains feeds.  I have been able to
>>>>> de-normalize several tables and map this data onto the feed entity.
>>>>> There is
>>>>> one tricky problem that I need help on.
>>>>>
>>>>> Feeds have 1 - many categories.
>>>>>
>>>>> So Lets say we have Category1, Category2 and Category3
>>>>>
>>>>> Feed 1 - is in Category 1
>>>>> Feed 2 is in category2 and category3
>>>>> Feed 3 is in category2
>>>>> Feed 4 has no category
>>>>>
>>>>> In the database this is modeled a a 1-N where category table has
>>>>> the
>>>>> mapping of feed to category
>>>>>
>>>>> I need to be able to query , give me all the feeds in any given
>>>>> category.
>>>>>
>>>>> How can I best model this in solr?
>>>>>
>>>>> Seems like multiValued field might help, but how would I populate
>>>>> it, and
>>>>> would the query above work?.
>>>>>
>>>>> thanks
>>>>> Joel
>>>>>
>>>>>
>
Reply | Threaded
Open this post in threaded view
|

Re: best way to model 1-N

Joel Nylund
Im using apache-solr-1.3.0

I got it to work using javascript function instead.

thanks
Joel

On Oct 30, 2009, at 12:44 PM, Chantal Ackermann wrote:

> This looks all right to me, but I might be missing something.
> Which version/build of SOLR are you using?
>
> Chantal
>
> Joel Nylund schrieb:
>> Thanks Chantal, I will keep that in mind for tuning,
>> for sql I figured  way to combine them into one row using concat, but
>> I still seem to be having an issue splitting them:
>> Db now returns as one column categoryType:
>> TOPIC,LANGUAGE
>> but my solr result, if you note the item in categoryType  all seem to
>> be within one str, I would expect it to be in multiple strings within
>> the array, is this assumption wrong?
>> <doc>
>> −
>> <arr name="categoryType">
>> <str>TOPIC,LANGUAGE</str>
>> </arr>
>> <str name="id">40</str>
>> <str name="title">feed title</str>
>> </doc>
>> Here is my import:
>>   <document name="doc">
>>         <entity name="item"
>>        query="SELECT f.id, f.title
>>                FROM Feed f
>>            <field column="id" name="id" />
>>             <field column="title" name="title" />
>>                        <entity name="category" query="select  
>> cfcr.feedId,
>> group_concat(cfcr.categoryType) as categoryType
>>                                                from CFR cfcr
>>                                                where
>>                                                cfcr.feedId = '$
>> {item.id}' AND
>>                                                group by cfcr.feedId">
>>                                        <field column="categoryType"  
>> name="categoryType"
>> splityBy="," />
>>                    </entity>
>>      </entity>
>> In schema:
>>        <field name="categoryType" type="text" indexed="true"  
>> stored="true"
>> required="false" multiValued="true"/>
>>        <field name="categoryName" type="text" indexed="true"  
>> stored="true"
>> required="false" multiValued="true"/>
>> what am I missing?
>> thanks
>> Joel
>> On Oct 30, 2009, at 10:00 AM, Chantal Ackermann wrote:
>>> That depends a bit on your database, but it is tricky and might not
>>> be performant.
>>>
>>> If you are more of a Java developer, you might prefer retrieving
>>> mutliple rows per SOLR document from your dataSource (join on your
>>> category and main table), and aggregate them in your custom
>>> EntityProcessor. I got a far(!) better performance retrieving
>>> everything in one query and doing the aggregation in Java. But this
>>> is, of course, depending on your table structure and data.
>>>
>>> Noble Paul helped me with the custom EntityProcessor, and it turned
>>> out quite easy. Have a look at the thread with the heading from this
>>> mailing list (SOLR-USER):
>>> DataImportHandler / Import from DB : one data set comes in multiple
>>> rows
>>>
>>> Cheers,
>>> Chantal
>>>
>>>
>>> Joel Nylund schrieb:
>>>> thanks, but im confused how I can aggregate across rows, I dont  
>>>> know
>>>> of any easy way to get my db to return one row for all the  
>>>> categories
>>>> (given the hint from your other email), I have split the category
>>>> query into a separate entity, but its returning multiple rows,  
>>>> how do
>>>> I combine multiple rows into 1 index entity?
>>>> thanks
>>>> Joel
>>>> On Oct 29, 2009, at 8:58 PM, Avlesh Singh wrote:
>>>>>> In the database this is modeled a a 1-N where category table has
>>>>>> the
>>>>>> mapping of feed to category
>>>>>> I need to be able to query , give me all the feeds in any given
>>>>>> category.
>>>>>> How can I best model this in solr?
>>>>>> Seems like multiValued field might help, but how would I populate
>>>>>> it, and
>>>>>> would the query above work?.
>>>>>>
>>>>> Yes you are right. A multivalued field for "categories" is the
>>>>> answer.
>>>>>
>>>>> For populating in the index -
>>>>>
>>>>> 1. If you use DIH to populate your indexes and your datasource  
>>>>> is a
>>>>> database then you can use DIH's RegexTransformer on an aggregated
>>>>> list of
>>>>> categories. e.g. if your database query retruns "a,b,c,d" in a
>>>>> column called
>>>>> "db_categories", this is how you would put it in DIH's data-config
>>>>> file -
>>>>> <field column="db_categories" name="categories" splityBy="," />.
>>>>> 2. If you "add" documents to Solr yourself  multiple values for
>>>>> the field
>>>>> can be specified as an array or list of values in the
>>>>> SolrInputDocument.
>>>>>
>>>>> A multivalued field provides the same faceting and searching
>>>>> capabilites
>>>>> like regular fields. There is no special syntax.
>>>>>
>>>>> Cheers
>>>>> Avlesh
>>>>>
>>>>> On Fri, Oct 30, 2009 at 4:55 AM, Joel Nylund <[hidden email]>
>>>>> wrote:
>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> I have one index so far which contains feeds.  I have been able  
>>>>>> to
>>>>>> de-normalize several tables and map this data onto the feed  
>>>>>> entity.
>>>>>> There is
>>>>>> one tricky problem that I need help on.
>>>>>>
>>>>>> Feeds have 1 - many categories.
>>>>>>
>>>>>> So Lets say we have Category1, Category2 and Category3
>>>>>>
>>>>>> Feed 1 - is in Category 1
>>>>>> Feed 2 is in category2 and category3
>>>>>> Feed 3 is in category2
>>>>>> Feed 4 has no category
>>>>>>
>>>>>> In the database this is modeled a a 1-N where category table has
>>>>>> the
>>>>>> mapping of feed to category
>>>>>>
>>>>>> I need to be able to query , give me all the feeds in any given
>>>>>> category.
>>>>>>
>>>>>> How can I best model this in solr?
>>>>>>
>>>>>> Seems like multiValued field might help, but how would I populate
>>>>>> it, and
>>>>>> would the query above work?.
>>>>>>
>>>>>> thanks
>>>>>> Joel
>>>>>>
>>>>>>

Reply | Threaded
Open this post in threaded view
|

Re: best way to model 1-N

Avlesh Singh
In reply to this post by Joel Nylund
>
> what am I missing?
>
Change your <entity name="category" query="select cfcr.feedId ..."> to
<entity name="category" *transformer="RegexTransformer"* query="select
cfcr.feedId .."> The "splitBy" directive is understood by this transformer
and in your case the attribute was simply ignored.

Don't forget to re-index once you have changed.

Cheers
Avlesh

On Fri, Oct 30, 2009 at 9:33 PM, Joel Nylund <[hidden email]> wrote:

> Thanks Chantal, I will keep that in mind for tuning,
>
> for sql I figured  way to combine them into one row using concat, but I
> still seem to be having an issue splitting them:
>
> Db now returns as one column categoryType:
> TOPIC,LANGUAGE
>
> but my solr result, if you note the item in categoryType  all seem to be
> within one str, I would expect it to be in multiple strings within the
> array, is this assumption wrong?
>
> <doc>
> -
> <arr name="categoryType">
> <str>TOPIC,LANGUAGE</str>
> </arr>
> <str name="id">40</str>
> <str name="title">feed title</str>
> </doc>
>
>
> Here is my import:
>  <document name="doc">
>        <entity name="item"
>       query="SELECT f.id, f.title
>                FROM Feed f
>            <field column="id" name="id" />
>            <field column="title" name="title" />
>                        <entity name="category" query="select cfcr.feedId,
> group_concat(cfcr.categoryType) as categoryType
>                                                from CFR cfcr
>                                                where
>                                                cfcr.feedId = '${item.id}'
> AND
>                                                group by cfcr.feedId">
>                                        <field column="categoryType"
> name="categoryType" splityBy="," />
>                    </entity>
>
>     </entity>
>
> In schema:
>        <field name="categoryType" type="text" indexed="true" stored="true"
> required="false" multiValued="true"/>
>        <field name="categoryName" type="text" indexed="true" stored="true"
> required="false" multiValued="true"/>
>
>
> what am I missing?
>
> thanks
> Joel
>
>
>
> On Oct 30, 2009, at 10:00 AM, Chantal Ackermann wrote:
>
>  That depends a bit on your database, but it is tricky and might not be
>> performant.
>>
>> If you are more of a Java developer, you might prefer retrieving mutliple
>> rows per SOLR document from your dataSource (join on your category and main
>> table), and aggregate them in your custom EntityProcessor. I got a far(!)
>> better performance retrieving everything in one query and doing the
>> aggregation in Java. But this is, of course, depending on your table
>> structure and data.
>>
>> Noble Paul helped me with the custom EntityProcessor, and it turned out
>> quite easy. Have a look at the thread with the heading from this mailing
>> list (SOLR-USER):
>> DataImportHandler / Import from DB : one data set comes in multiple rows
>>
>> Cheers,
>> Chantal
>>
>>
>> Joel Nylund schrieb:
>>
>>> thanks, but im confused how I can aggregate across rows, I dont know
>>> of any easy way to get my db to return one row for all the categories
>>> (given the hint from your other email), I have split the category
>>> query into a separate entity, but its returning multiple rows, how do
>>> I combine multiple rows into 1 index entity?
>>> thanks
>>> Joel
>>> On Oct 29, 2009, at 8:58 PM, Avlesh Singh wrote:
>>>
>>>> In the database this is modeled a a 1-N where category table has the
>>>>> mapping of feed to category
>>>>> I need to be able to query , give me all the feeds in any given
>>>>> category.
>>>>> How can I best model this in solr?
>>>>> Seems like multiValued field might help, but how would I populate
>>>>> it, and
>>>>> would the query above work?.
>>>>>
>>>>>  Yes you are right. A multivalued field for "categories" is the answer.
>>>>
>>>> For populating in the index -
>>>>
>>>>  1. If you use DIH to populate your indexes and your datasource is a
>>>>  database then you can use DIH's RegexTransformer on an aggregated
>>>> list of
>>>>  categories. e.g. if your database query retruns "a,b,c,d" in a
>>>> column called
>>>>  "db_categories", this is how you would put it in DIH's data-config
>>>> file -
>>>>  <field column="db_categories" name="categories" splityBy="," />.
>>>>  2. If you "add" documents to Solr yourself  multiple values for
>>>> the field
>>>>  can be specified as an array or list of values in the
>>>> SolrInputDocument.
>>>>
>>>> A multivalued field provides the same faceting and searching
>>>> capabilites
>>>> like regular fields. There is no special syntax.
>>>>
>>>> Cheers
>>>> Avlesh
>>>>
>>>> On Fri, Oct 30, 2009 at 4:55 AM, Joel Nylund <[hidden email]>
>>>> wrote:
>>>>
>>>>  Hi,
>>>>>
>>>>> I have one index so far which contains feeds.  I have been able to
>>>>> de-normalize several tables and map this data onto the feed entity.
>>>>> There is
>>>>> one tricky problem that I need help on.
>>>>>
>>>>> Feeds have 1 - many categories.
>>>>>
>>>>> So Lets say we have Category1, Category2 and Category3
>>>>>
>>>>> Feed 1 - is in Category 1
>>>>> Feed 2 is in category2 and category3
>>>>> Feed 3 is in category2
>>>>> Feed 4 has no category
>>>>>
>>>>> In the database this is modeled a a 1-N where category table has the
>>>>> mapping of feed to category
>>>>>
>>>>> I need to be able to query , give me all the feeds in any given
>>>>> category.
>>>>>
>>>>> How can I best model this in solr?
>>>>>
>>>>> Seems like multiValued field might help, but how would I populate
>>>>> it, and
>>>>> would the query above work?.
>>>>>
>>>>> thanks
>>>>> Joel
>>>>>
>>>>>
>>>>>
>