DIH - rdbms to index confusion

classic Classic list List threaded Threaded
4 messages Options
kmf
Reply | Threaded
Open this post in threaded view
|

DIH - rdbms to index confusion

kmf
I'm new to solr (and indexing in general) and am having a hard time making the transition from rdbms to indexing in terms of the DIH/data-config.xml file.  I've successfully created a working index (so far) for the simple queries in my db, but I'm struggling to add a more "complex" query.  When I say simple I mean one or two tables and when I say complex I'm referring to 3 plus.

I have a table that contains the data values I'm wanting to return when someone makes a search.  This table has, in addition to the data values, 3 id's (FKs) pointing to the data/info that I'm wanting the users to be able to search on (while also returning the data values).

The general rdbms query would be something like:
select f.value, g.gar_name, c.cat_name from foo f, gar g, cat c, dub d
where g.id=f.gar_id
and c.id=f.cat_id
and d.id=f.dub_id

I tried following the item_category entity used in the DIH example here: http://wiki.apache.org/solr/DataImportHandler#Full_Import_Example
and am struggling to get it to work.
 
My current attempt looks like (entity translated to the above rdbms query):
<dataConfig>
   <dataSource .... />
   <document>
     <entity ...simple query-working for main entity, cat>
        <field ... />
     
         <entity name="foo" query="SELECT gar_id FROM foo
                                     WHERE cat_id='${cat.id}'">     
        <entity name="gar" query="SELECT name FROM gar                
                                    WHERE id='${f.gar_id}'">                     
          <field column="name" name="g_name" />
        </entity>                                                                                                   
        <entity name="dub" query="SELECT name FROM dub                                
                                         WHERE id='${f.dub_id}'">
          <field column="name" name="dub_name" />
        </entity>       
        <field column="value" name="f_value" />
        </entity>   

....other working entities
</entity>
</document>
</dataConfig>

I'm getting some of the data/info back, but it's not what I am expecting.  I'm hoping for/expecting a document/record to look like:
cat_name 1 : g_name 1 : dub_name 1 : f_value 1
cat_name 1 : g_name 1 : dub_name 2 : f_value 2
cat_name 1 : g_name 2 : dub_name 1 : f_value 1
cat_name 1 : g_name 2 : dub_name 2 : f_value 2
cat_name 2 : g_name 1 : dub_name 1 : f_value 1
cat_name 2 : g_name 1 : dub_name 2 : f_value 2
cat_name 2 : g_name 2 : dub_name 1 : f_value 1
cat_name 2 : g_name 2 : dub_name 2 : f_value 2

(All but the values are showing up in the index in some form)

Any suggestions on where my logic is failing?

Thanks
Reply | Threaded
Open this post in threaded view
|

Re: DIH - rdbms to index confusion

Alexey Serba
> I have a table that contains the data values I'm wanting to return when
> someone makes a search.  This table has, in addition to the data values, 3
> id's (FKs) pointing to the data/info that I'm wanting the users to be able
> to search on (while also returning the data values).
>
> The general rdbms query would be something like:
> select f.value, g.gar_name, c.cat_name from foo f, gar g, cat c, dub d
> where g.id=f.gar_id
> and c.id=f.cat_id
> and d.id=f.dub_id
>
You can put this general rdbms query as is into single DIH entity - no
need to split it.

You would probably want to split it if your main table has one to many
relation with other tables, so you can't retrieve all the data and
have single result set row per Solr document.
kmf
Reply | Threaded
Open this post in threaded view
|

Re: DIH - rdbms to index confusion

kmf
I'm not understanding this response.  My main table does have a one to many relationship with the other tables.  What should I be anticipating/wanting for each document if I want to return to the user the values while allowing them to search on the other terms?  

Thanks.
Reply | Threaded
Open this post in threaded view
|

Re: DIH - rdbms to index confusion

Stefan Matheis
Hi,

have a look at the mysql-query-log - it will tell you what queries are
executed from the solr dih. so you'll see which variables are empty/not set
as expected and therefore maybe missing in the result.

otherwise (for the rest of the list) it would be easier to help you, when
you're using real queries and not pseudo-queries :)

Regards
Stefan