how to delta index linked entities in 3.5.0

AdamLane
The delta-import instructions from https://wiki.apache.org/solr/DataImportHandler#Using_delta-import_command work for me in Solr 1.4 but fail in 3.5.0 with the error "deltaQuery has no column to resolve to declared primary key pk='ITEM_ID, CATEGORY_ID'" (issue: https://issues.apache.org/jira/browse/SOLR-2907).

Can anyone confirm this bug? I am new to Solr, so I hope I am just doing something wrong because I misunderstood the wiki. Is anyone successfully indexing the join of items and multiple item_categories, as in the wiki example, and willing to share or suggest a workaround?

Thanks,
Adam  

Re: how to delta index linked entities in 3.5.0

Shawn Heisey-4
On 2/16/2012 6:31 PM, AdamLane wrote:

> The delta instructions from
> https://wiki.apache.org/solr/DataImportHandler#Using_delta-import_command
> works for me in solr 1.4 but crashes in 3.5.0 (error: "deltaQuery has no
> column to resolve to declared primary key pk='ITEM_ID, CATEGORY_ID'"  issue:
> https://issues.apache.org/jira/browse/SOLR-2907)
>
> Is there anyone out there that can confirm my bug?  Because I am new to solr
> and hopefully I am just doing something wrong based on a misunderstanding of
> the wiki.  Anyone successfully indexing the join of items and multiple
> item_categories just like the wiki example that would be willing to share
> their workaround or suggest a workaround?

I ran into something like this, possibly even this exact problem.

Things have been tightened up in 3.x.  All query results now need to
have a field corresponding to what you've defined as pk, or it's
considered an error.  I was not using the results from my deltaQuery,
but I still had to adjust it so that it returned a field with the same
name as my primary key.  You have defined more than one field for your
pk, so I don't really know exactly what you'll have to do - perhaps you
need to have both ITEM_ID and CATEGORY_ID fields in your query results.
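
As a rough sketch of the kind of adjustment I mean, here is a single-column
case. The table, the column names other than ITEM_ID, and the Solr field
names are made up for illustration and are not taken from Adam's config.
The point is only that deltaQuery returns a column whose name matches pk:

```xml
<!-- Hypothetical entity: the "item" table, the NAME and last_modified
     columns, and the Solr field names are assumptions for illustration. -->
<entity name="item" pk="ITEM_ID"
        query="SELECT ITEM_ID, NAME FROM item"
        deltaQuery="SELECT ITEM_ID FROM item
                    WHERE last_modified > '${dataimporter.last_index_time}'"
        deltaImportQuery="SELECT ITEM_ID, NAME FROM item
                          WHERE ITEM_ID = '${dataimporter.delta.ITEM_ID}'">
  <!-- deltaQuery above returns ITEM_ID, the same name declared in pk -->
  <field column="ITEM_ID" name="id" />
  <field column="NAME" name="name" />
</entity>
```

I have not tried this with a two-column pk, so treat it as a starting point
only.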

Thanks,
Shawn


Re: how to delta index linked entities in 3.5.0

AdamLane
Thanks for your thoughts, Shawn.  I did notice that 3.x tightened things up a lot, and I accounted for it by making sure I had pk defined and the columns explicitly aliased with the same name (I will make sure the bug text reflects that).

To help others who have the same problem: I just found a thread describing a workaround that uses GROUP_CONCAT() in MySQL and then a transformer in Solr.  So far this appears to work, and the delta import also seems to run around 10x faster.  The only disadvantage is that the delta-import process doesn't tell you how many rows have changed.  It just reports 1 row, because you are hacking deltaQuery to return a single dummy row and making deltaImportQuery take last_index_time and return all rows that have changed.
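
A minimal sketch of how that hack can look in data-config.xml. This is not my exact config: the item and item_category tables, the last_modified columns, and the Solr field names are assumptions for illustration. deltaQuery returns one dummy row, so deltaImportQuery runs once and selects everything changed since last_index_time:

```xml
<!-- Hypothetical schema: item(ITEM_ID, NAME, last_modified) and
     item_category(ITEM_ID, CATEGORY_ID, last_modified) are assumed. -->
<entity name="item" pk="ITEM_ID"
        transformer="RegexTransformer"
        query="SELECT i.ITEM_ID, i.NAME,
                      GROUP_CONCAT(ic.CATEGORY_ID SEPARATOR '|') AS CATEGORY_IDS
               FROM item i
               LEFT JOIN item_category ic ON ic.ITEM_ID = i.ITEM_ID
               GROUP BY i.ITEM_ID"
        deltaQuery="SELECT 1 AS ITEM_ID"
        deltaImportQuery="SELECT i.ITEM_ID, i.NAME,
                                 GROUP_CONCAT(ic.CATEGORY_ID SEPARATOR '|') AS CATEGORY_IDS
                          FROM item i
                          LEFT JOIN item_category ic ON ic.ITEM_ID = i.ITEM_ID
                          WHERE i.last_modified > '${dataimporter.last_index_time}'
                             OR i.ITEM_ID IN (SELECT ITEM_ID FROM item_category
                                              WHERE last_modified > '${dataimporter.last_index_time}')
                          GROUP BY i.ITEM_ID">
  <field column="ITEM_ID" name="id" />
  <field column="NAME" name="name" />
  <!-- splitBy turns the '|'-joined string into a multi-valued field -->
  <field column="CATEGORY_IDS" name="category_id" splitBy="\|" />
</entity>
```

The dummy row only satisfies the pk check; its value is never used, because deltaImportQuery does not reference ${dataimporter.delta.ITEM_ID}.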

Quote:

The following (MySQL) query concatenates 3 lang_code fields from the main
table into one field and multiple emails from a secondary table into
another field:
SELECT u.id,
       u.name,
       IF((u.lang_code1 IS NULL AND u.lang_code2 IS NULL AND u.lang_code3 IS NULL),
          NULL,
          CONVERT(CONCAT_WS('|', u.lang_code1, u.lang_code2, u.lang_code3) USING ascii)) AS multiple_lang_codes,
       GROUP_CONCAT(e.email SEPARATOR '|') AS multiple_emails
FROM users_tb u
LEFT JOIN emails_tb e ON u.id = e.id
GROUP BY u.id

The entity in data-config.xml looks something like:
<entity name="my_entity"
        query="call get_solr_full();"
        transformer="RegexTransformer">
        <field name="email" column="multiple_emails" splitBy="\|" />
        <field name="lang_code" column="multiple_lang_codes" splitBy="\|" />
</entity>

Full Thread:
https://mail-archives.apache.org/mod_mbox/lucene-solr-user/201008.mbox/%3C9F8B39CB3B7C6D4594293EA29CCF438B01702F22@ICQ-MAIL.icq.il.office.aol.com%3E

So until the bug is fixed or docs are changed I hope this helps someone else searching for this same error message.

Adam

Re: how to delta index linked entities in 3.5.0

Em
Hi Adam,

I made a quick review of the DIH code from your exception, and it seems
that resolving a multi-column PK is not possible at the moment.

Maybe I am wrong, but can anyone confirm having problems with DIH
when using a multi-column PK?

Kind regards,
Em

On 18.02.2012 02:52, AdamLane wrote:

> Thanks for your thoughts Shawn.  I did notice 3.x tightened up alot and I did
> account for it by making sure I had pk defined and columns explicitly
> aliased with the same name (and I will make sure the bug text reflects
> that).
>
> To help others that are having the same problem, I just found a thread
> describing a workaround using group_concat() in mysql and then transformer
> on solr.  So far this appears to work and also seems to delta around 10x
> faster.  The only disadvantage is that the delta index process doesn't tell
> you how many rows have changed.  It just says 1 row because you are hacking
> deltaQuery to return a single dummy row and making deltaImportQuery take in
> last_index_time and return all rows that have changed.
>
> Quote:
>
> The following (MySql) query concatenates 3 lang_code fields from the main
> table into one field and multiple emails from a secondary table into
> another field:
> SELECT u.id,
>        u.name,
>        IF((u.lang_code1 IS NULL AND u.lang_code2 IS NULL AND
> u.lang_code3 IS NULL), NULL,
>            CONVERT(CONCAT_WS('|', u.lang_code1, u.lang_code2,
> u.lang_code3) USING ascii)) AS multi_lang_codes,
>        GROUP_CONCAT(e.email SEPARATOR '|') AS multiple_emails
> FROM users_tb u
> LEFT JOIN emails_tb e ON u.id = e.id
> GROUP BY u.id
>
> The entity in data-config.xml looks something like:
> <entity name="my_entity"
>         query="call get_solr_full();"
>         transformer="RegexTransformer">
> <field name="email" column="multiple_emails" splitBy="\|" />
> <field name="lang_code" column="multiple_lang_codes"
> splitBy="\|" />
> </entity>
>
> Full Thread:
> https://mail-archives.apache.org/mod_mbox/lucene-solr-user/201008.mbox/%3C9F8B39CB3B7C6D4594293EA29CCF438B01702F22@...%3E
>
> So until the bug is fixed or docs are changed I hope this helps someone else
> searching for this same error message.
>
> Adam