Delta-import with solrj client

Hando420
Greetings. I have a SolrJ client that fetches data from a database using delta-import. When a column changes in the database, the timestamp-based delta-import indexes the latest value, but the index also keeps duplicate entries that still hold the older data for that column. Running the import with a clean index fixes this, but I want to update the index without cleaning it. Is there a way to update only the changed column without ending up with duplicates? I would appreciate any feedback.

Hando

Re: Delta-import with solrj client

kenf_nc
The short answer is no, there isn't a way. Solr has no concept of an 'Update' to a single field of an indexed document. You need to add the full document (all 'columns') each time any one field changes. If doing that in your DataImportHandler logic is difficult, you may need to write a separate update service that does the following:

1) Read the UniqueID and the updated column(s) from the database
2) Using the UniqueID, retrieve the document from Solr
3) Add or update the field(s) with the updated column(s)
4) Add the document back to Solr

That said, if you already use DIH for a full import, reusing the same query in your delta-import to fetch the whole document shouldn't be that difficult.
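
The four steps above can be sketched roughly as follows. This is only an illustration of the read, merge, and re-add logic: a plain HashMap stands in for the Solr index, and the names UpdateServiceSketch, addDocument, and applyChange are hypothetical, not part of SolrJ. In a real client, step 2 would be a query by unique ID and step 4 an add followed by a commit, and step 2 can only recover fields that are stored in the index.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch only: a HashMap stands in for the Solr index (hypothetical, not the SolrJ API).
class UpdateServiceSketch {
    // unique ID -> document, where a document is a map of field name -> value
    static final Map<String, Map<String, Object>> index = new HashMap<>();

    // Step 4: adding a document whose ID already exists replaces the old one.
    static void addDocument(Map<String, Object> doc) {
        index.put(String.valueOf(doc.get("id")), new HashMap<>(doc));
    }

    // Steps 2-4 for one changed row that step 1 read from the database.
    static void applyChange(String uniqueId, Map<String, Object> changedColumns) {
        Map<String, Object> doc = new HashMap<>();
        Map<String, Object> existing = index.get(uniqueId); // step 2: retrieve by ID
        if (existing != null) {
            doc.putAll(existing);
        }
        doc.put("id", uniqueId);
        doc.putAll(changedColumns);                          // step 3: overwrite changed fields
        addDocument(doc);                                    // step 4: add the full document back
    }

    public static void main(String[] args) {
        Map<String, Object> original = new HashMap<>();
        original.put("id", "42");
        original.put("title", "Old title");
        original.put("price", 10);
        addDocument(original);

        Map<String, Object> changed = new HashMap<>();
        changed.put("title", "New title");
        applyChange("42", changed);

        // Still one document for ID 42: the title is new, the price is carried over.
        System.out.println(index.size());
        System.out.println(index.get("42"));
    }
}
```

The unchanged field (price here) survives only because the whole document is rebuilt before it is added back, which is the point of step 3.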

Re: Delta-import with solrj client

Jan Høydahl / Cominvent
Hi,

Make sure you use a proper "ID" field, one that does *not* change even if the content in the database changes. That way, when your delta-import fetches changed rows to index, they will overwrite the existing documents in your index instead of being added alongside them.
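
To illustrate why the ID matters, here is a small sketch that indexes the same database row twice, before and after a change. It assumes, purely for illustration, that the duplicates come from an ID that includes the timestamp; the original post does not say how the ID is built. A LinkedHashMap stands in for the index, and StableIdSketch and indexRows are hypothetical names.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Sketch only: a map keyed by document ID stands in for the Solr index.
class StableIdSketch {
    // Each row is {database primary key, title, last-modified timestamp}.
    static Map<String, String> indexRows(String[][] rows, boolean stableId) {
        Map<String, String> index = new LinkedHashMap<>();
        for (String[] row : rows) {
            // A stable ID depends only on the primary key; the unstable variant
            // (an assumed cause of the duplicates) also includes the timestamp.
            String id = stableId ? row[0] : row[0] + "_" + row[2];
            index.put(id, row[1]); // same ID overwrites, new ID adds a document
        }
        return index;
    }

    public static void main(String[] args) {
        String[][] rows = {
            {"7", "Old title", "2010-08-10"},
            {"7", "New title", "2010-08-11"}, // same row after an update
        };
        System.out.println(indexRows(rows, true).size());  // 1 document, holding the new title
        System.out.println(indexRows(rows, false).size()); // 2 documents, old and new
    }
}
```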

--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com
Training in Europe - www.solrtraining.com

On 11. aug. 2010, at 12.49, Hando420 wrote:

>
> Greetings. I have a solrj client for fetching data from database. I am using
> delta-import for fetching data. If a column is changed in database using
> timestamp with delta-import i get the latest column indexed but there are
> duplicate values in the index similar to the column but the data is older.
> This works with cleaning the index but i want to update the index without
> cleaning it. Is there a way to just update the index with the updated column
> without having duplicate values. Appreciate for any feedback.
>
> Hando