[solr-solrcloud] How does DIH work when there are multiple nodes?


[solr-solrcloud] How does DIH work when there are multiple nodes?

유정인
Hi,

I have SolrCloud configured on 3 nodes.

I use DIH for collecting and indexing, and every node has the same DIH configuration. The import runs at a fixed interval.

Here is my question:

Does the import run on all 3 nodes simultaneously, or only on the leader?

And how can I tell which node is the leader?

I am wondering how DIH works in a SolrCloud configuration.


Re: [solr-solrcloud] How does DIH work when there are multiple nodes?

Doss
Hi,

I am assuming you have the same index replicated on all 3 nodes. If so, running
a full or delta import with DIH on one node will replicate the data to the
other nodes, so there is no need to run it on all 3 nodes. Hope this helps!

Best,
Doss.



--
Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html

RE: [solr-solrcloud] How does DIH work when there are multiple nodes?

유정인
Hi,

Do you mean that I should call one node directly?

Or are you saying that the import runs automatically on one of the three nodes?

If so, I would like to know how the import is performed automatically on one of the three nodes.

-----Original Message-----
From: Doss <[hidden email]>
Sent: Friday, January 04, 2019 3:38 PM
To: [hidden email]
Subject: Re: [solr-solrcloud] How does DIH work when there are multiple nodes?

RE: [solr-solrcloud] How does DIH work when there are multiple nodes?

Doss
Hi,

The data import process does not happen automatically; you have to trigger it
manually, either through the admin interface or by calling the URL:

https://lucene.apache.org/solr/guide/7_5/uploading-structured-data-store-data-with-the-data-import-handler.html

Full Import:

http://node1ip:8983/solr/yourindexname/dataimport?command=full-import&commit=true

Delta Import:

http://node1ip:8983/solr/yourindexname/dataimport?command=delta-import&commit=true


If you want the delta import to run automatically, you can set up a cron job
(Linux) that calls the URL periodically.
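For the cron approach, here is a minimal Python sketch of a script cron could run. It only rebuilds the delta-import URL shown above and sends one GET request; `node1ip` and `yourindexname` are the same placeholders as in the URLs, and the script path and schedule in the note below are hypothetical.

```python
"""Minimal cron-friendly trigger for a DIH delta import (a sketch).

Assumption: base URL and core/collection name are passed on the command
line; node1ip and yourindexname are placeholders, not real hosts.
"""
import sys
from urllib.parse import urlencode
from urllib.request import urlopen


def dih_url(base: str, core: str, command: str, commit: bool = True) -> str:
    """Build e.g. http://host:8983/solr/core/dataimport?command=delta-import&commit=true."""
    params = {"command": command, "commit": "true" if commit else "false"}
    return f"{base.rstrip('/')}/solr/{core}/dataimport?{urlencode(params)}"


def trigger(url: str, timeout: float = 30.0) -> int:
    """Send the GET request and return the HTTP status code."""
    with urlopen(url, timeout=timeout) as resp:
        return resp.status


# Only fires when given exactly two arguments, e.g.:
#   python3 dih_delta.py http://node1ip:8983 yourindexname
if __name__ == "__main__" and len(sys.argv) == 3:
    print(trigger(dih_url(sys.argv[1], sys.argv[2], "delta-import")))
```

A crontab entry such as `*/15 * * * * python3 /path/to/dih_delta.py http://node1ip:8983 yourindexname` would then request a delta import every 15 minutes.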

Best,
Doss.


RE: [solr-solrcloud] How does DIH work when there are multiple nodes?

유정인
Hi,

I was looking for a way to run DIH automatically.

The reason is a high-availability (HA) configuration.

Thank you for your answer.

If you know how, please reply.
-----Original Message-----
From: Doss <[hidden email]>
Sent: Friday, January 04, 2019 3:59 PM
To: [hidden email]
Subject: RE: [solr-solrcloud] How does DIH work when there are multiple nodes?

Re: [solr-solrcloud] How does DIH work when there are multiple nodes?

Shawn Heisey
On 1/4/2019 1:04 AM, 유정인 wrote:
> The reader was looking for a way to do 'DIH' automatically.
>
> The reason was for HA configuration.

If you send a DIH request to the collection (as opposed to a specific
core), that request will be load balanced across the cloud.  You won't
know which replica/core actually handles it. This means that an import
command may be handled by a different host than a status command.  In
that situation, the status command will not know about the import,
because it will be running on a different Solr core.
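A minimal Python sketch of that point, under stated assumptions: both the import command and every status check go to one specific core URL rather than to the collection. It assumes the handler answers `wt=json` requests with a top-level `status` of `busy` or `idle`, and it treats the first `idle` as completion, which is a simplification (a real script might also inspect `statusMessages`). The core name in the docstring is hypothetical, and `fetch` is injectable so the logic can be exercised without a live cluster.

```python
"""Sketch: send the DIH import and every status check to the SAME core.

Assumptions: core_url names one specific core, e.g. a hypothetical
http://node1ip:8983/solr/yourindexname_shard1_replica_n1, and the handler
returns JSON with a top-level "status" of "busy" or "idle".
"""
import json
import time
from typing import Callable
from urllib.request import urlopen


def http_get_json(url: str) -> dict:
    """Default fetcher: GET the URL and decode the JSON body."""
    with urlopen(url, timeout=30) as resp:
        return json.load(resp)


def run_import_and_wait(core_url: str,
                        command: str = "delta-import",
                        fetch: Callable[[str], dict] = http_get_json,
                        poll_seconds: float = 5.0,
                        max_polls: int = 720) -> dict:
    base = core_url.rstrip("/") + "/dataimport"
    # Start the import on this one core only.
    fetch(f"{base}?command={command}&commit=true&wt=json")
    # Poll the SAME core; a collection-level status request could be load
    # balanced to a replica that knows nothing about this import.
    for _ in range(max_polls):
        status = fetch(f"{base}?command=status&wt=json")
        if status.get("status") == "idle":
            return status
        time.sleep(poll_seconds)
    raise TimeoutError(f"import on {core_url} still busy after {max_polls} polls")
```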

When doing DIH on SolrCloud, you should send your requests directly to a
specific core on a specific node.  It's the only way to be sure what's
happening.  High availability would have to be handled in your application.
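One way an application could handle that, sketched in Python under assumptions: read the Collections API `CLUSTERSTATUS` response (for example from `http://node1ip:8983/solr/admin/collections?action=CLUSTERSTATUS&collection=yourindexname`, which is also one way to see which replica is the shard leader), build a list of specific core URLs with leaders first, and try them in order until one accepts the import. The JSON field names below reflect my understanding of that response and should be checked against your Solr version; `trigger` is a hypothetical callable.

```python
"""Sketch of application-side failover for DIH on SolrCloud.

Assumptions: cluster_status has the CLUSTERSTATUS shape (cluster ->
collections -> shards -> replicas, each replica with "base_url", "core",
"state", and "leader": "true" on the shard leader). `trigger` is any
callable that starts the import on one core URL and raises OSError if
that node is unreachable.
"""
from typing import Callable, List


def candidate_cores(cluster_status: dict, collection: str) -> List[str]:
    """Return active core URLs for a collection, shard leaders first."""
    shards = cluster_status["cluster"]["collections"][collection]["shards"]
    leaders, others = [], []
    for shard in shards.values():
        for replica in shard["replicas"].values():
            if replica.get("state") != "active":
                continue  # skip down or recovering replicas
            url = f'{replica["base_url"].rstrip("/")}/{replica["core"]}'
            (leaders if replica.get("leader") == "true" else others).append(url)
    return leaders + others


def trigger_with_failover(cores: List[str], trigger: Callable[[str], None]) -> str:
    """Start the import on the first reachable core and return its URL."""
    errors = []
    for core_url in cores:
        try:
            trigger(core_url)
            return core_url  # later status checks should target this core
        except OSError as exc:  # urllib's URLError is an OSError subclass
            errors.append(f"{core_url}: {exc}")
    raise RuntimeError("no core accepted the import: " + "; ".join(errors))
```

Whatever URL `trigger_with_failover` returns is the one to poll for status afterwards, for the reason given above.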

Thanks,
Shawn


RE: [solr-solrcloud] How does DIH work when there are multiple nodes?

Davis, Daniel (NIH/NLM) [C]
DIH is also not designed to multi-thread very well. One way I've handled this is to have a DIH XML config that breaks up a database query into multiple processes by taking the modulo of the row number, as follows:

    <entity name="medsite" dataSource="oltp01_prod"
            rootEntity="true"
            query="SELECT * FROM (SELECT t.*, mod(RowNum, 4) threadid FROM your_table t) WHERE threadid = 0"
            transformer="TemplateTransformer,LogTransformer"
            logTemplate="topic thread 0" logLevel="debug">
        <!-- sub-entities (sub-queries) for this partition go here -->
    </entity>
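The modulo scheme can be generated rather than hand-written. This Python sketch assumes the same Oracle-style SQL as above, a hypothetical table name, and hypothetical entity names `medsite_0` to `medsite_3`, and emits one `<entity>` per thread id. One caveat worth checking: ROWNUM is assigned each time a query runs, so separate queries split the table cleanly only if the inner row order is stable; taking the modulo of a numeric key column avoids depending on that.

```python
"""Sketch: generate one DIH <entity> per partition of the modulo scheme.

Assumptions: Oracle-style SQL (RowNum, mod) as in the example above,
a hypothetical table name, and hypothetical entity names medsite_0..N-1.
"""
import xml.etree.ElementTree as ET


def partition_query(table: str, threads: int, thread_id: int) -> str:
    """SQL selecting only the rows whose row number mod `threads` == thread_id."""
    return (f"SELECT * FROM (SELECT t.*, mod(RowNum, {threads}) threadid "
            f"FROM {table} t) WHERE threadid = {thread_id}")


def partition_entities(table: str, threads: int) -> list:
    """Build one <entity> element per partition, mirroring the example's attributes."""
    entities = []
    for i in range(threads):
        entities.append(ET.Element("entity", {
            "name": f"medsite_{i}",
            "dataSource": "oltp01_prod",
            "rootEntity": "true",
            "query": partition_query(table, threads, i),
            "transformer": "TemplateTransformer,LogTransformer",
            "logTemplate": f"topic thread {i}",
            "logLevel": "debug",
        }))
    return entities


for entity in partition_entities("your_table", 4):
    print(ET.tostring(entity, encoding="unicode"))
```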

This lets me run sub-queries within the entity. Still, it is often better to just write a small program that pulls the data from the database; ETL tools such as Pentaho DI (Kettle) and Talend DI do this quite well.
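A rough idea of such a small program, in Python: it reads rows in batches and posts each batch as a JSON array of documents to a collection's `/update` handler. `sqlite3` stands in for the real database driver, and the SQL, column names, and URL are hypothetical.

```python
"""Sketch of a small DB-to-Solr loader, as an alternative to DIH.

Assumptions: sqlite3 stands in for the real database driver, table and
column names are hypothetical, and documents are sent as a JSON array to
the collection's /update handler.
"""
import json
import sqlite3
from typing import Iterator, List
from urllib.request import Request, urlopen


def fetch_docs(conn: sqlite3.Connection, sql: str,
               batch_size: int = 500) -> Iterator[List[dict]]:
    """Yield lists of {column: value} dicts, batch_size rows at a time."""
    cur = conn.execute(sql)
    names = [col[0] for col in cur.description]
    while True:
        rows = cur.fetchmany(batch_size)
        if not rows:
            return
        yield [dict(zip(names, row)) for row in rows]


def post_batch(update_url: str, docs: List[dict]) -> int:
    """POST one batch as JSON, e.g. to http://node1ip:8983/solr/yourindexname/update."""
    req = Request(update_url, data=json.dumps(docs).encode("utf-8"),
                  headers={"Content-Type": "application/json"}, method="POST")
    with urlopen(req, timeout=60) as resp:
        return resp.status
```

Committing could be left to the collection's autoCommit settings, or the last batch could be sent to an `/update?commit=true` URL.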

If you can express what you want as a database view, even a complicated one, then IMO your best way to get it into Solr is Logstash with the jdbc input plugin. It can do some transformation, but the database view will need to do most of the data processing.

> -----Original Message-----
> From: Shawn Heisey <[hidden email]>
> Sent: Friday, January 4, 2019 12:25 PM
> To: [hidden email]
> Subject: Re: [solr-solrcloud] How does DIH work when there are multiple nodes?