Does DIH queues up requests

Nkeet Shah
Hi,
I have a multi-threaded application that makes DIH requests to perform indexing. What I could not gather from the documentation is whether DIH requests are queued up.

In essence, if I made a request to, say, DIH1, and it has accepted the request and is working on the indexing, what would happen if another request is made to the same DIH1? Will it be queued or rejected?

Thanks
Ankit!

Re: Does DIH queues up requests

Billnbell
What we do is:

Run a URL to delete *:*, but do not commit.

1. Kick off indexing on DIH1, clean=false, commit=false.
2. Kick off indexing on DIH2, clean=false, commit=false.

Then we manually commit.
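
The sequence above could be scripted roughly as follows. This is only a sketch: the Solr base URL, the core name `mycore`, and the handler paths `/dih1` and `/dih2` are placeholders rather than names taken from this thread. The function only builds the requests in order; actually sending them (for example with `urllib.request`) and waiting for both imports to finish before the final commit is left out.

```python
import json
from urllib.parse import urlencode

# Placeholder values: adjust to your own Solr host, core, and DIH handler paths.
SOLR_CORE_URL = "http://localhost:8983/solr/mycore"
DIH_HANDLERS = ["/dih1", "/dih2"]


def build_reindex_plan(core_url=SOLR_CORE_URL, handlers=DIH_HANDLERS):
    """Return the ordered (method, url, body) steps of the delete/import/commit flow."""
    plan = []
    # Step 1: delete everything, but do not commit yet.
    # The body is a JSON update command and should be sent as application/json.
    delete_body = json.dumps({"delete": {"query": "*:*"}})
    plan.append(("POST", f"{core_url}/update?{urlencode({'commit': 'false'})}", delete_body))
    # Step 2: start a full import on each handler, without cleaning or committing.
    for handler in handlers:
        params = urlencode({"command": "full-import", "clean": "false", "commit": "false"})
        plan.append(("GET", f"{core_url}{handler}?{params}", None))
    # Step 3: a single explicit commit, to be sent only after all imports are done.
    plan.append(("GET", f"{core_url}/update?{urlencode({'commit': 'true'})}", None))
    return plan
```

Keeping clean=false on both imports matters here: otherwise the second import would wipe what the first one has already added.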

--
Bill Bell
[hidden email]
cell 720-256-8076

RE: Does DIH queues up requests

Davis, Daniel (NIH/NLM) [C]
DIH is not multi-threaded, and so the idea of "queueing" up requests is a misnomer. You might be better off using something other than DataImportHandler.
Logstash can pull what it calls "events" from a database and then push them into Solr, and it gives you some of the same row transformation capabilities that DataImportHandler has.

This is also the bread and butter of ETL tools such as Kettle/Talend/MuleSoft/etc.

That said, what I have done in the past is to take different streams of data and divide them into different requestHandlers, all using DataImportHandler.
Each of these request handlers has its own context as to whether it is busy or not, and so each can be separately active/inactive.

  <!-- Data Import Handler for Health Topics -->
  <requestHandler name="/import/health-topics" class="solr.DataImportHandler">
    <lst name="defaults">
      <str name="config">health-topics-conf.xml</str>
    </lst>
  </requestHandler>

  <!-- Data Import Handler for Drugs and Supplements -->
  <requestHandler name="/import/drugs" class="solr.DataImportHandler">
    <lst name="defaults">
      <str name="config">drugs-conf.xml</str>
    </lst>
  </requestHandler>
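
Since each handler keeps its own busy/idle state, a client could check a handler before sending it another import. The sketch below assumes that the handler's `command=status` response, requested as JSON, carries a top-level `status` field of either `idle` or `busy`; that is worth confirming against your Solr version. The host and core name in the usage line are placeholders.

```python
import json
from urllib.parse import urlencode


def status_url(core_url, handler):
    """Build the status-check URL for one DIH request handler."""
    return f"{core_url}{handler}?{urlencode({'command': 'status', 'wt': 'json'})}"


def is_busy(status_response_text):
    """Return True if the DIH status response reports a running import."""
    payload = json.loads(status_response_text)
    # Assumption: the response carries a top-level "status" of "idle" or "busy".
    return payload.get("status") == "busy"
```

For example, `status_url("http://localhost:8983/solr/mycore", "/import/drugs")` gives the URL to fetch, and `is_busy()` can be applied to the response body before deciding whether to start the next import on that handler.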


Both of the above are XML imports, but with database imports, I also once implemented a sort of multithreading by having 4 request handlers and 4 data-config files, each taking its own slice of data:

data-config-0.xml:
    ...
    <entity name="medsite" dataSource="proddb" rootEntity="true"
            query="SELECT * FROM (SELECT t.*, Mod(RowNum, 4) threadid FROM my_data_view t) WHERE threadid = 0"
            transformer="TemplateTransformer,LogTransformer"
            logTemplate="topic thread 0">
    ...

data-config-1.xml:
    ...
    <entity name="medsite" dataSource="proddb" rootEntity="true"
            query="SELECT * FROM (SELECT t.*, Mod(RowNum, 4) threadid FROM my_data_view t) WHERE threadid = 1"
            transformer="TemplateTransformer,LogTransformer"
            logTemplate="topic thread 1" logLevel="debug">
    ...

And so on...
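
The four queries differ only in the slice number, so they could be generated rather than written by hand. A minimal sketch, assuming the same Oracle-style `Mod(RowNum, n)` pattern as above; the function names are made up for illustration, and `slice_of` just mirrors the modulo arithmetic on the Python side.

```python
def slice_queries(view, slices):
    """Return one modulo-sliced query per data-config file, following the pattern above."""
    template = (
        "SELECT * FROM (SELECT t.*, Mod(RowNum, {n}) threadid "
        "FROM {view} t) WHERE threadid = {i}"
    )
    return [template.format(n=slices, view=view, i=i) for i in range(slices)]


def slice_of(row_number, slices):
    """Mirror Mod(RowNum, slices): the slice a given row number falls into."""
    return row_number % slices
```

Calling `slice_queries("my_data_view", 4)` yields the query strings for data-config-0.xml through data-config-3.xml. Since RowNum is assigned in whatever order the rows come back, this approach assumes all four queries see the rows in the same order; otherwise a row could land in two slices or in none.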

Re: Does DIH queues up requests

Billnbell
However, you can create multiple DIH configs under a core/collection. You
can run each of them in parallel and commit at the end.

SELECT *
 FROM existingtable
 WHERE column >= 1 AND column <= 2000;
SELECT *
 FROM existingtable
 WHERE column >= 2001 AND column <= 4000;


Something like that works for us to speed it up.
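
If the key column is numeric, the ranges for each DIH config could be computed instead of hard-coded. This is a rough sketch under that assumption; `id_ranges` and `range_queries` are hypothetical helpers, and the table and column names are whatever your own schema uses.

```python
def id_ranges(first_id, last_id, size):
    """Split [first_id, last_id] into consecutive inclusive ranges of at most `size` ids."""
    if size <= 0:
        raise ValueError("size must be positive")
    ranges = []
    start = first_id
    while start <= last_id:
        end = min(start + size - 1, last_id)
        ranges.append((start, end))
        start = end + 1
    return ranges


def range_queries(table, column, ranges):
    """Build one SELECT per range, in the same shape as the queries above."""
    return [
        f"SELECT * FROM {table} WHERE {column} >= {lo} AND {column} <= {hi}"
        for lo, hi in ranges
    ]
```

With `id_ranges(1, 4000, 2000)` this reproduces the two ranges shown above, one per DIH config. Unlike the modulo approach, the slices stay stable between runs, but they will be uneven if the ids are sparse.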
