Dataimporter status

Mahmoud Almokadem
We're facing an issue with the dataimporter status in the new Admin UI
(7.0.1).

Calling the API
http://solrip/solr/collection/dataimport?_=1512314812090&command=status&indent=on&wt=json

returns a different status from one call to the next, even though the
importer is running. The response alternates between the following two
when refreshing the page:
{
  "responseHeader":{
    "status":0,
    "QTime":0},
  "initArgs":[
    "defaults",[
      "config","data-config-online-live-pervoice.xml"]],
  "command":"status",
  "status":"idle",
  "importResponse":"",
  "statusMessages":{}}

===============================
{
  "responseHeader":{
    "status":0,
    "QTime":0},
  "initArgs":[
    "defaults",[
      "config","data-config-online-live-pervoice.xml"]],
  "command":"status",
  "status":"idle",
  "importResponse":"",
  "statusMessages":{
    "Total Requests made to DataSource":"2",
    "Total Rows Fetched":"715",
    "Total Documents Processed":"679",
    "Total Documents Skipped":"0",
    "Full Dump Started":"2017-12-03 18:22:31",
    "":"Indexing completed. Added/Updated: 679 documents. Deleted 0 documents.",
    "Committed":"2017-12-03 18:22:32",
    "Total Documents Failed":"36",
    "Time taken":"0:0:54.638",
    "Full Import failed":"2017-12-03 18:22:32"}}

================================
The old Admin UI worked correctly.

Is this a bug in the new Admin UI?

Thanks,
Mahmoud

Re: Dataimporter status

Shawn Heisey-2
On 12/3/2017 9:27 AM, Mahmoud Almokadem wrote:
> We're facing an issue related to the dataimporter status on new Admin UI
> (7.0.1).
>
> Calling to the API
> http://solrip/solr/collection/dataimport?_=1512314812090&command=status&indent=on&wt=json
>
> returns different status despite the importer is running
> The messages are swapped between the following when refreshing the page:

<snip>

> The old Admin UI was working well.
>
> Is that a bug on the new Admin UI?

What I'm going to say below is based on the idea that you're running
SolrCloud.  If you're not, then this seems extremely odd and should not
be happening.

The first part of your message has a URL that accesses the API directly,
*not* the admin UI, so I'm going to concentrate on that, and not discuss
the admin UI, because the admin UI is not involved when using that kind
of URL.

When requests are sent to a collection name rather than directly to a
core, SolrCloud load balances those requests across the cloud, picking
different replicas and shards so each individual request ends up on a
different core, and possibly on a different server.

This load balancing is a general feature of SolrCloud, and happens even
with the dataimport handler.  You never know which shard/replica is
going to actually get a /dataimport request.  So what is happening here
is that one of the cores in your collection is actually doing a
dataimport, but all the others aren't.  When the status command is load
balanced to the core that did the import, then you see the status with
actual data, and when load balancing sends the request to one of the
other cores, you see the empty status.
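
To make that concrete, here is a toy model of the behaviour, not Solr code. The core names are hypothetical (they only follow the collection_shardN_replicaM pattern), the random pick merely stands in for SolrCloud's real routing, and the status bodies are trimmed versions of the two responses in the original post.

```python
import random

# Hypothetical cores of one collection; only the first one ran an import.
CORE_STATUS = {
    "collection_shard1_replica1": {
        "status": "idle",
        "statusMessages": {"Total Documents Processed": "679"},
    },
    "collection_shard1_replica2": {"status": "idle", "statusMessages": {}},
    "collection_shard2_replica1": {"status": "idle", "statusMessages": {}},
}


def status_via_collection(rng):
    """Model a status request sent to the collection: some core answers."""
    core = rng.choice(sorted(CORE_STATUS))
    return core, CORE_STATUS[core]


def status_via_core(core):
    """Model a status request sent straight to one named core."""
    return CORE_STATUS[core]


if __name__ == "__main__":
    rng = random.Random(0)
    for _ in range(5):
        core, body = status_via_collection(rng)
        print(core, body["statusMessages"] or "(empty status)")
```

Asking the collection repeatedly yields a mix of populated and empty statusMessages, while asking one named core always returns the same answer.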

If you want to reliably see the status of an import on SolrCloud, you're
going to have to choose one of the cores (collection_shardN_replicaM) on
one of the servers in your cloud, and send both the import command and
the status command to that one core, instead of the collection.  You
might even need to add a distrib=false parameter to the request to keep
it from being load balanced, but I am not sure whether that's needed for
/dataimport.
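
As a rough sketch of what "send both commands to that one core" could look like, the helper below only builds the URLs. The base URL is taken from the original post; the core name is a made-up example, the dih_url helper is hypothetical, and distrib=false is optional here because, as said above, it is unclear whether /dataimport needs it.

```python
from urllib.parse import urlencode


def dih_url(base, core, command, distrib_false=False):
    """Build a /dataimport URL that targets one core, not the collection.

    base is e.g. "http://solrip/solr"; core is a full core name such as
    "collection_shard1_replica1" (hypothetical, check your own cluster).
    """
    params = {"command": command, "wt": "json", "indent": "on"}
    if distrib_false:
        # Assumption: may or may not be required for /dataimport.
        params["distrib"] = "false"
    return f"{base.rstrip('/')}/{core}/dataimport?{urlencode(params)}"


if __name__ == "__main__":
    core = "collection_shard1_replica1"  # hypothetical core name
    print(dih_url("http://solrip/solr", core, "full-import"))
    print(dih_url("http://solrip/solr", core, "status", distrib_false=True))
```

The point is simply that the import and the status check share the same core name in the path, so both land on the same core.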

Thanks,
Shawn

Re: Dataimporter status

Mahmoud Almokadem
Thanks Shawn,

I'm already using the admin UI. I took the URL it uses to fetch the
dataimporter status from the browser's network console and tried it outside
the admin UI. The admin UI shows the same behavior: when I press Execute,
the status messages alternate between "not started", "started and
indexing", "completed in 3 seconds", "completed in 10 seconds", and so on.

I understand your point that dataimporter requests are load balanced across
shards. That is why I used the old admin UI for the dataimporter: it gave an
accurate status of what was running, because the import is tied to a core,
not a collection.

I think the dataimporter feature should be moved to the core level instead
of the collection level.

Thanks,
Mahmoud


On Tue, Dec 5, 2017 at 6:57 AM, Shawn Heisey <[hidden email]> wrote:

> <snip>

Re: Dataimporter status

Shawn Heisey-2
On 12/6/2017 1:38 AM, Mahmoud Almokadem wrote:

> I'm already using the admin UI and get URL for fetching the status of
> dataimporter from network console and tried it outside the admin UI. Admin
> UI have the same behavior,  when I pressed on execute the status messages
> are swapped between "not started", "started and indexing", "completed on 3
> seconds", "completed on 10 seconds" something like that.
>
> I understood what you mean that the dataimporter are load balanced between
> shards, that's made me using the old admin UI on using dataimporter to get
> accurate status of what is running now. Because the it's related to core
> not collection.
>
> I think the dataimporter feature must moved to the core level instead of
> collection level.

For production usage, you should be using the API directly, not the
admin UI.
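
A minimal sketch of using the API directly from a script, under a few assumptions: the URL passed in points at a single core's /dataimport handler with command=status, the handler reports "busy" in the status field while an import is running (the responses in this thread only show "idle"), and a "Full Import failed" key in statusMessages means the run failed, as in the second response in the original post. The function names are hypothetical.

```python
import json
import time
from urllib.request import urlopen


def fetch_status(url):
    """GET a DIH status URL and return the decoded JSON body."""
    with urlopen(url, timeout=10) as resp:
        return json.load(resp)


def wait_for_import(url, fetch=fetch_status, interval=5.0,
                    max_polls=120, sleep=time.sleep):
    """Poll one core's status URL until it stops reporting "busy"."""
    for _ in range(max_polls):
        body = fetch(url)
        if body.get("status") != "busy":  # assumed value while running
            return body
        sleep(interval)
    raise TimeoutError(f"import still busy after {max_polls} polls")


def import_failed(body):
    """True if statusMessages carries the "Full Import failed" key."""
    return "Full Import failed" in body.get("statusMessages", {})
```

The fetch and sleep parameters exist only so the loop can be exercised without a running Solr; in normal use the defaults apply and the URL should name a core, not the collection.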

In version 7, the old UI is no longer available.  Moving dataimport back
to the core level in the admin UI is an interesting idea that would make
the problem less likely, though a good fix for SOLR-3666 would be
better.  Any committers want to comment?

Whether it's the admin UI or the API, if you access the DIH handler
through the collection instead of a core, you're going to see this behavior.

Thanks,
Shawn