[jira] [Commented] (SOLR-9961) RestoreCore needs the option to download files in parallel.

Tim Allison (Jira)

    [ https://issues.apache.org/jira/browse/SOLR-9961?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16876525#comment-16876525 ]

Mikhail Khludnev commented on SOLR-9961:

The design would be:
* {{BackupRepositoryFactory}} holds a shared thread pool.
* The thread pool is optionally injected into each {{BackupRepository}} the factory creates.
* Restore (and Backup) operations use a dedicated operation, {{listAll(path, lambda)}} or {{forEach(list/file, lambda)}}.
* Repositories that accepted a thread pool invoke the lambda in its threads.
* The lambda accepts a repository delegate and is expected to operate through it. The delegate reuses HDFS and closes/releases it when done.
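
The design above could be sketched roughly as follows. This is only an illustration under assumptions, not code from the attached patches: {{SketchRepositoryFactory}}, {{SketchRepository}}, {{RepoDelegate}}, the pool size of 4, and the in-memory {{read}} are all hypothetical stand-ins, and the real {{BackupRepository}} API is not used here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.BiConsumer;

/** Hypothetical per-task handle standing in for a repository delegate. */
interface RepoDelegate extends AutoCloseable {
    String read(String file);
    @Override void close();
}

/** Hypothetical repository that optionally receives a shared thread pool. */
class SketchRepository {
    private final ExecutorService pool; // null means "no pool injected, run serially"
    final AtomicInteger closedDelegates = new AtomicInteger();

    SketchRepository(ExecutorService pool) { this.pool = pool; }

    private RepoDelegate newDelegate() {
        return new RepoDelegate() {
            // Fake I/O: a real delegate would read from HDFS or cloud storage.
            public String read(String file) { return "contents-of-" + file; }
            public void close() { closedDelegates.incrementAndGet(); }
        };
    }

    /** Invokes the lambda once per file; in pool threads if a pool was injected. */
    void forEach(List<String> files, BiConsumer<RepoDelegate, String> action) throws Exception {
        if (pool == null) {
            try (RepoDelegate d = newDelegate()) {
                for (String f : files) action.accept(d, f);
            }
            return;
        }
        List<Future<?>> futures = new ArrayList<>();
        for (String f : files) {
            futures.add(pool.submit(() -> {
                // Each task gets its own delegate, released when the task ends.
                try (RepoDelegate d = newDelegate()) {
                    action.accept(d, f);
                }
            }));
        }
        for (Future<?> fut : futures) fut.get(); // wait and surface failures
    }
}

/** Hypothetical factory that owns the shared pool. */
class SketchRepositoryFactory implements AutoCloseable {
    private final ExecutorService sharedPool = Executors.newFixedThreadPool(4);

    SketchRepository create(boolean parallel) {
        return new SketchRepository(parallel ? sharedPool : null);
    }

    @Override public void close() { sharedPool.shutdown(); }
}

class ParallelRestoreSketch {
    static Map<String, String> restore(SketchRepository repo, List<String> files) throws Exception {
        Map<String, String> out = new ConcurrentHashMap<>();
        repo.forEach(files, (delegate, file) -> out.put(file, delegate.read(file)));
        return out;
    }

    public static void main(String[] args) throws Exception {
        try (SketchRepositoryFactory factory = new SketchRepositoryFactory()) {
            SketchRepository repo = factory.create(true);
            Map<String, String> restored = restore(repo, List.of("_0.cfs", "_0.si", "segments_1"));
            System.out.println(restored.size() + " files restored");
        }
    }
}
```

In this sketch the caller never sees the pool: a repository created without one falls back to a serial loop with a single delegate, so the restore code is the same in both cases.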

> RestoreCore needs the option to download files in parallel.
> -----------------------------------------------------------
>                 Key: SOLR-9961
>                 URL: https://issues.apache.org/jira/browse/SOLR-9961
>             Project: Solr
>          Issue Type: Improvement
>          Components: Backup/Restore
>    Affects Versions: 6.2.1
>            Reporter: Timothy Potter
>            Priority: Major
>         Attachments: SOLR-9961.patch, SOLR-9961.patch, SOLR-9961.patch, SOLR-9961.patch
> My backup to cloud storage (Google cloud storage in this case, but I think this is a general problem) takes 8 minutes ... the restore of the same core takes hours. The restore loop in RestoreCore is serial and doesn't allow me to parallelize the expensive part of this operation (the IO from the remote cloud storage service). We need the option to parallelize the download (like distcp).
> Also, I tried downloading the same directory using gsutil and it was very fast, like 2 minutes. So I know it's not the pipe that's limiting perf here.
> Here's a very rough patch that does the parallelization. We may also want to consider a two-step approach: 1) download in parallel to a temp dir, 2) perform all of the checksum validation against the local temp dir. That will save round trips to the remote cloud storage.
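
The two-step approach described in the issue could look roughly like the sketch below. It is a minimal illustration under assumptions: the "remote" store is an in-memory map, CRC32 stands in for whatever checksum the restore actually validates, and {{TwoStepRestoreSketch}}, {{downloadAll}}, and {{verifyLocal}} are hypothetical names, not part of Solr.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.zip.CRC32;

class TwoStepRestoreSketch {
    static long crc32(byte[] data) {
        CRC32 crc = new CRC32();
        crc.update(data);
        return crc.getValue();
    }

    /** Step 1: copy every "remote" file into tempDir using a thread pool. */
    static void downloadAll(Map<String, byte[]> remote, Path tempDir, int threads) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<?>> futures = new ArrayList<>();
            for (Map.Entry<String, byte[]> e : remote.entrySet()) {
                futures.add(pool.submit(() -> {
                    try {
                        Files.write(tempDir.resolve(e.getKey()), e.getValue());
                    } catch (IOException ex) {
                        throw new UncheckedIOException(ex);
                    }
                }));
            }
            for (Future<?> f : futures) f.get(); // wait and surface failures
        } finally {
            pool.shutdown();
        }
    }

    /** Step 2: validate the local copies only; returns names that do not match. */
    static List<String> verifyLocal(Map<String, Long> expected, Path tempDir) throws IOException {
        List<String> bad = new ArrayList<>();
        for (Map.Entry<String, Long> e : expected.entrySet()) {
            byte[] local = Files.readAllBytes(tempDir.resolve(e.getKey()));
            if (crc32(local) != e.getValue()) bad.add(e.getKey());
        }
        return bad;
    }

    public static void main(String[] args) throws Exception {
        Map<String, byte[]> remote = new HashMap<>();
        remote.put("_0.cfs", "segment data".getBytes());
        remote.put("segments_1", "commit point".getBytes());

        Map<String, Long> expected = new HashMap<>();
        for (Map.Entry<String, byte[]> e : remote.entrySet()) {
            expected.put(e.getKey(), crc32(e.getValue()));
        }

        Path tempDir = Files.createTempDirectory("restore-sketch");
        downloadAll(remote, tempDir, 4);
        System.out.println("mismatched files: " + verifyLocal(expected, tempDir));
    }
}
```

Because the second step reads only from the temp dir, the remote store is contacted once per file regardless of how many validation passes follow.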
