[jira] [Commented] (LUCENE-7792) Add optional concurrency to OfflineSorter

JIRA jira@apache.org

    [ https://issues.apache.org/jira/browse/LUCENE-7792?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15978202#comment-15978202 ]

Dawid Weiss commented on LUCENE-7792:

I'll file a separate issue for reThrow.

As for the executor service: we can do this as a separate issue. There are multiple ways to implement a same-thread executor:
- immediate (the ExecutorService computes the result on the calling thread in submit/execute and returns a future whose value is already available),
- lazy (the ExecutorService returns a future that lazily computes the result on the thread that calls get).

There is some bookkeeping if you want to be super exact (associated with termination status), but otherwise the first option (immediate) is fairly simple:
    // Needs java.util.{Collections, List} and java.util.concurrent.{AbstractExecutorService,
    // ExecutorService, RejectedExecutionException, TimeUnit}.
    ExecutorService service = new AbstractExecutorService() {
      private volatile boolean shutdown;

      public void execute(Runnable command) {
        // Immediate variant: run the task on the calling thread.
        checkShutdown();
        command.run();
      }

      public List<Runnable> shutdownNow() {
        shutdown();
        // Nothing is ever queued, so there are no pending tasks to return.
        return Collections.emptyList();
      }

      public void shutdown() {
        this.shutdown = true;
      }

      public boolean isTerminated() {
        // Simplified: we don't check for any threads hanging in execute (we could
        // introduce an atomic counter, but there seems to be no point).
        return shutdown;
      }

      public boolean isShutdown() {
        return shutdown;
      }

      public boolean awaitTermination(long timeout, TimeUnit unit) throws InterruptedException {
        // See comment in isTerminated().
        return true;
      }

      private void checkShutdown() {
        if (shutdown) {
          throw new RejectedExecutionException("Executor is shut down.");
        }
      }
    };
This is pretty much what you had in the if/else.
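
For comparison, the lazy variant from the list above could look roughly like the sketch below. It is only an illustration, not code from the patch: the names LazyExecutorSketch, LazyTask and newLazyExecutor are made up here, and it assumes that overriding AbstractExecutorService.newTaskFor is an acceptable way to hand out futures that compute their value on the thread calling get. A bare Runnable passed to execute has no future to trigger it, so the sketch simply runs it right away.

```java
import java.util.Collections;
import java.util.List;
import java.util.concurrent.AbstractExecutorService;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Future;
import java.util.concurrent.FutureTask;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.RunnableFuture;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical names throughout; a sketch, not code from the LUCENE-7792 patch.
final class LazyExecutorSketch {

  /** A FutureTask that runs itself on whichever thread first calls get(). */
  static final class LazyTask<T> extends FutureTask<T> {
    LazyTask(Callable<T> callable) { super(callable); }
    LazyTask(Runnable runnable, T result) { super(runnable, result); }

    @Override
    public T get() throws InterruptedException, ExecutionException {
      run(); // FutureTask.run() does nothing if the task already ran or was cancelled.
      return super.get();
    }

    @Override
    public T get(long timeout, TimeUnit unit)
        throws InterruptedException, ExecutionException, TimeoutException {
      run();
      return super.get(timeout, unit);
    }
  }

  static ExecutorService newLazyExecutor() {
    return new AbstractExecutorService() {
      private volatile boolean shutdown;

      @Override
      protected <T> RunnableFuture<T> newTaskFor(Callable<T> callable) {
        return new LazyTask<>(callable);
      }

      @Override
      protected <T> RunnableFuture<T> newTaskFor(Runnable runnable, T value) {
        return new LazyTask<>(runnable, value);
      }

      @Override
      public void execute(Runnable command) {
        if (shutdown) {
          throw new RejectedExecutionException("Executor is shut down.");
        }
        // Tasks created by submit() are deferred until get(); anything else
        // has no future to trigger it, so it runs immediately.
        if (!(command instanceof LazyTask)) {
          command.run();
        }
      }

      // Same simplified termination bookkeeping as the immediate variant.
      @Override public void shutdown() { shutdown = true; }
      @Override public List<Runnable> shutdownNow() { shutdown = true; return Collections.emptyList(); }
      @Override public boolean isShutdown() { return shutdown; }
      @Override public boolean isTerminated() { return shutdown; }
      @Override public boolean awaitTermination(long timeout, TimeUnit unit) { return true; }
    };
  }

  public static void main(String[] args) throws Exception {
    ExecutorService lazy = newLazyExecutor();
    Callable<String> task = () -> Thread.currentThread().getName();
    Future<String> f = lazy.submit(task);
    System.out.println("done before get(): " + f.isDone());
    System.out.println("computed on: " + f.get());
    lazy.shutdown();
  }
}
```

The practical difference from the immediate variant is when the work happens: here nothing runs at submit time, so a caller that never calls get never pays for the task.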

> Add optional concurrency to OfflineSorter
> -----------------------------------------
>                 Key: LUCENE-7792
>                 URL: https://issues.apache.org/jira/browse/LUCENE-7792
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Michael McCandless
>            Assignee: Michael McCandless
>             Fix For: master (7.0), 6.6
>         Attachments: LUCENE-7792.patch
> OfflineSorter is a heavy operation and is really an embarrassingly concurrent problem at heart, and if you have enough hardware concurrency (e.g. fast SSDs, multiple CPU cores) it can be a big speedup.
> E.g., after reading a partition from the input, one thread can sort and write it, while another thread reads the next partition, etc.  Merging partitions can also be done in the background.  Some things still cannot be concurrent, e.g. the initial read from the input must be a single thread, as well as the final merge and writing to the final output.
> I think I found a fairly non-invasive way to add optional concurrency to this class, by adding an optional ExecutorService to OfflineSorter's ctor (similar to IndexSearcher) and using futures to represent each partition as we sort, and creating Callable classes for sorting and merging partitions.
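
The approach in the quoted description (an optional ExecutorService, one future per partition, a single-threaded final merge) can be illustrated with a small in-memory sketch. This is not the code in LUCENE-7792.patch and does not use OfflineSorter's API: PartitionSortSketch and its sort method are hypothetical, it sorts integers in memory instead of byte sequences in temporary files, and it leaves out background merging of intermediate partitions.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical illustration; not the code from LUCENE-7792.patch.
final class PartitionSortSketch {

  static List<Integer> sort(List<Integer> input, int partitionSize, ExecutorService exec)
      throws Exception {
    if (partitionSize <= 0) {
      throw new IllegalArgumentException("partitionSize must be positive");
    }
    // Reading the input stays on the calling thread; each partition is handed
    // to the executor and tracked as a Future.
    List<Future<List<Integer>>> partitions = new ArrayList<>();
    for (int start = 0; start < input.size(); start += partitionSize) {
      int end = Math.min(input.size(), start + partitionSize);
      List<Integer> chunk = new ArrayList<>(input.subList(start, end));
      partitions.add(exec.submit(() -> {
        Collections.sort(chunk);
        return chunk;
      }));
    }

    // The final merge runs on the calling thread once every partition is sorted.
    List<List<Integer>> sorted = new ArrayList<>();
    for (Future<List<Integer>> f : partitions) {
      sorted.add(f.get());
    }
    // Each heap entry is {value, partition index, offset within that partition}.
    PriorityQueue<int[]> heads =
        new PriorityQueue<>(Comparator.comparingInt((int[] h) -> h[0]));
    for (int p = 0; p < sorted.size(); p++) {
      heads.add(new int[] {sorted.get(p).get(0), p, 0});
    }
    List<Integer> out = new ArrayList<>(input.size());
    while (!heads.isEmpty()) {
      int[] head = heads.poll();
      out.add(head[0]);
      List<Integer> part = sorted.get(head[1]);
      int next = head[2] + 1;
      if (next < part.size()) {
        heads.add(new int[] {part.get(next), head[1], next});
      }
    }
    return out;
  }

  public static void main(String[] args) throws Exception {
    ExecutorService exec = Executors.newFixedThreadPool(2);
    try {
      // prints [1, 2, 3, 5, 7, 8, 9]
      System.out.println(sort(Arrays.asList(5, 3, 9, 1, 7, 2, 8), 3, exec));
    } finally {
      exec.shutdown();
    }
  }
}
```

Passing a same-thread executor instead of the thread pool gives the sequential behaviour with the same code path, which is the point of the executor discussion in the comment above.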
