[jira] Created: (LUCENE-997) Add search timeout support to Lucene

[jira] Created: (LUCENE-997) Add search timeout support to Lucene

ASF GitHub Bot (Jira)
Add search timeout support to Lucene
------------------------------------

                 Key: LUCENE-997
                 URL: https://issues.apache.org/jira/browse/LUCENE-997
             Project: Lucene - Java
          Issue Type: New Feature
            Reporter: Sean Timm
            Priority: Minor


This patch is based on Nutch-308.

This patch adds support for a maximum search time limit. After this time is exceeded, the search thread is stopped, partial results (if any) are returned and the total number of results is estimated.

This patch tries to minimize the overhead related to time-keeping by using a version of a safe unsynchronized timer.

This was also discussed in an e-mail thread.
http://www.nabble.com/search-timeout-tf3410206.html#a9501029

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

[jira] Updated: (LUCENE-997) Add search timeout support to Lucene

ASF GitHub Bot (Jira)

     [ https://issues.apache.org/jira/browse/LUCENE-997?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sean Timm updated LUCENE-997:
-----------------------------

    Attachment: timeout.patch

Patch against trunk revision 575451.

[jira] Updated: (LUCENE-997) Add search timeout support to Lucene

ASF GitHub Bot (Jira)
In reply to this post by ASF GitHub Bot (Jira)

     [ https://issues.apache.org/jira/browse/LUCENE-997?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sean Timm updated LUCENE-997:
-----------------------------

    Attachment: LuceneTimeoutTest.java

Simple test case.  Run by passing in the index directory as an argument.

[jira] Commented: (LUCENE-997) Add search timeout support to Lucene

ASF GitHub Bot (Jira)
In reply to this post by ASF GitHub Bot (Jira)

    [ https://issues.apache.org/jira/browse/LUCENE-997?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12527264 ]

Sean Timm commented on LUCENE-997:
----------------------------------

Here are some additional details on the changes.

New files:
TimeLimitedCollector.java

    Extends HitCollector and detects timeouts, throwing a TimeLimitedCollector.TimeExceeded exception when one occurs.

TimerThread.java

    TimerThread provides a pseudo-clock service to all searching threads, so that they can count elapsed time with less overhead than repeatedly calling System.currentTimeMillis.  A single thread should be created to be used for all searches.

Modified Files:
Hits.java

    Added partial result flag.

IndexSearcher.java

    Catches TimeLimitedCollector.TimeExceeded, sets the partial results flag on TopDocs and estimates what the total hit count would have been had the search not timed out partway through.  Returns TopDocs with partial results.

Searcher.java

    Added methods to set and get the timeout parameters.  This implementation decision has the limitation of only permitting a single timeout value per Searcher instance (of which there is usually only one).  However, this greatly minimizes the number of search methods that would need to be added.  In practice, I have not needed the functionality to change the timeout settings on a per query basis.

TopFieldDocCollector.java

    Uses TimeLimitedCollector functionality.

TopDocCollector.java

    Uses TimeLimitedCollector functionality and exposes it to child class TopFieldDocCollector.

TopDocs.java

    Added partial results flag.  Note, TopFieldDocs extends this class and inherits the new functionality.
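
For readers without the patch at hand, the pseudo-clock idea described for TimerThread can be sketched roughly as follows. This is only an illustration under assumptions, not the code in timeout.patch: the class name CoarseClock and its constructor are hypothetical, and only the method name getMilliseconds() is borrowed from later in this thread.

```java
/**
 * Hypothetical stand-in for the patch's TimerThread (not the actual patch code).
 * One daemon thread advances a coarse counter; searching threads read it with a
 * single volatile load instead of asking the operating system for the time.
 */
final class CoarseClock extends Thread {
    private final long resolutionMillis;

    // Single writer (this thread), many readers: volatile is enough for visibility.
    private volatile long elapsedMillis = 0L;

    CoarseClock(long resolutionMillis) {
        super("coarse-clock");
        if (resolutionMillis <= 0) {
            throw new IllegalArgumentException("resolution must be positive");
        }
        this.resolutionMillis = resolutionMillis;
        setDaemon(true); // never keep the JVM alive just for the clock
    }

    @Override
    public void run() {
        while (!isInterrupted()) {
            try {
                Thread.sleep(resolutionMillis);
            } catch (InterruptedException e) {
                interrupt(); // restore the flag, then stop ticking
                return;
            }
            // Assumes every sleep lasted exactly one tick, so the counter lags under load.
            elapsedMillis += resolutionMillis;
        }
    }

    /** Elapsed milliseconds since start(), accurate to roughly one tick at best. */
    long getMilliseconds() {
        return elapsedMillis;
    }
}
```

A searching thread would record clock.getMilliseconds() when the search begins and compare against it while collecting hits; one such clock can be shared by all searches.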

[jira] Commented: (LUCENE-997) Add search timeout support to Lucene

ASF GitHub Bot (Jira)
In reply to this post by ASF GitHub Bot (Jira)

    [ https://issues.apache.org/jira/browse/LUCENE-997?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12527605 ]

Daniel Naber commented on LUCENE-997:
-------------------------------------

Thanks for the patch. I didn't have a very close look, just one small thing: it's probably not a good idea to catch and ignore the InterruptedException. See http://www-128.ibm.com/developerworks/java/library/j-jtp05236.html

[jira] Commented: (LUCENE-997) Add search timeout support to Lucene

ASF GitHub Bot (Jira)
In reply to this post by ASF GitHub Bot (Jira)

    [ https://issues.apache.org/jira/browse/LUCENE-997?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12527666 ]

Hoss Man commented on LUCENE-997:
---------------------------------

I'm not entirely convinced it makes sense to modify these classes to include timeouts as core functionality ... would it make more sense to deal with this in subclasses of IndexSearcher/TopDocs/Hits ?

[jira] Updated: (LUCENE-997) Add search timeout support to Lucene

ASF GitHub Bot (Jira)
In reply to this post by ASF GitHub Bot (Jira)

     [ https://issues.apache.org/jira/browse/LUCENE-997?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sean Timm updated LUCENE-997:
-----------------------------

    Attachment: timeout.patch

http://www-128.ibm.com/developerworks/java/library/j-jtp05236.html

TimerThread now follows Brian Goetz's best practice for a noncancelable task: it restores the interrupted status before returning rather than ignoring the InterruptedException.
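
As an illustration of that practice (a minimal sketch, not the code from the patch), a noncancelable wait can remember that it was interrupted, finish its work, and re-assert the interrupt flag on the way out. The class and method names below are hypothetical.

```java
import java.util.concurrent.TimeUnit;

/** Hypothetical helper showing the "restore interrupted status" pattern. */
final class UninterruptibleSleep {
    private UninterruptibleSleep() {}

    /** Sleeps for at least the given time even if the thread is interrupted meanwhile. */
    static void sleepFully(long millis) {
        boolean interrupted = false;
        final long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(millis);
        try {
            while (true) {
                long remaining = deadline - System.nanoTime();
                if (remaining <= 0) {
                    return; // the full duration has passed
                }
                try {
                    TimeUnit.NANOSECONDS.sleep(remaining);
                } catch (InterruptedException e) {
                    // Do not swallow the interrupt: remember it and keep waiting.
                    interrupted = true;
                }
            }
        } finally {
            if (interrupted) {
                // Restore the flag so callers further up the stack can still see it.
                Thread.currentThread().interrupt();
            }
        }
    }
}
```

The caller can then check Thread.currentThread().isInterrupted() after the call and decide how to react, which is not possible if the exception is caught and ignored.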

[jira] Updated: (LUCENE-997) Add search timeout support to Lucene

ASF GitHub Bot (Jira)
In reply to this post by ASF GitHub Bot (Jira)

     [ https://issues.apache.org/jira/browse/LUCENE-997?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sean Timm updated LUCENE-997:
-----------------------------

    Attachment:     (was: timeout.patch)

Re: [jira] Commented: (LUCENE-997) Add search timeout support to Lucene

Roy Ward
In reply to this post by ASF GitHub Bot (Jira)
Hoss Man commented on LUCENE-997:
> I'm not entirely convinced it makes sense to modify these classes to include timeouts
> as core functionality ... would it make more sense to deal with this in subclasses of
> IndexSearcher/TopDocs/Hits ?

I like the idea of timeouts as core functionality, as it makes it much easier to deal with things like partial results.

I do have some thoughts on the patch though:

(1) You only added timeouts to:

  public TopDocs search(Weight weight, Filter filter, final int nDocs)

It's confusing if timeout functionality is not also added to:

  public TopFieldDocs search(Weight weight, Filter filter, final int nDocs,  Sort sort)

(2) Estimating the number of results is a good idea, however it breaks some of the code in Hits.java when the Vector of results is not as long as expected. This either needs more work or just returning the number of results actually found. Perhaps a separate method for getting the estimate in the case of partial results would be the way to go.

(3) The timer, which consists of a whole lot of millisecond pauses (if the resolution is 1), is not accurate (certainly under load). There needs to be at least an occasional call to an accurate timer. It would also be better to replace getCounter() by something like getMilliseconds() so the caller does not need to know the resolution of the timer.
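
One way to picture point (3) is a timer thread that reads an accurate clock once per tick instead of adding up sleep durations, so errors do not accumulate. The sketch below is an assumption about how that could look, not part of the patch; ResyncingClock is a hypothetical name, and System.nanoTime() is used as the accurate source.

```java
/**
 * Hypothetical variant of the timer thread: each wake-up publishes the real
 * elapsed time, so a late wake-up costs freshness but never accumulates drift.
 */
final class ResyncingClock extends Thread {
    private final long resolutionMillis;
    private final long startNanos = System.nanoTime();

    // Written by this thread only; read by any number of searching threads.
    private volatile long elapsedMillis = 0L;

    ResyncingClock(long resolutionMillis) {
        super("resyncing-clock");
        if (resolutionMillis <= 0) {
            throw new IllegalArgumentException("resolution must be positive");
        }
        this.resolutionMillis = resolutionMillis;
        setDaemon(true);
    }

    @Override
    public void run() {
        while (!isInterrupted()) {
            try {
                Thread.sleep(resolutionMillis);
            } catch (InterruptedException e) {
                interrupt(); // restore the flag and stop
                return;
            }
            // One accurate clock read per tick, however long the sleep really took.
            elapsedMillis = (System.nanoTime() - startNanos) / 1_000_000L;
        }
    }

    /** Callers get milliseconds directly and need not know the tick resolution. */
    long getMilliseconds() {
        return elapsedMillis;
    }
}
```

Readers still pay only a volatile read; the value may be stale by up to one (possibly delayed) tick, but it is never behind by more than that.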

Re: [jira] Commented: (LUCENE-997) Add search timeout support to Lucene

Sean Timm
Roy,

Thanks for the review and comments.  My comments inline below.

Roy Ward wrote:
> (1) You only added timeouts to:
>
>   public TopDocs search(Weight weight, Filter filter, final int nDocs)
>
> It's confusing if timeout functionality is not also added to:
>
>   public TopFieldDocs search(Weight weight, Filter filter, final int nDocs,
> Sort sort)
>  
Good catch.  That was an oversight.  The necessary changes were made to
TopFieldDocCollector.java, but you are right the changes to

  public TopFieldDocs search(Weight weight, Filter filter, final int nDocs,
Sort sort)

were not in the patch.  I have this fixed locally and will submit a
patch shortly.
> (2) Estimating the number of results is a good idea, however it breaks
> some of the code in Hits.java when the Vector of results is not as long as
> expected. This either needs more work or just returning the number of
> results actually found. Perhaps a separate method for getting the estimate
> in the case of partial results would be the way to go.
>  
Is there a test case that shows this breakage, or can you point me to
the code in Hits.java that my patch causes problems with?  Sorry, I'm
not seeing it.
> (3) The timer, consisting of a whole lot of millisecond pauses (if the
> resolution is 1) is not accurate (certainly under load). There needs to be
> at least an occasional call to an accurate timer. It would also be better to
> replace getCounter() by something like getMilliseconds() so the caller does
> not need to know the resolution of the timer.
I wouldn't expect anyone to actually use a 1 ms resolution.  That is in
the provided test case simply because it almost guarantees a timeout
occurs.  The accuracy of the timeout as long as it is reasonably close
isn't terribly important.  The typical use case as I see it would be to
preempt the occasional (< 1%)  queries that take an unreasonable amount
of time to complete.  For example the timer may be configured for 10
counts of 100 ms (1 second).  If that isn't preempted until 1.1 seconds
have elapsed, I think my operations team will still be happy.

I do like your suggestion of getMilliseconds().  It is clearer.  I've
made this change locally and will submit a patch shortly.

-Sean

Re: [jira] Commented: (LUCENE-997) Add search timeout support to Lucene

Roy Ward
Sean Timm wrote:

>> (2) Estimating the number of results <snip>

>Is there a test case that shows this breakage, or can you point me to
>the code in Hits.java that my patch causes problems with?  Sorry, I'm
>not seeing it.

In the case of no hits at all getting returned, the following code:

if(hits.hasNext())
{
    Hit hit = (Hit)hits.next();
    float s = hit.getScore();
    ...
}

throws the following exception (line numbers are the patch against lucene-2.2):

java.lang.ArrayIndexOutOfBoundsException: 1 >= 1
        at org.apache.lucene.search.Hits.score(Hits.java:125)
        at org.apache.lucene.search.Hit.getScore(Hit.java:68)
<snip>

which is thrown by:

    return hitDoc(n).score;

I haven't worked through this fully (and I haven't yet put this into a nice test case I can send you - I'm testing this within a large application), but I think it's related to the number of estimated hits being different than the actual number of hits, so some code doesn't check that there are enough hits (and user code might do this too). The problem does not occur when a search that actually returns zero hits is done.
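
To make the mismatch concrete, here is a minimal sketch of a result holder that keeps the number of hits actually collected separate from the estimated total, along the lines of the "separate method" suggestion earlier in the thread. PartialHits and all of its members are hypothetical, and the linear extrapolation is only an assumed way of estimating; the thread does not say how the patch computes its estimate.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical result holder; it is not the Hits or TopDocs class from Lucene. */
final class PartialHits {
    private final List<Float> scores;
    private final long estimatedTotal;
    private final boolean partial;

    PartialHits(List<Float> scores, long estimatedTotal, boolean partial) {
        this.scores = Collections.unmodifiableList(new ArrayList<>(scores));
        this.estimatedTotal = estimatedTotal;
        this.partial = partial;
    }

    /** Number of hits that can really be fetched; iteration must be bounded by this. */
    int length() { return scores.size(); }

    /** Estimated total for display only; never use it as a loop bound. */
    long estimatedTotal() { return estimatedTotal; }

    boolean isPartial() { return partial; }

    float score(int n) { return scores.get(n); }

    /**
     * Assumed estimator: scale the hits seen so far by the fraction of the
     * index that was scanned before the timeout.
     */
    static long extrapolate(int hitsSoFar, int docsScanned, int maxDoc) {
        if (docsScanned <= 0) {
            return hitsSoFar; // nothing scanned, nothing to extrapolate from
        }
        return Math.round((double) hitsSoFar * maxDoc / docsScanned);
    }
}
```

With this split, a loop of the form for (int i = 0; i < hits.length(); i++) cannot run past the collected hits even when estimatedTotal() is much larger, which is the situation that appears to trigger the exception above.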

> I wouldn't expect anyone to actually use a 1 ms resolution.
<snip>
> If that isn't preempted until 1.1 seconds  have elapsed,
> I think my operations team will still be happy.

This one is no big deal, since as you point out, it's really to stop the tiny proportion of queries that don't finish in a reasonable time, but I was thinking of a couple of things here:

If there's going to be a timer thread running in an application, other things may want to make use of it (such as reporting elapsed times), so some accuracy might be good.

Also, it's nice to have it somewhat bulletproof. If someone makes the mistake of using a resolution of 1 ms (I tested it out of curiosity), then on a heavily loaded system with many threads running the result was about 7000 ticks every minute instead of the expected 60000, so it was wrong by a factor of about 8.

If I find time to put something in the timer class to get good accuracy without too much performance loss, I'll send you a patch (I'm sorry I haven't done so already, but I'm patching against 2.2 rather than the cvs trunk).

Roy Ward

[jira] Updated: (LUCENE-997) Add search timeout support to Lucene

ASF GitHub Bot (Jira)
In reply to this post by ASF GitHub Bot (Jira)

     [ https://issues.apache.org/jira/browse/LUCENE-997?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sean Timm updated LUCENE-997:
-----------------------------

    Attachment: timeout.patch

Two issues are addressed in this latest patch:

1) Timeout support was not added to: public TopFieldDocs search(Weight weight, Filter filter, final int nDocs, Sort sort)

2) getCounter() in TimerThread was replaced by getMilliseconds()


[jira] Commented: (LUCENE-997) Add search timeout support to Lucene

ASF GitHub Bot (Jira)
In reply to this post by ASF GitHub Bot (Jira)

    [ https://issues.apache.org/jira/browse/LUCENE-997?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12537128 ]

Doron Cohen commented on LUCENE-997:
------------------------------------

I think this is a nice feature to have.

But I am not sure about the proposed API: the application creates a TimerThread, starts it, and the timer thread is then passed to the searcher with setTimeOut(timer,ticks). Not so simple.

I think my preference for the API and implementation would be in HitCollector.collect() - in other words, we consider this new feature as an advanced one, and so only allow applications to provide their "timed" hitCollector. The modified collect() would either throw a TimeoutException or return a timedOut indication. If this is a (subclass of) RuntimeException (though I am not crazy about this alternative) then there's no API change (a plus) but we need to verify that the code below propagates the RuntimeException gracefully and closes all the streams and everything (which I believe it does with all the latest careful changes by Mike and Michael). If RuntimeException is not acceptable, then this is an API change (a minus) and also many (simple) changes will be required in scorers (callers to collect).

The application's timedCollector will have all the logic in that collector for both announcing and detecting the timeout. Next we can add a TimedCollector for the benefit of applications, and last, consider adding search() methods with timeOut, but I doubt that last step.
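
The RuntimeException variant of this proposal might look roughly like the sketch below. It is an illustration only: DocCollector is a hypothetical stand-in interface rather than Lucene's HitCollector, TimedCollector and TimeExceededException are invented names, and the clock is injected as a LongSupplier purely so the behaviour can be exercised without real waiting.

```java
import java.util.function.LongSupplier;

/** Hypothetical stand-in for a hit collector callback. */
interface DocCollector {
    void collect(int doc, float score);
}

/** Unchecked, so existing collect() callers need no signature change. */
final class TimeExceededException extends RuntimeException {
    TimeExceededException(long allowedMillis, long elapsedMillis) {
        super("search exceeded " + allowedMillis + " ms (elapsed " + elapsedMillis + " ms)");
    }
}

/** Wraps another collector and aborts collection once the time budget is spent. */
final class TimedCollector implements DocCollector {
    private final DocCollector delegate;
    private final long timeoutMillis;
    private final LongSupplier clockMillis;
    private final long startMillis;

    TimedCollector(DocCollector delegate, long timeoutMillis, LongSupplier clockMillis) {
        this.delegate = delegate;
        this.timeoutMillis = timeoutMillis;
        this.clockMillis = clockMillis;
        this.startMillis = clockMillis.getAsLong(); // the budget starts at construction
    }

    /** Convenience constructor using the system clock. */
    TimedCollector(DocCollector delegate, long timeoutMillis) {
        this(delegate, timeoutMillis, System::currentTimeMillis);
    }

    @Override
    public void collect(int doc, float score) {
        long elapsed = clockMillis.getAsLong() - startMillis;
        if (elapsed > timeoutMillis) {
            throw new TimeExceededException(timeoutMillis, elapsed);
        }
        delegate.collect(doc, score);
    }
}
```

The application would catch TimeExceededException around its search call and keep whatever the wrapped collector had gathered up to that point as the partial result.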

[jira] Commented: (LUCENE-997) Add search timeout support to Lucene

ASF GitHub Bot (Jira)
In reply to this post by ASF GitHub Bot (Jira)

    [ https://issues.apache.org/jira/browse/LUCENE-997?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12537153 ]

Yonik Seeley commented on LUCENE-997:
-------------------------------------

> allow applications to provide their "timed" hitCollector

+1


[jira] Commented: (LUCENE-997) Add search timeout support to Lucene

ASF GitHub Bot (Jira)
In reply to this post by ASF GitHub Bot (Jira)

    [ https://issues.apache.org/jira/browse/LUCENE-997?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12538142 ]

Lance Norskog commented on LUCENE-997:
--------------------------------------


I just requested a more fancy feature in the Solr Jira. My apologies, I did not think to search the Lucene Jira.


1) timeout: stop searching after N milliseconds and return results using only those hits already found
2) hit limit: stop searching after N hits and return results using only those hits already found
3) ram limit: estimate the amount of ram used so far and stop searching at a given amount

Here is the complete request:


[jira] Commented: (LUCENE-997) Add search timeout support to Lucene

ASF GitHub Bot (Jira)
In reply to this post by ASF GitHub Bot (Jira)

    [ https://issues.apache.org/jira/browse/LUCENE-997?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12538144 ]

Lance Norskog commented on LUCENE-997:
--------------------------------------

I stumbled above; I do not yet know Jira :)  The Solr code is SOLR-392.

This request is inspired by a public search engine with millions of records. There are three different aspects mentioned above that can cause a query to "go rogue": timing out, finding too many records to give a truly useable result, and using up too much memory. The point is that if a search is going to find 14 million hits, Google does not go and tally them. It stops quickly and estimates how many might remain. I would like to have similar control.

The HitCollector implementation mentioned above would allow all three of these control options. If they could be pipelined together we could use any or all of them.
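
A rough sketch of how such limits could be pipelined is given below, assuming each limit is a wrapper around the next collector in the chain. All names (HitSink, HitLimit, TimeLimit, LimitExceededException) are hypothetical. The RAM limit is left out because estimating memory use is application specific; it would be one more wrapper of the same shape.

```java
import java.util.function.LongSupplier;

/** Hypothetical stand-in for a hit collector callback. */
interface HitSink {
    void collect(int doc, float score);
}

final class LimitExceededException extends RuntimeException {
    LimitExceededException(String which) {
        super(which + " limit exceeded");
    }
}

/** Stops the search once maxHits hits have been passed on. */
final class HitLimit implements HitSink {
    private final HitSink next;
    private final int maxHits;
    private int seen = 0;

    HitLimit(HitSink next, int maxHits) {
        this.next = next;
        this.maxHits = maxHits;
    }

    @Override
    public void collect(int doc, float score) {
        if (seen >= maxHits) {
            throw new LimitExceededException("hit");
        }
        seen++;
        next.collect(doc, score);
    }
}

/** Stops the search once the time budget, measured from construction, is spent. */
final class TimeLimit implements HitSink {
    private final HitSink next;
    private final long timeoutMillis;
    private final LongSupplier clockMillis;
    private final long startMillis;

    TimeLimit(HitSink next, long timeoutMillis, LongSupplier clockMillis) {
        this.next = next;
        this.timeoutMillis = timeoutMillis;
        this.clockMillis = clockMillis;
        this.startMillis = clockMillis.getAsLong();
    }

    @Override
    public void collect(int doc, float score) {
        if (clockMillis.getAsLong() - startMillis > timeoutMillis) {
            throw new LimitExceededException("time");
        }
        next.collect(doc, score);
    }
}
```

Limits compose by nesting, for example new TimeLimit(new HitLimit(realCollector, 1000), 2000, System::currentTimeMillis); whichever limit trips first ends the search, and the exception message says which one it was.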

[jira] Commented: (LUCENE-997) Add search timeout support to Lucene

ASF GitHub Bot (Jira)
In reply to this post by ASF GitHub Bot (Jira)

    [ https://issues.apache.org/jira/browse/LUCENE-997?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12542675 ]

Sean Timm commented on LUCENE-997:
----------------------------------

> I think my preference for the API and implementation would be in HitCollector.collect()

This would be simpler, but I don't see how it would be possible to estimate the total number of results and return partial results in that case.  I think that is an important feature.

If the concern is complexity for the application, perhaps it is possible to hide the TimerThread altogether.  The TimerThread could be created and started via a searcher setTimeOut(tick, numTicks) method.

To simplify it further, ticks could be fixed at a reasonable number, e.g., 100 ms, and a timeout in milliseconds could be passed in: setTimeout(milliseconds).
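As a rough illustration of the simplified setTimeout(milliseconds) idea, a millisecond timeout could be converted to a whole number of fixed 100 ms ticks as sketched below. The class and method names (TimeoutTicks, toTicks) are hypothetical and are not taken from the patch. Rounding up is an assumption, chosen so that a search is never cut off before the requested time.

```java
// Hypothetical helper, not part of timeout.patch.
final class TimeoutTicks {
    // Fixed tick length of 100 ms, as suggested in the comment above.
    static final long TICK_MILLIS = 100;

    // Converts a timeout in milliseconds to ticks, rounding up (assumption).
    static long toTicks(long timeoutMillis) {
        if (timeoutMillis < 0) {
            throw new IllegalArgumentException("timeout must be non-negative");
        }
        return (timeoutMillis + TICK_MILLIS - 1) / TICK_MILLIS;
    }
}
```

With this rounding, a 250 ms timeout becomes 3 ticks, so the effective limit is 300 ms.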


[jira] Commented: (LUCENE-997) Add search timeout support to Lucene

ASF GitHub Bot (Jira)
In reply to this post by ASF GitHub Bot (Jira)

    [ https://issues.apache.org/jira/browse/LUCENE-997?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12552143 ]

Doron Cohen commented on LUCENE-997:
------------------------------------

{quote}
TimerThread provides a pseudo-clock service to all searching threads,
so that they can count elapsed time with less overhead than repeatedly
calling System.currentTimeMillis. A single thread should be created to
be used for all searches.
{quote}
Is this really faster than calling System.currentTimeMillis()?
A quick search turned up no references supporting this.
This one says the opposite:
  http://www.devx.com/Java/Article/28685
If it is not faster, could you do without the TimerThread?
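
For readers unfamiliar with the idea, a pseudo-clock of the kind described in the quote can be sketched roughly as follows. This is not the code from timeout.patch: the class name TickTimer, its methods, and the use of a volatile long counter are assumptions made for illustration. Whether reading such a counter is actually cheaper than System.currentTimeMillis() is the open question raised here, and the sketch does not settle it.

```java
// Hypothetical sketch of a tick-counting timer thread (not the patch's TimerThread).
final class TickTimer extends Thread {
    private final long resolutionMillis;
    // volatile (an assumption) so searching threads see updates without locking
    private volatile long ticks = 0;
    private volatile boolean running = true;

    TickTimer(long resolutionMillis) {
        super("TickTimer");
        this.resolutionMillis = resolutionMillis;
        setDaemon(true); // do not keep the JVM alive just for the timer
    }

    @Override
    public void run() {
        while (running) {
            try {
                Thread.sleep(resolutionMillis);
            } catch (InterruptedException e) {
                return; // shutdown() was requested
            }
            ticks++; // only this thread writes the counter
        }
    }

    // Called by searching threads; a plain volatile read.
    long getTicks() {
        return ticks;
    }

    void shutdown() {
        running = false;
        interrupt();
    }
}
```

A single instance would be started once and shared by all searches, with each search recording getTicks() at its start and comparing against it later.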



[jira] Commented: (LUCENE-997) Add search timeout support to Lucene

ASF GitHub Bot (Jira)
In reply to this post by ASF GitHub Bot (Jira)

    [ https://issues.apache.org/jira/browse/LUCENE-997?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12552147 ]

Michael McCandless commented on LUCENE-997:
-------------------------------------------

I think one benefit of a dedicated timer thread is not being affected
by clock shift on the machine.  System.currentTimeMillis() is not
guaranteed to be monotonic.  System.nanoTime() (1.5 only!) tries
to be (I think), but it's still not guaranteed.

Without a monotonic clock, if the clock shifts forward then it could
time out in-flight queries (way) too early.

But: what happens when TimerThread overflows the int (at
2*1024*1024*1024)?  Is it the caller's job to deal with the
wraparound?
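
To make the wraparound concern concrete: an int counter wraps after 2^31 ticks, which is about 24.9 days at 1 ms per tick or about 6.8 years at 100 ms per tick. The sketch below shows one way a caller could tolerate the wrap, by comparing elapsed ticks via subtraction instead of comparing against a precomputed deadline. The class and method names are hypothetical, and this is not a description of what the patch does.

```java
// Hypothetical illustration of int tick wraparound; not code from timeout.patch.
final class TickMath {
    // Wraparound-safe: two's-complement subtraction yields the true elapsed
    // tick count as long as fewer than 2^31 ticks have actually elapsed.
    static boolean expiredBySubtraction(int startTick, int nowTick, int limitTicks) {
        return nowTick - startTick >= limitTicks;
    }

    // Naive: the deadline itself can overflow to a negative value, so a search
    // started just before the wrap appears to be expired immediately.
    static boolean expiredByDeadline(int startTick, int nowTick, int limitTicks) {
        int deadline = startTick + limitTicks; // may overflow
        return nowTick >= deadline;
    }
}
```

For a search that starts at tick Integer.MAX_VALUE - 2 with a limit of 10 ticks, the deadline form reports a timeout one tick later, while the subtraction form waits the full 10 ticks.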



[jira] Commented: (LUCENE-997) Add search timeout support to Lucene

ASF GitHub Bot (Jira)
In reply to this post by ASF GitHub Bot (Jira)

    [ https://issues.apache.org/jira/browse/LUCENE-997?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12552170 ]

Doron Cohen commented on LUCENE-997:
------------------------------------

Nice, I didn't think of this.

So with this understanding, the timer thread (with long vs. int)
makes sense while on Java 1.4; on 1.5, System.nanoTime
will do.

The suggested patch relies on collect() being called, so
if a scorer takes a long time going over all the posting lists but
fails to find a single match after the time limit has passed, the search
operation will not be stopped. I guess it is a fair assumption
that this would be very rare...
(so would be a system clock shift... :-) )
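
The limitation described above follows from where the time check lives. A minimal sketch, assuming a check placed inside collect(), is shown below. The HitSink interface, TimeExceeded exception, and TimeLimitedSink class are hypothetical stand-ins written for this example; they are not Lucene's classes and not the patch's code, and the sketch uses modern Java (java.util.function) for brevity.

```java
import java.util.function.LongSupplier;

// Hypothetical stand-in for a hit collector callback.
interface HitSink {
    void collect(int doc, float score);
}

// Hypothetical unchecked exception used to abort the search.
final class TimeExceeded extends RuntimeException {
    TimeExceeded(String message) {
        super(message);
    }
}

// Wraps another sink and checks the clock only when a hit arrives.
final class TimeLimitedSink implements HitSink {
    private final HitSink delegate;
    private final LongSupplier clock; // any tick or millisecond source
    private final long start;
    private final long limit;
    private int collected = 0;

    TimeLimitedSink(HitSink delegate, LongSupplier clock, long limit) {
        this.delegate = delegate;
        this.clock = clock;
        this.limit = limit;
        this.start = clock.getAsLong();
    }

    @Override
    public void collect(int doc, float score) {
        // The only place the limit is enforced: no hits means no check.
        if (clock.getAsLong() - start > limit) {
            throw new TimeExceeded("time limit exceeded after " + collected + " hits");
        }
        delegate.collect(doc, score);
        collected++;
    }

    int collected() {
        return collected;
    }
}
```

Because the clock is consulted only inside collect(), a scorer that scans posting lists for a long time without producing a match never reaches the check, which is the rare case mentioned above.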

It is also important to understand what happens with IO
resources once a search is aborted with a timeout exception.
The current patch does not close the underlying streams (I
mean IndexInput clones). I think this is ok, because
once the search is aborted and there are no more references
to the weights and scorers, the IndexInput clones would
eventually be garbage collected. Others?
