[jira] Created: (LUCENE-874) Automatic reopen of IndexSearcher/IndexReader


[jira] Created: (LUCENE-874) Automatic reopen of IndexSearcher/IndexReader

Nick Burch (Jira)
Automatic reopen of IndexSearcher/IndexReader
---------------------------------------------

                 Key: LUCENE-874
                 URL: https://issues.apache.org/jira/browse/LUCENE-874
             Project: Lucene - Java
          Issue Type: Improvement
          Components: Search
            Reporter: João Fonseca
            Priority: Minor


To improve performance, a single instance of IndexSearcher should be used. However, if the index is updated, it's hard to close/reopen it, because multiple threads may be accessing it at the same time.

Lucene should include an out-of-the-box solution to this problem. Either a new class should be implemented to manage this behaviour (singleton IndexSearcher, plus detection of a modified index, plus safely closing and reopening the IndexSearcher) or this could be handled behind the scenes by the IndexSearcher class.



--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]


[jira] Commented: (LUCENE-874) Automatic reopen of IndexSearcher/IndexReader

Nick Burch (Jira)

    [ https://issues.apache.org/jira/browse/LUCENE-874?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12493441 ]

Mark Miller commented on LUCENE-874:
------------------------------------

I agree that there should be a Sandbox endorsed solution to the very common need of a live, interactive index. Solr is great if you want to move beyond Lucene, but not everyone wants to use Solr. There are probably a lot of custom management systems out there to handle index access for a live system and I think there would be a lot of benefit to new users to standardize some of that, especially considering how often people seem to get it wrong when trying to do this themselves (multiple index writers and searchers, etc). IndexModifier is a step in that direction, but a lot more could be done. A lot of pieces are already around:

Lucene-390 : this class is great. It has a few shortcomings and could use some improving and extending (warming searchers), but it is a very solid base for such things.

Lucene-112 : probably outdated, but some ideas for keeping a reader fresh.

I know there are others as well.

The new IndexWriter that can delete is a great help in this regard as well, especially in combination with Lucene-390. Deleting used to involve releasing a writer, getting a writing reader and then doing the delete. This made batching extremely important...with IndexWriter now able to delete, you don't have to rely nearly as much on batching...reindexing in an object oriented environment can become much faster and designs can simplify.

- Mark


[jira] Commented: (LUCENE-874) Automatic reopen of IndexSearcher/IndexReader

Nick Burch (Jira)

    [ https://issues.apache.org/jira/browse/LUCENE-874?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12493470 ]

Karl Wettin commented on LUCENE-874:
------------------------------------

LUCENE-550 contains NotifiableIndex, which uses decoration to keep track of what's going on in an index. It comes with AutofreshedReader and AutofreshedSearcher, an active hit collection cache, etc. Perhaps that works for you? Java 1.5.


[jira] Commented: (LUCENE-874) Automatic reopen of IndexSearcher/IndexReader

Nick Burch (Jira)

    [ https://issues.apache.org/jira/browse/LUCENE-874?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12493570 ]

Erik Hatcher commented on LUCENE-874:
-------------------------------------

Do note that Solr can be embedded: http://wiki.apache.org/solr/EmbeddedSolr
And there are improvements to this in the works too.


[jira] Commented: (LUCENE-874) Automatic reopen of IndexSearcher/IndexReader

Nick Burch (Jira)

    [ https://issues.apache.org/jira/browse/LUCENE-874?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12493656 ]

João Fonseca commented on LUCENE-874:
-------------------------------------

AutofreshedSearcher seems to be something like what I was proposing, but it seems to rely on the NotifiableIndex mechanism. Will this work if the task that updates the index is on another process/JVM?

I struggled with this problem on my own project, and came up with a solution that has points in common with Lucene-550: a LuceneFactory, with factory methods for creating everything related to Lucene (IndexSearcher, IndexWriter, Analyzer, etc). The factory method for creating an IndexSearcher always returns the same instance, as advised by the Lucene javadocs.

The problem is when the index is modified (which, in my case, is done by an external process, from time to time). The IndexSearcher must be reopened. There are several issues to solve:

- How to test if the index was modified? That's easy: !reader.isCurrent() && !IndexReader.isLocked(directory)
- When to test if the index was modified? I test it in my LuceneFactory.getIndexSearcher() method, but only from time to time, since it would be costly to test on every search.
- The index was modified; how to close the current IndexSearcher? Other processes may still be using it, or using Hits generated by it. This is the hardest part to solve.
    * A reference count to the IndexSearcher must be maintained by the LuceneFactory, to know when all parties have finished searching.
    * To maintain a reference count, these parties must have a way to notify the factory that the search is finished.
    * Also, to maintain the reference count in a thread-safe manner, some locking must be used when getting and releasing the searcher (slow!)
    * How to wait for the reference count to reach 0? On another thread? Polling from time to time (on each LuceneFactory.getIndexSearcher call)?
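
The reference-counting questions in the list above can be sketched in plain Java. This is only an illustration under stated assumptions, not code from this issue or from Lucene: Resource, RefCounted and SearcherFactory are hypothetical names, and Resource stands in for an IndexSearcher. The sketch uses an atomic counter instead of a lock, and it avoids waiting for the count to reach 0 by letting whichever caller releases the last reference perform the close.

```java
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical stand-in for something that must be closed exactly once (e.g. a searcher). */
interface Resource {
    void close();
}

/** Pairs a resource with a reference count and closes it when the count reaches zero. */
final class RefCounted<T extends Resource> {
    private final T resource;
    // Starts at 1: the reference held by the factory itself.
    private final AtomicInteger refs = new AtomicInteger(1);

    RefCounted(T resource) {
        this.resource = resource;
    }

    T get() {
        return resource;
    }

    /** Returns false if the count already reached zero, i.e. the resource is closed. */
    boolean tryIncRef() {
        for (;;) {
            int n = refs.get();
            if (n == 0) {
                return false;
            }
            if (refs.compareAndSet(n, n + 1)) {
                return true;
            }
        }
    }

    /** Drops one reference; the caller that drops the last one closes the resource. */
    void decRef() {
        if (refs.decrementAndGet() == 0) {
            resource.close();
        }
    }
}

/** Hypothetical factory that hands out the current instance and can swap in a new one. */
final class SearcherFactory<T extends Resource> {
    private volatile RefCounted<T> current;

    SearcherFactory(T initial) {
        current = new RefCounted<>(initial);
    }

    /** Every successful acquire() must be matched by exactly one release(). */
    RefCounted<T> acquire() {
        for (;;) {
            RefCounted<T> ref = current;
            if (ref.tryIncRef()) {
                return ref;
            }
            // Lost a race with swap(): 'current' already points at the new instance, so retry.
        }
    }

    void release(RefCounted<T> ref) {
        ref.decRef();
    }

    /** Installs a fresh instance; the old one is closed once its last user releases it. */
    synchronized void swap(T fresh) {
        RefCounted<T> old = current;
        current = new RefCounted<>(fresh);
        old.decRef(); // drop the factory's own reference
    }
}
```

Callers would wrap each search in acquire() and a finally block that calls release(). Results that lazily read from the searcher after release (such as Hits) are not protected by this sketch.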

As you can see, this is not at all trivial to do correctly, and so it should be implemented and shipped out of the box with Lucene. Note that the above description treats Lucene as a black box; it may be easier to implement this inside the IndexSearcher class, by updating its internal structure when the index changes.
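
The "only from time to time" test mentioned in the list above could look roughly like the following. This is a sketch under assumptions: ThrottledCheck is a hypothetical name, the clock and the staleness test are passed in as functions, and in a real factory the staleness test would be whatever index check is in use (for example the isCurrent-style test described above).

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.BooleanSupplier;
import java.util.function.LongSupplier;

/** Runs an expensive staleness check at most once per interval, however often it is asked. */
final class ThrottledCheck {
    private final long intervalNanos;
    private final LongSupplier clock;       // e.g. System::nanoTime
    private final BooleanSupplier isStale;  // the expensive check
    private final AtomicLong nextCheck;

    ThrottledCheck(long intervalNanos, LongSupplier clock, BooleanSupplier isStale) {
        this.intervalNanos = intervalNanos;
        this.clock = clock;
        this.isStale = isStale;
        this.nextCheck = new AtomicLong(clock.getAsLong() + intervalNanos);
    }

    /** Returns true only when a check is due, this caller won the slot, and the check reports a change. */
    boolean shouldReopen() {
        long now = clock.getAsLong();
        long due = nextCheck.get();
        if (now - due < 0) {
            return false; // too soon since the last check
        }
        if (!nextCheck.compareAndSet(due, now + intervalNanos)) {
            return false; // another thread is already performing this round's check
        }
        return isStale.getAsBoolean();
    }
}
```

A factory method such as getIndexSearcher() could call shouldReopen() on every request; most calls then cost one clock read and one atomic read.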

Another way is to maintain one IndexSearcher per thread (with a ThreadLocal). Reopening the IndexSearcher would be easier, but there would be several IndexSearchers in memory...
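
A minimal sketch of the per-thread alternative, again with hypothetical names (PerThread, markChanged) and a generic type standing in for IndexSearcher. It assumes something else notices that the index changed and calls markChanged(); each thread then replaces its own instance the next time it asks for one, which is safe because no other thread can be using that instance.

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.Consumer;
import java.util.function.LongFunction;

/** Keeps one instance per thread and reopens it lazily when the shared generation changes. */
final class PerThread<T> {
    private static final class Slot<V> {
        long generation = -1; // -1 means nothing has been opened on this thread yet
        V value;
    }

    private final AtomicLong generation = new AtomicLong();
    private final LongFunction<T> opener; // opens an instance for the given generation
    private final Consumer<T> closer;     // closes a thread's outdated instance
    private final ThreadLocal<Slot<T>> slots = ThreadLocal.withInitial(() -> new Slot<T>());

    PerThread(LongFunction<T> opener, Consumer<T> closer) {
        this.opener = opener;
        this.closer = closer;
    }

    /** Called by whatever detects that the index was modified. */
    void markChanged() {
        generation.incrementAndGet();
    }

    /** Returns this thread's instance, reopening it first if it is out of date. */
    T get() {
        Slot<T> slot = slots.get();
        long now = generation.get();
        if (slot.generation != now) {
            if (slot.value != null) {
                closer.accept(slot.value); // only this thread ever used it
            }
            slot.value = opener.apply(now);
            slot.generation = now;
        }
        return slot.value;
    }
}
```

The memory cost noted above still applies: there is one open instance per searching thread. The sketch also never closes the instance of a thread that exits.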







[jira] Commented: (LUCENE-874) Automatic reopen of IndexSearcher/IndexReader

Nick Burch (Jira)

    [ https://issues.apache.org/jira/browse/LUCENE-874?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12493884 ]

Karl Wettin commented on LUCENE-874:
------------------------------------

João Fonseca [04/May/07 04:49 AM]
> AutofreshedSearcher seems to be something like what I was proposing, but it
> seems to rely on the NotifiableIndex mechanism. Will this work if the task that
> updates the index is on another process/JVM?

No, it is a single JVM solution.

> The problem is when the index is modified (which, in my case, is done by an
> external process, from time to time). The IndexSearcher must be reopened. There
> are several issues to solve:

I would consider a JMS solution on top of NotifiableIndex (or your own factory).



[jira] Commented: (LUCENE-874) Automatic reopen of IndexSearcher/IndexReader

Nick Burch (Jira)

    [ https://issues.apache.org/jira/browse/LUCENE-874?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12494026 ]

João Fonseca commented on LUCENE-874:
-------------------------------------

> I would consider a JMS solution on top of NotifiableIndex (or your own factory).

JMS seems to be a complicated and heavy solution to a simple and recurring problem for those using Lucene: you want to use a singleton IndexSearcher to improve performance, but when the index changes, you want the IndexSearcher to show the updated information.



[jira] Commented: (LUCENE-874) Automatic reopen of IndexSearcher/IndexReader

Nick Burch (Jira)

    [ https://issues.apache.org/jira/browse/LUCENE-874?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12786686#action_12786686 ]

Mark Miller commented on LUCENE-874:
------------------------------------

Anyone interested in this issue? I think the new ref stuff actually makes this rather easy now ...


[jira] Closed: (LUCENE-874) Automatic reopen of IndexSearcher/IndexReader

Nick Burch (Jira)

     [ https://issues.apache.org/jira/browse/LUCENE-874?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mark Miller closed LUCENE-874.
------------------------------

    Resolution: Won't Fix
