Running Lucene as a stateless session bean

Walker, Keith 1
I'm using an EJB to process documents using Lucene 1.3.  Things are
working fine now, but I  wanted to double check that this will work with
multiple instances of the EJB.  I know this is not conforming to the EJB
spec concerning file I/O, but ignoring that for now, my question is
about thread safety.  From the FAQ I see that IndexWriter and
IndexSearcher are thread safe, but QueryParser is not, so I'll have to
change that to a singleton.

My use of Lucene is not the typical scenario:  A document is converted
from its original format (e.g. PDF) using the GATE framework, then the
index is created, a query is parsed, the query is run, and the index is
deleted.  So each call to the EJB acts only on the objects/index created
during that call.  Here are the core steps:

// GATE call
Factory.createDataStore("gate.persist.SerialDataStore", datastoreURLStr);
// GATE call that uses Lucene under the hood
indexedCorpus.getIndexManager().createIndex();
IndexSearcher search = new IndexSearcher(
        this.indexedCorpus.getIndexDefinition().getIndexLocation());
luceneQuery = QueryParser.parse(theQuery, "body", new SimpleAnalyzer());
Hits hits = search.search(luceneQuery);
Explanation ex = search.explain(luceneQuery, hits.id(0));
deleteDirectory(this.indexSubDir);
// GATE call
Factory.deleteResource(this.indexedCorpus);
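
The "create, use, delete within one call" lifecycle described above could be sketched roughly as follows. This is only an illustration under stated assumptions: PerCallIndexDir, withScratchDir and this deleteDirectory are hypothetical helpers (not part of Lucene or GATE), and the sketch uses java.nio.file from modern Java rather than anything available to Lucene 1.3 era code. The idea is that each call gets a uniquely named directory and removes it in a finally block, so concurrent calls never share index files.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.function.Function;
import java.util.stream.Stream;

// Hypothetical helper for illustration; not part of Lucene or GATE.
final class PerCallIndexDir {

    /** Creates a unique directory, runs the work against it, and always deletes it afterwards. */
    static <T> T withScratchDir(String prefix, Function<Path, T> work) {
        Path dir;
        try {
            dir = Files.createTempDirectory(prefix); // unique name per call
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        try {
            return work.apply(dir);
        } finally {
            deleteDirectory(dir); // runs even if the work throws
        }
    }

    /** Recursively deletes a directory, visiting children before their parents. */
    static void deleteDirectory(Path root) {
        if (!Files.exists(root)) {
            return;
        }
        try (Stream<Path> paths = Files.walk(root)) {
            paths.sorted(Comparator.reverseOrder()).forEach(p -> {
                try {
                    Files.delete(p);
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
            });
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String result = withScratchDir("index-", dir -> {
            // Real code would build and search the index under dir here.
            return "used " + dir.getFileName();
        });
        System.out.println(result);
    }
}
```

Whether a unique directory per call is practical inside an EJB container depends on the container's file I/O restrictions, which the post deliberately sets aside.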


Thanks,
Keith
Re: Running Lucene as a stateless session bean

mark harwood
Be careful with your use of GATE and multiple threads.
I recently had some trouble with their Factory.delete.. methods which
ended up requiring a change to the core and this was applied to the 4.0
trunk. A 3.1 patch has not been released so you'll need to be using the
latest from SVN (now requires Java 1.5). GATE applications/Controller
pipelines are still not thread-safe and are likely to remain that way.
However, it looks to me like you're only using it for its document
parser framework here, so I'd recommend looking at Lius or the Lucene in
Action parser framework.

Cheers,
Mark



---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

Re: Running Lucene as a stateless session bean

Doron Cohen
In reply to this post by Walker, Keith 1
Is that perhaps Lucene 1.4.3?
(The current release is 2.1.0. I am not aware of a 1.3 release; a
version that old is not even in the release archives.)

The static parse() was deprecated in 1.9 and removed
in 2.0, so it must be Lucene 1.9 or older.

Anyhow, at least from Lucene's point of view (I am not
familiar with GATE), instantiating a new QueryParser
object on each call and using it should cause no
thread-safety problems.
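
A minimal sketch of the "new parser per call" pattern follows. To keep it runnable without the Lucene jar, StatefulParser is a hypothetical stand-in that merely mimics a parser holding mutable per-instance state; it is not Lucene's QueryParser, and the "body:" prefixing is invented for illustration. With Lucene itself, the equivalent would presumably be to construct a QueryParser instance inside the bean method and call its instance parse method, instead of sharing one parser across threads.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical stand-in for a parser with mutable state; NOT Lucene's QueryParser.
final class StatefulParser {
    private final StringBuilder buffer = new StringBuilder(); // state reused between calls

    String parse(String query) {
        buffer.setLength(0);
        for (String term : query.trim().split("\\s+")) {
            if (buffer.length() > 0) {
                buffer.append(' ');
            }
            buffer.append("body:").append(term.toLowerCase(Locale.ROOT));
        }
        return buffer.toString();
    }
}

final class PerCallParserDemo {

    /** Each call builds its own parser, so no parser state is shared between threads. */
    static String parseQuery(String query) {
        return new StatefulParser().parse(query);
    }

    /** Parses the same query from several threads and collects every result. */
    static List<String> parseConcurrently(String query, int threads, int calls) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            Callable<String> task = () -> parseQuery(query);
            List<Future<String>> futures = new ArrayList<>();
            for (int i = 0; i < calls; i++) {
                futures.add(pool.submit(task));
            }
            List<String> results = new ArrayList<>();
            for (Future<String> f : futures) {
                results.add(f.get());
            }
            return results;
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(parseConcurrently("Stateless Session Bean", 4, 8));
    }
}
```

The cost of this design is one short-lived object per call, which is usually negligible next to building and searching an index.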

Regards,
Doron


"Walker, Keith 1" <[hidden email]> wrote on 20/02/2007 11:22:33:

> I'm using an EJB to process documents using Lucene 1.3.  Things are
> working fine now, but I  wanted to double check that this will work with
> multiple instances of the EJB.  I know this is not conforming to the EJB
> spec concerning file I/O, but ignoring that for now, my question is
> about thread safety.  From the FAQ I see that IndexWriter and
> IndexSearcher are thread safe, but QueryParser is not, so I'll have to
> change that to a singleton.

I think a singleton would not be a good idea, thread-safety wise: a
single shared QueryParser instance is exactly the case that is not
thread safe.




RE: Running Lucene as a stateless session bean

Walker, Keith 1
Thanks for the suggestions.

I'm using the Lucene packaged with GATE, which is lucene-1.3-final.jar
(ancient, I suppose).

I am now seeing the threading problems with GATE. I was hoping to stay
with GATE in case we need some of its capabilities, but with the
current design we could go with something like Lius.

Regards,
Keith
