Indexing - scheduled batch process or server?


marc_dauncey
Hi everyone,

I'm currently designing a Lucene search system and I'm
considering the indexing side of things.

I just wondered what kind of architecture people have
adopted for indexing. Are cron jobs sufficient for
high-volume, drip-feed indexing, or has anyone
implemented a more sophisticated solution that uses web
services to index on demand?

And has anyone used Quartz to schedule Lucene index
updates? It sounds like an interesting product in this
context.
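For comparison with a cron job, here is a minimal sketch of drip-feed indexing run inside the JVM. It uses the JDK's `ScheduledExecutorService` as a stand-in for cron or Quartz; `DripFeedIndexer`, `enqueue`, `runOnce` and the `indexBatch` callback (where real Lucene update code would go) are hypothetical names, not part of any library.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

// In-process alternative to a cron job: the application "drip feeds" changed
// document IDs into a queue, and a scheduled task indexes them in batches.
class DripFeedIndexer {
    private final BlockingQueue<String> pending = new LinkedBlockingQueue<>();
    private final Consumer<List<String>> indexBatch; // hypothetical hook for the real Lucene update code
    private final int maxBatch;

    DripFeedIndexer(Consumer<List<String>> indexBatch, int maxBatch) {
        this.indexBatch = indexBatch;
        this.maxBatch = maxBatch;
    }

    // Called by the application whenever a record is added or changed.
    void enqueue(String docId) {
        pending.add(docId);
    }

    // One scheduled run: take up to maxBatch queued IDs (oldest first) and index them together.
    int runOnce() {
        List<String> batch = new ArrayList<>();
        pending.drainTo(batch, maxBatch);
        if (!batch.isEmpty()) {
            indexBatch.accept(batch);
        }
        return batch.size();
    }

    // Runs runOnce on one background thread, periodSeconds after the previous run ends.
    ScheduledExecutorService start(long periodSeconds) {
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        scheduler.scheduleWithFixedDelay(() -> {
            try {
                runOnce();
            } catch (RuntimeException e) {
                // An exception escaping the task would silently cancel all later runs.
                e.printStackTrace();
            }
        }, periodSeconds, periodSeconds, TimeUnit.SECONDS);
        return scheduler;
    }
}
```

Batching means a burst of small changes costs one index update per run rather than one per change; the trade-off is that new documents become searchable only after the next run.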

Many thanks


Marc Dauncey



---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]


Re: Indexing - scheduled batch process or server?

Jeremy Hanna-2
I'm pretty new to this, but for my index of a database I'm using
a Quartz scheduler. Also, at the end of each index update, I set my
singleton IndexSearcher to null, so that it gets recreated and
searches use the latest information. I found that trick, including
setting it to null rather than closing it, by searching around on
forums. The reason given for not closing it is to let searches
that are currently using the old IndexSearcher finish with it.
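A minimal sketch of the singleton pattern described above, assuming the singleton is reopened lazily on the next search. The type parameter `T` stands in for Lucene's `IndexSearcher`, and `SearcherSingleton` with its `opener` factory are hypothetical names. Note that nothing in this pattern ever calls `close()` on the replaced searcher.

```java
import java.util.function.Supplier;

// Sketch of the "set the singleton to null after an update" pattern.
// T stands in for IndexSearcher; opener is a hypothetical factory that
// opens a searcher on the current state of the index.
class SearcherSingleton<T> {
    private final Supplier<T> opener;
    private T current; // guarded by this

    SearcherSingleton(Supplier<T> opener) {
        this.opener = opener;
    }

    // Searches call this; after invalidate(), the next call opens a fresh searcher.
    synchronized T get() {
        if (current == null) {
            current = opener.get();
        }
        return current;
    }

    // Called by the indexing job when an update finishes. The old searcher is
    // not closed here, so searches that already hold a reference can finish.
    synchronized void invalidate() {
        current = null;
    }
}
```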
Anyway, I hope this helps.
Jeremy

On Apr 17, 2006, at 2:53 PM, Marc Dauncey wrote:


Re: Indexing - scheduled batch process or server?

marc_dauncey
Thanks for the response, Jeremy.

Quartz seems like a great solution - are you running
it within the app server?

I think the benefit of doing this would be the
convenience of messaging the search server to pick up
fresh indexes. Previously I considered a cron job and
was thinking of making a web services call to achieve
the same thing.
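If the indexing stays in a cron-launched job, the "message the search server" step can be a small HTTP call. This sketch uses the JDK's built-in `com.sun.net.httpserver.HttpServer` and `java.net.http.HttpClient` (Java 11+); the `/reload` path, `ReloadEndpoint` and `ReloadNotifier` are hypothetical, and the handler only bumps a counter where a real server would reopen its searcher.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;
import java.util.concurrent.atomic.AtomicInteger;

// Search-server side: a hypothetical /reload endpoint. Here it only bumps a
// generation counter; a real server would reopen its IndexSearcher at this point.
class ReloadEndpoint {
    final AtomicInteger generation = new AtomicInteger();
    final HttpServer server;

    ReloadEndpoint(int port) throws IOException {
        server = HttpServer.create(new InetSocketAddress("127.0.0.1", port), 0);
        server.createContext("/reload", exchange -> {
            byte[] body = ("reloaded " + generation.incrementAndGet()).getBytes(StandardCharsets.UTF_8);
            exchange.sendResponseHeaders(200, body.length);
            exchange.getResponseBody().write(body);
            exchange.close();
        });
        server.start();
    }

    int port() {
        return server.getAddress().getPort(); // the actual bound port, even if 0 was requested
    }
}

// Indexer side: what a cron-launched batch job could call once the new index is built.
class ReloadNotifier {
    static String notifySearchServer(String baseUrl) throws IOException, InterruptedException {
        HttpClient client = HttpClient.newBuilder()
                .version(HttpClient.Version.HTTP_1_1)
                .build();
        HttpRequest request = HttpRequest.newBuilder(URI.create(baseUrl + "/reload"))
                .POST(HttpRequest.BodyPublishers.noBody())
                .build();
        return client.send(request, HttpResponse.BodyHandlers.ofString()).body();
    }
}
```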

The only thing that concerns me (and this is maybe a
question for the Quartz mailing list rather than this
one) is the issue of spawning user threads. That kind
of thing makes me nervous in an app server context,
but lots of people use Quartz for J2EE scheduling, so
it is presumably fairly stable.
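On the thread-spawning concern, one option (an assumption of this sketch, not something Quartz requires) is for the web application to own a single, named daemon thread for indexing and shut it down itself when it is undeployed, for example from a `ServletContextListener.contextDestroyed` callback. `IndexingThread` and the thread name `lucene-indexer` are hypothetical.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Sketch: all background indexing runs on one named daemon thread that the
// application creates, owns, and shuts down explicitly.
class IndexingThread {
    private final ExecutorService executor = Executors.newSingleThreadExecutor(r -> {
        Thread t = new Thread(r, "lucene-indexer"); // easy to spot in a thread dump
        t.setDaemon(true); // never keeps the JVM alive on its own
        return t;
    });

    // Jobs run one at a time, in submission order.
    void submit(Runnable indexJob) {
        executor.execute(indexJob);
    }

    // Call on undeploy: stop accepting jobs and wait briefly for the current one.
    boolean shutdown(long timeoutSeconds) throws InterruptedException {
        executor.shutdown();
        return executor.awaitTermination(timeoutSeconds, TimeUnit.SECONDS);
    }
}
```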

What was your experience of it?

Many thanks

Marc


--- Jeremy Hanna <[hidden email]> wrote:


Re: Indexing - scheduled batch process or server?

Yonik Seeley
In reply to this post by marc_dauncey
On 4/17/06, Marc Dauncey <[hidden email]> wrote:
> or has anyone
> implemented a more sophisticated solution with web
> services to index on demand?

In Solr, documents (XML versions of Lucene Documents) are POSTed to the server.
There are explicit <commit/> commands that cause a new IndexReader to
be opened and warmed in the background.
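The background open-and-warm behaviour described above can be sketched generically as "build the replacement off the request path, warm it, then swap it in atomically". This is not Solr's code: `T` stands in for an `IndexReader` or `IndexSearcher`, and `WarmingSwapper`, `opener` and `warmer` (which might run a few typical queries to fill caches) are hypothetical.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.Consumer;
import java.util.function.Supplier;

// Sketch of "open and warm a new reader in the background, then swap".
class WarmingSwapper<T> {
    private final Supplier<T> opener;   // hypothetical: opens a reader on the latest index
    private final Consumer<T> warmer;   // hypothetical: runs warm-up queries against it
    private final AtomicReference<T> current;
    private final ExecutorService background = Executors.newSingleThreadExecutor();

    WarmingSwapper(Supplier<T> opener, Consumer<T> warmer) {
        this.opener = opener;
        this.warmer = warmer;
        T first = opener.get();
        warmer.accept(first);
        this.current = new AtomicReference<>(first);
    }

    // Queries use whatever is current; they never see an unwarmed instance.
    T current() {
        return current.get();
    }

    // Triggered by a commit: open and warm the replacement on the background
    // thread, then publish it atomically. Completes with the instance it replaced.
    CompletableFuture<T> onCommit() {
        return CompletableFuture.supplyAsync(() -> {
            T fresh = opener.get();
            warmer.accept(fresh);
            return current.getAndSet(fresh);
        }, background);
    }

    void shutdown() {
        background.shutdown();
    }
}
```

Returning the replaced instance leaves the caller to decide when it is safe to close it, which is the question that setting the old searcher to null, rather than closing it, sidesteps.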

-Yonik
http://incubator.apache.org/solr Solr, The Open Source Lucene Search Server


Re: Indexing - scheduled batch process or server?

Jeremy Hanna-2
In reply to this post by marc_dauncey
Marc,

I am using it within the web app. I use Spring, and there are ways to
throttle a call down to one thread with Spring if you're worried
about overloading the server when you update the index. I'm not sure
about Quartz's ability to set a priority, limit the number of
threads, use a thread pool, or do load balancing. I did a
quick search on the Quartz user forum and found a lot of discussion
of Java threads, though, so that might be promising
(http://forums.opensymphony.com/search.jspa?objID=f6&q=thread).
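The "throttle down to one thread" idea can also be expressed independently of Spring or Quartz: guard the update so that a trigger arriving while an update is already running is skipped. `SingleFlightIndexJob` is a hypothetical name, and skipping the extra trigger (rather than queueing it) is a design choice of this sketch.

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Sketch: at most one index update runs at a time, whatever fires it
// (a scheduler tick, a web-service call, ...).
class SingleFlightIndexJob {
    private final AtomicBoolean running = new AtomicBoolean(false);
    private final Runnable update; // hypothetical: the actual index update code

    SingleFlightIndexJob(Runnable update) {
        this.update = update;
    }

    // Returns true if this call performed the update, false if it was skipped.
    boolean trigger() {
        if (!running.compareAndSet(false, true)) {
            return false; // another caller is already updating the index
        }
        try {
            update.run();
            return true;
        } finally {
            running.set(false); // release even if the update throws
        }
    }
}
```

If Quartz itself runs the job, the size of its worker pool is set by the `org.quartz.threadPool.threadCount` property.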

Anyway, Quartz has seemed to work for what I'm doing - I inherited
it from the previous developer, and it has had a good record of
reliability for our stuff.

Jeremy

On Apr 18, 2006, at 7:38 AM, Marc Dauncey wrote:
