When Index is Updated Frequently

When Index is Updated Frequently

Bing Li
Dear all,

In my experience, when a Lucene index is updated frequently, its
performance must drop. Is that correct?

In my system, most of the data crawled from the Web is indexed once, and the
corresponding index will NOT be updated any more.

However, some indexes need to be updated frequently, like records in a
relational database. These indexes are not as large as those for the
crawled data. The updated indexes will NOT be distributed to many other
nodes; most of the time they are located on a very limited number of machines.

In this case, can I use Lucene indexes? Or do I need to replace them with
relational databases?

Thanks so much!
LB

Re: When Index is Updated Frequently

Michael McCandless-2
On Fri, Mar 4, 2011 at 10:09 AM, Bing Li <[hidden email]> wrote:

> In my experience, when a Lucene index is updated frequently, its
> performance must drop. Is that correct?

In fact Lucene can gracefully handle a high rate of updates with low
latency turnaround on the readers, using the near-real-time (NRT) API
-- IndexWriter.getReader() (or, in the soon-to-be-released 3.1,
IndexReader.open(IndexWriter)).

NRT is really something of a hybrid between "eventual consistency" and
"immediate consistency", because it lets your app have full control
over how quickly changes must be visible by controlling when you
pull a new NRT reader.

That said, Lucene can't offer true immediate consistency at a high
update rate -- the time to open a new NRT reader is usually too costly
to do, eg, for every search.  But eg every 100 msec (say) is
reasonable (depending on many variables...).
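To make the trade-off concrete, here is a minimal sketch of the "reopen on an interval, not on every search" pattern. It is a toy stand-in written against plain Java collections, not the Lucene API: the class ToyNrtIndex, its methods, and the refresh interval parameter are all hypothetical names chosen for illustration, and the clock is passed in explicitly so the behaviour is deterministic.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Toy stand-in for a writer plus near-real-time reader. NOT the Lucene API. */
final class ToyNrtIndex {
    private final List<String> written = new ArrayList<>();   // everything added so far
    private List<String> visible = Collections.emptyList();   // snapshot that searches see
    private final long refreshIntervalMs;                      // hypothetical knob, e.g. 100
    private long lastRefreshMs;

    ToyNrtIndex(long refreshIntervalMs, long nowMs) {
        this.refreshIntervalMs = refreshIntervalMs;
        this.lastRefreshMs = nowMs;
    }

    /** Adding a document does not change what searches currently see. */
    synchronized void add(String doc) {
        written.add(doc);
    }

    /** "Reopens the reader" only if the interval has elapsed; returns true if it did. */
    synchronized boolean maybeRefresh(long nowMs) {
        if (nowMs - lastRefreshMs < refreshIntervalMs) {
            return false;
        }
        visible = Collections.unmodifiableList(new ArrayList<>(written));
        lastRefreshMs = nowMs;
        return true;
    }

    /** Searches run against the last published snapshot only. */
    synchronized List<String> search(String term) {
        List<String> hits = new ArrayList<>();
        for (String doc : visible) {
            if (doc.contains(term)) {
                hits.add(doc);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        ToyNrtIndex index = new ToyNrtIndex(100, 0);
        index.add("lucene near-real-time search");
        System.out.println(index.search("lucene").size()); // 0: not refreshed yet
        index.maybeRefresh(100);
        System.out.println(index.search("lucene").size()); // 1: visible after refresh
    }
}
```

In a real application the timestamp would come from something like System.currentTimeMillis(), and the refresh step would be replaced by pulling a new NRT reader. The point of the sketch is only that the application, not the index, decides how stale a search is allowed to be.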

So... for your app you should run some tests and see.  And please
report back.

(But, unfortunately, NRT hasn't been exposed in Solr yet...).

--
Mike

http://blog.mikemccandless.com

Re: When Index is Updated Frequently

Bing Li
Dear Michael,

Thanks so much for your answer!

I have a question. If Lucene is good at updating, that must put more load on
the Solr cluster. So in my system, I will leave the large amount of crawled
data unchanged forever. Meanwhile, I will use a traditional database to keep
mutable data.

Fortunately, in most Internet systems, the amount of mutable data is much
smaller than the amount of immutable data.

What do you think of my solution?

Best,
LB


Re: When Index is Updated Frequently

gearond
In reply to this post by Michael McCandless-2
Nearly 100ms? If any netizen ever complained about that, I'd 'round-file' the
complaint. Internal to a single process's execution, well, maybe it's an issue.
Not too hard to handle.

Good job to the team that made it!


RE: When Index is Updated Frequently

Jonathan Rochkind
In reply to this post by Bing Li
If you can make that solution work for you, I think it is a wise one which will serve you well. In some cases that solution won't work, because you _need_ the frequently changing data in Solr to be searched against in Solr.  But if you can get away without that, I think you will be well-served by keeping any data that doesn't need to be searched against by Solr in an external non-Solr store. It's really rarely a bad plan to just put in Solr what needs to be searched against in Solr -- whether or not the 'other' stuff changes frequently.

Only you (if anyone!) know enough about your requirements and plans to know how much of a problem it will be to have your 'mutable' data not in Solr, and thus not searchable with Solr.
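
As a rough illustration of that split, the sketch below keeps searchable text in one structure and frequently changing values in another, joined by document id at query time. Both maps are plain in-memory stand-ins (for Solr and for the external database respectively), and every name here, including SplitStoreSketch, indexPage, and setViews, is hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Toy illustration only: maps stand in for the search index and the database. */
final class SplitStoreSketch {
    // Stand-in for the search index: id -> immutable, searchable text.
    private final Map<String, String> searchIndex = new LinkedHashMap<>();
    // Stand-in for the external store: id -> frequently changing value.
    private final Map<String, Integer> mutableStore = new LinkedHashMap<>();

    /** Indexed once; never touched again when the mutable value changes. */
    void indexPage(String id, String text) {
        searchIndex.put(id, text);
    }

    /** Updates go only to the external store, so no reindexing is needed. */
    void setViews(String id, int views) {
        mutableStore.put(id, views);
    }

    /** Search by text, then look up the current mutable value for each hit. */
    List<String> search(String term) {
        List<String> results = new ArrayList<>();
        for (Map.Entry<String, String> entry : searchIndex.entrySet()) {
            if (entry.getValue().contains(term)) {
                int views = mutableStore.getOrDefault(entry.getKey(), 0);
                results.add(entry.getKey() + " views=" + views);
            }
        }
        return results;
    }
}
```

Note the limitation this design carries: the view count can be displayed with each hit, but it cannot be searched or filtered on, because it never enters the index. That is exactly the case where this approach stops working.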

Re: When Index is Updated Frequently

Michael McCandless-2
In reply to this post by Bing Li
On Fri, Mar 4, 2011 at 3:21 PM, Bing Li <[hidden email]> wrote:

> I have a question. If Lucene is good at updating, that must put more load on
> the Solr cluster. So in my system, I will leave the large amount of crawled
> data unchanged forever. Meanwhile, I will use a traditional database to keep
> mutable data.
>
> Fortunately, in most Internet systems, the amount of mutable data is much
> smaller than the amount of immutable data.
>
> What do you think of my solution?

In general this sounds like a fine solution?  But I really don't know
enough specifics here to pass a proper judgement ;)

--
Mike

http://blog.mikemccandless.com

Re: When Index is Updated Frequently

Michael McCandless-2
In reply to this post by gearond
On Fri, Mar 4, 2011 at 3:22 PM, Dennis Gearon <[hidden email]> wrote:

> Nearly 100ms? If any netizen ever complained about that, I'd 'round-file' the
> complaint. Internal to a single process's execution, well, maybe it's an issue.
> Not too hard to handle.

Well there are many caveats, but 100 msec is where (in my testing a
while back...) I was able to get down to and not see much impact on
ingest rate.  But, it's gonna depend heavily on how large the index
is, whether you are updating vs just adding, etc.

We've made a number of improvements to NRT recently that should make
things even faster.

For some reason many apps seem to think they need "immediate
consistency", but you have to pay a big price for that.

> Good job to the team that made it!

Thanks!  (on behalf of all Lucene contributors/devs)

--
Mike

http://blog.mikemccandless.com