[lucy-user] Performance issue


[lucy-user] Performance issue

Gerald Richter
Hi,

I am using Lucy to index documents that are stored in an Apache CouchDB database that changes very often.

There is one process that runs the updates to Lucy and several processes that query the index.

I have two questions:

1.) As far as I can see, I have to commit and recreate the Indexer every time I make changes; otherwise the changes are not seen by the other processes (or even by the process itself). On the other hand, I have to destroy and recreate the IndexSearcher to see the new documents in the index.

While searching itself takes only 10-30 ms, the destroy/commit/recreate cycle takes up to 400 ms. This makes things slow.

Is there a different way to make changes visible?

2.) From time to time I have to restart the process that heavily uses the IndexSearcher. Searching gets very slow (up to 10-60 seconds instead of milliseconds). Simply restarting the process fixes this, so it's not an issue with how the index is organized on disk. Any idea how to track this down?

Thanks & Regards

Gerald




Re: [lucy-user] Performance issue

Nick Wellnhofer
On 05/08/2015 06:36, [hidden email] wrote:
> I have two questions:
>
> 1.) As far as I can see, I have to commit and recreate the Indexer every time I make changes; otherwise the changes are not seen by the other processes (or even by the process itself). On the other hand, I have to destroy and recreate the IndexSearcher to see the new documents in the index.

It's enough to call "commit" on an Indexer to make changes visible. You
shouldn't create a new Indexer immediately after committing, because every new
Indexer holds a write lock on the index until it's committed. Just create an
Indexer before adding documents, then call commit and destroy it.
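A minimal sketch of that lifecycle in Perl might look like this (the index
path and field names are placeholders, and this assumes the schema already
exists):

```perl
use Lucy::Index::Indexer;

# Open an Indexer only when there is work to do: it acquires the
# write lock on creation and releases it once committed/destroyed.
my $indexer = Lucy::Index::Indexer->new( index => '/path/to/index' );

# Each doc is a hashref of field => value.
$indexer->add_doc($_) for @changed_docs;

$indexer->commit;   # makes the changes visible to newly opened searchers
undef $indexer;     # destroy it; don't keep an idle Indexer holding the lock
```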

Since you mention "destroy" explicitly, are you using the C bindings or the
Perl bindings?

> While searching itself takes only 10-30 ms, the destroy/commit/recreate cycle takes up to 400 ms. This makes things slow.

How many documents do you add in an indexing run? Adding only a few small
documents should typically take less than 400 ms, but it can sometimes take
longer if larger segments have to be merged. See the FastUpdates guide in the
Lucy cookbook for how to make updates consistently fast:

     https://metacpan.org/pod/Lucy::Docs::Cookbook::FastUpdates
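If segment merging is what makes some commits slow, the core of that recipe
is to hand large merges to a Lucy::Index::BackgroundMerger running in a
separate process, so the foreground Indexer's commits stay small. Roughly
(the index path is a placeholder; the cookbook pairs this with a custom
IndexManager, so see it for the full code):

```perl
use Lucy::Index::BackgroundMerger;

# Run periodically in its own process: it consolidates segments
# without blocking the foreground Indexer's fast commits.
my $bg_merger = Lucy::Index::BackgroundMerger->new(
    index => '/path/to/index',
);
$bg_merger->commit;
```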

> 2.) From time to time I have to restart the process that heavily uses the IndexSearcher. Searching gets very slow (up to 10-60 seconds instead of milliseconds). Simply restarting the process fixes this, so it's not an issue with how the index is organized on disk. Any idea how to track this down?

First, I'd try to find out which call into Lucy takes so long, whether the
process is consuming CPU the whole time, and what the overall memory behavior
of the process looks like. If the process hangs for multiple seconds, you
could also attach a debugger to the running process and see where it hangs.

Does this process only use IndexSearcher or does it also use Indexer? If
there's an uncommitted Indexer, it might be a locking issue. But you'd
probably get a lock timeout error in this case.

Nick


Re: [lucy-user] Performance issue

Gerald Richter
Hi Nick,

thanks for your feedback.

> > I have two questions:
> >
> > 1.) As far as I can see, I have to commit and recreate the Indexer every
> > time I make changes; otherwise the changes are not seen by the other
> > processes (or even by the process itself). On the other hand, I have to
> > destroy and recreate the IndexSearcher to see the new documents in the
> > index.
>
> It's enough to call "commit" on an Indexer to make changes visible. You
> shouldn't create a new Indexer immediately after committing, because every
> new Indexer holds a write lock on the index until it's committed. Just
> create an Indexer before adding documents, then call commit and destroy it.

Yes, I didn't go into much detail in my first mail, but this is exactly how I handle it.

Since there might be a lot of changes, the question is: does the write lock have any impact on searching the index?

>
> Since you mention "destroy" explicitly, are you using the C bindings or the
> Perl bindings?

I use Perl.

>
> > While searching itself takes only 10-30 ms, the destroy/commit/recreate
> > cycle takes up to 400 ms. This makes things slow.
>
> How many documents do you add in an indexing run?

About 1-8.

> Adding only a few
> small documents should typically be faster than 400ms, but sometimes, it can
> take longer if some larger segments have to be merged.

At the moment it consistently takes about 800 ms...

> See the
> FastUpdates guide in the Lucy cookbook for how to make updates
> consistently fast:
>
>      https://metacpan.org/pod/Lucy::Docs::Cookbook::FastUpdates

I have tried this, but it made things behave very badly. Instead of speeding things up, both indexing and searching got very slow.

>
> > 2.) From time to time I have to restart the process that heavily uses the
> > IndexSearcher. Searching gets very slow (up to 10-60 seconds instead of
> > milliseconds). Simply restarting the process fixes this, so it's not an
> > issue with how the index is organized on disk. Any idea how to track this
> > down?
>
> First, I'd try to find out which call into Lucy takes so long,

As far as I can see, it is the loop that reads the results with $hits->next.

> whether the process is consuming CPU the whole time, and what the overall
> memory behavior of the process looks like. If the process hangs for
> multiple seconds, you could also attach a debugger to the running process
> and see where it hangs.
>

It's a production system (it doesn't happen in the test system), and I do a restart every night so users don't run into this problem, but from time to time it still happens.

When I see it the next time, I will try to investigate more deeply what's going on. Is there some kind of logging I can turn on to see what Lucy is doing in such a case?

> Does this process only use IndexSearcher or does it also use Indexer? If
> there's an uncommitted Indexer, it might be a locking issue. But you'd
> probably get a lock timeout error in this case.
>

I have one process that is only using Indexer and multiple other processes that are only using IndexSearcher.

Just to summarize, there are two (different) issues:

1.) The normal behavior: there are many changes to the index, and after every few changes I want these changes to become visible in other processes. At the moment I commit in the process that runs the Indexer, and need to destroy and recreate the IndexSearcher to see the changes. Even under good conditions this whole cycle takes 200-400 ms, which decreases the performance of the whole application. So the question is: is there some way to make the changes visible in other processes faster?

2.) The second issue is that, from time to time, the search time increases drastically, going up to several seconds or more.

Regards

Gerald

P.S. The system is a VM with 16 GB RAM and 6 cores, running on an SSD, and it is 98% idle most of the time, so it should not be a performance issue of the host itself.


Re: [lucy-user] Performance issue

Marvin Humphrey
On Sat, Aug 8, 2015 at 7:48 AM,  <[hidden email]> wrote:

>> First, I'd try to find out which call into Lucy takes so long,
>
> As far as I can see, it is the loop that reads the results with $hits->next.

All that Hits#next does is read a document from disk.  At that point, the
search has been completed and we have a list of document IDs; each call to
next() takes one of those IDs, finds the location of the data and reads from
mmap'd memory to deserialize it into a hash.

Performance of next() scales with the size of the document fetched, not with
the size of the index (except insofar as index size affects OS cache
performance) or the complexity of the search query.  Text data in the document
store is not compressed.

How large are these documents?  If they are small, yet each iteration of the
$hits->next loop is unacceptably slow, it is hard to understand how the code
Lucy is executing could be the source of the slowdown.
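One way to confirm that, and to see whether the slowness is in next() itself,
is to time each iteration, e.g. with Time::HiRes. A sketch (index path, query
string, and the 'title' field are placeholders):

```perl
use Time::HiRes qw(time);
use Lucy::Search::IndexSearcher;

my $searcher = Lucy::Search::IndexSearcher->new( index => '/path/to/index' );
my $hits     = $searcher->hits( query => 'foo', num_wanted => 100 );

while (1) {
    my $t0  = time;
    my $hit = $hits->next or last;
    # Each next() should be sub-millisecond for small documents;
    # log outliers to correlate with document size.
    printf "next() took %.1f ms for doc '%s'\n",
        ( time - $t0 ) * 1000, $hit->{title};
}
```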

> It's a production system (it doesn't happen in the test system)

!!!

> 2.) The second issue is that, from time to time, the search time increases
>     drastically, going up to several seconds or more.

You had mentioned before that restarting the process fixes this issue.  How
much memory is the process consuming when you restart it?

> P.S. The system is a VM with 16 GB RAM and 6 cores, running on an SSD, and
>      it is 98% idle most of the time, so it should not be a performance
>      issue of the host itself.

What do graphs of memory and IO on this box look like?

Marvin Humphrey