[lucy-user] Large index sizes

[lucy-user] Large index sizes

Edwin Crockford
I have recently started to use Lucy (with Perl) and everything went
well until I tried to index a large file store (>300,000 files). The
indexer process grew to more than 8 GB and the machine ran out of resources.
My questions are:

a) Is this a normal resource requirement?

b) Is there a way to avoid swamping the machine?

I also found that the searcher becomes very large for large indexes, and
as ours runs as part of a FastCGI process it exceeded the process's
ulimit. Raising the ulimit fixed this, but diagnosing the issue was
difficult because the query would simply return 0 results rather than
indicating that it had run out of process space.
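
For illustration only, here is a minimal sketch of committing in batches
to keep the indexer's memory bounded; the index path, schema, field names
and batch size are placeholders, not our actual setup:

    #!/usr/bin/perl
    use strict;
    use warnings;
    use Lucy::Plan::Schema;
    use Lucy::Plan::FullTextType;
    use Lucy::Analysis::EasyAnalyzer;
    use Lucy::Index::Indexer;

    # Illustrative schema; field names are placeholders.
    my $schema = Lucy::Plan::Schema->new;
    my $type   = Lucy::Plan::FullTextType->new(
        analyzer => Lucy::Analysis::EasyAnalyzer->new( language => 'en' ),
    );
    $schema->spec_field( name => 'path',    type => $type );
    $schema->spec_field( name => 'content', type => $type );

    my @files      = @ARGV;      # files to index
    my $batch_size = 10_000;     # tune to available RAM

    while ( my @batch = splice @files, 0, $batch_size ) {
        # A fresh Indexer per batch; commit() flushes a segment to disk,
        # so memory use stays roughly proportional to the batch size.
        my $indexer = Lucy::Index::Indexer->new(
            index  => '/path/to/index',
            schema => $schema,
            create => 1,
        );
        for my $file (@batch) {
            open my $fh, '<', $file or next;
            my $content = do { local $/; <$fh> };
            $indexer->add_doc( { path => $file, content => $content } );
        }
        $indexer->commit;
    }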

Many thanks

Edwin Crockford

Re: [lucy-user] Large index sizes

Bob Bruen

Hi,

I have indexed millions of files, ending up with a 127 GB index, which
works fine. My machine has enough resources for this.

I also tried to do the same with tens of millions of files, but the
indexing process could never finish, even with enough resources (index
~400 GB). It kept updating one file a tiny bit every few minutes. I think
I could do a better job in the code, but I have not been able to get back
to it yet.

             -bob


On Thu, 25 Apr 2013, Edwin Crockford wrote:

> I have recently started to use Lucy (with Perl) and everything went well
> until I tried to index a large file store (>300,000 files). The indexer
> process grew to more than 8 GB and the machine ran out of resources. My
> questions are:
>
> a) Is this a normal resource requirement?
>
> b) Is there a way to avoid swamping the machine?
>
> I also found that the searcher becomes very large for large indexes, and
> as ours runs as part of a FastCGI process it exceeded the process's
> ulimit. Raising the ulimit fixed this, but diagnosing the issue was
> difficult because the query would simply return 0 results rather than
> indicating that it had run out of process space.
>
> Many thanks
>
> Edwin Crockford
>

--
Dr. Robert Bruen
Cold Rain Labs
http://coldrain.net/bruen
+1.802.579.6288

Re: [lucy-user] Large index sizes

Edwin Crockford
Hi Bob,

Many thanks for the quick reply; it looks like we will have to beef up
the machine a bit. Currently the largest index we have successfully
built is 2 GB, so still a long way below your figures. I notice there is
a feature to search multiple indexes simultaneously
(Lucy::Search::PolySearcher). Is this a possible way around our resource
issue, i.e. split the index into smaller ones and then do a poly-search
across them all, or is there a noticeable performance hit?

Regards
Edwin

On 25/04/2013 13:16, Bob Bruen wrote:

>
> Hi,
>
> I have indexed millions of files, ending up with a 127 GB index,
> which works fine. My machine has enough resources for this.
>
> I also tried to do the same with tens of millions of files, but the
> indexing process could never finish, even with enough resources (index
> ~400 GB). It kept updating one file a tiny bit every few minutes. I
> think I could do a better job in the code, but I have not been able to
> get back to it yet.
>
>             -bob
>
>
> On Thu, 25 Apr 2013, Edwin Crockford wrote:
>
>> I have recently started to use Lucy (with Perl) and everything
>> went well until I tried to index a large file store (>300,000 files).
>> The indexer process grew to more than 8 GB and the machine ran out of
>> resources. My questions are:
>>
>> a) Is this a normal resource requirement?
>>
>> b) Is there a way to avoid swamping the machine?
>>
>> I also found that the searcher becomes very large for large indexes,
>> and as ours runs as part of a FastCGI process it exceeded the
>> process's ulimit. Raising the ulimit fixed this, but diagnosing the
>> issue was difficult because the query would simply return 0 results
>> rather than indicating that it had run out of process space.
>>
>> Many thanks
>>
>> Edwin Crockford
>>
>

Re: [lucy-user] Large index sizes

Thomas den Braber
Edwin,

I have a setup with two indexes of 1.2 million files each (small files). I have
joined them together in a union with the PolySearcher; there is no noticeable
speed difference compared with searching a single index.
This way you can join one static index and one dynamic one to get the best of
both update time and search time.
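
A minimal sketch of that kind of setup, assuming both indexes were built
with the same schema (the index paths and the 'path' field below are
placeholders):

    use strict;
    use warnings;
    use Lucy::Search::IndexSearcher;
    use Lucy::Search::PolySearcher;

    # One large, rarely rebuilt index plus one small, frequently updated one.
    my $static  = Lucy::Search::IndexSearcher->new( index => '/path/to/static_index' );
    my $dynamic = Lucy::Search::IndexSearcher->new( index => '/path/to/dynamic_index' );

    my $poly = Lucy::Search::PolySearcher->new(
        schema    => $static->get_schema,
        searchers => [ $static, $dynamic ],
    );

    # Query both indexes as if they were one.
    my $hits = $poly->hits( query => 'example query', num_wanted => 10 );
    while ( my $hit = $hits->next ) {
        print "$hit->{path}\n";
    }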

//Thomas