Lucene Performance and usage alternatives


ezer
I just wrote a program using the Lucene Java API. It is working fine for my current index size, but I am worried about performance with a bigger index and simultaneous user access.

1) I am worried about having to write the program in Java. I looked for alternatives such as the C port, but the version it is based on is a little old and not many people seem to use it.

2) I am also thinking of compiling the code with gcj to generate native code instead of using the JVM. Has anybody tried that? Could it bring performance close to that of a C program?

3) I won't use an application server; I will call the program directly from a PHP page. Is there a suggested architecture for doing that, given that many users will access the program? Won't starting one instance and opening the index each time someone runs a query degrade performance?

Re: Lucene Performance and usage alternatives

Grant Ingersoll-2
Before we go solving a problem that isn't necessarily there, can you  
share a bit about what sizes you are at currently?  Num docs, index  
size, query rate?

Have you looked at http://wiki.apache.org/lucene-java/BasicsOfPerformance ?

-Grant

On Aug 5, 2008, at 10:21 AM, ezer wrote:

> 3) I won't use an application server; I will call the program directly from
> a PHP page. Is there a suggested architecture for doing that, given that
> many users will access the program? Won't starting one instance and opening
> the index each time someone runs a query degrade performance?

You shouldn't be instantiating a Reader/Searcher for each query.  See  
the link above.
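
As a rough illustration of the advice above, here is a minimal sketch of opening an expensive searcher once and sharing it between concurrent queries. FakeSearcher is a hypothetical stand-in, not a Lucene class; it only counts how many times it gets constructed. The sketch also assumes the real searcher is safe for concurrent read-only use.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Holds one searcher for the lifetime of the process.
final class SharedSearcher {
    // Initialization-on-demand holder: the JVM initializes INSTANCE
    // exactly once, in a thread-safe way, on first access.
    private static final class Holder {
        static final FakeSearcher INSTANCE = new FakeSearcher();
    }

    static FakeSearcher get() {
        return Holder.INSTANCE;
    }

    public static void main(String[] args) throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(8);
        for (int i = 0; i < 100; i++) {
            final int n = i;
            // Every simulated user query reuses the same searcher.
            pool.execute(() -> {
                SharedSearcher.get().search("query " + n);
            });
        }
        pool.shutdown();
        pool.awaitTermination(10, TimeUnit.SECONDS);
        System.out.println("searcher opened "
                + FakeSearcher.OPEN_COUNT.get() + " time(s)");
    }
}

// Hypothetical stand-in for a searcher that is expensive to open.
final class FakeSearcher {
    static final AtomicInteger OPEN_COUNT = new AtomicInteger();

    FakeSearcher() {
        // Real code would open the index files here (the costly step).
        OPEN_COUNT.incrementAndGet();
    }

    // Read-only lookup; holds no mutable state, so many threads may call it.
    String search(String query) {
        return "results for " + query;
    }
}
```

However many queries run, the constructor is called only once, so the cost of opening is paid a single time per process rather than per query.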


Re: Lucene Performance and usage alternatives

ezer
Yes, I saw that. It talks about performance, but not about the alternatives I mentioned before.
I tested indexing a database of about 200,000 records. As I mentioned, it works fine, with response times of less than a second. But this database can grow to millions of records, and I am not sure I am choosing the best architecture for that stage to allow simultaneous access.

Thanks for the help


Re: Lucene Performance and usage alternatives

Grant Ingersoll-2
My point is more that you don't necessarily need to go looking for
variants.  I've seen Lucene Java scale to millions, no problem.  I
talked w/ a guy using Solr this past week who had ~80 million records
in a single 80 GB index on one machine.

If I had a PHP front end, I would most likely start with Solr and its
PHP client.  No sense in reinventing the wheel, IMO.
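
To give a feel for what a client talks to in that setup: a Solr search is an HTTP GET against its select handler, using the q, start and rows parameters. The sketch below only builds such a URL and sends nothing. The base URL (host, port 8983, the /solr path) is an assumption about a default local install, and the class name is made up for illustration.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

final class SolrQueryUrl {
    // Builds a GET URL for Solr's select handler.
    // baseUrl is an assumption, e.g. "http://localhost:8983/solr".
    static String build(String baseUrl, String query, int start, int rows) {
        try {
            return baseUrl + "/select"
                    + "?q=" + URLEncoder.encode(query, "UTF-8") // escape ':' and spaces
                    + "&start=" + start   // offset of the first hit to return
                    + "&rows=" + rows;    // number of hits to return
        } catch (UnsupportedEncodingException e) {
            // Every JVM is required to support UTF-8.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(
                build("http://localhost:8983/solr", "title:lucene performance", 0, 10));
    }
}
```

Any language that can issue an HTTP request can do the same, which is what makes Solr a convenient fit for a PHP front end.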


Re: Lucene Performance and usage alternatives

ezer
In reply to this post by ezer
Grant, what other information can I provide to clarify my questions?



Re: Lucene Performance and usage alternatives

Stefan Groschupf
In reply to this post by ezer
One alternative is always to distribute the index across a set of servers.
If you need to scale, I guess this is the only long-term option.
You can build your own home-grown Lucene distribution or look into an
existing one.
I'm currently working on Katta (http://katta.wiki.sourceforge.net/);
there is no release yet, but we are in the QA and test cycles.
There are others as well; Solr, for example, also provides
distribution.
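
As a rough sketch of what a home-grown distribution involves: route each document to one shard at indexing time, send every query to all shards, and merge the per-shard hits into one ranked list. The Shard interface, the Hit class and hash-based routing are all assumptions made for illustration; this is not how Katta or Solr implement it.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

final class ShardedSearch {
    // One scored hit returned by a shard.
    static final class Hit {
        final String docId;
        final double score;
        Hit(String docId, double score) {
            this.docId = docId;
            this.score = score;
        }
    }

    // Hypothetical shard: in a real deployment, a remote server
    // holding one slice of the index.
    interface Shard {
        List<Hit> search(String query, int topK);
    }

    // Picks the shard that stores a document, based on its id.
    static int shardFor(String docId, int shardCount) {
        return Math.floorMod(docId.hashCode(), shardCount);
    }

    // Asks every shard for its best topK hits, then keeps the global topK.
    static List<Hit> search(List<Shard> shards, String query, int topK) {
        List<Hit> all = new ArrayList<>();
        for (Shard shard : shards) {
            all.addAll(shard.search(query, topK));
        }
        // Highest score first.
        all.sort(Comparator.comparingDouble((Hit h) -> h.score).reversed());
        return new ArrayList<>(all.subList(0, Math.min(topK, all.size())));
    }

    public static void main(String[] args) {
        // Two in-memory shards with canned results, for demonstration only.
        Shard s1 = (q, k) -> Arrays.asList(new Hit("a", 0.9), new Hit("b", 0.2));
        Shard s2 = (q, k) -> Arrays.asList(new Hit("c", 0.7), new Hit("d", 0.4));
        for (Hit h : search(Arrays.asList(s1, s2), "lucene", 3)) {
            System.out.println(h.docId + " " + h.score);
        }
    }
}
```

Merging by raw score assumes the shards score documents comparably, which may not hold when each shard computes its own term statistics. A real system would also query the shards in parallel rather than one after another.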

Stefan



~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
101tec Inc.
Menlo Park, California, USA
http://www.101tec.com



Re: Lucene Performance and usage alternatives

ezer
In reply to this post by Grant Ingersoll-2
Thanks Stefan and Grant.
Yes, Solr seems very interesting; I tried it once, and I am now looking at the PHP client you mentioned.
What happens if, rather than starting a server that opens a port to listen for requests, I call the program from PHP every time I need to search, using for example exec(theSearchingProgram...., $arrayResult)? For now that is the solution I am testing, but I am not sure it is an elegant way to do this. I would like to know the pros and cons of each solution; at first glance, I think that opening a port raises a security issue.



Re: Lucene Performance and usage alternatives

Grant Ingersoll-2

On Aug 5, 2008, at 2:29 PM, ezer wrote:

>
> Thanks Stefan and Grant.
> Yes, Solr seems very interesting; I tried it once, and I am now looking at
> the PHP client you mentioned.
> What happens if, rather than starting a server that opens a port to listen
> for requests, I call the program from PHP every time I need to search, using
> for example exec(theSearchingProgram...., $arrayResult)?

That won't perform.  The main cost of searching is loading up the  
index and you would have to do that every time.

> For now that is the solution I am testing, but I am not sure it is an
> elegant way to do this. I would like to know the pros and cons of each
> solution; at first glance, I think that opening a port raises a security
> issue.

What kind of environment are you in that you can't secure the port?
I'm not a security expert, but starting points would be to allow
connections only from a given IP, use SSL, put it behind a firewall, etc.
Treat Solr just as you would treat a database in a typical tiered architecture.
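
One simple way to get both points above, assuming the PHP pages and the search process run on the same machine, is a long-lived process that listens only on the loopback address. It can then open the index once at startup, and other hosts cannot reach the port at all. The sketch below is a toy line-based protocol with a placeholder answer() in place of a real search; the class and method names are made up for illustration, and it is a simpler setup than the IP filtering, SSL and firewall options mentioned above.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.PrintWriter;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

final class LoopbackSearchDaemon implements AutoCloseable {
    private final ServerSocket server;

    // Port 0 lets the OS pick a free port. Binding to the loopback
    // address means only processes on this machine can connect.
    LoopbackSearchDaemon(int port) throws IOException {
        server = new ServerSocket(port, 50, InetAddress.getLoopbackAddress());
        Thread acceptor = new Thread(this::acceptLoop, "search-acceptor");
        acceptor.setDaemon(true);
        acceptor.start();
    }

    int port() {
        return server.getLocalPort();
    }

    // Serves one client at a time: read one query line, write one answer line.
    private void acceptLoop() {
        while (!server.isClosed()) {
            try (Socket client = server.accept();
                 BufferedReader in = new BufferedReader(new InputStreamReader(
                         client.getInputStream(), StandardCharsets.UTF_8));
                 PrintWriter out = new PrintWriter(new OutputStreamWriter(
                         client.getOutputStream(), StandardCharsets.UTF_8), true)) {
                String query = in.readLine();
                if (query != null) {
                    out.println(answer(query));
                }
            } catch (IOException e) {
                // Server closed or client failed; the loop re-checks isClosed().
            }
        }
    }

    // Placeholder for a search against an index opened once at startup.
    static String answer(String query) {
        return "hits for: " + query;
    }

    // Minimal client: what the web front end would do for each request.
    static String ask(int port, String query) throws IOException {
        try (Socket s = new Socket(InetAddress.getLoopbackAddress(), port);
             PrintWriter out = new PrintWriter(new OutputStreamWriter(
                     s.getOutputStream(), StandardCharsets.UTF_8), true);
             BufferedReader in = new BufferedReader(new InputStreamReader(
                     s.getInputStream(), StandardCharsets.UTF_8))) {
            out.println(query);
            return in.readLine();
        }
    }

    @Override
    public void close() throws IOException {
        server.close();
    }

    public static void main(String[] args) throws IOException {
        try (LoopbackSearchDaemon daemon = new LoopbackSearchDaemon(0)) {
            System.out.println(ask(daemon.port(), "lucene"));
        }
    }
}
```

Compared with exec() on every request, each query costs one local socket round trip instead of a process start plus an index load. A production server would handle clients concurrently and would need timeouts; this sketch does neither.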

-Grant

Re: Lucene Performance and usage alternatives

Grant Ingersoll-2
In reply to this post by ezer
Ezer,

I've never tried it, but I just downloaded the wpSearch WordPress
plugin, which uses Zend Search for Lucene: http://devzone.zend.com/node/view/id/91

So, it seems you could do PHP search that way, too.

-Grant

--------------------------
Grant Ingersoll
http://www.lucidimagination.com

Lucene Helpful Hints:
http://wiki.apache.org/lucene-java/BasicsOfPerformance
http://wiki.apache.org/lucene-java/LuceneFAQ