Sorting performance

lec74
Hi,

I'm doing some tests with Solr 1.3.
I have loaded around 7M documents, each with a few stored and indexed
fields.

This query: text:sometext returns the results, sorted by score, in a few
milliseconds (I display 10 out of 8747 matched documents).
This one: text:sometext;id desc takes 60s or more to return the data
(when it doesn't fail with an out-of-memory error). The id field is a
string type.
I have tried displaying only the id field, with the same results.

Any ideas? I'm sure I'm doing something wrong...

My schema is based on the sample, with the following fields:

   <field name="id" type="string" indexed="true" stored="true" required="true" />
   <field name="url" type="string" indexed="true" stored="true"/>
   <field name="type" type="string" indexed="true" stored="true"/>
   <field name="title" type="string" indexed="true" stored="true"/>
   <field name="text" type="text" indexed="true" stored="true" />
   <field name="tag" type="string" indexed="true" stored="true" multiValued="true" />
   <field name="timestamp" type="date" indexed="true" stored="true" default="NOW" multiValued="false"/>
   <dynamicField name="*" type="ignored" />


Thanks
Christophe



Re: Sorting performance

Otis Gospodnetic-2
Is the sorted query slow only the first time or every time you run it?

You got an OOM?  What -Xmx value are you using?  Try increasing it.

Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch




Re: Sorting performance

lec74
It is slow each time I run it (I test it from the Solr admin console or
from a Java program using the HTTP client).
I do not get the OOM every time.

Thx
Christophe


Re: Sorting performance

lec74
Here are the memory parameters I'm using now (Tomcat): -Xms2024m -Xmx2024m
With those values, the second query is way faster. Only the first one is
very slow.
Thanks for the tip.
However, I'm wondering whether this will be enough, or whether I will hit
the same issues when many users are searching at the same time. I will
run a stress test to check this.

Thanks
Christophe


Re: Sorting performance

Mark Miller-3
You need to set up a warming query that sorts, so that the initial long
query is done behind the scenes. The user's first query will then be fast.
This is configured in solrconfig.xml.

- Mark
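
A minimal sketch of what such a warming query could look like in solrconfig.xml, assuming the sort from the original post (id desc). The query text and the start/rows values are placeholders. The part that matters is that the sort parameter names the field users will actually sort on, so the sort cache for that field is populated while the previous searcher keeps serving requests.

```xml
<!-- Inside the <query> section of solrconfig.xml. -->

<!-- Runs against each new searcher opened after a commit. -->
<listener event="newSearcher" class="solr.QuerySenderListener">
  <arr name="queries">
    <lst>
      <str name="q">text:sometext</str>  <!-- placeholder query -->
      <str name="sort">id desc</str>     <!-- the field users sort on -->
      <str name="start">0</str>
      <str name="rows">10</str>
    </lst>
  </arr>
</listener>

<!-- Same idea for the very first searcher opened at startup. -->
<listener event="firstSearcher" class="solr.QuerySenderListener">
  <arr name="queries">
    <lst>
      <str name="q">text:sometext</str>
      <str name="sort">id desc</str>
      <str name="start">0</str>
      <str name="rows">10</str>
    </lst>
  </arr>
</listener>
```

If several fields are used for sorting, each one would need its own entry in the queries list, since the sort cache is built per field.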



Re: Sorting performance

lec74
Will do so. Thanks.
Are there any guidelines for computing memory requirements (based on
average document size, number of sorted fields, number of indexed
documents, and number of new documents per day)?

Thanks
Christophe



Re: Sorting performance

lec74
In reply to this post by Mark Miller-3
When I start indexing new documents, searches take a long time again.
Is the sort cache flushed when new documents are indexed?

Thanks
Christophe


query parsing issue + behavior as OR (solr 1.4-dev)

Sunil Sarje-2
I am working with the nightly build of Oct 17, 2008 and found that something is wrong with LuceneQParserPlugin: it treats + as OR.

e.g. q=first_name:joe+last_name:smith behaves as OR instead of AND.
The default operator is set to AND in schema.xml:
<solrQueryParser defaultOperator="AND"/>


Is there any new configuration I need to put in place to get this working?

Thanks
-Sunil



Re: Sorting performance

Erick Erickson
In reply to this post by lec74
Caches are specific to a searcher. So whenever you open a new reader,
the caches are rebuilt for that server. If you are picking up your
changes, you MUST be opening a new reader, so yes, indeed, your caches
are being flushed.

You can get around this by firing a few warmup queries at the server before
using it "for real".

If you are opening a new reader for each request, well, you shouldn't do
that <G>.

Best
Erick


Re: Sorting performance

Mark Miller-3
In reply to this post by lec74
christophe wrote:
> When I start indexing new documents, searches are taking long time
> again: is the sort cache flushed when new documents are indexed ?

When you commit, a new Reader will be opened (or reopened) so that the
freshly added docs can be seen. This would make the first search slow
again, but if you have the warming queries, it should be warmed before
being put into use. Be sure the warming query sorts on the right field.
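
Two related solrconfig.xml settings influence this hand-over. The fragment below is only a sketch; the values are illustrative, not recommendations for this particular index.

```xml
<!-- In the example solrconfig.xml these sit in the <query> section. -->

<!-- If no searcher is registered yet (e.g. at startup), make requests wait
     for the warming searcher instead of using it cold. -->
<useColdSearcher>false</useColdSearcher>

<!-- Upper bound on searchers warming at the same time. Commits that arrive
     faster than warming completes will fail once this limit is reached. -->
<maxWarmingSearchers>2</maxWarmingSearchers>
```

With warming that takes minutes, the second setting effectively limits how often commits can be issued, which matters when documents arrive continuously.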

>
> Are there any metrics on how to compute memory requirements (based on
> doc average size, number of sorted fields, number of indexed documents
> + number of new document / day) ?

Depends on the field type, but I think it's 32 bits x numDocs for most
datatypes, with the String datatype also requiring an array of all the
unique terms to index into. That's not everything, but it dominates.
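
As a rough worked example of that rule of thumb, applied to the index from the first post: N = 7 million documents, sorted on the string field id, which is unique per document, so the number of unique terms U equals N. The average in-memory size per term, written \bar{s} below, is NOT given anywhere in the thread; 60 bytes is a purely illustrative assumption for a short string on a JVM.

```latex
\begin{align*}
M_{\text{ord}}   &= 4\ \text{bytes} \times N
                  = 4 \times 7\times10^{6}
                  = 28\times10^{6}\ \text{bytes} \approx 28\ \text{MB} \\
M_{\text{terms}} &\approx U \times \bar{s}
                  = 7\times10^{6} \times 60\ \text{bytes}
                  = 420\times10^{6}\ \text{bytes} \approx 420\ \text{MB} \\
M_{\text{sort}}  &\approx M_{\text{ord}} + M_{\text{terms}} \approx 448\ \text{MB}
\end{align*}
```

Under these assumptions the term array, not the per-document array, dominates for a unique string field, which would be consistent with the out-of-memory errors seen before the heap was raised. A field with fewer unique values shrinks the second term.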



Re: Sorting performance

lec74
Hmm... this means I have to wait before indexing new documents, instead
of indexing them as they are created (I have about 50,000 new documents
created each day and I was planning to make them searchable ASAP).
Too bad there is no way to have a centralized cache that can be shared
AND updated when new documents are created.

C.


Re: Sorting performance

Norberto Meijome-6
On Mon, 20 Oct 2008 16:28:23 +0300
christophe <[hidden email]> wrote:

> Hum..... this mean I have to wait before I index new documents and avoid
> indexing when they are created (I have about 50 000 new documents
> created each day and I was planning to make those searchable ASAP).

You can always index + optimize out of band on a 'master' / RW server, and
then send the updated index to your slave (the one actually serving the
requests).

This *will NOT* remove the need to refresh your cache, but it will remove any
delay introduced by commit/indexing + optimize.
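
In Solr 1.3 this kind of master/slave setup is typically built on the replication scripts shipped with the distribution. A sketch of the master side, assuming the scripts are installed under solr/bin as in the example configuration (the path and directory are assumptions and depend on the deployment):

```xml
<!-- Master's solrconfig.xml, inside <updateHandler>.
     Takes an index snapshot after every commit or optimize. -->
<listener event="postCommit" class="solr.RunExecutableListener">
  <str name="exe">solr/bin/snapshooter</str>
  <str name="dir">.</str>
  <bool name="wait">true</bool>
</listener>
```

The slave would then run snappuller and snapinstaller periodically (usually from cron) to fetch and install the snapshot. Installing a snapshot opens a new searcher on the slave, so warming queries are still needed there.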

> Too bad there is no way to have a centralized cache that can be shared
> AND updated when new documents are created.

Hmm, not sure it makes sense like that... but maybe something along the lines
of having an active cache that is used to serve queries while new ones are
being prepared, and then swapped in when ready.

Speaking of which (or not :P), has anyone thought about / done any work on
using memcached for these internal Solr caches? I guess it would make sense for
setups with several slaves (or even a master updating memcached
too...), though for a setup with shards it would be slightly more involved
(although it *could* be used to support several slaves per 'data shard').

All the best,
B
_________________________
{Beto|Norberto|Numard} Meijome

RTFM and STFW before anything bad happens.

I speak for myself, not my employer. Contents may be hot. Slippery when wet.
Reading disclaimers makes you go blind. Writing them is worse. You have been
Warned.

Re: query parsing issue + behavior as OR (solr 1.4-dev)

Norberto Meijome-6
In reply to this post by Sunil Sarje-2
On Mon, 20 Oct 2008 06:21:06 -0700 (PDT)
Sunil Sarje <[hidden email]> wrote:

> I am working with nightly build of Oct 17, 2008  and found the issue that
> something wrong with LuceneQParserPlugin; It takes + as OR

Sunil, please do not hijack the thread:

http://en.wikipedia.org/wiki/Thread_hijacking

thanks,
B

_________________________
{Beto|Norberto|Numard} Meijome

He could be a poster child for retroactive birth control.


Re: Sorting performance

lec74
In reply to this post by Norberto Meijome-6
The problem is that I will have hundreds of users running queries, and a
continuous flow of documents coming in.
So a delay for warming up a cache "could" be acceptable if it happens a few
times per day, but not much more often than that (right now, the first
query that loads the cache takes 150s).

However, I'm not sure why updating the caches when updates are committed
seems not to be a good idea. Any centralized cache (memcached is
a good one) that is kept up to date by the update/commit process
would be great. Config options could then let the user decide whether
the cache is shared between servers or not. Creating a new cache and
then swapping it in will double the memory needed.

I also have a related question regarding readers: a new reader is
opened when documents are committed, and the cache is associated with
the reader (if I got it right). Are all user requests served by this
reader? How does that scale if I have many concurrent users?

C.


RE: Sorting performance

Lance Norskog-2
In reply to this post by Mark Miller-3
According to previous posters on this topic, sorting requires an array with an
entry per document in the entire index. Each entry has 32 bits for the 'int'
type, and 32 bits plus the field representation length for other types. Not
knowing Lucene internals, I have a hard time believing that it really has to
be this wasteful, but oh well.

Since 'sint' is needed to do range queries on a field, and 'int' is needed
for efficient sorting, we wound up having one field of each type and a
<copyField> to make sure they both get the same numbers. Yes, it's
annoying.
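
A sketch of that two-field arrangement in schema.xml. The field names price and price_sort are hypothetical, and the type names are simply whatever the schema declares; here they follow the 'int' / 'sint' wording used above.

```xml
<!-- schema.xml fragment; price and price_sort are hypothetical field names. -->
<types>
  <fieldType name="int"  class="solr.IntField"         omitNorms="true"/>
  <fieldType name="sint" class="solr.SortableIntField" sortMissingLast="true" omitNorms="true"/>
</types>

<fields>
  <!-- Used for range queries. -->
  <field name="price"      type="sint" indexed="true" stored="true"/>
  <!-- Used only for sorting, so it does not need to be stored. -->
  <field name="price_sort" type="int"  indexed="true" stored="false"/>
</fields>

<!-- Keeps both fields in sync at index time. -->
<copyField source="price" dest="price_sort"/>
```

A request would then filter on one field and sort on the other, for example q=price:[10 TO 100] with sort=price_sort asc.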

-----Original Message-----
From: Mark Miller [mailto:[hidden email]]
Sent: Monday, October 20, 2008 6:24 AM
To: [hidden email]
Subject: Re: Sorting performance

christophe wrote:
> When I start indexing new documents, searches are taking long time
> again: is the sort cache flushed when new documents are indexed ?

When you commit, a new Reader will be opened (or reopened) so that the
freshly added docs can be seen. This would make the first search slow again,
but if you have the warming queries, it should be warmed before being put
into use. Be sure the warming query sorts on the right field.

>
> Are there any metrics on how to compute memory requirements (based on
> doc average size, number of sorted fields, number of indexed documents
> + number of new document / day) ?

Depends on the field type, but I think its 32bits x numDocs for most
datatypes, with the String datatype also requiring an array of all the
unique terms to index into. Thats not everything, but it dominates.


> Thanks
> Christophe
> Mark Miller wrote:
>> You need to setup a warming query that sorts so that the initial long
>> query is done behind the scenes. Users first query will then be fast.
>> Solrconfig.
>>
>> - Mark
>>
>>
>> On Oct 18, 2008, at 1:34 AM, christophe <[hidden email]>
>> wrote:
>>
>>> Here are the memory parameters I'm using now(Tomcat): -Xms2024m
>>> -Xmx2024m
>>> With those values, the second query is way faster. Only the first
>>> one is very slow.
>>> Thanks for the tip.
>>> However, I'm wondering if will be enough and I will not hit the same
>>> issues when I will have many users searching at the same time: I
>>> will do a stress test to check this.
>>>
>>> Thanks
>>> Christophe
>>>
>>> christophe wrote:
>>>> It is slow each time I run it. (I test it from the Solr admin
>>>> console or from a JAVA program using the Http client).
>>>> I do not get the OOM each time.
>>>>
>>>> Thx
>>>> Christophe
>>>>
>>>> Otis Gospodnetic wrote:
>>>>> Is the sorted query slow only the first time or every time you run
>>>>> it?
>>>>>
>>>>> You got an OOM?  What -Xmx value are you using?  Try increasing it.
>>>>>
>>>>> Otis
>>>>> --
>>>>> Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
>>>>>
>>>>>
>>>>>
>>>>> ----- Original Message ----
>>>>>
>>>>>> From: christophe <[hidden email]>
>>>>>> To: [hidden email]
>>>>>> Sent: Friday, October 17, 2008 1:28:52 PM
>>>>>> Subject: Sorting performance
>>>>>> Hi,
>>>>>>
>>>>>> I'm doing some tests with Solr1.3
>>>>>> I have loaded around 7M documents, each with a few stored and
>>>>>> indexed fields.
>>>>>>
>>>>>> This query: text:sometext returns the results, sorted by score in
>>>>>> a few milliseconds. (I display 10 out of 8747 matched documents)
>>>>>> This one: text:sometext;id desc   takes something like 60s or
>>>>>> more to return the data (when it doesn't fail with an out of
>>>>>> memory error). (id is a string type).
>>>>>> I have tried to display only id, same results.
>>>>>>
>>>>>> Any ideas ? I'm sure I'm doing something wrong.....
>>>>>>
>>>>>> My schema is based on the sample, with the following fields:
>>>>>>
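>>>>>>    <field name="id" type="string" indexed="true" stored="true" required="true" />
>>>>>>    <field name="url" type="string" indexed="true" stored="true"/>
>>>>>>    <field name="type" type="string" indexed="true" stored="true"/>
>>>>>>    <field name="title" type="string" indexed="true" stored="true"/>
>>>>>>    <field name="text" type="text" indexed="true" stored="true" />
>>>>>>    <field name="tag" type="string" indexed="true" stored="true" multiValued="true" />
>>>>>>    <field name="timestamp" type="date" indexed="true" stored="true" default="NOW" multiValued="false"/>
>>>>>>    <dynamicField name="*" type="ignored" />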
>>>>>>
>>>>>> Thanks
>>>>>> Christophe
>>>>>>
>>>>>
>>>>>
>>>>
>>>
>



Re: Sorting performance

lec74
In reply to this post by lec74
I'm now wondering whether Solr (Lucene) is a good choice when there is a
huge number of indexed documents and a large number of new documents
needs to be indexed every day.

Maybe I'm wrong, but my feeling is that, given the way the sort caches are
handled (recreated after each new commit, not shared between Searchers),
the solution does not scale. And it is not just a memory issue (memory is
cheap), but more the inability to update an existing cache.

I'm testing whether I can sort on a field that might be faster to cache:
any hints on this? Would it make a difference if I used a field with fewer
distinct values than a timestamp? I'm looking for some details on how
the cache is populated on the first query. Also, for the code insiders
;-), would it be difficult to change this caching mechanism to allow
update and reuse of an existing cache?

Thanks for your help
Christophe

christophe wrote:

> The problem is that I will have hundreds of users doing queries, and a
> continuous flow of documents coming in.
> So a delay in warming up a cache "could" be acceptable if I do it a
> few times per day. But not on a too regular basis (right now, the
> first query that loads the cache takes 150s).
>
> However: I'm not sure why it does not seem to be a good idea to update
> the caches when updates are committed? Any centralized cache (memcached
> is a good one) that is kept up to date by the update/commit
> process would be great. Config options could then let the user
> decide whether the cache is shared between servers or not. Creating a
> new cache and then swapping it in will double the necessary memory.
>
> I also have a related question regarding readers: a new reader is
> opened when documents are committed. And the cache is associated with
> the reader (if I got it right). Are all user requests served by this
> reader? How does that scale if I have many concurrent users?
>
> C.
>
> Norberto Meijome wrote:
>> On Mon, 20 Oct 2008 16:28:23 +0300
>> christophe <[hidden email]> wrote:
>>
>>  
>>> Hum..... this means I have to wait before I index new documents and
>>> avoid indexing when they are created (I have about 50 000 new
>>> documents created each day and I was planning to make those
>>> searchable ASAP).
>>>    
>>
>> you can always index + optimize out of band in a 'master' / RW server,
>> and then send the updated index to your slave (the one actually serving
>> the requests).
>> This *will NOT* remove the need to refresh your cache, but it will
>> remove any delay introduced by commit/indexing + optimise.
>>
>>  
>>> Too bad there is no way to have a centralized cache that can be
>>> shared AND updated when new documents are created.
>>>    
>>
>> hmm not sure it makes sense like that... but maybe along the lines of
>> having an active cache that is used to serve queries, and new ones being
>> prepared, and then swapped when ready.
>> Speaking of which (or not :P), has anyone thought about / done any work
>> on using memcached for these internal solr caches? I guess it would make
>> sense for setups with several slaves (or even a master updating memcached
>> too...)... though for a setup with shards it would be slightly more
>> involved (although it *could* be used to support several slaves per
>> 'data shard').
>>
>> All the best,
>> B
>> _________________________
>> {Beto|Norberto|Numard} Meijome
>>
>> RTFM and STFW before anything bad happens.
>>
>> I speak for myself, not my employer. Contents may be hot. Slippery when
>> wet. Reading disclaimers makes you go blind. Writing them is worse. You
>> have been Warned.
>>  
>


Re: Sorting performance

hossman
In reply to this post by lec74

: The problem is that I will have hundreds of users doing queries, and a
: continuous flow of documents coming in.
: So a delay in warming up a cache "could" be acceptable if I do it a few times
: per day. But not on a too regular basis (right now, the first query that loads
: the cache takes 150s).
:
: However: I'm not sure why it looks not to be a good idea to update the caches

you can refresh the caches automatically after updating, the "newSearcher"
event is fired whenever a searcher is opened (but before it's used by
clients) so you can configure warming queries for it -- it doesn't have to
be done manually (or by the first user to use that reader)
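
As an illustration of the warming setup described above, a "newSearcher" listener in solrconfig.xml might look roughly like the following. This is only a sketch modelled on the QuerySenderListener entry in the example solrconfig.xml; the query text and the sort field (id desc, taken from the original question) are assumptions and should be replaced with whatever sorts the application actually uses.

```xml
<!-- Inside the <query> section of solrconfig.xml -->
<listener event="newSearcher" class="solr.QuerySenderListener">
  <arr name="queries">
    <!-- Assumed warming query: any cheap query will do, as long as it sorts on
         the field whose sort cache should be populated before clients see
         the new searcher -->
    <lst>
      <str name="q">text:sometext</str>
      <str name="sort">id desc</str>
      <str name="rows">10</str>
    </lst>
  </arr>
</listener>
```

A similar listener for the "firstSearcher" event would cover the case where Solr has just started and there is no previous searcher to serve requests.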

so you can send your updates anytime you want, and as long as you only
commit every 5 minutes (or commit on a master as often as you want, but
only run snappuller/snapinstaller on your slaves every 5 minutes) your
results will be at most 5 minutes + warming time stale.
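
One possible way to get a commit roughly every 5 minutes without issuing it from the indexing client is the autoCommit setting of the update handler. The fragment below is a sketch under that assumption; the 300000 ms value simply mirrors the 5-minute figure above and is not a recommendation from the thread.

```xml
<!-- Inside the <updateHandler> section of solrconfig.xml -->
<autoCommit>
  <!-- Assumed value: trigger a commit at most 5 minutes (300000 ms)
       after a document has been added -->
  <maxTime>300000</maxTime>
</autoCommit>
```

Sending an explicit commit from a scheduled job every 5 minutes would have the same effect on staleness; which of the two is preferable depends on how the indexing process is driven.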


-Hoss


RE: Sorting performance

Beniamin Janicki
:so you can send your updates anytime you want, and as long as you only
:commit every 5 minutes (or commit on a master as often as you want, but
:only run snappuller/snapinstaller on your slaves every 5 minutes) your
:results will be at most 5minutes + warming time stale.

This is what I do as well (commits are done once every 5 minutes). I have a
master-slave configuration. On the master, all caches are turned off
(commented out in solrconfig.xml) and maxWarmingSearchers is set to 2. The
index size is 5 GB, -Xmx is 1 GB, and committing takes around 10 seconds
(with the default configuration and warming it took from 30 minutes up to
2 hours).

Slave caches are configured with autowarmCount="0" and
maxWarmingSearchers=1, and I have new data 1 second after the snapshot is
done. I haven't noticed any huge delays while serving search requests.
Try those values - maybe they'll help in your case too.
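
For reference, the slave-side settings mentioned above could be written in solrconfig.xml along these lines. This is a sketch only: the cache sizes are placeholder values copied from the style of the example configuration, not figures from this setup.

```xml
<!-- Inside the <query> section of the slave's solrconfig.xml -->

<!-- autowarmCount="0": do not copy entries from the old searcher's caches
     into the new one; the size values are assumed placeholders -->
<filterCache class="solr.LRUCache" size="512" initialSize="512" autowarmCount="0"/>
<queryResultCache class="solr.LRUCache" size="512" initialSize="512" autowarmCount="0"/>

<!-- Allow only one searcher to be warming in the background at a time -->
<maxWarmingSearchers>1</maxWarmingSearchers>
```

With autowarming disabled, any sort cache still has to be populated by the first sorted query (or by a warming query) on each new searcher.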

Ben Janicki




Re: Sorting performance

lec74
Hi,

I'm now reloading my index.
The issue might be related to the way dates are handled (I was sorting
on a date field).
Now, I have added an integer field that represents the date (but in
minutes instead of milliseconds).
With 4M documents (and indexing running in the background), I get an
acceptable response time, even for the first query. I still want to check
with 10M and more documents.
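
As a rough sketch of what such a field could look like in schema.xml: the field name timestamp_min is hypothetical, and the "sint" type is assumed to be the sortable integer type still defined in the sample schema this one is based on. The value itself (milliseconds since the epoch divided by 60000) would have to be computed by the indexing client.

```xml
<!-- In the <fields> section of schema.xml, next to the existing fields -->
<!-- "timestamp_min" is a hypothetical name; "sint" is assumed to be defined
     in the <types> section as in the sample schema -->
<field name="timestamp_min" type="sint" indexed="true" stored="false"/>
```

The sorted query would then use timestamp_min desc instead of the date field. Rounding to minutes reduces the number of distinct values, which is the part of the sort cache that depends on unique terms.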

Once my index is fully loaded, I will try the config parameters you suggest.

Thanks
Christophe


Re: Sorting performance + replication of index between cores

lec74
In reply to this post by Beniamin Janicki
Hi,

After fully reloading my index, using a field other than a date does not
help that much.
Using a warmup query avoids having a slow first request, but:
     - Frequent commits mean that the Searcher is reloaded frequently
and, as the warmup takes time, the clients must wait.
     - Having warmup slows down the indexing process (I guess this is
because after a commit, the Searchers are recreated)

So I'm considering, as suggested, having two instances: one for
indexing and one for searching.
I was wondering if there are simple ways to replicate the index in a
single Solr server running two cores? Has any such config already been
tested? I guess that the standard replication based on rsync can be
simplified a lot in this case, as the two indexes are on the same server.

Thanks
Christophe
