Searching for negative numbers very slow

classic Classic list List threaded Threaded
6 messages Options
Reply | Threaded
Open this post in threaded view
|

Searching for negative numbers very slow

Simon Wistow
If I do

        qt=dismax
    fq=uid:1

(or any other positive number) then queries are as quick as normal - in
the 20ms range.

However, any of

        fq=uid:\-1

or

    fq=uid:[* TO -1]

or
   
    fq=uid:[-1 to -1]

or

    fq=-uid:[0 TO *]

then queries are incredibly slow - in the 9 *second* range.

Anything I can do to mitigate this? Negative numbers have significant
meaning in our system so it wouldn't be trivial to shift all uids up by
the number of negative ids.


Thanks,

Simon


Reply | Threaded
Open this post in threaded view
|

Re: Searching for negative numbers very slow

Simon Wistow
On Thu, Jan 27, 2011 at 11:32:26PM +0000, me said:
> If I do
>
> qt=dismax
>     fq=uid:1
>
> (or any other positive number) then queries are as quick as normal - in
> the 20ms range.

For what it's worth uid is a TrieIntField with precisionStep=0,
omitNorms=true, positionIncrementGap=0


Reply | Threaded
Open this post in threaded view
|

Re: Searching for negative numbers very slow

Yonik Seeley-2-2
In reply to this post by Simon Wistow
On Thu, Jan 27, 2011 at 6:32 PM, Simon Wistow <[hidden email]> wrote:

> If I do
>
>        qt=dismax
>    fq=uid:1
>
> (or any other positive number) then queries are as quick as normal - in
> the 20ms range.
>
> However, any of
>
>        fq=uid:\-1
>
> or
>
>    fq=uid:[* TO -1]
>
> or
>
>    fq=uid:[-1 to -1]
>
> or
>
>    fq=-uid:[0 TO *]
>
> then queries are incredibly slow - in the 9 *second* range.

That's odd - there should be nothing special about negative numbers.
Here are a couple of ideas:
  - if you have a really big index and querying by a negative number
is much more rare, it could just be that part of the index wasn't
cached by the OS and so the query needs to hit the disk.  This can
happen with any term and a really big index - nothing special for
negatives here.
 - if -1 is a really common value, it can be slower.  is fq=uid:\-2 or
other negative numbers really slow also?

-Yonik
http://lucidimagination.com
Reply | Threaded
Open this post in threaded view
|

Re: Searching for negative numbers very slow

Simon Wistow
On Fri, Jan 28, 2011 at 12:29:18PM -0500, Yonik Seeley said:
> That's odd - there should be nothing special about negative numbers.
> Here are a couple of ideas:
>   - if you have a really big index and querying by a negative number
> is much more rare, it could just be that part of the index wasn't
> cached by the OS and so the query needs to hit the disk.  This can
> happen with any term and a really big index - nothing special for
> negatives here.
>  - if -1 is a really common value, it can be slower.  is fq=uid:\-2 or
> other negative numbers really slow also?

This was my first thought but -1 is relatively common but we have other
numbers just as common.


Interestingly enough

fq=uid:-1
fq=foo:bar
fq=alpha:omega

is much (4x) slower than

q="uid:-1 AND foo:bar AND alpha:omega"

but only when searching for that number.

I'm going to wave my hands here and say something like "Maybe something
to do with the field caches?"



Reply | Threaded
Open this post in threaded view
|

Re: Searching for negative numbers very slow

Chris Hostetter-3

: This was my first thought but -1 is relatively common but we have other
: numbers just as common.

i assume that when you say that you mean "...we have other numbers
(that are not negative) just as common, (but searching for them is much
faster)" ?

I don't have any insight into why your negative numbers are slower, but
FWIW...

: Interestingly enough
:
: fq=uid:-1
: fq=foo:bar
: fq=alpha:omega
:
: is much (4x) slower than
:
: q="uid:-1 AND foo:bar AND alpha:omega"

...this is (in and of itself) not that suprising for any three arbitrary
disjoint queries.  when a BoleanQuery is a full disjunction like this (all
clause required) it can efficiently skip scoring a lot of documents by
looping over the clauses, asking each one for the "next" doc they
match, and then leap frogging the other clauses to that doc.  in the case
of the three "fq" params, each query is executd in isolatin, and *all* of
the matches of each is accounted for.

the speed of using distinct "fq" params in situations like this comes from
the reuse after they are in the filterCache -- you can change fq=foo:bar
to fq=foo:baz on the next query, and still reuse 2/3 of the work that was
done on the first query. likewise if hte next query is
fq=uid:-1&fq=foo:bar&fq=alpha:omegabeta then 2/3 of the work is already
done again, and if a following query is
fq=uid:-1&fq=foo:baz&fq=alpha:omegabeta then all of the work is already
done and cached even though that particular request has never been seen by
solr.


-Hoss
Reply | Threaded
Open this post in threaded view
|

Re: Searching for negative numbers very slow

gearond
Is it my imagination or has this exact email been on the list already?

 Dennis Gearon


Signature Warning
----------------
It is always a good idea to learn from your own mistakes. It is usually a better
idea to learn from others’ mistakes, so you do not have to make them yourself.
from 'http://blogs.techrepublic.com.com/security/?p=4501&tag=nl.e036'


EARTH has a Right To Life,
otherwise we all die.




________________________________
From: Chris Hostetter <[hidden email]>
To: [hidden email]
Cc: [hidden email]
Sent: Wed, February 16, 2011 6:20:28 PM
Subject: Re: Searching for negative numbers very slow


: This was my first thought but -1 is relatively common but we have other
: numbers just as common.

i assume that when you say that you mean "...we have other numbers
(that are not negative) just as common, (but searching for them is much
faster)" ?

I don't have any insight into why your negative numbers are slower, but
FWIW...

: Interestingly enough
:
: fq=uid:-1
: fq=foo:bar
: fq=alpha:omega
:
: is much (4x) slower than
:
: q="uid:-1 AND foo:bar AND alpha:omega"

...this is (in and of itself) not that suprising for any three arbitrary
disjoint queries.  when a BoleanQuery is a full disjunction like this (all
clause required) it can efficiently skip scoring a lot of documents by
looping over the clauses, asking each one for the "next" doc they
match, and then leap frogging the other clauses to that doc.  in the case
of the three "fq" params, each query is executd in isolatin, and *all* of
the matches of each is accounted for.

the speed of using distinct "fq" params in situations like this comes from
the reuse after they are in the filterCache -- you can change fq=foo:bar
to fq=foo:baz on the next query, and still reuse 2/3 of the work that was
done on the first query. likewise if hte next query is
fq=uid:-1&fq=foo:bar&fq=alpha:omegabeta then 2/3 of the work is already
done again, and if a following query is
fq=uid:-1&fq=foo:baz&fq=alpha:omegabeta then all of the work is already
done and cached even though that particular request has never been seen by
solr.


-Hoss