facet query counts

facet query counts

Kevin Osborn-2
I have a large subset (47640) of my total index. Most of them (45335) have a single field, which we will call Field1. Field1 is an sfloat.

If my query restricts the result set to my subset and I do a facet count on Field1, then the number of records returned is 47640. And if I sum up the facet counts, it adds to 45335. So far, so good. But I really want to do range queries on Field1, so I use facet.query to split Field1 into 5 ranges.

So,

&facet.query=Field1%3A%5B4000000000+TO+*%5D
&facet.query=Field1%3A%5B3000000000+TO+3999999999.9%5D
&facet.query=Field1%3A%5B2000000000+TO+2999999999.9%5D
&facet.query=Field1%3A%5B1000000000+TO+1999999999.9%5D
&facet.query=Field1%3A%5B*+TO+999999999.9%5D
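
For readability, the URL-encoded parameters above can be decoded into plain range-query syntax. This is only a small sketch using Python's standard library; the variable names are illustrative and not part of the original request.

```python
from urllib.parse import unquote_plus

# The five facet.query values exactly as they appear in the request.
raw_params = [
    "Field1%3A%5B4000000000+TO+*%5D",
    "Field1%3A%5B3000000000+TO+3999999999.9%5D",
    "Field1%3A%5B2000000000+TO+2999999999.9%5D",
    "Field1%3A%5B1000000000+TO+1999999999.9%5D",
    "Field1%3A%5B*+TO+999999999.9%5D",
]

# unquote_plus turns '+' into a space and decodes %XX escapes.
decoded = [unquote_plus(p) for p in raw_params]

for query in decoded:
    print(query)
```

Decoded, each parameter is an inclusive range such as `Field1:[3000000000 TO 3999999999.9]`.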

Now, if I sum up the counts, it adds to 54697. I can't find where this number comes from. If I have open-ended ranges on both my high and low end, shouldn't the sum of facet.query equal the sum of a normal facet count? And if a record never has more than one instance of Field1, how can the sum be greater than the total record set?

And this problem seems to occur in most (if not all) of my range queries. Is there anything that I am doing wrong here?


Re: facet query counts

Mike Klaas
On 14-Jun-07, at 4:29 PM, Kevin Osborn wrote:

> I have a large subset (47640) of my total index. Most of them  
> (45335) have a single field, which we will call Field1. Field1 is a  
> sfloat.
>
> If my query restricts the resultset to my subset and I do a facet  
> count on Field1, then the number of records returned is 47640. And  
> if I sum up the facet counts, it adds to 45335. So far, so good.  
> But, I really want to do range queries on Field1. So, I use  
> facet.query to split Field1 into 5 ranges.
>
> So,
>
> &facet.query=Field1%3A%5B4000000000+TO+*%5D
> &facet.query=Field1%3A%5B3000000000+TO+3999999999.9%5D
> &facet.query=Field1%3A%5B2000000000+TO+2999999999.9%5D
> &facet.query=Field1%3A%5B1000000000+TO+1999999999.9%5D
> &facet.query=Field1%3A%5B*+TO+999999999.9%5D


> Now, if I sum up the counts, it adds to 54697. I can't find where  
> this number comes from. If I have open-ended ranges on both my high  
> and low end, shouldn't the sum of facet.query equal the sum of a  
> normal facet count? And if a record never has more than one  
> instance of Field1, how can the sum be greater than the total  
> record set?

My guess is precision issues.  Those are mighty large values to be  
storing in a binary float--you're probably comparing mostly the  
exponent, which is not necessarily disjoint.  Have you tried sdouble?
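
To get a feel for how coarse a binary float is at this magnitude, here is a rough sketch that measures the gap between adjacent 32-bit float values near the range boundaries. It assumes the field holds an IEEE 754 single-precision value and models that with Python's `struct` module; it does not use Solr itself, and the helper names are made up for illustration. `math.ulp` needs Python 3.9 or later.

```python
import math
import struct

def to_float32(x: float) -> float:
    """Round a Python float (a double) to the nearest 32-bit float."""
    return struct.unpack(">f", struct.pack(">f", x))[0]

def next_float32_up(x: float) -> float:
    """Next representable 32-bit float above a positive finite x."""
    bits = struct.unpack(">I", struct.pack(">f", x))[0]
    return struct.unpack(">f", struct.pack(">I", bits + 1))[0]

# Gap between neighbouring 32-bit floats at each range boundary.
gaps = []
for boundary in (1e9, 2e9, 3e9, 4e9):
    f = to_float32(boundary)
    gaps.append(next_float32_up(f) - f)
    print(f"{boundary:.0f}: float32 gap = {gaps[-1]}")

# For comparison, the gap between neighbouring doubles near 4e9.
double_gap = math.ulp(4e9)
print(f"double gap near 4e9 = {double_gap}")
```

Under these assumptions the neighbouring 32-bit values are 64 to 256 apart, so a difference of 0.1 between two bounds cannot be represented, while a double resolves it easily.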

> And this problem seems to occur in most (if not all) of my range  
> queries. Is there anything that I am doing wrong here?

Is this true on other field types as well?

-Mike


Re: facet query counts

Yonik Seeley-2
A 32-bit float has about 7 decimal digits of precision, so your range
queries actually do overlap: 4000000000f is exactly the same as
3999999999f.
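
The overlap can be reproduced outside Solr with a short sketch. It assumes both the stored values and the range bounds are rounded to IEEE 754 single precision and that the ranges are inclusive, as the square brackets suggest. The sample values are invented for illustration and are not taken from the index.

```python
import struct

def to_float32(x: float) -> float:
    """Round a Python float (a double) to the nearest 32-bit float."""
    return struct.unpack(">f", struct.pack(">f", x))[0]

# The five ranges from the facet.query parameters; None means open-ended.
ranges = [
    (4000000000, None),
    (3000000000, 3999999999.9),
    (2000000000, 2999999999.9),
    (1000000000, 1999999999.9),
    (None, 999999999.9),
]

def matches(value: float, lo, hi) -> bool:
    """Inclusive range test after rounding everything to 32-bit floats."""
    v = to_float32(value)
    lo_ok = lo is None or to_float32(lo) <= v
    hi_ok = hi is None or v <= to_float32(hi)
    return lo_ok and hi_ok

# Hypothetical stored values: two sit on a boundary, two do not.
sample_values = [4000000000.0, 3000000000.0, 2500000000.0, 500000000.0]

# Number of ranges each value falls into.
hits = [sum(matches(v, lo, hi) for lo, hi in ranges) for v in sample_values]
total = sum(hits)

print(to_float32(3999999999.9) == to_float32(4000000000))  # upper bound collapses
print(hits, total)
```

In this sketch every value that rounds onto a boundary is counted in two ranges, which is how the summed facet.query counts can exceed the number of documents.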

-Yonik


Re: facet query counts

Kevin Osborn-2
Thanks. That actually makes perfect sense. I was thinking that the problem was somehow more complicated than a simple precision problem. These are all dynamic fields, so I rarely look at the schema. I'll change the precision in the schema and then reindex. Hopefully that clears up the problem.
