Ability to sort integer fields on large index

classic Classic list List threaded Threaded
2 messages Options
Reply | Threaded
Open this post in threaded view
|

Ability to sort integer fields on large index

Fleming Shi
Here is the problem:

- Single Large index with upto 200 million documents
- Each document contains field using epoch timestamp format (padding  
is required when creating range requests)
- One of the frequently used search query,  a range request on the  
timestamp field (10 digits)
- Other searches are fine, but the search using range request  
(1207190000 to 1207190258), even small result set, still slow when  
"sort" is requested

My questions:
- Is Lucene sorting then search or is it doing search then sort the  
results?
- Is there a way to get this type of searches to return faster with  
heavier weight on the timestamp not on relevancy?

Any insight would be greatly appreciated

/Regards



---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

Reply | Threaded
Open this post in threaded view
|

Re: Ability to sort integer fields on large index

Mark Miller-3
You should use DateTools to break up your time stamp into multiple
fields. This can work a lot faster than using a field with so many
different terms.

Are you using a RangeQuery? If you are, ditch it and use a
ConstantScoreRangeQuery. The former will expand the query to a boolean
that contains each matching unique term while the latter will just score
1 for each matching doc.


> My questions:
> - Is Lucene sorting then search or is it doing search then sort the  
> results?
> - Is there a way to get this type of searches to return faster with  
> heavier weight on the timestamp not on relevancy?

I believe that Lucene is scoring docs, and as it scores them, dropping
them into a priority queue that sorts...normally by the score, but
optionally by fields as well.



- Mark

On Fri, 2008-04-04 at 09:49 -0700, Fleming Shi wrote:

> Here is the problem:
>
> - Single Large index with upto 200 million documents
> - Each document contains field using epoch timestamp format (padding  
> is required when creating range requests)
> - One of the frequently used search query,  a range request on the  
> timestamp field (10 digits)
> - Other searches are fine, but the search using range request  
> (1207190000 to 1207190258), even small result set, still slow when  
> "sort" is requested
>
> My questions:
> - Is Lucene sorting then search or is it doing search then sort the  
> results?
> - Is there a way to get this type of searches to return faster with  
> heavier weight on the timestamp not on relevancy?
>
> Any insight would be greatly appreciated
>
> /Regards
>
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [hidden email]
> For additional commands, e-mail: [hidden email]
>


---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]