Question about Lucene's IndexSearcher.search(Query query, int n) API's parameter n


小鱼儿-2
I'm building a POI (point-of-interest) search with Lucene. Each POI has a
"location" field of a GeoPoint/LonLat type. I need to run a keyword-range
search, but the resulting POIs must be sorted by distance to a starting
point.

This "distance" is in fact a dynamically computed property, so it cannot be
used with the SortField API. I wonder whether Lucene could support a
"DynamicSortField"; that would be perfect. Otherwise I have to use the
IndexSearcher.search(Query query, int n) API to first filter out the top-n
POIs and then sort them manually once the StoredFields of all n documents
have been loaded, which seems inefficient.
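
For reference, the manual sort step described above could look roughly like the sketch below. It assumes the candidates' coordinates have already been read from the stored fields into plain objects; the Poi class, the method names and the mean Earth radius constant are illustrative and not part of Lucene.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

final class ManualDistanceSort {
    // Mean Earth radius in kilometers (an assumption of this sketch).
    static final double EARTH_RADIUS_KM = 6371.0088;

    /** Hypothetical holder for the values loaded from one document's stored fields. */
    static final class Poi {
        final String name;
        final double lat;
        final double lon;

        Poi(String name, double lat, double lon) {
            this.name = name;
            this.lat = lat;
            this.lon = lon;
        }
    }

    /** Great-circle distance in kilometers between two lat/lon points given in degrees. */
    static double haversineKm(double lat1, double lon1, double lat2, double lon2) {
        double dLat = Math.toRadians(lat2 - lat1);
        double dLon = Math.toRadians(lon2 - lon1);
        double a = Math.sin(dLat / 2) * Math.sin(dLat / 2)
                + Math.cos(Math.toRadians(lat1)) * Math.cos(Math.toRadians(lat2))
                * Math.sin(dLon / 2) * Math.sin(dLon / 2);
        // Clamp to 1.0 to guard against rounding slightly above the asin domain.
        return 2 * EARTH_RADIUS_KM * Math.asin(Math.min(1.0, Math.sqrt(a)));
    }

    /** Returns a new list with the candidates ordered nearest-first; the input is not modified. */
    static List<Poi> sortByDistance(List<Poi> candidates, double startLat, double startLon) {
        List<Poi> sorted = new ArrayList<>(candidates);
        sorted.sort(Comparator.comparingDouble(
                (Poi p) -> haversineKm(startLat, startLon, p.lat, p.lon)));
        return sorted;
    }
}
```

The sort itself is cheap for a few thousand candidates; the cost in this approach comes mainly from loading the stored fields of all n documents first.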

The problem is that the parameter n in the IndexSearcher.search API has a
usability issue: it may not be large enough to cover all the candidates. And
the low-level search(Query, Collector) API seems to be poorly documented.
If n is set to a very large value, the subsequent sort may be very
inefficient...

My current idea: use finer-grained near-to-far geo sub-ranges to
search/filter iteratively and incrementally -> load documents -> sort
manually -> combine.
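
A rough sketch of that incremental idea, under some assumptions: the RingQuery interface is a hypothetical stand-in for "run the keyword query restricted to one distance ring and load the hits", each Hit already carries its distance to the starting point, and the ring radius simply doubles on every round. None of these names come from Lucene.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

final class RingSearchSketch {
    /** Hypothetical hit: a document id plus its distance to the starting point. */
    static final class Hit {
        final String id;
        final double distanceKm;

        Hit(String id, double distanceKm) {
            this.id = id;
            this.distanceKm = distanceKm;
        }
    }

    /** Hypothetical stand-in for a search limited to distances in [innerKm, outerKm). */
    interface RingQuery {
        List<Hit> search(double innerKm, double outerKm);
    }

    /** Collects up to n hits nearest-first by widening the searched ring step by step. */
    static List<Hit> nearest(RingQuery query, int n, double firstRadiusKm, double maxRadiusKm) {
        if (firstRadiusKm <= 0) {
            throw new IllegalArgumentException("firstRadiusKm must be positive");
        }
        List<Hit> result = new ArrayList<>();
        double inner = 0.0;
        double outer = Math.min(firstRadiusKm, maxRadiusKm);
        while (result.size() < n && inner < maxRadiusKm) {
            // Rings are disjoint and visited near-to-far, so sorting each ring
            // on its own and appending keeps the combined list globally sorted.
            List<Hit> ring = new ArrayList<>(query.search(inner, outer));
            ring.sort(Comparator.comparingDouble((Hit h) -> h.distanceKm));
            result.addAll(ring);
            inner = outer;
            outer = Math.min(outer * 2, maxRadiusKm);
        }
        return result.size() > n ? new ArrayList<>(result.subList(0, n)) : result;
    }
}
```

The loop stops as soon as n hits are collected, so distant rings are never searched when enough nearby POIs match.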

Any suggestions?

Re: Question about Lucene's IndexSearcher.search(Query query, int n) API's parameter n

Uwe Schindler
You can sort with custom formulas. All values needed for the calculation must be part of the index as DocValues fields. You can then use the expressions module to supply a formula for the calculation, which may include the original score. The expressions module can either override the score (so standard sorting works) or provide a SortField.

https://lucene.apache.org/core/8_4_0/expressions/org/apache/lucene/expressions/Expression.html
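
A minimal sketch of the SortField route, assuming each document indexes its coordinates as two DoubleDocValuesField fields named "lat" and "lon" (the field names, the variable names in the formula and the class and method names are made up for illustration). It relies on the haversin function that the JavaScript expression compiler provides; check the expressions.js package documentation of your Lucene version for the unit it returns.

```java
import java.io.IOException;
import java.text.ParseException;

import org.apache.lucene.expressions.Expression;
import org.apache.lucene.expressions.SimpleBindings;
import org.apache.lucene.expressions.js.JavascriptCompiler;
import org.apache.lucene.search.DoubleValuesSource;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.Sort;
import org.apache.lucene.search.SortField;
import org.apache.lucene.search.TopDocs;

final class DistanceSortSketch {
    /** Builds an ascending SortField on the distance from (originLat, originLon). */
    static SortField distanceSortField(double originLat, double originLon) throws ParseException {
        // The formula refers to four variables that are resolved through the bindings below.
        Expression distance =
                JavascriptCompiler.compile("haversin(originLat, originLon, lat, lon)");

        SimpleBindings bindings = new SimpleBindings();
        // The starting point is constant for the whole search.
        bindings.add("originLat", DoubleValuesSource.constant(originLat));
        bindings.add("originLon", DoubleValuesSource.constant(originLon));
        // Assumed field names; both must be indexed as double doc values.
        bindings.add("lat", DoubleValuesSource.fromDoubleField("lat"));
        bindings.add("lon", DoubleValuesSource.fromDoubleField("lon"));

        return distance.getSortField(bindings, false); // false = nearest first
    }

    /** Runs the query and returns the n nearest matching documents. */
    static TopDocs searchNearest(IndexSearcher searcher, Query query,
                                 double originLat, double originLon, int n)
            throws IOException, ParseException {
        return searcher.search(query, n, new Sort(distanceSortField(originLat, originLon)));
    }
}
```

With this, n only has to be the number of results you actually want to show, because the ordering by distance happens inside the search.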

This is only a bad idea if the calculation is expensive, since it has to be done for every possible hit. One optimization is therefore to do a simple calculation with expressions that brings all documents into an approximate order, so that only the top-n need to be sorted manually.
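
To illustrate the two-phase idea outside of Lucene, here is a plain Java sketch. It assumes a cheap equirectangular approximation as the "simple calculation" and the haversine formula as the expensive exact one; in a real setup the first phase would be the expression-based sort inside the search, which is only simulated here by sorting an in-memory list. The class, the method names and the overfetch parameter are hypothetical.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

final class TwoPhaseSortSketch {
    static final double EARTH_RADIUS_KM = 6371.0088; // mean Earth radius (assumption)

    /** Hypothetical hit with its coordinates in degrees. */
    static final class Poi {
        final String name;
        final double lat;
        final double lon;

        Poi(String name, double lat, double lon) {
            this.name = name;
            this.lat = lat;
            this.lon = lon;
        }
    }

    /** Cheap ordering key: squared equirectangular offset in degrees (ignores the antimeridian). */
    static double cheapKey(Poi p, double lat0, double lon0) {
        double x = (p.lon - lon0) * Math.cos(Math.toRadians(lat0));
        double y = p.lat - lat0;
        return x * x + y * y;
    }

    /** Exact great-circle distance in kilometers (haversine formula). */
    static double exactKm(double lat1, double lon1, double lat2, double lon2) {
        double dLat = Math.toRadians(lat2 - lat1);
        double dLon = Math.toRadians(lon2 - lon1);
        double a = Math.sin(dLat / 2) * Math.sin(dLat / 2)
                + Math.cos(Math.toRadians(lat1)) * Math.cos(Math.toRadians(lat2))
                * Math.sin(dLon / 2) * Math.sin(dLon / 2);
        return 2 * EARTH_RADIUS_KM * Math.asin(Math.min(1.0, Math.sqrt(a)));
    }

    /** Phase 1 orders all hits by the cheap key; phase 2 re-sorts only the best n * overfetch exactly. */
    static List<Poi> nearest(List<Poi> hits, double lat0, double lon0, int n, int overfetch) {
        List<Poi> rough = new ArrayList<>(hits);
        rough.sort(Comparator.comparingDouble((Poi p) -> cheapKey(p, lat0, lon0)));

        int keep = (int) Math.min((long) rough.size(), (long) n * overfetch);
        List<Poi> top = new ArrayList<>(rough.subList(0, keep));
        top.sort(Comparator.comparingDouble((Poi p) -> exactKm(lat0, lon0, p.lat, p.lon)));

        return new ArrayList<>(top.subList(0, Math.min(n, top.size())));
    }
}
```

The overfetch factor is a safety margin: the rougher the cheap formula, the more candidates should survive phase 1 so that the exact re-sort cannot miss a true top-n hit.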

Uwe

On January 10, 2020 4:39:58 AM UTC, "小鱼儿" <[hidden email]> wrote:

>I'm building a POI (point-of-interest) search with Lucene. Each POI has a
>"location" field of a GeoPoint/LonLat type. I need to run a keyword-range
>search, but the resulting POIs must be sorted by distance to a starting
>point.
>
>This "distance" is in fact a dynamically computed property, so it cannot
>be used with the SortField API. I wonder whether Lucene could support a
>"DynamicSortField"; that would be perfect. Otherwise I have to use the
>IndexSearcher.search(Query query, int n) API to first filter out the
>top-n POIs and then sort them manually once the StoredFields of all n
>documents have been loaded, which seems inefficient.
>
>The problem is that the parameter n in the IndexSearcher.search API has a
>usability issue: it may not be large enough to cover all the candidates.
>And the low-level search(Query, Collector) API seems to be poorly
>documented. If n is set to a very large value, the subsequent sort may be
>very inefficient...
>
>My current idea: use finer-grained near-to-far geo sub-ranges to
>search/filter iteratively and incrementally -> load documents -> sort
>manually -> combine.
>
>Any suggestions?

--
Uwe Schindler
Achterdiek 19, 28357 Bremen
https://www.thetaphi.de