Solr Geospatial Polygon Indexing/Querying Issue

Sanders, Marshall (CAI - Atlanta)
We’re trying to index a polygon into Solr and then filter and calculate geodist on it. Ideally we actually want a circle, but it looks like that’s not officially supported by WKT/GeoJSON; instead you have to switch to format="legacy", which seems like something that might be removed in the future, so we don’t want to rely on it.

Here’s the info from schema:
<field name="latlng" type="location_rpt" indexed="true" stored="true" multiValued="true"/>

<fieldType name="location_rpt" class="solr.SpatialRecursivePrefixTreeFieldType"
           geo="true" distErrPct="0.025" maxDistErr="0.000009" distanceUnits="kilometers"
           spatialContextFactory="Geo3D"/>


We’ve tried indexing some different data, but to keep it as simple as possible we started with a triangle (will eventually add more points to approximate a circle).  Here’s an example document that we’ve added just for testing:

{
"latlng": ["POLYGON((33.7942704 -84.4412613, 33.7100611 -84.4028091, 33.7802888 -84.3279648, 33.7942704 -84.4412613))"],
"ID": "284598223"
}


However, it seems like filtering and distance calculations aren’t working (at least not the way we are used to doing it for points).  Here’s an example query where the pt is several hundred kilometers away from the polygon, yet the document is still returned.  Also, regardless of origin point or polygon location, the calculated geodist is always 20015.115.

Example query:
select?d=1&fl=ID,latlng,geodist()&fq=%7B!geofilt%7D&indent=on&pt=33.9798087,-94.3286133&q=*:*&sfield=latlng&wt=json

Example documents coming back anyway:
"docs": [
{
"latlng": ["POLYGON((33.7942704 -84.4412613, 33.7100611 -84.4028091, 33.7802888 -84.3279648, 33.7942704 -84.4412613))"],
"ID": "284598223",
"geodist()": 20015.115
},
{
"latlng": ["POLYGON((33.7942704 -84.4412613, 33.7100611 -84.4028091, 33.7802888 -84.3279648, 33.7942704 -84.4412613))"],
"ID": "284600596",
"geodist()": 20015.115
}
]
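
For what it's worth, the constant 20015.115 is almost exactly half the circumference of a spherical Earth, i.e. 180 degrees of arc expressed in kilometers. That would be consistent with geodist() returning a maximum value rather than a measured distance, though the thread does not confirm why. A minimal sketch of the arithmetic, assuming the mean Earth radius of 6371.0087714 km (the value I believe Spatial4j uses; treat it as an assumption):

```python
import math

# Assumption: mean Earth radius in km (believed to match Spatial4j's constant).
EARTH_MEAN_RADIUS_KM = 6371.0087714


def degrees_to_km(degrees: float) -> float:
    """Great-circle arc length in km for an angle given in degrees."""
    return math.radians(degrees) * EARTH_MEAN_RADIUS_KM


# 180 degrees of arc is half the circumference of the sphere.
half_circumference_km = degrees_to_km(180.0)
print(half_circumference_km)
```

The printed value differs from 20015.115 only in the third decimal place, which is within what single-precision rounding would explain.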


Can anyone with experience in this area point us in the right direction about what we’re doing incorrectly, either in how we index the data or in how we query against the polygons?

Thank you,


--
Marshall Sanders
Principal Software Engineer
Autotrader.com
[hidden email]



Re: Solr Geospatial Polygon Indexing/Querying Issue

Ere Maijala
I think you might be missing the d parameter in geofilt. I'm not sure if
geofilt actually does anything useful without it.

Regards,
Ere

In reply to the post by Sanders, Marshall (CAI - Atlanta) of 23.7.2019, 21:32.

--
Ere Maijala
Kansalliskirjasto / The National Library of Finland

Re: Solr Geospatial Polygon Indexing/Querying Issue

Sanders, Marshall (CAI - Atlanta)
My example query has d=1 as the first parameter, so none of the results should be coming back. They are, though, which makes it seem like it's not doing any geofiltering for some reason.

In reply to the post by Ere Maijala of 7/24/19, 2:06 AM.

Re: Solr Geospatial Polygon Indexing/Querying Issue

Ere Maijala
Oops, sorry! Don't know how I missed that.

Have you tested if it makes any difference if you put the sfield
parameter inside the fq like in the example
(https://lucene.apache.org/solr/guide/8_1/spatial-search.html#geofilt)?
We actually put pt and d in there too, e.g.

{!geofilt+sfield%3Dlocation_geo+pt%3D61.2%2C24.9+d%3D1}
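
A minimal sketch of how such a filter string could be assembled and URL-encoded on the client side. The helper name geofilt_fq is hypothetical; only the {!geofilt sfield=... pt=... d=...} local-params form comes from the example above, and "!" is left unescaped to match how the queries in this thread are written.

```python
from urllib.parse import quote_plus


def geofilt_fq(sfield: str, lat: float, lon: float, d_km: float) -> str:
    """Build a {!geofilt} filter with sfield, pt and d passed as local params."""
    return f"{{!geofilt sfield={sfield} pt={lat},{lon} d={d_km}}}"


# Hypothetical helper applied to the values from the example above.
fq = geofilt_fq("location_geo", 61.2, 24.9, 1)

# Spaces become "+", "=" becomes %3D, "," becomes %2C, braces become %7B/%7D.
encoded_fq = quote_plus(fq, safe="!")
print(encoded_fq)
```

Keeping sfield, pt and d inside the local params makes the filter self-contained, so it does not depend on request-level parameters that other spatial functions in the same request might also read.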

--Ere

In reply to the post by Sanders, Marshall (CAI - Atlanta) of 24.7.2019, 16:33.

--
Ere Maijala
Kansalliskirjasto / The National Library of Finland

Re: Solr Geospatial Polygon Indexing/Querying Issue

Sanders, Marshall (CAI - Atlanta)
That didn't seem to work either.  I think there must be something wrong with how we're indexing/storing the polygon and/or how we've configured the field/querying it.  The docs are so sparse on this.

Here's the response:

{
  "responseHeader":{
    "status":0,
    "QTime":1,
    "params":{
      "q":"*:*",
      "fl":"latlng,ID",
      "fq":"{!geofilt sfield=latlng pt=33.3786,-94.8985 d=1}",
      "rows":"2",
      "_":"1564065725241"}},
  "response":{"numFound":10,"start":0,"docs":[
      {
        "latlng":["POLYGON((33.7942704 -84.4412613, 33.7100611 -84.4028091, 33.7802888 -84.3279648, 33.7942704 -84.4412613))"],
        "ID":"284598223"},
      {
        "latlng":["POLYGON((33.7942704 -84.4412613, 33.7100611 -84.4028091, 33.7802888 -84.3279648, 33.7942704 -84.4412613))"],
        "ID":"284600596"}]
  }}



In reply to the post by Ere Maijala of 7/25/19, 2:51 AM.

Re: Solr Geospatial Polygon Indexing/Querying Issue

david.w.smiley@gmail.com
In reply to this post by Sanders, Marshall (CAI - Atlanta)
Hello Marshall,

I worked on a lot of this functionality.  I have lots to say:

* Personally, I find it highly confusing to have a field named "latlng" and
have it be anything other than a simple point -- it's all you have if given
a single latitude longitude pair.  If you intend for the data to be a
circle (either exactly or approximated) then perhaps call it latLngCircle
* geodist() and for that matter any other attempt to get the distance to a
non-point shape is not going to work -- either error or confusing results;
I forget.  This is hard to do and the logic isn't there for it, and
probably wouldn't perform to users' expectations if it did.  This ought to
be documented but seems not to be.
* Generally RptWithGeometrySpatialField should be used
over SpatialRecursivePrefixTreeFieldType unless you want heatmaps or are
willing to make trade-offs in higher index size and lossy precision in
order to get faster search.  It's up to you; if you benchmark both I'd love
to hear how it went.
* In WKT format, the ordinate order is "X Y" (thus longitude then
latitude).  Looking at your triangle, it is extremely close to Antarctica,
and I'm skeptical you intended that. This is not directly documented AFAICT
but it's such a common mistake that it ought to be called out in the docs.
* I see you are using Geo3D, which is not the default.  Geo3D is strict
about the coordinate order -- counter-clockwise.  Your triangle is
clockwise and thus it has an inverted interpretation -- thus it's a shape
that covers nearly the whole globe.  I recently documented this
https://issues.apache.org/jira/browse/SOLR-13467 but it's not published yet
since it's so new.
* You can absolutely index a circle in Solr -- this is something cool and
somewhat unique. And you don't need format=legacy.  The documentation needs
to call this out better, though it at least refers to circles as a
"buffered point" which is the currently supported way of representing it,
and it does have one example.  Search for "BUFFER" and you'll see a
WKT-like syntax to do it.  BUFFER is not standard WKT; it was added on to
do this.  The first arg is an X Y center, and the 2nd arg is a distance in
decimal degrees (not km).  BTW Geo3D is a good choice here but not
essential either.
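
The two polygon points above (X Y order and counter-clockwise winding) can be sketched as a small client-side helper. This is only an illustration under stated assumptions: the function names are hypothetical, and the winding check uses the planar shoelace formula on lon/lat values, which is a reasonable approximation only for small polygons away from the poles and the antimeridian.

```python
from typing import List, Tuple


def signed_area(ring_xy: List[Tuple[float, float]]) -> float:
    """Planar shoelace area of a closed ring; positive means counter-clockwise."""
    total = 0.0
    for (x1, y1), (x2, y2) in zip(ring_xy, ring_xy[1:]):
        total += x1 * y2 - x2 * y1
    return total / 2.0


def to_wkt_polygon(points_latlon: List[Tuple[float, float]]) -> str:
    """Turn (lat, lon) pairs into a WKT POLYGON in X Y (lon lat) order, CCW."""
    ring = [(lon, lat) for lat, lon in points_latlon]  # swap to X Y
    if ring[0] != ring[-1]:
        ring.append(ring[0])  # WKT rings must be closed
    if signed_area(ring) < 0:
        ring.reverse()  # clockwise input: flip to counter-clockwise
    coords = ", ".join(f"{x} {y}" for x, y in ring)
    return f"POLYGON(({coords}))"


# The triangle from the original post, given as (lat, lon) pairs.
triangle = [
    (33.7942704, -84.4412613),
    (33.7100611, -84.4028091),
    (33.7802888, -84.3279648),
    (33.7942704, -84.4412613),
]
wkt = to_wkt_polygon(triangle)
print(wkt)
```

Normalizing the winding on the client keeps the indexed shape independent of the order in which the vertices happened to be collected.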

Back to your core requirement -- you want to index circles and sort results
by distance.  Can you please elaborate better on this... distance to the
outer ring of the circle or the center point?  Center point is easy to do
simply by putting the center point additionally in a field using
LatLonPointSpatialField and use geodist referring to that.

Also, FYI: geodist() is a function that can take arguments directly, which makes
more sense when multiple spatial fields are in play.  Sadly this aspect is
not documented.  Suffice it to say, if you do geodist(latLng) (maybe
quoted?) then it'll use that field, and parse "pt" param from the request.
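
A rough sketch of the center-point approach, assuming a second field that holds the circle's center as a plain "lat,lon" point. The field name centerPoint, the example center value, and the helper name are hypothetical, and the geodist(field) form is taken from the description above rather than from the published documentation, so it should be verified against your Solr version.

```python
from urllib.parse import urlencode

# Hypothetical document: the shape field is omitted here, and "centerPoint" is
# an assumed LatLonPointSpatialField holding a made-up "lat,lon" center.
doc = {
    "ID": "284598223",
    "centerPoint": "33.76,-84.39",
}


def distance_sort_params(point_field: str, lat: float, lon: float) -> dict:
    """Request params that sort by distance from pt to the given point field."""
    return {
        "q": "*:*",
        "pt": f"{lat},{lon}",
        "fl": f"ID,dist:geodist({point_field})",
        "sort": f"geodist({point_field}) asc",
    }


# The query point is the pt value used earlier in this thread.
params = distance_sort_params("centerPoint", 33.9798087, -94.3286133)
query_string = urlencode(params)
print(query_string)
```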

~ David Smiley
Apache Lucene/Solr Search Developer
http://www.linkedin.com/in/davidwsmiley


In reply to the post of Tue, Jul 23, 2019 at 2:32 PM.

Re: Solr Geospatial Polygon Indexing/Querying Issue

Sanders, Marshall (CAI - Atlanta)
David,

Firstly, thanks for putting together such a thorough email. It helps a lot to understand some of the things we were just guessing at because (as you mentioned a few times) the documentation around all of this is rather sparse.

I’ll explain the context around the use case we’re trying to solve and then attempt to respond as best I can to each of your points.  We have a list of documents whose location is sometimes a point and sometimes a circle.  These represent inventory at a physical location (the point case) or inventory that can be delivered to you within X km, configurable per document (the circle case).  We want to allow a user to ask for all documents within X distance of their location, plus all documents that can be delivered to their point, where the delivery distance is defined on the inventory (creating the circle).

This is why we were trying to combine both point data and polygon/circle data into a single geospatial field, since I don’t believe you can do something like fq=geofilt(latlng, x, y, d) OR geofilt(latlngCircle, x, y, 1). Perhaps we’re just not getting quite the right syntax, though.

* Personally, I find it highly confusing to have a field named "latlng" and have it be anything other than a simple point -- it's all you have if given a single latitude longitude pair.  If you intend for the data to be a circle (either exactly or approximated) then perhaps call it latLngCircle

        - This is happening because we’re trying to combine two different use cases into a single field, since I don’t think we have that option from the query side.  The name is really just us re-using our current field for this exploration, but would probably end up being named something different.

* geodist() and for that matter any other attempt to get the distance to a non-point shape is not going to work -- either error or confusing results; I forget.  This is hard to do and the logic isn't there for it, and probably wouldn't perform to user's expectations if it did.  This ought to be documented but seems not to be.

        -Good to know, so no matter what we’ll have to have a point value stored somewhere for each document and calculate geodist on that.

* Generally RptWithGeometrySpatialField should be used over SpatialRecursivePrefixTreeFieldType unless you want heatmaps or are willing to make trade-offs in higher index size and lossy precision in order to get faster search.  It's up to you; if you benchmark both I'd love to hear how it went.

        -We may explore both, but typically we’re more interested in speed than accuracy. Benchmarking may be a very interesting exercise, however.  For sorting, for instance, we’re actually using sqedist instead of geodist because we’re not overly concerned about sorting accuracy.

* In WKT format, the ordinate order is "X Y" (thus longitude then latitude).  Looking at your triangle, it is extremely close to Antarctica, and I'm skeptical you intended that. This is not directly documented AFAICT but it's such a common mistake that it ought to be called out in the docs.

        -Definitely did not intend it to be close to Antarctica.  I think we tried both orders but probably went back to lat,long, which was definitely the more common order in our (failed) testing.


* I see you are using Geo3D, which is not the default.  Geo3D is strict about the coordinate order -- counter-clockwise.  Your triangle is clockwise and thus it has an inverted interpretation -- thus it's a shape that covers nearly the whole globe.  I recently documented this https://issues.apache.org/jira/browse/SOLR-13467 but it's not published yet since it's so new.

        - Thanks for this clarification as well.  I had read this in the WKT docs too; it was something we tried, but we really weren’t sure what the right answer was and had been going back and forth on it.  The documentation seems to say that you need to specify either JTS or Geo3D, but doesn’t provide much guidance about which to use when. Since JTS requires adding another jar manually, which complicates our build process significantly (at least vs. using Geo3D), we tried Geo3D.  I’d love to hear more about the tradeoffs and other considerations between the two, but it sounds like we should switch to JTS (the default, correct?).


* You can absolutely index a circle in Solr -- this is something cool and somewhat unique. And you don't need format=legacy.  The documentation needs to call this out better, though it at least refers to circles as a "buffered point" which is the currently supported way of representing it, and it does have one example.  Search for "BUFFER" and you'll see a WKT-like syntax to do it.  BUFFER is not standard WKT; it was added on to do this.  The first arg is a X Y center, and 2nd arg is a distance in decimal degrees (not km).  BTW Geo3D is a good choice here but not essential either.

- This sounds very promising and we’ll definitely spend some time here because it may ultimately be what we really want to use. It sounds like Geo3D may actually be the right choice now?
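
A minimal sketch of building such a buffered point from a center and a delivery radius in km. Assumptions to verify before relying on it: the BUFFER(POINT(x y), d) spelling is my understanding of the WKT-like syntax described above, the km-to-degrees conversion assumes a spherical Earth with a mean radius of 6371.0087714 km, and the function names and the example center are made up for illustration.

```python
import math

# Assumption: spherical Earth with this mean radius in km.
EARTH_MEAN_RADIUS_KM = 6371.0087714


def km_to_degrees(km: float) -> float:
    """Convert a great-circle distance in km to decimal degrees of arc."""
    return math.degrees(km / EARTH_MEAN_RADIUS_KM)


def buffered_point_wkt(lat: float, lon: float, radius_km: float) -> str:
    """WKT-like circle: X Y (lon lat) center plus a radius in decimal degrees."""
    radius_deg = km_to_degrees(radius_km)
    return f"BUFFER(POINT({lon} {lat}), {radius_deg:.6f})"


# Hypothetical center with a 20 km delivery radius.
circle = buffered_point_wkt(33.76, -84.39, 20.0)
print(circle)
```

Doing the conversion at index time keeps the per-document delivery distance in km in the source data while the indexed shape uses the degrees that the syntax expects.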

One other question I have is what the behavior will be if both my point and my search radius are entirely inside the circle/polygon.  For example, geofilt(x,y,10) against a buffered point of BUFFER(x y 20km) (in km instead of decimal degrees for simplicity).  Will this document be returned even though my filter is entirely inside the polygon, or is it looking for edge intersections?

Thanks so much for the response/help!!!


-Marshall Sanders



On 7/26/19, 12:01 AM, "David Smiley" <[hidden email]> wrote:

    Hello Marshall,
   
    I worked on a lot of this functionality.  I have lots to say:
   
    * Personally, I find it highly confusing to have a field named "latlng" and
    have it be anything other than a simple point -- it's all you have if given
    a single latitude longitude pair.  If you intend for the data to be a
    circle (either exactly or approximated) then perhaps call it latLngCircle
    * geodist() and for that matter any other attempt to get the distance to a
    non-point shape is not going to work -- either error or confusing results;
    I forget.  This is hard to do and the logic isn't there for it, and
    probably wouldn't perform to user's expectations if it did.  This ought to
    be documented but seems not to be.
    * Generally RptWithGeometrySpatialField should be used
    over SpatialRecursivePrefixTreeFieldType unless you want heatmaps or are
    willing to make trade-offs in higher index size and lossy precision in
    order to get faster search.  It's up to you; if you benchmark both I'd love
    to hear how it went.
    * In WKT format, the ordinate order is "X Y" (thus longitude then
    latitude).  Looking at your triangle, it is extremely close to Antarctica,
    and I'm skeptical you intended that. This is not directly documented AFAICT
    but it's such a common mistake that it ought to be called out in the docs.
    * I see you are using Geo3D, which is not the default.  Geo3D is strict
    about the coordinate order -- counter-clockwise.  Your triangle is
    clockwise and thus it has an inverted interpretation -- thus it's a shape
    that covers nearly the whole globe.  I recently documented this
    https://issues.apache.org/jira/browse/SOLR-13467 but it's not published yet
    since it's so new.
    * You can absolutely index a circle in Solr -- this is something cool and
    somewhat unique. And you don't need format=legacy.  The documentation needs
    to call this out better, though it at least refers to circles as a
    "buffered point" which is the currently supported way of representing it,
    and it does have one example.  Search for "BUFFER" and you'll see a
    WKT-like syntax to do it.  BUFFER is not standard WKT; it was added on to
    do this.  The first arg is a X Y center, and 2nd arg is a distance in
    decimal degrees (not km).  BTW Geo3D is a good choice here but not
    essential either.
   
    Back to your core requirement -- you want to index circles and sort results
    by distance.  Can you please elaborate better on this... distance to the
    outer ring of the circle or the center point?  Center point is easy to do
    simply by putting the center point additionally in a field using
    LatLonPointSpatialField and use geodist referring to that.
   
    Also, FYI geodist() is a function that can take arguments directly, which makes
    more sense when multiple spatial fields are in play.  Sadly this aspect is
    not documented.  Suffice it to say, if you do geodist(latLng) (maybe
    quoted?) then it'll use that field, and parse "pt" param from the request.
   
    ~ David Smiley
    Apache Lucene/Solr Search Developer
    http://www.linkedin.com/in/davidwsmiley
   
   
    On Tue, Jul 23, 2019 at 2:32 PM Sanders, Marshall (CAI - Atlanta) <
    [hidden email]> wrote:
   
    > We’re trying to index a polygon into solr and then filter/calculate
    > geodist on the polygon (ideally we actually want a circle, but it looks
    > like that’s not really supported officially by wkt/geojson and instead you
    > have to switch format=”legacy” which seems like something that might be
    > removed in the future so don’t want to rely on it).
    >
    > Here’s the info from schema:
    > <field name="latlng" type="location_rpt" indexed="true" stored="true"
    > multiValued="true"/>
    >
    > <fieldType name="location_rpt"
    > class="solr.SpatialRecursivePrefixTreeFieldType"
    >                    geo="true" distErrPct="0.025" maxDistErr="0.000009"
    > distanceUnits="kilometers"
    >                     spatialContextFactory="Geo3D"/>
    >
    >
    > We’ve tried indexing some different data, but to keep it as simple as
    > possible we started with a triangle (will eventually add more points to
    > approximate a circle).  Here’s an example document that we’ve added just
    > for testing:
    >
    > {
    > "latlng": ["POLYGON((33.7942704 -84.4412613, 33.7100611 -84.4028091,
    > 33.7802888 -84.3279648, 33.7942704 -84.4412613))"],
    > "ID": "284598223"
    > }
    >
    >
    > However, it seems like filtering/distance calculations aren’t working (at
    > least not the way we are used to doing it for points).  Here’s an example
    > query where the pt is several hundred kilometers away from the polygon, yet
    > the document still returns.  Also, it seems that regardless of origin point
    > or polygon location the calculated geodist is always 20015.115
    >
    > Example query:
    >
    > select?d=1&fl=ID,latlng,geodist()&fq=%7B!geofilt%7D&indent=on&pt=33.9798087,-94.3286133&q=*:*&sfield=latlng&wt=json
    >
    > Example documents coming back anyway:
    > "docs": [
    > {
    > "latlng": ["POLYGON((33.7942704 -84.4412613, 33.7100611 -84.4028091,
    > 33.7802888 -84.3279648, 33.7942704 -84.4412613))"],
    > "ID": "284598223",
    > "geodist()": 20015.115
    > },
    > {
    > "latlng": ["POLYGON((33.7942704 -84.4412613, 33.7100611 -84.4028091,
    > 33.7802888 -84.3279648, 33.7942704 -84.4412613))"],
    > "ID": "284600596",
    > "geodist()": 20015.115
    > }
    > ]
    >
    >
    > Anyone who has experience in this area can you point us in the right
    > direction about what we’re doing incorrectly with either how we are
    > indexing the data and/or how we are querying against the polygons.
    >
    > Thank you,
    >
    >
    > --
    > Marshall Sanders
    > Principal Software Engineer
    > Autotrader.com
    > [hidden email]
    >
    >
    >
   


Re: Solr Geospatial Polygon Indexing/Querying Issue

david.w.smiley@gmail.com
On Tue, Jul 30, 2019 at 4:41 PM Sanders, Marshall (CAI - Atlanta) <
[hidden email]> wrote:

> I’ll explain the context around the use case we’re trying to solve and
> then attempt to respond as best I can to each of your points.  What we have
> is a list of documents that in our case the location is sometimes a point
> and sometimes a circle.  These basically represent (in our case) inventory
> at a physical location (point) or inventory that can be delivered to you
> within X km (configurable per document) which represents the circle use
> case.  We want to be able to allow a user to say I want all documents
> within X distance of my location, but also all documents that are able to
> be delivered to your point where the delivery distance is defined on the
> inventory (creating the circle).
>

That background info helps me understand things!


> This is why we were actually trying to combine both point based data and
> poly/circle data into a single geospatial field, since I don’t believe you
> could do something like fq=geofilt(latlng, x, y, d) OR
> geofilt(latlngCircle, x, y, 1) but perhaps we’re just not getting quite the
> right syntax, etc.
>

Oh, quite possible :-).  It would look something like this:

  fq= {!geofilt sfield=latLng d=queryDistance} OR {!geofilt sfield=latLngCircle d=0}&pt=myLocation

Notice the space after the fq=, which is critical so that the first
local-params (i.e. first geofilt) does not "own" the entire filter query
string end to end.  Due to the space, the whole thing is parsed by the
default lucene/standard query parser, and then we have the two clauses
clearly there.  The second geofilt has distance 0; it'd be nice if it
internally optimized to a point but nonetheless it's fine.  Alternatively
there's another syntax to embed WKT where you can specify a point
explicitly, something like this:

  ...  {!field f=latLngCircle v="Intersects(POINT(x y))"}

That said, it's also just fine to do as you were planning -- have one RPT
based field for the shape representation (mixture of points and circles),
and one LatLonPointSpatialField (LLPSF) field purely for the center point
that is used for sorting.
That LLPSF field would be indexed=false docValues=true since you wouldn't
be filtering on it.
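
A minimal sketch of how a client might assemble the request described above, assuming Python on the client side. The field names latLng and latLngCircle come from this thread; the center-point field name latLngCenter and the helper build_params are hypothetical. Sorting here uses the plain geodist() form with a top-level sfield parameter, as in the original query; the two geofilt clauses carry their own sfield in local-params, so the top-level sfield is only picked up by geodist().

```python
from urllib.parse import urlencode, parse_qs


def build_params(lat, lon, query_distance_km, center_field="latLngCenter"):
    """Build Solr request parameters for the two-clause filter.

    center_field is a hypothetical name for the center-point field
    used only for sorting.
    """
    # The leading space matters: it keeps the first {!geofilt ...} from
    # "owning" the whole string, so the default parser sees two clauses.
    fq = (
        " {!geofilt sfield=latLng d=%s}"
        " OR {!geofilt sfield=latLngCircle d=0}" % query_distance_km
    )
    return {
        "q": "*:*",
        "fq": fq,
        "pt": f"{lat},{lon}",       # geofilt and geodist expect "lat,lon"
        "sfield": center_field,      # read by geodist() for sorting
        "sort": "geodist() asc",
    }


params = build_params(33.9798087, -94.3286133, 50)
query_string = urlencode(params)

# Round-trip check: the leading space survives URL encoding.
decoded_fq = parse_qs(query_string)["fq"][0]
print(repr(decoded_fq))
```

If the HTTP client trims parameter values before encoding them, the leading space is lost and the filter will be parsed differently, so it is worth checking the encoded query string once.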

>
> * Generally RptWithGeometrySpatialField should be used over
> SpatialRecursivePrefixTreeFieldType unless you want heatmaps or are willing
> to make trade-offs in higher index size and lossy precision in order to get
> faster search.  It's up to you; if you benchmark both I'd love to hear how
> it went.
>
>         -We may explore both but typically we’re more interested in speed
> than accuracy, benchmarking it may be a very interesting exercise however.
> For sorting for instance we’re actually using sqedist instead of geodist
> because we’re not overly concerned about sorting accuracy.
>

Okay... though geodist on a LLPSF field is remarkably optimized.


> * I see you are using Geo3D, which is not the default.  Geo3D is strict
> about the coordinate order -- counter-clockwise.  Your triangle is
> clockwise and thus it has an inverted interpretation -- thus it's a shape
> that covers nearly the whole globe.  I recently documented this
> https://issues.apache.org/jira/browse/SOLR-13467 but it's not published
> yet since it's so new.
>
>         - Thanks for this clarification as well.  I had read this in the
> WKT docs too, again something we tried but really weren’t sure about what
> the right answer was and had been going back and forth on.  The
> documentation seems to specify that you need to specify either JTS or
> Geo3d, but doesn’t provide much info/guidance about which to use when and
> since JTS required adding another jar manually and therefore complicates
> our build process significantly (at least vs using Geo3D) we tried Geo3D.
> I’d love to hear more about the tradeoffs and other considerations between
> the two, but sounds like we should switch to JTS (the default, correct?)
>

The default spatialContextFactory is something internal; not JTS or Geo3D.
Based on your requirements, you needn't specify either JTS or Geo3D, mostly
because you don't actually need polygons.  I wouldn't bother specifying it
unless you want to experiment with some benchmarking.  JTS would give you
nothing here but Geo3D + prefixTree=S2 (in Solr 8.2) might be faster.
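
For anyone who does end up indexing polygons with Geo3D, the two pitfalls raised earlier in the thread (WKT is "X Y", i.e. longitude then latitude, and the ring must be counter-clockwise) can be handled before indexing. The following is a minimal sketch under stated assumptions: the helper names signed_area and to_wkt_polygon are hypothetical, and the winding test uses a planar shoelace formula, which is only reasonable for small polygons that do not cross the antimeridian or a pole.

```python
def signed_area(ring):
    """Shoelace signed area of an open ring of (x, y) pairs.

    Positive means counter-clockwise, negative means clockwise.
    """
    total = 0.0
    for (x1, y1), (x2, y2) in zip(ring, ring[1:] + ring[:1]):
        total += x1 * y2 - x2 * y1
    return total / 2.0


def to_wkt_polygon(latlon_points):
    """Turn (lat, lon) pairs into a counter-clockwise WKT POLYGON."""
    # WKT ordinate order is "X Y", so swap to (lon, lat).
    ring = [(lon, lat) for lat, lon in latlon_points]
    # Work on an open ring; drop a repeated closing vertex if present.
    if len(ring) > 1 and ring[0] == ring[-1]:
        ring = ring[:-1]
    if len(ring) < 3:
        raise ValueError("a polygon needs at least three distinct vertices")
    # Reverse clockwise rings so the enclosed area is the small one.
    if signed_area(ring) < 0:
        ring.reverse()
    ring.append(ring[0])  # WKT rings repeat the first vertex at the end
    coords = ", ".join(f"{x} {y}" for x, y in ring)
    return f"POLYGON(({coords}))"


# The triangle from the original post, given as (lat, lon) pairs.
triangle = [
    (33.7942704, -84.4412613),
    (33.7100611, -84.4028091),
    (33.7802888, -84.3279648),
    (33.7942704, -84.4412613),
]
print(to_wkt_polygon(triangle))
```

Note that swapping the two ordinates also flips the winding direction, so the order check has to run after the swap, not before.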


> * You can absolutely index a circle in Solr -- this is something cool and
> somewhat unique. And you don't need format=legacy.  The documentation needs
> to call this out better, though it at least refers to circles as a "buffered
> point" which is the currently supported way of representing it, and it does
> have one example.  Search for "BUFFER" and you'll see a WKT-like syntax to
> do it.  BUFFER is not standard WKT; it was added on to do this.  The first
> arg is a X Y center, and 2nd arg is a distance in decimal degrees (not
> km).  BTW Geo3D is a good choice here but not essential either.
>
> -       This sounds very promising and we’ll definitely spend some time
> here because it may ultimately be what we really want to use, sounds like
> Geo3D may actually be the right choice now?
>

RE Geo3D, shrug... maybe Geo3D with S2 could be a little faster, perhaps.
For lots of circles perhaps, but for points, I doubt it.
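
Since the BUFFER distance quoted above is in decimal degrees while the per-document delivery radius is in km, the value needs converting at index time. A minimal sketch, with these assumptions: the mean Earth radius constant is an assumed value, the helper names km_to_degrees and buffered_point_wkt are hypothetical, and the exact BUFFER(POINT(x y), d) spelling should be checked against the example in the reference guide.

```python
import math

# Assumed mean Earth radius in km; adjust if your setup uses another value.
EARTH_MEAN_RADIUS_KM = 6371.0087714


def km_to_degrees(km):
    """Convert a great-circle distance in km to degrees of arc."""
    return math.degrees(km / EARTH_MEAN_RADIUS_KM)


def buffered_point_wkt(lat, lon, radius_km):
    """Build a WKT-like buffered point (circle) for a delivery radius.

    The center is written in X Y order (longitude first) and the
    radius is converted from km to decimal degrees.
    """
    radius_deg = km_to_degrees(radius_km)
    return f"BUFFER(POINT({lon} {lat}), {radius_deg:.6f})"


# A 20 km delivery radius around an illustrative point.
print(buffered_point_wkt(33.7, -84.4, 20))
```

One degree of arc works out to roughly 111.2 km with this radius, so a 20 km delivery radius is a bit under 0.18 degrees.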


> One other question I have is what the behavior will be if both my point
> and my search radius are inside of the circle/polygon entirely?  Like
> geofilt(x,y,10) and a buffered point (in km instead of decimal degrees for
> simplicity) of BUFFER(x y 20km).  Will this document return even though my
> filter is entirely inside the polygon, or is it looking for edge
> intersections?
>

It would match; the geofilt and buffered point both represent regions that
intersect.
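
To make the "regions that intersect" point concrete: two circles overlap whenever the distance between their centers is no more than the sum of their radii, so a small query circle lying entirely inside a larger delivery circle still matches. The sketch below is only an illustration of that geometry using the haversine formula; it is not how Solr evaluates the filter internally, and the coordinates, Earth radius, and function names are assumptions.

```python
import math


def haversine_km(lat1, lon1, lat2, lon2, radius_km=6371.0087714):
    """Great-circle distance in km between two (lat, lon) points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * radius_km * math.asin(math.sqrt(a))


def circles_intersect(center1, r1_km, center2, r2_km):
    """True when two circles share at least one point.

    Containment counts as intersection: the test only compares the
    center distance with the sum of the radii.
    """
    return haversine_km(*center1, *center2) <= r1_km + r2_km


delivery_center = (33.76, -84.40)   # illustrative 20 km delivery circle
nearby_user = (33.75, -84.39)       # about 1.4 km away
far_user = (33.9798087, -94.3286133)  # the pt from the original query

print(circles_intersect(nearby_user, 10, delivery_center, 20))
print(circles_intersect(far_user, 1, delivery_center, 20))
```

The same comparison with a query radius of 0 reduces to a point-in-circle test, which corresponds to the d=0 geofilt clause suggested earlier.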

No prob.