Geospatial clustering + zoom in/out help


Bojan Šmid
Hi,

I have an index with 300K docs with lat,lon. I need to cluster the docs
based on lat,lon for display in the UI. The user then needs to be able to
click on any cluster and zoom in (up to 11 levels deep).

I'm using Solr 4.6 and I'm wondering how best to implement this efficiently.

More specific questions follow.

I need to:

1) cluster data points at different zoom levels

2) click on a specific cluster and zoom in

3) be able to select a region (bounding box or polygon) and show clusters
in the selected area

What's the best way to implement this so that queries are fast?

What I thought I would try, but maybe there are better ways:

* divide the world into NxM large squares and then each of these squares into
4 more squares, and so on - 11 levels deep

* at index time figure out all squares (at all 11 levels) each data point
belongs to and index that info into 11 different fields: e.g.
<id=1 name=foo lat=x lon=y zoom1=square1_62  zoom2=square1_62_47
zoom3=square1_62_47_33 ....>

* at search time, use field collapsing on the zoomX field to find which docs
belong to which square at a particular level

* calculate center point of each square (by calculating mean value of
positions for all points in that square) using StatsComponent (facet on
zoomX field, avg on lat and lon fields) - I would consider those squares as
separate clusters (one square is one cluster) and center points of those
squares as center points of clusters derived from them
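
A minimal Python sketch of the steps above, for illustration only. The 8x4
top-level grid, the id format and the helper names are assumptions rather than
anything from an existing library, and the centroid is computed locally here
to mimic what a per-square mean of lat and lon would return. The last helper
builds StatsComponent request parameters (stats, stats.field, stats.facet)
using the lat, lon and zoomX field names from the example document above.

```python
from collections import defaultdict
from urllib.parse import urlencode

LEVELS = 11                    # zoom depth from the post
BASE_COLS, BASE_ROWS = 8, 4    # assumed N x M for the top-level grid


def cell_ids(lat, lon, levels=LEVELS):
    """Return one cell id per zoom level; each id extends its parent's id."""
    # Map lon/lat to fractions of the world in [0, 1), clamping the edges.
    x = min(max((lon + 180.0) / 360.0, 0.0), 1.0 - 1e-12)
    y = min(max((lat + 90.0) / 180.0, 0.0), 1.0 - 1e-12)
    cols, rows = BASE_COLS, BASE_ROWS
    parts = [str(int(y * rows) * cols + int(x * cols))]  # top-level square
    ids = [parts[0]]
    for _ in range(levels - 1):
        cols, rows = cols * 2, rows * 2          # split every square into 4
        col, row = int(x * cols), int(y * rows)
        parts.append(str((row % 2) * 2 + (col % 2)))  # quadrant 0..3
        ids.append("_".join(parts))
    return ids


def zoom_fields(lat, lon):
    """Index-time fields zoom1..zoom11 for one document."""
    return {f"zoom{i + 1}": cid for i, cid in enumerate(cell_ids(lat, lon))}


def centroids(points, level):
    """Group (lat, lon) points by square at `level`; return mean lat/lon and count."""
    sums = defaultdict(lambda: [0.0, 0.0, 0])
    for lat, lon in points:
        acc = sums[cell_ids(lat, lon)[level - 1]]
        acc[0] += lat
        acc[1] += lon
        acc[2] += 1
    return {k: (s[0] / s[2], s[1] / s[2], s[2]) for k, s in sums.items()}


def stats_params(level):
    """Query string asking Solr for lat/lon stats faceted by the zoom field."""
    return urlencode([
        ("q", "*:*"), ("rows", "0"), ("stats", "true"),
        ("stats.field", "lat"), ("stats.field", "lon"),
        ("stats.facet", f"zoom{level}"),
    ])
```

Because each id starts with its parent's id, zooming into a cluster could be a
prefix or exact-match filter on the clicked square's id at the next level.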

I *think* the problem with this approach is that:

* there will be many unique values in the zoomX fields at deeper zoom levels,
which means field collapsing / StatsComponent may not be fast enough

* clusters will not look very natural because I would have many clusters on
each zoom level, and "real" geographical clusters would sometimes be
displayed as multiple clusters because their points are spread across
several squares. But that may be OK.

* a lot will depend on how the squares are calculated - linearly dividing
360 degrees by N to get "equal" size squares in degrees would produce
issues with "real" square sizes and counts of points in each of them
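
To put a rough number on the last point, here is a small Python sketch,
assuming a spherical Earth with a mean radius of 6371 km (an approximation;
the function name is made up for illustration). It estimates how wide and how
tall a square of a given size in degrees is at a given latitude.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumed mean radius, spherical approximation


def cell_size_km(lat_deg, cell_deg):
    """Approximate (east-west, north-south) extent in km of a square
    that spans cell_deg degrees in both directions at latitude lat_deg."""
    km_per_deg = 2 * math.pi * EARTH_RADIUS_KM / 360.0
    north_south = cell_deg * km_per_deg
    # Meridians converge towards the poles, so width shrinks with cos(lat).
    east_west = north_south * math.cos(math.radians(lat_deg))
    return east_west, north_south


for lat in (0, 30, 60):
    ew, ns = cell_size_km(lat, 1.0)
    print(f"lat {lat:2d}: {ew:6.1f} km wide, {ns:6.1f} km tall")
```

Under this approximation a square at 60 degrees latitude is about half as wide
on the ground as the same square at the equator, so squares that are equal in
degrees cover quite different areas.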


So I'm wondering if there is a better way?

Thanks,


  Bojan
RE: Geospatial clustering + zoom in/out help

David Smiley
Hi Bojan.

You've got some good ideas here, along the lines of some that others have tried.  I threw together a page on the wiki about this subject some time ago that I'm sure you will find interesting.  It references a relevant Stack Overflow post, and also a presentation at DrupalCon which had a segment from a guy using the same approach you suggest here, involving field collapsing and/or the stats component.  The video shows it in action.

http://wiki.apache.org/solr/SpatialClustering

It would be helpful for everyone if you share your experience with whatever you choose, once you give an approach a try.

~ David
________________________________________
From: Bojan Šmid [[hidden email]]
Sent: Thursday, January 30, 2014 1:15 PM
To: [hidden email]
Subject: Geospatial clustering + zoom in/out help

Re: Geospatial clustering + zoom in/out help

Bojan Šmid
Hi David,

  I was hoping to get an answer on a geospatial topic from you :). These
links basically confirm that the approach I wanted to take should work OK
with a similar (or even bigger) amount of data than I plan to have. Instead
of my custom NxM division of the world, I'll try the existing geohash
encoding; it may be good enough (and will be quicker to implement).
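
For reference, a minimal Python sketch of the standard geohash encoding
(bits interleaved starting with longitude, base-32 alphabet without a, i, l
and o). This is a plain illustration, not Solr's own implementation, and the
function name is an assumption. A shorter geohash is a prefix of a longer one
for the same point, which is what makes it usable as a zoom hierarchy; note
that each extra character splits a cell into 32 sub-cells rather than 4.

```python
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"


def geohash(lat, lon, precision=11):
    """Encode a lat/lon pair as a geohash string of `precision` characters."""
    lat_lo, lat_hi = -90.0, 90.0
    lon_lo, lon_hi = -180.0, 180.0
    chars = []
    bits, nbits = 0, 0
    use_lon = True  # bits alternate, starting with longitude
    while len(chars) < precision:
        if use_lon:
            mid = (lon_lo + lon_hi) / 2
            if lon >= mid:
                bits, lon_lo = (bits << 1) | 1, mid
            else:
                bits, lon_hi = bits << 1, mid
        else:
            mid = (lat_lo + lat_hi) / 2
            if lat >= mid:
                bits, lat_lo = (bits << 1) | 1, mid
            else:
                bits, lat_hi = bits << 1, mid
        use_lon = not use_lon
        nbits += 1
        if nbits == 5:  # every 5 bits become one base-32 character
            chars.append(BASE32[bits])
            bits, nbits = 0, 0
    return "".join(chars)


# One value per zoom level: the first k characters of the full hash.
full = geohash(57.64911, 10.40744)
zoom_values = [full[:k] for k in range(1, len(full) + 1)]
```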

  Thanks!

  Bojan


On Fri, Jan 31, 2014 at 8:27 PM, Smiley, David W. <[hidden email]> wrote:
