[jira] [Commented] (SOLR-2155) Geospatial search using geohash prefixes

ASF GitHub Bot (Jira)

    [ https://issues.apache.org/jira/browse/SOLR-2155?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13114839#comment-13114839 ]

geert-jan brits commented on SOLR-2155:
---------------------------------------

David,

I don't want to swamp this discussion, but I have a completely different problem for which I might be able to (mis)use this patch / LSP.

It concerns POIs that have multiple opening hours (depending on the day of the week, special holidays, and sometimes even multiple time slots per day).
I want to query, for example, for all POIs that are open NOW and that will remain open until NOW+3H.

For background, see http://lucene.472066.n3.nabble.com/multiple-dateranges-timeslots-per-doc-modeling-openinghours-td3368790.html on why the normal approaches don't work (afaik). Basically, a document needs multiple opening/closing times, and each opening time must stay paired with its own closing time.
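
To make the pairing requirement concrete, here is a minimal Python sketch (not Solr code; the data layout and the helper names `matches_flat` and `matches_paired` are made up for illustration). It shows how storing opening and closing times as two independent multivalued fields can produce a false match across time slots.

```python
# Two time slots for one POI, in minutes since midnight:
# 08:00-12:00 and 14:00-19:00.
slots = [(480, 720), (840, 1140)]

# Flattened layout: two independent multivalued fields.
open_times = [o for o, _ in slots]
close_times = [c for _, c in slots]


def matches_flat(start, end):
    """Match if ANY open <= start and ANY close >= end (pairing is lost)."""
    return any(o <= start for o in open_times) and any(c >= end for c in close_times)


def matches_paired(start, end):
    """Match only if a SINGLE slot covers the whole requested window."""
    return any(o <= start and c >= end for o, c in slots)


# The requested window 11:00-15:00 spans the midday break.
print(matches_flat(660, 900))    # True  (false positive)
print(matches_paired(660, 900))  # False (correct)
```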

I have the feeling that opening/closing datetimes might be modelled as multiple lat/long points. But I would need a query of the form:

Given a user defined point x, return all docs that have a point p defined for which:
 - x.latitude > p.latitude
 - x.longitude < p.longitude

Is this possible? (As far as I can see, GeoFilt, BBox and GeoDist don't provide what I need.)

Basically this is how I envision encoding it:
 - each <open,closedelta>-tuple is represented as a (lat/long) point
 - open is matched on latitude
 - closedelta (represented as a delta from open) is matched on longitude
 - granularity is 5 minutes
    - open can be at most 100 days in the future -> ~30,000 distinct values
    - closedelta can be at most 24 hours -> ~300 distinct values
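
The value counts above follow from the 5-minute granularity. Here is a small Python sketch of the arithmetic and of one possible encoding. The function `encode_slot` and the fixed reference time `epoch` are assumptions for illustration, not part of the patch.

```python
from datetime import datetime, timedelta

GRANULARITY = timedelta(minutes=5)
SLOTS_PER_DAY = timedelta(days=1) // GRANULARITY  # 288 steps per day

# 100 days of possible opening times, 24 hours of possible deltas.
print(100 * SLOTS_PER_DAY)  # 28800, i.e. roughly 30,000
print(SLOTS_PER_DAY)        # 288, i.e. roughly 300


def encode_slot(epoch, open_dt, close_dt):
    """Return (open, closedelta) as whole 5-minute steps."""
    open_steps = (open_dt - epoch) // GRANULARITY
    delta_steps = (close_dt - open_dt) // GRANULARITY
    return open_steps, delta_steps


epoch = datetime(2011, 9, 26)  # assumed reference time ("day 0")
print(encode_slot(epoch, datetime(2011, 9, 27, 9, 0), datetime(2011, 9, 27, 16, 0)))
# (396, 84)
```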

The above lat/long query applied to the domain would become:
Given a user-defined open/closedelta-datetime x, return all docs that have an open/close-datetime p defined for which:
 - x.open > p.open (poi is already open at requested opening time)
 - x.closedelta  < p.closedelta (poi is not yet closed on the requested closing time)

In other words, the poi is open from the requested open-datetime until at least the requested close-datetime.
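
A brute-force Python sketch of this query, without any index, may help pin down the intended semantics. The names `is_open_throughout` and `pois` are hypothetical. One assumption to flag: because each closedelta is measured from its own opening time, the sketch compares absolute closing times (open + closedelta) rather than the raw deltas.

```python
def is_open_throughout(slots, x_open, x_closedelta):
    """True if one (open, closedelta) slot covers the requested window.

    All values are integer 5-minute steps from a shared reference time.
    """
    x_close = x_open + x_closedelta
    return any(
        p_open <= x_open and p_open + p_closedelta >= x_close
        for p_open, p_closedelta in slots
    )


# Hypothetical POIs, each with one or more (open, closedelta) tuples.
pois = {
    "bakery": [(96, 48), (168, 60)],  # 08:00-12:00 and 14:00-19:00
    "kiosk": [(108, 96)],             # 09:00-17:00
}

# Who is open from 15:00 (step 180) for the next 3 hours (36 steps)?
hits = [name for name, slots in pois.items() if is_open_throughout(slots, 180, 36)]
print(hits)  # ['bakery']
```

Note that the kiosk has a larger closedelta than the request (96 versus 36 steps) but still closes too early, which is why the sketch works with absolute closing times.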

OK, writing this down was a good exercise. The question remains: is this query possible (perhaps with some coding effort)?

Thanks,
Geert-Jan  
 



> Geospatial search using geohash prefixes
> ----------------------------------------
>
>                 Key: SOLR-2155
>                 URL: https://issues.apache.org/jira/browse/SOLR-2155
>             Project: Solr
>          Issue Type: Improvement
>            Reporter: David Smiley
>            Assignee: Grant Ingersoll
>         Attachments: GeoHashPrefixFilter.patch, GeoHashPrefixFilter.patch, GeoHashPrefixFilter.patch, SOLR-2155_GeoHashPrefixFilter_with_sorting_no_poly.patch, SOLR.2155.p3.patch, SOLR.2155.p3tests.patch
>
>
> There currently isn't a solution in Solr for doing geospatial filtering on documents that have a variable number of points.  This scenario occurs when there is location extraction (i.e. via a "gazetteer") occurring on free text.  None, one, or many geospatial locations might be extracted from any given document and users want to limit their search results to those occurring in a user-specified area.
> I've implemented this by furthering the GeoHash based work in Lucene/Solr with a geohash prefix based filter.  A geohash refers to a lat-lon box on the earth.  Each successive character added further subdivides the box into a 4x8 (or 8x4 depending on the even/odd length of the geohash) grid.  The first step in this scheme is figuring out which geohash grid squares cover the user's search query.  I've added various extra methods to GeoHashUtils (and added tests) to assist in this purpose.  The next step is an actual Lucene Filter, GeoHashPrefixFilter, that uses these geohash prefixes in TermsEnum.seek() to skip to relevant grid squares in the index.  Once a matching geohash grid is found, the points therein are compared against the user's query to see if it matches.  I created an abstraction GeoShape extended by subclasses named PointDistance... and CartesianBox.... to support different queried shapes so that the filter need not care about these details.
> This work was presented at LuceneRevolution in Boston on October 8th.
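
For readers unfamiliar with geohashes, the prefix property described above can be illustrated with a minimal Python encoder. This is a sketch of the publicly documented geohash algorithm, not the patch's GeoHashUtils code; the function name `geohash_encode` is chosen here for illustration.

```python
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"


def geohash_encode(lat, lon, length):
    """Encode a lat/lon pair as a geohash of the given length."""
    lat_range, lon_range = [-90.0, 90.0], [-180.0, 180.0]
    chars, bits, value = [], 0, 0
    use_lon = True  # bits alternate, starting with longitude
    while len(chars) < length:
        rng, coord = (lon_range, lon) if use_lon else (lat_range, lat)
        mid = (rng[0] + rng[1]) / 2
        if coord >= mid:
            value = (value << 1) | 1
            rng[0] = mid  # keep the upper half
        else:
            value <<= 1
            rng[1] = mid  # keep the lower half
        use_lon = not use_lon
        bits += 1
        if bits == 5:  # every 5 bits yield one base-32 character
            chars.append(BASE32[value])
            bits, value = 0, 0
    return "".join(chars)


short = geohash_encode(57.64911, 10.40744, 4)
full = geohash_encode(57.64911, 10.40744, 9)
# A shorter hash is a prefix of any longer hash for the same point,
# which is what makes seeking by prefix in a sorted term index useful.
print(full.startswith(short))  # True
```

Each character carries 5 bits, so it splits the current box into 32 cells. Because the bits alternate between longitude and latitude, the split is 8x4 for one character and 4x8 for the next.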

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

       

---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

Opening hours search

Jan Høydahl / Cominvent
Hi Geert-Jan

I think this discussion is better suited to the dev list than to an issue about geospatial search, so I'm continuing it in this new thread.

Geospatial LatLonType is implemented as a PolyField http://wiki.apache.org/solr/SchemaXml#Poly_Field_Types
Another example is MoneyFieldType in SOLR-2202, which indexes both an amount value and a currency type together.

You could probably use Poly Fields for opening hours pairs as well. Imagine a TimeRangeFieldType which holds a from-to value between 00:00 and 23:59. And then feed a document with from-to pairs:
  <field name="open-mo">0900,1600</field>
  <field name="open-tu">0800,1200</field>
  <field name="open-tu">1400,1900</field>
  <field name="open-we">0900,1600</field>


I have no practical experience programming FieldTypes, but I imagine it could be done this way, so that you could query for e.g. open-tu:[1500 TO 1800] to find shops that are open on Tuesday from 15:00 for the next three hours.
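
The intended matching semantics might look like this in plain Python. This is only a sketch of the idea: `TimeRangeFieldType` is imagined, and the helper `covers` and the dictionary layout are assumptions made for illustration.

```python
# Document with multivalued from-to pairs per weekday (HHMM integers).
doc = {
    "open-mo": [(900, 1600)],
    "open-tu": [(800, 1200), (1400, 1900)],
    "open-we": [(900, 1600)],
}


def covers(doc, field, start, end):
    """True if one from-to pair in the field contains [start, end]."""
    return any(lo <= start and end <= hi for lo, hi in doc.get(field, []))


print(covers(doc, "open-tu", 1500, 1800))  # True: inside 1400-1900
print(covers(doc, "open-tu", 1100, 1500))  # False: spans the midday gap
```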

--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com
Solr Training - www.solrtraining.com

Re: Opening hours search

britske
Hi Jan, 

Thanks, you're probably right that this question belongs on the dev list.

I didn't know about Poly FieldTypes; they sound interesting.
Ideally I want a multivalued Poly FieldType, "openinghours", which contains tuples of <DateTime,DateTime> representing <openingDateTime, closingDateTime>,
but according to a fairly recent comment from Grant Ingersoll in http://lucene.grantingersoll.com/2009/12/24/complex-fields-aka-poly-fields-in-apache-solr/ this doesn't seem to be supported.

I believe this means that your suggestion of representing the data as follows would not give reliable results either, since the field 'open-tu' is multivalued?

> <field name="open-mo">0900,1600</field>
> <field name="open-tu">0800,1200</field>
> <field name="open-tu">1400,1900</field>
> <field name="open-we">0900,1600</field>

Moreover, I'd like to represent opening hours at the more detailed date level, because otherwise I can't incorporate special opening hours on special dates such as national holidays, which may differ from the usual day-of-week hours.

Forgetting the multivalued roadblock for a second, it seems from http://lucene.grantingersoll.com/2009/12/24/complex-fields-aka-poly-fields-in-apache-solr/ that range queries on Poly FieldTypes work differently from what I need.
For the sake of argument, let's say a multivalued field 'openinghours' exists. Then I need to do range queries of the form you suggest (the values are not real datetimes, but you get the point):

openinghours:[open TO close], e.g.:
openinghours:[20110822:2200 TO 20110823:0400]

But instead it 'only' seems possible to use range queries like:
openinghours:[openA,closeA TO openB,closeB], which is not what I need, unless:

Hmm, just thinking about this, would this work?
openinghours:[20110822:2200,* TO *,20110823:0400]

I'm trying to wrap my head around this one.
 - In short, Poly FieldTypes seem to get close, but the combination of Poly FieldTypes and multivalued isn't implemented afaik. Are any patches in the making for this? Would it be trivial?
 - Would the querying work (forgetting the multivalued problem for a moment)?
 
Back to SOLR-2155 and the Lucene Spatial Playground for a second.
Multivalued (lat/long) points are implemented there, so a special case of multivalued Poly FieldTypes is implemented.
That would mean the following works IFF the querying part works:

openinghours:[latitude,* TO *,longitude]
where 'latitude' is the opening time encoded as a latitude and 'longitude' is the closing time encoded as a longitude.
Correct?
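
One way to think about the query shape, sketched in Python under stated assumptions: if every slot is stored as a point (open, close) with an absolute closing time, then "covers the requested window" becomes a rectangle test whose far corners are pushed to the edges of the value domain. The bounds `MIN_T` and `MAX_T` and all function names are hypothetical, and the sketch does not show whether a given Solr spatial field can execute such a rectangle query.

```python
MIN_T, MAX_T = 0, 28_800  # assumed domain: 100 days in 5-minute steps


def window_to_box(start, end):
    """Ranges of (open, close) values whose slots cover [start, end]."""
    return (MIN_T, start), (end, MAX_T)  # (open range), (close range)


def in_box(point, box):
    (open_lo, open_hi), (close_lo, close_hi) = box
    p_open, p_close = point
    return open_lo <= p_open <= open_hi and close_lo <= p_close <= close_hi


def doc_matches(points, start, end):
    """A multivalued doc matches if any single point lies in the box."""
    box = window_to_box(start, end)
    return any(in_box(p, box) for p in points)


points = [(96, 144), (168, 228)]      # 08:00-12:00 and 14:00-19:00
print(doc_matches(points, 180, 216))  # True: 15:00-18:00 fits the second slot
print(doc_matches(points, 132, 180))  # False: 11:00-15:00 spans the gap
```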

Sorry for the rambling / braindump style post. 
I would appreciate some pointers on whether I'm barking up the right tree, though.

Thanks, 
Geert-Jan

