modeling prices based on daterange using multipoints

britske
Hi all,

Based on some good discussion in
"Modeling openinghours using multipoints", I was prompted to revisit an old pain point of mine: modeling pricing & availability of hotels, which depends on a couple of factors including date of arrival, length of stay & room type.

This question is to see whether it would be possible to model the above using multipoints (or some other technique I'm not aware of that has come into existence in Lucene / Solr in the last 2 years or so).

Let me explain: hotels (in my implementation) have pricing & availability based on: date, duration, nr of persons, roomtype (e.g. single, double, twin, triple, family). Instead of modeling these as separate documents, I currently model 1 doc per hotel, where each <date*duration*persons*roomtype> combo has its own price and is modeled as a separate field (configured in the backend as dynamic fields: ddp-*). Non-availability is simply modeled as the absence of the particular field.

The advantage of modeling 1 doc per hotel is clear: users have no chance of seeing multiple offers per hotel in the frontend. It's just what they have become accustomed to with this type of travel / hotel search engine.

Now there's also a big disadvantage to my current setup: Lucene/Solr just isn't really built for having 20,000+ fields that can be sorted and filtered on. (I could go into this, but it's not really the point of this question.)

I realize the new spatial stuff in Solr 4 is no magic bullet, but I'm wondering if I could model multiple prices per day as multipoints, where:

 - date*duration*nr of persons*roomtype is modeled as point.x (discretized into some 20,000 values)
 - price is modeled as point.y (in dollar cents, normalized as avg price per day; range: [0,200000], covering a max price of $2,000/day)
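
One way to picture the discretization is a mixed-radix encoding. The sketch below is only an illustration: the dimension sizes (250 bookable days, stays of up to 4 nights, up to 4 persons), the start date and the function names are all assumptions chosen so that the product comes out at 20,000; the thread does not specify the actual scheme.

```python
from datetime import date

# Hypothetical dimension sizes; the thread only says there are ~20,000 combos.
ROOM_TYPES = ["single", "double", "twin", "triple", "family"]
MAX_DURATION = 4                # stays of 1..4 days (assumption)
MAX_PERSONS = 4                 # 1..4 persons (assumption)
HORIZON_DAYS = 250              # bookable days counted from START (assumption)
START = date(2012, 12, 1)       # first bookable arrival date (assumption)


def encode_x(arrival, duration, persons, room_type):
    """Pack one <date, duration, persons, roomtype> combo into a single integer."""
    day = (arrival - START).days
    if not 0 <= day < HORIZON_DAYS:
        raise ValueError("arrival date outside the bookable horizon")
    if not 1 <= duration <= MAX_DURATION:
        raise ValueError("unsupported duration")
    if not 1 <= persons <= MAX_PERSONS:
        raise ValueError("unsupported number of persons")
    room = ROOM_TYPES.index(room_type)  # raises ValueError for unknown types

    # Mixed-radix packing: each dimension gets its own "digit".
    x = day
    x = x * MAX_DURATION + (duration - 1)
    x = x * MAX_PERSONS + (persons - 1)
    x = x * len(ROOM_TYPES) + room
    return x


def encode_y(total_price_cents, duration):
    """Average price per day in cents, checked against the [0, 200000] range."""
    y = round(total_price_cents / duration)
    if not 0 <= y <= 200_000:
        raise ValueError("price per day outside [0, 200000] cents")
    return y


x = encode_x(date(2012, 12, 21), 3, 2, "double")
y = encode_y(45_000, 3)  # a 3-day stay costing $450.00 in total
```

With these assumed sizes every combo maps to a distinct integer in [0, 19999], so each hotel document would carry one (x, y) point per available combo.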

The stuff that needs to be possible:
 A) 1 required filter on point.x (filtering on 1 particular <date*duration*nr of persons*roomtype> combo)
 B) an optional range query on point.y (min and/or max price filter)
 C) optional sorting on point.y (sorting on price, normal or reverse)

I'm pretty certain A) and B) won't be a problem as far as functionality is concerned, but how about performance? I.e. would some sort of cached Solr filter kick in for a given <date*duration*nr of persons*roomtype> combo, for quick doc intersection, just as it would with multiple dynamic fields in my described as-is case?

How about C)? Is sorting on point.y possible (potentially in conjunction with other sorting fields used as tiebreakers, to give a stable sort)? I remember reading that any filter query can be used for sorting in combination with multipoints (which would make the above work, I guess), but I would just like to confirm.

Looking forward to your feedback,

Best,
Geert-Jan




Re: modeling prices based on daterange using multipoints

David Smiley
Hi Britske,
  This is a very interesting question!

britske wrote
> ...
> I realize the new spatial stuff in Solr 4 is no magic bullet, but I'm wondering if I could model multiple prices per day as multipoints, where:
>
>  - date*duration*nr of persons*roomtype is modeled as point.x (discretized into some 20,000 values)
>  - price is modeled as point.y (in dollar cents, normalized as avg price per day; range: [0,200000], covering a max price of $2,000/day)
>
> The stuff that needs to be possible:
>  A) 1 required filter on point.x (filtering on 1 particular <date*duration*nr of persons*roomtype> combo)
>  B) an optional range query on point.y (min and/or max price filter)
>  C) optional sorting on point.y (sorting on price, normal or reverse)
>
> I'm pretty certain A) and B) won't be a problem as far as functionality is concerned, but how about performance? I.e. would some sort of cached Solr filter kick in for a given <date*duration*nr of persons*roomtype> combo, for quick doc intersection, just as it would with multiple dynamic fields in my described as-is case?

A & B are indeed not a problem, and there are no special caches / memory requirements inherent in this.

britske wrote
> How about C)? Is sorting on point.y possible (potentially in conjunction with other sorting fields used as tiebreakers, to give a stable sort)? I remember reading that any filter query can be used for sorting in combination with multipoints (which would make the above work, I guess), but I would just like to confirm.
> ...

'C' (sorting) is the challenge.  As it stands, you will have to implement a variation of this class:  http://svn.apache.org/viewvc/lucene/dev/branches/branch_4x/lucene/spatial/src/java/org/apache/lucene/spatial/util/ShapeFieldCacheDistanceValueSource.java?view=markup  Unlike this implementation, yours should ensure the point is indeed in the query shape, and it should be configured to take the smallest or largest 'y' as desired.  Note that the cache infrastructure this is built on is flaky right now: it is a memory hog in multiple ways.  There will be a Point instance in memory for each of your indexed points, and an ArrayList per doc.  It's also not NRT-search friendly, and it doesn't relinquish its resources (i.e. on commit) as quickly as it should.  I know what its problems are, but I have been quite busy.
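
Stripped of everything Lucene-specific, the selection logic described here (only consider points inside the query shape, then take the smallest or largest 'y') can be sketched as below. This is plain Python for illustration, not the ValueSource API; the function name, the rectangle-only query shape and the tuple representation of points are assumptions.

```python
def sort_value(points, min_x, max_x, min_y, max_y, pick=min):
    """Return the y to sort a document on, or None if no point matches.

    points: the (x, y) pairs indexed for one document (one hotel).
    The query shape is assumed to be an axis-aligned rectangle.
    pick: min for the smallest matching y, max for the largest.
    """
    matching = [
        y for (x, y) in points
        if min_x <= x <= max_x and min_y <= y <= max_y
    ]
    return pick(matching) if matching else None


hotel_points = [(1646, 15000), (1647, 9000), (1646, 12000)]
cheapest = sort_value(hotel_points, 1646, 1646, 0, 200_000)
priciest = sort_value(hotel_points, 1646, 1646, 0, 200_000, pick=max)
```

A real implementation would read the points from the per-document cache mentioned above rather than from a Python list, so the memory caveats still apply.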

~ David
Re: modeling prices based on daterange using multipoints

britske
Hi David,

Yeah, an interesting (as well as problematic, as far as implementing goes) use case indeed :)

1. You mention "there are no special caches / memory requirements inherent in this." For a given user query, this would mean all hotels would have to be searched across all their point.x values each time, right? What would be a good plugin point for building in some custom cached filter code for this (perhaps using the Solr filter cache)? As I see it, determining all hotels that have a particular point.x value is probably: A) pretty costly to do on each user query; B) static, so it can be cached easily without a lot of memory (relatively speaking), i.e. 20,000 filters (representing all of the 20,000 different point.x values, that is, <date, duration, nr persons, roomtype> combos) with a bitset per filter representing the ids of the hotels that have said point.x.
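
To make the "bitset per filter" idea concrete, here is a toy sketch that uses Python integers as bitsets. It is not Solr's filter cache, only an illustration of the data structure; the function names and the figure of 100,000 hotels in the size estimate are assumptions.

```python
from collections import defaultdict


def build_x_filters(hotel_points):
    """Map each point.x value to a bitset (a Python int) of hotel ids.

    hotel_points: {hotel_id: [(x, y), ...]} with small non-negative int ids.
    """
    filters = defaultdict(int)
    for hotel_id, points in hotel_points.items():
        for x, _y in points:
            filters[x] |= 1 << hotel_id  # set the bit for this hotel
    return filters


def hotel_ids(bitset):
    """List the hotel ids whose bit is set, in ascending order."""
    ids, i = [], 0
    while bitset:
        if bitset & 1:
            ids.append(i)
        bitset >>= 1
        i += 1
    return ids


def bitset_bytes(num_filters, num_hotels):
    """Rough size of fully materialized, uncompressed bitsets."""
    return num_filters * ((num_hotels + 7) // 8)


hotels = {
    0: [(1646, 15000)],
    1: [(1647, 9000)],
    2: [(1646, 12000), (1647, 11000)],
}
filters = build_x_filters(hotels)
available = hotel_ids(filters[1646])  # hotels offering combo 1646
estimate = bitset_bytes(20_000, 100_000)
```

Intersecting with any other filter is then a single bitwise AND, e.g. `filters[1646] & filters[1647]`. Under the assumed 100,000 hotels, 20,000 uncompressed bitsets come to 250,000,000 bytes if all of them are materialized at once.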

2. I'm not sure I explained C (sorting) well, since I believe you're talking about implementing custom code to sort multiple point.y values per hotel, correct? That's not what I need. Instead, for every user query at most 1 point ever matches. I.e. a hotel either has a price for a particular <date, duration, nrpersons, roomtype> combo (P.x) or it hasn't.

Say a user queries for the <date, duration, nrpersons, roomtype> combo <21 Dec 2012, 3 days, 2 persons, double>. This might be encoded into a value, say 12345.
Now, for the hotels that do match that query (i.e. those hotels that have a point P for which P.x=12345), I want to sort those hotels on P.y (the price for the requested P.x).

Geert-Jan




Re: modeling prices based on daterange using multipoints

David Smiley
britske wrote
> Hi David,
>
> Yeah, an interesting (as well as problematic, as far as implementing goes) use case indeed :)
>
> 1. You mention "there are no special caches / memory requirements inherent in this." For a given user query, this would mean all hotels would have to be searched across all their point.x values each time, right? What would be a good plugin point for building in some custom cached filter code for this (perhaps using the Solr filter cache)? As I see it, determining all hotels that have a particular point.x value is probably: A) pretty costly to do on each user query; B) static, so it can be cached easily without a lot of memory (relatively speaking), i.e. 20,000 filters (representing all of the 20,000 different point.x values, that is, <date, duration, nr persons, roomtype> combos) with a bitset per filter representing the ids of the hotels that have said point.x.

I think you're overthinking the complexity of this query.  I bet it's faster than you think, and even then, putting this in a filter query ('fq') means Solr is going to cache it anyway, making it lightning fast on subsequent queries.
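
As a rough sketch of what such a filter query could look like, the snippet below builds an 'fq' that pins point.x to one combo and bounds point.y by a price range. It assumes the rectangle form `Intersects(minX minY maxX maxY)` accepted by the Solr 4 spatial field types; the field name `prices`, the helper name and the example values are hypothetical.

```python
from urllib.parse import urlencode


def price_fq(field, x, min_price=0, max_price=200_000):
    """Build a spatial filter: x fixed to one combo, y within a price range.

    The rectangle is written as "minX minY maxX maxY" (assumed Solr 4 syntax).
    """
    return f'{field}:"Intersects({x} {min_price} {x} {max_price})"'


# Combo 1646 with a maximum price of $150.00 per day.
fq = price_fq("prices", 1646, max_price=15_000)
query_string = urlencode({"q": "*:*", "fq": fq})
```

Since the x value is part of the 'fq' string, each distinct combo and price range would get its own filter cache entry. Whether a zero-width rectangle like this one matches reliably is worth testing against the precision configured for the field.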

britske wrote
> 2. I'm not sure I explained C (sorting) well, since I believe you're talking about implementing custom code to sort multiple point.y values per hotel, correct? That's not what I need. Instead, for every user query at most 1 point ever matches. I.e. a hotel either has a price for a particular <date, duration, nrpersons, roomtype> combo (P.x) or it hasn't.
>
> Say a user queries for the <date, duration, nrpersons, roomtype> combo <21 Dec 2012, 3 days, 2 persons, double>. This might be encoded into a value, say 12345.
> Now, for the hotels that do match that query (i.e. those hotels that have a point P for which P.x=12345), I want to sort those hotels on P.y (the price for the requested P.x).

Ah, OK.  But my first suggestion is still what I think you could do, except that the algorithm is simpler: return the first matching 'y' in the document where the point matches the query.  Alternatively, if you're confident the number of matching documents (hotels) is going to be small-ish, say less than a couple hundred, then you could simply sort client-side.  You'd have to get back all the values, or maybe write a DocTransformer to find the specific one.
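
The client-side alternative could look roughly like the sketch below. It assumes each returned document carries its points in a multi-valued field of "x y" strings; the field name `prices`, that stored format and the function names are assumptions, not something the thread specifies.

```python
def price_for(doc, x, field="prices"):
    """Return the y of the first stored point whose x equals the requested combo."""
    for value in doc.get(field, []):
        px, py = value.split()
        if float(px) == x:
            return float(py)
    return None


def sort_by_price(docs, x, reverse=False):
    """Keep only docs that have a price for combo x and sort them on that price."""
    priced = [(price_for(doc, x), doc) for doc in docs]
    priced = [(price, doc) for price, doc in priced if price is not None]
    # sorted() is stable, so docs with equal prices keep their incoming order.
    priced = sorted(priced, key=lambda pair: pair[0], reverse=reverse)
    return [doc for _price, doc in priced]


docs = [
    {"id": "a", "prices": ["1646 15000", "1647 9000"]},
    {"id": "b", "prices": ["1646 12000"]},
    {"id": "c", "prices": ["1647 8000"]},
]
ascending = [d["id"] for d in sort_by_price(docs, 1646)]
descending = [d["id"] for d in sort_by_price(docs, 1646, reverse=True)]
```

Because all stored points come back for every matching hotel, this only stays practical for small result sets, as noted above.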

~ David
Re: modeling prices based on daterange using multipoints

britske
2012/12/12 David Smiley wrote:

> I think you're overthinking the complexity of this query.  I bet it's
> faster than you think, and even then, putting this in a filter query ('fq')
> means Solr is going to cache it anyway, making it lightning fast on
> subsequent queries.

Ah! I didn't realize such a spatial query could be dropped into an fq. Nice,
that solves this part indeed.


> Ah, OK.  But my first suggestion is still what I think you could do,
> except that the algorithm is simpler: return the first matching 'y' in the
> document where the point matches the query.  Alternatively, if you're
> confident the number of matching documents (hotels) is going to be
> small-ish, say less than a couple hundred, then you could simply sort
> client-side.  You'd have to get back all the values, or maybe write a
> DocTransformer to find the specific one.
>
> ~ David

Would writing something similar to ShapeFieldCacheDistanceValueSource, it being a
ValueSource, enable me to expose it by name to the frontend?
What I'm saying is: let's say I want to call this implementation
'pricesort' and chain it with other sorts, like: 'sort=pricesort asc,
popularity desc, name asc'. Or use it by name in a function query. That
would be possible, right?

Geert-Jan


Re: modeling prices based on daterange using multipoints

David Smiley
britske wrote
> Would writing something similar to ShapeFieldCacheDistanceValueSource, it being a
> ValueSource, enable me to expose it by name to the frontend?
> What I'm saying is: let's say I want to call this implementation
> 'pricesort' and chain it with other sorts, like: 'sort=pricesort asc,
> popularity desc, name asc'. Or use it by name in a function query. That
> would be possible, right?
>
> Geert-Jan

It wouldn't quite work this way.  The Solr adapters to Lucene spatial can't simply have a field expose a ValueSource, because it needs to be configured with the search parameters (e.g. the query center point).  See: http://wiki.apache.org/solr/SolrAdaptersForLuceneSpatial4#Sorting_and_Relevancy
and in particular the sort=query(...) part.  The wiki shows 2 ways: this way, and the other way, where q= is the spatial query and you then simply sort on score.

~ David