Sort by relevance+distance

James Huang
Hi,

I can sort the search results by distance now, but
the relevance is lost.

I'd like to have the results sorted by relevance +
distance, i.e., relevance first; for results of
similar relevance, order by distance. How can I do that?

Thanks a lot in advance!
-James


--- James Huang <[hidden email]> wrote:

> Hi Otis,
>
> Thanks for your answer. I do have LIA (but not with me physically
> right now), and have the impression that the search ordering is
> predetermined (at index time); what I want is search-time ordering,
> e.g.,
>
> "I'm at (x,y) now and low on gas; find me the closest airports that
> can land a 747, the closest first, please".
>
> I'll re-read the book/chapter tonight, but look forward to any
> expert advice.
>
> Thanks,
> -James
>
> --- Otis Gospodnetic <[hidden email]> wrote:
>
> > Hi James,
> >
> > Check out the org.apache.lucene.search package; there are several
> > sort classes that will let you write a custom sorter. If you have
> > a copy of LIA, look at chapter 6 for an example
> > ( http://www.lucenebook.com/search?query=custom+sort+section%3A6* )
> >
> > Otis
> >
> > --- James Huang <[hidden email]> wrote:
> >
> > > Suppose I have a book index with field="publisher",
> > > field="title", etc. If a user has bought Manning books, then I'd
> > > like to sort the results with Manning books listed first.
> > >
> > > In essence, I'm asking for a parameterized custom sorting. Is
> > > there a way to do this?
> > >
> > > Thanks,
> > > -James

---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]


Re: Sort by relevance+distance

James Huang
I guess I can use HitCollector and implement my own
sorting, right?

Is there a better approach?


Re: Sort by relevance+distance

Erik Hatcher

On Sep 17, 2005, at 4:10 PM, James Huang wrote:

> Hi,
>
> I can sort the search results by distance now. But,
> the relevance is lost.
>
> I like to have the results sorted by relevance +
> distance, i.e., relevance first; for results of
> similar relevance, order by distance. How to do that?

How are you currently sorting?   You can use multiple sort fields  
within a Sort.

     Erik





Re: Sort by relevance+distance

James Huang
I use a custom collector:

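import java.util.Iterator;
import java.util.SortedSet;
import java.util.TreeSet;

import org.apache.lucene.document.Document;
import org.apache.lucene.search.HitCollector;
import org.apache.lucene.search.IndexSearcher;

// Location, SearchResult and getDistance() are application classes
// that are not shown here.
class ResultCollector extends HitCollector
{
  // Ordered by SearchResult.compareTo(). Note that a TreeSet silently
  // drops any element that compares as equal to one already present.
  SortedSet set = new TreeSet();
  IndexSearcher searcher;
  Location me;

  ResultCollector(IndexSearcher searcher, Location me)
  {
    this.me = me;
    this.searcher = searcher;
  }

  // Called once for every matching document.
  public void collect(int id, float score) {
    try {
      // Loads the stored document for every hit, which is expensive.
      Document doc = searcher.doc(id);
      String zc = doc.get("zipcode");
      SearchResult sr = new SearchResult(
         score, zc, getDistance(me, zc));
      // The score in SearchResult is adjusted:
      // score *= 1.0 - distance/200.0;
      set.add(sr);
    } catch(Exception e) {
      e.printStackTrace();
    }
  }

  // Copies one page of results, starting at startindex, into result.
  // Returns the total number of collected results, not the page size.
  int getResult(int startindex, SearchResult[] result)
  {
    Iterator iter = set.iterator();
    int idx = 0;
    for (int i=0; iter.hasNext() && idx < result.length; ++i) {
      Object o = iter.next();
      if (i >= startindex)
        result[idx++] = (SearchResult)o;
    }
    return set.size();
  }
}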

SearchResult implements Comparable.
Then, use IndexSearcher.search(qry, collector);

This seems to work. What I'd prefer is for the sorting
to be done by the search engine itself, for better
performance (and cleaner code).

Previously, I created a DistanceComparatorSource
(similar to the one in LIA-ch6); sorting by distance
works but relevance is lost.

-James


Re: Sort by relevance+distance

Erik Hatcher

On Sep 17, 2005, at 7:00 PM, James Huang wrote:

> I use a custom collector:
>
[...]
>
> Then, use IndexSearcher.search(qry, collector);

So what happens if you get 10M results from a search?

> This seems to work. What I wish for is that sorting is
> done by the search engine itself, hoping for a better
> performance (and cleaner code).

And it can be done by Lucene itself...

> Previously, I have created a DistanceComparatorSource
> (similar to that in LIA-ch6); sorting by distance
> works but relevance is lost.

Get back to using your DistanceComparatorSource, and couple that with  
a SortField.FIELD_SCORE, like this:

Sort sort = new Sort(new SortField[] {
    new SortField("location",
        new DistanceComparatorSource(<whatever args you need>)),
    SortField.FIELD_SCORE
});

     Erik


Re: Sort by relevance+distance

James Huang
--- Erik Hatcher <[hidden email]> wrote:

> Get back to using your DistanceComparatorSource, and couple that
> with a SortField.FIELD_SCORE, like this:
>
> Sort sort = new Sort(new SortField[] {
>     new SortField("location",
>         new DistanceComparatorSource(<whatever args you need>)),
>     SortField.FIELD_SCORE
> });

Thanks!

Does the order of these two fields matter? I mean,
with your code, would distance take precedence over
relevance? Anyway, I'll try it out and play with
ordering and such.

-James


Re: Sort by relevance+distance

Erik Hatcher

On Sep 18, 2005, at 10:24 AM, James Huang wrote:

> Does the order of these two fields matter? I mean,
> with your code, would distance take precedence over
> relevance? Anyway, I'll try it out and play with
> ordering and such.

Yes, order matters: results are sorted by the fields in the order
specified.  Subsequent SortFields in the list are only used when the
prior ones compare as equal.  In other words, when the distance is
equal between two documents, then they are sorted by score.
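
To illustrate the precedence rule outside Lucene, here is a minimal plain-Java sketch. The Hit class and its sample values are hypothetical, and the code uses java.util.Comparator rather than Lucene's Sort; it only mimics a two-key ordering (distance ascending, then score descending) to show that the second key is consulted only on ties.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Hypothetical stand-in for a search hit; not a Lucene class.
class Hit {
    final String name;
    final double distance; // miles from the user
    final double score;    // relevance, higher is better

    Hit(String name, double distance, double score) {
        this.name = name;
        this.distance = distance;
        this.score = score;
    }
}

class SortPrecedenceDemo {
    // Primary key: distance ascending. Secondary key: score descending,
    // consulted only when two distances compare as equal.
    static final Comparator<Hit> DISTANCE_THEN_SCORE =
        Comparator.comparingDouble((Hit h) -> h.distance)
                  .thenComparing(
                      Comparator.comparingDouble((Hit h) -> h.score).reversed());

    // Returns the hit names in sorted order, leaving the input untouched.
    static List<String> sortedNames(List<Hit> hits) {
        List<Hit> copy = new ArrayList<>(hits);
        copy.sort(DISTANCE_THEN_SCORE);
        List<String> names = new ArrayList<>();
        for (Hit h : copy) {
            names.add(h.name);
        }
        return names;
    }

    public static void main(String[] args) {
        List<Hit> hits = Arrays.asList(
            new Hit("A", 5.0, 0.2),
            new Hit("B", 2.0, 0.1),
            new Hit("C", 5.0, 0.9));
        // B is nearest; A and C tie on distance, so score decides between them.
        System.out.println(sortedNames(hits));
    }
}
```

Swapping the two keys would make score the primary key, and distance would then matter only for hits with exactly equal scores.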

     Erik



Re: Sort by relevance+distance

James Huang


--- Erik Hatcher <[hidden email]> wrote:

> Yes, order matters - they sort in the order specified. Subsequent
> SortField's in the list are only used when prior ones are
> equivalent. In other words, when the distance is equal between two
> documents, then they are sorted by score.
>
>      Erik

Then this is not what I want. If I put FIELD_SCORE
first, distance will rarely come into play, because two
scores are seldom exactly the same; that practically
leaves distance sorting out of the picture.

What I want is a "compound" score, i.e., to adjust the
score based on the distance, like this:

  score *= 1.0 - distance/200.0;

This formula seems to work well for my situation. Is
there a way to modify the score during search?
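
As a small worked sketch of that formula in plain Java: the class and method names below are hypothetical, and the clamp at zero is an added assumption (the formula as written turns negative beyond 200 miles), not something stated in the thread.

```java
class CompoundScore {
    // Distance (in miles) at which the factor reaches zero; 200 is taken
    // from the formula quoted above.
    static final double MAX_DISTANCE = 200.0;

    // Returns score * (1 - distance/200). The factor is clamped at zero so
    // that hits farther than MAX_DISTANCE do not get a negative score; the
    // clamp is an assumption added for this sketch.
    static double adjust(double score, double distance) {
        double factor = 1.0 - distance / MAX_DISTANCE;
        return score * Math.max(0.0, factor);
    }

    public static void main(String[] args) {
        // A strong match 12 miles away versus a weaker match 2 miles away.
        System.out.println(adjust(0.9, 12.0));
        System.out.println(adjust(0.8, 2.0));
        // Beyond 200 miles the adjusted score bottoms out at zero.
        System.out.println(adjust(0.9, 250.0));
    }
}
```

With a divisor as large as 200, a few miles of difference changes the score only slightly, so the divisor is the parameter to tune when distance should weigh more.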

Thanks,

-James


Re: Sort by relevance+distance

Erik Hatcher

On Sep 18, 2005, at 11:10 AM, James Huang wrote:

> Then this is not what I want -- if I put FIELD_SCORE
> first, it'll rarely work because FIELD_SCORE's seldom
> are the same, practically leaving distance sorting out
> of the picture.
>
> What I want is a "compound" score, i.e., to adjust the
> score based on the distance, like this:
>
>   score *= 1.0 - distance/200.0;
>
> This formula seems to work well for my situation. Is
> there a way to modify the score during search?

Sounds like you want a new type of Query subclass that weights each
document by a given distance.  Though I'm curious why just sorting by
distance isn't sufficient for your situation.  Could you describe a
bit more about what you're doing?

     Erik



Re: Sort by relevance+distance

Erik Hatcher
[trimming the post a bit]

On Sep 18, 2005, at 11:51 AM, James Huang wrote:
> The problem is quite generic, I believe. What I'd like
> to do is similar to LIA-ch6, i.e. to find a "good
> Chinese Hunan-style restaurant near me." I prefer
> Hunan-style; however, if a good Hunan-style one is 12
> miles away, whereas there is a Shanghai-style one only 2
> miles away, I may want to take that instead. So it's not
> a simple multi-sorting problem; it's an empirical ordering,
> and the parameters may have to be tuned by experiment.
> Thus far, I'm happy with the formula I gave earlier.

The example in LIA was purely a distance sort, not blended as you  
desire.

> Separately, earlier in this thread, you also mentioned
> "what if 10M search results?" -- that's also my
> concern, for both space and time.
>
> 1. Space-wise, the 10M Document's will be dragged into
> memory (in a Hits, say), right?

No, that is not correct, and this is an important point about Lucene
and its ability to scale extremely well.  Hits caches up to 200
documents (I believe) and uses a mechanism to score single documents
one at a time and keep only the top-scoring ones.

There is no problem for Lucene to search and have Hits with a massive
size.

There are memory considerations with sorting, though; these are
described in detail in the javadocs and a little in LIA.

> 1. How to use a compound scoring at search-time (where
> you suggested a Query-subclass, but what/how?)

I'm going to defer to others to assist with this, or validate that  
this is the right approach in this situation.

> 2. Space concern about large search result set.

With a Query subclass, this shouldn't be a concern.  With sorting  
using Lucene's Sort there are some memory concerns, but less so than  
with your own TreeSet.

> P.S. Feel free to reply to the list, if you think this
> has general appeal and others may benefit.

Done!

     Erik



Re: Sort by relevance+distance

James Huang
See comments below.

--- Erik Hatcher <[hidden email]> wrote:

> With a Query subclass, this shouldn't be a concern.
> With sorting using Lucene's Sort there are some memory
> concerns, but less so than with your own TreeSet.

OK, so external sorting does not scale and has to be
ruled out!

Now I have to find a way to customize the scoring
during search (using Hits, not a customized
HitCollector). Help is desperately needed here!

Thanks in advance,
-James


Re: Sort by relevance+distance

jrodenburg
trimming the post further:

On 9/18/05, James Huang <[hidden email]> wrote:
>
> Now I have to find a way to customize the scoring during search (using
> Hits, not a customized HitCollector). Help is desperately needed here!
>

The typical approach (from what I know) to influencing scoring is field
boosting. The difficulty in this scenario is the distance factor, which
sounds as if it's determined at run-time, plus the trickiness of field
boosting based on the values of the field. I've looked at this as well,
and it's not a simple problem to solve.

How are you determining if something is "near me"? Is it a calculation at
run-time, i.e. latitude, longitude, and geometric math? What options do you
have to determine distance?

Re: Sort by relevance+distance

James Huang


--- Jeff Rodenburg <[hidden email]> wrote:

> How are you determining if something is "near me"? Is it a
> calculation at run-time, i.e. latitude, longitude, and geometric
> math? What options do you have to determine distance?

Yes, the distance is calculated at runtime, based on
longitude/latitude. Field score boosting doesn't seem
to apply. In fact, there are other dynamic factors in
addition to distance to determine the order of search
result.
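
For readers wondering what such a runtime calculation might look like, here is a minimal sketch using the haversine great-circle formula. The thread does not say which formula is actually used, so the choice of haversine, the class name, and the mean Earth radius constant are all assumptions made for illustration.

```java
class Haversine {
    // Approximate mean Earth radius in miles (an assumed constant).
    static final double EARTH_RADIUS_MILES = 3958.8;

    // Great-circle distance in miles between two points given in degrees.
    static double distanceMiles(double lat1, double lon1,
                                double lat2, double lon2) {
        double dLat = Math.toRadians(lat2 - lat1);
        double dLon = Math.toRadians(lon2 - lon1);
        double sinLat = Math.sin(dLat / 2);
        double sinLon = Math.sin(dLon / 2);
        double a = sinLat * sinLat
                 + Math.cos(Math.toRadians(lat1))
                 * Math.cos(Math.toRadians(lat2))
                 * sinLon * sinLon;
        // Guard against rounding pushing a slightly above 1.
        a = Math.min(1.0, a);
        double c = 2 * Math.atan2(Math.sqrt(a), Math.sqrt(1 - a));
        return EARTH_RADIUS_MILES * c;
    }

    public static void main(String[] args) {
        // Two hypothetical points one degree of latitude apart.
        System.out.println(distanceMiles(40.0, -75.0, 41.0, -75.0));
    }
}
```

Because the result depends on the user's position, it has to be computed at query time, which is why index-time field boosting does not fit here.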

So the question is: is there a way to override the score
calculation at runtime? In the lucene/search package,
I see interfaces like Scorer, Weight and methods like
Query.createWeight(). This looks promising.

-James



               

Re: Sort by relevance+distance

Erik Hatcher

On Sep 18, 2005, at 3:39 PM, James Huang wrote:
> So the question is, is there a way to overriding score
> calculation at runtime? In the lucene/search package,
> I see interfaces like Scorer, Weight and methods like
> Query.createWeight(). This looks promising.

There are several ways to adjust scoring, but I really think your  
best bet is to create a custom Query subclass (and therefore the  
Weight stuff underneath) to accommodate your needs.  I'm going to  
become an audience member for the rest of this discussion as I  
personally don't have experience creating that sort of thing, but  
look forward to learning more about how it can be done.

     Erik



Re: Sort by relevance+distance

jrodenburg
I like Erik's suggestion here as a starting point. I would guess you might
find some direction in the Scorer class, but I haven't gone through this in
detail.

Conceptually a sliding weight based on proximity sounds correct...

-- jeff



Re: Sort by relevance+distance

Paul Elschot
On Sep 18, 2005, at 3:39 PM, James Huang wrote:
> So the question is, is there a way to overriding score
> calculation at runtime? In the lucene/search package,
> I see interfaces like Scorer, Weight and methods like
> Query.createWeight(). This looks promising.

You indeed need to override the following things:
- Query (overriding createWeight(), and holding the query location and
  the normal Lucene query)
- Weight (this will probably not do very much in your case, except for
  passing things to your scorer)
- Scorer

Iirc, you only need an extra score factor depending on the distance,
where the query contains a location.
That means you need something like a ConjunctionScorer
to combine the scores of the query parts with and without the distance.
The part without the distance is a standard lucene query in your case (iirc).
For the part with the distance, you'll need the actual document locations
(preferably in RAM) and compute the distance based score factor based
on those locations and the query location.

One way to have these document locations in RAM is by using a field cache,
much like the sorting code does.

In case the query also has constraints on the location the query search might
take advantage of that, but that would need more development, for example
starting from RangeQueries and/or filters on the location coordinates.

Regards,
Paul Elschot


Re: Sort by relevance+distance

mark harwood
I think the HitCollector approach was fine but needed
a couple of changes:
1) use a PriorityQueue subclass in place of the
SortedSet to keep only the top n scoring docs
2) multiply the Lucene score by a distance measurement
based on the current doc's location (the doc location
being read from a cached array of type Location[],
of length reader.maxDoc())
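
The two changes can be sketched without any Lucene dependency as follows. This is only an illustration under stated assumptions: it uses java.util.PriorityQueue instead of a subclass of Lucene's own PriorityQueue, a hypothetical float[] of precomputed distances stands in for the cached Location array, and the distance factor reuses the 1 - distance/200 formula from earlier in the thread with a clamp at zero.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.PriorityQueue;

class TopNCollector {
    // One retained hit: document id plus its distance-adjusted score.
    static final class Entry implements Comparable<Entry> {
        final int doc;
        final float score;

        Entry(int doc, float score) {
            this.doc = doc;
            this.score = score;
        }

        // Lower score sorts first, so the queue head is the weakest hit.
        // Ties are broken by doc id (lower id counts as better).
        public int compareTo(Entry other) {
            int c = Float.compare(score, other.score);
            return c != 0 ? c : Integer.compare(other.doc, doc);
        }
    }

    private final int n;
    private final float[] distanceByDoc; // hypothetical cache, indexed by doc id
    private final PriorityQueue<Entry> queue = new PriorityQueue<>();

    TopNCollector(int n, float[] distanceByDoc) {
        if (n <= 0) {
            throw new IllegalArgumentException("n must be positive");
        }
        this.n = n;
        this.distanceByDoc = distanceByDoc;
    }

    // Same shape as HitCollector.collect(int, float): called once per match.
    void collect(int doc, float score) {
        float factor = Math.max(0f, 1f - distanceByDoc[doc] / 200f);
        Entry e = new Entry(doc, score * factor);
        if (queue.size() < n) {
            queue.add(e);
        } else if (e.compareTo(queue.peek()) > 0) {
            queue.poll(); // evict the weakest retained hit
            queue.add(e);
        }
    }

    // Doc ids of the retained hits, best first.
    List<Integer> topDocs() {
        List<Entry> entries = new ArrayList<>(queue);
        Collections.sort(entries, Collections.reverseOrder());
        List<Integer> docs = new ArrayList<>();
        for (Entry e : entries) {
            docs.add(e.doc);
        }
        return docs;
    }
}
```

Memory use stays proportional to n rather than to the number of matches, and no stored document is loaded during collection because the distance comes from the in-memory array.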


Cheers
Mark


       
       
               

Re: Sort by relevance+distance

James Huang
I think this is probably the closest to what I'd like to do, and am able to do, now. If I ever get to do this, I'll share the idea/code and seek review and suggestions.
 
Thank you very much, Mark, and all others that have helped!
 
-James


Re: Sort by relevance+distance

jrodenburg
In reply to this post by mark harwood
This is an interesting approach, one I had not considered.
Mark - are there any code samples that implement it, or something
similar?

thanks,
jeff

On 9/19/05, mark harwood <[hidden email]> wrote:

>
> I think the HitCollector approach was fine but needed
> a couple of changes:
> 1) use a PriorityQueue subclass in place of the
> SortedSet to keep only the top n scoring docs
> 2) multiply lucene score by a distance measurement
> based on the current doc's location (doc location
> being read from a cached array of type
> Location[reader.maxDoc] )
>
> Cheers
> Mark

Re: Sort by relevance+distance

mark harwood
Here's an example I put together to illustrate the point.


package distance;

import java.io.IOException;
import java.util.ArrayList;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.WhitespaceAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.queryParser.ParseException;
import org.apache.lucene.queryParser.QueryParser;
import org.apache.lucene.search.HitCollector;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.store.RAMDirectory;
import org.apache.lucene.util.PriorityQueue;

public class TestDistance
{

    private static QueryParser parser;
    private static IndexReader reader;
    private static Location[] locsCache;
    private static IndexSearcher searcher;
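    /**
     * Builds a small in-memory index, caches each document's location
     * and runs three sample searches ranked by relevance and distance.
     */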
    public static void main(String[] args) throws Exception
    {
        Analyzer analyzer=new WhitespaceAnalyzer();
        RAMDirectory dir=new RAMDirectory();
        IndexWriter writer=new IndexWriter(dir,analyzer,true);
        addDoc(writer,"the faraway mouse", 500,500);
        addDoc(writer,"the semilocal cat", 50,50);
        addDoc(writer,"the local dog", 20,20);
        writer.close();
        searcher=new IndexSearcher(dir);
        parser=new QueryParser("description", analyzer);

        //create location cache, indexed by document number
        reader = searcher.getIndexReader();
        int maxDoc=reader.maxDoc();
        locsCache=new Location[maxDoc];
        for (int i = 0; i < maxDoc; i++)
        {
            if (reader.isDeleted(i))
            {
                continue; //deleted docs are never passed to the collector
            }
            Document doc=reader.document(i);
            locsCache[i]=new Location(
                            Float.parseFloat(doc.get("lat")),
                            Float.parseFloat(doc.get("lon"))
                            );
        }
       
        //example search 1
        runSearch("the cat");
       
        runSearch("the dog");
       
        runSearch("the mouse");
       
       
    }
   
    private static void runSearch(String queryString)
            throws ParseException, IOException
    {
        System.out.println("query:"+queryString);
        Query query=parser.parse(queryString);
        Location queryLocation=new Location(1f,1f);
        RelevanceAndDistanceCollector collector=
            new RelevanceAndDistanceCollector(10,queryLocation,locsCache);
        searcher.search(query,collector);
        ScoreDoc[] results = collector.getMatches();
        for (int i = 0; i < results.length; i++)
        {
            Document doc=reader.document(results[i].doc);
            System.out.print("["+results[i].doc+"]");
            System.out.print("("+results[i].score+")");
            System.out.println("\t"+doc.get("description"));
        }
        System.out.println("");
    }


    public static void addDoc(IndexWriter writer,String description,
            float lat, float lon) throws IOException
    {
        Document doc=new Document();
        doc.add(Field.UnIndexed("lat", ""+lat));
        doc.add(Field.UnIndexed("lon", ""+lon));
        doc.add(Field.Text("description",description));
        writer.addDocument(doc);      
    }
    static class Location
    {
        float lat;
        float lon;
        public Location(float lat, float lon)
        {
            this.lat=lat;
            this.lon=lon;
        }
        //straight-line distance on a flat plane; not a great-circle distance
        public float distance(Location loc)
        {
            float latDiff = loc.lat-lat;
            float lonDiff = loc.lon-lon;
            return (float) Math.sqrt((latDiff*latDiff)+(lonDiff*lonDiff));
        }
       
    }
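    /**
     * Collects the top numDocs hits, ranking each by its Lucene score
     * scaled by how close the document is to the query location.
     */
    static class RelevanceAndDistanceCollector extends HitCollector
    {
        HitQueue hq;
        Location queryLocation;
        //distance at or beyond which the distance factor drops to zero
        float maxDistance=5000;
        private Location[] docLocs;

        public RelevanceAndDistanceCollector(int numDocs,
                Location queryLocation, Location[] docLocs)
        {
            this.queryLocation=queryLocation;
            this.docLocs=docLocs;
            hq=new HitQueue(numDocs);
        }
        public void collect(int doc, float score)
        {
            //clamp at zero so docs beyond maxDistance cannot get a negative factor
            float closeness=Math.max(0f,
                    maxDistance-queryLocation.distance(docLocs[doc]));
            hq.insert(new ScoreDoc(doc,score*closeness));
        }
        public ScoreDoc[] getMatches()
        {
            //pop() returns the lowest-ranked hit first, so fill from the end
            ScoreDoc sd[]=new ScoreDoc[hq.size()];
            while(hq.size()>0)
            {
                sd[hq.size()-1]=(ScoreDoc) hq.pop();
            }
            return sd;
        }
    }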
    //orders hits so that the lowest-ranked hit sits at the top of the queue
    static class HitQueue extends PriorityQueue
    {
        public HitQueue(int size)
        {
            initialize(size);
        }
        public final boolean lessThan(Object a, Object b)
        {
            ScoreDoc hitA = (ScoreDoc)a;
            ScoreDoc hitB = (ScoreDoc)b;
            if (hitA.score == hitB.score)
            {
                //on equal scores the higher doc number ranks lower
                return hitA.doc > hitB.doc;
            }
            return hitA.score < hitB.score;
        }
    }

}
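
To see how strongly distance influences the ranking in the example above, the sketch below recomputes only the distance factor (maxDistance minus the planar distance) for the three sample documents, using the query location (1,1) and the maxDistance of 5000 from the example. The class and method names are hypothetical and no Lucene calls are involved; the clamp at zero is a safeguard in this sketch for documents farther away than maxDistance.

```java
// Hypothetical illustration of the distance factor applied to the Lucene score.
class DistanceFactorSketch {

    /** Planar (Euclidean) distance between two points, as in the example's Location.distance. */
    static float distance(float lat1, float lon1, float lat2, float lon2) {
        float latDiff = lat1 - lat2;
        float lonDiff = lon1 - lon2;
        return (float) Math.sqrt(latDiff * latDiff + lonDiff * lonDiff);
    }

    /** Multiplier for the Lucene score: larger for closer documents, never negative. */
    static float factor(float maxDistance, float dist) {
        return Math.max(0f, maxDistance - dist);
    }

    public static void main(String[] args) {
        float maxDistance = 5000f; // value taken from the example
        float[][] docs = { {500f, 500f}, {50f, 50f}, {20f, 20f} };
        String[] names = { "faraway mouse", "semilocal cat", "local dog" };
        for (int i = 0; i < docs.length; i++) {
            float d = distance(1f, 1f, docs[i][0], docs[i][1]);
            System.out.println(names[i] + ": distance=" + d
                    + " factor=" + factor(maxDistance, d));
        }
    }
}
```

With these values the factor for the farthest document is only about 14% smaller than for the nearest one, so the Lucene score dominates and distance mainly separates documents of similar relevance. A smaller maxDistance would give distance more weight.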



               