geographical searches


Guillermo Payet
Hi,

I started implementing geographical searches yesterday, using BBN's QuadTree
implementation as the spatial index.  I first implemented a new "GeoFilter"
class to filter queries to "all items within a rectangle".  That was pretty
easy and it's now working beautifully, and very fast too.  See below for the
source code.

BTW:  I'm creating the QuadTree in memory right now during Lucene index
creation, but not storing it on disk yet.  I'll add something like
a pair of "GeoIndexReader" and "GeoIndexWriter" classes later.

I'm now having a hell of a time figuring out how to implement a "GeoQuery"
class though.  Just figuring out how the whole Query mechanism works
by reading the source code is proving to be quite a challenge.

Question:  Is there any article or document that explains this?  Also:
Any tips as to what the right approach would be here?

    --G


----------------------------------------------------------------------------
package com.oceangroup.projects.localharvest.search;

import java.util.BitSet;
import java.util.Vector;
import java.io.IOException;

import org.apache.lucene.search.*;
import org.apache.lucene.index.IndexReader;

import com.oceangroup.servlets.gis.LatLonRect;

import com.bbn.openmap.util.quadtree.QuadTree;

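/**
 * A Filter that restricts search results to a geographical area.
 * The QuadTree is expected to hold Lucene document numbers as its items.
 */
public class GeoFilter extends Filter {

    private final QuadTree   qTree;
    private final LatLonRect rect;

    public GeoFilter(QuadTree quadTree, LatLonRect rect) {
        this.qTree = quadTree;
        this.rect = rect;
    }

    /** Returns a BitSet with one bit set for each document inside the rectangle. */
    public BitSet bits(IndexReader reader) throws IOException {
        BitSet bits = new BitSet(reader.maxDoc());

        // Arguments are passed as north, west, south, east.
        Vector<Integer> results = qTree.get((float) rect.urLat, (float) rect.llLon,
                                            (float) rect.llLat, (float) rect.urLon);

        // Nothing inside the rectangle: return the empty BitSet.
        if (results == null || results.isEmpty()) {
            return bits;
        }

        for (Integer item : results) {
            bits.set(item.intValue());
        }

        return bits;
    }
}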
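
The message above mentions adding "GeoIndexReader" and "GeoIndexWriter" classes later so that the spatial index survives a restart. One possible shape for them is sketched below, assuming it is enough to store the raw (document number, latitude, longitude) records and rebuild the tree from them at load time. The record layout and the GeoPoint class are invented for this illustration and are not taken from the original code. Note also that Lucene document numbers can change when an index with deletions is merged, so a file like this would only be valid for the index state it was written from.

```java
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical record: a document number plus its coordinates. */
final class GeoPoint {
    final int doc;
    final float lat;
    final float lon;

    GeoPoint(int doc, float lat, float lon) {
        this.doc = doc;
        this.lat = lat;
        this.lon = lon;
    }
}

/** Hypothetical writer: stores a record count followed by (doc, lat, lon) records. */
final class GeoIndexWriter {
    static void write(List<GeoPoint> points, OutputStream out) throws IOException {
        DataOutputStream data = new DataOutputStream(out);
        data.writeInt(points.size());
        for (GeoPoint p : points) {
            data.writeInt(p.doc);
            data.writeFloat(p.lat);
            data.writeFloat(p.lon);
        }
        data.flush();
    }
}

/** Hypothetical reader: loads the records back so the tree can be rebuilt from them. */
final class GeoIndexReader {
    static List<GeoPoint> read(InputStream in) throws IOException {
        DataInputStream data = new DataInputStream(in);
        int count = data.readInt();
        List<GeoPoint> points = new ArrayList<GeoPoint>(count);
        for (int i = 0; i < count; i++) {
            // Java evaluates arguments left to right, matching the write order.
            points.add(new GeoPoint(data.readInt(), data.readFloat(), data.readFloat()));
        }
        return points;
    }
}
```

After loading, each record would be inserted into a fresh QuadTree before any GeoFilter is created.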


---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]


Re: geographical searches

Guillermo Payet
A thought....

I don't really need any geographical scoring.  I just need to be able
to show all items within a region.  Since a lot of the complexity of
"Query" has to do with scoring, would it be better to just use the
GeoFilter, and to search for *?   Are there any performance issues
with this?

    --G
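
Conceptually, a filtered search keeps only the documents that are set both in the query's matches and in the filter's BitSet. The sketch below models that with plain java.util.BitSet operations. It is not Lucene's internal implementation, it ignores deleted documents, and the class and method names are made up for illustration. What it shows is that a match-everything query combined with the filter yields exactly the filter's own bits, so in that case the query adds no selectivity.

```java
import java.util.BitSet;

/** Illustrative model of "query AND filter" using plain BitSets. */
final class FilterOnlySearch {

    /** Keeps only the documents that are set in both BitSets. */
    static BitSet filtered(BitSet queryMatches, BitSet filterBits) {
        BitSet result = (BitSet) queryMatches.clone();
        result.and(filterBits);
        return result;
    }

    /** Models a query that matches every document number below maxDoc. */
    static BitSet matchAll(int maxDoc) {
        BitSet all = new BitSet(maxDoc);
        all.set(0, maxDoc);
        return all;
    }

    /** Collects the set bits as document numbers in increasing order. */
    static int[] docs(BitSet bits) {
        int[] docs = new int[bits.cardinality()];
        int n = 0;
        for (int doc = bits.nextSetBit(0); doc >= 0; doc = bits.nextSetBit(doc + 1)) {
            docs[n++] = doc;
        }
        return docs;
    }
}
```

Walking the BitSet with nextSetBit, as docs() does, is all that a "show every item in this region" request needs once the filter bits exist.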



On Sun, May 01, 2005 at 01:14:25PM -0700, Guillermo Payet wrote:

> Hi,
>
> I started implementing geographical searches yesterday, using BBN's QuadTree
> implementation as the spatial index.  I first implemented a new "GeoFilter"
> class to filter queries to "all items within a rectangle".  That was pretty
> easy and it's now working beautifully, and very fast too.  See below for the
> source code.

--
Guillermo Payet
L O C A L  H A R V E S T
http://www.localharvest.org

Every Morning I awake torn between a desire to save the world and
an inclination to savor it.  This makes it hard to plan the day.

                                                       -E.B.White



Re: geographical searches

Guillermo Payet
In reply to this post by Guillermo Payet
I guess the solution is to forget about using IndexSearcher and
Query at all and to write a simple "GeoSearcher" that opens the
regular index, plus the QuadTree, and uses the same approach I used
in GeoFilter to generate a group of hits.  Like this:

  GeoQuery geoSearcher = new GeoSearcher("path_to_index_file", quadTree);
  hits = geoQuery.search(rect,filter);

Would that be right?

    --G
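
A rough sketch of what such a "GeoSearcher" could look like follows. Everything in it is an assumption made to keep the example self-contained: a Map stands in for the regular Lucene index, the SpatialIndex interface stands in for the QuadTree, and LinearSpatialIndex is a brute-force placeholder rather than a real spatial structure. The optional BitSet plays the role of the filter argument in the call above.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;
import java.util.Map;

/** Stand-in for the spatial index: returns document numbers inside a rectangle. */
interface SpatialIndex {
    List<Integer> get(float north, float west, float south, float east);
}

/** Brute-force SpatialIndex, used only to keep the sketch self-contained. */
final class LinearSpatialIndex implements SpatialIndex {
    private final List<float[]> coords = new ArrayList<float[]>(); // list index = document number

    /** Adds the next document and returns its document number. */
    int add(float lat, float lon) {
        coords.add(new float[] {lat, lon});
        return coords.size() - 1;
    }

    public List<Integer> get(float north, float west, float south, float east) {
        List<Integer> docs = new ArrayList<Integer>();
        for (int doc = 0; doc < coords.size(); doc++) {
            float lat = coords.get(doc)[0];
            float lon = coords.get(doc)[1];
            if (lat <= north && lat >= south && lon >= west && lon <= east) {
                docs.add(doc);
            }
        }
        return docs;
    }
}

/** Hypothetical searcher that bypasses Query and scoring entirely. */
final class GeoSearcher {
    private final Map<Integer, String> documents; // stand-in for the regular index
    private final SpatialIndex spatialIndex;

    GeoSearcher(Map<Integer, String> documents, SpatialIndex spatialIndex) {
        this.documents = documents;
        this.spatialIndex = spatialIndex;
    }

    /** Returns the documents inside the rectangle; a null filter means no extra restriction. */
    List<String> search(float north, float west, float south, float east, BitSet filter) {
        List<String> hits = new ArrayList<String>();
        for (Integer doc : spatialIndex.get(north, west, south, east)) {
            if (filter == null || filter.get(doc)) {
                hits.add(documents.get(doc));
            }
        }
        return hits;
    }
}
```

The results come back unscored and in spatial-index order, which matches the stated need of simply listing every item in a region.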



Re: geographical searches

TJones-2
In reply to this post by Guillermo Payet

>I don't really need any geographical scoring.  I just need to be able
>to show all items within a region.  Since a lot of the complexity of
>"Query" has to do with scoring, would it be better to just use the
>GeoFilter, and to search for *?   Are there any performance issues
>with this?

You might try looking at the Sort objects, such as SortComparator.

If you're only searching for any term using *, I'm not sure why you're
using Lucene?

Tim



Re: geographical searches

Guillermo Payet
> If you're only searching for any term using *, I'm not sure why you're
> using Lucene?

Most searches are not as simple, and for those the GeoFilter does
the trick.  I'm just trying to optimize for the few searches for
all items within an area.


On Mon, May 02, 2005 at 08:24:07AM -0500, [hidden email] wrote:

>
> >I don't really need any geographical scoring.  I just need to be able
> >to show all items within a region.  Since a lot of the complexity of
> >"Query" has to do with scoring, would it be better to just use the
> >GeoFilter, and to search for *?   Are there any performance issues
> >with this?
>
> You might try looking at the Sort objects, such as SortComparator.
>
> If you're only searching for any term using *, I'm not sure why you're
> using Lucene?
>
> Tim

Re: geographical searches

Stefan Keller
2005/5/2, Guillermo Payet <[hidden email]>:
> > If you're only searching for any term using *, I'm not sure why you're
> > using Lucene?
>
> Most searches are not as simple, and for those the GeoFilter does
> the trick.  I'm just trying to optimize for the few searches for
> all items within an area.

What I understand is the following:

Your goal is to restrict document search results to a spatial extent,
similar to the syntax that restricts documents based on their
(creation/modification) date to a given time period.

For indexing, field descriptions are added to the indexer to tell it
which fields to index and which to store. You now want to add a new
field type to index, let's call it coordinates (WGS84). For Lucene API
users, nothing changes from here on for the index.

Now, in a search, a bounding box is given as two WGS84 coordinates
(lat/lon of the north-west corner and lat/lon of the south-east corner),
as a syntax extension.

Based on the location value (if any) of each hit, a result set comes
out that is a subset of the hits that would be returned without the
spatial filter (I don't know how you handle hits that have no lat/lon
location value).

Is that right?
Now, what are you asking for? The most performant strategy?
For Java source code, look at http://freegis.org

BTW: You wrote:
> GeoQuery geoSearcher = new GeoSearcher("path_to_index_file", quadTree);
> hits = geoQuery.search(rect,filter);

I think you rather mean
> GeoSearcher geoSearcher = new GeoSearcher("path_to_index_file", quadTree);
> hits = geoSearcher.search(rect,filter);
?

-- Stefan
http://www.geometa.info


Re: geographical searches

Guillermo Payet
> Your goal is to restrict document search results to a spatial extent,
> similar to the syntax that restricts documents based on their
> (creation/modification) date to a given time period.

Yes...  That's the immediate goal, which I have already achieved
by using quadTrees and the GeoFilter class I pasted here.  This is
already live (since yesterday).  To see it in action, go to:

   http://www.localharvest.org.

It's a little bit of a hack though, and needs some work before I can
contribute it in any meaningful form to the project.

> For indexing, field descriptions are added to the indexer to tell it
> which fields to index and which to store. You now want to add a new
> field type to index, let's call it coordinates (WGS84). For Lucene API
> users, nothing changes from here on for the index.

I haven't studied the structure of the Lucene index yet (which I'll
do), so I don't really know how to approach this. It makes sense
to add a new "Coordinates" Field type to documents, but I suspect
that the spatial indexing (using a quadTree in my implementation)
needs to be done outside of Lucene's regular indexing.
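
To make the idea of a spatial index kept alongside the Lucene index concrete, here is a minimal point quadtree. It is an illustrative sketch only and is not BBN's QuadTree: the class name, node capacity, and depth limit are arbitrary choices, and rectangles that cross the 180-degree meridian are not handled. The get() argument order (north, west, south, east) simply mirrors the call made in GeoFilter.

```java
import java.util.ArrayList;
import java.util.List;

/** Minimal point quadtree over lat/lon; a leaf splits once it holds more than CAPACITY points. */
final class SimpleQuadTree {
    private static final int CAPACITY = 4;
    private static final int MAX_DEPTH = 16;

    private static final class Entry {
        final int doc;
        final float lat;
        final float lon;

        Entry(int doc, float lat, float lon) {
            this.doc = doc;
            this.lat = lat;
            this.lon = lon;
        }
    }

    private final float north, west, south, east;
    private final int depth;
    private final List<Entry> entries = new ArrayList<Entry>();
    private SimpleQuadTree[] children; // null while this node is a leaf

    /** Creates a tree covering the whole globe. */
    SimpleQuadTree() {
        this(90f, -180f, -90f, 180f, 0);
    }

    private SimpleQuadTree(float north, float west, float south, float east, int depth) {
        this.north = north;
        this.west = west;
        this.south = south;
        this.east = east;
        this.depth = depth;
    }

    /** Stores a document number at the given position. */
    void put(int doc, float lat, float lon) {
        insert(new Entry(doc, lat, lon));
    }

    private void insert(Entry e) {
        if (children != null) {
            childFor(e).insert(e);
            return;
        }
        entries.add(e);
        if (entries.size() > CAPACITY && depth < MAX_DEPTH) {
            split();
        }
    }

    /** Turns this leaf into an inner node and pushes its points down to four children. */
    private void split() {
        float midLat = (north + south) / 2f;
        float midLon = (west + east) / 2f;
        children = new SimpleQuadTree[] {
            new SimpleQuadTree(north, west, midLat, midLon, depth + 1),  // north-west
            new SimpleQuadTree(north, midLon, midLat, east, depth + 1),  // north-east
            new SimpleQuadTree(midLat, west, south, midLon, depth + 1),  // south-west
            new SimpleQuadTree(midLat, midLon, south, east, depth + 1)   // south-east
        };
        List<Entry> old = new ArrayList<Entry>(entries);
        entries.clear();
        for (Entry e : old) {
            childFor(e).insert(e);
        }
    }

    private SimpleQuadTree childFor(Entry e) {
        int row = e.lat >= (north + south) / 2f ? 0 : 2; // northern or southern half
        int col = e.lon >= (west + east) / 2f ? 1 : 0;   // eastern or western half
        return children[row + col];
    }

    /** Returns the document numbers of all points inside the rectangle (edges inclusive). */
    List<Integer> get(float qNorth, float qWest, float qSouth, float qEast) {
        List<Integer> out = new ArrayList<Integer>();
        collect(qNorth, qWest, qSouth, qEast, out);
        return out;
    }

    private void collect(float qNorth, float qWest, float qSouth, float qEast, List<Integer> out) {
        // Prune subtrees whose bounds do not overlap the query rectangle.
        if (qSouth > north || qNorth < south || qWest > east || qEast < west) {
            return;
        }
        for (Entry e : entries) {
            if (e.lat <= qNorth && e.lat >= qSouth && e.lon >= qWest && e.lon <= qEast) {
                out.add(e.doc);
            }
        }
        if (children != null) {
            for (SimpleQuadTree child : children) {
                child.collect(qNorth, qWest, qSouth, qEast, out);
            }
        }
    }
}
```

The tree would be filled while documents are added to the Lucene index, one put() per document, which keeps the spatial structure entirely outside Lucene's own index files.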

> Now, in a search, a bounding box is given as two WGS84 coordinates
> (lat/lon of the north-west corner and lat/lon of the south-east corner),
> as a syntax extension.

That's right.  Additional search formats would be a central point plus
a radius, or just a central point, returning all documents ranked by
proximity to the point.  Maybe this last one should be done as a Sort
instead?
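
Both of these extra formats reduce to computing a distance from the central point to each candidate. A minimal sketch using the haversine great-circle formula is below. It assumes a spherical Earth with a mean radius of 6371 km, and it scans a plain array in which the array index stands in for the document number; in practice the candidates would more likely come from a bounding-box lookup first. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

/** Illustrative radius search and proximity ranking over an array of {lat, lon} pairs. */
final class ProximitySearch {
    static final double EARTH_RADIUS_KM = 6371.0; // mean radius, spherical approximation

    /** Great-circle distance in kilometres between two points (haversine formula). */
    static double distanceKm(double lat1, double lon1, double lat2, double lon2) {
        double dLat = Math.toRadians(lat2 - lat1);
        double dLon = Math.toRadians(lon2 - lon1);
        double a = Math.sin(dLat / 2) * Math.sin(dLat / 2)
                 + Math.cos(Math.toRadians(lat1)) * Math.cos(Math.toRadians(lat2))
                 * Math.sin(dLon / 2) * Math.sin(dLon / 2);
        // Clamp to 1.0 to guard against rounding slightly above the asin domain.
        return 2 * EARTH_RADIUS_KM * Math.asin(Math.min(1.0, Math.sqrt(a)));
    }

    /**
     * Returns the document numbers within radiusKm of the centre, nearest first.
     * Pass Double.POSITIVE_INFINITY to rank every document by proximity.
     */
    static List<Integer> withinRadius(final double[][] points, final double lat,
                                      final double lon, double radiusKm) {
        List<Integer> docs = new ArrayList<Integer>();
        for (int doc = 0; doc < points.length; doc++) {
            if (distanceKm(lat, lon, points[doc][0], points[doc][1]) <= radiusKm) {
                docs.add(doc);
            }
        }
        Collections.sort(docs, new Comparator<Integer>() {
            public int compare(Integer a, Integer b) {
                return Double.compare(
                    distanceKm(lat, lon, points[a][0], points[a][1]),
                    distanceKm(lat, lon, points[b][0], points[b][1]));
            }
        });
        return docs;
    }
}
```

The point-plus-radius search and the rank-by-proximity search differ only in the radius passed in, which suggests the ranking case could indeed be handled as a sort over an otherwise unfiltered result.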

> Based on the location value (if any) of each hit, a result set comes
> out that is a subset of the hits that would be returned without the
> spatial filter (I don't know how you handle hits that have no lat/lon
> location value).

In my project, all items have locations.  For a more generic implementation,
yes, that has to be thought through.

As I said, the problem has pretty much been solved now.  I'm just using
the GeoFilter I wrote (and posted here).  I would like to reimplement it
in some "clean way" so that it can be rolled into the project.

    --G

