Postcode/zipcode search

Postcode/zipcode search

Chris Mannion-2
Hi all

I've got a bit of a niggling problem with how one of my searches is working
as opposed to how my users would like it to work.  We're indexing on UK
postcodes, which are in the format of a 2 to 4 character area code followed
by a 3 character street specific code, e.g. "NW10 7NY" or "M11 1LQ".
We originally had the values being indexed as tokenized and used a very
simple search string in the format "postcode:xxx xxx", with no grouping or
boosting or fuzzy searching, just a straight search on whatever the user
entered.  This had the benefit of finding exact matches to searches and
allowing us to search just on the area part of the code to return all
records with that area code, e.g. a search on "NW2" returning anything
starting NW2, like "NW2 6TB", "NW2 1ER" etc.

However, the downside to that was that searches could also return records
only tenuously related to what was searched for, e.g. a search for "NW10 7NY"
would also return a record with a postcode "SE9 6NY" because of the slight
match of the "NY".  Obviously this was technically correct but users
complained because their searches were returning records from completely
different areas.  Our first step to put this right was to take off the
tokenization of the field, which we also weren't happy with so have
continued to fiddle.

The current status is as follows - we index the values by stripping out
spaces and tokenizing them with a KeywordAnalyzer.  In searching we also
strip spaces from the search term entered and search with a
KeywordAnalyzer.  Searches for full postcodes, e.g. "NW10 7NY", find all
exact matches but also any full values that are partial matches (e.g. some
records just have "NW10" as their postcode field and the "NW10 7NY" search
pulls them back too), but searches for partial postcodes, e.g. "NW10", still
only find exact matches, i.e. they only pull back those records that have
just "NW10" as their postcode, rather than anything *starting* with NW10 as
we'd like them to do.

Can anyone help me get this working in the way we need it to, please?

--
Chris Mannion
iCasework and LocalAlert implementation team
0208 144 4416

Re: Postcode/zipcode search

Grant Ingersoll-2
You might have a look at using a phrase query when you have more than  
one term in the query in addition to your term query, but giving the  
phrase query more weight (i.e. give an exact match more weight) and  
keep your original tokenization process.

Something like:
"NW10 7NY"^5 OR NW10 OR 7NY

or even downweighting the individual terms.  Thus, exact matches on  
the full phrase will weigh much higher, and you can still do  
individual term matching for the single term case (NW10)
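
A minimal sketch of how such a query string might be assembled from user
input. The function name and the default boost of 5 are illustrative
assumptions, and no escaping of query-parser special characters is attempted:

```python
def boosted_postcode_query(user_input, phrase_boost=5):
    """Build a query string that ranks exact phrase matches highest.

    Hypothetical helper: the name and default boost are assumptions
    made for illustration only.
    """
    terms = user_input.upper().split()
    if not terms:
        return ""
    if len(terms) == 1:
        # Single term (e.g. just the area code): plain term query.
        return terms[0]
    # Quoted phrase with a boost, OR'ed with each individual term.
    phrase = '"{}"^{}'.format(" ".join(terms), phrase_boost)
    return " OR ".join([phrase] + terms)


print(boosted_postcode_query("nw10 7ny"))
print(boosted_postcode_query("NW10"))
```

Note that the OR'ed terms still match other records in the same area; the
boost only pushes exact matches to the top of the results.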

-Grant

On May 6, 2008, at 12:28 PM, Chris Mannion wrote:

> [...]


Re: Postcode/zipcode search

Erick Erickson
In reply to this post by Chris Mannion-2
Have you looked at PrefixQuery? If that doesn't work for you, could you give
a few more examples of expected inputs and outputs?
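
To illustrate what prefix matching would return against space-stripped,
untokenized values like the ones described in the original post, here is a
small sketch in plain Python. It does not use Lucene; the sorted list merely
stands in for an index's term dictionary, and the helper names are made up:

```python
from bisect import bisect_left


def normalise(postcode):
    """Strip all whitespace and upper-case, as the indexing step describes."""
    return "".join(postcode.split()).upper()


def prefix_matches(sorted_terms, user_input):
    """Return every term in the sorted list that starts with the input."""
    prefix = normalise(user_input)
    start = bisect_left(sorted_terms, prefix)
    matches = []
    for term in sorted_terms[start:]:
        if not term.startswith(prefix):
            break  # terms are sorted, so no later term can match
        matches.append(term)
    return matches


# Sample values taken from the examples in this thread.
terms = sorted(normalise(p) for p in
               ["NW10 7NY", "NW10", "NW2 6TB", "NW2 1ER", "SE9 6NY"])

print(prefix_matches(terms, "NW10"))
print(prefix_matches(terms, "NW10 7NY"))
print(prefix_matches(terms, "NW1"))
```

One caveat with stripping the space: a prefix of "NW1" also matches values
in NW10, because the boundary between the two parts is no longer visible.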

Best
Erick


RE: Postcode/zipcode search

Will Johnson-2
In reply to this post by Chris Mannion-2
You could split up the field into 2 separate fields:

Postcode:NW10 7NY -> post1:NW10 post2:7NY

Then rewrite users' queries using the same logic: i.e. if they enter one term,
'NW10', it gets rewritten to post1:NW10; if they enter two terms, to
post1:NW10 AND post2:7NY.

It also lets you do wildcard searches, e.g. post1:NW10 post2:7?Y, and so on.
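
A rough sketch of that rewrite step, assuming the field names post1 and
post2 from above. The function name is hypothetical, and input with more
than two parts is simply rejected here:

```python
def rewrite_postcode_query(user_input):
    """Rewrite raw user input into a two-field query string.

    Hypothetical helper for illustration; post1/post2 are the field
    names suggested in this message.
    """
    terms = user_input.upper().split()
    if len(terms) == 1:
        # Area code only: match every record in that area.
        return "post1:{}".format(terms[0])
    if len(terms) == 2:
        # Full postcode: both parts must match.
        return "post1:{} AND post2:{}".format(terms[0], terms[1])
    raise ValueError("expected one or two terms, got {}".format(len(terms)))


print(rewrite_postcode_query("NW10"))
print(rewrite_postcode_query("nw10 7ny"))
```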

- will


Re: Postcode/zipcode search

AJ Weber
In reply to this post by Chris Mannion-2
Maybe I'm oversimplifying it, and maybe this isn't what you desire, but...

What about breaking the postcode into two (or three) different fields?  Seems easy to parse on the ingestion-side, as you just break the string at the "middle" space.  Then store "postal_area", "postal_street", and optionally the original, full "postalcode".  (Probably do not need to tokenize the first two, maybe the last one.)
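
The ingestion-side split could look something like the sketch below. The
field names follow the ones suggested above; the function itself is a
hypothetical helper, and records holding only an area code simply get no
"postal_street" value:

```python
def postcode_fields(raw_postcode):
    """Split a postcode into the fields to store for one record."""
    # Collapse runs of whitespace and upper-case the value.
    full = " ".join(raw_postcode.upper().split())
    # Break at the first (and normally only) space.
    area, _, street = full.partition(" ")
    fields = {"postalcode": full, "postal_area": area}
    if street:
        fields["postal_street"] = street
    return fields


print(postcode_fields("nw10  7ny"))
print(postcode_fields("NW10"))
```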

Then, and here's where you may throw this idea out entirely, it depends on how your searching application/page is set up.  You'd need to apply the values entered by the user appropriately.  If they enter up to 4 chars with no spaces, search on the "postal_area" field.  If they enter more than 4 chars (including a space), you could, again, split the string at the space and search on the two individual fields.

If you kept the original, full "postalcode" field, you could always put a link on the search results (or maybe only if zero results are returned) saying, "Didn't find what you're looking for?  Click here to broaden your search!"  -- And in that case send the whole query-string against the postalcode field.

Dunno.  Just an idea.  Good Luck!

-AJ


Re: Postcode/zipcode search

mark harwood
In reply to this post by Chris Mannion-2
Can you not convert all postcodes to coordinates and do actual distance-based matching?

You will have to pay Royal Mail or 3rd party suppliers to get hold of the PAF data required for this geocoding (despite having funded this already as a UK taxpayer - grrrr).
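
For illustration, distance-based matching could be sketched with the
haversine formula as below. The coordinates in the usage example are
made-up placeholders rather than real postcode locations, and the function
names and the mean Earth radius of 6371 km are assumptions of this sketch:

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, an approximation


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in degrees."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = radians(lat2 - lat1)
    dlambda = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlambda / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))


def within_radius(records, centre, radius_km):
    """Return the keys of all records no further than radius_km from centre."""
    lat0, lon0 = centre
    return [key for key, (lat, lon) in records.items()
            if haversine_km(lat0, lon0, lat, lon) <= radius_km]


# Placeholder records: labels mapped to invented (lat, lon) pairs.
records = {"A": (51.50, -0.10), "B": (51.51, -0.10), "C": (52.50, -0.10)}
print(within_radius(records, (51.50, -0.10), 5))
```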

Cheers
Mark


Re: Postcode/zipcode search

Ian Holsman (Lists)
Have you had a look at WOEIDs?
https://developer.yahoo.com/geo/


http://where.yahooapis.com/v1/places.q('NW10%207NY')
gives you details about the postcode, as well as a lat/long bounding box
and the 'real' name of it (Willesden) in this case.

http://where.yahooapis.com/v1/place/26556102/neighbors

gives you the neighbors to it
http://where.yahooapis.com/v1/place/26556102/siblings
gives you its siblings.
and
http://where.yahooapis.com/v1/place/26556102/parent?select=long
gives you 1 level up. (NW2 4) apparently.


So I'm guessing you could use 2 calls: the 1st to get the WOEID of what the
user has entered, the 2nd to get the siblings. Using that you can
construct a query to get all the entries in NW10 7NY.


(note: I don't work for yahoo, but work with people who used to)
