Boundary match as part of query language?

Boundary match as part of query language?

Jan Høydahl / Cominvent
Hi,

Sometimes you need to anchor your search to start/end of field.

Example:
1. title=New York Yankees
2. title=New York
3. title=York

If I search title:"New York", I match titles 1 and 2, and title:"York" matches all three. I'd like to be able to anchor my search to the beginning and/or end of the field, e.g. with regex-like syntax: title:"^New York$"

Now, I know how to work around this by appending some unique character sequence to each end of the field and then including it in my search from the front end. However, I wonder if any of you have been planning a patch to add a native boundary match feature to Solr that would automagically add the tokens (also for multi-valued fields!) and expand the query language to allow querying for starts-with(), ends-with() and equals()?

--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com
Training in Europe - www.solrtraining.com

Re: Boundary match as part of query language?

Lance Norskog-2
One way is to add magic 'beginning' and 'end' terms, then do phrase
searches with those terms.
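
A minimal sketch of that idea, using a plain in-memory dict rather than Solr or Lucene. The marker names `_BEGIN` and `_END` and the helpers `analyze`, `phrase_match` and `search` are assumptions made up for illustration; they only mimic what an index-time analyzer and a phrase query would do.

```python
# Hypothetical marker tokens; any strings that cannot occur in real text will do.
BEGIN, END = "_BEGIN", "_END"

def analyze(value):
    """Index-time analysis: lowercase, split on whitespace, wrap in markers."""
    return [BEGIN] + value.lower().split() + [END]

def phrase_match(doc_tokens, phrase_tokens):
    """True if phrase_tokens occur as a contiguous run in doc_tokens."""
    n = len(phrase_tokens)
    return any(doc_tokens[i:i + n] == phrase_tokens
               for i in range(len(doc_tokens) - n + 1))

def search(index, text, starts=False, ends=False):
    """Return the titles matching the phrase, optionally anchored."""
    phrase = text.lower().split()
    if starts:
        phrase = [BEGIN] + phrase
    if ends:
        phrase = phrase + [END]
    return [title for title, tokens in index.items()
            if phrase_match(tokens, phrase)]

titles = ["New York Yankees", "New York", "York"]
index = {t: analyze(t) for t in titles}

print(search(index, "New York"))                          # ['New York Yankees', 'New York']
print(search(index, "New York", starts=True))             # ['New York Yankees', 'New York']
print(search(index, "York", ends=True))                   # ['New York', 'York']
print(search(index, "New York", starts=True, ends=True))  # ['New York']
```

For a multi-valued field, each value would presumably need its own pair of markers, so that an anchored phrase cannot match across two values.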

On Wed, Mar 10, 2010 at 7:51 AM, Jan Høydahl / Cominvent
<[hidden email]> wrote:

> Hi,
>
> Sometimes you need to anchor your search to start/end of field.

--
Lance Norskog
[hidden email]
Re: Boundary match as part of query language?

Jan Høydahl / Cominvent
Sure, this is how we do it now. But wouldn't it be nice to have native support for it? I could start coding it myself, but I wanted to know if there is a patch out there already.

--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com
Training in Europe - www.solrtraining.com

On 14 March 2010, at 00:17, Lance Norskog wrote:

> One way is to add magic 'beginning' and 'end' terms, then do phrase
> searches with those terms.

Re: Boundary match as part of query language?

hossman
In reply to this post by Jan Høydahl / Cominvent

: Now, I know how to work-around this, by appending some unique character
: sequence at each end of the field and then include this in my search in
: the front end. However, I wonder if any of you have been planning a
: patch to add a native boundary match feature to Solr that would
: automagically add tokens (also for multi-value fields!), and expand the
: query language to allow querying for starts-with(), ends-with() and
: equals()

well, if you *always* want boundary rules to be applied, that can be done
as simply as adding your boundary tokens automatically in both the index and
query time analyzers ... then a search for q="New York" can
automatically be translated into a PhraseQuery for "_BEGIN New York _END"

If you want special QueryParser markup to specify when you want specific
boundary conditions, that can also be done with a custom QParser, and by
automatically applying the boundary tokens in your indexing analyzer (but not
the query analyzer -- the QParser would take care of that part).  In
general though it's hard to see how something like q=begin(New York) is
easier syntax than q="_BEGIN New York"
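
To make the comparison concrete, here is a rough sketch of the query-side rewriting such a parser would have to do. It is plain string handling, not a Solr QParser; the markup `begin(...)`, `end(...)` and `equals(...)` and the function `to_phrase_query` are hypothetical, and the sketch assumes the index analyzer has already added `_BEGIN` and `_END` to every value.

```python
import re

BEGIN, END = "_BEGIN", "_END"

# Hypothetical markup: begin(...), end(...), equals(...). Not Solr syntax.
_MARKUP = re.compile(r"^(begin|end|equals)\((.*)\)$")

def to_phrase_query(q):
    """Rewrite the hypothetical markup as a quoted phrase with boundary tokens."""
    m = _MARKUP.match(q.strip())
    if m is None:
        return q  # no markup: leave the query untouched
    op, text = m.group(1), m.group(2).strip()
    if op == "begin":
        terms = f"{BEGIN} {text}"
    elif op == "end":
        terms = f"{text} {END}"
    else:  # equals: anchor at both ends
        terms = f"{BEGIN} {text} {END}"
    return f'"{terms}"'

print(to_phrase_query("begin(New York)"))   # "_BEGIN New York"
print(to_phrase_query("end(New York)"))     # "New York _END"
print(to_phrase_query("equals(New York)"))  # "_BEGIN New York _END"
```

The sketch ignores field prefixes and does not escape quotes inside the argument, both of which a real parser would need to handle.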

The point is it's relatively easy to implement something like this when
meeting specific needs, but i don't know of anyone working on a truly
generalized QParser that deals with this -- largely because most people
who care about this sort of thing either have really complicated use cases
(ie: not just begin/end boundary markers, but also want sentence,
paragraph, page, chapter, section, etc...) or want extremely specific
query syntax (ie: they're trying to recreate the syntax of an existing
system they are replacing) so a general solution doesn't work well.

The closest i've ever seen is Mark Miller's QSolr parser, which actually
went a completely different direction using a home grown syntax to
generate Span queries ... if that slacker ever gets off his butt and
starts running his webserver again, you could download it and try it out,
and probably find that it would be trivial to turn it into a QParser.


-Hoss

Re: Boundary match as part of query language?

David Smiley
By the way, you'll probably want to shingle or use CommonGrams (with _BEGIN & _END being "common") for acceptable performance.
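
The reason is that _BEGIN and _END occur in every document, so a phrase query containing them has to walk very long posting lists. Below is a simplified sketch of the common-grams idea in plain Python. It only imitates the general behaviour of such a filter and is not the actual Solr implementation; `index_grams` and `query_grams` are hypothetical names.

```python
BEGIN, END = "_BEGIN", "_END"
COMMON = {BEGIN, END}

def index_grams(tokens, common=COMMON):
    """Emit every unigram, plus a joined bigram wherever either neighbour is common."""
    out = []
    for i, tok in enumerate(tokens):
        out.append(tok)
        if i + 1 < len(tokens) and (tok in common or tokens[i + 1] in common):
            out.append(tok + "_" + tokens[i + 1])
    return out

def query_grams(tokens, common=COMMON):
    """Prefer bigrams at query time; keep a unigram only if no bigram covers it."""
    out, covered = [], False
    for i, tok in enumerate(tokens):
        pair = i + 1 < len(tokens) and (tok in common or tokens[i + 1] in common)
        if pair:
            out.append(tok + "_" + tokens[i + 1])
        elif not covered:
            out.append(tok)
        covered = pair
    return out

print(index_grams([BEGIN, "new", "york", END]))
# ['_BEGIN', '_BEGIN_new', 'new', 'york', 'york__END', '_END']
print(query_grams([BEGIN, "new"]))
# ['_BEGIN_new']
print(query_grams([BEGIN, "new", "york", END]))
# ['_BEGIN_new', 'york__END']
```

With this scheme a starts-with query for "new" becomes a lookup of the single term `_BEGIN_new`, which is far rarer than `_BEGIN` on its own.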

I'm wondering if Lucene's new payload features might provide an alternative mechanism to mark the first and last term.

~ David Smiley
