[jira] Created: (SOLR-1980) Implement boundary match support

[jira] Created: (SOLR-1980) Implement boundary match support

Nick Burch (Jira)
Implement boundary match support
--------------------------------

                 Key: SOLR-1980
                 URL: https://issues.apache.org/jira/browse/SOLR-1980
             Project: Solr
          Issue Type: New Feature
          Components: Schema and Analysis
            Reporter: Jan Høydahl


Sometimes you need to specify that a query should match only at the start or end of a field, or be an exact match.

Example content:
1) a quick fox is brown
2) quick fox is brown

Example queries:
"^quick fox" -> should only match 2)
"brown$" -> should match 1) and 2)
"^quick fox is brown$" -> should only match 2)

The proposed implementation is a new BoundaryMatchTokenFilter, which behaves like this:
On the index side, it inserts special unique tokens at the beginning and end of the field. These could be some unusual Unicode sequence that does not occur in normal text.
On the query side, it checks whether the first character of the query is "^" or the last character is "$" and replaces each with the corresponding special token.
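
A minimal sketch of the behaviour described above, in plain Python rather than as a real Lucene TokenFilter. The marker code points, the whitespace tokenizer and the function names are all assumptions made for illustration; the issue only asks for "special unique tokens".

```python
# Hypothetical marker tokens: private-use code points assumed never to
# occur in real field content.
START = "\ue000"
END = "\ue001"


def index_tokens(text):
    """Index side: wrap the field's tokens in boundary markers."""
    return [START] + text.split() + [END]


def query_tokens(query):
    """Query side: turn a leading ^ and a trailing $ into marker tokens."""
    tokens = []
    if query.startswith("^"):
        tokens.append(START)
        query = query[1:]
    anchored_end = query.endswith("$")
    if anchored_end:
        query = query[:-1]
    tokens.extend(query.split())
    if anchored_end:
        tokens.append(END)
    return tokens


def phrase_matches(doc, query):
    """Exact phrase match: query tokens must appear contiguously in the doc."""
    d, q = index_tokens(doc), query_tokens(query)
    return any(d[i:i + len(q)] == q for i in range(len(d) - len(q) + 1))


docs = ["a quick fox is brown", "quick fox is brown"]
for q in ["^quick fox", "brown$", "^quick fox is brown$"]:
    print(q, [n + 1 for n, doc in enumerate(docs) if phrase_matches(doc, q)])
# ^quick fox [2]
# brown$ [1, 2]
# ^quick fox is brown$ [2]
```

Because the markers are ordinary tokens, the existing phrase matching does all the work; no change to the query classes is needed in this model.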

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]


[jira] Commented: (SOLR-1980) Implement boundary match support

Nick Burch (Jira)

    [ https://issues.apache.org/jira/browse/SOLR-1980?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12884147#action_12884147 ]

Lance Norskog commented on SOLR-1980:
-------------------------------------

Another use case is with phrases, especially sloppy phrases.
"^hello kitty" would find "hello kitty" at the beginning of the text.
"^hello"~5 would find "hello" among the first 5 words, but the closer to the beginning, the better. This is especially interesting for consumer searches: people tend to type the first word of a movie title first.
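
As a rough illustration of the "closer to the beginning, the better" idea, the sketch below scores a term by its distance from the start of the field. The function name and the 1/(1 + position) formula are assumptions made up for this example; Lucene's actual sloppy phrase scoring is computed differently.

```python
def start_proximity_score(text, term, slop):
    """Score `term` by how close it is to the start of `text`.

    Returns a value in (0, 1] if the term occurs among the first `slop`
    words, and 0.0 otherwise. Position 0 scores highest.
    """
    words = text.lower().split()
    for position, word in enumerate(words[:slop]):
        if word == term:
            return 1.0 / (1 + position)
    return 0.0


titles = [
    "hello kitty island adventure",
    "the new hello kitty movie",
    "one two three four five hello",
]
for title in titles:
    print(title, "->", start_proximity_score(title, "hello", 5))
```

The first title gets the top score, the second a lower one, and the third none because "hello" is the sixth word.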


Re: [jira] Commented: (SOLR-1980) Implement boundary match support

Jan Høydahl / Cominvent
I think the TokenFilter approach is the easiest. Another option would be to go deeper and introduce it as a native query language syntax in some way and add boundarymatch="true" as a parameter in the schema. Any opinions?

--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com
Training in Europe - www.solrtraining.com

On 1 July 2010, at 05:38, Lance Norskog (JIRA) wrote:

> Another use case is with phrases, especially sloppy phrases.
> "^hello kitty" would find "hello kitty" at the beginning of the text.
> "^hello"~5 would find "hello" among the first 5 words, but the closer to the beginning, the better. This is especially interesting for consumer searches: people tend to type the first word of a movie title first.

[jira] Commented: (SOLR-1980) Implement boundary match support

Nick Burch (Jira)

    [ https://issues.apache.org/jira/browse/SOLR-1980?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12895937#action_12895937 ]

Otis Gospodnetic commented on SOLR-1980:
----------------------------------------

What about Span queries - no use here? http://search-lucene.com/jd/lucene/org/apache/lucene/search/spans/SpanQuery.html
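
For context, Lucene's SpanFirstQuery restricts a span match to the first N positions of a field, which covers the start anchor without any marker tokens. The sketch below only models that position check in plain Python; the function name and the whitespace tokenization are assumptions, and it does not address the end-of-field ("$") case.

```python
def span_first(tokens, phrase, end):
    """True if `phrase` occurs contiguously and ends at or before position `end`.

    Positions are 0-based and `end` is exclusive, so a phrase starting at
    index i ends at i + len(phrase).
    """
    n = len(phrase)
    return any(tokens[i:i + n] == phrase for i in range(0, end - n + 1))


doc1 = "a quick fox is brown".split()
doc2 = "quick fox is brown".split()
phrase = ["quick", "fox"]

# With end == len(phrase), the phrase must start at position 0.
print(span_first(doc1, phrase, len(phrase)))  # False
print(span_first(doc2, phrase, len(phrase)))  # True
# A larger end allows the phrase to start a little later.
print(span_first(doc1, phrase, 3))            # True
```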



[jira] Commented: (SOLR-1980) Implement boundary match support

Nick Burch (Jira)

    [ https://issues.apache.org/jira/browse/SOLR-1980?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12896016#action_12896016 ]

Jan Høydahl commented on SOLR-1980:
-----------------------------------

Phrase slop would work as before if the ^ and $ are encoded as simple special tokens in the index.

For multi-valued fields, each sub-value needs to be tagged.
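
A small sketch of what tagging each sub-value could look like, assuming hypothetical marker tokens and whitespace tokenization. It ignores any position gap that an index might place between values.

```python
# Hypothetical marker tokens (private-use code points), for illustration only.
START = "\ue000"
END = "\ue001"


def index_multivalued(values):
    """Wrap every sub-value of a multi-valued field in its own markers."""
    tokens = []
    for value in values:
        tokens.append(START)
        tokens.extend(value.split())
        tokens.append(END)
    return tokens


def contains_phrase(tokens, phrase):
    """True if `phrase` appears contiguously in `tokens`."""
    n = len(phrase)
    return any(tokens[i:i + n] == phrase for i in range(len(tokens) - n + 1))


field = index_multivalued(["a quick fox", "quick dog"])
print(contains_phrase(field, [START, "quick"]))         # True: second value starts with "quick"
print(contains_phrase(field, ["fox", END]))             # True: first value ends with "fox"
print(contains_phrase(field, [START, "quick", "fox"]))  # False: no value starts with "quick fox"
```

With per-value markers, an anchored query matches if any single sub-value satisfies it, rather than only the first or last value of the field.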

I think the "^a b c$" syntax is pretty easy to understand. But does it clash with any other feature or special character? Perhaps some existing regex support that I don't know about?
