[jira] Created: (NUTCH-479) Support for OR queries

[jira] Created: (NUTCH-479) Support for OR queries

Tim Allison (Jira)
Support for OR queries
----------------------

                 Key: NUTCH-479
                 URL: https://issues.apache.org/jira/browse/NUTCH-479
             Project: Nutch
          Issue Type: Improvement
          Components: searcher
    Affects Versions: 1.0.0
            Reporter: Andrzej Bialecki
         Assigned To: Andrzej Bialecki
             Fix For: 1.0.0


There have been many requests from users to extend Nutch query syntax to add support for OR queries, in addition to the implicit AND and NOT queries supported now.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Updated: (NUTCH-479) Support for OR queries

Tim Allison (Jira)

     [ https://issues.apache.org/jira/browse/NUTCH-479?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andrzej Bialecki  updated NUTCH-479:
------------------------------------

    Attachment: or.patch

Patch based on the discussion on the mailing list, and a description provided by Nguien Ngoc Giang. There's a bug in this patch: when OR is used inside a phrase, a parse exception is thrown. I'm not a JavaCC wizard, so I didn't know how to fix it.

[jira] Updated: (NUTCH-479) Support for OR queries

Tim Allison (Jira)

     [ https://issues.apache.org/jira/browse/NUTCH-479?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Nicolás Lichtmaier updated NUTCH-479:
-------------------------------------


This patch doesn't seem to add support for nested clauses like this:

"greenhouse effect" OR ( climate AND change )

I don't know if this full boolean logic support is important. But I've been asked to implement it here... =(

[jira] Commented: (NUTCH-479) Support for OR queries

Tim Allison (Jira)

    [ https://issues.apache.org/jira/browse/NUTCH-479?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12494582 ]

Andrzej Bialecki  commented on NUTCH-479:
-----------------------------------------

Correct - the only syntax element added in this patch is an OR clause. Nested queries like that are probably not high on the priority list, because they may be expensive to run, and they would also complicate the implementation of QueryFilter plugins. Anyway, improvements are welcome ;)

[jira] Commented: (NUTCH-479) Support for OR queries

Tim Allison (Jira)

    [ https://issues.apache.org/jira/browse/NUTCH-479?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12507221 ]

Rob Young commented on NUTCH-479:
---------------------------------

How would this work in the following case?

"search phrase" category:cat1 OR category:cat2

Would it end up as

("search phrase" AND category:cat1) OR category:cat2

or as

"search phrase" AND (category:cat1 OR category:cat2)

[jira] Commented: (NUTCH-479) Support for OR queries

Tim Allison (Jira)

    [ https://issues.apache.org/jira/browse/NUTCH-479?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12507473 ]

Doug Cutting commented on NUTCH-479:
------------------------------------

Neither.  It would end up as the Lucene query:

+"search phrase" +category:cat1 category:cat2

where category:cat2 is a non-required clause that just impacts ranking, not the set of documents returned.
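
To illustrate the difference between required and non-required clauses, here is a minimal sketch. It assumes documents are modeled as plain sets of strings and uses a toy score (a count of matched clauses); the names `DOCS` and `search` are hypothetical and this is not Lucene's actual scoring or API.

```python
# Toy model of a flat boolean query: "+" clauses are required,
# unprefixed clauses are optional and only influence the score.
# Assumption: each document is just a set of matchable strings.
DOCS = {
    "d1": {"search phrase", "category:cat1"},
    "d2": {"search phrase", "category:cat1", "category:cat2"},
    "d3": {"search phrase", "category:cat2"},
    "d4": {"category:cat2"},
}


def search(docs, required, optional):
    """Return (doc_id, score) pairs, best score first."""
    results = []
    for doc_id, terms in docs.items():
        # Every required clause must match, or the document is dropped.
        if not all(clause in terms for clause in required):
            continue
        optional_hits = sum(1 for clause in optional if clause in terms)
        # With no required clauses at all, at least one optional clause must match.
        if not required and optional_hits == 0:
            continue
        # Toy score: one point per matched clause.
        results.append((doc_id, len(required) + optional_hits))
    return sorted(results, key=lambda pair: (-pair[1], pair[0]))


# +"search phrase" +category:cat1 category:cat2
hits = search(DOCS, ["search phrase", "category:cat1"], ["category:cat2"])
print(hits)
```

In this sketch d3 is not returned even though it is in cat2, because category:cat1 is still required; cat2 only moves d2 ahead of d1.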

As for nested queries, parsing is only half the problem.  The query filter plugins would need to be extended to handle such things, as they presently expect flat queries.

The query "foo bar" currently expands to a Lucene query that looks something like:

+(anchor:foo title:foo content:foo)
+(anchor:bar title:bar content:bar)
anchor:"foo bar"~10
title:"foo bar"~1000
content:"foo bar"~1000

(The latter three boost scores when terms are nearer.  Anchor proximity is limited, to keep from matching anchors from other documents.)

So, how should (foo AND (bar OR baz)) expand?  Probably something like:

+(anchor:foo title:foo content:foo)
+((anchor:bar title:bar content:bar)
    (anchor:baz title:baz content:baz))
... proximity boosting clauses?...

And (foo OR (bar AND baz)) might expand to:

(anchor:foo title:foo content:foo)
(+(anchor:bar title:bar content:bar)
 +(anchor:baz title:baz content:baz))
... proximity boosting clauses?...

This expansion is done by the query-basic plugin.
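
The expansions above can be sketched in a few lines. This is only an illustration of the shape of the output, not the query-basic plugin's code: the field list and slop values are taken from the examples in this message, the nested-tuple query representation and the names `expand_flat` and `expand_tree` are assumptions, and proximity clauses for nested queries are left out because that question is still open.

```python
# Fields and proximity slops as they appear in the examples above.
FIELDS = ("anchor", "title", "content")
SLOP = {"anchor": 10, "title": 1000, "content": 1000}


def term_clause(term):
    """One term searched in every field: (anchor:t title:t content:t)."""
    return "(" + " ".join(f"{field}:{term}" for field in FIELDS) + ")"


def expand_flat(terms):
    """Flat query: every term required, plus proximity-boosting phrase clauses."""
    clauses = ["+" + term_clause(term) for term in terms]
    if len(terms) > 1:
        phrase = " ".join(terms)
        clauses += [f'{field}:"{phrase}"~{SLOP[field]}' for field in FIELDS]
    return " ".join(clauses)


def expand_tree(node):
    """Nested query given as a term or ("AND"|"OR", child, ...) tuple.

    Children of AND are required ("+"); children of OR are optional.
    Proximity clauses are intentionally omitted here.
    """
    if isinstance(node, str):
        return term_clause(node)
    op, *children = node
    prefix = "+" if op == "AND" else ""
    parts = []
    for child in children:
        text = expand_tree(child)
        if not isinstance(child, str):
            text = "(" + text + ")"  # group a nested boolean sub-query
        parts.append(prefix + text)
    return " ".join(parts)


print(expand_flat(["foo", "bar"]))
print(expand_tree(("AND", "foo", ("OR", "bar", "baz"))))
print(expand_tree(("OR", "foo", ("AND", "bar", "baz"))))
```

The tree form also shows why the query filter plugins would need changes: a filter that expects a flat list of clauses has no place to put the grouped sub-query.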


[jira] Commented: (NUTCH-479) Support for OR queries

Tim Allison (Jira)

    [ https://issues.apache.org/jira/browse/NUTCH-479?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12508479 ]

Rob Young commented on NUTCH-479:
---------------------------------

Hi, I've found a bug in this patch. If I search for  title:red OR"title:blue" I would expect it to be expanded to
+title:"red" title:"blue"
but in fact it expands to
+title:"red" "title:blue"
so there is no way to do term-specific queries.

[jira] Updated: (NUTCH-479) Support for OR queries

Tim Allison (Jira)

     [ https://issues.apache.org/jira/browse/NUTCH-479?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Rob Young updated NUTCH-479:
----------------------------

    Attachment: or.patch

I've changed the patch slightly to work around the bug I mentioned earlier.
Now the queries look like this:
name:"name value" OR name:"other value"
and are expanded to:
+name:"name value" name:"other value"


[jira] Commented: (NUTCH-479) Support for OR queries

Tim Allison (Jira)

    [ https://issues.apache.org/jira/browse/NUTCH-479?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12541581 ]

Sebastian Steinmetz commented on NUTCH-479:
-------------------------------------------

I've integrated this patch against 0.9 and it seems to have some problems. I'm not sure whether I should have tested it against trunk instead.

The problem is the parsing itself. If I enter "foo OR bar" (without quotes) it gets rewritten to a Lucene query which reads like "+foo bar", but it should be "foo bar", I think...
At least, if I change it by hand in debug mode to "foo bar", the result is the pages that I would expect to appear.

I don't have any idea how to fix this yet. But if I have one, I'll let you know...

[jira] Commented: (NUTCH-479) Support for OR queries

Tim Allison (Jira)

    [ https://issues.apache.org/jira/browse/NUTCH-479?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12671140#action_12671140 ]

Andrzej Bialecki  commented on NUTCH-479:
-----------------------------------------

The current patch is not sufficient to solve the issue - postponing to 1.1.

[jira] Updated: (NUTCH-479) Support for OR queries

Tim Allison (Jira)

     [ https://issues.apache.org/jira/browse/NUTCH-479?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andrzej Bialecki  updated NUTCH-479:
------------------------------------

    Fix Version/s:     (was: 1.0.0)
                   1.1

[jira] Updated: (NUTCH-479) Support for OR queries

Tim Allison (Jira)

     [ https://issues.apache.org/jira/browse/NUTCH-479?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Robert Buccigrossi updated NUTCH-479:
-------------------------------------

    Attachment: nutch_0.9_OR.patch

The problem with the 2007 patch is that it translates:

  brain apples OR oranges
to
 +brain +apples oranges

The included patch translates the phrase to

 +brain apples oranges

(specifically it makes the previous clause non-mandatory, which I believe is the preferred behavior).

I hope this helps.

Robert Buccigrossi

(I also blogged on it at http://blog.tcg.com/tcg/2009/03/adding-the-boolean-operator-or-to-nutch.html )
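
The two translations of "brain apples OR oranges" described above can be compared with a small sketch. It is not the code of either patch (those change Nutch's query parser); it assumes a simple whitespace tokenizer with no phrase or field handling, and the function name `translate` and its `relax_previous` flag are hypothetical.

```python
def translate(query, relax_previous):
    """Translate a flat query with OR into '+'-prefixed Lucene-style clauses.

    relax_previous=False mimics the behavior attributed to the 2007 patch:
    only the term after OR becomes optional.
    relax_previous=True mimics the behavior described for the newer patch:
    the term before OR becomes optional as well.
    Assumption: tokens are separated by whitespace; phrases are not handled.
    """
    tokens = query.split()
    required = [True] * len(tokens)
    for i, token in enumerate(tokens):
        if token != "OR":
            continue
        if i + 1 < len(tokens):
            required[i + 1] = False  # clause following OR is optional
        if relax_previous and i > 0:
            required[i - 1] = False  # clause preceding OR is optional too
    return " ".join(
        ("+" if is_required else "") + token
        for token, is_required in zip(tokens, required)
        if token != "OR"
    )


print(translate("brain apples OR oranges", relax_previous=False))
print(translate("brain apples OR oranges", relax_previous=True))
print(translate("foo OR bar", relax_previous=True))
```

With relax_previous=True the query "foo OR bar" comes out as "foo bar", which is the result asked for earlier in this thread.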

[jira] Updated: (NUTCH-479) Support for OR queries

Tim Allison (Jira)

     [ https://issues.apache.org/jira/browse/NUTCH-479?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Chris A. Mattmann updated NUTCH-479:
------------------------------------

       Patch Info: [Patch Available]
    Fix Version/s:     (was: 1.1)

- pushing this out per http://bit.ly/c7tBv9
