Solr search word NOT followed by another word


Solr search word NOT followed by another word

ivan
What I'm trying to do is get results for "Leonardo" only when it is not
followed by "da vinci".
So any result containing "Leonardo" (not followed by "da vinci") is fine,
even if "Leonardo da vinci" also appears in the result. I want to filter out
only the results that have no "Leonardo" without "da vinci".

Examples:
"Leonardo abc abc abc"   OK
"Leonardo da vinci abab"  KO
"Leonardo is the name of Leonardo da Vinci"  OK


I can't seem to find any way to do that using Solr queries. I can't use
regex (I have a tokenized text field), and no combination of boolean logic
seems to work.
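
The requirement can be pinned down outside Solr with a few lines of Python. This is only a sketch of the intended semantics, not a Solr query: it assumes naive lowercase whitespace tokenization, and the function name `has_unfollowed` is made up for illustration.

```python
def has_unfollowed(text, word="leonardo", following=("da", "vinci")):
    """Return True if `word` occurs at least once without `following` right after it."""
    tokens = text.lower().split()  # assumption: plain whitespace tokenization
    n = len(following)
    for i, token in enumerate(tokens):
        # Compare the n tokens after position i with the unwanted continuation.
        if token == word and tuple(tokens[i + 1:i + 1 + n]) != tuple(following):
            return True
    return False


for example in (
    "Leonardo abc abc abc",
    "Leonardo da vinci abab",
    "Leonardo is the name of Leonardo da Vinci",
):
    print(example, "->", "OK" if has_unfollowed(example) else "KO")
```

Run against the three examples above, it accepts the first and third and rejects the second.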

Any help?
Thanks




--
Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html

RE: Solr search word NOT followed by another word

Allison, Timothy B.
That requires a SpanNotQuery.  AFAIK, there is no way to do this with the current parsers included in Solr.

My SpanQueryParser does cover this, and I'm hoping to port it to 7.x today or tomorrow.

Syntax would be "Leonardo [da vinci]"!~0,1

https://issues.apache.org/jira/browse/LUCENE-5205

https://github.com/tballison/lucene-addons/tree/master/lucene-5205

https://mvnrepository.com/artifact/org.tallison.lucene/lucene-5205

With Solr wrapper: https://issues.apache.org/jira/browse/SOLR-5410


-----Original Message-----
From: ivan [mailto:[hidden email]]
Sent: Monday, February 12, 2018 6:00 AM
To: [hidden email]
Subject: Solr search word NOT followed by another word


Re: Solr search word NOT followed by another word

simon-2
Tim:

How up to date is the SOLR-5410 patch/zip in JIRA? I'm looking to use the
Span Query parser in 6.5.1, migrating to 7.x sometime soon.

Would love to see these committed!

-Simon

On Mon, Feb 12, 2018 at 10:41 AM, Allison, Timothy B. <[hidden email]>
wrote:


RE: Solr search word NOT followed by another word

ivan
In reply to this post by Allison, Timothy B.
That looks great!
I'm not sure how to install it into my version of Solr, though (I'm using 6.4.1).




Re: Solr search word NOT followed by another word

Emir Arnautović
In reply to this post by ivan
Hi Ivan,
You might be able to use the complexphrase query parser to get what you need. You can test something like this:

{!complexphrase df=my_field}"Leonardo -(da Vinci)"

This should return any Leonardo that is not followed by da Vinci.

HTH,
Emir
--
Monitoring - Log Management - Alerting - Anomaly Detection
Solr & Elasticsearch Consulting Support Training - http://sematext.com/




Re: Solr search word NOT followed by another word

ivan
Hi Emir,

Unfortunately that does not work: I'm not getting a match for my third
example ("Leonardo is the name of Leonardo da Vinci"), because I have both
"Leonardo" and "Leonardo da Vinci" in the same field. I'm fine with having
"Leonardo da Vinci" as long as there is another "Leonardo" (NOT followed by
da Vinci).




Re: Solr search word NOT followed by another word

Emir Arnautović
Hi Ivan,
Which version of Solr do you use? I’ve just tried it on 6.5.1 and it returned the expected results.

Emir



> On 13 Feb 2018, at 16:08, ivan <[hidden email]> wrote:

Re: Solr search word NOT followed by another word

ivan
I'm working on 6.4.1 (but I tried on 7.2.1 too), and I'm not getting results
for the case I showed before.




RE: Solr search word NOT followed by another word

ivan
In reply to this post by Allison, Timothy B.
Hi Timothy,

I'm trying to use your parser, but I'm having some trouble with the
Solr/Lucene versions.
I'm trying to use version 6.4.1, but I'm facing a lot of incompatibilities
with version 5. Is there an updated version of the plugin?





RE: Solr search word NOT followed by another word

Allison, Timothy B.
In reply to this post by simon-2
In process, should finish by end of this week.  I had to put SlowFuzzyQuery back in, and I discovered SOLR-11976 while trying to upgrade.  I'll have to do a workaround until that is fixed.

-----Original Message-----
From: simon [mailto:[hidden email]]
Sent: Monday, February 12, 2018 1:21 PM
To: solr-user <[hidden email]>
Subject: Re: Solr search word NOT followed by another word


RE: Solr search word NOT followed by another word

Allison, Timothy B.
In reply to this post by Emir Arnautović
I've been away from the ComplexPhraseQueryParser for a while, and I was wrong when I said in my earlier email that no parser currently included in Solr generates a SpanNotQuery.

You're right, Emir, that the ComplexPhraseQueryParser does generate a SpanNotQuery, and yes, I just tried this with 7.2.1, and it retrieves "Leonardo is the name of Leonardo da Vinci".

However, it fails to retrieve:
a) "Leonardo da is the name of Leonardo da Vinci"
and
b) "Leonardo Vinci is the name of Leonardo da Vinci"

because the SpanNot exclude is a SpanOr ("da" or "vinci") after the rewrite:

spanNot(name:leonardo, spanNear([name:leonardo, spanOr([name:da, name:vinci])], 0, true), 0, 0)
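
The effect of that rewrite can be reproduced with a small span model in Python. This is a simplified stand-in for span matching, not Lucene code: it assumes lowercase whitespace tokenization and half-open [start, end) token spans, and every helper name here is hypothetical.

```python
def term_spans(tokens, term):
    """One-token spans [i, i+1) for every occurrence of `term`."""
    return [(i, i + 1) for i, token in enumerate(tokens) if token == term]


def ordered_near(tokens, slots):
    """Spans where consecutive tokens match `slots` in order with slop 0.

    Each slot is a set of allowed terms, so a two-term set models a spanOr.
    """
    n = len(slots)
    return [
        (i, i + n)
        for i in range(len(tokens) - n + 1)
        if all(tokens[i + k] in slots[k] for k in range(n))
    ]


def span_not(include, exclude):
    """Keep the include spans that overlap no exclude span."""
    return [
        s for s in include
        if not any(s[0] < e[1] and e[0] < s[1] for e in exclude)
    ]


def matches(text, exclude_slots):
    tokens = text.lower().split()
    kept = span_not(term_spans(tokens, "leonardo"), ordered_near(tokens, exclude_slots))
    return bool(kept)


# Exclude clause as intended: the exact phrase "leonardo da vinci".
INTENDED = [{"leonardo"}, {"da"}, {"vinci"}]
# Exclude clause after the rewrite: "leonardo" followed by "da" OR "vinci".
REWRITTEN = [{"leonardo"}, {"da", "vinci"}]

for doc in (
    "Leonardo is the name of Leonardo da Vinci",
    "Leonardo da is the name of Leonardo da Vinci",
    "Leonardo Vinci is the name of Leonardo da Vinci",
):
    print(doc, "| intended:", matches(doc, INTENDED), "| rewritten:", matches(doc, REWRITTEN))
```

In this model the rewritten exclude clause swallows every "Leonardo" in documents a) and b), because each one is directly followed by either "da" or "vinci", while the exact-phrase exclude leaves the first "Leonardo" in both documents untouched.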







-----Original Message-----
From: Emir Arnautović [mailto:[hidden email]]
Sent: Tuesday, February 13, 2018 11:23 AM
To: [hidden email]
Subject: Re: Solr search word NOT followed by another word


RE: Solr search word NOT followed by another word

Allison, Timothy B.
In reply to this post by ivan
I just updated the SpanQueryParser (LUCENE-5205) and its Solr plugin (SOLR-5410) for master and 7.2.1.

What version of Solr are you using and which version of the plugin?

These should be available on Maven Central shortly, as version 7.2-0.1:
<dependency>
    <groupId>org.tallison.solr</groupId>
    <artifactId>solr-5410</artifactId>
    <version>7.2-0.1</version>
</dependency>

Or you can fork: https://github.com/tballison/lucene-addons/tree/7.2-0.1


-----Original Message-----
From: ivan [mailto:[hidden email]]
Sent: Wednesday, February 14, 2018 6:42 AM
To: [hidden email]
Subject: RE: Solr search word NOT followed by another word


Re: Solr search word NOT followed by another word

Emir Arnautović
In reply to this post by Allison, Timothy B.
Hi,
I did not provide the right query. If you query as {!complexphrase df=name}"Leonardo -da -Vinci", all works as expected. This matches all three docs.
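
When sending that query over HTTP, the local-params prefix and the double quotes need URL encoding. The sketch below builds such a request URL with the Python standard library; the base URL (a local Solr on port 8983 with a `techproducts` core) and the helper name `build_select_url` are assumptions for illustration, not something confirmed in this thread.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def build_select_url(base_url, field, phrase, debug=True):
    """Build a /select URL for a complexphrase query on `field`."""
    params = {
        # Straight double quotes are required; curly quotes pasted from an
        # email client would not be parsed as a phrase.
        "q": '{!complexphrase df=%s}"%s"' % (field, phrase),
        "wt": "json",
    }
    if debug:
        params["debugQuery"] = "on"
    return base_url + "/select?" + urlencode(params)


# Hypothetical local core; adjust to your own installation.
url = build_select_url("http://localhost:8983/solr/techproducts", "name", "Leonardo -da -Vinci")
print(url)
# Decoding the query string again shows the q parameter exactly as Solr receives it.
print(parse_qs(urlsplit(url).query)["q"][0])
```

Keeping `debugQuery=on` in the request makes it easier to compare the parsed query between installations.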

HTH,
Emir



> On 15 Feb 2018, at 19:51, Allison, Timothy B. <[hidden email]> wrote:

RE: Solr search word NOT followed by another word

Allison, Timothy B.
Nice.  Thank you!

-----Original Message-----
From: Emir Arnautović [mailto:[hidden email]]
Sent: Thursday, February 15, 2018 2:19 PM
To: [hidden email]
Subject: Re: Solr search word NOT followed by another word


RE: Solr search word NOT followed by another word

ivan
In reply to this post by Allison, Timothy B.
I'm using Solr 6.4.1. I will try your updated version and let you know,
thanks!




Re: Solr search word NOT followed by another word

ivan
In reply to this post by Emir Arnautović
That still does not work for me.
I'm not getting results for "Leonardo da vinci bla bla Leonardo" or
"Leonardo 1 da vinci bla bla Leonardo".

Tried on both solr 6.4.1 and solr 7.2.1




Re: Solr search word NOT followed by another word

Emir Arnautović
Hi Ivan,
Can you share the config for that field? It seems to me that it might be caused by your analysis chain. Do the queries "Leonardo 1" or "bla Leonardo" result in matches?

Emir



> On 16 Feb 2018, at 10:12, ivan <[hidden email]> wrote:
Re: Solr search word NOT followed by another word

ivan
Hi Emir,

I'm testing these on the example that comes with Solr (techproducts); I
just added some docs to it.
Both of those queries give the expected results.
I'm testing on a TextField (indexed, tokenized, stored).




Re: Solr search word NOT followed by another word

Emir Arnautović
Hi Ivan,
Can you share the response from a debug query?

Here is what I got:

{
  "responseHeader":{
    "status":0,
    "QTime":8,
    "params":{
      "q":"{!complexphrase df=content}\"leonardo -da -Vinci\"",
      "indent":"on",
      "fl":"content",
      "wt":"json",
      "debugQuery":"on"}},
  "response":{"numFound":3,"start":0,"docs":[
      {
        "content":"Leonardo is the name of Leonardo da Vinci"},
      {
        "content":"Leonardo da is the name of Leonardo da Vinci"},
      {
        "content":"Leonardo Vinci is the name of Leonardo da Vinci"}]
  },
  "debug":{
    "rawquerystring":"{!complexphrase df=content}\"leonardo -da -Vinci\"",
    "querystring":"{!complexphrase df=content}\"leonardo -da -Vinci\"",
    "parsedquery":"ComplexPhraseQuery(\"leonardo -da -Vinci\")",
    "parsedquery_toString":"\"leonardo -da -Vinci\"",
    "explain":{
      "test":"\n22.708572 = weight(spanNot(content:leonardo, spanNear([content:leonardo, content:da, content:vinci], 0, true), 0, 0) in 0) ...
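
To compare debug output between two installations, it can help to reduce each response to the few fields that matter. The following is a minimal sketch, assuming the default string form of the `explain` entries; the function name `summarize_debug` is hypothetical, and the sample is a hand-trimmed copy of the response above rather than real server output.

```python
import json


def summarize_debug(response_text):
    """Pull the fields worth comparing out of a Solr JSON response with debugQuery=on."""
    data = json.loads(response_text)
    debug = data.get("debug", {})
    explain = debug.get("explain", {})
    return {
        "numFound": data["response"]["numFound"],
        "parsedquery": debug.get("parsedquery"),
        # True if any per-document explanation mentions a spanNot clause.
        "uses_span_not": any("spanNot(" in text for text in explain.values()),
    }


# Hand-trimmed sample shaped like the response above (assumption: explain values are strings).
sample = json.dumps({
    "response": {"numFound": 3, "start": 0, "docs": []},
    "debug": {
        "parsedquery": 'ComplexPhraseQuery("leonardo -da -Vinci")',
        "explain": {
            "test": "weight(spanNot(content:leonardo, "
                    "spanNear([content:leonardo, content:da, content:vinci], 0, true), 0, 0) in 0)"
        },
    },
})

print(summarize_debug(sample))
```

If two setups report the same `parsedquery` but a different `numFound`, the difference is more likely in the indexed documents or the field's analysis chain than in the parser.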

Thanks,
Emir



> On 16 Feb 2018, at 10:51, ivan <[hidden email]> wrote: