Issue with highlighter

Ali Husain
Hi,


I think I've found a bug in the highlighter. When I search for the word "something", every document that is returned has an empty highlighting entry, as shown below. The fields I am searching are of type text_en, and the highlighter works for a lot of other queries. I don't have a stopwords.txt list that could be interfering either.


 "highlighting":{
    "310":{},
    "103":{},
    "406":{},
    "1189":{},
    "54":{},
    "292":{},
    "309":{}}}


If I just change the search term to "something like", I get back this:


"highlighting":{
    "310":{},
    "309":{
      "content":["1949 Convention, <em>like</em> those"]},
    "103":{},
    "406":{},
    "1189":{},
    "54":{},
    "292":{},
    "286":{
      "content":["persons in these classes are treated <em>like</em> combatants, but in other respects"]},
    "336":{
      "content":["   be treated <em>like</em> engagement"]}}}


So I know that I have it set up correctly, but I can't figure this out. I've searched through JIRA and Google and haven't been able to find a similar issue.


Any ideas?


Thanks,

Ali

Re: Issue with highlighter

Erick Erickson
If the default operator is OR, then you're just matching on the "like"
word and it's being properly highlighted. If you're saying that doc
286 (or whatever) has both "something" and "like" in the content and
you expect to find them both, try increasing the number of snippets
returned.

Otherwise we need to see the _complete_ query and response, preferably
with &debug=true. Plus your schema, plus a sample document and exactly
what you think should be happening that isn't.
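
Both suggestions above, more snippets and debug output, are plain request parameters. The following is a minimal sketch of building such a request, assuming a local Solr at localhost:8983, a core named `testHighlight` and a field named `content` (all placeholders to be replaced with your own values). `hl.snippets` and `debug` are standard Solr parameters; the helper name `build_select_url` is made up for illustration.

```python
from urllib.parse import urlencode

# Assumed base URL: a local Solr with a core called "testHighlight".
SOLR_SELECT = "http://localhost:8983/solr/testHighlight/select"


def build_select_url(query, snippets=3, debug=True):
    """Return a /select URL that asks for highlights and, optionally, debug output."""
    params = {
        "q": query,
        "hl": "on",
        "hl.fl": "content",       # assumed field name
        "hl.snippets": snippets,  # allow more than one fragment per field
        "wt": "json",
    }
    if debug:
        params["debug"] = "true"  # include parsed-query and scoring details
    return SOLR_SELECT + "?" + urlencode(params)


print(build_select_url("something like"))
```

Pasting the printed URL into a browser (or fetching it with any HTTP client) should return the complete response, including the debug section, which can then be posted alongside the schema and a sample document.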

Best,
Erick


RE: Issue with highlighter

Phil Scadden
In reply to this post by Ali Husain
I just had a similar issue: it works for some, not others. The first thing to look at is hl.maxAnalyzedChars in the query. The default is quite small.
Since many of my documents are large PDF files, I opted to use storeOffsetsWithPositions="true" termVectors="true" on the field I was searching on.
This did increase my index size, but not too badly, and it is certainly fast.
https://cwiki.apache.org/confluence/display/solr/Highlighting
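
The two adjustments described here can be sketched as request payloads. This is only an illustration under stated assumptions: the core uses a managed schema with the Schema API enabled, the field is called `content` and has type `text_en`, and the limit of 1,000,000 characters is an arbitrary value chosen for the example. The helper names are hypothetical. A field redefined this way has to be reindexed before the new options take effect.

```python
import json


def replace_field_command(name="content", field_type="text_en"):
    """Build a Schema API body that stores offsets and term vectors for a field."""
    return {
        "replace-field": {
            "name": name,            # assumed field name
            "type": field_type,      # assumed field type
            "indexed": True,
            "stored": True,
            "termVectors": True,
            "storeOffsetsWithPositions": True,
        }
    }


def highlight_params(query, max_chars=1_000_000):
    """Build query parameters that raise the highlighter's analysis limit."""
    return {
        "q": query,
        "hl": "on",
        "hl.fl": "content",                # assumed field name
        "hl.maxAnalyzedChars": max_chars,  # arbitrary example value
    }


# The first payload would be POSTed as JSON to /solr/<core>/schema;
# the second would be sent as parameters to /solr/<core>/select.
print(json.dumps(replace_field_command(), indent=2))
print(highlight_params("something"))
```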

Beware of NOT plus OR in a search. That will certainly produce no highlights. (eg test -results when default op is OR)



Re: Issue with highlighter

david.w.smiley@gmail.com
> Beware of NOT plus OR in a search. That will certainly produce no highlights. (eg test -results when default op is OR)

That seems like a bug to me. I think the default operator shouldn't matter in
that case: only one clause ("test") lacks an explicit BooleanQuery.Occur
operator, so the OR/AND setting makes no practical difference. The end effect
is that "test" is effectively required and should definitely be highlighted.
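
The reasoning above can be checked with a toy model of Boolean matching. This is a deliberately simplified sketch, not Lucene's query parser or its BooleanQuery implementation: the functions `parse` and `matches` are invented for illustration, terms are compared as plain strings, and no text analysis is applied.

```python
def parse(query, default_op="OR"):
    """Split a query into (must, should, must_not) term lists."""
    must, should, must_not = [], [], []
    for token in query.split():
        if token.startswith("-"):
            must_not.append(token[1:])   # explicitly prohibited
        elif token.startswith("+"):
            must.append(token[1:])       # explicitly required
        elif default_op == "AND":
            must.append(token)           # bare term, default operator AND
        else:
            should.append(token)         # bare term, default operator OR
    return must, should, must_not


def matches(doc_terms, query, default_op="OR"):
    """Return True if a document (a list of terms) matches the query."""
    must, should, must_not = parse(query, default_op)
    terms = set(doc_terms)
    if any(t in terms for t in must_not):
        return False
    if not all(t in terms for t in must):
        return False
    # With no required clause, at least one optional clause has to match.
    if not must and should:
        return any(t in terms for t in should)
    return True


docs = {"a": ["test", "data"], "b": ["test", "results"], "c": ["data"]}
for op in ("OR", "AND"):
    print(op, [d for d, t in docs.items() if matches(t, "test -results", op)])
# prints:
# OR ['a']
# AND ['a']
```

In this model the single bare term ends up as the only positive clause either way, so the same documents match under both default operators, which is why the highlighting result would be expected to be the same too.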

Note to Ali: Phil's comment implies use of hl.method=unified which is not
the default.

--
Lucene/Solr Search Committer, Consultant, Developer, Author, Speaker
LinkedIn: http://linkedin.com/in/davidwsmiley | Book:
http://www.solrenterprisesearchserver.com

Re: Issue with highlighter

Ali Husain
Thanks for the replies. Let me try and explain this a little better.


I haven't modified anything in solrconfig. All I did was get a fresh instance of Solr 6.4.1 and create a core named testHighlight. I then created a content field of type text_en via the Solr Admin UI. The id field was already there, and it is of type string.


I then use the UI once again to check the hl checkbox, and set hl.fl to * because I want any and every match.


I push the following content into this new solr instance:

id:91101

content:'I am adding something to the core field and we will try and find it. We want to make sure the highlighter works!

This is short so fragsize and max characters shouldn\'t be an issue.'

As you can see, there are very few characters, so fragsize, maxAnalyzedChars and the like should not be an issue.


I then send this query:

http://localhost:8983/solr/testHighlight/select?hl.fl=*&hl=on&indent=on&q=something&wt=json


My results:


"response":{"numFound":1,"start":0,"docs":[
      {
        "id":"91101",
        "content":"I am adding something to the core field and we will try and find it. We want to make sure the highlighter works! This is short so fragsize and max characters shouldn't be an issue.",
        "_version_":1570302668841156608}]
  },
"highlighting":{
    "91101":{}}
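
When trying many query terms, it can help to flag empty highlighting entries automatically rather than reading each response by eye. The sketch below assumes the JSON response shape shown above (`wt=json`); the helper name `docs_without_highlights` is hypothetical, and the sample data is a trimmed copy of the response from this post.

```python
import json


def docs_without_highlights(response):
    """Return ids of documents whose highlighting entry contains no snippets."""
    highlighting = response.get("highlighting", {})
    return [
        doc_id
        for doc_id, fields in highlighting.items()
        if not any(fields.values())  # {} or only empty snippet lists
    ]


sample = json.loads("""
{"response": {"numFound": 1, "start": 0, "docs": [{"id": "91101"}]},
 "highlighting": {"91101": {}}}
""")

print(docs_without_highlights(sample))
```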


I change q to "core" instead of "something":


http://localhost:8983/solr/testHighlight/select?hl.fl=*&hl=on&indent=on&q=core&wt=json


{
        "id":"91101",
        "content":"I am adding something to the core field and we will try and find it. We want to make sure the highlighter works! This is short so fragsize and max characters shouldn't be an issue.",
        "_version_":1570302668841156608},



"highlighting":{
    "91101":{
      "content":["I am adding something to the <em>core</em> field and we will try and find it. We want to make sure"]}}

I've tried a bunch of queries. 'adding' and 'something' both return no highlights, while 'core', 'am' and 'field' all work.

Am I doing a better job of explaining this? It is quite puzzling why this would be happening. My guess is that some file or config somewhere is causing certain words to be ignored. It isn't stopwords.txt in my case, though. If that isn't the cause, then it definitely seems like a bug to me.
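
One way to test the guess that certain words are being dropped or altered is to look at the tokens the field type's analysis chain actually produces, using Solr's field analysis handler (the same information the Admin UI's Analysis screen displays). The sketch below only builds the request URL. It assumes the handler is reachable at its usual `/analysis/field` path on a core named `testHighlight`, and the helper name `build_analysis_url` is made up for illustration.

```python
from urllib.parse import urlencode

# Assumed location of the field analysis handler on the test core.
ANALYSIS_URL = "http://localhost:8983/solr/testHighlight/analysis/field"


def build_analysis_url(text, query_term, field_type="text_en"):
    """Return a URL that shows how index text and a query term are tokenized."""
    params = {
        "analysis.fieldtype": field_type,  # analyze with this field type
        "analysis.fieldvalue": text,       # index-time text
        "analysis.query": query_term,      # query-time text
        "analysis.showmatch": "true",      # mark index tokens matching the query
        "wt": "json",
    }
    return ANALYSIS_URL + "?" + urlencode(params)


print(build_analysis_url("I am adding something to the core field", "something"))
```

Comparing the tokens reported for a term that highlights (such as 'core') with those for one that does not (such as 'something') would show whether the analysis chain treats them differently.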

Thanks, Ali



Re: Issue with highlighter

Damien Kamerman
Ali, does adding a 'hl.q' param help?  q=something&hl.q=something&...


Re: Issue with highlighter

Ali Husain
Damien, I tried that too before I sent the email. Nothing :/


http://localhost:8983/solr/testHighlight/select?hl.q=something&hl.fl=*&hl=on&indent=on&q=something&wt=json


This is a bug, right?


Re: Issue with highlighter

Erick Erickson
Works perfectly for me. Let's see:

- your solrconfig file, particularly the "select" handler.
- the field definition you use for the content field. Be sure to include the associated fieldType.
- the results of debug=on attached to the query.
- What version of Solr?

Best,
Erick



