"Indexed without position data" - strange exception in eDisMax (Solr 4.0beta)


Alexandre Rafalovitch
I am getting a very strange exception when I use the edismax handler and
the search query contains a keyword with a dash (but only some keywords
with a dash).

The exception is:
1-Oct-2012 11:45:38 AM org.apache.solr.core.SolrCore execute
INFO: [kb] webapp=/solr path=/select
params={facet=true&facet.field=Schema&facet.field=EventCode&facet.field=ThemeCodes&facet.field=AichiCodes&q=(Schema:(sideEvent)+AND+A-D)&rows=0&version=2.2}
status=500 QTime=14
1-Oct-2012 11:45:38 AM org.apache.solr.common.SolrException log
SEVERE: null:java.lang.IllegalStateException: field "NamesEN" was
indexed without position data; cannot run PhraseQuery (term=a)
        at org.apache.lucene.search.PhraseQuery$PhraseWeight.scorer(PhraseQuery.java:274)
        at org.apache.lucene.search.DisjunctionMaxQuery$DisjunctionMaxWeight.scorer(DisjunctionMaxQuery.java:160)
        at org.apache.lucene.search.BooleanQuery$BooleanWeight.scorer(BooleanQuery.java:318)
        at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:571)
        at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:275)
        at org.apache.solr.search.SolrIndexSearcher.getDocListAndSetNC(SolrIndexSearcher.java:1514)
        at org.apache.solr.search.SolrIndexSearcher.getDocListC(SolrIndexSearcher.java:1261)
        at org.apache.solr.search.SolrIndexSearcher.search(SolrIndexSearcher.java:390)
        at org.apache.solr.handler.component.QueryComponent.process(QueryComponent.java:411)
        at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:206)
        at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
        at org.apache.solr.core.SolrCore.execute(SolrCore.java:1656)
        at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:454)
        at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:275)

The field definition it complains about is:
        <field name="NamesEN" type="text_en_splitting" multiValued="true"
indexed="true" stored="true" />
And the eDisMax definition is (in solrconfig.xml):
<str name="qf">
.... Id NamesEN^5 Organizations ....
</str>

The strange thing is that it does not seem to happen for all "X-Y" sequences.
Here is one that works just before:
1-Oct-2012 11:45:22 AM org.apache.solr.core.SolrCore execute
INFO: [kb] webapp=/solr path=/select
params={facet=true&facet.field=Schema&facet.field=EventCode&facet.field=ThemeCodes&facet.field=AichiCodes&q=(Schema:(sideEvent)+AND+ABC-D)&rows=0&version=2.2}
hits=13 status=0 QTime=105


I don't mind if the results are slightly off; I am still tuning the
full text search. But I am not sure what to do with the exception
above. Do I need to 'index position data' somehow? Do I need to escape
the dash? Did I hit a rare bug in "handle anything thrown at it" eDisMax?

Any pointers would be appreciated.

Regards,
   Alex.

Personal blog: http://blog.outerthoughts.com/
LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
- Time is the quality of nature that keeps events from happening all
at once. Lately, it doesn't seem to be working.  (Anonymous  - via GTD
book)

Re: "Indexed without position data" - strange exception in eDisMax (Solr 4.0beta)

Jack Krupansky-2
You probably have omitTermFreqAndPositions=true or omitPositions=true in
your schema for that field. You MUST have position info to use phrase query.

-- Jack Krupansky


RE: "Indexed without position data" - strange exception in eDisMax (Solr 4.0beta)

Markus Jelsma-2
In reply to this post by Alexandre Rafalovitch

Hi - this happens if the field is indexed without position data and the user requests an explicit phrase query and/or you have autoGeneratePhraseQueries enabled for the field.

You can fix it by removing the field from the qf parameter or by indexing the field with position data. Turning autoGeneratePhraseQueries off will fix this problem, but the exception will return for explicit phrase queries. Check the field or fieldType for omitTermFreqAndPositions or omitPositions.
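
As a sketch of the "index the field with position data" option: the two attributes named above can be spelled out on the field so the intent is visible in schema.xml. This reuses the NamesEN definition quoted in this thread; the explicit "false" values are an illustration and were not part of the original schema. Documents indexed before such a change would presumably still lack positions until they are re-indexed.

```xml
<!-- Hypothetical variant of the NamesEN field from this thread.
     Both omit* attributes are set to "false" for illustration,
     so that term positions are kept and phrase queries can run. -->
<field name="NamesEN" type="text_en_splitting" multiValued="true"
       indexed="true" stored="true"
       omitTermFreqAndPositions="false" omitPositions="false" />
```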



Re: "Indexed without position data" - strange exception in eDisMax (Solr 4.0beta)

Alexandre Rafalovitch
In reply to this post by Jack Krupansky-2
I use text_en_splitting from example distribution, which does have
autoGeneratePhraseQueries:
        <fieldType name="text_en_splitting" class="solr.TextField"
positionIncrementGap="100" autoGeneratePhraseQueries="true">

But I do not see the other omit options anywhere in solr.config and
definitely not for the field or field type definition:
<field name="NamesEN" type="text_en_splitting" multiValued="true"
indexed="true" stored="true" />

I am not sure what the defaults are either. I am using version 1.5 of
the schema and it says in the example schema.xml:
       1.2: omitTermFreqAndPositions attribute introduced, true by
default except for text fields.
Does it mean it is true by default for "text_en_splitting", or false
because it is a text field? The wiki does not say anything else.

I can disable autoGeneratePhraseQueries as the next step, I guess
(still not sure exactly what it does anyway), but I remain somewhat
confused about the other two options.

Regards,
   Alex.


Re: "Indexed without position data" - strange exception in eDisMax (Solr 4.0beta)

Jack Krupansky-2
"autoGeneratePhraseQueries" simply means that when a term has embedded
punctuation that gets filtered to whitespace, the resulting sub-terms are
treated as if they were a quoted phrase, so A-D gets treated as "A D", a
phrase.
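
To illustrate, a minimal sketch of switching this off on the field type quoted earlier in the thread. Only the attribute value differs from what was quoted; the analyzer definitions are left out and are assumed to stay as they are in the example schema. With the setting off, the sub-terms of A-D would be searched as separate terms rather than as the phrase "A D", while an explicitly quoted phrase would still need position data.

```xml
<!-- Sketch: same fieldType as quoted in this thread, with
     autoGeneratePhraseQueries changed from "true" to "false". -->
<fieldType name="text_en_splitting" class="solr.TextField"
           positionIncrementGap="100" autoGeneratePhraseQueries="false">
  <!-- analyzer definitions omitted; assumed unchanged from the example schema -->
</fieldType>
```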

Did you maybe index the initial data with a different field type, say
"string", and then change to the current "text_..." field type?
If so, simply delete the index and re-index.
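
One possible way to do the "delete the index" step, sketched as Solr XML update messages. It assumes the messages are posted to the core's update handler (the exact URL depends on the setup and is not given in this thread) and that the source data is available to index again afterwards. Stopping Solr and removing the index directory by hand is another route.

```xml
<!-- Message 1: delete every document in the core -->
<delete><query>*:*</query></delete>

<!-- Message 2, sent as a separate request: commit so the deletion takes effect -->
<commit/>
```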

-- Jack Krupansky
