Range Queries performing differently on SortableIntField vs TrieField of type integer

Aaron Daubman
Greetings,

I'm finally updating an old instance and in testing, discovered that using
the recommended TrieField instead of SortableIntField for range queries
returns unexpected and seemingly incorrect results.

A query with:

q=*:*&fq=+i_yearStartSort:{* TO 1995}&fq=+i_yearStopSort:{* TO *}

Should, and does under 1.4.1 with SortableIntField, only return docs that
have some i_yearStopSort value and have an i_yearStartSort value less than
1995.

Unfortunately, under 3.6.1 with class="solr.TrieField" type="integer", this
query is returning docs that have neither an i_yearStopSort nor an
i_yearStartSort value.
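
For reference, the expected semantics of the two filters can be written down as a small Python sketch. This models only the desired outcome, not how Solr evaluates range queries internally. The sample documents and the helper names (`in_open_range`, `expected_matches`) are hypothetical, and each field is simplified to a single optional integer.

```python
from typing import Optional

def in_open_range(value: Optional[int], upper: Optional[int] = None) -> bool:
    """Model field:{* TO upper}; upper=None models field:{* TO *}.

    A document with no value for the field never matches, and the
    curly-brace upper bound is exclusive.
    """
    if value is None:
        return False
    return upper is None or value < upper

# Hypothetical documents, for illustration only.
docs = [
    {"id": "a", "i_yearStartSort": 1990, "i_yearStopSort": 1999},
    {"id": "b", "i_yearStartSort": 1990},                          # no stop year
    {"id": "c", "i_yearStartSort": 2001, "i_yearStopSort": 2005},  # starts too late
    {"id": "d"},                                                   # neither field
]

def expected_matches(documents):
    """Ids of docs that should pass both filter queries."""
    return [
        d["id"]
        for d in documents
        if in_open_range(d.get("i_yearStartSort"), 1995)
        and in_open_range(d.get("i_yearStopSort"))
    ]

print(expected_matches(docs))
```

Under this model only document "a" passes; "b" and "d" are the kind of documents that unexpectedly come back under 3.6.1.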


Here are the two schemas:

Solr 1.4.1 Relevant Schema Parts - Working as desired:
---------------------------------------------------------------------------------
<fieldType name="sint" class="solr.SortableIntField" sortMissingLast="true"
omitNorms="true"/>
...
<field name="i_yearStartSort" type="sint" indexed="true" stored="false"
required="false" multiValued="true"/>
<field name="i_yearStopSort" type="sint" indexed="true"  stored="false"
required="false" multiValued="true"/>


Solr 3.6.1 Relevant Schema Parts - Not working as expected:
-----------------------------------------------------------------------------------------
<fieldType name="tint" class="solr.TrieField" type="integer"
precisionStep="4" sortMissingLast="true" positionIncrementGap="0"
omitNorms="true"/>
...
<field name="i_yearStartSort" type="tint" indexed="true" stored="false"
required="false" multiValued="false"/>
<field name="i_yearStopSort" type="tint" indexed="true" stored="false"
required="false" multiValued="false"/>


1) What is the best way to return to the desired/expected behavior?
2) Can you explain to me why this happens?
3) I have a sneaking suspicion (but could be totally wrong) that this
relates to sortMissingLast="true" - if it does, can you explain the seeming
discrepancies in:
SOLR-2881 and SOLR-2134? If I am reading these correctly, SOLR-2134 says
this was fixed for Trie in 4.0, but not in 3.x... SOLR-2881 has a fix
version of 3.5 listed, but some of the comments also seem to indicate this
was not actually fixed in 3.5+

Thanks,
     Aaron

Re: Range Queries performing differently on SortableIntField vs TrieField of type integer

Malcolm Upayavira Holmes
One small question - did you re-index in-between? The index structure
will be different for each.

Upayavira

On Tue, Dec 4, 2012, at 02:30 PM, Aaron Daubman wrote:

> Greetings,
>
> I'm finally updating an old instance and in testing, discovered that
> using
> the recommended TrieField instead of SortableIntField for range queries
> returns unexpected and seemingly incorrect results.
>
> [...]

Re: Range Queries performing differently on SortableIntField vs TrieField of type integer

Aaron Daubman
Hi Upayavira,

> One small question - did you re-index in-between? The index structure
> will be different for each.

Yes, the Solr 1.4.1 (working) instance was built using the original schema
and that solr version.
The Solr 3.6.1 (not working) instance was re-built using the new schema and
Solr 3.6.1...

Thanks,
      Aaron

Re: Range Queries performing differently on SortableIntField vs TrieField of type integer

Aaron Daubman
In reply to this post by Malcolm Upayavira Holmes
I forgot a possibly important piece... Given the different Solr versions,
the schema version (and its related defaults) has also changed:

Solr 1.4.1 Has:
<schema name="ourSchema" version="1.1">

Solr 3.6.1 Has:
<schema name="ourSchema" version="1.5">



Re: Range Queries performing differently on SortableIntField vs TrieField of type integer

Chris Hostetter-3
In reply to this post by Aaron Daubman

: q=*:*&fq=+i_yearStartSort:{* TO 1995}&fq=+i_yearStopSort:{* TO *}
        ...
: Unfortunately, under 3.6.1 with class="solr.TrieField" type="integer", this
: query is returning docs that have neither an i_yearStopSort nor a
: i_yearStartSort value.

Hmmmm... I can't seem to reproduce this.

Here's what i tried...

1) start up the Solr 3.6.1 example

2) index the 3.6.1 example docs...
java -jar post.jar *.xml

3) index a single doc using some "*_ti" dynamic fields (which use
"tint")...
java -Ddata=args -jar post.jar '<add><doc><field name="id">HOSS</field><field name="start_ti">45</field><field name="end_ti">100</field></doc></add>'

If i do some open ended range queries on the *_ti fields, i get the
results i expect (either only my HOSS doc if it's in the ranges, or no
docs if HOSS is out of range)...

Matches HOSS...
http://localhost:8983/solr/select?q=*:*&fq=start_ti:{*%20TO%2050}&fl=start_ti,id,end_ti
http://localhost:8983/solr/select?q=*:*&fq=start_ti:{*%20TO%2050}&fq=end_ti:{*%20TO%20*}&fl=start_ti,id,end_ti

Matches nothing...
http://localhost:8983/solr/select?q=*:*&fq=start_ti:{*%20TO%205}&fl=start_ti,id,end_ti
http://localhost:8983/solr/select?q=*:*&fq=start_ti:{*%20TO%205}&fq=end_ti:{*%20TO%20*}&fl=start_ti,id,end_ti

I repeated the test after deleting all data, and adding
sortMissingLast="true" to the example "tint" fieldType, and got the same
results.
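
If you would rather not hand-encode these URLs, here is a minimal Python sketch that builds them with the standard library. The helper name `select_url` is made up for illustration; the host, port and field names are the ones from the example queries above. One thing worth noting: a literal `+` in a query string is decoded as a space, so a leading `+` on an fq clause has to go over the wire as `%2B`, which `urlencode` takes care of.

```python
from urllib.parse import urlencode, parse_qsl

SELECT_URL = "http://localhost:8983/solr/select"  # stock example URL used above

def select_url(filters, fields):
    """Build a select URL with q=*:*, one fq parameter per filter, and an fl list."""
    params = [("q", "*:*")]
    params += [("fq", fq) for fq in filters]   # repeated fq parameters
    params.append(("fl", ",".join(fields)))
    return SELECT_URL + "?" + urlencode(params)

url = select_url(
    ["start_ti:{* TO 50}", "+end_ti:{* TO *}"],
    ["start_ti", "id", "end_ti"],
)
print(url)

# Round-trip check: decoding the query string gives back the original clauses.
print(parse_qsl(url.split("?", 1)[1]))
```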

: Solr 3.6.1 Relevant Schema Parts - Not working as expected:
: -----------------------------------------------------------------------------------------
: <fieldType name="tint" class="solr.TrieField" type="integer"
: precisionStep="4" sortMissingLast="true" positionIncrementGap="0"
: omitNorms="true"/>

FYI: you have some wackiness there: 'type="integer"' inside the
'<fieldType name="tint" .../>' ... that shouldn't have caused any problems,
but it doesn't make any sense.

: <field name="i_yearStartSort" type="tint" indexed="true" stored="false"
: required="false" multiValued="false"/>
: <field name="i_yearStopSort" type="tint" indexed="true" stored="false"
: required="false" multiValued="false"/>

can you try changing those to stored="true" and re-indexing as a sanity
check? perhaps your indexing code is putting a default value in that
you aren't realizing?
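
Once the fields are stored, a check along these lines could be scripted. This is only a sketch: it assumes the results were requested as JSON (`wt=json`) and that fields without a value are simply absent from each returned doc. The sample response below is invented for illustration, and `docs_missing_field` is a hypothetical helper name.

```python
import json

def docs_missing_field(solr_response, field):
    """Return ids of returned docs that carry no stored value for `field`."""
    return [
        doc.get("id")
        for doc in solr_response["response"]["docs"]
        if field not in doc
    ]

# Hypothetical response body, shaped like a Solr JSON select response.
raw = """
{"response": {"numFound": 2, "start": 0, "docs": [
    {"id": "HOSS", "start_ti": 45, "end_ti": 100},
    {"id": "NOVALUE"}
]}}
"""
sample = json.loads(raw)

# Any id listed here came back from the query without a value in end_ti.
print(docs_missing_field(sample, "end_ti"))
```

If the filtered query really returns docs with no value, they would show up in this list; if every returned doc has a value, the indexing code is the more likely culprit.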

w/o more specifics (ie: sample docs to index) on how to reproduce, i can't
seem to find any problem.


-Hoss

Re: Range Queries performing differently on SortableIntField vs TrieField of type integer

Jack Krupansky-2
In reply to this post by Aaron Daubman
Could you show us some input data, both WITH an i_yearStopSort value and
WITHOUT the value?

I tried a quick test using the stock Solr 3.6.1 example schema and a dynamic
integer field and the filter query did in fact filter out all documents that
did not have a value in that field:

http://localhost:8983/solr/select?q=*:*&fq=%2bx_i:{*+TO+*}

Maybe you could come up with a simple sample Solr XML document that can be
added to the stock 3.6.1 example schema that shows the problem.
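
A minimal pair of test documents could be generated roughly like this. It is a Python sketch, not a required approach: the ids `WITH_VALUE` and `WITHOUT_VALUE` and the helper `solr_add_xml` are made up, and `x_i` is just the dynamic integer field from my test query above.

```python
import xml.etree.ElementTree as ET

def solr_add_xml(docs):
    """Serialize dicts as a Solr <add> message; fields set to None are left out."""
    add = ET.Element("add")
    for fields in docs:
        doc = ET.SubElement(add, "doc")
        for name, value in fields.items():
            if value is None:
                continue  # a missing value means no <field> element at all
            field = ET.SubElement(doc, "field", name=name)
            field.text = str(value)
    return ET.tostring(add, encoding="unicode")

xml_payload = solr_add_xml([
    {"id": "WITH_VALUE", "x_i": 1990},
    {"id": "WITHOUT_VALUE", "x_i": None},
])
print(xml_payload)
```

The resulting string is in the `<add><doc>...</doc></add>` form that post.jar accepts, so it can be indexed into the stock example and the fq above re-run against it.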

-- Jack Krupansky

-----Original Message-----
From: Aaron Daubman
Sent: Tuesday, December 04, 2012 9:30 AM
To: [hidden email]
Subject: Range Queries performing differently on SortableIntField vs
TrieField of type integer

Greetings,

I'm finally updating an old instance and in testing, discovered that using
the recommended TrieField instead of SortableIntField for range queries
returns unexpected and seemingly incorrect results.

[...]