Garbled facets even in a zero hit search

Dennis Schafroth
Hi,

I am running solr-1.4.1 on a 64-bit Debian 5.0.5 box with Java version "1.6.0_20".

I am seeing weird facet results alongside the "right"-looking ones: garbled data that looks like a buffer overflow / index off by ...

I even get them on a zero-hit search, where I wouldn't expect any facets:

<?xml version="1.0" encoding="UTF-8"?>
<response>
  <lst name="responseHeader">
    <int name="status">0</int>
    <int name="QTime">56</int>
    <lst name="params">
      <str name="facet">true</str>
      <str name="shards">satay:8985/solr</str>
      <str name="start">0</str>
      <str name="q">title:xzyzx</str>
      <str name="f.date.facet.limit">10</str>
      <str name="f.subject_exact.facet.limit">10</str>
      <arr name="facet.field">
        <str>author_exact</str>
        <str>date</str>
        <str>subject_exact</str>
      </arr>
      <str name="f.author_exact.facet.limit">10</str>
      <str name="rows">20</str>
    </lst>
  </lst>
  <result name="response" numFound="0" start="0"/>
  <lst name="facet_counts">
    <lst name="facet_queries"/>
    <lst name="facet_fields">
      <lst name="author_exact">
        <int name=" ">0</int>
        <int name=" !;;!">0</int>
        <int name=" (Domingo, Juan); Imprenta Tormentaria (Córdoba)">0</int>
        <int name=" (Supervisor)">0</int>
        <int name=" *">0</int>
        <int name=" * ">0</int>
        <int name=" * (μτφρ.)">0</int>
        <int name=" * * * ">0</int>
        <int name=" * * * (μτφρ.)">0</int>
        <int name=" * * * *">0</int>
      </lst>
      <lst name="date">
        <int name="0000">0</int>
        <int name="0001">0</int>
        <int name="0002">0</int>
        <int name="0003">0</int>
        <int name="0004">0</int>
        <int name="0005">0</int>
        <int name="0006">0</int>
        <int name="0007">0</int>
        <int name="0008">0</int>
        <int name="0009">0</int>
      </lst>
      <lst name="subject_exact">
        <int name=" ">0</int>
        <int name=" ! ! R P R">0</int>
        <int name=" !!rrqqyyhqhqwwllrqrqdd!!vvddvv">0</int>
        <int name=" !&quot;&quot;$%&quot;( )*+,($&quot;(">0</int>
        <int name=" !()+, -./01 23456">0</int>
        <int name=" !-decidable and decidable deductive procedures for a restricted FTL with Unless">0</int>
        <int name=" !&lt;f87.03...">0</int>
        <int name=" &quot;)338-8570">0</int>
        <int name=" &quot;-Optimization Schemes and L-Bit Precision: Alternative Perspectives in Combinatorial Optimization">0</int>
        <int name=" &quot;A picture is worth 1K words&quot;">0</int>
      </lst>
    </lst>
    <lst name="facet_dates"/>
  </lst>
</response>


I looked for a bug report but haven't been able to find one that matches. I will try to set up a debug session to get closer, but would love feedback on whether this is a known issue.
cheers, 
:-Dennis Schafroth 

Attachment: response_formated.xml (3K)

Re: Garbled facets even in a zero hit search

Erick Erickson
That looks...er...unfortunate. The very first thing I'd do is check your
index and see whether there are such weird values in your facet fields.
My guess is that Solr is working fine and you somehow have garbage values
in your index, but that's only a guess. I'd try that before trying to
debug: GIGO.
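
One way to inspect what is actually in the facet fields is to run a facet request and list the returned values with their counts. The following is only a minimal sketch, assuming the XML response layout shown in the original post; the helper name `facet_values` and the trimmed `sample` response are made up for illustration.

```python
import xml.etree.ElementTree as ET


def facet_values(response_xml):
    """Map each facet field name to a list of (value, count) pairs."""
    root = ET.fromstring(response_xml)
    fields = {}
    # Layout assumed from the response in this thread:
    # response > lst[facet_counts] > lst[facet_fields] > lst[<field>] > int
    path = "./lst[@name='facet_counts']/lst[@name='facet_fields']/lst"
    for field in root.findall(path):
        fields[field.get("name")] = [
            (entry.get("name"), int(entry.text)) for entry in field.findall("int")
        ]
    return fields


# Trimmed, illustrative copy of the response posted above.
sample = """<response>
  <lst name="facet_counts">
    <lst name="facet_fields">
      <lst name="author_exact">
        <int name=" ">0</int>
        <int name=" (Supervisor)">0</int>
      </lst>
    </lst>
  </lst>
</response>"""

values = facet_values(sample)
for field, entries in values.items():
    for value, count in entries:
        # repr() makes leading and trailing whitespace in a value visible.
        print(field, repr(value), count)
```

Printing with `repr()` is a deliberate choice here: several of the suspicious values differ only in leading or trailing spaces.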

That wouldn't answer the question of how the garbage got in there
in the first place; posting the field type definitions for your
faceted fields would help with that.

Best
Erick


Re: Garbled facets even in a zero hit search

Dennis Schafroth

I am definitely not ruling out that the index is garbled, but that doesn't explain why I get facets on a zero-hit search.

The schema is as follows:



where I have copied fields for the facets (author_exact, subject_exact, title_exact), as I don't want tokenization on these.

The request has the following parameters:
facet=true&start=0&q=title:xyzy&f.date.facet.limit=10&f.subject_exact.facet.limit=10&facet.field=author_exact&facet.field=date&facet.field=subject_exact&f.author_exact.facet.limit=10&rows=20

I haven't been able to reproduce it in a test index yet, but I do have two different indexes that show a similar problem (facets on zero hits).

cheers,
:-Dennis Schafroth

Attachment: satay.xml (33K)

Re: Garbled facets even in a zero hit search

Markus Jelsma
That's normal behavior if you haven't configured facet.mincount. Check the
wiki.
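
For context: Solr's `facet.mincount` defaults to 0, so a facet request returns up to `facet.limit` values per field even when nothing matches, each with a count of 0. Setting `facet.mincount=1` hides them. Below is a minimal sketch that rebuilds the request from this thread with that parameter added; the helper name `build_facet_query` and its defaults are assumptions for illustration, not part of Solr.

```python
from urllib.parse import parse_qs, urlencode


def build_facet_query(q, facet_fields, limit=10, mincount=1, rows=20):
    """Build a Solr query string that omits facet values below `mincount`."""
    params = [
        ("q", q),
        ("start", 0),
        ("rows", rows),
        ("facet", "true"),
        # The default of 0 is what returns zero-count values on a zero-hit search.
        ("facet.mincount", mincount),
    ]
    for field in facet_fields:
        params.append(("facet.field", field))
        # Per-field limit, as in the original request.
        params.append((f"f.{field}.facet.limit", limit))
    return urlencode(params)


query = build_facet_query("title:xzyzx", ["author_exact", "date", "subject_exact"])
print(query)

# Round-trip the string to confirm which parameters were set.
parsed = parse_qs(query)
print(parsed["facet.mincount"], parsed["facet.field"])
```

The same threshold can also be set for a single field with the `f.<field>.facet.mincount` form, following the per-field pattern the original request already uses for `facet.limit`.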


Markus Jelsma - Technisch Architect - Buyways BV
http://www.linkedin.com/in/markus17
050-8536620 / 06-50258350


Re: Garbled facets even in a zero hit search

Dennis Schafroth

Thanks, that did it.
