FacetField-Result on String-Field contains value with count 0?

Sebastian Riemer
Hi,

Please help me understand: http://localhost:8983/solr/wemi/select?facet.field=m_mediaType_s&facet=on&indent=on&q=*:*&wt=json returns:

"facet_counts":{
    "facet_queries":{},
    "facet_fields":{
      "m_mediaType_s":[
        "2",25561,
        "3",19027,
        "10",1966,
        "11",1705,
        "12",1067,
        "4",1056,
        "5",291,
        "8",68,
        "13",2,
        "6",2,
        "7",1,
        "9",1,
        "1",0]},
    "facet_ranges":{},
    "facet_intervals":{},
    "facet_heatmaps":{}}}

http://localhost:8983/solr/wemi/select?fq=m_mediaType_s:%222%22&indent=on&q=*:*&rows=0&start=0&wt=json


=>  "response":{"numFound":25561,"start":0,"docs":[]

http://localhost:8983/solr/wemi/select?fq=m_mediaType_s:%220%22&indent=on&q=*:*&rows=0&start=0&wt=json


=>  "response":{"numFound":0,"start":0,"docs":[]

So why does the search for facet.field even contain the value "1", if it does not exist?

And why does it e.g. not contain "SomeReallyCrazyOtherValueWhichLikeValue"1"DoesNotExistButLetsIncludeItInTheFacetFieldsResultListAnywaysWithCountZero" : 0

Best regards,
Sebastian

Additional info: the field m_mediaType_s is a string field:
<dynamicField name="*_s" type="string" indexed="true" stored="true" />
<fieldType name="string" class="solr.StrField" sortMissingLast="true" />


AW: FacetField-Result on String-Field contains value with count 0?

Sebastian Riemer
Pardon me,
the second search should have been this: http://localhost:8983/solr/wemi/select?fq=m_mediaType_s:%221%22&indent=on&q=*:*&rows=0&start=0&wt=json 
(or in other words, give me all documents having value "1" for field "m_mediaType_s")

Since this search gives zero results, why is the value "1" included in the facet_fields result list?


Re: AW: FacetField-Result on String-Field contains value with count 0?

Billnbell
Set facet.mincount to 1.

Bill Bell
Sent from mobile
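
For reference, a minimal sketch of what the original facet request looks like with facet.mincount=1 added. The core name "wemi" and the field name come from the request earlier in this thread; the helper name facet_url is made up for illustration, and rows=0 is an assumption to keep the response small.

```python
from urllib.parse import urlencode

# Core "wemi" and field "m_mediaType_s" are taken from the thread;
# the helper itself is hypothetical.
SOLR_SELECT = "http://localhost:8983/solr/wemi/select"


def facet_url(field, mincount=1):
    """Build a select URL that facets on `field` and hides values below `mincount`."""
    params = [
        ("q", "*:*"),
        ("rows", 0),
        ("wt", "json"),
        ("facet", "on"),
        ("facet.field", field),
        ("facet.mincount", mincount),
    ]
    return SOLR_SELECT + "?" + urlencode(params)


url = facet_url("m_mediaType_s")
print(url)
```

With mincount=1 the "1",0 entry should no longer be returned; with mincount=0 (the default for facet.field) it is.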



AW: AW: FacetField-Result on String-Field contains value with count 0?

Sebastian Riemer
Hi Bill,

Thanks, that's actually what I started with. But I don't want to exclude values that lead to a count of zero.

Background: a user searches for mediaType "book" and gets 10 results. Then some other task or routine changes all 10 books to ebooks because the type was incorrect. The user refreshes, still filtering on "book", and gets 0 results (which is expected). Because we exclude facet values with count 0, the selected mediaType "book" is not returned, so I cannot select that value in the mediaType dropdown filter. This confuses the user: he sees no results but cannot tell that it is because the mediaType filter is still set to "book", which now matches nothing.


Re: FacetField-Result on String-Field contains value with count 0?

Michael Kuhlmann-5
Then I don't understand your problem. Solr already does exactly what you
want.

Maybe the problem is different: I assume you believe there was never a
value of "1" in the index, and that is what confuses you.

Solr returns every value as a facet result that was present at some
time, as long as the documents are still somewhere in the index, even
when they are marked as deleted. So there must have been a document with
m_mediaType_s=1. Even if all of these documents have been deleted, the
value still appears in the facet result.

This holds true until segments get merged so that all deleted documents
are pruned. So if you send a forceMerge request, chances are good that
"1" won't come up any more.

-Michael


Re: AW: FacetField-Result on String-Field contains value with count 0?

Toke Eskildsen-2
In reply to this post by Sebastian Riemer
On Fri, 2017-01-13 at 14:19 +0000, Sebastian Riemer wrote:
> the second search should have been this:
> http://localhost:8983/solr/wemi/select?fq=m_mediaType_s:%221%22&indent=on&q=*:*&rows=0&start=0&wt=json
> (or in other words, give me all documents having value "1" for field
> "m_mediaType_s")
>
> Since this search gives zero results, why is it included in the
> facet.fields result-count list?

Qualified guess (I don't know the JSON faceting code in detail):
The list of possible facet values is extracted from the DocValues
structure in the segment files, without respect to documents marked as
deleted. At some point you had one or more documents with
m_mediaType_s:1, which were later deleted.

If your index is not too large, you can verify this by optimizing down
to 1 segment, which will remove all traces of deleted documents (unless
the index is already 1 segment).

If you cannot live with the false terms, committing with
expungeDeletes=true should do the trick, although it is likely to make
your indexing process a lot heavier.
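
The two cleanup options mentioned above could be triggered roughly as follows. This is a sketch that only builds the request URLs; it assumes the update handler of the "wemi" core accepts optimize, maxSegments, commit and expungeDeletes as request parameters, which should be checked against the documentation of the Solr version in use. The helper name maintenance_url is made up.

```python
from urllib.parse import urlencode

# Assumption: the update handler lives at /update on the "wemi" core
# from this thread and accepts these commands as URL parameters.
SOLR_UPDATE = "http://localhost:8983/solr/wemi/update"


def maintenance_url(**params):
    """Build an update URL carrying only commit/optimize style parameters."""
    return SOLR_UPDATE + "?" + urlencode(params)


# Option 1: optimize (force-merge) down to a single segment.
optimize_url = maintenance_url(optimize="true", maxSegments=1)

# Option 2: commit and merge away segments containing deleted documents.
expunge_url = maintenance_url(commit="true", expungeDeletes="true")

print(optimize_url)
print(expunge_url)
```

Both operations rewrite segments, so on a large index either one can be expensive.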

The reason for this inaccuracy is that it is quite heavy to verify
whether a docvalue is referenced by a document: Each time one or more
documents in a segment are deleted, all references from all documents
in that segment would have to be checked to create a correct mapping.
As this only affects mincount=0 combined with your use case where
_all_ documents with a certain docvalue are deleted, my guess is that
it is seen as too much of an edge case to handle.
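
The behaviour described above can be illustrated with a toy model. This is not Solr's or Lucene's actual data structure; ToySegment and its methods are invented for illustration. The idea it shows: the list of values is written once per segment, deletes only flip a "live" flag, and only a merge rebuilds the value list.

```python
from collections import Counter


class ToySegment:
    """Toy stand-in for an index segment (hypothetical, not Solr code)."""

    def __init__(self, values):
        self.values = list(values)                # one field value per document
        self.live = [True] * len(self.values)     # False means "marked as deleted"
        self.terms = sorted(set(self.values))     # value list, fixed once written

    def delete_where(self, value):
        """Mark all documents with `value` as deleted; the value list is untouched."""
        for i, v in enumerate(self.values):
            if v == value:
                self.live[i] = False

    def facet(self, mincount=0):
        """Count live documents per known value, dropping counts below `mincount`."""
        counts = Counter(v for v, alive in zip(self.values, self.live) if alive)
        return {t: counts[t] for t in self.terms if counts[t] >= mincount}

    def merged(self):
        """Rewrite the segment from live documents only, as a merge would."""
        return ToySegment(v for v, alive in zip(self.values, self.live) if alive)


seg = ToySegment(["2", "2", "3", "1"])
seg.delete_where("1")
print(seg.facet())             # {'1': 0, '2': 2, '3': 1}
print(seg.facet(mincount=1))   # {'2': 2, '3': 1}
print(seg.merged().facet())    # {'2': 2, '3': 1}
```

A value that never existed in the segment is simply not in the value list, which is why arbitrary made-up values do not show up with count 0.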
-- 
Toke Eskildsen, Royal Danish Library


AW: FacetField-Result on String-Field contains value with count 0?

Sebastian Riemer
In reply to this post by Michael Kuhlmann-5
Nice, thank you very much for your explanation!

    >> Solr returns every value as a facet result that was present at some time, as long as the documents are still somewhere in the index, even when they are marked as deleted. So there must have been a document with m_mediaType_s=1. Even if all of these documents have been deleted, the value still appears in the facet result.

I did not know about that! That makes perfect sense. I am quite sure there was a time when that field contained the value "1". What is more, now that I have rebuilt my index, the value "1" no longer appears in the facet.field result.

I'll think about how to deal with my situation then. Maybe it would be better to keep Solr filtering out 0-count facet values and to insert the active filter value that leads to 0 results into the select dropdown "manually".
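
A minimal sketch of that "manual" approach, assuming the application already has the facet values as (value, count) pairs and knows which filter value is currently selected. The function name dropdown_options and the sample values are made up for illustration.

```python
def dropdown_options(facet_counts, selected=None):
    """Return (value, count) pairs for a dropdown, keeping the active filter visible.

    facet_counts: pairs as returned by Solr with facet.mincount=1 (or unfiltered).
    selected: the filter value the user currently has set, if any.
    """
    # Keep only values that still match documents.
    options = [(value, count) for value, count in facet_counts if count > 0]
    # Re-add the active filter with count 0 so the user can see and clear it.
    if selected is not None and all(value != selected for value, _ in options):
        options.append((selected, 0))
    return options


counts = [("ebook", 10), ("dvd", 3)]
print(dropdown_options(counts, selected="book"))
# [('ebook', 10), ('dvd', 3), ('book', 0)]
```

This way the dropdown never depends on stale terms from deleted documents, but the user still sees which filter produced the empty result.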


AW: AW: FacetField-Result on String-Field contains value with count 0?

Sebastian Riemer
In reply to this post by Toke Eskildsen-2
Thanks @Toke for pointing out these options. I'll read up on expungeDeletes.

This makes it sound even more like a good idea to have Solr filter out 0-counts and to handle my use case outside of Solr.

Thanks again,
Sebastian


Re: AW: AW: FacetField-Result on String-Field contains value with count 0?

Shawn Heisey-2
In reply to this post by Sebastian Riemer
On 1/13/2017 7:36 AM, Sebastian Riemer wrote:
> Thanks, that's actually where I come from. But I don't want to exclude values leading to a count of zero.
>
> Background to this: A user searched for mediaType "book" which gave him 10 results. Now some other task/routine whatever changes all those 10 books to be say 10 ebooks, because the type has been incorrect. The user makes a refresh, still looking for "book" gets 0 results (which is expected) and because we rule out facet.fields having count 0, I don't get back the selected mediaType "book" and thus I cannot select this value in the select-dropdown-filter for the mediaType. This leads to confusion for the user, since he has no results, but doesn't see that it's because of he still has that mediaType-filter set to a value "books" which now actually leads to 0 results.

Some users are always going to be confused in one way or another when
something behaves in a way that's contrary to their expectations.  If
you plan your interface correctly, you can eliminate the biggest sources
of confusion ... but there's an applicable saying here:  You can never
make things idiot-proof.  There's always a better idiot.

The facet.mincount parameter is the way to deal with this problem, as
Bill Bell already mentioned.  One of the reasons that facet.mincount
exists is to remove terms that have no documents, but still exist in the
index.

If the q parameter was an actual query instead of "all docs" and the
request didn't have facet.mincount, then the facet for that field would
still have thirteen entries, many of which might be zero.
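
To make the zero entries concrete: the facet_fields value in the JSON response is a flat list alternating value and count. The sketch below pairs them up and applies a client-side threshold that mirrors what facet.mincount does on the server. It is an illustration under that assumption, not Solr's implementation; facet_pairs is a made-up helper, and the sample list is a shortened excerpt of the response posted at the top of the thread.

```python
def facet_pairs(flat, mincount=0):
    """Turn Solr's flat [value, count, value, count, ...] list into pairs."""
    values = flat[0::2]   # even positions hold the field values
    counts = flat[1::2]   # odd positions hold the document counts
    return [(v, c) for v, c in zip(values, counts) if c >= mincount]


# Shortened excerpt of the facet_fields list from the original question.
flat = ["2", 25561, "3", 19027, "7", 1, "9", 1, "1", 0]

print(facet_pairs(flat))               # includes ('1', 0)
print(facet_pairs(flat, mincount=1))   # ('1', 0) is dropped
```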

Thanks,
Shawn