Get distinct count in json facet

jay harkhani
Hello,

We are using Solr 6.1.0 with 2 shards and 2 replicas. The collection contains hundreds of thousands of documents, and a typical query matches around 20,000 documents. We need a distinct count of the docNumber field in a JSON facet query. We tried both the unique and the hll functions, but neither returns an accurate result: unique gives a wrong count for more than 100 documents, and hll gives a wrong count for more than 7,000 documents.
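With 2 shards, one general point worth keeping in mind is that distinct counts computed per shard cannot simply be added: a docNumber that appears on both shards would be counted twice, so an exact global count needs the union of the per-shard value sets. The minimal sketch below illustrates only that arithmetic. The split of docNumber values across "shard A" and "shard B" is hypothetical, and this is not a description of how Solr's unique or hll functions merge shard results internally.

```python
# Illustration only: which docNumber values land on which shard is hypothetical.
shard_a = [1, 2, 1]  # docNumber values of matching docs on shard A
shard_b = [1, 3]     # docNumber values of matching docs on shard B

# Adding per-shard distinct counts double-counts docNumber 1,
# because it occurs on both shards.
summed = len(set(shard_a)) + len(set(shard_b))

# An exact global distinct count needs the union of the value sets.
exact = len(set(shard_a) | set(shard_b))

print(summed, exact)  # prints: 4 3
```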

Here are some sample documents with their field values:

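document 1: docNumber: 1, poi: draft, status: abc
document 2: docNumber: 2, poi: review, status: xyz
document 3: docNumber: 1, poi: draft, status: xyz
document 4: docNumber: 3, poi: review, status: abc
document 5: docNumber: 1, poi: draft, status: abc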
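As a reference for what an accurate answer looks like, the exact distinct docNumber counts for these five sample documents can be computed client-side. This is a minimal sketch over the sample data only; the sample does not include project_id, so it covers just the status and status/poi levels of the nested facet, plus the overall total.

```python
from collections import defaultdict

# The five sample documents listed above (project_id is not given for them).
docs = [
    {"docNumber": 1, "poi": "draft", "status": "abc"},
    {"docNumber": 2, "poi": "review", "status": "xyz"},
    {"docNumber": 1, "poi": "draft", "status": "xyz"},
    {"docNumber": 3, "poi": "review", "status": "abc"},
    {"docNumber": 1, "poi": "draft", "status": "abc"},
]

# Collect the set of docNumber values per bucket; the set size is the
# exact distinct count that distcount is meant to report.
total = set()
by_status = defaultdict(set)
by_status_poi = defaultdict(set)
for d in docs:
    total.add(d["docNumber"])
    by_status[d["status"]].add(d["docNumber"])
    by_status_poi[(d["status"], d["poi"])].add(d["docNumber"])

print("all", len(total))
for status, numbers in sorted(by_status.items()):
    print(status, len(numbers))
for (status, poi), numbers in sorted(by_status_poi.items()):
    print(status, poi, len(numbers))
```

For this sample the expected counts are 3 overall, 2 for status abc (docNumbers 1 and 3), 2 for status xyz (docNumbers 1 and 2), and 1 for each status/poi combination.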

We used the following queries to get the counts from Solr:

Using the hll function:
json.facet={project_id:{type:terms,field:project_id,limit:100,facet:{distcount:"hll(docNumber)",status:{type:terms,field:status,limit:-1,facet:{distcount:"hll(docNumber)",poi:{type:terms,field:poi,limit:-1,facet:{distcount:"hll(docNumber)"}}}}}}}

Using the unique function:
json.facet={project_id:{type:terms,field:project_id,limit:100,facet:{distcount:"unique(docNumber)",status:{type:terms,field:status,limit:-1,facet:{distcount:"unique(docNumber)",poi:{type:terms,field:poi,limit:-1,facet:{distcount:"unique(docNumber)"}}}}}}}
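The two queries above differ only in the aggregation function. To make the nesting (project_id, then status, then poi, each with a distcount) easier to read, the same json.facet value can be built as a Python dict and serialized to strict JSON before URL-encoding. This is a minimal sketch: the helper name build_facet is hypothetical, only the json.facet parameter is encoded (other request parameters such as q are omitted), and the original relaxed syntax with unquoted keys, which Solr's JSON parser is lenient enough to accept, should express the same structure.

```python
import json
from urllib.parse import urlencode


def build_facet(agg: str) -> dict:
    """Hypothetical helper: nested terms facets with a distinct-count stat."""
    stat = f"{agg}(docNumber)"  # e.g. "hll(docNumber)" or "unique(docNumber)"
    return {
        "project_id": {
            "type": "terms",
            "field": "project_id",
            "limit": 100,
            "facet": {
                "distcount": stat,
                "status": {
                    "type": "terms",
                    "field": "status",
                    "limit": -1,
                    "facet": {
                        "distcount": stat,
                        "poi": {
                            "type": "terms",
                            "field": "poi",
                            "limit": -1,
                            "facet": {"distcount": stat},
                        },
                    },
                },
            },
        }
    }


for agg in ("hll", "unique"):
    # Only json.facet is encoded here; add q and other parameters as needed.
    params = {"json.facet": json.dumps(build_facet(agg))}
    print(urlencode(params))
```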

Please suggest an approach to get an accurate distinct count in json.facet.

Regards,
Jay Harkhani.