Solr grouping with offset

Solr grouping with offset

Vadim Ivanov-2
Hello guys!
I need some advice. My task is to delete some documents from a collection.
The deletion algorithm is as follows:
Group docs by field1, sort each group by field2, and delete the third and all following documents in every group.
Unfortunately, I didn't find an easy way to do this.
The closest approach was to use group.offset=2, but the result set is polluted with empty groups containing no documents (groups that have fewer than 3 docs).
Maybe I'm missing something and there is a way to avoid receiving empty groups in the results?
The next approach was to facet first with facet.mincount=3, then find the doc ids for every facet result, and then delete the docs by id.
That seems too complicated for the task.
What's the best approach for the task?
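
To pin down the deletion rule described above, here is a minimal in-memory sketch in Python. It does not talk to Solr. The unique key name `id`, the sample documents, and the descending sort default are assumptions for illustration only; the post does not state the sort direction.

```python
from collections import defaultdict


def ids_to_delete(docs, group_field, sort_field, keep=2, descending=True):
    """Return the ids of every document past the first `keep` in each group."""
    groups = defaultdict(list)
    for doc in docs:
        groups[doc[group_field]].append(doc)

    doomed = []
    for members in groups.values():
        # Order each group by the sort field, then drop everything after `keep`.
        members.sort(key=lambda d: d[sort_field], reverse=descending)
        doomed.extend(d["id"] for d in members[keep:])
    return doomed


# Hypothetical sample data: group "A" has four docs, group "B" has one.
sample_docs = [
    {"id": "a1", "field1": "A", "field2": 5},
    {"id": "a2", "field1": "A", "field2": 9},
    {"id": "a3", "field1": "A", "field2": 1},
    {"id": "a4", "field1": "A", "field2": 7},
    {"id": "b1", "field1": "B", "field2": 3},
]

surplus = ids_to_delete(sample_docs, "field1", "field2")
```

Groups with fewer than three documents contribute nothing to the result, which is the behaviour the empty groups in the grouped response correspond to.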

Re: Solr grouping with offset

Paras Lehana
It would be better if you give us an example.




RE: Solr grouping with offset

Vadim Ivanov
Example of grouping with empty groups in the results:
field1 = rr_group, field2 = rr_updatedate
The problem is that I have tens of millions of groups in the result and only several thousand with "numFound" > 2.
   
"params":{
      "q":"*:* ",
      "group.sort":"rr_updatedate desc ",
      "group.limit":"-1",
      "fl":"rr_group,rr_adl,rr_createdate,rr_calctaskkey ",
      "group.offset":"2",
      "wt":"json",
      "group.field":"rr_group",
      "group":"true"}},
  "grouped":{
    "rr_group":{
      "matches":41475082,
      "groups":[{
          "groupValue":"164370:20200707:23:251",
          "doclist":{"numFound":1,"start":2,"docs":[]
          }},
        {
          "groupValue":"163942:20200708:22:251",
          "doclist":{"numFound":1,"start":2,"docs":[]
          }},
        {
          "groupValue":"163943:20200708:22:251",
          "doclist":{"numFound":1,"start":2,"docs":[]
          }},
        {
          "groupValue":"164355:20200708:22:251",
          "doclist":{"numFound":1,"start":2,"docs":[]
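
For reference, empty groups like the ones in this response can be skipped on the client after the response has been parsed. The sketch below is only an illustration in Python: it assumes the response has already been loaded into a dict, and the second group with its `id` values is hypothetical. It does not reduce what Solr sends back, so with tens of millions of groups the transfer cost remains.

```python
def non_empty_groups(response, group_field):
    """Yield (group value, docs) for groups that still hold docs after group.offset."""
    for group in response["grouped"][group_field]["groups"]:
        docs = group["doclist"]["docs"]
        if docs:
            yield group["groupValue"], docs


# First group copied from the response above; second group is hypothetical.
sample_response = {
    "grouped": {
        "rr_group": {
            "matches": 5,
            "groups": [
                {
                    "groupValue": "164370:20200707:23:251",
                    "doclist": {"numFound": 1, "start": 2, "docs": []},
                },
                {
                    "groupValue": "hypothetical-group",
                    "doclist": {
                        "numFound": 4,
                        "start": 2,
                        "docs": [{"id": "doc-3"}, {"id": "doc-4"}],
                    },
                },
            ],
        }
    }
}

kept = list(non_empty_groups(sample_response, "rr_group"))
```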


Re: Solr grouping with offset

Saurabh Sharma
Hi,

If you want to sort on your field and also put a restriction on the count,
then you have to use mincount. That seems to be the best approach for your
problem.

Thanks
Saurabh


RE: Solr grouping with offset

Vadim Ivanov
group.mincount? I've never heard of it. Does it exist?
Maybe you have in mind facet.mincount and the second approach I mentioned earlier:

> > > > Next approach was to use facet first with facet.mincount=3, then
> > > > find docs ids by every facet result  and then delete docs by id.
> > > > That way seems to me  too complicated for the task.


Re: Solr grouping with offset

Saurabh Sharma
Hi,

Yes. I meant facet.mincount only.


Thanks
Saurabh
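
The facet.mincount route discussed in this thread could be sketched roughly as below, in Python. This is an assumption-laden outline, not a tested procedure: the helper names are hypothetical, the unique key is assumed to be `id`, and the code only builds request parameters and parses an already-fetched facet response (assumed to use the default flat `[value, count, ...]` JSON layout). Sending the requests and issuing the delete-by-id calls is left out.

```python
def facet_params(field, keep=2):
    """Parameters for a facet request listing only groups with more than `keep` docs."""
    return {
        "q": "*:*",
        "rows": 0,
        "facet": "true",
        "facet.field": field,
        "facet.mincount": keep + 1,
        "facet.limit": -1,
        "wt": "json",
    }


def groups_over_limit(facet_response, field, keep=2):
    """Turn the flat [value, count, value, count, ...] list into (value, count) pairs."""
    flat = facet_response["facet_counts"]["facet_fields"][field]
    pairs = zip(flat[0::2], flat[1::2])
    return [(value, count) for value, count in pairs if count > keep]


def surplus_docs_params(field, sort_field, value, count, keep=2, id_field="id"):
    """Parameters for fetching the ids of the docs past the first `keep` in one group."""
    return {
        "q": "*:*",
        # The term query parser avoids escaping colons in values such as
        # "164370:20200707:23:251".
        "fq": f"{{!term f={field}}}{value}",
        "sort": f"{sort_field} desc",
        "start": keep,
        "rows": count - keep,
        "fl": id_field,
        "wt": "json",
    }


# Hypothetical facet response with made-up group values and counts.
sample_facets = {"facet_counts": {"facet_fields": {"rr_group": ["g1", 5, "g2", 3]}}}
follow_ups = [
    surplus_docs_params("rr_group", "rr_updatedate", value, count)
    for value, count in groups_over_limit(sample_facets, "rr_group")
]
```

Each entry in `follow_ups` describes one query whose results would be the candidate ids for deletion, so the number of follow-up queries equals the number of groups with more than two documents (several thousand in the case described above).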
