Large Data set relationships handling

Lucky Sharma
Hi all,
I need help with one use case.
Suppose you have two sets of data, A and B, which are linked to each
other. Each entity of Set A can have one-to-many relationships to
Set B, and as a result I need the sorted/faceted values from Set B.
For example, entity x(i) from Set A can be related to all the
values in Set B, while another entity x(j) from Set A can be related
only to the values [y(i)... y(j)] from Set B.


* Both data sets are too large.

One idea was to index only the data of Set B and add an fq with all
the values that an entity of Set A can have, and then sort and
facet on them.
But since the data size is 1000+, it will never be a good approach.

Another idea is to create a parent-child data relationship as 2
different collections and then perform a join over them.

Please review and suggest any other possible way of
solving this problem.



--
Warm Regards,

Lucky Sharma
Contact No: +91 9821559918
Re: Large Data set relationships handling

Mikhail Khludnev-2
On Thu, Jun 20, 2019 at 5:47 PM Lucky Sharma <[hidden email]> wrote:

> Hi all,
> I need help with one use case.
> Suppose you have two sets of data, A and B, which are linked to each
> other. Each entity of Set A can have one-to-many relationships to
> Set B, and as a result I need the sorted/faceted values from Set B.
> For example, entity x(i) from Set A can be related to all the
> values in Set B, while another entity x(j) from Set A can be related
> only to the values [y(i)... y(j)] from Set B.
>
>
> * Both data sets are too large.
>
> One idea was to index only the data of Set B and add an fq with all
> the values that an entity of Set A can have, and then sort and
> facet on them.
> But since the data size is 1000+, it will never be a good approach.
>
1. This is what the "Lucene join" does underneath. It is enabled by score=none;
see
https://lucene.apache.org/solr/guide/7_2/other-parsers.html#OtherParsers-JoinQueryParser
2. This requires proper sharding: linked data should reside on the same shard;
otherwise there is no way.
3. Note that when you say fq with all values, hopefully it can be achieved
with the {!terms} qp, which is way more powerful than a bare {!lucene} boolean query (bq).
4. The set notation above confuses me a little; it does seem to be
many-to-many.
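
[Editor's sketch] Points 1 and 3 above can be illustrated by building the request parameters in Python. This is only a sketch under assumptions: the field names (b_ids, id, set, name, category, price) and the sample values are hypothetical and do not come from the thread. It assumes Set A documents carry a multi-valued b_ids field holding the ids of their related Set B documents, and that both kinds of documents live in the same index.

```python
from urllib.parse import urlencode


def join_filter(from_field, to_field, inner_query):
    """Build a {!join} filter; score=none selects the Lucene join implementation."""
    return "{!join from=%s to=%s score=none}%s" % (from_field, to_field, inner_query)


def terms_filter(field, values):
    """Build a {!terms} filter from an explicit list of values (comma-separated)."""
    return "{!terms f=%s}%s" % (field, ",".join(values))


# Hypothetical request: find Set B docs related to the matching Set A docs,
# then sort and facet on Set B fields.
join_params = [
    ("q", "*:*"),
    ("fq", join_filter("b_ids", "id", "set:A AND name:example")),
    ("facet", "true"),
    ("facet.field", "category"),
    ("sort", "price asc"),
]

# Alternative: pass the related Set B ids explicitly with {!terms}.
terms_params = [
    ("q", "*:*"),
    ("fq", terms_filter("id", ["b1", "b2", "b3"])),
]

# URL-encoded query strings, ready to append to a /select request.
print(urlencode(join_params))
print(urlencode(terms_params))
```

The join variant lets Solr collect the linked ids itself, while the {!terms} variant expects the client to already know them. Values that contain commas would need a different separator in the {!terms} variant.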


>
> Another idea is to create a parent-child data relationship as 2
> different collections and then perform a join over them.
>

Query-time join can't handle two sharded collections, although there are some
plugins and patches claiming to.
Index-time join, aka block join or {!parent}, requires docs to be
collocated.
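
[Editor's sketch] To make the block join option more concrete, here is a small Python sketch of what collocated documents and a {!parent} query could look like. It is an illustration under assumptions, not a recommended design: the doc_type field, the ids, and the other field names are hypothetical. With a many-to-many relationship, this layout would presumably require repeating a Set B document under each related parent.

```python
import json


def nested_doc(parent_id, parent_fields, children):
    """Build one parent document with its children nested under _childDocuments_."""
    doc = {"id": parent_id, "doc_type": "parent", **parent_fields}
    doc["_childDocuments_"] = [
        {"id": child_id, "doc_type": "child", **fields}
        for child_id, fields in children
    ]
    return doc


def parent_query(parent_filter, child_query):
    """Build a {!parent} query: match children, return their parent documents."""
    return '{!parent which="%s"}%s' % (parent_filter, child_query)


# Hypothetical Set A entity indexed in one block with its related Set B entities.
block = nested_doc(
    "a1",
    {"name": "example"},
    [("a1-b1", {"category": "x"}), ("a1-b2", {"category": "y"})],
)

# JSON payload for an update request, and a query over the child fields.
print(json.dumps([block], indent=2))
print(parent_query("doc_type:parent", "category:x"))
```

The which filter has to match all parent documents and only parent documents, which is why the sketch tags every document with a doc_type value.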




--
Sincerely yours
Mikhail Khludnev
Re: Large Data set relationships handling

Lucky Sharma
Hi Mikhail,
Sorry for the confusion; it is indeed a many-to-many relationship. Is there
any possibility to build both indexes, query only on the elements of
set A, and get the results from set B?

As you mentioned, join seems trivial.
Is there any other design that could be considered?

Regarding the terms query: is there a maximum number of ids that we can
pass as a filter? Also, since the set could be large, won't that impact
the query tree?

Regards,
Lucky Sharma
