Index Size of a tenant

Natarajan, Rajeswari
Hi,

We plan to store multiple tenants in a single collection (with multiple shards), using the compositeId router with the tenant ID as the document ID prefix.
In this setup, how can we find a tenant’s index size? The Solr Metrics API gives a core’s index size, but the same core may contain multiple tenants.
Is there any out-of-the-box Solr API for this case?


Thanks,
Rajeswari
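For reference, a minimal Python sketch of the compositeId scheme described above. It assumes the uniqueKey field is named `id`, a hypothetical collection called `tenants`, and a single-level `tenant!doc` prefix. Documents sharing a prefix hash into the same slice of the hash range, the `_route_` parameter sends a query only to the shard(s) owning that prefix, and the `{!prefix}` query parser matches that tenant's IDs. This counts a tenant's documents; it does not measure their bytes on disk.

```python
from urllib.parse import urlencode


def composite_id(tenant: str, doc_id: str) -> str:
    """Build a compositeId-style key: the part before '!' decides the shard."""
    return f"{tenant}!{doc_id}"


def tenant_count_params(tenant: str) -> dict:
    """Query parameters that count one tenant's documents (assumes uniqueKey 'id')."""
    prefix = f"{tenant}!"
    return {
        "q": "{!prefix f=id}" + prefix,  # prefix query on the uniqueKey field
        "rows": 0,                       # only numFound is needed, no documents
        "_route_": prefix,               # query only the shard(s) owning this prefix
    }


def select_url(base: str, collection: str, params: dict) -> str:
    """Assemble a /select URL; 'base' is e.g. http://localhost:8983/solr."""
    return f"{base}/{collection}/select?{urlencode(params)}"


if __name__ == "__main__":
    print(composite_id("tenantA", "doc42"))
    print(select_url("http://localhost:8983/solr", "tenants",
                     tenant_count_params("tenantA")))
```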

Re: Index Size of a tenant

Jan Høydahl / Cominvent
Why not the obvious design choice of one collection per tenant? Are you afraid that Solr can't handle a large number of collections?

Jan
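With one collection per tenant, a tenant's index size is simply the size of its own collection's cores, so no per-tenant attribution is needed. A minimal sketch of the Collections API CREATE request for each tenant; the `tenant_` naming scheme, the configset name, and the shard and replica counts are hypothetical placeholders.

```python
from urllib.parse import urlencode


def create_collection_url(base: str, tenant: str, configset: str,
                          num_shards: int = 1, replication_factor: int = 2) -> str:
    """Collections API CREATE request for one tenant's dedicated collection."""
    params = {
        "action": "CREATE",
        "name": f"tenant_{tenant}",          # hypothetical naming scheme
        "numShards": num_shards,
        "replicationFactor": replication_factor,
        "collection.configName": configset,  # shared configset already in ZooKeeper
    }
    return f"{base}/admin/collections?{urlencode(params)}"


if __name__ == "__main__":
    print(create_collection_url("http://localhost:8983/solr", "1234", "tenant_conf"))
```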

> On 5 Apr 2021, at 06:59, Natarajan, Rajeswari <[hidden email]> wrote:
>
> Hi,
>
> We plan to store multiple tenants in a single collection (with multiple shards), using the compositeId router with the tenant ID as the document ID prefix.
> In this setup, how can we find a tenant’s index size? The Solr Metrics API gives a core’s index size, but the same core may contain multiple tenants.
> Is there any out-of-the-box Solr API for this case?
>
>
> Thanks,
> Rajeswari


Re: Index Size of a tenant

Natarajan, Rajeswari
Yes, that's correct.

Thanks,
Rajeswari

On 4/5/21, 6:21 AM, "Jan Høydahl" <[hidden email]> wrote:

    Why not the obvious design choice of one collection per tenant? Are you afraid that Solr can't handle a large number of collections?

    Jan




Re: Index Size of a tenant

Walter Underwood
Assuming each tenant has an ID, you can get the size by searching for tenant_id:1234 and requesting zero rows. We do that for metrics for different document types in the same collection.
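A minimal Python sketch of that count query, assuming a `tenant_id` field as in the example above and a hypothetical collection named `tenants`. With `rows=0` Solr returns no documents, only `numFound`. This is a document count, not bytes on disk, and `numFound` excludes deleted documents that still occupy index space until merges remove them.

```python
import json
from urllib.parse import urlencode


def tenant_count_url(base: str, collection: str, tenant_id: str) -> str:
    """/select URL that matches one tenant and asks for zero rows."""
    params = {"q": f"tenant_id:{tenant_id}", "rows": 0, "wt": "json"}
    return f"{base}/{collection}/select?{urlencode(params)}"


def num_found(response_body: str) -> int:
    """Extract the match count from a standard JSON /select response."""
    return json.loads(response_body)["response"]["numFound"]


if __name__ == "__main__":
    # Against a live server the body would come from urllib.request.urlopen(url).
    url = tenant_count_url("http://localhost:8983/solr", "tenants", "1234")
    print(url)
    sample = '{"responseHeader": {"status": 0}, "response": {"numFound": 5120, "start": 0, "docs": []}}'
    print(num_found(sample))
```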

wunder
Walter Underwood
[hidden email]
http://observer.wunderwood.org/  (my blog)

> On Apr 5, 2021, at 10:02 AM, Natarajan, Rajeswari <[hidden email]> wrote:
>
> Yes, that's correct.
>
> Thanks,
> Rajeswari


Re: Index Size of a tenant

Natarajan, Rajeswari
I guess you mean the number of documents, not the size of the index on disk. We are looking for the size of the index on disk.

Thanks,
Rajeswari

On 4/5/21, 10:32 AM, "Walter Underwood" <[hidden email]> wrote:

    Assuming each tenant has an ID, you can get the size by searching for tenant_id:1234 and requesting zero rows. We do that for metrics for different document types in the same collection.

    wunder
    Walter Underwood
    [hidden email]
    http://observer.wunderwood.org/  (my blog)




Re: Index Size of a tenant

Walter Underwood
Some index structures are statistics of the entire index, so they don’t belong to one part of it.

So the number you are asking for doesn’t exist. Lucene indexes don’t work like that. If you
made an index with the documents from one tenant, it would not be the same size as the
fraction of a shared index.

Your best approach is to get the entire disk usage and allocate a share of it to each tenant
in proportion to that tenant’s share of the documents.
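A minimal Python sketch of that proportional estimate. It assumes the per-core `INDEX.sizeInBytes` values come from a Metrics API call such as `/solr/admin/metrics?group=core&prefix=INDEX.sizeInBytes`, and that the response shape below (core registry name mapped to its metrics) matches your Solr version; the document counts would come from count queries like the one suggested earlier in the thread. If the metrics cover every replica, the estimate includes every copy of the tenant's data. As the thread points out, the result is only an approximation when tenants' documents differ in size.

```python
import json


def total_index_bytes(metrics_body: str) -> int:
    """Sum INDEX.sizeInBytes over all core registries in a Metrics API JSON response."""
    registries = json.loads(metrics_body)["metrics"]
    return sum(reg.get("INDEX.sizeInBytes", 0) for reg in registries.values())


def estimate_tenant_bytes(total_bytes: int, tenant_docs: int, total_docs: int) -> float:
    """Allocate disk usage to a tenant in proportion to its share of documents."""
    if total_docs == 0:
        return 0.0
    return total_bytes * tenant_docs / total_docs


if __name__ == "__main__":
    sample = json.dumps({"metrics": {
        "solr.core.tenants.shard1.replica_n1": {"INDEX.sizeInBytes": 6_000_000},
        "solr.core.tenants.shard2.replica_n2": {"INDEX.sizeInBytes": 4_000_000},
    }})
    total = total_index_bytes(sample)
    print(estimate_tenant_bytes(total, 25_000, 100_000))  # 2500000.0
```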

But to back up one step, what are you doing with that information? Disk space is not a useful
or stable metric for indexes. It varies with the number of deleted documents, changes during
and after merges, and you need extra unused disk space for Solr to function. That unused space
must be dedicated to Solr, so it should be counted even though it doesn’t have index files on it
right now. SolrCloud also needs transaction logs, even though those aren’t officially part of the index.

All of that means that there is no API for one tenant’s part of the disk space and there won’t be
an API for it. The question doesn’t make sense for a Solr system.

wunder
Walter Underwood
[hidden email]
http://observer.wunderwood.org/  (my blog)

> On Apr 5, 2021, at 1:17 PM, Natarajan, Rajeswari <[hidden email]> wrote:
>
> I guess you mean the number of documents, not the size of the index on disk. We are looking for the size of the index on disk.
>
> Thanks,
> Rajeswari


Re: Index Size of a tenant

Natarajan, Rajeswari
Thanks for your reply. We are looking for a strategy for placing tenants in collections. Initially we thought we would go by
the number of documents. But we saw that some tenants with fewer docs have a larger index than tenants with
more documents, meaning the number of docs and the index size are not proportional. So we are looking for any way
to get the size of a tenant's index.

Thanks,
Rajeswari

On 4/5/21, 1:35 PM, "Walter Underwood" <[hidden email]> wrote:

    Some index structures are statistics of the entire index, so they don’t belong to one part of it.

    So the number you are asking for doesn’t exist. Lucene indexes don’t work like that. If you
    made an index with the documents from one tenant, it would not be the same size as the
    fraction of a shared index.

    Your best approach is to get the entire disk usage and allocate a share of it to each tenant
    in proportion to that tenant’s share of the documents.

    But to back up one step, what are you doing with that information? Disk space is not a useful
    or stable metric for indexes. It varies with the number of deleted documents, changes during
    and after merges, and you need extra unused disk space for Solr to function. That unused space
    must be dedicated to Solr, so it should be counted even though it doesn’t have index files on it
    right now. SolrCloud also needs transaction logs, even though those aren’t officially part of the index.

    All of that means that there is no API for one tenant’s part of the disk space and there won’t be
    an API for it. The question doesn’t make sense for a Solr system.

    wunder
    Walter Underwood
    [hidden email]
    http://observer.wunderwood.org/  (my blog)




Re: Index Size of a tenant

Natarajan, Rajeswari
If there is any way to get the size of a tenant's index in a collection where multiple tenants co-exist under the compositeId router scheme, please let me know.
We need some way to track each tenant's index size to see whether it grows too big, and in our case document count is not proportional to index size.

Thanks,
Rajeswari
 

On 4/5/21, 1:52 PM, "Natarajan, Rajeswari" <[hidden email]> wrote:

    Thanks for your reply. We are looking for a strategy for placing tenants in collections. Initially we thought we would go by
    the number of documents. But we saw that some tenants with fewer docs have a larger index than tenants with
    more documents, meaning the number of docs and the index size are not proportional. So we are looking for any way
    to get the size of a tenant's index.

    Thanks,
    Rajeswari





Re: Index Size of a tenant

Shawn Heisey-2
On 4/7/2021 1:41 PM, Natarajan, Rajeswari wrote:
> If there is any way to get the size of a tenant's index in a collection where multiple tenants co-exist under the compositeId router scheme, please let me know.
> We need some way to track each tenant's index size to see whether it grows too big, and in our case document count is not proportional to index size.

There isn't any way to do that.  The way that Lucene's indexes are
designed, obtaining that information is currently impossible, and it
would likely take a VERY large amount of development effort to make it
possible.  I would guess that even if it were possible, obtaining that
information would be very expensive in terms of system resources and time.

The best you can do with current technology is estimate the size based
on document count compared to the whole index.  But if each tenant has
very different kinds of data in the index, that method would probably
give you inaccurate information.

One thing you could do to have each one be its own collection is set up
multiple cloud installs, which can share one ZooKeeper ensemble by using
different chroot values for each one, and only put a few hundred
collections in each cloud.  This would probably require a lot of
additional hardware, and because of Lucene's economies of scale that
Walter was talking about, multiple collections WILL be larger on disk
than multiple tenants in one collection.
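A minimal Python sketch of that layout: several independent SolrCloud clusters sharing one ZooKeeper ensemble, each under its own chroot, with tenants spread so that no cluster holds more than a few hundred collections. The host names, the `/solr-clusterN` chroot naming, and the cap of 300 are hypothetical. Tenants are simply filled into clusters in order; a real placement would probably also weigh tenant size.

```python
from collections import defaultdict

# Hypothetical ZooKeeper ensemble shared by all clusters.
ZK_ENSEMBLE = "zk1:2181,zk2:2181,zk3:2181"


def zk_host(cluster: int) -> str:
    """zkHost string for one cluster: the chroot goes once, after the last host:port."""
    return f"{ZK_ENSEMBLE}/solr-cluster{cluster}"


def assign_tenants(tenants, max_per_cluster: int = 300) -> dict:
    """Fill clusters in order so none exceeds max_per_cluster tenant collections."""
    clusters = defaultdict(list)
    for i, tenant in enumerate(tenants):
        clusters[i // max_per_cluster].append(tenant)
    return dict(clusters)


if __name__ == "__main__":
    layout = assign_tenants([f"tenant_{n}" for n in range(650)])
    for cluster, members in layout.items():
        print(zk_host(cluster), len(members))
```

Each cluster's Solr nodes would then be started with that cluster's zkHost string, for example through `ZK_HOST` in `solr.in.sh`.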

Thanks,
Shawn

Re: Index Size of a tenant

Natarajan, Rajeswari
Thanks much for your reply.
Thanks,
Rajeswari

On 4/7/21, 1:16 PM, "Shawn Heisey" <[hidden email]> wrote:

    There isn't any way to do that.  The way that Lucene's indexes are
    designed, obtaining that information is currently impossible, and it
    would likely take a VERY large amount of development effort to make it
    possible.  I would guess that even if it were possible, obtaining that
    information would be very expensive in terms of system resources and time.

    The best you can do with current technology is estimate the size based
    on document count compared to the whole index.  But if each tenant has
    very different kinds of data in the index, that method would probably
    give you inaccurate information.

    One thing you could do to have each one be its own collection is set up
    multiple cloud installs, which can share one ZooKeeper ensemble by using
    different chroot values for each one, and only put a few hundred
    collections in each cloud.  This would probably require a lot of
    additional hardware, and because of Lucene's economies of scale that
    Walter was talking about, multiple collections WILL be larger on disk
    than multiple tenants in one collection.

    Thanks,
    Shawn