multiple indexes

Kevin Osborn-2
Here is an issue that I am trying to resolve. We have a large catalog of documents, but our customers (several hundred) can only see a subset of those documents. And the subsets vary in size greatly. And some of these customers will be creating a lot of traffic. Also, there is no way to map the subsets to a query. The customer either has access to a document or they don't.

Has anybody worked on this issue before? If I use one large index and do the filtering in my application, then Solr will be serving a lot of useless documents. The counts would also be screwed up for facet queries. Is the best solution to extend Solr and do the filtering there?

The other potential solution is to have one index per customer. This would require one instance of the servlet per index, correct? It just seems like this would require a lot of hardware and complexity (configuring the memory of each servlet instance to index size and traffic).

Index partitioning looks like it could help here, but I see that is still on the task list. I don't know where that is in development, if anywhere.
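To see why filtering in the application skews facet counts, here is a toy sketch in Python. The catalog, the `category` facet values and the `cust1` access list are all made up for illustration. The point is only that counts computed before the permission filter include documents the customer can never open.

```python
from collections import Counter

# Hypothetical toy catalog: doc id -> value of a "category" facet field.
CATALOG = {
    "doc1": "books", "doc2": "books", "doc3": "music",
    "doc4": "music", "doc5": "video",
}

# Hypothetical access list: cust1 may see only these documents.
ACCESS = {"cust1": {"doc1", "doc3"}}


def facet_counts(doc_ids):
    """Count matching documents per category, like a facet on "category"."""
    return Counter(CATALOG[d] for d in doc_ids)


matches = set(CATALOG)  # pretend the user's query matched every document

# Facets computed by the server over all matches, hits filtered later in the app:
app_side = facet_counts(matches)
# Facets computed only over documents the customer is allowed to see:
permitted = facet_counts(matches & ACCESS["cust1"])
```

Here `app_side` reports a `video` facet that cust1 cannot open and overstates `books` and `music`, while `permitted` matches what the customer can actually reach.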

Re: multiple indexes

Mike Klaas
On 3/22/07, Kevin Osborn <[hidden email]> wrote:
> Here is an issue that I am trying to resolve. We have a large catalog of documents, but our customers (several hundred) can only see a subset of those documents. And the subsets vary in size greatly. And some of these customers will be creating a lot of traffic. Also, there is no way to map the subsets to a query. The customer either has access to a document or they don't.
>
> Has anybody worked on this issue before? If I use one large index and do the filtering in my application, then Solr will be serving a lot of useless documents. The counts would also be screwed up for facet queries. Is the best solution to extend Solr and do the filtering there?
>
> The other potential solution is to have one index per customer. This would require one instance of the servlet per index, correct? It just seems like this would require a lot of hardware and complexity (configuring the memory of each servlet instance to index size and traffic).

Why not create a multivalued field that stores the customer perms?
add has_access:cust1 has_access:cust2, etc to the document at index
time, and turn this into a filter query at query time?

-Mike
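A minimal sketch of Mike's suggestion in Python, using only the standard library. It assumes schema.xml declares `has_access` as an indexed, multiValued string (untokenized) field, that `id` and `title` are hypothetical fields in the same schema, and that customer ids are simple tokens needing no query escaping. At index time each customer becomes one more `has_access` value in Solr's XML update message; at query time the permission check goes in a separate `fq` parameter instead of being appended to `q`.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode


def build_add_xml(doc_id, title, customers):
    """Build a Solr XML <add> message for one document.

    Repeating <field name="has_access"> once per customer is how a
    multiValued field receives several values ("id"/"title" are assumed fields).
    """
    add = ET.Element("add")
    doc = ET.SubElement(add, "doc")
    ET.SubElement(doc, "field", name="id").text = doc_id
    ET.SubElement(doc, "field", name="title").text = title
    for cust in customers:
        ET.SubElement(doc, "field", name="has_access").text = cust
    return ET.tostring(add, encoding="unicode")


def build_query_params(user_query, customer):
    """Keep the user's text in q and the permission check in fq."""
    return urlencode({"q": user_query, "fq": f"has_access:{customer}"})


print(build_add_xml("doc1", "Widget manual", ["cust1", "cust2"]))
print(build_query_params("widget", "cust1"))  # q=widget&fq=has_access%3Acust1
```

Putting the permission clause in `fq` keeps it out of relevance scoring, lets Solr cache it separately from the user's query text, and restricts the document set that facet counts are computed over, which addresses the facet-count concern in the original question.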

Re: multiple indexes

Chris Hostetter-3

: Why not create a multivalued field that stores the customer perms?
: add has_access:cust1 has_access:cust2, etc to the document at index
: time, and turn this into a filter query at query time?

this can be a particularly effective solution when the permissions don't
change at all. the ideal case is where each doc is "owned" by one
and only one customer, but either way it's a matter of listing all of the
customers that have access to the document in a field and filtering on
it. for a few hundred customers it's not a lot of work to cache those
filters, and autowarming (which repopulates the filter cache whenever a
new searcher is opened after a commit) will help keep it efficient.
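Hoss's remark about caching can be pictured with a toy model in Python. This is not Solr's filterCache (that is configured in solrconfig.xml, including its autowarmCount setting); the hypothetical `ToyFilterCache` class only illustrates the idea that each `has_access:<customer>` filter is computed once and reused, and that when a new index version appears, the previously used filters are recomputed up front so the first query per customer is not a cache miss.

```python
class ToyFilterCache:
    """Toy stand-in for a filter cache: fq string -> set of matching doc ids."""

    def __init__(self, index):
        # index: doc id -> set of customer ids listed in has_access
        self.index = index
        self.cache = {}
        self.misses = 0

    def _compute(self, customer):
        # Full scan of the index: the "expensive" step the cache avoids repeating.
        return frozenset(d for d, custs in self.index.items() if customer in custs)

    def docs_for(self, customer):
        key = f"has_access:{customer}"
        if key not in self.cache:
            self.misses += 1
            self.cache[key] = self._compute(customer)
        return self.cache[key]

    def autowarm(self, new_index):
        # Open a cache for the new index version and pre-compute every filter
        # that was used against the old version.
        warmed = ToyFilterCache(new_index)
        for key in self.cache:
            customer = key.split(":", 1)[1]
            warmed.cache[key] = warmed._compute(customer)
        return warmed


index_v1 = {"doc1": {"cust1", "cust2"}, "doc2": {"cust2"}}
cache = ToyFilterCache(index_v1)
cache.docs_for("cust2")
cache.docs_for("cust2")  # served from the cache, no second scan

index_v2 = dict(index_v1, doc3={"cust2"})  # a commit adds doc3
warmed = cache.autowarm(index_v2)
```

With a few hundred customers the cache holds at most a few hundred such sets, which is why the per-customer filter stays cheap at that scale.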

this approach doesn't scale particularly well to the tens of thousands
of "users" that might search your site, but at that point you have to
start thinking about how you model the "access" in your underlying
data model ... odds are you have some concept of "public" documents versus
"private" documents, and the private documents might have access control
lists based on "groups", so you can filter on that type of information
instead.
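A sketch of the group-based alternative Hoss describes, in Python. It assumes a hypothetical multivalued `acl_group` field holding the token `public` for public documents and group names for private ones, and that group names are simple tokens needing no query escaping. The filter is built from the user's groups rather than the user's identity, so the number of distinct filters grows with distinct group combinations, not with the number of users.

```python
def acl_filter(user_groups, field="acl_group", public_token="public"):
    """Build one fq clause matching public docs plus docs shared with any
    of the user's groups. Field and token names are hypothetical."""
    # Deduplicate and sort so users with the same groups get the same string.
    tokens = [public_token] + sorted(set(user_groups) - {public_token})
    return f"{field}:({' OR '.join(tokens)})"


def visible(doc_tokens, user_groups, public_token="public"):
    """Local mirror of the filter's logic, handy for testing an indexer."""
    return public_token in doc_tokens or bool(set(doc_tokens) & set(user_groups))


print(acl_filter(["sales", "emea", "sales"]))  # acl_group:(public OR emea OR sales)
```

Sorting the group names is a small design choice: two users in the same groups produce an identical `fq` string, which makes it more likely they reuse the same cached filter.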



-Hoss

Re: multiple indexes

Maarten.De.Vilder
In reply to this post by Mike Klaas (22/03/2007 19:15)
> Why not create a multivalued field that stores the customer perms?
> add has_access:cust1 has_access:cust2, etc to the document at index
> time, and turn this into a filter query at query time?

that is what we are doing at the moment, and i must say, it works very well and
does not slow the server down at all (because of the efficient indexes
that solr builds)