Solr and Permissions

Solr and Permissions

Liam O'Boyle
Morning,

We use solr to index a range of content to which, within our application,
access is restricted by a system of user groups and permissions.  In order
to ensure that search results don't reveal information about items which the
user doesn't have access to, we need to somehow filter the results; this
needs to be done within Solr itself, rather than after retrieval, so that
the facet and result counts are correct.

Currently we do this by creating a filter query which specifies all of the
items which may be allowed to match (e.g. id: (foo OR bar OR blarg OR ...)),
but this has definite scalability issues - we're starting to run into
issues, as this can be a set of ORs of potentially unlimited size (and
practically, we're hitting the low thousands sometimes).  While we can
adjust maxBooleanClauses upwards, I understand that this has performance
implications...

So, has anyone had to implement something similar in the past?  Any
suggestions for a more scalable approach?  Any advice on safe and sensible
limits on how far I can push maxBooleanClauses?

Thanks for your advice,

Liam

Re: Solr and Permissions

Sujit Pal
How about assigning content types to documents in the index, and map
users to a set of content types they are allowed to access? That way you
will pass in fewer parameters in the fq.

-sujit
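
A minimal sketch of the suggestion above, assuming a hypothetical single-valued field named content_type and an application-side lookup that has already resolved the user's allowed types. It only builds the fq string; sending it to Solr is left out.

```python
def build_type_filter(allowed_types, field="content_type"):
    """Build a Solr fq string that restricts results to the given content types.

    `field` is a hypothetical field name chosen for this example.
    """
    if not allowed_types:
        # No permitted types: return a filter that matches no documents.
        return "-*:*"
    quoted = []
    for name in sorted(allowed_types):
        # Escape backslashes and double quotes so each value is a safe phrase.
        escaped = name.replace("\\", "\\\\").replace('"', '\\"')
        quoted.append('"%s"' % escaped)
    return "%s:(%s)" % (field, " OR ".join(quoted))


if __name__ == "__main__":
    print(build_type_filter({"wiki", "news"}))
```

The number of OR clauses then grows with the number of content types a user may see rather than with the number of documents.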

On Fri, 2011-03-11 at 11:53 +1100, Liam O'Boyle wrote:

> Morning,
>
> We use solr to index a range of content to which, within our application,
> access is restricted by a system of user groups and permissions.  In order
> to ensure that search results don't reveal information about items which the
> user doesn't have access to, we need to somehow filter the results; this
> needs to be done within Solr itself, rather than after retrieval, so that
> the facet and result counts are correct.
>
> Currently we do this by creating a filter query which specifies all of the
> items which may be allowed to match (e.g. id: (foo OR bar OR blarg OR ...)),
> but this has definite scalability issues - we're starting to run into
> issues, as this can be a set of ORs of potentially unlimited size (and
> practically, we're hitting the low thousands sometimes).  While we can
> adjust maxBooleanClauses upwards, I understand that this has performance
> implications...
>
> So, has anyone had to implement something similar in the past?  Any
> suggestions for a more scalable approach?  Any advice on safe and sensible
> limits on how far I can push maxBooleanClauses?
>
> Thanks for your advice,
>
> Liam


Re: Solr and Permissions

canal
I have similar requirements.

Content type is one solution, but there are also other use cases where this is
not enough.

Another requirement is that, when the access permission is changed, we need to
update the field - my understanding is that we cannot do so without re-indexing
the whole document. Am I correct?
 thanks,
canal




________________________________
From: Sujit Pal <[hidden email]>
To: [hidden email]
Sent: Fri, March 11, 2011 10:39:27 AM
Subject: Re: Solr and Permissions

How about assigning content types to documents in the index, and map
users to a set of content types they are allowed to access? That way you
will pass in fewer parameters in the fq.

-sujit


Re: Solr and Permissions

Liam O'Boyle
As Canal points out, grouping into types is not always possible.

In our case, permissions are not on a per-type level, but either on a per
"folder" (of which there can be hundreds) or per item in some cases (of
which there can be... any number at all).

Reindexing is also too slow to really be an option; some of the items use
Tika to extract content, which means that we need to re-extract the content
(variable length of time; average is about half a second, but on some
documents it will sit there until the connection times out).  Querying it,
modifying it, then resubmitting without rerunning content extraction is
faster, but involves sending even more data over the network; either way is
relatively slow.

Liam
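
The "query, modify, resubmit" route described above could be sketched as follows. This assumes every field needed to rebuild the document is stored in the index and that permissions live in a hypothetical multi-valued field called acl; fetching the stored document and posting the result back to Solr are not shown.

```python
def with_new_acl(stored_doc, new_acl, acl_field="acl", drop=("score",)):
    """Return a copy of a stored document with its ACL field replaced.

    `acl_field` and the entries in `drop` are assumptions for this example:
    `drop` lists response-only keys that should not be sent back on update.
    """
    # Copy everything except the response-only keys, leaving the input intact.
    doc = {key: value for key, value in stored_doc.items() if key not in drop}
    # Replace the permissions; the extracted text is reused as-is.
    doc[acl_field] = sorted(new_acl)
    return doc


if __name__ == "__main__":
    fetched = {"id": "42", "text": "extracted body", "acl": ["group:staff"], "score": 1.3}
    print(with_new_acl(fetched, {"group:staff", "group:hr"}))
```

This avoids the Tika extraction step but, as noted, still moves the full document over the network twice.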

On 11 March 2011 16:24, go canal <[hidden email]> wrote:

> I have similar requirements.
>
> Content type is one solution; but there are also other use cases where this
> not
> enough.
>
> Another requirement is, when the access permission is changed, we need to
> update
> the field - my understanding is we can not unless re-index the whole
> document
> again. Am I correct?
>  thanks,
> canal

--
Liam O'Boyle

IntelligenceBank Pty Ltd
Level 1, 31 Coventry Street Southbank, Victoria 3006, Australia
P:   +613 8618 7810   F:   +613 8618 7899   M: +61 403 88 66 44

*Awarded 2010 "Best New Business" and "Business of the Year" - Business3000
Awards*

This email and any attachments are confidential and may contain legally
privileged information or copyright material. If you are not an intended
recipient, please contact us at once by return email and then delete both
messages. We do not accept liability in connection with transmission of
information using the internet.

Re: Solr and Permissions

canal
To be fair, I think there is a slight difference between a Content Management
system and a Search Engine.

Access control at per-document level, per-type level, supporting dynamic role
changes, etc. are more like content management use cases, whereas a search
solution like Solr focuses on a different set of use cases.

But in the real world, any content management system needs full text search, so
the question is how to support search with permission control.

JackRabbit is integrated with Lucene/Tika; this could be one solution, but I do
not know its performance and scalability.

CouchDB also integrates with Lucene/Tika; another option?

I have yet to see a Search Engine that provides some sort of Content Management
features like we are discussing here (Solr, Elastic Search?).


Then the last option is probably to build an application that works with a
document repository with all necessary content management features and with
Solr, which provides search capability, and to handle the permissions outside Solr?
thanks,
canal




________________________________
From: Liam O'Boyle <[hidden email]>
To: [hidden email]
Cc: go canal <[hidden email]>
Sent: Fri, March 11, 2011 2:28:19 PM
Subject: Re: Solr and Permissions

As Canal points out,  grouping into types is not always possible.

In our case, permissions are not on a per-type level, but either on a per
"folder" (of which there can be hundreds) or per item in some cases (of
which there can be... any number at all).

Reindexing is also to slow to really be an option; some of the items use
Tika to extract content, which means that we need to reextract the content
(variable length of time; average is about half a second, but on some
documents it will sit there until the connection times out) .  Querying it,
modifying then resubmitting without rerunning content extraction is still
faster, but involves sending even more data over the network; either way is
relatively slow.

Liam


Re: Solr and Permissions

Jan Høydahl / Cominvent
Hi,

Talk to the ManifoldCF guys - they have successfully implemented support for document level security for many repositories including CMS/ECMs and may have some hints for you to write your own Authority connector against your system, which will fetch the ACL for the document and index it with the document itself. This eliminates long query-time filters.

Re-indexing content for which ACLs have changed is a very common way of doing this, and you should not worry too much about performance implications before there is a real issue. In the real world, you don't change folder permissions very often, and that will be a cost you'll have to live with. If you worry that this lag between repository state and index state may cause people to see content they are not entitled to, it is possible to do late binding filtering of the result set as well, but I would avoid that if possible.

--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com
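
Indexing the ACL with each document and filtering on the searching user's identities might look roughly like the sketch below. The field name acl and the user:/group: token scheme are made up for illustration and are not ManifoldCF's actual schema. The {!terms} query parser used here is only available in newer Solr releases than the ones current when this thread was written; it is chosen because it is not subject to the maxBooleanClauses limit.

```python
def acl_tokens(user_id, group_ids):
    """List the access tokens a user holds (hypothetical token scheme)."""
    return ["user:%s" % user_id] + ["group:%s" % g for g in sorted(group_ids)]


def acl_filter(user_id, group_ids, field="acl"):
    """Build an fq matching documents whose ACL field holds any of the user's tokens."""
    tokens = acl_tokens(user_id, group_ids)
    for token in tokens:
        # The terms parser splits on commas by default, so reject them up front.
        if "," in token:
            raise ValueError("token must not contain a comma: %r" % token)
    return "{!terms f=%s}%s" % (field, ",".join(tokens))


if __name__ == "__main__":
    print(acl_filter("liam", {"staff", "admins"}))
```

The filter length then depends on how many groups a user belongs to, not on how many documents they can see, and facet counts stay correct because the filtering happens inside Solr.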

On 11. mars 2011, at 06.48, go canal wrote:

> To be fair, I think there is a slight difference between a Content Management
> and a Search Engine.
>
> Access control at per document level, per type level, supporting dynamic role
> changes, etc.are more like  content management use cases; where search solution
> like Solr focuses on different set of use cases;
>
> But in real world, any content management systems need full text search; so the
> question is to how to support search with permission control.
>
> JackRabbit integrated with Lucene/Tika, this could be one solution but I do not
> know its performance and scalability;
>
> CouchDB also integrates with Lucene/Tika, another option?
>
> I have yet to see a Search Engine that provides some sort of Content Management
> features like we are discussing here (Solr, Elastic Search ?)
>
>
> Then the last option is probably to build an application that works with a
> document repository with all necessary content management features and Solr
> which provides search capability;  and handling the permissions outside Solr?
> thanks,
> canal

RE: Solr and Permissions

Knaak
In reply to this post by Liam O'Boyle
What about using the BitwiseQueryParserPlugin?  

https://issues.apache.org/jira/browse/SOLR-1913

You could encode your documents with a series of permissions based on
Bit flags and then OR them on query.

Tim
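
The bit-flag idea can be illustrated independently of the plugin. This is a plain-Python sketch of the encoding and the matching rule only; it does not use the query syntax from SOLR-1913, and the group names and bit positions are invented for the example.

```python
def encode_groups(group_names, bit_index):
    """OR together one bit per group to form a permission mask."""
    mask = 0
    for name in group_names:
        mask |= 1 << bit_index[name]
    return mask


def can_see(doc_mask, user_mask):
    """A document is visible if it shares at least one permission bit with the user."""
    return (doc_mask & user_mask) != 0


if __name__ == "__main__":
    bits = {"staff": 0, "hr": 1, "finance": 2}  # hypothetical group-to-bit mapping
    doc_mask = encode_groups(["hr", "finance"], bits)
    user_mask = encode_groups(["staff", "hr"], bits)
    print(doc_mask, user_mask, can_see(doc_mask, user_mask))
```

One limitation of this scheme is that the number of distinct groups is bounded by the width of the integer field used to store the mask.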


Re: Solr and Permissions

canal
In reply to this post by Jan Høydahl / Cominvent
Thank you Jan, I will take a look at ManifoldCF.
So it seems that the solution is basically to implement something outside of
Solr for permission control.
thanks,
canal




________________________________
From: Jan Høydahl <[hidden email]>
To: [hidden email]
Sent: Fri, March 11, 2011 4:17:22 PM
Subject: Re: Solr and Permissions

Hi,

Talk to the ManifoldCF guys - they have successfully implemented support for
document level security for many repositories including CMC/ECMs and may have
some hints for you to write your own Authority connector against your system,
which will fetch the ACL for the document and index it with the document itself.
This eliminates long query-time filters.

Re-indexing content for which ACLs have changed is a very common way of doing
this, and you should not worry too much about performance implications before
there is a real issue. In real world, you don't change folder permissions very
often, and that will be a cost you'll have to live with. If you worry that this
lag between repository state and index state may cause people to see content
they are not entitled to, it is possible to do late binding filtering of the
result set as well, but I would avoid that if possible.

--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com


Re: Solr and Permissions

Walter Underwood
In reply to this post by canal
On Mar 10, 2011, at 10:48 PM, go canal wrote:

> But in real world, any content management systems need full text search; so the
> question is to how to support search with permission control.
>
> I have yet to see a Search Engine that provides some sort of Content Management
> features like we are discussing here (Solr, Elastic Search ?)


It isn't free, but MarkLogic can do this. It is an XML database with security support and search. Changing permissions is an update transaction, not a reload. Permissions can be part of a search, just like any other constraint.

The search is not the usual crappy search you get in a database. MarkLogic is built with search engine technology, so the search is fast and good.

We do offer a community license for personal, not-for-profit use. See details here:

http://developer.marklogic.com/licensing

wunder
--
Walter Underwood
Lead Engineer, MarkLogic


Re: Solr and Permissions

Billnbell
Why not just add a security field in Solr and use fq to limit to the user's permissions?

Bill Bell
Sent from mobile


On Mar 11, 2011, at 10:27 AM, Walter Underwood <[hidden email]> wrote:

> On Mar 10, 2011, at 10:48 PM, go canal wrote:
>
>> But in real world, any content management systems need full text search; so the
>> question is to how to support search with permission control.
>>
>> I have yet to see a Search Engine that provides some sort of Content Management
>> features like we are discussing here (Solr, Elastic Search ?)
>
>
> It isn't free, but MarkLogic can do this. It is an XML database with security support and search. Changing permissions is an update transaction, not a reload. Permissions can be part of a search, just like any other constraint.
>
> The search is not the usual crappy search you get in a database. MarkLogic is built with search engine technology, so the search is fast and good.
>
> We do offer a community license for personal, not-for-profit use. See details here:
>
> http://developer.marklogic.com/licensing
>
> wunder
> --
> Walter Underwood
> Lead Engineer, MarkLogic
>

Re: Solr and Permissions

Walter Underwood
On Mar 11, 2011, at 9:32 AM, Bill Bell wrote:

> Why not just add a security field in Solr and use fq to limit to the users permissions?

You can. When permissions change, you need to reload every affected document. You also need to build the whole security filtering from scratch instead of having it as a supported feature in a database.

So, it is slower in operation and more engineering work to create and maintain.

wunder
--
Walter Underwood




Re: Solr and Permissions

britske
In reply to this post by Billnbell
About the 'having to reindex when permissions change'-problem:

have a look at ExternalFileField
http://lucene.apache.org/solr/api/org/apache/solr/schema/ExternalFileField.html
which enables you to reload a file without having to reindex all the documents.

Thinking out loud: multivalued field 'roles' of type ExternalFileField.
- assign each person 1 or multiple roles.
- each document has multiple roles assigned to it (which are entitled to
view it)

Not sure if it (the ExternalFileField approach) scales though.

Geert-Jan
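
For reference, the external file that ExternalFileField reads is, as far as the Solr documentation describes it, a plain text file of key=value lines holding one float per document. The sketch below only renders such content; the document ids and the meaning of the values (here 1.0 for restricted, 0.0 for open) are assumptions for illustration, and where the file must be placed depends on the Solr setup.

```python
def render_external_file(values):
    """Render {doc_id: number} as 'doc_id=float' lines, sorted by document id."""
    lines = ["%s=%s" % (doc_id, float(value)) for doc_id, value in sorted(values.items())]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical flags: 1 marks a restricted document, 0 an open one.
    print(render_external_file({"doc2": 0, "doc1": 1}), end="")
```

Because each document gets a single float, this suits a flag or level per document rather than a multi-valued list of roles.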


2011/3/11 Bill Bell <[hidden email]>

> Why not just add a security field in Solr and use fq to limit to the users
> permissions?
>
> Bill Bell
> Sent from mobile

Re: Solr and Permissions

Sujit Pal
In reply to this post by canal
Yes, there can be cases where a user is allowed a subset of a content type,
or a combination of content type groups and individual documents, where
this would break down.

And yes, afaik, if you want to update the permissions in the document
(seems slightly strange, since you would potentially have many more users
than documents, so you may want to think this requirement through some
more), you would need to update the document.

-sujit

On Thu, 2011-03-10 at 21:24 -0800, go canal wrote:

> I have similar requirements.
>
> Content type is one solution; but there are also other use cases where this is
> not enough.
>
> Another requirement is, when the access permission is changed, we need to update
> the field - my understanding is we cannot unless we re-index the whole document
> again. Am I correct?
>  thanks,
> canal
>
>
>
>
> ________________________________
> From: Sujit Pal <[hidden email]>
> To: [hidden email]
> Sent: Fri, March 11, 2011 10:39:27 AM
> Subject: Re: Solr and Permissions
>
> How about assigning content types to documents in the index, and map
> users to a set of content types they are allowed to access? That way you
> will pass in fewer parameters in the fq.
>
> -sujit
>
> On Fri, 2011-03-11 at 11:53 +1100, Liam O'Boyle wrote:
> > Morning,
> >
> > We use solr to index a range of content to which, within our application,
> > access is restricted by a system of user groups and permissions.  In order
> > to ensure that search results don't reveal information about items which the
> > user doesn't have access to, we need to somehow filter the results; this
> > needs to be done within Solr itself, rather than after retrieval, so that
> > the facet and result counts are correct.
> >
> > Currently we do this by creating a filter query which specifies all of the
> > items which may be allowed to match (e.g. id: (foo OR bar OR blarg OR ...)),
> > but this has definite scalability issues - we're starting to run into
> > issues, as this can be a set of ORs of potentially unlimited size (and
> > practically, we're hitting the low thousands sometimes).  While we can
> > adjust maxBooleanClauses upwards, I understand that this has performance
> > implications...
> >
> > So, has anyone had to implement something similar in the past?  Any
> > suggestions for a more scalable approach?  Any advice on safe and sensible
> > limits on how far I can push maxBooleanClauses?
> >
> > Thanks for your advice,
> >
> > Liam
>
>
>      


Re: Solr and Permissions

canal
In reply to this post by britske
Looking at the API doc, it seems that only float values are currently
supported; is that true?
 thanks,
canal




________________________________
From: Geert-Jan Brits <[hidden email]>
To: [hidden email]
Sent: Sat, March 12, 2011 1:42:38 AM
Subject: Re: Solr and Permissions

About the 'having to reindex when permissions change'-problem:

have a look at ExternalFileField
http://lucene.apache.org/solr/api/org/apache/solr/schema/ExternalFileField.html
which enables you to reload a file without having to reindex all the documents.

Thinking out loud: multivalued field 'roles' of type ExternalFileField.
- assign each person 1 or multiple roles.
- each document has multiple roles assigned to it (which are entitled to
view it)

Not sure if it (the ExternalFileField approach) scales though.
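
A minimal sketch of the roles idea from the query side, assuming the document's roles live in an ordinary indexed multivalued string field (hypothetically named `roles`) rather than in an ExternalFileField, which replies elsewhere in this thread note holds only float values. The helper names are made up for illustration.

```python
def escape_phrase(term):
    """Escape backslashes and double quotes so the term can sit inside a quoted phrase."""
    return term.replace("\\", "\\\\").replace('"', '\\"')


def roles_filter_query(user_roles, field="roles"):
    """Build an fq value that matches documents carrying at least one of the user's roles."""
    roles = sorted(set(user_roles))
    if not roles:
        # With no roles nothing should match; let the caller skip the search entirely.
        raise ValueError("user has no roles")
    quoted = ['"%s"' % escape_phrase(role) for role in roles]
    return "%s:(%s)" % (field, " OR ".join(quoted))


fq = roles_filter_query(["staff", "editors"])
```

The number of OR clauses then grows with the number of roles a user holds, not with the number of documents the user may see.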

Geert-Jan


2011/3/11 Bill Bell <[hidden email]>

> Why not just add a security field in Solr and use fq to limit results to the
> user's permissions?
>
> Bill Bell
> Sent from mobile
>
>
> On Mar 11, 2011, at 10:27 AM, Walter Underwood <[hidden email]>
> wrote:
>
> > On Mar 10, 2011, at 10:48 PM, go canal wrote:
> >
> >> But in the real world, any content management system needs full text search;
> >> so the question is how to support search with permission control.
> >>
> >> I have yet to see a Search Engine that provides some sort of Content
> >> Management features like we are discussing here (Solr, Elastic Search ?)
> >
> >
> > It isn't free, but MarkLogic can do this. It is an XML database with
> > security support and search. Changing permissions is an update transaction,
> > not a reload. Permissions can be part of a search, just like any other
> > constraint.
> >
> > The search is not the usual crappy search you get in a database.
> > MarkLogic is built with search engine technology, so the search is fast and
> > good.
> >
> > We do offer a community license for personal, not-for-profit use. See
> > details here:
> >
> > http://developer.marklogic.com/licensing
> >
> > wunder
> > --
> > Walter Underwood
> > Lead Engineer, MarkLogic
> >
>



     

Re: Solr and Permissions

Koji Sekiguchi
(11/03/12 10:28), go canal wrote:
> Looking at the API doc, it seems that only floating value is currently
> supported, is it true?

Right. And it is just for changing score by using float values in the file,
so it cannot be used for filtering.

Koji
--
http://www.rondhuit.com/en/

Re: Solr and Permissions

britske
Ahh yes, sorry about that. I assumed ExternalFileField would work for
filtering as well. Note to self: never assume.
Geert-Jan

2011/3/12 Koji Sekiguchi <[hidden email]>

> (11/03/12 10:28), go canal wrote:
>
>> Looking at the API doc, it seems that only floating value is currently
>> supported, is it true?
>>
>
> Right. And it is just for changing score by using float values in the file,
> so it cannot be used for filtering.
>
> Koji
> --
> http://www.rondhuit.com/en/
>

Re: Solr and Permissions

Liam O'Boyle
In reply to this post by canal
ManifoldCF sounds like it might be the right solution, so long as it's
not secretly building a filter query in the back end; otherwise it
will hit the same limits.

In the meantime, I have made a minor improvement to my filter query:
it now scans the permitted IDs and attempts to build the filter
using ranges (e.g. instead of 1 OR 2 OR 3 it will filter using [1 TO
3]), which will hopefully keep me going for now.
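
A sketch of that range-collapsing step, under assumptions not stated in the message: IDs are non-negative integers, and the `id` field is indexed as a numeric type so that `[1 TO 3]` is a numeric range rather than a lexicographic one. The function names are illustrative only, not the actual implementation.

```python
def collapse_to_ranges(ids):
    """Collapse integers into sorted, inclusive (start, end) runs of consecutive values."""
    ranges = []
    for n in sorted(set(ids)):
        if ranges and n == ranges[-1][1] + 1:
            # n extends the current run by one.
            ranges[-1] = (ranges[-1][0], n)
        else:
            # Gap found (or first value): start a new run.
            ranges.append((n, n))
    return ranges


def build_filter_query(ids, field="id"):
    """Build an fq value with one clause per run instead of one clause per ID."""
    ranges = collapse_to_ranges(ids)
    if not ranges:
        # No permitted IDs: the caller should return an empty result without querying.
        raise ValueError("no permitted ids")
    clauses = [str(lo) if lo == hi else "[%d TO %d]" % (lo, hi) for lo, hi in ranges]
    return "%s:(%s)" % (field, " OR ".join(clauses))


fq = build_filter_query([3, 1, 2, 7, 5, 8])
```

This only helps when permitted IDs cluster into runs; a user with access to every second document still produces one clause per ID.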

Liam

On 12 March 2011 01:46, go canal <[hidden email]> wrote:

> Thank you Jan, I will take a look at ManifoldCF.
> So it seems that the solution is basically to implement something outside of
> Solr for permission control.
> thanks,
> canal
>
>
>
>
> ________________________________
> From: Jan Høydahl <[hidden email]>
> To: [hidden email]
> Sent: Fri, March 11, 2011 4:17:22 PM
> Subject: Re: Solr and Permissions
>
> Hi,
>
> Talk to the ManifoldCF guys - they have successfully implemented support for
> document level security for many repositories including CMS/ECMs and may have
> some hints for you to write your own Authority connector against your system,
> which will fetch the ACL for the document and index it with the document itself.
> This eliminates long query-time filters.
>
> Re-indexing content for which ACLs have changed is a very common way of doing
> this, and you should not worry too much about performance implications before
> there is a real issue. In the real world, you don't change folder permissions
> very often, and that will be a cost you'll have to live with. If you worry that
> this lag between repository state and index state may cause people to see content
> they are not entitled to, it is possible to do late binding filtering of the
> result set as well, but I would avoid that if possible.
>
> --
> Jan Høydahl, search solution architect
> Cominvent AS - www.cominvent.com
>
> On 11. mars 2011, at 06.48, go canal wrote:
>
>> To be fair, I think there is a slight difference between a Content Management
>> system and a Search Engine.
>>
>> Access control at per document level, per type level, supporting dynamic role
>> changes, etc. are more like content management use cases, whereas a search
>> solution like Solr focuses on a different set of use cases.
>>
>> But in the real world, any content management system needs full text search; so
>> the question is how to support search with permission control.
>>
>> JackRabbit integrated with Lucene/Tika could be one solution, but I do not
>> know its performance and scalability.
>>
>> CouchDB also integrates with Lucene/Tika, another option?
>>
>> I have yet to see a Search Engine that provides some sort of Content Management
>> features like we are discussing here (Solr, Elastic Search ?)
>>
>>
>> Then the last option is probably to build an application that works with a
>> document repository with all necessary content management features and Solr
>> which provides search capability, and to handle the permissions outside Solr?
>> thanks,
>> canal
>>
>>
>>
>>
>> ________________________________
>> From: Liam O'Boyle <[hidden email]>
>> To: [hidden email]
>> Cc: go canal <[hidden email]>
>> Sent: Fri, March 11, 2011 2:28:19 PM
>> Subject: Re: Solr and Permissions
>>
>> As Canal points out,  grouping into types is not always possible.
>>
>> In our case, permissions are not on a per-type level, but either on a per
>> "folder" (of which there can be hundreds) or per item in some cases (of
>> which there can be... any number at all).
>>
>> Reindexing is also too slow to really be an option; some of the items use
>> Tika to extract content, which means that we need to re-extract the content
>> (variable length of time; average is about half a second, but on some
>> documents it will sit there until the connection times out). Querying it,
>> modifying it, then resubmitting without rerunning content extraction is still
>> faster, but involves sending even more data over the network; either way is
>> relatively slow.
>>
>> Liam
>>
>> On 11 March 2011 16:24, go canal <[hidden email]> wrote:
>>
>>> I have similar requirements.
>>>
>>> Content type is one solution; but there are also other use cases where this
>>> is not enough.
>>>
>>> Another requirement is, when the access permission is changed, we need to
>>> update the field - my understanding is we cannot unless we re-index the whole
>>> document again. Am I correct?
>>> thanks,
>>> canal
>>>
>>>
>>>
>>>
>>> ________________________________
>>> From: Sujit Pal <[hidden email]>
>>> To: [hidden email]
>>> Sent: Fri, March 11, 2011 10:39:27 AM
>>> Subject: Re: Solr and Permissions
>>>
>>> How about assigning content types to documents in the index, and map
>>> users to a set of content types they are allowed to access? That way you
>>> will pass in fewer parameters in the fq.
>>>
>>> -sujit
>>>
>>> On Fri, 2011-03-11 at 11:53 +1100, Liam O'Boyle wrote:
>>>> Morning,
>>>>
>>>> We use solr to index a range of content to which, within our application,
>>>> access is restricted by a system of user groups and permissions.  In order
>>>> to ensure that search results don't reveal information about items which the
>>>> user doesn't have access to, we need to somehow filter the results; this
>>>> needs to be done within Solr itself, rather than after retrieval, so that
>>>> the facet and result counts are correct.
>>>>
>>>> Currently we do this by creating a filter query which specifies all of the
>>>> items which may be allowed to match (e.g. id: (foo OR bar OR blarg OR ...)),
>>>> but this has definite scalability issues - we're starting to run into
>>>> issues, as this can be a set of ORs of potentially unlimited size (and
>>>> practically, we're hitting the low thousands sometimes).  While we can
>>>> adjust maxBooleanClauses upwards, I understand that this has performance
>>>> implications...
>>>>
>>>> So, has anyone had to implement something similar in the past?  Any
>>>> suggestions for a more scalable approach?  Any advice on safe and sensible
>>>> limits on how far I can push maxBooleanClauses?
>>>>
>>>> Thanks for your advice,
>>>>
>>>> Liam
>>>
>>>
>>>
>>>
>>
>>
>>
>> --
>> Liam O'Boyle
>>
>
>
>



--
Liam O'Boyle
