Solr design decisions

Greg Georges
Hello all,

I have just finished reading the book "Solr 1.4 Enterprise Search Server". I now understand most of the basics of Solr and how the solution can be scaled. Our goal is to have a centralized search service for many applications.

The first application we want to index is a system whose documents must be indexed through Solr Cell. These documents are associated with certain clients (companies). Each client can have many users, and each user can be part of a group of users. We have permissions on each physical document in the system, and we want those permissions enforced in our enterprise search as well.

I read that we can associate roles and ids with Solr documents in order to show only a subset of search results to a particular user. My question is this: a best practice in Solr is to batch commits. The problem in my case is that if we change a document's permissions (role) and batch the commit, there can be a period during which the document still appears in search results under its old role. What should I do in this case? Should I just commit each change right away? If many clients do this many times, will performance still scale if I don't batch my commits? Thanks
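One common way to do what Greg describes is to index, on every document, the ids of the users, groups and clients allowed to read it, and then add a filter query (fq) built from the searching user's own ids. The sketch below assumes a hypothetical multi-valued string field named `acl_principals` and a hypothetical `User` type; it only builds request parameters and does not talk to Solr.

```python
# Sketch: restrict results to documents the current user may read.
# Assumption (hypothetical schema): each document has a multi-valued string
# field "acl_principals" listing every principal allowed to read it,
# e.g. ["user:42", "group:editors", "client:acme"].
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class User:
    user_id: str
    client_id: str
    groups: List[str] = field(default_factory=list)


def principals_for(user: User) -> List[str]:
    """Every principal whose grants apply to this user."""
    return ([f"user:{user.user_id}", f"client:{user.client_id}"]
            + [f"group:{g}" for g in user.groups])


def quote_term(term: str) -> str:
    """Wrap a term in double quotes, escaping backslashes and quotes."""
    return '"' + term.replace("\\", "\\\\").replace('"', '\\"') + '"'


def acl_filter_query(user: User, field_name: str = "acl_principals") -> str:
    """fq value matching documents readable by any of the user's principals."""
    terms = " OR ".join(quote_term(p) for p in principals_for(user))
    return f"{field_name}:({terms})"


def search_params(user_query: str, user: User) -> Dict[str, str]:
    # q carries the user's search; fq applies the permission filter separately.
    return {"q": user_query, "fq": acl_filter_query(user)}
```

Because fq results are cached in Solr's filterCache, a given user's permission filter is cheap to reuse between commits. It still only reflects what was true at the last commit, which is exactly the staleness Greg is asking about.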

Greg

Re: Solr design decisions

Erick Erickson
Your users will have to accept some latency between a permission change and
that change being reflected in the results. The length of that latency is
determined by two things:
1> the interval between when you send the change to Solr (i.e. re-index the
doc) and when you issue a commit
AND
2> the time it takes the Solr instance to propagate that change.

Now, for <2>: if you have a master/slave setup, the slave's polling interval
must pass before it pulls the changes down. Then there's the "warmup" time
that passes between when the changes are made (on the master and/or slave)
and when Solr starts serving queries from the newly-warmed searcher.
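Adding up the pieces listed above gives a rough upper bound on how long a re-indexed document (for example, one whose role just changed) can keep matching under its old permissions. This additive model is an illustrative assumption, not a formula from Solr: it treats the steps as strictly sequential and adds replication transfer time as a separate, optional term.

```python
# Rough worst-case staleness model (illustrative assumption, not a Solr formula):
# an update waits for the next commit, then (on a slave) for the next
# replication poll and file transfer, then for the new searcher to warm.
def worst_case_staleness(commit_interval_s: float,
                         warmup_s: float,
                         poll_interval_s: float = 0.0,
                         replication_transfer_s: float = 0.0) -> float:
    """Upper bound, in seconds, before a re-indexed doc is visible to searches.

    commit_interval_s: longest an update can wait before a commit (point 1>).
    warmup_s: autowarming time of the searcher that serves queries.
    poll_interval_s: slave polling interval; 0 for a single instance.
    replication_transfer_s: time to copy changed index files to the slave.
    """
    parts = (commit_interval_s, warmup_s, poll_interval_s, replication_transfer_s)
    if any(p < 0 for p in parts):
        raise ValueError("durations must be non-negative")
    return sum(parts)
```

For example, with a 5-minute commit interval, a 60-second slave polling interval and 20 seconds of warmup, `worst_case_staleness(300, 20, 60)` gives 380 seconds, a little over six minutes.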

Here's the problem: when a change is committed to an index (we're skipping
the master/slave issue for now), any autowarming takes, say, time T. If you
commit more often than once every T, the *first* autowarm process isn't done
yet when the *second* one starts. And if you keep committing pathologically
quickly, you start a death spiral.
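A toy simulation makes the death spiral concrete. It assumes, purely for illustration, that a commit arrives every `commit_every` ticks, that each new searcher needs `warm_work` units of CPU to warm, and that the node does one unit of work per tick, shared evenly among all searchers warming at once. When warming fits between commits, at most one searcher is ever warming; when it doesn't, warming searchers pile up without bound. Real Solr caps this with the `maxWarmingSearchers` setting in solrconfig.xml, which refuses to open further searchers once the limit is reached, so commits start failing instead.

```python
from typing import List


def simulate_warming(commit_every: int, warm_work: int, ticks: int) -> List[int]:
    """Number of searchers still warming at the end of each tick (toy model)."""
    remaining: List[float] = []  # outstanding warm-up work per warming searcher
    history: List[int] = []
    for t in range(ticks):
        if t % commit_every == 0:
            remaining.append(float(warm_work))  # a commit opens a new searcher
        if remaining:
            share = 1.0 / len(remaining)  # one unit of CPU split evenly
            remaining = [r - share for r in remaining]
            remaining = [r for r in remaining if r > 1e-9]  # finished warming
        history.append(len(remaining))
    return history
```

With commits every 10 ticks and 4 units of warming, `max(simulate_warming(10, 4, 50))` is 1. With commits every 2 ticks, `simulate_warming(2, 4, 50)[-1]` leaves more than ten searchers still warming after 50 ticks, and the count keeps growing.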

So batching vs. not batching is less of a problem than the death spiral.
Batched changes are more efficient, but that speedup is probably less
noticeable than the propagation delays.
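Batching in this sense means grouping documents into fewer, larger update requests; when to commit is a separate decision. A minimal sketch, assuming a hypothetical `send` callback that would post one update request to Solr:

```python
from typing import Callable, Dict, List

Doc = Dict[str, object]


class UpdateBatcher:
    """Collects documents and hands them to `send` in groups of `batch_size`.

    `send` is a hypothetical callback standing in for one Solr update request.
    Committing is deliberately left to a separate policy.
    """

    def __init__(self, send: Callable[[List[Doc]], None], batch_size: int = 100):
        if batch_size < 1:
            raise ValueError("batch_size must be >= 1")
        self._send = send
        self._batch_size = batch_size
        self._pending: List[Doc] = []

    def add(self, doc: Doc) -> None:
        self._pending.append(doc)
        if len(self._pending) >= self._batch_size:
            self.flush()

    def flush(self) -> None:
        """Send whatever is pending, e.g. on shutdown or before a commit."""
        if self._pending:
            batch, self._pending = self._pending, []
            self._send(batch)
```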

All that said, it's not unreasonable to expect, say, a 5-minute delay
between the changes and when they're reflected in new searches. So I'd start
with some reasonable number, monitor the warmup times, and reduce the commit
interval as appropriate...
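One way to act on "start with a reasonable number and reduce it" is a small heuristic like the one below. The function name, `safety_factor`, the halving step and `floor_s` are illustrative assumptions, not Solr settings; the warmup samples would come from whatever monitoring you already have (for example, Solr's admin statistics report cache `warmupTime` values).

```python
from typing import Sequence


def next_commit_interval(current_s: float,
                         recent_warmups_s: Sequence[float],
                         safety_factor: float = 3.0,
                         floor_s: float = 1.0) -> float:
    """Suggest the next commit interval from observed warmup times (heuristic).

    Keeps the interval at least `safety_factor` times the slowest recent
    warmup so warmups should not overlap, backs off immediately if warmups
    grow, and otherwise shrinks gradually (at most halving per step).
    """
    if not recent_warmups_s:
        return current_s  # no data yet: keep the starting value
    target = max(floor_s, safety_factor * max(recent_warmups_s))
    if target > current_s:
        return target  # warmups got slower: commit less often
    return max(target, current_s / 2)  # reduce gradually toward the target
```

Starting from 300 seconds with recent warmups of 10 and 20 seconds, it suggests 150 seconds, then 75, then settles at 60 (three times the 20-second warmup). If warmups later grow to 40 seconds, it backs off to 120.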

NOTE: if you have a master/slave setup and your master isn't used for
searching, you can control this with the polling interval on the slave and
commit more frequently on the master, since it doesn't need to warm
searchers.

Finally, there is work being done on NRT (Near Real Time) searching that may
be of interest to you; search for NRT in JIRA.

Best
Erick


On Fri, Feb 11, 2011 at 10:22 AM, Greg Georges <[hidden email]> wrote:


Re: Solr design decisions

Billnbell
In reply to this post by Greg Georges
You could commit on a time schedule, like every 5 minutes. If there is nothing to commit, it doesn't do anything anyway.
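A scheduled commit can at least skip intervals in which nothing changed, which matters because, as Yonik points out in this thread, a commit opens a new searcher and invalidates caches even when there is nothing new. A sketch assuming a hypothetical `commit` callback (standing in for sending a commit to Solr) and a caller-supplied clock, so it can be driven by any timer:

```python
from typing import Callable


class ScheduledCommitter:
    """Commits at most once per `interval_s`, and only when something changed.

    `commit` is a hypothetical callback that would send a commit to Solr.
    """

    def __init__(self, commit: Callable[[], None], interval_s: float = 300.0):
        self._commit = commit
        self._interval_s = interval_s
        self._dirty = False
        self._last_commit_s = 0.0

    def mark_dirty(self) -> None:
        """Call after sending any add or delete to Solr."""
        self._dirty = True

    def tick(self, now_s: float) -> bool:
        """Call periodically from a timer; returns True if it committed."""
        if not self._dirty:
            return False  # nothing changed: avoid opening a new searcher
        if now_s - self._last_commit_s < self._interval_s:
            return False  # too soon since the last commit
        self._dirty = False
        self._last_commit_s = now_s
        self._commit()
        return True
```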

Bill Bell
Sent from mobile


On Feb 11, 2011, at 8:22 AM, Greg Georges <[hidden email]> wrote:


Re: Solr design decisions

Yonik Seeley-2-2
On Fri, Feb 11, 2011 at 10:47 AM, Bill Bell <[hidden email]> wrote:
> You could commit on a time schedule. Like every 5 mins. If there is nothing to commit it doesn't do anything anyway.

It does do something!  A new searcher is opened and caches are invalidated, etc.
I'd normally recommend using commitWithin instead of committing explicitly
or using autocommit.
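With commitWithin, the client states how soon its changes must become visible and Solr schedules the commit itself, so updates arriving within the same window can share one commit (and one new searcher). In Solr 1.4's XML update format, commitWithin is an attribute, in milliseconds, on the `<add>` element. The sketch below only builds that message; the `acl_principals` permission field is a hypothetical example, and the resulting XML would be POSTed to the core's `/update` handler with a `text/xml` content type.

```python
import xml.etree.ElementTree as ET
from typing import Dict, Iterable, List, Union

FieldValue = Union[str, List[str]]


def build_add_xml(docs: Iterable[Dict[str, FieldValue]], commit_within_ms: int) -> str:
    """Build a Solr XML <add> message carrying a commitWithin attribute."""
    if commit_within_ms <= 0:
        raise ValueError("commit_within_ms must be positive")
    add = ET.Element("add", commitWithin=str(commit_within_ms))
    for doc in docs:
        doc_el = ET.SubElement(add, "doc")
        for name, value in doc.items():
            values = value if isinstance(value, list) else [value]
            for v in values:  # multi-valued fields repeat the <field> element
                field_el = ET.SubElement(doc_el, "field", name=name)
                field_el.text = v
    return ET.tostring(add, encoding="unicode")
```

For a permission change, the document would be re-sent with its new `acl_principals` values, e.g. `build_add_xml([{"id": "doc-1", "acl_principals": ["group:editors", "client:acme"]}], 300000)` for a 5-minute window.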

-Yonik
http://lucidimagination.com

Re: Solr design decisions

Billnbell
Thanks. If you issue two commits back to back, should the second one do anything? Are people using it to clear caches?



Bill Bell
Sent from mobile


On Feb 11, 2011, at 9:55 AM, Yonik Seeley <[hidden email]> wrote:

> On Fri, Feb 11, 2011 at 10:47 AM, Bill Bell <[hidden email]> wrote:
>> You could commit on a time schedule. Like every 5 mins. If there is nothing to commit it doesn't do anything anyway.
>
> It does do something!  A new searcher is opened and caches are invalidated, etc.
> I'd recommend normally using commitWithin instead of explicitly
> committing or using autocommit.
>
> -Yonik
> http://lucidimagination.com