Solr Read-Only?


Solr Read-Only?

Terry Steichen
Is it possible to run solr in a read-only directory?

I'm running it just fine on an Ubuntu server which is accessible only
through SSH tunneling.  At the platform level, this is fine: only
authorized users can access it (via a browser on their machine accessing
a forwarded port). 

The problem is that it's an all-or-nothing situation, so everyone who's
authorized to access the platform has, in effect, administrator
privileges on solr.  I understand that authentication is coming, but
that it isn't here yet.  (Or, to add complexity, I had to downgrade from
7.2.1 to 6.4.2 to overcome a new bug concerning indexing of eml files,
and 6.4.2 definitely doesn't have authentication.)

Anyway, what I was wondering is if it might be possible to run solr not
as me (the administrator), but as a user with lesser privileges so that
no one who came through the SSH tunnel could (inadvertently or
otherwise) screw up the indexes.

Terry


Re: Solr Read-Only?

Christopher Schultz

Terry,

On 3/6/18 4:08 PM, Terry Steichen wrote:

> Is it possible to run solr in a read-only directory?
>
> I'm running it just fine on a ubuntu server which is accessible
> only through SSH tunneling.  At the platform level, this is fine:
> only authorized users can access it (via a browser on their machine
> accessing a forwarded port).
>
> The problem is that it's an all-or-nothing situation so everyone
> who's authorized access to the platform has, in effect,
> administrator privileges on solr.  I understand that authentication
> is coming, but that it isn't here yet.  (Or, to add complexity, I
> had to downgrade from 7.2.1 to 6.4.2 to overcome a new bug
> concerning indexing of eml files, and 6.4.2 definitely doesn't have
> authentication.)
>
> Anyway, what I was wondering is if it might be possible to run solr
> not as me (the administrator), but as a user with lesser privileges
> so that no one who came through the SSH tunnel could (inadvertently
> or otherwise) screw up the indexes.

With shell access, the only protection you could provide would be
through file-permissions. But of course Solr will need to be
read-write in order to build the index in the first place. So you'd
probably have to run read-write at first, build the index (perhaps
that's already been done in the past), then (possibly) restart in
read-only mode.

Read-only can be achieved by simply revoking write-access to the data
directories from the euid of the Solr process. Theoretically, you
could switch from being read-write to read-only merely by changing
file-permissions... no Solr restarts required.
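
As a rough illustration of the permission change described above, here is a minimal Python sketch that clears the write bits on a directory tree. The function name `revoke_write` is made up for this example, and the sketch assumes the script is run by the owner of the files (or by root). Note that if the Solr account itself owns the files it could in principle restore the bits, so having another account own the tree is the stronger arrangement.

```python
import os
import stat

# All three write bits: owner, group, other.
WRITE_BITS = stat.S_IWUSR | stat.S_IWGRP | stat.S_IWOTH


def revoke_write(root):
    """Clear every write bit on root and on everything beneath it.

    Returns the number of paths whose mode was changed.
    (Hypothetical helper, not part of Solr.)
    """
    changed = 0
    # Walk bottom-up so files are handled before their parent directory.
    for dirpath, _dirnames, filenames in os.walk(root, topdown=False):
        paths = [os.path.join(dirpath, name) for name in filenames]
        paths.append(dirpath)
        for path in paths:
            # os.chmod follows symlinks, so leave links alone.
            if os.path.islink(path):
                continue
            mode = stat.S_IMODE(os.stat(path).st_mode)
            if mode & WRITE_BITS:
                os.chmod(path, mode & ~WRITE_BITS)
                changed += 1
    return changed
```

Calling `revoke_write` on a copy of the data directory first is a cheap way to see how the running instance reacts before touching the real index.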

I'm not sure if it matters to you very much, but a user can still do
some damage to the index even if the "server" is read-only (through
file-permissions): they can issue a batch of DELETE or ADD requests
that will affect the in-memory copies of the index. It might be
temporary, but it might require that you restart the Solr instance to
get back to a sane state.

Hope that helps,
-chris

Re: Solr Read-Only?

Terry Steichen
Chris,

Thanks for your suggestion.  Restarting solr after an in-memory
corruption is, of course, trivial (compared to rebuilding the indexes).

Are there any solr directories that MUST be read/write (even with a
pre-built index)?  Would it suffice (for my purposes) to make only the
data/index directory R-O?

Terry



Re: Solr Read-Only?

Christopher Schultz

Terry,

On 3/6/18 4:55 PM, Terry Steichen wrote:
> Chris,
>
> Thanks for your suggestion.  Restarting solr after an in-memory
> corruption is, of course, trivial (compared to rebuilding the
> indexes).
>
> Are there any solr directories that MUST be read/write (even with
> a pre-built index)?  Would it suffice (for my purposes) to make
> only the data/index directory R-O?

I installed Solr for the first time 2 weeks ago, so I'm not a great
resource, here. But I've used Lucene in the past and the on-disk
storage is basically the same AFAICT.

When starting with an expand-the-tarball-and-just-go-for-it deployment
model, I'd probably make sure that the server/solr directory and
everything below it was non-writable by the Solr-user.

Obviously, once you have set this up in a test lab, just try to break
it and see what happens :)
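
One way to start that kind of test is to list which directories are still writable after the permissions have been changed. The sketch below is only an assumption about how one might check: `writable_dirs` is a made-up helper, and it should be run under the same account that Solr runs as, since `os.access` reports on the real uid/gid of the Python process (and will report everything as writable when run as root).

```python
import os


def writable_dirs(root):
    """Return the directories under root (root included) that the
    current user is allowed to write to. (Hypothetical helper.)
    """
    found = []
    for dirpath, _dirnames, _filenames in os.walk(root):
        # os.access checks against the real uid/gid of this process.
        if os.access(dirpath, os.W_OK):
            found.append(dirpath)
    return sorted(found)
```

An empty result for the directory tree in question would suggest the lock-down took effect for that account; whether Solr then starts and serves queries still has to be tried.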

-chris


Re: Solr Read-Only?

Emir Arnautović
In reply to this post by Terry Steichen
Hi Terry,
Maybe you can try an alternative approach, such as putting a proxy in front of Solr and configuring it to allow only certain URLs. Another option is to define a custom update request processor chain that does not include RunUpdateProcessorFactory; that will prevent accidental index updates.
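
To make the proxy idea concrete, the check such a proxy performs could look roughly like the Python sketch below. Everything in it is an assumption for illustration: the `is_allowed` name, the choice to permit only `GET`/`HEAD`, and the `/solr/<core>/select` path layout. It does not inspect query parameters, so it is a starting point rather than a complete policy.

```python
import re

# Hypothetical policy: only read-style HTTP methods are forwarded ...
ALLOWED_METHODS = {"GET", "HEAD"}
# ... and only to the select handler of a core, e.g. /solr/mail/select.
ALLOWED_PATH = re.compile(r"/solr/[A-Za-z0-9_-]+/select")


def is_allowed(method, path):
    """Return True if a request with this method and path
    (query string already stripped) may be forwarded to Solr.
    """
    if method.upper() not in ALLOWED_METHODS:
        return False
    # fullmatch rejects anything before or after the expected path.
    return ALLOWED_PATH.fullmatch(path) is not None
```

With this policy, `is_allowed("GET", "/solr/mail/select")` passes while requests to `/solr/mail/update` or `/solr/admin/cores` are rejected regardless of method.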

HTH,
Emir
--
Monitoring - Log Management - Alerting - Anomaly Detection
Solr & Elasticsearch Consulting Support Training - http://sematext.com/




RE: Solr Read-Only?

Phil Scadden
I would also second the proxy approach. Besides keeping your Solr instance behind a firewall and not directly exposed, you can do a lot in a proxy: per-user control over which index they can access, filtering of queries, etc.
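
The per-user control mentioned here might be sketched as follows. The user names, core names, and the `may_query` helper are all invented for the example, and the sketch assumes the proxy has already authenticated the user by some other means.

```python
# Hypothetical mapping from authenticated user name to the cores
# (indexes) that user is allowed to query.
USER_CORES = {
    "alice": {"mail", "reports"},
    "bob": {"mail"},
}


def core_from_path(path):
    """Extract the core name from a path such as /solr/mail/select."""
    parts = path.strip("/").split("/")
    if len(parts) >= 2 and parts[0] == "solr":
        return parts[1]
    return None


def may_query(user, path, user_cores=USER_CORES):
    """Return True if the user may send a request to the core named in path."""
    core = core_from_path(path)
    # Unknown users get an empty set, so they are denied everywhere.
    return core is not None and core in user_cores.get(user, set())
```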


Re: Solr Read-Only?

Shawn Heisey-2
In reply to this post by Terry Steichen
On 3/6/2018 2:08 PM, Terry Steichen wrote:
> Is it possible to run solr in a read-only directory?

Solr can be installed as a service on most operating systems other than
Windows.  A service installer script comes with the download.  It is
installed to run as an unprivileged user, "solr" by default.

The program directory (defaulting to /opt/solr-X.Y.Z, with a symlink at
/opt/solr pointing to the real directory) gets set up so it is owned by
root, so that directory *is* effectively read-only.

The "var dir" defaults to /var/solr and is fully writable by the solr
user.  The solr home defaults to /var/solr/data.

If you want the solr home to be read only, then you will need to turn
off all index locking in your solrconfig.xml files.  When locking is
enabled, which it is by default, Lucene *will* write to the index
directory at startup, and the index will fail to start if it's not able
to make that write.  On startup, it writes to a lockfile, not the index
itself.

https://lucene.apache.org/solr/guide/7_2/indexconfig-in-solrconfig.html#index-locks

Looks like the lockType "none" is not in the documentation, but I'm
pretty sure it's a value you can use.
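
Before changing anything, it can help to see what a given solrconfig.xml currently declares. The Python sketch below reads the lockType element; it assumes the usual `<config><indexConfig><lockType>` layout, and the `declared_lock_type` name is made up for this example. If the file uses property substitution, the raw placeholder text is returned unchanged.

```python
import xml.etree.ElementTree as ET


def declared_lock_type(solrconfig_xml):
    """Return the text of <indexConfig>/<lockType> from the given
    solrconfig.xml content, or None if it is not declared.
    (Hypothetical helper, not part of Solr.)
    """
    root = ET.fromstring(solrconfig_xml)
    node = root.find("./indexConfig/lockType")
    if node is None or node.text is None:
        return None
    return node.text.strip()
```

A result of None means the file leaves the lock type to Solr's default rather than setting it explicitly.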

I would strongly recommend *NOT* making the solr home read only,
*especially* if you're running in SolrCloud mode.

> The problem is that it's an all-or-nothing situation so everyone who's
> authorized access to the platform has, in effect, administrator
> privileges on solr.  I understand that authentication is coming, but
> that it isn't here yet.  (Or, to add complexity, I had to downgrade from
> 7.2.1 to 6.4.2 to overcome a new bug concerning indexing of eml files,
> and 6.4.2 definitely doesn't have authentication.)

Solr has authentication, and has had for a very long time.  Basic
authentication required SolrCloud when it became a workable feature in
5.3.  If you're running standalone mode instead of SolrCloud, then you
need version 6.5.0 to use the authentication plugin.  Is this what you
mean when you say that 6.4.2 doesn't have authentication?  One option
that you DO have with 6.4.2 (and a number of other earlier versions) is
to configure authentication with Kerberos.  But this is a lot more
involved than basic authentication.

If you are using Tika to index those emails, then you should not be
running Tika within Solr.  Eventually Tika is probably going to crash
when trying to read a document with a layout the authors have never seen
before, and when that happens, it'll take any other software (like Solr)
running in the same process down with it.

> Anyway, what I was wondering is if it might be possible to run solr not
> as me (the administrator), but as a user with lesser privileges so that
> no one who came through the SSH tunnel could (inadvertently or
> otherwise) screw up the indexes.

As of version 6.3, Solr will refuse to start if it's run as root,
without a special option to force it.  So this is already there.

https://issues.apache.org/jira/browse/SOLR-9547

I would definitely recommend installing the service so there is a
dedicated unprivileged user account for Solr.

Thanks,
Shawn