Maintenance of Solr's official Dockerfile

Jan Høydahl / Cominvent
Hi,

The Lucene project has been asked to take over maintenance of the official Solr Dockerfile that ends up on Docker Hub (located in https://github.com/docker-solr/docker-solr). We have received a Software Grant from the current maintainer, Martijn Koster, who has done a fantastic job maintaining it together with a few committers.

I think it makes a lot of sense for the project to more tightly support Docker and ensure a good experience running Solr on Docker.

This email thread is to discuss what that may look like and how we should transition the current code into the project.

As a first step we invite all committers and contributors who use Docker to get involved: check out the current docker-solr git repo, try building the images, submit PRs etc. I have started doing this myself and have submitted a few PRs.

Next step would be to agree on how we bring the current code into our project and ASF repos in the best possible way. Questions that arise are:

1. Are we allowed to maintain ASF code in a non-ASF repo? If not, how do we transition to an ASF git repo?
    * Can it be a sub folder in our main repo or does it need to be a separate repo?
2. How will the current build/test/publish process need to change?
    * Can we continue using Travis for CI?
    * Do we need to talk to Docker folks to change repo location?
    * Should publishing new Docker images be an RM responsibility, or something that happens right after each release, like the ref-guide?
3. Legal stuff - when we as a project file a PR to update the official Solr Docker images, are we then legally releasing a binary version of Solr?
    Technically it is Docker CI that builds and publishes the images, we just initiate it…
    Do we know of any other ASF project that maintains its own official Docker image?
4. Practical things - change README, NOTICE, header files, wording etc.

I have opened https://issues.apache.org/jira/browse/SOLR-14168 as an umbrella issue for tasks that spin out from this email thread discussion.

Jan Høydahl

Re: Maintenance of Solr's official Dockerfile

Marcus Eagan
Hi Jan,

Thanks for the update, and thanks to Martijn for the donation! :)

I think that regardless of what the community decides to do with the docker-solr repo, a good first step would be to add a Docker folder to the Apache repository that contains a base Dockerfile and a README. In that README, users can be directed to the location of the docker-solr repo, wherever that may be, or they can use the Dockerfile in the Apache repo as a starting point for building their own image.

Two cents,

Marcus




On Sun, Jan 5, 2020 at 3:52 PM Jan Høydahl <[hidden email]> wrote:
[...]

Re: Maintenance of Solr's official Dockerfile

Martijn Koster
[pardon me breaking the email threading here; only just joined]

Jan wrote:

Next step would be to agree on how we bring the current code into our project and ASF repos
in the best possible way. Questions that arise are:

1. Are we allowed to maintain ASF code in a non-ASF repo? If not, how do we transition to
an ASF git repo?
    * Can it be a sub folder in our main repo or does it need to be a separate repo?

The way it works (from the official library’s point of view) is that we maintain https://github.com/docker-library/official-images/blob/master/library/solr which contains a link to a repo (in our case https://github.com/docker-solr/docker-solr.git), a particular git commit, and a particular directory for each version. That is consumed by their build infrastructure. The library team reviews changes we make to that file, and the corresponding changes we make to the Dockerfiles and bash scripts in the docker-solr repo, so it needs to be readily available and it needs to be easy to see what has changed.
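
To make the shape of that library file concrete, here is a minimal Python sketch that parses an entry of this kind. The sample text is hypothetical: the maintainer, tags, commit hash, and directories are made up, and the field names (GitRepo, Tags, GitCommit, Directory) are an assumption about the file format rather than a copy of the real library/solr file.

```python
# Hypothetical excerpt in the style of an official-images library file.
# Maintainer, tags, commit hash, and directories are invented for illustration.
SAMPLE_MANIFEST = """\
Maintainers: Example Maintainer <maintainer@example.org> (@example)
GitRepo: https://github.com/docker-solr/docker-solr.git

Tags: 8.4.0, 8.4, 8, latest
GitCommit: 0123456789abcdef0123456789abcdef01234567
Directory: 8.4

Tags: 8.4.0-slim, 8.4-slim, 8-slim, slim
GitCommit: 0123456789abcdef0123456789abcdef01234567
Directory: 8.4/slim
"""


def parse_manifest(text):
    """Split blank-line-separated blocks of 'Key: value' lines into dicts.

    The first block is treated as the global header, the rest as image entries.
    """
    blocks = []
    for chunk in text.strip().split("\n\n"):
        entry = {}
        for line in chunk.splitlines():
            # partition() splits on the first colon only, so URLs stay intact.
            key, _, value = line.partition(":")
            entry[key.strip()] = value.strip()
        blocks.append(entry)
    return blocks[0], blocks[1:]


header, entries = parse_manifest(SAMPLE_MANIFEST)
for entry in entries:
    first_tag = entry["Tags"].split(",")[0].strip()
    print(first_tag, "is built from", header["GitRepo"],
          "at commit", entry["GitCommit"][:7], "in directory", entry["Directory"])
```

The point of the sketch is only that each image version is pinned to a repo, a commit, and a directory, which is why reviewers need the referenced repo to be readily available and its history easy to read.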

I think one could theoretically move this into the main Solr repo and point to its GitHub address, but that would make things slower and much harder to review. So I think it’s much better to keep the separate repo. I briefly looked for some official guidance on this, but couldn’t find it spelled out explicitly. I did see https://github.com/docker-library/official-images#maintainership which talks about maintaining git history.
Note also that I already use a “docker-solr” GitHub org for the repo, rather than my own account, to make it easier to vary ownership.

If you are dead set on putting it into the main repo, I’d run that discussion past the library team first before sinking engineering time into it.


2. How will the current build/test/publish process need to change?
    * Can we continue using travis for CI?

In the short term, sure.

Travis has been great for us — it is free, it builds fast enough, the UI is nice, the config is simple, the integration is good, and support was helpful.
Last year Travis CI got acquired, followed by layoffs of senior engineering staff, so there are concerns about its future, but nothing has really changed to affect us.

I imagine it would be nicer to have it in the normal Apache Jenkins world, but I’m not volunteering for that migration. :-)

If we want to stay on Travis, there may be some configuration changes required (roles/permissions/credentials and such that are tied to my account).

Oh and just to make it clear: the CI does 2 things:
- it sets build status on GitHub commits (although there is currently no enforcement to allow only passing PRs to be merged or things like that, or have review/automerge workflows which would be nice to have)
- and it pushes builds to the https://hub.docker.com/repository/docker/dockersolr/docker-solr repo — but those are only used for testing, they are not the docker images that provide the official images. I've found that occasionally useful, but we could decide to not do that, or do it differently within the Apache infrastructure.


    * Do we need to talk to Docker folks to change repo location?

If we keep the same repo, then no :-) If the repo were to change, we’d update that library/solr file, and send a PR.


    * Should publishing of new Docker be a RM responsibility, or something that happens right
after each release like the ref-guide?

I don’t have a strong opinion. I typically tried to do it as soon as I became aware of a new version via the solr-user mailing list or Twitter.
Sometimes same day, sometimes it would take a week because of changes I needed to make or extra things I wanted to do.
But if I’m more than a few days late someone would be asking about it :-)
The official library team review is usually very fast, same day or 24h.


3. Legal stuff - when we as a project file a PR to update the official solr docker images,
are we then legally releasing a binary version of Solr?
    Technically it is Docker CI that build and publish the images, we just initiate it…

I don’t know about that (or how that matters?)


    Do we know any other ASF project that maintain their own official docker image?

I've looked at https://github.com/docker-library/official-images/tree/master/library and spotted https://github.com/carlossg/docker-maven which is maintained by an Apache committer.


4. Practical things - change README, NOTICE, header files, wording etc

There is also https://github.com/docker-library/docs/tree/master/solr — the individual markdown files there are used to generate https://hub.docker.com/_/solr 



Marcus wrote:

I think that regardless of what the community decides to do with the
docker-solr repo, a good first step would be to add a Docker folder to the
Apache repository that contains a base Dockerfile and a README. In that
README, users can be directed to the location of the docker-solr repo,
wherever that may be, or leverage the Dockerfile in the  Apache repo as a
starting point for building their own image.

I think that could be useful, but it then starts to become messy almost immediately: users will expect these self-built images and the official images to work the same, and given that docker-solr has various extra scripts (e.g. to create collections at startup), you’d then have to copy them into the repo (and now have duplicate maintenance, and need to test them). Or you could explicitly decide not to do that, but then your users will be asking how to achieve the same functionality with their images.

I would address this as a separate issue. Let’s get the existing image flow taken care of first.

— Martijn

Re: Maintenance of Solr's official Dockerfile

Jan Høydahl / Cominvent
1. Are we allowed to maintain ASF code in a non-ASF repo? If not, how do we transition to
an ASF git repo?
    * Can it be a sub folder in our main repo or does it need to be a separate repo?

The way it works (from the official library’s point of view), is that we maintain https://github.com/docker-library/official-images/blob/master/library/solr which contains a link to a repo (in our case https://github.com/docker-solr/docker-solr.git) and particular git commit, and a particular directory for different versions. That is consumed by their build infrastructure. The library team reviews changes we make to that file, and the corresponding changes we made to the Dockerfiles and bash scripts in the docker-solr repo, so it needs to be readily available and it needs to be easy to see what has changed.

I think one could theoretically move this into the main Solr repo and point to its GitHub address, but that would make things slower and much harder to review. So I think it’s much better to keep the separate repo. I briefly looked for some official guidance on this, but couldn’t find it spelled out explicitly. I did see https://github.com/docker-library/official-images#maintainership which talks about maintaining git history.
Note also that I already use a “docker-solr” GitHub org for the repo, rather than my own account, to make it easier to vary ownership.

If you are dead-set to put it into the main repo, I’d run that discussion past the library team first before sinking engineering time.

I just discovered https://hub.docker.com/u/apache - which is Apache’s own Docker org. I see some images there are hosted in separate Apache git repos, for example CouchDB: https://github.com/apache/couchdb-docker pushed to https://hub.docker.com/r/apache/couchdb - and https://hub.docker.com/_/couchdb (official). The source of both hub locations seems to be the same apache/couchdb-docker git repo. I see that the person who files PRs against the official image repo is Joan Touzet (http://people.apache.org/~wohali/), who is a CouchDB committer. Perhaps this is a model for us to follow.

We may also want to consult LEGAL-503 where the Beam project asked a similar question a few weeks ago, and the reply is:

if you would like to continue linking to the Docker release artifact from the https://beam.apache.org you will have:
1. Transition to the official ASF dockerhub org: https://hub.docker.com/u/apache
2. Start including that binary convenience artifact into your VOTE threads on Beam releases
3. Make sure that all Cat-X licenses are ONLY brought into your container via FROM statements

So bullet point #1 there answers this question. Regarding points #2 and #3, see below.

2. How will the current build/test/publish process need to change?
    * Can we continue using travis for CI?

In the short term, sure.

Travis has been great for us — it is free, it builds fast enough, the UI is nice, the config is simple, the integration is good, and support was helpful.
Last year Travis CI got acquired, followed by layoffs of senior engineering staff, so there are concerns about its future, but nothing has really changed to affect us.

I imagine it would be nicer to have it in the normal Apache Jenkins world, but I’m not volunteering for that migration. :-)

If we want to stay on Travis, there may be some configuration changes required (roles/permissions/credentials and such that are tied to my account).

Oh and just to make it clear: the CI does 2 things:
- it sets build status on GitHub commits (although there is currently no enforcement to allow only passing PRs to be merged or things like that, or have review/automerge workflows which would be nice to have)
- and it pushes builds to the https://hub.docker.com/repository/docker/dockersolr/docker-solr repo — but those are only used for testing, they are not the docker images that provide the official images. I've found that occasionally useful, but we could decide to not do that, or do it differently within the Apache infrastructure.

So I see other ASF projects using Travis as well; perhaps the ASF has an account/license? Whether we continue to use it or migrate to Jenkins, we need to run the build and tests and then push builds to the Apache Docker Hub repository space (making the image pullable with docker pull apache/solr:tag).
Actually producing the official image will still be a separate PR to the Docker-owned official-images repo.

    * Should publishing of new Docker be a RM responsibility, or something that happens right
after each release like the ref-guide?

I don’t have a strong opinion. I typically tried to do it as soon as I became aware of a new version via the solr-user mailing list or twitter.
Sometimes same day, sometimes it would take a week because of changes I need to make or extra things I wanted to do.
But if I’m more than a few days late someone would be asking about it :-)
The official library team review is usually very fast, same day or 24h.

See point #2 from LEGAL-503 above. If we want to officially document / endorse / link to the image on Docker Hub, we may want to include the Docker image in the VOTE. I see that the Beam project includes this in their release guide (publishing SDK images): https://beam.apache.org/contribute/release-guide/. What they do is push an RC-tagged version to their Docker Hub repo as part of the release and include it in the VOTE.
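
As a rough illustration of what an RC-tagged image could look like for Solr, here is a small Python sketch that builds image references. The "-rcN" tag suffix and the apache/solr repository name are assumptions made for the example; they are not Beam's actual scheme, nor anything the project has agreed on.

```python
def image_ref(version, rc=None, org="apache", name="solr"):
    """Build a Docker image reference such as 'apache/solr:8.5.0-rc1'.

    The '-rcN' suffix is a hypothetical naming convention for release
    candidates; the final release drops the suffix.
    """
    tag = version if rc is None else f"{version}-rc{rc}"
    return f"{org}/{name}:{tag}"


# The RC image is what a VOTE email could point to; the final tag would be
# published only after the vote passes.
print(image_ref("8.5.0", rc=1))
print(image_ref("8.5.0"))
```

With a scheme like this, the release manager would push the RC tag during the release process, and the same image could be re-tagged as the final version once the vote passes.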

3. Legal stuff - when we as a project file a PR to update the official solr docker images,
are we then legally releasing a binary version of Solr?
    Technically it is Docker CI that build and publish the images, we just initiate it…

I don’t know about that (or how that matters?)

Oh, legal stuff matters a lot for Apache :) Again, I think LEGAL-503 answers this. Bullet #3 there requires the project to make sure that our Dockerfile does not bring Cat-X licensed software into the Docker layers built by us. Since we base our image on the ‘openjdk’ base image, which contains GNU/Linux binaries and the JDK, the only thing we'd need to verify is what we bring into our Docker layers through apt-get, wget etc. Below is a list of what I found:

acl - GPL - provides tool setfacl, used only in tests, can be removed?
dirmngr, gpg - GPL - used only during the docker build phase, can be installed with apt and uninstalled in the same RUN command
lsof - BSD license
procps - GPL - provides the ‘ps’ command needed by bin/solr. This is part of openjdk:11 but not openjdk:11-slim...
wget - GPL - used during build only, can be uninstalled after use
netcat - PublicDomain
gosu - GPL - can be removed or replaced with su-exec (MIT)
tini - MIT
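
A check like this could be automated. The following Python sketch takes the package/license pairs exactly as listed above and flags the ones that fall under Category X, on the assumption that "GPL" is the only Cat-X license in this list. The dictionary is hand-copied from the list, not detected from a real image.

```python
# Licenses as stated in the list above (hand-copied, not auto-detected).
PACKAGE_LICENSES = {
    "acl": "GPL",
    "dirmngr": "GPL",
    "gpg": "GPL",
    "lsof": "BSD",
    "procps": "GPL",
    "wget": "GPL",
    "netcat": "PublicDomain",
    "gosu": "GPL",
    "tini": "MIT",
}

# Assumption for this sketch: only GPL counts as Category X here.
CATEGORY_X = {"GPL"}


def needs_review(package_licenses):
    """Return the sorted names of packages whose license is in CATEGORY_X."""
    return sorted(
        name for name, license_name in package_licenses.items()
        if license_name in CATEGORY_X
    )


print(needs_review(PACKAGE_LICENSES))
# -> ['acl', 'dirmngr', 'gosu', 'gpg', 'procps', 'wget']
```

Each flagged package would then need one of the remedies mentioned in the list: remove it, replace it, uninstall it within the same RUN command, or rely on the base image providing it.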

    Do we know any other ASF project that maintain their own official docker image?

I've looked at https://github.com/docker-library/official-images/tree/master/library and spotted https://github.com/carlossg/docker-maven which is maintained by an Apache committer.

So CouchDB is another example. And there are so many other projects in Apache’s Docker Hub org that I suppose there may be others.

Marcus wrote:

I think that regardless of what the community decides to do with the
docker-solr repo, a good first step would be to add a Docker folder to the
Apache repository that contains a base Dockerfile and a README. In that
README, users can be directed to the location of the docker-solr repo,
wherever that may be, or leverage the Dockerfile in the  Apache repo as a
starting point for building their own image.

I think that could be useful; but it then does start to become messy almost immediately: Users will expect these self-built images and the official images to work the same, and given that docker-solr has various extra scripts (eg to create collections at startup), you’d then have to copy them into the repo (and now have duplicate maintenance, need to test them). Or you could explicitly decide not to do that, but then your users will be asking how to achieve the same functionality with their images.

I would address this as a separate issue. Let’s get the existing image flow taken care of first.

Yes, it should be easy to build a docker image «from source», or at least as a gradle build task. That could piggy-back on the distro tgz file which should make it not too different - we just pull the release from local disk instead of from the mirrors. 
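
To sketch what "pull the release from local disk" might mean in practice, here is a Python function that renders a minimal Dockerfile around a locally built tgz. Everything here is an assumption for illustration: the file name, the /opt layout, and the expectation that the archive unpacks into a directory named after the tgz. It also leaves out the user setup and helper scripts that the docker-solr images provide, so it is not a drop-in replacement.

```python
def render_dockerfile(tgz_name, base_image="openjdk:11"):
    """Render a minimal Dockerfile that unpacks a local Solr tgz.

    Assumes (hypothetically) that the archive contains a single top-level
    directory with the same name as the tgz minus its extension.
    """
    if not tgz_name.endswith(".tgz"):
        raise ValueError("expected a .tgz distribution archive")
    solr_dir = tgz_name[: -len(".tgz")]
    lines = [
        f"FROM {base_image}",
        "# ADD unpacks a local tar archive into the target directory,",
        "# so no wget or signature download step is needed at build time.",
        f"ADD {tgz_name} /opt/",
        f"RUN ln -s /opt/{solr_dir} /opt/solr",
    ]
    return "\n".join(lines) + "\n"


print(render_dockerfile("solr-8.4.0.tgz"))
```

A Gradle task could write this text next to the freshly built archive and then invoke docker build in that directory.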

I also saw some projects that have Jenkins routinely publish SNAPSHOT releases to docker-hub, see e.g. https://hub.docker.com/r/apache/syncope/tags which is also nice if we want to have people test out things with unreleased versions or master branch, then it is always only a docker run command away :) 

Well, I hope other committers also join this discussion and bring perhaps other points of view here before we start fleshing out actual JIRA tasks to add to https://issues.apache.org/jira/browse/SOLR-14168.

If we end up releasing official Solr Docker images together with the normal release, it would be cool to add documentation to the RefGuide, and perhaps the tutorial, on how to run Solr with Docker.

Jan



Re: Maintenance of Solr's official Dockerfile

david.w.smiley@gmail.com
> Yes, it should be easy to build a docker image «from source», or at least as a gradle build task. That could piggy-back on the distro tgz file which should make it not too different - we just pull the release from local disk instead of from the mirrors. 

We do this at Salesforce in our local Lucene_Solr fork to also produce a Docker image. It's not a big deal, but I could share it if we want to consider going in this direction. It's kinda necessary if we want to release this all at once instead of requiring a 'tgz' to be released first, which in turn somewhat requires signatures of that binary that then become irrelevant to check when producing the Docker image. It's also super nice for those who fork Solr to be able to produce a Docker image easily (like us).

~ David Smiley
Apache Lucene/Solr Search Developer


On Sat, Jan 11, 2020 at 5:45 PM Jan Høydahl <[hidden email]> wrote:
[...]