Maintenance of Solr's official Dockerfile

Jan Høydahl / Cominvent
Hi,

The Lucene project has been asked to take over maintenance of the official Solr Dockerfile that ends up on Docker Hub (located in https://github.com/docker-solr/docker-solr). We have received a Software Grant from the current maintainer, Martijn Koster, who has done a fantastic job maintaining it together with a few committers.

I think it makes a lot of sense for the project to more tightly support Docker and ensure a good experience running Solr on Docker.

This email thread is to discuss what that may look like and how we should transition the current code into the project.

As a first step we invite all committers and contributors who use Docker to get involved: check out the current docker-solr git repo, try building the images, submit PRs, etc. I have started doing this myself and have submitted a few PRs.

Next step would be to agree on how we bring the current code into our project and ASF repos in the best possible way. Questions that arise are:

1. Are we allowed to maintain ASF code in a non-ASF repo? If not, how do we transition to an ASF git repo?
    * Can it be a sub folder in our main repo or does it need to be a separate repo?
2. How will the current build/test/publish process need to change?
    * Can we continue using travis for CI?
    * Do we need to talk to Docker folks to change repo location?
    * Should publishing of new Docker images be a release manager (RM) responsibility, or something that happens right after each release like the ref-guide?
3. Legal stuff - when we as a project file a PR to update the official solr docker images, are we then legally releasing a binary version of Solr?
    Technically it is Docker CI that builds and publishes the images, we just initiate it…
    Do we know any other ASF project that maintains its own official docker image?
4. Practical things - change README, NOTICE, header files, wording etc

I have opened https://issues.apache.org/jira/browse/SOLR-14168 as an umbrella issue for tasks that spin out from this email thread discussion.

Jan Høydahl

Re: Maintenance of Solr's official Dockerfile

Marcus Eagan
Hi Jan,

Thanks for the update, Jan, and thanks to Martijn for the donation! :)

I think that regardless of what the community decides to do with the docker-solr repo, a good first step would be to add a Docker folder to the Apache repository that contains a base Dockerfile and a README. In that README, users can be directed to the location of the docker-solr repo, wherever that may be, or can use the Dockerfile in the Apache repo as a starting point for building their own image.

Two cents,

Marcus

--
Marcus Eagan


Re: Maintenance of Solr's official Dockerfile

Martijn Koster
[pardon me breaking the email threading here; only just joined]

Jan wrote:

> Next step would be to agree on how we bring the current code into our project and ASF repos
> in the best possible way. Questions that arise are:
>
> 1. Are we allowed to maintain ASF code in a non-ASF repo? If not, how do we transition to
> an ASF git repo?
>     * Can it be a sub folder in our main repo or does it need to be a separate repo?

The way it works (from the official library’s point of view) is that we maintain https://github.com/docker-library/official-images/blob/master/library/solr which contains a link to a repo (in our case https://github.com/docker-solr/docker-solr.git), a particular git commit, and a particular directory for each version. That is consumed by their build infrastructure. The library team reviews changes we make to that file, and the corresponding changes we made to the Dockerfiles and bash scripts in the docker-solr repo, so it needs to be readily available and it needs to be easy to see what has changed.

I think one could theoretically move this into the main Solr repo and point to its GitHub address, but that would make things slower and much harder to review. So I think it’s much better to keep the separate repo. I briefly looked for some official guidance on this, but couldn’t find it spelled out explicitly. I did see https://github.com/docker-library/official-images#maintainership which talks about maintaining git history.
Note also that I already use a “docker-solr” GitHub org for the repo, rather than my own account, to make it easier to vary ownership.

If you are dead set on putting it into the main repo, I’d run that discussion past the library team first before sinking engineering time into it.
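
To make the manifest structure described above more concrete, here is a minimal Python sketch that parses a file in the style of library/solr. The field names follow the general style of official-images manifests but should be treated as assumptions here; the maintainer entry, tags and commit hashes are made-up placeholders, and the parser is a simplification for illustration, not the tooling the official library team uses.

```python
# Hypothetical sample in the style of an official-images manifest.
# The maintainer, tags and commit hashes are placeholders, not real values.
SAMPLE_MANIFEST = """\
Maintainers: Example Maintainer <maintainer@example.org> (@example)
GitRepo: https://github.com/docker-solr/docker-solr.git

Tags: 8.4.0, 8.4, 8, latest
GitCommit: 0000000000000000000000000000000000000000
Directory: 8.4

Tags: 7.7.2, 7.7, 7
GitCommit: 1111111111111111111111111111111111111111
Directory: 7.7
"""


def parse_manifest(text):
    """Parse blank-line-separated 'Key: value' blocks.

    The first block holds global fields (such as the repo URL); every
    following block describes one image entry and inherits the globals.
    """
    blocks = [b for b in text.strip().split("\n\n") if b.strip()]
    parsed = []
    for block in blocks:
        fields = {}
        for line in block.splitlines():
            # Split on the first colon only, so URLs in values stay intact.
            key, _, value = line.partition(":")
            fields[key.strip()] = value.strip()
        parsed.append(fields)

    defaults, entries = parsed[0], parsed[1:]
    result = []
    for entry in entries:
        merged = {**defaults, **entry}
        merged["Tags"] = [t.strip() for t in merged["Tags"].split(",")]
        result.append(merged)
    return result


entries = parse_manifest(SAMPLE_MANIFEST)
for e in entries:
    print(e["Directory"], e["GitCommit"][:7], e["Tags"])
```

Each entry pins a repo, a commit and a directory, which is why reviewers need the referenced repo to be easy to reach and its history easy to diff.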


> 2. How will the current build/test/publish process need to change?
>     * Can we continue using travis for CI?

In the short term, sure.

Travis has been great for us — it is free, it builds fast enough, the UI is nice, the config is simple, the integration is good, and support was helpful.
Last year Travis CI got acquired, followed by layoffs of senior engineering staff, so there are concerns about its future, but nothing has really changed to affect us.

I imagine it would be nicer to have it in the normal Apache Jenkins world, but I’m not volunteering for that migration. :-)

If we want to stay on Travis, there may be some configuration changes required (roles/permissions/credentials and such that are tied to my account).

Oh and just to make it clear: the CI does 2 things:
- it sets build status on GitHub commits (although there is currently no enforcement to allow only passing PRs to be merged or things like that, or have review/automerge workflows which would be nice to have)
- and it pushes builds to the https://hub.docker.com/repository/docker/dockersolr/docker-solr repo — but those are only used for testing, they are not the docker images that provide the official images. I've found that occasionally useful, but we could decide to not do that, or do it differently within the Apache infrastructure.


>     * Do we need to talk to Docker folks to change repo location?

If we keep the same repo, then no :-) If the repo were to change, we’d update that library/solr file, and send a PR.


>     * Should publishing of new Docker be a RM responsibility, or something that happens right
> after each release like the ref-guide?

I don’t have a strong opinion. I typically tried to do it as soon as I became aware of a new version via the solr-user mailing list or twitter.
Sometimes same day, sometimes it would take a week because of changes I need to make or extra things I wanted to do.
But if I’m more than a few days late someone would be asking about it :-)
The official library team review is usually very fast, same day or 24h.


> 3. Legal stuff - when we as a project file a PR to update the official solr docker images,
> are we then legally releasing a binary version of Solr?
>     Technically it is Docker CI that build and publish the images, we just initiate it…

I don’t know about that (or how that matters?)


>     Do we know any other ASF project that maintain their own official docker image?

I've looked at https://github.com/docker-library/official-images/tree/master/library and spotted https://github.com/carlossg/docker-maven which is maintained by an Apache committer.


> 4. Practical things - change README, NOTICE, header files, wording etc

There is also https://github.com/docker-library/docs/tree/master/solr — the individual markdown files there are used to generate https://hub.docker.com/_/solr 


> I have opened https://issues.apache.org/jira/browse/SOLR-14168 as an umbrella issue for tasks that spin out from this email thread discussion.


Marcus wrote:

> I think that regardless of what the community decides to do with the
> docker-solr repo, a good first step would be to add a Docker folder to the
> Apache repository that contains a base Dockerfile and a README. In that
> README, users can be directed to the location of the docker-solr repo,
> wherever that may be, or leverage the Dockerfile in the Apache repo as a
> starting point for building their own image.

I think that could be useful; but it then does start to become messy almost immediately: Users will expect these self-built images and the official images to work the same, and given that docker-solr has various extra scripts (eg to create collections at startup), you’d then have to copy them into the repo (and now have duplicate maintenance, need to test them). Or you could explicitly decide not to do that, but then your users will be asking how to achieve the same functionality with their images.

I would address this as a separate issue. Let’s get the existing image flow taken care of first.

— Martijn

Re: Maintenance of Solr's official Dockerfile

Jan Høydahl / Cominvent
>> 1. Are we allowed to maintain ASF code in a non-ASF repo? If not, how do we transition to
>> an ASF git repo?
>>     * Can it be a sub folder in our main repo or does it need to be a separate repo?
>
> The way it works (from the official library’s point of view), is that we maintain https://github.com/docker-library/official-images/blob/master/library/solr which contains a link to a repo (in our case https://github.com/docker-solr/docker-solr.git) and particular git commit, and a particular directory for different versions. That is consumed by their build infrastructure. The library team reviews changes we make to that file, and the corresponding changes we made to the Dockerfiles and bash scripts in the docker-solr repo, so it needs to be readily available and it needs to be easy to see what has changed.
>
> I think one could theoretically move this into the main Solr repo and point to its GitHub address, but that would make things slower and much harder to review. So I think it’s much better to keep the separate repo. I briefly looked for some official guidance on this, but couldn’t find it spelled out explicitly. I did see https://github.com/docker-library/official-images#maintainership which talks about maintaining git history.
> Note also that I already use a “docker-solr” GitHub org for the repo, rather than my own account, to make it easier to vary ownership.
>
> If you are dead-set to put it into the main repo, I’d run that discussion past the library team first before sinking engineering time.

I just discovered https://hub.docker.com/u/apache, which is Apache’s own Docker org. I see some images there are hosted in separate Apache git repos, for example CouchDB: https://github.com/apache/couchdb-docker is pushed to https://hub.docker.com/r/apache/couchdb and https://hub.docker.com/_/couchdb (official). The source of both hub locations seems to be the same apache/couchdb-docker git repo. I see that the person who files PRs against the official image repo is Joan Touzet (http://people.apache.org/~wohali/), who is a CouchDB committer. Perhaps this is a model for us to follow.

We may also want to consult LEGAL-503 where the Beam project asked a similar question a few weeks ago, and the reply is:

if you would like to continue linking to the Docker release artifact from the https://beam.apache.org you will have:
1. Transition to the official ASF dockerhub org: https://hub.docker.com/u/apache
2. Start including that binary convenience artifact into your VOTE threads on Beam releases
3. Make sure that all Cat-X licenses are ONLY brought into your container via FROM statements

So bullet point #1 there answers this question. Regarding points #2 and #3, see below.

>> 2. How will the current build/test/publish process need to change?
>>     * Can we continue using travis for CI?
>
> In the short term, sure.
>
> Travis has been great for us - it is free, it builds fast enough, the UI is nice, the config is simple, the integration is good, and support was helpful.
> Last year Travis CI got acquired, followed by layoffs of senior engineering staff, so there are concerns about its future, but nothing has really changed to affect us.
>
> I imagine it would be nicer to have it in the normal Apache Jenkins world, but I’m not volunteering for that migration. :-)
>
> If we want to stay on Travis, there may be some configuration changes required (roles/permissions/credentials and such that are tied to my account).
>
> Oh and just to make it clear: the CI does 2 things:
> - it sets build status on GitHub commits (although there is currently no enforcement to allow only passing PRs to be merged or things like that, or have review/automerge workflows which would be nice to have)
> - and it pushes builds to the https://hub.docker.com/repository/docker/dockersolr/docker-solr repo - but those are only used for testing, they are not the docker images that provide the official images. I've found that occasionally useful, but we could decide to not do that, or do it differently within the Apache infrastructure.

So I see other ASF projects using Travis as well; perhaps the ASF has an account/license? Whether we continue to use it or migrate to Jenkins, we need to run the build and tests and then push builds to the Apache Docker Hub repository space (making the image pullable with docker pull apache/solr:tag).
Actually producing the official image will be yet another PR to the Docker-owned official-images repo.

>>     * Should publishing of new Docker be a RM responsibility, or something that happens right
>> after each release like the ref-guide?
>
> I don’t have a strong opinion. I typically tried to do it as soon as I became aware of a new version via the solr-user mailing list or twitter.
> Sometimes same day, sometimes it would take a week because of changes I need to make or extra things I wanted to do.
> But if I’m more than a few days late someone would be asking about it :-)
> The official library team review is usually very fast, same day or 24h.

See point #2 from LEGAL-503 above. If we want to officially document / endorse / link to the image on Docker Hub, we may want to include the Docker image in the VOTE. I see that the Beam project includes this in their release guide (publishing SDK images): https://beam.apache.org/contribute/release-guide/. What they do is push an RC-tagged version to their Docker Hub as part of the release and include it in the VOTE.
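
As a rough illustration of what pushing an RC-tagged image could involve, the following Python sketch only assembles the docker tag and docker push command lines; it does not run them. The local image name, the version number and the "-rc1" tag suffix are hypothetical choices for this example, not an agreed naming scheme.

```python
def release_candidate_commands(local_image, org, name, version, rc_number):
    """Return docker CLI invocations (as argument lists) that tag a locally
    built image as a release candidate and push it to a Docker Hub org.

    The '<version>-rc<N>' tag format is an assumption for illustration.
    """
    target = f"{org}/{name}:{version}-rc{rc_number}"
    return [
        ["docker", "tag", local_image, target],
        ["docker", "push", target],
    ]


# Hypothetical inputs: a locally built image and an example version.
commands = release_candidate_commands("solr-local:8.4.1", "apache", "solr", "8.4.1", 1)
for cmd in commands:
    print(" ".join(cmd))
```

The resulting tag is what a VOTE email could reference, so that voters pull exactly the image being voted on.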

>> 3. Legal stuff - when we as a project file a PR to update the official solr docker images,
>> are we then legally releasing a binary version of Solr?
>>     Technically it is Docker CI that build and publish the images, we just initiate it…
>
> I don’t know about that (or how that matters?)

Oh, legal stuff matters a lot for Apache :) Again, I think LEGAL-503 answers this. Bullet #3 there requires the project to make sure that our Dockerfile does not bring Cat-X licensed software into the Docker layers built by us. Since we base our image on the ‘openjdk’ base image, which contains GNU/Linux binaries and the JDK, the only thing we'd need to verify is what we bring into our Docker layers through apt-get, wget etc. Below is a list of what I found:

acl - GPL - provides tool setfacl, used only in tests, can be removed?
dirmngr, gpg - GPL - used only during docker build phase, can be apt-installed and uninstalled in the same RUN command
lsof - BSD license
procps - GPL - provides the ‘ps’ command needed by bin/solr. This is part of openjdk:11 but not openjdk:11-slim...
wget - GPL - used during build only, can be uninstalled after use
netcat - public domain
gosu - GPL - can be removed or replaced with su-exec (MIT)
tini - MIT
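
The audit above can be expressed as a small Python sketch that filters the list down to the packages still needing a decision. The license labels are copied from the list as written; treating "GPL" as the set to flag and treating build-only tools as out of scope (because they can be removed within the same RUN command) are assumptions made for illustration, not legal guidance.

```python
# Package -> license, as stated in the list above.
PACKAGES = {
    "acl": "GPL",
    "dirmngr": "GPL",
    "gpg": "GPL",
    "lsof": "BSD",
    "procps": "GPL",
    "wget": "GPL",
    "netcat": "public domain",
    "gosu": "GPL",
    "tini": "MIT",
}

# Tools the list describes as needed only while building the image.
BUILD_ONLY = {"dirmngr", "gpg", "wget"}

# Assumption for this sketch: licenses that need a closer look.
LICENSES_TO_REVIEW = {"GPL"}


def packages_to_review(packages, build_only):
    """Return flagged packages that would remain in the final image layers."""
    return sorted(
        name
        for name, license_name in packages.items()
        if license_name in LICENSES_TO_REVIEW and name not in build_only
    )


remaining = packages_to_review(PACKAGES, BUILD_ONLY)
print(remaining)
```

With these inputs the remaining candidates are acl, gosu and procps, which matches the items the list marks as removable, replaceable, or needed at runtime.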

>>     Do we know any other ASF project that maintain their own official docker image?
>
> I've looked at https://github.com/docker-library/official-images/tree/master/library and spotted https://github.com/carlossg/docker-maven which is maintained by an Apache committer.

So CouchDB is another example. And there are so many projects in Apache’s Docker Hub org that I suppose there may be others.

> Marcus wrote:
>
>> I think that regardless of what the community decides to do with the
>> docker-solr repo, a good first step would be to add a Docker folder to the
>> Apache repository that contains a base Dockerfile and a README. In that
>> README, users can be directed to the location of the docker-solr repo,
>> wherever that may be, or leverage the Dockerfile in the Apache repo as a
>> starting point for building their own image.
>
> I think that could be useful; but it then does start to become messy almost immediately: Users will expect these self-built images and the official images to work the same, and given that docker-solr has various extra scripts (eg to create collections at startup), you’d then have to copy them into the repo (and now have duplicate maintenance, need to test them). Or you could explicitly decide not to do that, but then your users will be asking how to achieve the same functionality with their images.
>
> I would address this as a separate issue. Let’s get the existing image flow taken care of first.

Yes, it should be easy to build a docker image «from source», or at least as a gradle build task. That could piggy-back on the distro tgz file which should make it not too different - we just pull the release from local disk instead of from the mirrors. 

I also saw some projects that have Jenkins routinely publish SNAPSHOT releases to Docker Hub, see e.g. https://hub.docker.com/r/apache/syncope/tags. That is also nice if we want people to test things with unreleased versions or the master branch; then it is always only a docker run command away :)

Well, I hope other committers also join this discussion and bring perhaps other points of view here before we start fleshing out actual JIRA tasks to add to https://issues.apache.org/jira/browse/SOLR-14168.

If we end up releasing official Solr Docker images together with the normal release, it would be cool to add documentation to the RefGuide and perhaps tutorial, on how to run Solr with Docker.

Jan



Re: Maintenance of Solr's official Dockerfile

david.w.smiley@gmail.com
> Yes, it should be easy to build a docker image «from source», or at least as a gradle build task. That could piggy-back on the distro tgz file which should make it not too different - we just pull the release from local disk instead of from the mirrors. 

We do this at Salesforce in our local Lucene_Solr fork to also produce a docker image.  It's not a big deal but I could share it if we want to consider going this direction.  It's kinda necessary if we want to release this all at once instead of requiring a 'tgz' be released first, which in turn somewhat requires some signatures of that binary that then become irrelevant to check when producing the Docker image.  It's also super nice for those who fork Solr to also produce a Docker image easily (like us).

~ David Smiley
Apache Lucene/Solr Search Developer




Re: Maintenance of Solr's official Dockerfile

Jan Høydahl / Cominvent
I propose we continue to work with the existing docker-solr repo for some time still, until we fully understand how we want to proceed with moving to ASF-owned git infra and Docker Hub accounts.

I feel that some work should have higher priority for now:
- Document running Solr on Docker in Ref Guide
- Start thinking about how to include Docker image publishing in the release process
- Add a simple Dockerfile to our main git repo and a gradle task for building
- Update the README in docker-solr repo to reflect the new ownership

Some of these could be sub tasks of SOLR-14168.

Other thoughts?

Jan

12. jan. 2020 kl. 04:46 skrev David Smiley <[hidden email]>:

> Yes, it should be easy to build a docker image «from source», or at least as a gradle build task. That could piggy-back on the distro tgz file which should make it not too different - we just pull the release from local disk instead of from the mirrors. 

We do this at Salesforce in our local Lucene_Solr fork to also produce a docker image.  It's not a big deal but I could share it if we want to consider going this direction.  It's kinda necessary if we want to release this all at once instead of requiring a 'tgz' be released first, which in turn somewhat requires some signatures of that binary that then become irrelevant to check when producing the Docker image.  It's also super nice for those who fork Solr to also produce a Docker image easily (like us).

~ David Smiley
Apache Lucene/Solr Search Developer


On Sat, Jan 11, 2020 at 5:45 PM Jan Høydahl <[hidden email]> wrote:
1. Are we allowed to maintain ASF code in a non-ASF repo? If not, how do we transition to
an ASF git repo?
    * Can it be a sub folder in our main repo or does it need to be a separate repo?

The way it works (from the official library’s point of view), is that we maintain https://github.com/docker-library/official-images/blob/master/library/solr which contains a link to a repo (in our case https://github.com/docker-solr/docker-solr.git) and particular git commit, and a particular directory for different versions. That is consumed by their build infrastructure. The library team reviews changes we make to that file, and the corresponding changes we made to the Dockerfiles and bash scripts in the docker-solr repo, so it needs to be readily available and it needs to be easy to see what has changed.

I think one could theoretically move this into the main Solr repo and point to its GitHub address, but that would make things slower and much harder to review. So I think it’s much better to keep the separate repo. I briefly looked for some official guidance on this, but couldn’t find it spelled out explicitly. I did see https://github.com/docker-library/official-images#maintainership which talks about maintaining git history.
Note also that I already use a “docker-solr” GitHub org for the repo, rather than my own account, to make it easier to vary ownership.

If you are dead-set to put it into the main repo, I’d run that discussion past the library team first before sinking engineering time.

I just discovered https://hub.docker.com/u/apache - which is Apache’s own docker org. I see some images there are hosted in separate apache git repos, example CouchDB: https://github.com/apache/couchdb-docker pushed to https://hub.docker.com/r/apache/couchdb - and https://hub.docker.com/_/couchdb (official). The source of both hub locations seems to be the same apache/couchdb-docker git repo. I see that the person who files PRs against the official image repo is Joan Touzet (http://people.apache.org/~wohali/) who is a CouchDB committer. Perhaps this is a model for us to follow.

We may also want to consult LEGAL-503 where the Beam project asked a similar question a few weeks ago, and the reply is:

if you would like to continue linking to the Docker release artifact from the https://beam.apache.org you will have:
1. Transition to the official ASF dockerhub org: https://hub.docker.com/u/apache
2. Start including that binary convenience artifact into your VOTE threads on Beam releases
3. Make sure that all Cat-X licenses are ONLY brought into your container via FROM statements

So bullet point #1 there answers this question. Regarding point #2 and #3 see below.

2. How will the current build/test/publish process need to change?
    * Can we continue using travis for CI?

In the short term, sure.

Travis has been great for us — it is free, it builds fast enough, the UI is nice, the config is simple, the integration is good, and support was helpful.
Last year Travis CI got acquired, followed by layoffs of senior engineering staff, so there are concerns about its future, but nothing has really changed to affect us.

I imagine it would be nicer to have it in the normal Apache Jenkins world, but I’m not volunteering for that migration. :-)

If we want to stay on Travis, there may be some configuration changes required (roles/permissions/credentials and such that are tied to my account).

Oh and just to make it clear: the CI does 2 things:
- it sets build status on GitHub commits (although there is currently no enforcement to allow only passing PRs to be merged or things like that, or have review/automerge workflows which would be nice to have)
- and it pushes builds to the https://hub.docker.com/repository/docker/dockersolr/docker-solr repo — but those are only used for testing, they are not the docker images that provide the official images. I've found that occasionally useful, but we could decide to not do that, or do it differently within the Apache infrastructure.

So I see other ASF projects using travis as well, perhaps ASF has an account/license? Whether we continue to use it or migrate to Jenkins, we need to run the build and tests and then push builds to the Apache Docker Hub repository space (making the image pullable with docker pull apache/solr:tag).
Producing the actual official image will be yet another PR to the Docker-owned official-images repo.

    * Should publishing of new Docker be a RM responsibility, or something that happens right
after each release like the ref-guide?

I don’t have a strong opinion. I typically tried to do it as soon as I became aware of a new version via the solr-user mailing list or twitter.
Sometimes same day, sometimes it would take a week because of changes I need to make or extra things I wanted to do.
But if I’m more than a few days late someone would be asking about it :-)
The official library team review is usually very fast, same day or 24h.

See point #2 from LEGAL-503 above. If we want to officially document / endorse / link to the image on hub we may want to include the docker image in the VOTE. I see that the Beam project includes this in their release-guide (publishing SDK images): https://beam.apache.org/contribute/release-guide/. What they do is push an RC-tagged version to their docker-hub as part of the release and include it in the VOTE.
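
As a rough sketch of what such an RC step could look like for us: build the image, tag it as a release candidate and push it so the VOTE mail can reference it. The apache/solr repository name comes from the discussion above. The -rcN tag scheme and the helper names rc_image_ref and publish_rc are assumptions for illustration, not something Beam or the ASF prescribes.

```shell
#!/bin/sh
# Sketch only: publish a release-candidate image so it can be part of a VOTE.
# The rc tag scheme and both helper names are assumptions for illustration.

rc_image_ref() {
  # $1 = Solr version, $2 = RC number
  printf 'apache/solr:%s-rc%s\n' "$1" "$2"
}

publish_rc() {
  ref=$(rc_image_ref "$1" "$2")
  # Build from the Dockerfile in the current directory, then push the RC tag.
  docker build -t "$ref" . &&
    docker push "$ref"
}

# Usage (needs push rights to the Docker Hub org):
#   publish_rc 8.5.0 1
```

Voters could then pull that exact tag and test it alongside the other release artifacts.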

3. Legal stuff - when we as a project file a PR to update the official solr docker images,
are we then legally releasing a binary version of Solr?
    Technically it is Docker CI that builds and publishes the images, we just initiate it…

I don’t know about that (or how that matters?)

Oh, legal stuff matters a lot for Apache :) Again, I think LEGAL-503 answers this. Bullet #3 there requires the project to make sure that our Dockerfile does not bring Cat-X licensed software into the Docker layers built by us. Since we base our image on the ‘openjdk’ base image, which contains GNU/Linux binaries and the JDK, the only thing we'd need to verify is what we bring into our Docker layers through apt-get, wget etc. Below is a list of what I found:

acl - GPL - provides tool setfacl, used only in tests, can be removed?
dirmngr, gpg - GPL - used only during docker build phase, may be apt-installed and uninstalled in the same RUN command
lsof - BSD license
procps - GPL - provides the ‘ps’ command needed by bin/solr. This is part of openjdk:11 but not openjdk:11-slim...
wget - GPL - used during build only, can be uninstalled after use
netcat - PublicDomain
gosu - GPL - can be removed or replaced with su-exec (MIT)
tini - MIT
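
For the tools in this list that are only needed at build time (dirmngr, gpg, wget), the "install and uninstall in the same RUN command" idea could be sketched as below. Files that are created and removed within one RUN step do not end up in the resulting layer. The helper name with_build_only_pkgs and the APT override are hypothetical, and whether this is enough to satisfy LEGAL-503 is not something the sketch settles.

```shell
#!/bin/sh
# Sketch only: keep build-only packages out of the image layer by installing
# and purging them inside the same RUN step. with_build_only_pkgs is a
# hypothetical helper; APT can be overridden for a dry run.
APT="${APT:-apt-get}"

with_build_only_pkgs() {
  pkgs="$1"   # space-separated package names
  shift       # remaining arguments: the command that needs those packages
  # Unquoted $APT and $pkgs are split into words on purpose.
  $APT update &&
    $APT install -y --no-install-recommends $pkgs &&
    "$@" &&
    $APT purge -y --auto-remove $pkgs
}

# Inside a single RUN step, after sourcing this file, something like:
#   with_build_only_pkgs "wget gpg dirmngr" /opt/fetch-and-verify.sh
# where /opt/fetch-and-verify.sh is a placeholder for the download and
# signature check.
```

Setting APT="echo apt-get" before calling the helper prints the commands instead of running them, which is handy for checking the order of the steps.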

    Do we know of any other ASF project that maintains its own official docker image?

I've looked at https://github.com/docker-library/official-images/tree/master/library and spotted https://github.com/carlossg/docker-maven which is maintained by an Apache committer.

So couchDB is another example. And there are so many other projects in Apache’s docker-hub org that I suppose there may be others.

Marcus wrote:

I think that regardless of what the community decides to do with the
docker-solr repo, a good first step would be to add a Docker folder to the
Apache repository that contains a base Dockerfile and a README. In that
README, users can be directed to the location of the docker-solr repo,
wherever that may be, or leverage the Dockerfile in the  Apache repo as a
starting point for building their own image.

I think that could be useful; but it then does start to become messy almost immediately: Users will expect these self-built images and the official images to work the same, and given that docker-solr has various extra scripts (eg to create collections at startup), you’d then have to copy them into the repo (and now have duplicate maintenance, need to test them). Or you could explicitly decide not to do that, but then your users will be asking how to achieve the same functionality with their images.

I would address this as a separate issue. Let’s get the existing image flow taken care of first.

Yes, it should be easy to build a docker image «from source», or at least as a gradle build task. That could piggy-back on the distro tgz file which should make it not too different - we just pull the release from local disk instead of from the mirrors. 

I also saw some projects that have Jenkins routinely publish SNAPSHOT releases to docker-hub, see e.g. https://hub.docker.com/r/apache/syncope/tags which is also nice if we want to have people test out things with unreleased versions or master branch, then it is always only a docker run command away :) 

Well, I hope other committers also join this discussion and bring perhaps other points of view here before we start fleshing out actual JIRA tasks to add to https://issues.apache.org/jira/browse/SOLR-14168.

If we end up releasing official Solr Docker images together with the normal release, it would be cool to add documentation to the RefGuide and perhaps tutorial, on how to run Solr with Docker.

Jan




Re: Maintenance of Solr's official Dockerfile

Houston Putman
I have a separate question about the release process. As I currently understand it, whenever docker-solr is released, every version in its configs is rebuilt and re-released. This means that versions of the docker-solr images are not necessarily concrete, whereas the versions of solr are very concrete.

I imagine that by taking over the docker image as an official Apache image, this re-releasing of versions will no longer be allowed. That makes me think that adding a docker publishing step in the release process is necessary. There will also need to be extensive testing of that docker image in that process because we won't be able to retroactively fix issues anymore.

If Apache is more relaxed about re-releasing the same version, then this is less of an issue.

- Houston

On Thu, Feb 13, 2020 at 9:42 AM Jan Høydahl <[hidden email]> wrote:
I propose we continue to work with the existing docker-solr repo for some time still, until we fully understand how we want to proceed with moving to ASF owned git infra and hub accounts.

I feel that some work should have higher priority for now:
- Document running Solr on Docker in Ref Guide
- Start thinking about how to include Docker image publishing in the release process
- Adding a simplistic Dockerfile to our main git repo and a gradle task for building
- Update the README in docker-solr repo to reflect the new ownership

Some of these could be sub tasks of SOLR-14168.

Other thoughts?

Jan





Re: Maintenance of Solr's official Dockerfile

Jan Høydahl / Cominvent
I think we are not technically re-releasing Solr 8.4 even if that official docker image gets re-built with the latest versions of Ubuntu and JRE11 when we release e.g. 8.5.
The Apache Solr/Lucene binaries are still the exact same bits, we just change the base image — equivalent to upgrading Linux and Java on physical servers.
Of course there could be bugs manifested with a certain combination of Linux + JRE + Solr that potentially would cause solr:x.y to break further down the road, that the simple shell tests run during release might not catch.
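
To illustrate the difference between a tag and one exact image: a tag like solr:8.4 may point at a rebuilt image later, while a digest always names the same bits. A minimal sketch, assuming the image has been pulled locally (image_digest is a hypothetical helper name):

```shell
#!/bin/sh
# Sketch only: a tag such as solr:8.4 can point at a rebuilt image later on,
# while a digest always names one exact image. image_digest is a hypothetical
# helper name.
image_digest() {
  # Prints the first repo digest (name@sha256:...) of a locally pulled image.
  docker image inspect --format '{{index .RepoDigests 0}}' "$1"
}

# Usage:
#   docker pull solr:8.4
#   image_digest solr:8.4                    # record this value
#   docker pull "$(image_digest solr:8.4)"   # later: fetch exactly that image
```

Users who need a fixed Linux + JRE + Solr combination could pin to the recorded digest instead of the tag.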

Jan






Re: Maintenance of Solr's official Dockerfile

david.w.smiley@gmail.com
I wonder what official information the ASF provides on this matter.  I did some searching and found this page: https://cwiki.apache.org/confluence/display/INCUBATOR/DistributionGuidelines (see the "Docker" heading), which is nicely short and to the point but doesn't seem to answer this.  "DRAFT" appears 4 times at the top of the page, and it doesn't address all the questions.  I wonder about Houston's point as well; I'm not sure we can simply update an image just because the JAR files didn't change.  Maybe; maybe not.

~ David Smiley
Apache Lucene/Solr Search Developer


On Tue, Feb 25, 2020 at 10:35 AM Jan Høydahl <[hidden email]> wrote:
I think we are not technically re-releasing Solr 8.4 even if that official docker image gets re-built with the latest versions of Ubuntu and JRE11 when we release e.g. 8.5. 
The Apache Solr/Lucene binaries are still the exact same bits, we just change the base image — equivalent to upgrading Linux and Java on physical servers.
Of course there could be bugs manifested with a certain combination of Linux + JRE + Solr that potentially would cause solr:x.y to break further down the road, that the simple shell tests run during release might not catch.
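
The point about "same bits, different base image" can be illustrated with a toy model. This is only a sketch: it hashes stand-in byte strings and does not reproduce how Docker actually computes layer or image digests. The names `layer_digest`, `build_image` and the sample byte strings are hypothetical.

```python
import hashlib


def layer_digest(content: bytes) -> str:
    """Content hash of a single layer (toy stand-in for a real layer digest)."""
    return hashlib.sha256(content).hexdigest()


def build_image(base: bytes, solr: bytes) -> dict:
    """Toy image: an ordered list of layer digests plus a digest over that list."""
    layers = [layer_digest(base), layer_digest(solr)]
    image = hashlib.sha256("\n".join(layers).encode()).hexdigest()
    return {"layers": layers, "digest": image}


# Hypothetical contents: the Solr release bits stay fixed, the base changes.
SOLR_BITS = b"solr release tgz contents"
january = build_image(b"base OS + JRE, January snapshot", SOLR_BITS)
march = build_image(b"base OS + JRE, March snapshot", SOLR_BITS)

if __name__ == "__main__":
    # The Solr layer is identical in both builds...
    print("same Solr layer:", january["layers"][-1] == march["layers"][-1])
    # ...but the image as a whole is not, so a tag such as solr:x.y
    # can point at different images over time.
    print("same image:", january["digest"] == march["digest"])
```

In this model the tag stays the same while the image digest moves, which is the sense in which the image versions are less concrete than the Solr versions.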

Jan

25. feb. 2020 kl. 16:13 skrev Houston Putman <[hidden email]>:

I have a separate question about the release process. As I currently understand it, whenever docker-solr is released, every version in its configs is rebuilt and re-released. This means that versions of the docker-solr images are not necessarily concrete, whereas the versions of solr are very concrete.

I imagine that by taking over the docker image as an official Apache image, this re-releasing of versions will no longer be allowed. That makes me think that adding a docker publishing step in the release process is necessary. There will also need to be extensive testing of that docker image in that process because we won't be able to retroactively fix issues anymore.

If Apache is more relaxed about re-releasing the same version, then this is less of an issue.

- Houston

On Thu, Feb 13, 2020 at 9:42 AM Jan Høydahl <[hidden email]> wrote:
I propose we continue to work with the existing docker-solr repo for some time still, until we fully understand how we want to proceed with moving to ASF owned git infra and hub accounts.

I feel that some work should have higher priority for now:
- Document running Solr on Docker in Ref Guide
- Start thinking about how to include Docker image publishing in the release process
- Adding a simplistic Dockerfile to our main git repo and a gradle task for building
- Update the README in docker-solr repo to reflect the new ownership

Some of these could be sub tasks of SOLR-14168.

Other thoughts?

Jan

12. jan. 2020 kl. 04:46 skrev David Smiley <[hidden email]>:

> Yes, it should be easy to build a docker image «from source», or at least as a gradle build task. That could piggy-back on the distro tgz file which should make it not too different - we just pull the release from local disk instead of from the mirrors. 

We do this at Salesforce in our local Lucene_Solr fork to also produce a docker image.  It's not a big deal but I could share it if we want to consider going this direction.  It's kinda necessary if we want to release this all at once instead of requiring a 'tgz' be released first, which in turn somewhat requires some signatures of that binary that then become irrelevant to check when producing the Docker image.  It's also super nice for those who fork Solr to also produce a Docker image easily (like us).

~ David Smiley
Apache Lucene/Solr Search Developer


On Sat, Jan 11, 2020 at 5:45 PM Jan Høydahl <[hidden email]> wrote:
1. Are we allowed to maintain ASF code in a non-ASF repo? If not, how do we transition to
an ASF git repo?
    * Can it be a sub folder in our main repo or does it need to be a separate repo?

The way it works (from the official library’s point of view), is that we maintain https://github.com/docker-library/official-images/blob/master/library/solr which contains a link to a repo (in our case https://github.com/docker-solr/docker-solr.git) and particular git commit, and a particular directory for different versions. That is consumed by their build infrastructure. The library team reviews changes we make to that file, and the corresponding changes we made to the Dockerfiles and bash scripts in the docker-solr repo, so it needs to be readily available and it needs to be easy to see what has changed.
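
As a rough illustration of the structure described above (a repo link, a pinned git commit, and one directory per version), here is a minimal sketch that reads such a file. The field names (`GitRepo`, `GitCommit`, `Tags`, `Directory`), the all-zero commit and the listed versions are assumptions made for the example, not a copy of the real library/solr file.

```python
# Hypothetical sample modelled on the description above. Field names and
# values are illustrative assumptions, not taken from the real file.
SAMPLE = """\
GitRepo: https://github.com/docker-solr/docker-solr.git
GitCommit: 0000000000000000000000000000000000000000

Tags: 8.4.1, 8.4, 8, latest
Directory: 8.4

Tags: 7.7.2, 7.7, 7
Directory: 7.7
"""


def parse_library_file(text):
    """Split text into blank-line-separated stanzas of 'Key: value' pairs.

    The first stanza is treated as global defaults; each later stanza
    describes one image build and inherits those defaults.
    """
    stanzas = []
    for block in text.strip().split("\n\n"):
        fields = {}
        for line in block.splitlines():
            # partition() splits on the first colon only, so URLs survive.
            key, _, value = line.partition(":")
            fields[key.strip()] = value.strip()
        stanzas.append(fields)
    defaults, entries = stanzas[0], stanzas[1:]
    return [{**defaults, **entry} for entry in entries]


def builds(text):
    """Return (tags, repo, commit, directory) tuples, one per image entry."""
    return [
        (
            [tag.strip() for tag in entry["Tags"].split(",")],
            entry["GitRepo"],
            entry["GitCommit"],
            entry["Directory"],
        )
        for entry in parse_library_file(text)
    ]


if __name__ == "__main__":
    for tags, repo, commit, directory in builds(SAMPLE):
        print(f"{tags[0]}: build {repo}@{commit[:7]} from ./{directory}")
```

Because every entry pins a repo and a commit, a reviewer can diff exactly what changed between two revisions of the file, which is why the referenced repo needs to stay readily available.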

I think one could theoretically move this into the main Solr repo and point to its GitHub address, but that would make things slower and much harder to review. So I think it’s much better to keep the separate repo. I briefly looked for some official guidance on this, but couldn’t find it spelled out explicitly. I did see https://github.com/docker-library/official-images#maintainership which talks about maintaining git history.
Note also that I already use a “docker-solr” GitHub org for the repo, rather than my own account, to make it easier to vary ownership.

If you are dead set on putting it into the main repo, I’d run that discussion past the library team first before sinking engineering time into it.

I just discovered https://hub.docker.com/u/apache - which is Apache’s own docker org. I see some images there are hosted in separate apache git repos, for example CouchDB: https://github.com/apache/couchdb-docker pushed to https://hub.docker.com/r/apache/couchdb - and https://hub.docker.com/_/couchdb (official). The source of both hub locations seems to be the same apache/couchdb-docker git repo. I see that the person who files PRs against the official image repo is Joan Touzet (http://people.apache.org/~wohali/) who is a CouchDB committer. Perhaps this is a model for us to follow.

We may also want to consult LEGAL-503 where the Beam project asked a similar question a few weeks ago, and the reply is:

if you would like to continue linking to the Docker release artifact from the https://beam.apache.org you will have:
1. Transition to the official ASF dockerhub org: https://hub.docker.com/u/apache
2. Start including that binary convenience artifact into your VOTE threads on Beam releases
3. Make sure that all Cat-X licenses are ONLY brought into your container via FROM statements

So bullet point #1 there answers this question. Regarding point #2 and #3 see below.

2. How will the current build/test/publish process need to change?
    * Can we continue using travis for CI?

In the short term, sure.

Travis has been great for us — it is free, it builds fast enough, the UI is nice, the config is simple, the integration is good, and support was helpful.
Last year Travis CI got acquired, followed by layoffs of senior engineering staff, so there are concerns about its future, but nothing has really changed to affect us.

I imagine it would be nicer to have it in the normal Apache Jenkins world, but I’m not volunteering for that migration. :-)

If we want to stay on Travis, there may be some configuration changes required (roles/permissions/credentials and such that are tied to my account).

Oh and just to make it clear: the CI does 2 things:
- it sets build status on GitHub commits (although there is currently no enforcement to allow only passing PRs to be merged or things like that, or have review/automerge workflows which would be nice to have)
- and it pushes builds to the https://hub.docker.com/repository/docker/dockersolr/docker-solr repo — but those are only used for testing, they are not the docker images that provide the official images. I've found that occasionally useful, but we could decide to not do that, or do it differently within the Apache infrastructure.

I see other ASF projects using Travis as well, so perhaps the ASF has an account/license? Whether we continue to use it or migrate to Jenkins, we need to run the build and tests and then push builds to the Apache Docker Hub repository space (making the image pullable with docker pull apache/solr:tag).
Actually producing the official image will be yet another PR to the Docker-owned official-images repo.

    * Should publishing of new Docker be a RM responsibility, or something that happens right
after each release like the ref-guide?

I don’t have a strong opinion. I typically tried to do it as soon as I became aware of a new version via the solr-user mailing list or twitter.
Sometimes same day, sometimes it would take a week because of changes I need to make or extra things I wanted to do.
But if I’m more than a few days late someone would be asking about it :-)
The official library team review is usually very fast, same day or 24h.

See point #2 from LEGAL-503 above. If we want to officially document / endorse / link to the image on hub we may want to include the docker image in the VOTE. I see that the Beam project includes this in their release-guide (publishing SDK images): https://beam.apache.org/contribute/release-guide/. What they do is push an RC-tagged version to their Docker Hub as part of the release and include it in the VOTE.

3. Legal stuff - when we as a project file a PR to update the official solr docker images,
are we then legally releasing a binary version of Solr?
    Technically it is Docker CI that build and publish the images, we just initiate it…

I don’t know about that (or how that matters?)

Oh, legal stuff matters a lot for Apache :) Again, I think LEGAL-503 answers this. Bullet #3 there requires the project to make sure that our Dockerfile does not bring Cat-X licensed software into the Docker layers built by us. Since we base our image on the ‘openjdk’ base image, which contains GNU/Linux binaries and the JDK, the only thing we'd need to verify is what we bring into our Docker layers through apt-get, wget etc. Below is a list of what I found:

acl - GPL - provides tool setfacl, used only in tests, can be removed?
dirmngr, gpg - GPL - used only during the docker build phase, can be apt-installed and uninstalled in the same RUN command
lsof - BSD license
procps - GPL - provides the ‘ps’ command needed by bin/solr. This is part of openjdk:11 but not openjdk:11-slim...
wget - GPL - used during build only, can be uninstalled after use
netcat - PublicDomain
gosu - GPL - can be removed or replaced with su-exec (MIT)
tini - MIT
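
The list above can be turned into a small checklist. This is a minimal sketch: the license labels are copied from the list as written (not independently verified), the `build_only` flag reflects only what the list says about each package, and treating "GPL" as the set needing review is an assumption made for the example.

```python
from typing import NamedTuple


class Package(NamedTuple):
    name: str
    license: str
    build_only: bool  # True if the list describes it as build/test-time only


# Data transcribed from the list above; build_only=False simply means the
# list does not describe the package as removable after the build.
PACKAGES = [
    Package("acl", "GPL", True),              # setfacl, used only in tests
    Package("dirmngr", "GPL", True),          # docker build phase only
    Package("gpg", "GPL", True),              # docker build phase only
    Package("lsof", "BSD", False),
    Package("procps", "GPL", False),          # provides ps, needed by bin/solr
    Package("wget", "GPL", True),             # used during build only
    Package("netcat", "PublicDomain", False),
    Package("gosu", "GPL", False),            # could be replaced by su-exec (MIT)
    Package("tini", "MIT", False),
]

# Assumption for this sketch: licenses that must not end up in our own layers.
REVIEW_LICENSES = {"GPL"}


def removable_after_build(packages):
    """Flagged packages that can be installed and removed in the same RUN step."""
    return [p.name for p in packages
            if p.license in REVIEW_LICENSES and p.build_only]


def needs_decision(packages):
    """Flagged packages that would remain in the final image layers."""
    return [p.name for p in packages
            if p.license in REVIEW_LICENSES and not p.build_only]


if __name__ == "__main__":
    print("remove after build:", removable_after_build(PACKAGES))
    print("needs a decision:  ", needs_decision(PACKAGES))
```

Under these assumptions only procps and gosu are left as open questions, which matches the notes in the list (use a base image that already ships ps, and swap gosu for su-exec).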

    Do we know any other ASF project that maintain their own official docker image?

I've looked at https://github.com/docker-library/official-images/tree/master/library and spotted https://github.com/carlossg/docker-maven which is maintained by an Apache committer.

So CouchDB is another example. And there are so many other projects in Apache’s docker-hub org that I suppose there may be others.

Marcus wrote:

I think that regardless of what the community decides to do with the
docker-solr repo, a good first step would be to add a Docker folder to the
Apache repository that contains a base Dockerfile and a README. In that
README, users can be directed to the location of the docker-solr repo,
wherever that may be, or leverage the Dockerfile in the  Apache repo as a
starting point for building their own image.

I think that could be useful; but it then does start to become messy almost immediately: Users will expect these self-built images and the official images to work the same, and given that docker-solr has various extra scripts (eg to create collections at startup), you’d then have to copy them into the repo (and now have duplicate maintenance, need to test them). Or you could explicitly decide not to do that, but then your users will be asking how to achieve the same functionality with their images.

I would address this as a separate issue. Let’s get the existing image flow taken care of first.

Yes, it should be easy to build a docker image «from source», or at least as a gradle build task. That could piggy-back on the distro tgz file which should make it not too different - we just pull the release from local disk instead of from the mirrors. 

I also saw some projects that have Jenkins routinely publish SNAPSHOT releases to docker-hub, see e.g. https://hub.docker.com/r/apache/syncope/tags which is also nice if we want to have people test out things with unreleased versions or master branch, then it is always only a docker run command away :) 

Well, I hope other committers also join this discussion and bring perhaps other points of view here before we start fleshing out actual JIRA tasks to add to https://issues.apache.org/jira/browse/SOLR-14168.

If we end up releasing official Solr Docker images together with the normal release, it would be cool to add documentation to the RefGuide and perhaps tutorial, on how to run Solr with Docker.

Jan





Re: Maintenance of Solr's official Dockerfile

Alan Woodward-3
Is the release of the docker image going to be part of the standard Lucene/Solr release process? I ask because I’m planning on starting an 8.5 release next week, and I know nothing about Docker images...


Re: Maintenance of Solr's official Dockerfile

Houston Putman
There's still a lot of questions up in the air, and it's going to take some time to make the necessary decisions/changes. This shouldn't affect the 8.5 release.

On Wed, Feb 26, 2020 at 6:08 AM Alan Woodward <[hidden email]> wrote:
Is the release of the docker image going to be part of the standard Lucene/Solr release process? I ask because I’m planning on starting an 8.5 release next week, and I know nothing about Docker images...

On 25 Feb 2020, at 16:04, David Smiley <[hidden email]> wrote:

I wonder what official information ASF provides on this matter.  I did some searching and found this page: https://cwiki.apache.org/confluence/display/INCUBATOR/DistributionGuidelines. (see "Docker" heading) which was nicely short and to the point but doesn't seem to answer.  "DRAFT" 4 times is at the top of this page.  And it doesn't address all questions.  I wonder about Houston's point as well; I'm not sure we can simply update an image just because the JAR files didn't change.  Maybe; maybe not.

~ David Smiley
Apache Lucene/Solr Search Developer


On Tue, Feb 25, 2020 at 10:35 AM Jan Høydahl <[hidden email]> wrote:
I think se are not technically re-releasing Solr 8.4 even if that official docker image gets re-built with latest versions of Ubuntu and JRE11 when re release e.g. 8.5. 
The Apache Solr/Lucene binaries are still the exact same bits, we just change the base image — equivalent to upgrading Linux and Java on physical servers.
Of course there could be bugs manifested with a certain combination of Linux + JRE + Solr that potentially would cause solr:x.y to break further down the road, that the simple shell tests run during release might not catch.

Jan

25. feb. 2020 kl. 16:13 skrev Houston Putman <[hidden email]>:

I have a separate question about the release process. As I currently understand it, whenever docker-solr is released, every version in its configs is rebuilt and re-released. This means that versions of the docker-solr images are not necessarily concrete, whereas the versions of solr are very concrete.

I imagine that by taking over the docker image as an official Apache image, this re-releasing of versions will no longer be allowed. That makes me think that adding a docker publishing step in the release process is necessary. There will also need to be extensive testing of that docker image in that process because we won't be able to retroactively fix issues anymore.

If Apache is more relaxed about re-releasing the same version, then this is less of an issue.

- Houston

On Thu, Feb 13, 2020 at 9:42 AM Jan Høydahl <[hidden email]> wrote:
I propose we continue to work with the existing docker-solr repo for some time still, until we fully understand how we want to proceed with moving to ASF owned git infra and hub accounts.

I feel that some work should have higher priority for now:
- Document running Solr on Docker in Ref Guide
- Start thinking about how to include Docker image publishing in the release process
- Adding a simplistic Dockerfile to our main git repo and a gradle task for building
- Update the README in docker-solr repo to reflect the new ownership

Some of these could be sub tasks of SOLR-14168.

Other thoughts?

Jan

12. jan. 2020 kl. 04:46 skrev David Smiley <[hidden email]>:

> Yes, it should be easy to build a docker image «from source», or at least as a gradle build task. That could piggy-back on the distro tgz file which should make it not too different - we just pull the release from local disk instead of from the mirrors. 

We do this at Salesforce in our local Lucene_Solr fork to also produce a docker image.  It's not a big deal but I could share it if we want to consider going this direction.  It's kinda necessary if we want to release this all at once instead of requiring a 'tgz' be released first, which in turn somewhat requires some signatures of that binary that then become irrelevant to check when producing the Docker image.  It's also super nice for those who fork Solr to also produce a Docker image easily (like us).

~ David Smiley
Apache Lucene/Solr Search Developer


On Sat, Jan 11, 2020 at 5:45 PM Jan Høydahl <[hidden email]> wrote:
1. Are we allowed to maintain ASF code in a non-ASF repo? If not, how do we transition to
an ASF git repo?
    * Can it be a sub folder in our main repo or does it need to be a separate repo?

The way it works (from the official library’s point of view), is that we maintain https://github.com/docker-library/official-images/blob/master/library/solr which contains a link to a repo (in our case https://github.com/docker-solr/docker-solr.git) and particular git commit, and a particular directory for different versions. That is consumed by their build infrastructure. The library team reviews changes we make to that file, and the corresponding changes we made to the Dockerfiles and bash scripts in the docker-solr repo, so it needs to be readily available and it needs to be easy to see what has changed.

I think one could theoretically move this into the main Solr repo and point to its GitHub address, but that would make things slower and much harder to review. So I think it’s much better to keep the separate repo. I briefly looked for some official guidance on this, but couldn’t find it spelled out explicitly. I did see https://github.com/docker-library/official-images#maintainership which talks about maintaining git history.
Note also that I already use a “docker-solr” GitHub org for the repo, rather than my own account, to make it easier to vary ownership.

If you are dead-set to put it into the main repo, I’d run that discussion past the library team first before sinking engineering time.

I just discovered https://hub.docker.com/u/apache - which is Apache’s own docker org. I see some images there are hosted in separate apache git repos, example CouchDB: https://github.com/apache/couchdb-docker pushed to https://hub.docker.com/r/apache/couchdb - and https://hub.docker.com/_/couchdb (official). The source of both hub locations seems to be the same apache/couchdb-docker git repo. I see that the person who files PRs aginst the official image repo is Joan Touzet (http://people.apache.org/~wohali/) who is a CouchDB committer. Perhpas this is a model for us to follow.

We may also want to consult LEGAL-503 where the Beam project asked a similar question a few weeks ago, and the reply is:

if you would like to continue linking to the Docker release artifact from the https://beam.apache.org you will have:
1. Transition to the official ASF dockerhub org: https://hub.docker.com/u/apache
2. Start including that binary convenience artifact into your VOTE threads on Beam releases
3. Make sure that all Cat-X licenses are ONLY brought into your container via FROM statements

So bullet point #1 there answers this question. Regarding point #2 and #3 see below.

2. How will the current build/test/publish process need to change?
    * Can we continue using travis for CI?

In the short term, sure.

Travis has been great for us — it is free, it builds fast enough, the UI is nice, the config is simple, the integration is good, and support was helpful.
Last year Travis CI got acquired, followed by layoffs of senior engineering staff, so there are concerns about its future, but nothing has really changed to affect us.

I imagine it would be nicer to have it in the normal Apache Jenkins world, but I’m not volunteering for that migration. :-)

If we want to stay on Travis, there may be some configuration changes required (roles/permissions/credentials and such that are tied to my account).

Oh and just to make it clear: the CI does 2 things:
- it sets build status on GitHub commits (although there is currently no enforcement to allow only passing PRs to be merged or things like that, or have review/automerge workflows which would be nice to have)
- and it pushes builds to the https://hub.docker.com/repository/docker/dockersolr/docker-solr repo — but those are only used for testing, they are not the docker images that provide the official images. I've found that occasionally useful, but we could decide to not do that, or do it differently within the Apache infrastructure.

So I see other ASF projects using travis as well, perhaps ASF has an account/license? Whether we continue to use it or migrate to Jenkins, we need to run the build and tests and then push builds to the Apache Docker Hub repository space (making the image pullable with docker pull apache/solr:tag).
Actually producing the official image will then be yet another PR to the Docker-owned official-images repo.

    * Should publishing of new Docker be a RM responsibility, or something that happens right
after each release like the ref-guide?

I don’t have a strong opinion. I typically tried to do it as soon as I became aware of a new version via the solr-user mailing list or twitter.
Sometimes same day, sometimes it would take a week because of changes I need to make or extra things I wanted to do.
But if I’m more than a few days late someone would be asking about it :-)
The official library team review is usually very fast, same day or 24h.

See point #2 from LEGAL-503 above. If we want to officially document / endorse / link to the image on hub we may want to include the docker image in the VOTE. I see that the Beam project includes this in their release-guide (publishing SDK images): https://beam.apache.org/contribute/release-guide/. What they do is push an RC-tagged version to their docker-hub as part of the release and include it in the VOTE.

3. Legal stuff - when we as a project file a PR to update the official solr docker images,
are we then legally releasing a binary version of Solr?
    Technically it is Docker CI that build and publish the images, we just initiate it…

I don’t know about that (or how that matters?)

Oh, legal stuff matters a lot for Apache :) Again, I think LEGAL-503 answers this. Bullet #3 there requires the project to make sure that our Dockerfile does not bring Cat-X licensed software into the Docker layers built by us. Since we base our image on the ‘openjdk’ base image, which contains GNU/Linux binaries and the JDK, the only thing we'd need to verify is what we bring into our Docker layers through apt-get, wget etc. Below is a list of what I found:

acl - GPL - provides tool setfacl, used only in tests, can be removed?
dirmngr, gpg - GPL - used only during docker build phase, can be apt-installed and uninstalled in the same RUN command
lsof - BSD license
procps - GPL - provides the ‘ps’ command needed by bin/solr. This is part of openjdk:11 but not openjdk:11-slim...
wget - GPL - used during build only, can be uninstalled after use
netcat - PublicDomain
gosu - GPL - can be removed or replaced with su-exec (MIT)
tini - MIT
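
For the build-only tools in that list (dirmngr, gpg, wget), the "install and uninstall in the same RUN command" idea could look roughly like the sketch below. This is not the current docker-solr Dockerfile: the SOLR_DOWNLOAD_URL build argument and the /opt/solr.tgz path are hypothetical names used for illustration only, and whether this pattern is sufficient for bullet #3 of LEGAL-503 is still a question for the legal discussion.

```dockerfile
# Sketch only, under the assumptions stated above.
FROM openjdk:11-slim

# Hypothetical build argument; pass a real URL with --build-arg at build time.
ARG SOLR_DOWNLOAD_URL

# Install the build-only tools, use them, and purge them again inside ONE
# RUN instruction. A layer is only committed when the RUN finishes, so the
# purged packages are not part of the resulting layer.
RUN set -ex; \
    apt-get update; \
    apt-get -y install wget gpg dirmngr; \
    wget -nv "$SOLR_DOWNLOAD_URL" -O /opt/solr.tgz; \
    apt-get -y purge --auto-remove wget gpg dirmngr; \
    rm -rf /var/lib/apt/lists/*
```

A real Dockerfile would also verify the download's signature with gpg before the purge step; that part is left out here to keep the sketch short.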

    Do we know any other ASF project that maintain their own official docker image?

I've looked at https://github.com/docker-library/official-images/tree/master/library and spotted https://github.com/carlossg/docker-maven which is maintained by an Apache committer.

So CouchDB is another example. And there are so many other projects in Apache’s docker-hub org that I suppose there may be others.

Marcus wrote:

I think that regardless of what the community decides to do with the
docker-solr repo, a good first step would be to add a Docker folder to the
Apache repository that contains a base Dockerfile and a README. In that
README, users can be directed to the location of the docker-solr repo,
wherever that may be, or leverage the Dockerfile in the Apache repo as a
starting point for building their own image.

I think that could be useful; but it then does start to become messy almost immediately: Users will expect these self-built images and the official images to work the same, and given that docker-solr has various extra scripts (eg to create collections at startup), you’d then have to copy them into the repo (and now have duplicate maintenance, need to test them). Or you could explicitly decide not to do that, but then your users will be asking how to achieve the same functionality with their images.

I would address this as a separate issue. Let’s get the existing image flow taken care of first.

Yes, it should be easy to build a docker image «from source», or at least as a gradle build task. That could piggy-back on the distro tgz file which should make it not too different - we just pull the release from local disk instead of from the mirrors. 
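
A minimal sketch of what "pull the release from local disk" might mean, assuming the distro tgz has been placed in the Docker build context and unpacks into a single top-level directory. The SOLR_TGZ argument name and the /opt/solr location are assumptions for illustration, not something the docker-solr repo or the gradle build provides today.

```dockerfile
# Sketch only, under the assumptions stated above.
FROM openjdk:11-slim

# Hypothetical build argument: file name of the locally built distro tgz,
# relative to the build context.
ARG SOLR_TGZ=solr.tgz

# Copy the local tgz instead of downloading it from the mirrors.
COPY ${SOLR_TGZ} /tmp/solr.tgz

# Unpack into /opt/solr, dropping the archive's top-level directory.
# Note: the COPY layer above still contains the tgz, so this saves no space.
RUN mkdir -p /opt/solr \
    && tar -xzf /tmp/solr.tgz -C /opt/solr --strip-components=1 \
    && rm /tmp/solr.tgz
```

It would be built with something like docker build --build-arg SOLR_TGZ=solr-X.Y.Z.tgz -t solr-local . where the file name is a placeholder. The extra scripts from docker-solr are not covered here, which is exactly the duplication concern raised above.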

I also saw some projects that have Jenkins routinely publish SNAPSHOT releases to docker-hub, see e.g. https://hub.docker.com/r/apache/syncope/tags which is also nice if we want to have people test out things with unreleased versions or master branch, then it is always only a docker run command away :) 

Well, I hope other committers also join this discussion and bring perhaps other points of view here before we start fleshing out actual JIRA tasks to add to https://issues.apache.org/jira/browse/SOLR-14168.

If we end up releasing official Solr Docker images together with the normal release, it would be cool to add documentation to the RefGuide, and perhaps the tutorial, on how to run Solr with Docker.

Jan