Docker image along with 1.23?

Docker image along with 1.23?

Tim Allison
All,
  Eric Pugh recently asked on another channel if we had any plans to
release an official docker image for 1.23.  IIRC Dave had that up and
running, but I couldn't get it to work as part of the release
process because I was behind a proxy or on Windows or something.  My dev
environment is now different, and I _should_ be able to get it to work.
  Do we want to try to release an official Docker image as part of the 1.23
release?

           Cheers,

                   Tim

Re: [EXTERNAL] Docker image along with 1.23?

Mattmann, Chris A (US 1760)
Sure, let's do that. I also have a set of tika-dockers in USCDataScience that are useful for the ML/deep learning stuff.



Chris Mattmann, Ph.D.
Deputy Chief Technology & Innovation Officer
17x   |   Office of the Chief Information Officer, Chief Technology and Innovation Office (1760)

JPL   |   jpl.nasa.gov
4800 Oak Grove Dr, Mail Stop 171-377
Pasadena, California 91109
O 818-354-8810   |   M 626-755-6564


Re: Docker image along with 1.23?

Nick Burch-2
In reply to this post by Tim Allison
On Wed, 20 Nov 2019, Tim Allison wrote:
> Eric Pugh recently asked on another channel if we had any plans to
> release an official docker image for 1.23.

Depending on what we put in the container, we do need to be a little
careful. There are "platform dependencies" under non-compatible licenses
that we can optionally use if people have installed them, but which we
ourselves can't directly ship under ASF rules. (Tesseract is fine, as
it's Apache licensed; Java itself is trickier, see the NetBeans
discussions on legal-discuss@ and the LEGAL Jira.)

Shipping an official Docker container with Tika Server on it seems to me
to be a helpful step for users, but we just need to make sure we're
following ASF policies. (The Apache Software Foundation's mission is to
"provide software for the public good", but source code is the main focus
of that mission; binaries are trickier!)

Nick

Re: [EXTERNAL] Re: Docker image along with 1.23?

Chris Mattmann
Nick, TBH, I don’t get it. If we ship the “Dockerfile”, we are simply shipping a text file:
code, under a license. If we create a “docker image” and then publish it to the ASF
hub, then I agree with you.

My suggestion, and my interpretation of Tim’s, is to ship a standard “Dockerfile”. Do you
agree with this? It should be air covered (as former VP, Legal, at least it would have been
with me).

Cheers,

Chris

Re: [EXTERNAL] Docker image along with 1.23?

Eric Pugh-4
I was thinking more of producing the actual image, so that others don’t have to go through the pain of compiling an image.   Having the Dockerfile made available as well does give a nice recipe for modifying the “official” image.   I recently tested Tesseract 3 with the latest Tika, and I did it by tweaking the existing Dockerfile that LogicalSpark has published.

I don’t know how other projects at ASF handle the image publishing.





_______________________
Eric Pugh | Founder & CEO | OpenSource Connections, LLC | 434.466.1467 | http://www.opensourceconnections.com <http://www.opensourceconnections.com/> | My Free/Busy <http://tinyurl.com/eric-cal>  
Co-Author: Apache Solr Enterprise Search Server, 3rd Ed <https://www.packtpub.com/big-data-and-business-intelligence/apache-solr-enterprise-search-server-third-edition-raw>
This e-mail and all contents, including attachments, is considered to be Company Confidential unless explicitly stated otherwise, regardless of whether attachments are marked as such.


Re: [EXTERNAL] Docker image along with 1.23?

Chris Mattmann
Yeah, producing the actual image is tricky, and my recommendation is for Tika to
stay out of that business. Leave it to LogicalSpark or others to do this. It’s
tricky with licenses, and I doubt the ASF will ever develop an optimal solution to this,
given the nature of its core mission, as Nick stated.


Re: [EXTERNAL] Docker image along with 1.23?

Oleg Tikhonov-2
My question is more pragmatic.
What do we put inside the Dockerfile, and which image will it be based on (say,
Ubuntu)?
What will the entrypoint be? Tika Server? Should we "install"
Tesseract? Anything more?

Thanks,
Oleg


Re: [EXTERNAL] Docker image along with 1.23?

Nick Burch-2
On Thu, 21 Nov 2019, Oleg Tikhonov wrote:
> My question is more pragmatic.
> What do we put inside the Dockerfile, and which image will it be based on (say,
> Ubuntu)?
> What will the entrypoint be? Tika Server? Should we "install"
> Tesseract? Anything more?

If we want to be trendy, then Sergey Beryozkin did some cool stuff with
Quarkus and a GraalVM native image of Tika; the video is online at
https://aceu19.apachecon.com/session/apache-tika-goes-native-graalvm-and-quarkus

I'd possibly suggest two Dockerfiles (but not published images!), both
based on a fairly thin common Java base image (so probably Ubuntu rather
than Alpine). One with just Tika Server + Tesseract + English Tesseract
data, and one with all the optional Tika dependencies (SQL native libraries,
etc.) plus Tesseract and all the available Tesseract languages.
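
As a rough illustration of the first, slimmer variant, a Dockerfile along these lines might work. It is only a sketch: the ubuntu:18.04 base, the Ubuntu package names, the /opt/tika path, and the assumption that tika-server-1.23.jar has already been downloaded and verified next to the Dockerfile are choices made for this example, not anything the project has settled on.

```dockerfile
# Sketch only: slim Tika Server image with Tesseract and English data.
# Assumes tika-server-1.23.jar sits next to this Dockerfile, downloaded
# and signature-checked beforehand.
FROM ubuntu:18.04

# Headless JRE plus Tesseract with English language data only.
RUN apt-get update \
 && apt-get install -y --no-install-recommends \
        openjdk-8-jre-headless \
        tesseract-ocr \
        tesseract-ocr-eng \
 && rm -rf /var/lib/apt/lists/*

# /opt/tika is a hypothetical install location chosen for this example.
COPY tika-server-1.23.jar /opt/tika/tika-server.jar

# 9998 is Tika Server's default port.
EXPOSE 9998

# Bind to all interfaces so the port is reachable from outside the container.
ENTRYPOINT ["java", "-jar", "/opt/tika/tika-server.jar", "-h", "0.0.0.0"]
```

The fuller variant would presumably differ mainly in the package list (for example tesseract-ocr-all instead of tesseract-ocr-eng) and in putting the optional jars on the classpath.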

Some other projects are currently leading the debate on ASF binary
releases that bundle the JVM. I'd suggest we wait for that to resolve
before we think about trying to publish pre-built images ourselves.
Linking to images from external organisations we trust should be fine
though, e.g. similar to
http://httpd.apache.org/docs/current/platform/windows.html#down

Nick

Re: [EXTERNAL] Docker image along with 1.23?

Eric Pugh-4
That makes sense.   Having a robust Dockerfile, even if it isn’t published, is a great way of modeling best practices in running Tika in server mode.




Re: [EXTERNAL] Docker image along with 1.23?

Tim Allison
OK. Sounds like an example Dockerfile will meet your needs, Eric?

Users can currently build their own images with the Dockerfile in
tika-server, and there's the LogicalSpark one.

As noted, there are complexities with distributing an image.

Between those two options, folks should basically be OK. Right?

I might want to add an advanced Dockerfile example on our wiki (or
perhaps in LogicalSpark's ???) that:
1) runs tika-server in spawn-child mode
2) returns stack traces
3) includes the "provided" xerial sqlite jar
4) includes non-ASL-2.0-compatible dependencies for image processing in PDFs
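
A Dockerfile covering the four points above might look roughly like the following sketch. The extras/ directory, the /opt/tika layout, and the base image are hypothetical choices for illustration. -spawnChild and -s (--includeStack) are tika-server 1.x command-line options; that the spawned child ends up with the same classpath as the parent is an assumption that should be checked before relying on it.

```dockerfile
# Sketch only: "advanced" tika-server 1.23 image.
# Hypothetical build context:
#   tika-server-1.23.jar  the server jar, downloaded and verified beforehand
#   extras/               optional jars you supply yourself, e.g. the xerial
#                         sqlite-jdbc jar and the non-ASL-2.0 image jars
FROM ubuntu:18.04

RUN apt-get update \
 && apt-get install -y --no-install-recommends openjdk-8-jre-headless \
 && rm -rf /var/lib/apt/lists/*

COPY tika-server-1.23.jar /opt/tika/tika-server.jar
# The build fails here if extras/ is missing from the build context.
COPY extras/ /opt/tika/lib/

# 9998 is Tika Server's default port.
EXPOSE 9998

# -cp with a wildcard puts every jar in lib/ on the classpath (points 3 and 4);
# -spawnChild starts the server in a child process watched by the parent (point 1);
# -s includes stack traces in error responses (point 2).
ENTRYPOINT ["java", "-cp", "/opt/tika/tika-server.jar:/opt/tika/lib/*", "org.apache.tika.server.TikaServerCli", "-h", "0.0.0.0", "-spawnChild", "-s"]
```

Because the ENTRYPOINT uses the exec form, no shell expands the `*`; the JVM itself interprets the classpath wildcard.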

Anything else?


