[jira] Created: (TIKA-340) Provide full Tika bundle

[jira] Created: (TIKA-340) Provide full Tika bundle

JIRA jira@apache.org
Provide full Tika bundle
------------------------

                 Key: TIKA-340
                 URL: https://issues.apache.org/jira/browse/TIKA-340
             Project: Tika
          Issue Type: New Feature
    Affects Versions: 0.5
            Reporter: Felix Meschberger
             Fix For: 0.6


To easily deploy Tika, and especially the Tika parsers, it would be convenient to have an almost complete bundle consisting of Tika Core, Tika Parsers and the most important parser dependencies. Any remaining dependencies not included in the bundle should be declared as optional imports, so that bundle resolution does not fail if one or more (or all) of them are missing.
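A consequence of optional imports (the `resolution:=optional` directive on `Import-Package`) is that code in the bundle may find a package missing at runtime and has to cope with that. The following is only a minimal sketch of such a check, not code from the patch; the class name `OptionalDependencyCheck` and the missing class name are made up for illustration.

```java
public class OptionalDependencyCheck {

    /** Returns true if the named class is visible to this class's class loader. */
    public static boolean isAvailable(String className) {
        try {
            // initialize=false: only test for presence, do not run static initializers
            Class.forName(className, false, OptionalDependencyCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            // The optional import was not wired, so the feature must be skipped
            return false;
        }
    }

    public static void main(String[] args) {
        // java.lang.String is always present; the second name is a hypothetical placeholder
        System.out.println(isAvailable("java.lang.String"));
        System.out.println(isAvailable("org.example.missing.OptionalParserLibrary"));
    }
}
```

A parser that depends on an optional library could run such a check once and disable itself when the library is absent.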

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (TIKA-340) Provide full Tika bundle

JIRA jira@apache.org

     [ https://issues.apache.org/jira/browse/TIKA-340?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Felix Meschberger updated TIKA-340:
-----------------------------------

    Attachment: TIKA-340.patch

Patch providing a tika-full project which creates an almost complete bundle of the dependencies.

The dependencies included are derived from the included dependencies of the Tika App project but omit some embeddings, which should be shared in the framework (most importantly the XML oriented APIs like W3C DOM and SAX).


[jira] Commented: (TIKA-340) Provide full Tika bundle

JIRA jira@apache.org
In reply to this post by JIRA jira@apache.org

    [ https://issues.apache.org/jira/browse/TIKA-340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12784176#action_12784176 ]

Felix Meschberger commented on TIKA-340:
----------------------------------------

Note: On my machine, I had to increase the Java heap size to prevent the build from aborting with an OutOfMemoryError. I set the maximum heap size to 512MB and the build ultimately used 400MB.


[jira] Updated: (TIKA-340) Provide full Tika bundle

JIRA jira@apache.org
In reply to this post by JIRA jira@apache.org

     [ https://issues.apache.org/jira/browse/TIKA-340?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jukka Zitting updated TIKA-340:
-------------------------------

    Component/s: packaging
       Assignee: Jukka Zitting

Excellent, thanks! I committed the patch in revision 885807.

Would you mind if we rather called the package tika-bundle or tika-osgi instead of tika-full? I think that would make it easier to distinguish between this package and the tika-app jar that's also a "full" package.

Some further improvements would be to automatically wire all logging to the OSGi log service and to make it possible for Tika to automatically leverage Parser implementations from other bundles.


[jira] Commented: (TIKA-340) Provide full Tika bundle

JIRA jira@apache.org
In reply to this post by JIRA jira@apache.org

    [ https://issues.apache.org/jira/browse/TIKA-340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12784283#action_12784283 ]

Jukka Zitting commented on TIKA-340:
------------------------------------

Is there a particular reason why you configured some of the dependencies to be inlined and some to be just included as embedded jar files? Unless there's a good reason not to do so, I'd go for a fully inlined package to keep the jar structure simpler.


[jira] Commented: (TIKA-340) Provide full Tika bundle

JIRA jira@apache.org
In reply to this post by JIRA jira@apache.org

    [ https://issues.apache.org/jira/browse/TIKA-340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12784648#action_12784648 ]

Felix Meschberger commented on TIKA-340:
----------------------------------------

Thanks for committing.

> Would you mind if we rather called the package tika-bundle or tika-osgi instead of tika-full?

Not at all ...

> Some further improvements would be to automatically wire all logging to the OSGi log service

Well, the bundle as it currently stands has imports for Log4J and Commons Logging. Both APIs are generally available from some logging support bundle, for example the Sling Log Service implementation or PAX logging. I am not sure whether it is worth trying to converge the logging approaches into the OSGi LogService in the Tika bundle itself.

> some of the dependencies to be inlined

Generally I have come to prefer embedding JAR files. This makes it a lot easier to inspect the JAR files and AFAICT has no drawbacks for usability in an OSGi environment. I have inlined one JAR file because I had to exclude an incomplete org.w3c.dom package, which would have caused resolution issues.

OTOH if you deem the jar file useful in general, that is, in non-OSGi environments, it would probably make perfect sense to inline the embedded libraries. In that case, though, the name of the library should probably not contain the words "osgi" or "bundle". WDYT?


[jira] Updated: (TIKA-340) Provide full Tika bundle

JIRA jira@apache.org
In reply to this post by JIRA jira@apache.org

     [ https://issues.apache.org/jira/browse/TIKA-340?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Felix Meschberger updated TIKA-340:
-----------------------------------

    Attachment: TIKA-340-2.patch

Here is a patch against trunk inlining all previously embedded libraries.

Interestingly, the jar file now grows from 20MB to 25MB ... (well, my gut feeling is that both sizes are horrendous given the task at hand; but that is probably another story ;-) )


[jira] Commented: (TIKA-340) Provide full Tika bundle

JIRA jira@apache.org
In reply to this post by JIRA jira@apache.org

    [ https://issues.apache.org/jira/browse/TIKA-340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12784761#action_12784761 ]

Jukka Zitting commented on TIKA-340:
------------------------------------

Re: logging; AFAIUI using the OSGi log service directly makes it possible for the log backend to sort out log messages based on the bundle from which they originated. That doesn't seem possible if we just depend on a support bundle that exposes the commons-logging API.

Re: size; Yep, that's another story. See http://jukkaz.wordpress.com/2009/10/16/putting-poi-on-a-diet/ for the gory details.

Re: inlining; The double compression of embedded jars explains the size difference you're seeing. That double compression seems a bit troublesome to me given the large number of non-class resources (PDF font mapping data, OOXML schemas, etc.) we have there. Ideally the classloader should be able to load such resources on demand without having to uncompress the entire archive. But I guess OSGi runtimes may already avoid that problem in similar ways as servlet containers do with embedded jars in WEB-INF/lib.
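One plausible contributor to that size difference can be sketched with plain java.util.zip: when an archive of many small entries is stored as a single compressed entry of an outer archive, the repetitive per-entry headers and names of the inner archive are compressed as well, which does not happen when the entries are inlined. This is an illustrative experiment only; the entry names, contents and the class name `JarSizeDemo` are invented and say nothing about the actual Tika jars.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

public class JarSizeDemo {

    /** Builds an archive with many small entries, as in an inlined bundle. */
    public static byte[] flatArchive(int entries) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ZipOutputStream zip = new ZipOutputStream(bytes)) {
            for (int i = 0; i < entries; i++) {
                // Long, similar names mimic generated schema resources
                String name = String.format("org/example/schemas/system/type%05d.xsb", i);
                zip.putNextEntry(new ZipEntry(name));
                zip.write(("schema fragment " + i).getBytes(StandardCharsets.UTF_8));
                zip.closeEntry();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    /** Wraps the flat archive as one compressed entry, as with an embedded jar. */
    public static byte[] nestedArchive(int entries) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ZipOutputStream zip = new ZipOutputStream(bytes)) {
            zip.putNextEntry(new ZipEntry("lib/schemas.jar"));
            zip.write(flatArchive(entries));
            zip.closeEntry();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) {
        System.out.println("inlined:  " + flatArchive(2000).length + " bytes");
        System.out.println("embedded: " + nestedArchive(2000).length + " bytes");
    }
}
```

With many tiny entries the nested form comes out smaller, because the outer compression pass also covers the inner archive's metadata.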


[jira] Commented: (TIKA-340) Provide full Tika bundle

JIRA jira@apache.org
In reply to this post by JIRA jira@apache.org

    [ https://issues.apache.org/jira/browse/TIKA-340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12784798#action_12784798 ]

Felix Meschberger commented on TIKA-340:
----------------------------------------

Re: logging:

Yes, that might be true -- still, there is some API to implement, and if you implement logging based on LogService you lose all the log categories previously used, because the OSGi LogService does not have such a concept.

Also, if you want to reuse the library in non-OSGi environments, using the LogService will not work, and it creates an OSGi dependency.
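To illustrate the category problem: a log API that accepts only a level and a message leaves no slot for the logger name, so an adapter would have to fold the category into the message text. The sketch below uses a hypothetical stand-in interface, `SimpleLogService`, modeled loosely on the `log(int level, String message)` method of the OSGi LogService; it is not the OSGi API itself and not the content of any patch on this issue.

```java
import java.util.ArrayList;
import java.util.List;

public class CategoryLoggingSketch {

    /** Hypothetical stand-in for a log service that knows only level and message. */
    public interface SimpleLogService {
        int LOG_ERROR = 1;
        int LOG_WARNING = 2;
        int LOG_INFO = 3;
        int LOG_DEBUG = 4;

        void log(int level, String message);
    }

    /** Adapter that preserves the category by prefixing it to the message. */
    public static final class CategoryLogger {
        private final SimpleLogService service;
        private final String category;

        public CategoryLogger(SimpleLogService service, String category) {
            this.service = service;
            this.category = category;
        }

        public void info(String message) {
            // The category survives only as text; the backend cannot filter on it directly
            service.log(SimpleLogService.LOG_INFO, "[" + category + "] " + message);
        }
    }

    /** Logs one message through the adapter and returns what the backend received. */
    public static List<String> demo() {
        List<String> sink = new ArrayList<>();
        SimpleLogService service = (level, message) -> sink.add(level + ":" + message);
        new CategoryLogger(service, "org.apache.tika.parser").info("parsing started");
        return sink;
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```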

Re: size:

I knew there is some activity in this area. Thanks for the pointer.

Re: inlining

Why is the double compression troublesome?

What the OSGi framework actually does is unpack the bundle jar. Embedded libraries are thus unpacked as jar files; that is, unpacking is not recursive. The regular classes are then loaded regularly, while the embedded JAR files are loaded through JAR URLs.

Thus in the end, it might even be better to embed the libraries than to inline them.

But this decision depends on whether you want to use the result of the build in a non-OSGi environment or not. If you only target OSGi frameworks, then I would go for embedded libraries. Otherwise I would go for inlined libraries at the expense of 20% of the size of the resulting JAR file.
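The non-recursive unpacking described above can be pictured with a small java.util.zip experiment: walking the entries of an outer archive yields the embedded jar as a single entry, and its contents are not visited. This is a sketch under invented names (`NonRecursiveUnpack`, `lib/helper.jar`), not a description of how any particular OSGi framework is implemented.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

public class NonRecursiveUnpack {

    /** Lists the entries of an archive without descending into nested jars. */
    public static List<String> topLevelEntries(byte[] archive) {
        List<String> names = new ArrayList<>();
        try (ZipInputStream in = new ZipInputStream(new ByteArrayInputStream(archive))) {
            for (ZipEntry e = in.getNextEntry(); e != null; e = in.getNextEntry()) {
                names.add(e.getName());
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return names;
    }

    /** Builds a tiny bundle with one class entry and one embedded jar, then lists it. */
    public static List<String> demo() {
        try {
            // Inner archive standing in for an embedded library
            ByteArrayOutputStream inner = new ByteArrayOutputStream();
            try (ZipOutputStream out = new ZipOutputStream(inner)) {
                out.putNextEntry(new ZipEntry("org/example/lib/Helper.class"));
                out.write(new byte[] {1, 2, 3});
                out.closeEntry();
            }
            // Outer archive standing in for the bundle jar
            ByteArrayOutputStream outer = new ByteArrayOutputStream();
            try (ZipOutputStream out = new ZipOutputStream(outer)) {
                out.putNextEntry(new ZipEntry("org/example/Main.class"));
                out.write(new byte[] {4, 5, 6});
                out.closeEntry();
                out.putNextEntry(new ZipEntry("lib/helper.jar"));
                out.write(inner.toByteArray());
                out.closeEntry();
            }
            return topLevelEntries(outer.toByteArray());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```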


[jira] Commented: (TIKA-340) Provide full Tika bundle

JIRA jira@apache.org
In reply to this post by JIRA jira@apache.org

    [ https://issues.apache.org/jira/browse/TIKA-340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12784839#action_12784839 ]

Ken Krugler commented on TIKA-340:
----------------------------------

Funny, I was just looking at the size of the Hadoop job jar I generate for Bixo. It was suddenly 26MB, and pushing it up to EC2 was taking a long time.

As Jukka's blog post says, it's all about the ooxml-schemas-1.0.jar file - almost 14MB. And the 2.5MB xmlbeans-2.3.0.jar that this schema jar depends on. Excluding POI would cut about 18MB from my 26MB, which I might need to do (as an option for a smaller build).


[jira] Commented: (TIKA-340) Provide full Tika bundle

JIRA jira@apache.org
In reply to this post by JIRA jira@apache.org

    [ https://issues.apache.org/jira/browse/TIKA-340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12784855#action_12784855 ]

Andrzej Bialecki  commented on TIKA-340:
----------------------------------------

The vast majority of classes in these JARs are never used. Perhaps one of the steps in preparing this bundle could be to pass it through a code shrinker, such as Proguard (http://proguard.sourceforge.net) - not to obfuscate it, but simply to remove unused cruft.


[jira] Commented: (TIKA-340) Provide full Tika bundle

JIRA jira@apache.org
In reply to this post by JIRA jira@apache.org

    [ https://issues.apache.org/jira/browse/TIKA-340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12784860#action_12784860 ]

Felix Meschberger commented on TIKA-340:
----------------------------------------

Interesting idea; there is even a Maven plugin at http://pyx4me.com/pyx4me-maven-plugins/proguard-maven-plugin/


[jira] Updated: (TIKA-340) Provide full Tika bundle

JIRA jira@apache.org
In reply to this post by JIRA jira@apache.org

     [ https://issues.apache.org/jira/browse/TIKA-340?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jukka Zitting updated TIKA-340:
-------------------------------

    Attachment: osgi-logging.patch

OK, you have a better view of the best practices for logging with OSGi. See the attached osgi-logging.patch for a quick and dirty experiment of what we could do if we did want to directly use the OSGi log service.

If the OSGi runtimes already unpack the bundle jar, then I have no problem with the embedded jars. Could we even avoid inlining the tika-core and tika-parsers jars, or is that something that's needed for the Export-Package rules to work? If the latter, can we exclude org.apache.tika.parser subpackages from being exported so that only tika-core gets inlined?

Let's take the size issue to tika-dev@.


[jira] Resolved: (TIKA-340) Provide full Tika bundle

JIRA jira@apache.org
In reply to this post by JIRA jira@apache.org

     [ https://issues.apache.org/jira/browse/TIKA-340?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jukka Zitting resolved TIKA-340.
--------------------------------

    Resolution: Fixed

I've renamed the bundle to tika-bundle. With that I think we have all the basics in place so I'm resolving this issue as Fixed. Thanks for the contribution!

Let's use followup issues to track the further improvements/features that have already been mentioned.


[jira] Commented: (TIKA-340) Provide full Tika bundle

JIRA jira@apache.org
In reply to this post by JIRA jira@apache.org

    [ https://issues.apache.org/jira/browse/TIKA-340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12785204#action_12785204 ]

Felix Meschberger commented on TIKA-340:
----------------------------------------

> Could we even avoid inlining the tika-core and tika-parsers jars

Partially and theoretically, yes.

Unfortunately, the <Export-Package> element causes the exported packages to be inlined. So whatever we export is being inlined.

Now, what to export: I cannot really tell.... Let's take this to tika-dev@, too (and a new issue)


[jira] Issue Comment Edited: (TIKA-340) Provide full Tika bundle

JIRA jira@apache.org
In reply to this post by JIRA jira@apache.org

    [ https://issues.apache.org/jira/browse/TIKA-340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12785204#action_12785204 ]

Felix Meschberger edited comment on TIKA-340 at 12/3/09 7:26 AM:
-----------------------------------------------------------------

> Could we even avoid inlining the tika-core and tika-parsers jars

Yes, I will create a new issue with patch.

      was (Author: fmeschbe):
    > Could we even avoid inlining the tika-core and tika-parsers jars

Partially and theoretically, yes.

Unfortunately, the <Export-Package> element causes the exported packages to be inlined. So whatever we export is being inlined.

Now, what to export: I cannot really tell.... Let's take this to tika-dev@, too (and a new issue)
 
