classloading problems with Xerces

classloading problems with Xerces

Daan de Wit
Hi,

We tried to integrate Tika into our product in place of our own
parsing library. All goes well except for one problem: we use an OSGi
environment, and the Xerces library used by NekoHTML is causing us real
classloading problems. So we decided to ditch NekoHTML and use
HTMLParser [1] instead. HTMLParser's SAX implementation has some bugs
though, so we subclassed it in Tika's HtmlParser class. If there is any
interest, I can create a JIRA issue and attach the patch there.
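
For anyone who would rather keep NekoHTML, a workaround often tried for
this kind of problem in OSGi is to swap the thread context class loader
around the parse call, so that the XML factory lookup sees the classes of
the bundle that packages Xerces. The following is only a minimal sketch
under that assumption and is untested against Tika; the class and method
names (ContextLoaderSwap, callWith) are hypothetical and not part of any
library.

```java
import java.util.concurrent.Callable;

// Hypothetical helper, not part of Tika, NekoHTML or Xerces.
class ContextLoaderSwap {

    /**
     * Runs the task with the given loader installed as the thread context
     * class loader, and always restores the previous loader afterwards.
     */
    static <T> T callWith(ClassLoader loader, Callable<T> task) throws Exception {
        Thread current = Thread.currentThread();
        ClassLoader previous = current.getContextClassLoader();
        current.setContextClassLoader(loader);
        try {
            return task.call();
        } finally {
            // Restore even if the task throws, so other code is unaffected.
            current.setContextClassLoader(previous);
        }
    }
}
```

The loader to pass in would be the class loader of a class that lives in
the bundle containing Xerces, with the parse call wrapped in the task.
Whether this is enough depends on how the bundles import and export the
XML packages.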

Another minor problem we encountered is that the tests cannot be run
without first copying the contents of src/main/resources to
src/main/resources/org/apache/tika.

Daan

RE: classloading problems with Xerces

Daan de Wit
As a side note, we only have the classloading problems when running on
Java 5. Java 6 works just fine, so it seems the Java XML library has
changed its implementation-loading mechanism.
Also, I forgot to include the link to HTMLParser, so here it is.

[1] htmlparser.sourceforge.net
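
To see which parser implementation the lookup actually resolves on a
given Java version, a small diagnostic along these lines may help. It is
a sketch, not a fix: it relies only on the standard JAXP call
SAXParserFactory.newInstance(), which consults the
javax.xml.parsers.SAXParserFactory system property and
META-INF/services entries before falling back to the built-in default.
The class name JaxpLookupCheck is made up for this example.

```java
import javax.xml.parsers.SAXParserFactory;

// Hypothetical diagnostic class, not part of Tika.
class JaxpLookupCheck {

    /** Describes the SAXParserFactory that JAXP resolves in the current context. */
    static String describeFactory() {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        ClassLoader loader = factory.getClass().getClassLoader();
        // A null loader means the class came from the bootstrap class loader,
        // which is the case for the parser bundled with the JDK.
        String loaderName = (loader == null) ? "bootstrap" : loader.toString();
        return factory.getClass().getName() + " loaded by " + loaderName;
    }

    public static void main(String[] args) {
        System.out.println("system property: "
                + System.getProperty("javax.xml.parsers.SAXParserFactory"));
        System.out.println("context loader:  "
                + Thread.currentThread().getContextClassLoader());
        System.out.println("factory:         " + describeFactory());
    }
}
```

Running it inside the bundle on both Java 5 and Java 6 and comparing the
output should show whether the two versions pick different factories or
load them through different class loaders.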

> -----Original Message-----
> From: Daan de Wit [mailto:[hidden email]]
> Sent: Monday, 23 March 2009 10:42
> To: [hidden email]
> Subject: classloading problems with Xerces