Registering a local dtd file for use with Digester

Mike O'Leary
I have a collection of XML files that I would like to parse using Digester
in order to index them for Lucene. A DTD file has been supplied for the XML
files, but none of those files has a <!DOCTYPE ...> line associating them
with the DTD file. Can the Digester's register function be used to tell it
to use that DTD file for such things as entity resolution? If so, how do I
do it? I don't understand how to specify a pathname for a local file in
terms of a publicId and an entityURL. If register can't be used for this
purpose, is there another way to do it? Thanks.

Mike

Re: Registering a local dtd file for use with Digester

steve_rowe
Hi Mike,

> I have a collection of XML files that I would like to parse using Digester
> in order to index them for Lucene. A DTD file has been supplied for the XML
> files, but none of those files has a <!DOCTYPE ...> line associating them
> with the DTD file. Can the Digester's register function be used to tell it
> to use that DTD file for such things as entity resolution? If so, how do I
> do it? I don't understand how to specify a pathname for a local file in
> terms of a publicId and an entityURL. If register can't be used for this
> purpose, is there another way to do it? Thanks.

Your issue will almost certainly be better addressed in a Digester forum
- your problem has nothing to do with Lucene.

A hint: it looks like you can create a Digester instance with an
externally created SAX parser[1], on which you can set the entity
resolver to an extended DefaultHandler2[2] (Java 1.5) which overrides
the getExternalSubset() method (specified by the EntityResolver2
interface[3]) to return an InputSource containing your desired DTD.

Something like (warning - untested; stolen in part from the Digester
FAQ[1]):

  SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
  Digester digester = new Digester(parser);
  // Digester appears to install its own entity resolver on the XMLReader
  // when it parses, so register the resolver on the Digester itself.
  digester.setEntityResolver(new DefaultHandler2() {
    @Override
    public InputSource getExternalSubset(String name, String baseURI) {
      return new InputSource(/* put your DTD here */);
    }
  });
  // add digester rules here
  // Digester is the content handler, so let it drive the parse.
  digester.parse(/* put your input document here */);
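
To check the getExternalSubset() mechanism on its own, without Digester, here is a minimal sketch that uses only the JDK's SAX parser. The class name, the one-line DTD string, and the "greeting" entity are made up for illustration; a real DTD would be read from the supplied file instead. It assumes the parser honours EntityResolver2, which I believe the JDK's bundled parser does by default.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.ext.DefaultHandler2;

public class ExternalSubsetSketch {

    // Hypothetical stand-in for the supplied DTD file. A real DTD could be
    // supplied with new InputSource(new FileInputStream(pathToDtd)).
    static final String DTD = "<!ENTITY greeting \"hello\">";

    /** Parses xml (which has no DOCTYPE) and returns its character data. */
    static String parseText(String xml) throws Exception {
        final StringBuilder text = new StringBuilder();
        DefaultHandler2 handler = new DefaultHandler2() {
            @Override
            public InputSource getExternalSubset(String name, String baseURI) {
                // Asked for by the parser when the document has no DOCTYPE.
                return new InputSource(new StringReader(DTD));
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                // Collect text content, including expanded entities.
                text.append(ch, start, length);
            }
        };
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        // SAXParser.parse registers the handler as the entity resolver too.
        parser.parse(new InputSource(new StringReader(xml)), handler);
        return text.toString();
    }

    public static void main(String[] args) throws Exception {
        // The document references an entity that only the DTD declares.
        System.out.println(parseText("<doc>&greeting; world</doc>"));
    }
}
```

If the entity reference resolves here, the same DefaultHandler2 subclass should be usable as the resolver passed to the Digester.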

Hope it helps,
Steve

[1] Digester FAQ (instantiating Digester with an external SAX parser):
<http://wiki.apache.org/jakarta-commons/Digester/FAQ#head-8ac8fa70e2db185845fadec56785cd53eab8d3f9>

[2] DefaultHandler2 (enables external DTD resolution with no DOCTYPE in
the XML document):
<http://java.sun.com/j2se/1.5.0/docs/api/org/xml/sax/ext/DefaultHandler2.html>

[3] EntityResolver2 (implemented by DefaultHandler2):
<http://java.sun.com/j2se/1.5.0/docs/api/org/xml/sax/ext/EntityResolver2.html>
