FW: pdf to xml

Richard Braman


-----Original Message-----
From: Mark D. Anderson [mailto:[hidden email]]
Sent: Tuesday, February 28, 2006 4:40 PM
To: [hidden email]
Subject: Re: pdf to xml


Well, I began to dislike that particular XML format, so I started drafting
up a real spec:
 
http://discerning.com/hacks/docutils/pdf2xml/draft-mda-docformats-pdf-as-xml-00.html

But then I concluded that I really needed to address the use of XML
namespaces, and to have two distinct namespaces: one for the content
instructions and another for the PDF tree structure.
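
To make the two-namespace idea concrete, here is a rough sketch in Python using the standard xml.etree.ElementTree module. It is not taken from the draft spec: the namespace URIs, the prefixes, and the element names are all hypothetical, chosen only to show tree-structure elements and content-stream instructions living in separate namespaces.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace URIs, invented for this illustration only.
PDF_NS = "urn:example:pdf-structure"   # PDF tree structure (pages, streams)
OPS_NS = "urn:example:pdf-content"     # content-stream instructions

# Ask ElementTree to use readable prefixes when serializing.
ET.register_namespace("pdf", PDF_NS)
ET.register_namespace("op", OPS_NS)


def build_page():
    """Build a tiny page whose structure and content use different namespaces."""
    # Structure elements (hypothetical names) go in the structure namespace.
    page = ET.Element(f"{{{PDF_NS}}}page")
    contents = ET.SubElement(page, f"{{{PDF_NS}}}contents")
    # Content instructions go in the other namespace. BT and Tj are real PDF
    # text operators; mapping them to elements this way is an assumption.
    text_object = ET.SubElement(contents, f"{{{OPS_NS}}}BT")
    show_text = ET.SubElement(text_object, f"{{{OPS_NS}}}Tj")
    show_text.text = "Hello"
    return page


page = build_page()
xml_text = ET.tostring(page, encoding="unicode")

# Select only the content instructions by filtering on the namespace.
ops = [el.tag for el in page.iter() if el.tag.startswith(f"{{{OPS_NS}}}")]
```

With the vocabularies split like this, a consumer that only cares about document structure can skip everything in the content namespace, and vice versa.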

I also explored using Perl instead of PDFBox:
  http://discerning.com/hacks/docutils/pdf2xml.pl
When I next tackle this (and I don't know when the itch will come
again), I'll use either that or Multivalent, not PDFBox, which has
too much class/interface fluff to make this convenient.

-mda

On Tue, 28 Feb 2006 10:07:07 -0500, "Richard Braman"
<[hidden email]> said:

> Any updates on this?
>  
> http://discerning.com/hacks/docutils/pdf2xml/readme.html
>  
>  
>
> Richard Braman
> mailto:[hidden email]
> 561.748.4002 (voice)
>
> http://www.taxcodesoftware.org/
> Free Open Source Tax Software