XHTMLContentHandler xhtml = new XHTMLContentHandler(handler, metadata);
Based on my findings, it seems that casting the InputStream to TikaInputStream is failing.
So the tstream variable becomes null, which results in an error.
I'm not sure what's going wrong here, as I made my parser similar to the PDF parser.
Any help please?
Also, I'm not sure whether
File f = tstream.getFile();
Process ps = Runtime.getRuntime().exec("/hwp2xml.bin", null, f);
new XMLParser().parse(ps.getInputStream(), handler, metadata, context);
> Hi Nick, sorry to bother again but I'm not quite sure of what you have
> Nick Burch-2 wrote
> > On Tue, 31 Jul 2012, 122jxgcn wrote:
> > If your TikaInputStream lacks a file, and getFile is called, one will
> > automatically be created for you. (That's part of the point!)
> I believe the created file will be empty. Then how can I process the input
> without its data?
It will not be empty. It seems there is some misunderstanding here. Of
course an InputStream from getResourceAsStream has no backing file (or the
file is not easily reachable). The main idea behind TikaInputStream is to
provide the file on request. If hasFile() returns false, TikaInputStream
will do the following when you call getFile():
- create temporary file
- copy the whole stream to the temporary file
After that you can process the contents. If the InputStream passed to
TikaInputStream already has a backing file, getFile() will return it
directly, but in most cases it will create a temporary one and copy the
contents into it. Because of this it's always better to make your parser work
on an InputStream and only use a file if the parser cannot (e.g. because it
needs random access).
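
To make the two steps above concrete, here is a minimal sketch using only the
JDK. It is not Tika's actual implementation, just an emulation of the
behaviour described (create a temporary file, copy the whole stream into it).
The class name SpoolToFile and the method name spool are made up for this
illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.File;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical helper, NOT part of Tika: it only mimics the described steps.
class SpoolToFile {

    // Copies everything that is left in the stream into a new temporary file.
    static File spool(InputStream in) throws IOException {
        // Step 1: create the temporary file (it exists, but is empty for now).
        Path tmp = Files.createTempFile("spool-", ".tmp");
        // Step 2: copy the whole stream into it, replacing the empty file.
        Files.copy(in, tmp, StandardCopyOption.REPLACE_EXISTING);
        File file = tmp.toFile();
        // Ask the JVM to clean up on exit; real code may want explicit cleanup.
        file.deleteOnExit();
        return file;
    }

    public static void main(String[] args) throws IOException {
        // A stream with no backing file, comparable to a resource stream.
        InputStream in = new ByteArrayInputStream(
                "hello".getBytes(StandardCharsets.UTF_8));
        File f = spool(in);
        System.out.println(f.getAbsolutePath() + " holds " + f.length() + " bytes");
    }
}
```

The point is that the file is filled from the stream before it is handed
back, so it is not empty. The price is a full copy of the input, which is why
a parser that can work on the stream directly should do so.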
> So basically, my file is converted to InputStream by
> InputStream stream = HWPParserTest.class.getResourceAsStream(
> After that, InputStream stream is passed to parser() of HWPParser and it
> be converted to TikaInputStream tstream without the loss of input file
> I'm currently doing
> TikaInputStream tstream = TikaInputStream.get(stream);
> right now.
> I believe tstream.hasFile() should true right away in order to my parser
No, hasFile() only tells you if the wrapped InputStream has a backing file;
for resource streams this is not the case. If you call getFile() it will
emulate a backing file by copying to a temporary one. After that the stream
> Thanks a lot.
> View this message in context: http://lucene.472066.n3.nabble.com/Custom-parser-error-tp3998302p3998536.html
> Sent from the Apache Tika - Development mailing list archive at