Help with XmlPullParserException

Phillip Farber
Hello all,

I'm indexing a body of OCR and encountered this exception. Apparently
it's some kind of XML parser error.  Out of thousands of documents,
which I create with significant processing to make sure they are XML
compliant, only this one appears to have a problem.  But can anyone tell
me what this specific error message means?


SEVERE: org.xmlpull.v1.XmlPullParserException: character reference (with
decimal value) may not contain a (position: START_TAG seen ...dieses aus
dem \nZusammenbestehen der Gleichungen \n\naajj2 -)- 2&#1a... @21781:16)


Thanks!

Phil

==========================

Full trace:

SEVERE: org.xmlpull.v1.XmlPullParserException: character reference (with decimal value) may not contain a (position: START_TAG seen ...dieses aus dem \nZusammenbestehen der Gleichungen \n\naajj2 -)- 2&#1a... @21781:16)
        at org.xmlpull.mxp1.MXParser.parseEntityRef(MXParser.java:2195)
        at org.xmlpull.mxp1.MXParser.nextImpl(MXParser.java:1275)
        at org.xmlpull.mxp1.MXParser.next(MXParser.java:1093)
        at org.xmlpull.mxp1.MXParser.nextText(MXParser.java:1058)
        at org.apache.solr.handler.XmlUpdateRequestHandler.readDoc(XmlUpdateRequestHandler.java:332)
        at org.apache.solr.handler.XmlUpdateRequestHandler.update(XmlUpdateRequestHandler.java:162)
        at org.apache.solr.handler.XmlUpdateRequestHandler.handleRequestBody(XmlUpdateRequestHandler.java:84)
        at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:77)
        at org.apache.solr.core.SolrCore.execute(SolrCore.java:658)
        at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:191)
        at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:159)
        at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1089)
        at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:365)
        at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
        at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:181)
        at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:712)
        at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:405)
        at org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:211)
        at org.mortbay.jetty.handler.HandlerCollection.handle(HandlerCollection.java:114)
        at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:139)
        at org.mortbay.jetty.Server.handle(Server.java:285)
        at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:502)
        at org.mortbay.jetty.HttpConnection$RequestHandler.content(HttpConnection.java:835)
        at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:641)
        at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:202)
        at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:378)
        at org.mortbay.jetty.bio.SocketConnector$Connection.run(SocketConnector.java:226)
        at org.mortbay.thread.BoundedThreadPool$PoolThread.run(BoundedThreadPool.java:442)

Re: Help with XmlPullParserException

Phillip Farber
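I just looked at this again and I think the message is referring to the
garbage string of characters "2&#1a".  The "&#" makes the parser expect a
decimal numeric character reference, which may contain only the digits
0-9 and must end with ';', but here the '1' is followed by the letter
'a', which is only valid as a hex digit (in the "&#x...;" form).  So the
trailing "a" in "may not contain a" appears to be the offending
character itself.  I'll have to go back to my OCR cleanup routine ...
Thanks for reading.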

Phil
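
For anyone hitting the same error: below is a minimal sketch of one way an
OCR cleanup step could neutralize stray ampersands such as the "&#1a" above
before the text is sent to Solr. It is not the routine used in this thread.
The function names (escape_stray_ampersands, _is_xml_char) are hypothetical,
and it assumes the update document declares no DTD, so only the five
predefined XML entities are treated as valid named references. Numeric
references are kept only when they are syntactically complete and point at a
character that XML 1.0 allows.

```python
import re

# A syntactically complete reference: one of the five predefined entities,
# a decimal reference (&#NNN;) or a hexadecimal reference (&#xHHH;).
_REF = re.compile(
    r"&(?:(?P<name>amp|lt|gt|quot|apos)"
    r"|#(?P<dec>[0-9]+)"
    r"|#x(?P<hex>[0-9A-Fa-f]+));"
)


def _is_xml_char(cp: int) -> bool:
    """True if the code point matches the XML 1.0 'Char' production."""
    return (
        cp in (0x9, 0xA, 0xD)
        or 0x20 <= cp <= 0xD7FF
        or 0xE000 <= cp <= 0xFFFD
        or 0x10000 <= cp <= 0x10FFFF
    )


def _ref_is_valid(match) -> bool:
    """Check that a matched reference resolves to a legal XML character."""
    if match.group("name"):
        return True
    if match.group("dec"):
        return _is_xml_char(int(match.group("dec")))
    return _is_xml_char(int(match.group("hex"), 16))


def escape_stray_ampersands(text: str) -> str:
    """Replace each '&' that does not start a valid reference with '&amp;'."""
    out = []
    pos = 0
    while True:
        i = text.find("&", pos)
        if i == -1:
            out.append(text[pos:])
            break
        out.append(text[pos:i])
        m = _REF.match(text, i)
        if m and _ref_is_valid(m):
            # Keep a well-formed reference untouched.
            out.append(m.group(0))
            pos = m.end()
        else:
            # Stray '&' (e.g. OCR noise): escape it, keep the rest as text.
            out.append("&amp;")
            pos = i + 1
    return "".join(out)


print(escape_stray_ampersands("naajj2 -)- 2&#1a"))  # naajj2 -)- 2&amp;#1a
```

This only deals with ampersands. Raw '<' characters and literal control
characters in the OCR text would still need their own handling.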
