exception in rendering /select XML

Erik Hatcher
I've just indexed a handful of scholarly objects, which include some  
international characters.  I may have done something wrong with the  
XML I sent to add the documents (though no errors appeared then), or  
perhaps there is some issue with Solr's XML serialization.  I haven't  
had a chance to look into it further yet, but wanted to post here in  
case anyone has seen this and solved it already or can confirm that  
it's an issue.

Thanks,
        Erik



Apr 11, 2006 11:13:25 PM org.apache.solr.core.SolrException log
SEVERE: java.lang.IndexOutOfBoundsException
         at sun.nio.cs.StreamEncoder.write(StreamEncoder.java:132)
         at java.io.OutputStreamWriter.write(OutputStreamWriter.java:191)
         at org.mortbay.jetty.HttpConnection$OutputWriter.write(HttpConnection.java:976)
         at java.io.PrintWriter.write(PrintWriter.java:384)
         at java.io.PrintWriter.write(PrintWriter.java:401)
         at org.apache.solr.util.XML.escapeCharData(XML.java:100)
         at org.apache.solr.request.XMLWriter.writePrim(XMLWriter.java:609)
         at org.apache.solr.request.XMLWriter.writeStr(XMLWriter.java:479)
         at org.apache.solr.schema.TextField.write(TextField.java:41)
         at org.apache.solr.schema.SchemaField.write(SchemaField.java:96)
         at org.apache.solr.request.XMLWriter.writeDoc(XMLWriter.java:282)
         at org.apache.solr.request.XMLWriter.writeDocList(XMLWriter.java:347)
         at org.apache.solr.request.XMLWriter.writeVal(XMLWriter.java:386)
         at org.apache.solr.request.XMLWriter.writeResponse(XMLWriter.java:106)
         at org.apache.solr.request.XMLResponseWriter.write(XMLResponseWriter.java:29)
         at org.apache.solr.servlet.SolrServlet.doGet(SolrServlet.java:75)
         at javax.servlet.http.HttpServlet.service(HttpServlet.java:747)
         at javax.servlet.http.HttpServlet.service(HttpServlet.java:860)
         at org.mortbay.jetty.servlet.ServletHolder.handle(ServletHolder.java:408)
         at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:350)
         at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:195)
         at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:164)
         at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:536)
         at org.mortbay.jetty.Server.handle(Server.java:309)
         at org.mortbay.jetty.Server.handle(Server.java:285)
         at org.mortbay.jetty.HttpConnection.doHandler(HttpConnection.java:363)
         at org.mortbay.jetty.HttpConnection.access$1600(HttpConnection.java:45)
         at org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:609)
         at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:490)
         at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:195)
         at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:297)
         at org.mortbay.jetty.nio.SelectChannelConnector$HttpEndPoint.run(SelectChannelConnector.java:680)
         at org.mortbay.thread.BoundedThreadPool$PoolThread.run(BoundedThreadPool.java:412)



Re: exception in rendering /select XML

Yonik Seeley
It could be a bug in the XML serialization.
Is there a way to find out what string is being written (perhaps by
modifying the code to catch that particular exception and display the
string)?
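
As a rough illustration of that suggestion, the write could be wrapped so the offending string is reported when the exception fires. This is only a sketch: `DebugWrite`, `writeLogged`, and `dumpCodeUnits` are hypothetical names, not part of Solr, and the real call site in XML.java may look different.

```java
import java.io.IOException;
import java.io.Writer;

// Hypothetical debugging helper (not part of Solr): reports the string being
// written if the underlying writer throws IndexOutOfBoundsException.
class DebugWrite {

    // Renders every UTF-16 code unit as U+XXXX so corrupt chars are visible
    // even when the console cannot display them.
    static String dumpCodeUnits(String s) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            if (i > 0) {
                sb.append(' ');
            }
            sb.append(String.format("U+%04X", (int) s.charAt(i)));
        }
        return sb.toString();
    }

    // Same behavior as out.write(str), plus a diagnostic line on failure.
    static void writeLogged(Writer out, String str) throws IOException {
        try {
            out.write(str);
        } catch (IndexOutOfBoundsException e) {
            System.err.println("write failed, length=" + str.length()
                    + ", code units: " + dumpCodeUnits(str));
            throw e; // rethrow so the original stack trace is preserved
        }
    }
}
```

Dumping code units rather than the raw text keeps the output readable if the string really is corrupt.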

The weird thing is that the last Solr line in the trace is
org.apache.solr.util.XML.escapeCharData(XML.java:100)

99    if (start==0) {
100      out.write(str);

So Solr is writing a complete String to the stream (no chance to add a
bad offset or length).
It looks like it could be a Jetty bug... The easiest thing to try next
might be upgrading to the latest version of Jetty or Tomcat.

-Yonik


Re: exception in rendering /select XML

Erik Hatcher

On Apr 11, 2006, at 11:56 PM, Yonik Seeley wrote:
> It could be a bug in the XML serialization.
> Is there a way to find out what string is being written (perhaps
> modify the code to catch that particular exception and display the
> string)

I know it's a bunch of text I culled from pages like this:

        <http://www.purl.org/swinburnearchive/txt/aicatlnt00>

(it'll redirect)

> The weird thing is that the last Solr line in the trace is
> org.apache.solr.util.XML.escapeCharData(XML.java:100)
>
> 99    if (start==0) {
> 100      out.write(str);
>
> So Solr is writing a complete String to the stream (no chance to add a
> bad offset or length).
> It looks like it could be a Jetty bug... The easiest thing to try next
> might be upgrading to the latest version of Jetty or Tomcat.

I actually saw the stack trace in the partial not-well-formed XML  
response on the client as well, if that bit of trivia is useful.

I'll try out those suggestions when I can.

Thanks,
        Erik


Re: exception in rendering /select XML

Chris Hostetter

: > Is there a way to find out what string is being written (perhaps
: > modify the code to catch that particular exception and display the
: > string)
:
: I know it's a bunch of text I culled from pages like this:
:
: <http://www.purl.org/swinburnearchive/txt/aicatlnt00>
:
: (it'll redirect)

I got a flat 404.

To pinpoint the exact text, I would start by changing the start/rows
params so that you get one doc at a time until you find one that causes
the error, then change your fl to just be the id and one other field,
and try each of the field names until you find the one with the data that
caused the problem.
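
A minimal sketch of that two-step narrowing, expressed as the request URLs one might step through. The base URL, the query `swinburne`, and the field names `id` and `text` are assumptions for illustration; only the `q`, `start`, `rows`, and `fl` parameters come from the description above.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Builds the /select URLs for narrowing down a failing document and field.
class BisectQuery {

    // Assumed location of the Solr instance; adjust to your own install.
    static final String BASE = "http://localhost:8983/solr/select";

    static String enc(String s) {
        try {
            return URLEncoder.encode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always supported
        }
    }

    // Step 1: fetch exactly one document, at offset 'start' in the results.
    static String oneDocUrl(String query, int start) {
        return BASE + "?q=" + enc(query) + "&start=" + start + "&rows=1";
    }

    // Step 2: for the failing offset, fetch only the id plus one candidate field.
    static String oneFieldUrl(String query, int start, String idField, String field) {
        return oneDocUrl(query, start) + "&fl=" + enc(idField + "," + field);
    }

    public static void main(String[] args) {
        for (int start = 0; start < 3; start++) {
            System.out.println(oneDocUrl("swinburne", start));
        }
        System.out.println(oneFieldUrl("swinburne", 2, "id", "text"));
    }
}
```

Requesting each URL in turn and noting which one returns the truncated response should identify first the document and then the field.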


My hunch is that when POSTing the doc, the wrong charset (or char
encoding, I always get them confused) was used by Jetty, so a corrupt
string was indexed, and it wasn't obvious until it was displayed.


: > The weird thing is that the last Solr line in the trace is
: > org.apache.solr.util.XML.escapeCharData(XML.java:100)
: >
: > 99    if (start==0) {
: > 100      out.write(str);

I committed a modified XML.java last night, so your line numbers may not
match Erik's build.

: I actually saw the stack trace in the partial not-well-formed XML
: response on the client as well, if that bit of trivia is useful.

That's pretty typical of a Solr error page, unfortunately.



-Hoss


Re: exception in rendering /select XML

Erik Hatcher

On Apr 12, 2006, at 12:36 AM, Chris Hostetter wrote:

>
> : > Is there a way to find out what string is being written (perhaps
> : > modify the code to catch that particular exception and display the
> : > string)
> :
> : I know it's a bunch of text I culled from pages like this:
> :
> : <http://www.purl.org/swinburnearchive/txt/aicatlnt00>
> :
> : (it'll redirect)
>
> I got a flat 404.
>
> To pinpoint the exact text, I would start by changing the start/rows
> params so that you get one doc at a time until you find one that causes
> the error, then change your fl to just be the id and one other field,
> and try each of the field names until you find the one with the data
> that caused the problem.
>
>
> My hunch is that when POSTing the doc, the wrong charset (or char
> encoding, I always get them confused) was used by Jetty, so a corrupt
> string was indexed, and it wasn't obvious until it was displayed.

It's this, sorry for the previous bad URL:

        <http://www.letrs.indiana.edu/swinburne/txt/aicatlnt00.txt>

I suspect the charset diagnosis is right.  My Java client is using
HttpClient to read the data from that URL, add it to a field, and then
send it on to Solr.  There are lots of places for the charset/encoding
to go awry.
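
To make the failure mode concrete, here is a small sketch of what happens when fetched bytes are decoded with the wrong charset. It uses only the JDK rather than HttpClient; `CharsetMismatch` and `decode` are hypothetical names, and the sample text is made up rather than taken from the Swinburne file.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Shows how the same response bytes turn into different Strings depending on
// the charset assumed by the client.
class CharsetMismatch {

    // Decodes a fetched response body with an explicitly chosen charset.
    static String decode(byte[] body, Charset charset) {
        return new String(body, charset);
    }

    public static void main(String[] args) {
        // e-acute (U+00E9) as a server would send it in UTF-8: bytes C3 A9.
        byte[] body = "\u00E9".getBytes(StandardCharsets.UTF_8);

        String right = decode(body, StandardCharsets.UTF_8);
        String wrong = decode(body, StandardCharsets.ISO_8859_1);

        // One char when decoded correctly, two unrelated chars otherwise.
        System.out.println("UTF-8 decode length:      " + right.length());
        System.out.println("ISO-8859-1 decode length: " + wrong.length());
    }
}
```

A mismatch like this produces wrong but still well-formed text, so on its own it would not necessarily explain the exception. It does show why every hop in the pipeline needs the charset stated explicitly.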

        Erik


Re: exception in rendering /select XML

Yonik Seeley
On 4/12/06, Chris Hostetter <[hidden email]> wrote:
> : > The weird thing is that the last Solr line in the trace is
> : > org.apache.solr.util.XML.escapeCharData(XML.java:100)
> : >
> : > 99    if (start==0) {
> : > 100      out.write(str);

Thanks, I had missed that.  I just verified that line 100 is the same
in both versions of the file, so the most likely explanation is a
corrupt string (the string might end in the first char of a multi-char
character, such as a surrogate pair) that triggers the exception in
Sun's UTF-8 encoder.

So the question then is, how did this bad string come about?
Chris' guess about a bad charset somewhere is probably right.
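
One way to test that hypothesis on a suspect field value is to scan it for unpaired surrogates before it is written. This is only a sketch of the check: `SurrogateCheck` and `firstUnpairedSurrogate` are hypothetical names, and nothing in this thread confirms that a lone surrogate was the actual cause.

```java
// Scans a String for UTF-16 surrogates that are not part of a valid pair.
class SurrogateCheck {

    // Returns the index of the first unpaired surrogate, or -1 if every
    // surrogate in the string belongs to a high/low pair.
    static int firstUnpairedSurrogate(String s) {
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (Character.isHighSurrogate(c)) {
                if (i + 1 < s.length() && Character.isLowSurrogate(s.charAt(i + 1))) {
                    i++; // valid pair, skip the low half
                } else {
                    return i; // high surrogate at the end or followed by a non-low char
                }
            } else if (Character.isLowSurrogate(c)) {
                return i; // low surrogate with no preceding high surrogate
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        String truncated = "abc" + (char) 0xD800; // ends in the first half of a pair
        System.out.println(firstUnpairedSurrogate(truncated));
    }
}
```

Running this over the stored value of the failing field would show whether the string ends mid-character, which is the case described above.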

-Yonik

Re: exception in rendering /select XML

Erik Hatcher
I've finally revisited this issue.  I switched to Tomcat (5.5.16) and  
all is well, so it certainly appears as if this is a Jetty issue.  
*sigh*

I'll look into whether there is a newer version of Jetty and if that  
fixes this.

I did make some improvements to text encoding in my indexing process,
which is actually quite involved: RDF files are parsed by Java, some
pointers to URLs get fetched via HttpClient, and everything is packaged
into a JDOM XML DOM, serialized to a String, and then sent to Solr
via HttpClient.  Even if I had the wrong encoding somewhere along the
way, a valid String retrieved from a Lucene Field should be
serializable again (at least I'm assuming so), so it is reassuring
that the bug appears to be in Jetty and not in my complicated process.

        Erik

