XMLWriter escaping issue

Erik Hatcher
I encountered an escaping issue with XMLWriter.  Locally I've added  
the following test to BasicFunctionalityTest to demonstrate:

   public void testXMLWriter() throws Exception {
     // A response key containing double quotes is written into a name="..." attribute.
     SolrQueryResponse rsp = new SolrQueryResponse();
     rsp.add("\"quoted\"", "value");

     StringWriter writer = new StringWriter(32000);
     XMLWriter.writeResponse(writer, req("foo"), rsp);

     // Parsing throws if the generated XML is not well-formed.
     System.out.println("writer.toString() = " + writer.toString());
     DocumentBuilder builder =
         DocumentBuilderFactory.newInstance().newDocumentBuilder();
     builder.parse(new ByteArrayInputStream(
         writer.toString().getBytes("UTF-8")));
   }


Unescaped double quotes within XML attribute values cause invalid XML to
be generated.
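
To illustrate the failure outside of Solr, here is a minimal standalone
sketch. The class name, the helper isWellFormed, and the <str> element are
made up for illustration and only loosely mimic what the writer emits; the
point is just that a raw " inside a double-quoted attribute value is
rejected by a standard parser, while &quot; is accepted.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical demo class, not part of Solr.
public class QuoteInAttributeDemo {

  /** Returns true if the given string parses as well-formed XML. */
  static boolean isWellFormed(String xml) {
    try {
      DocumentBuilder builder =
          DocumentBuilderFactory.newInstance().newDocumentBuilder();
      // DefaultHandler rethrows fatal errors without printing to stderr.
      builder.setErrorHandler(new DefaultHandler());
      builder.parse(new ByteArrayInputStream(
          xml.getBytes(StandardCharsets.UTF_8)));
      return true;
    } catch (SAXException e) {
      // The parser rejected the input as malformed.
      return false;
    } catch (ParserConfigurationException | IOException e) {
      throw new IllegalStateException(e);
    }
  }

  public static void main(String[] args) {
    String name = "\"quoted\"";
    // The name is written as-is, so its quotes end the attribute early.
    String raw = "<str name=\"" + name + "\">value</str>";
    // The same name with each double quote replaced by &quot;.
    String escaped =
        "<str name=\"" + name.replace("\"", "&quot;") + "\">value</str>";

    System.out.println(raw + " well-formed? " + isWellFormed(raw));
    System.out.println(escaped + " well-formed? " + isWellFormed(escaped));
  }
}
```

The first string should be reported as not well-formed and the second as
well-formed.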

I've corrected this in my local copy with the patch below, which adds
escaping for the name attribute and adds &quot; to XML.chardata_escapes.
The question is: is it appropriate to escape quotes everywhere, or
should it be done only when writing attribute values?  It should be
fine to do it across the board for attribute values and element text,
but I wanted to verify that with solr-dev before committing it.

Comments?

        Erik



Index: src/java/org/apache/solr/request/XMLWriter.java
===================================================================
--- src/java/org/apache/solr/request/XMLWriter.java     (revision 395873)
+++ src/java/org/apache/solr/request/XMLWriter.java     (working copy)
@@ -178,7 +178,7 @@
      writer.write(tag);
      if (name!=null) {
        writer.write(" name=\"");
-      writer.write(name);
+      XML.escapeCharData(name, writer);
        if (closeTag) {
          writer.write("\"/>");
        } else {
Index: src/java/org/apache/solr/util/XML.java
===================================================================
--- src/java/org/apache/solr/util/XML.java      (revision 395873)
+++ src/java/org/apache/solr/util/XML.java      (working copy)
@@ -32,7 +32,7 @@
    // many chars less than 0x20 are *not* valid XML, even when escaped!
    // for example, <foo>&#0;<foo> is invalid XML.
    private static final String[] chardata_escapes=
-  {"#0;","#1;","#2;","#3;","#4;","#5;","#6;","#7;","#8;",null,null,"#11;","#12;",null,"#14;","#15;","#16;","#17;","#18;","#19;","#20;","#21;","#22;","#23;","#24;","#25;","#26;","#27;","#28;","#29;","#30;","#31;",null,null,null,null,null,null,"&amp;",null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,"&lt;"};
+  {"#0;","#1;","#2;","#3;","#4;","#5;","#6;","#7;","#8;",null,null,"#11;","#12;",null,"#14;","#15;","#16;","#17;","#18;","#19;","#20;","#21;","#22;","#23;","#24;","#25;","#26;","#27;","#28;","#29;","#30;","#31;",null,null,"&quot;",null,null,null,"&amp;",null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,"&lt;"};


Re: XMLWriter escaping issue

Yonik Seeley
On 4/21/06, Erik Hatcher <[hidden email]> wrote:
> I've corrected this in my local copy with this patch adding the
> escaping to attribute names and the &quot; to XML.chardata_escapes.
> The question is, is it appropriate to escape quotes everywhere, or
> should it just be done when writing attribute values?

I'd prefer escaping quotes only in attribute values, as it makes things
like debugging output that contains query strings easier to read, and
easier to paste back into the query box when debugging from someone
else's output.

The attribute values definitely need to be XML escaped though.

-Yonik

Re: XMLWriter escaping issue

Erik Hatcher
I've committed a change to escape attributes and character data
differently; all tests pass.  Let me know if there are any issues
with it and I'd be happy to address them.
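
For readers following along, here is a rough sketch of what escaping the
two contexts differently can look like. It is not the committed code: the
class name EscapeSketch and the methods escapeCharData and
escapeAttributeValue are hypothetical and take and return Strings for
simplicity. It borrows only the table-lookup idea from the patch above and
leaves out the handling of control characters below 0x20.

```java
// Hypothetical sketch, not the code committed to Solr.
public class EscapeSketch {

  // Tables indexed by character code; null means "write the char unchanged".
  private static final String[] CHARDATA_ESCAPES = new String[128];
  private static final String[] ATTRIBUTE_ESCAPES = new String[128];

  static {
    // Element text: only & and < are replaced, so quotes stay readable.
    CHARDATA_ESCAPES['&'] = "&amp;";
    CHARDATA_ESCAPES['<'] = "&lt;";

    // Attribute values: same as above, plus the double quote.
    System.arraycopy(CHARDATA_ESCAPES, 0, ATTRIBUTE_ESCAPES, 0,
        CHARDATA_ESCAPES.length);
    ATTRIBUTE_ESCAPES['"'] = "&quot;";
  }

  private static String escape(String s, String[] escapes) {
    StringBuilder sb = new StringBuilder(s.length() + 16);
    for (int i = 0; i < s.length(); i++) {
      char ch = s.charAt(i);
      // Characters beyond the table are never escaped.
      String replacement = ch < escapes.length ? escapes[ch] : null;
      if (replacement != null) {
        sb.append(replacement);
      } else {
        sb.append(ch);
      }
    }
    return sb.toString();
  }

  /** Escapes text that will be written between element tags. */
  static String escapeCharData(String s) {
    return escape(s, CHARDATA_ESCAPES);
  }

  /** Escapes text that will be written inside a double-quoted attribute. */
  static String escapeAttributeValue(String s) {
    return escape(s, ATTRIBUTE_ESCAPES);
  }

  public static void main(String[] args) {
    String query = "title:\"foo bar\" AND a<b";
    System.out.println("<str name=\"" + escapeAttributeValue(query) + "\">"
        + escapeCharData(query) + "</str>");
  }
}
```

With separate tables, a query string in element text keeps its quotes and
can be pasted back into the query box, while the same string in an
attribute value has them written as &quot;.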

        Erik


On Apr 21, 2006, at 10:04 AM, Yonik Seeley wrote:

> On 4/21/06, Erik Hatcher <[hidden email]> wrote:
>> I've corrected this in my local copy with this patch adding the
>> escaping to attribute names and the &quot; to XML.chardata_escapes.
>> The question is, is it appropriate to escape quotes everywhere, or
>> should it just be done when writing attribute values?
>
> I'd prefer just escaping quotes in attribute values as it makes things
> like debugging output that contains query strings easier to read, and
> easier to paste back into the query box for debugging from someone
> else's output.
>
> The attribute values definitely need to be XML escaped though.
>
> -Yonik