Indexing xml documents with custom field type

Indexing xml documents with custom field type

James Gregory-4
I wish to index well-formed XML documents as they are, without escaping
all the tags with &lt;s and &gt;s. I searched this mailing list's archive
and found someone who suggested that you can make a new field type
having a file something like:

import org.apache.solr.schema.TextField;
import org.apache.solr.request.XMLWriter;
import org.apache.lucene.document.Fieldable;
import java.io.IOException;

public class XMLField extends TextField {

    public void write(XMLWriter xmlWriter, String name, Fieldable f) throws IOException {
        xmlWriter.writePrim("xml", name, f.stringValue(), false);
    }

}

and then loading it as a plugin by placing the compiled java file into
the /lib directory. Then in my schema I have:

<fieldtype name="xmltext" class="solr.XMLField" />

However, when I post data that uses this new field type, I get the
following response:

<result status="1">java.lang.NullPointerException
        at org.apache.solr.core.SolrCore.update(SolrCore.java:693)
        at org.apache.solr.servlet.SolrUpdateServlet.doPost(SolrUpdateServlet.java:53)
        at javax.servlet.http.HttpServlet.service(HttpServlet.java:709)
        at javax.servlet.http.HttpServlet.service(HttpServlet.java:802)
        at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:252)
        at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:173)
        at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:213)
        at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:178)
        at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:126)
        at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:105)
        at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:107)
        at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:148)
        at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:869)
        at org.apache.coyote.http11.Http11BaseProtocol$Http11ConnectionHandler.processConnection(Http11BaseProtocol.java:664)
        at org.apache.tomcat.util.net.PoolTcpEndpoint.processSocket(PoolTcpEndpoint.java:527)
        at org.apache.tomcat.util.net.LeaderFollowerWorkerThread.runIt(LeaderFollowerWorkerThread.java:80)
        at org.apache.tomcat.util.threads.ThreadPool$ControlRunnable.run(ThreadPool.java:684)
        at java.lang.Thread.run(Thread.java:595)
</result>

I have no idea what I have done wrong - can anyone help?

James


Re: Indexing xml documents with custom field type

Chris Hostetter-3

: I wish to index well-formed XML documents as they are, without escaping
: all the tags with &lt;s and &gt;s. I searched this mailing list's archive
: and found someone who suggested that you can make a new field type
: having a file something like:

in the thread in question...

http://www.nabble.com/Indexing-XML-files-tf2763600.html

...the suggestion to add a new XMLFieldType was so the user could get the
xml values from his field "raw" in the body of an XmlResponseWriter
response for the purposes of XSLT styling. But that only affected the
display of results returned to query clients. If you note the early
messages in the thread, the XML data you want to use as a field value
still needs to be properly escaped when you are indexing it, so that Solr
knows what is data (your xml) and what is markup (the <field> tags Solr
expects).
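
To make the escaping step concrete, here is a minimal sketch in plain Java
(no Solr classes involved). The helper names escapeXml and buildAddMessage
and the field names "id" and "body" are hypothetical, chosen only for
illustration; substitute whatever fields your schema defines. It escapes
the raw XML so it can sit safely inside a <field> element of an update
message:

```java
class EscapeXmlFieldValue {

    // Replace the characters that XML treats as markup with entity references.
    static String escapeXml(String raw) {
        StringBuilder sb = new StringBuilder(raw.length() + 16);
        for (int i = 0; i < raw.length(); i++) {
            char c = raw.charAt(i);
            switch (c) {
                case '&': sb.append("&amp;"); break;
                case '<': sb.append("&lt;"); break;
                case '>': sb.append("&gt;"); break;
                case '"': sb.append("&quot;"); break;
                default:  sb.append(c);
            }
        }
        return sb.toString();
    }

    // Build an <add> update message. The field names "id" and "body" are
    // assumptions for this sketch, not names taken from the thread.
    static String buildAddMessage(String id, String xmlBody) {
        return "<add><doc>"
             + "<field name=\"id\">" + escapeXml(id) + "</field>"
             + "<field name=\"body\">" + escapeXml(xmlBody) + "</field>"
             + "</doc></add>";
    }

    public static void main(String[] args) {
        // The inner document is escaped, so only the <field> tags are markup.
        System.out.println(buildAddMessage("1", "<note><to>Bob</to></note>"));
    }
}
```

An XML parser reading that message hands the original, unescaped document
back as the field's text value, so nothing is lost by escaping at index
time. How the value is written out in query responses is a separate
matter, which is what the custom field type in the earlier thread was
about.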



-Hoss