add/update index

pgwillia
Hi,

    I have created a process that uses XSL to convert my data to the form
shown in the examples, so that it can be added to the index as the Solr
tutorial describes:
<add>
   <doc>
     <field name="field">value</field>
     ...
   </doc>
</add>

    In some cases the XSL process creates a field element with no data
(i.e. <field name="field"/>).  Is this considered bad input that Solr will
reject?  Or is this something Solr should handle?  Currently, for
each field element with no data I receive the message:
<result status="1">java.lang.NullPointerException
  at org.apache.solr.update.DocumentBuilder.addField(DocumentBuilder.java:78)
  at org.apache.solr.update.DocumentBuilder.addField(DocumentBuilder.java:74)
  at org.apache.solr.core.SolrCore.readDoc(SolrCore.java:917)
  at org.apache.solr.core.SolrCore.update(SolrCore.java:685)
  at org.apache.solr.servlet.SolrUpdateServlet.doPost(SolrUpdateServlet.java:52)
  ...
</result>

    Just curious whether the gurus out there think I should deal with the null
values in my XSL process, or whether this can be handled in Solr itself?

Thanks,
Tricia

P.S.  Thanks for the timely fix for the UTF-8 issue!

Re: add/update index

Yonik Seeley-2
On 7/27/06, Tricia Williams <[hidden email]> wrote:

> Hi,
>
>     I have created a process that uses XSL to convert my data to the form
> shown in the examples, so that it can be added to the index as the Solr
> tutorial describes:
> <add>
>    <doc>
>      <field name="field">value</field>
>      ...
>    </doc>
> </add>
>
>     In some cases the XSL process creates a field element with no data
> (i.e. <field name="field"/>).  Is this considered bad input that Solr will
> reject?

If the desired semantics are "the field doesn't exist" or "null value",
then yes, it's bad input.  There isn't a way to represent a field without
a value in Lucene except to not add the field for that document.  If the
field is meant to be ignored entirely, it probably shouldn't be in the XML.
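A minimal sketch of the "leave it out of the XML" approach, assuming the XSL output is available as a string before it is POSTed to Solr. It uses Python's standard xml.etree.ElementTree to drop every <field> whose content is empty; treating whitespace-only text as empty is an assumption, and drop_empty_fields is a hypothetical helper, not part of Solr.

```python
import xml.etree.ElementTree as ET


def drop_empty_fields(add_xml: str) -> str:
    """Remove <field> elements with no text from a Solr <add> message.

    Hypothetical helper: assumes the <add><doc><field .../></doc></add>
    layout from the Solr tutorial and treats whitespace-only text as empty.
    """
    root = ET.fromstring(add_xml)
    for doc in root.iter("doc"):
        # findall() returns a new list, so removing children from doc
        # inside this loop does not skip any siblings.
        for field in doc.findall("field"):
            if not (field.text or "").strip():
                doc.remove(field)
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    xml = (
        '<add><doc>'
        '<field name="id">1</field>'
        '<field name="title"/>'
        '<field name="body"></field>'
        '</doc></add>'
    )
    print(drop_empty_fields(xml))
    # prints: <add><doc><field name="id">1</field></doc></add>
```

Doing the same filtering inside the XSL itself (for example, only emitting a field when its source value is non-empty) avoids the extra pass; the Python version is just easier to show in isolation.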

Now, one might think we could drop fields with no value, but that's
problematic because it goes against the XML standard:

http://www.w3.org/TR/REC-xml/#sec-starttags
[Definition: An element with no content is said to be empty.] The
representation of an empty element is either a start-tag immediately
followed by an end-tag, or an empty-element tag. [Definition: An
empty-element tag takes a special form:]

So <a></a> and <a/> are supposed to be equivalent.  Given that, it
does look like Solr should treat <field name="val"/> like a
zero-length string (but that's not what you wanted, right?)
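The equivalence from the XML spec can be checked with any conforming parser. A small illustration using Python's standard xml.etree.ElementTree (chosen only for convenience; Solr itself is written in Java): both spellings parse to an element with no character content, so the server has no way to tell them apart.

```python
import xml.etree.ElementTree as ET

# The two spellings of an empty element allowed by the XML spec.
explicit = ET.fromstring('<field name="val"></field>')
self_closing = ET.fromstring('<field name="val"/>')

# ElementTree reports "no character content" as None for both forms,
# so the parsed elements are indistinguishable.
print(explicit.text, self_closing.text)  # prints: None None
print(ET.tostring(explicit) == ET.tostring(self_closing))  # prints: True

# Reading a missing value as a zero-length string, as the reply suggests
# Solr should, gives the same result for either spelling.
print(repr(explicit.text or ""))  # prints: ''
```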

-Yonik

Re: add/update index

pgwillia
Thanks Yonik,

    That's exactly what I needed to know.  I'll adapt my XSL process to
omit empty fields.

Tricia