problems with indexing documents

Bill Tantzen
In a legacy application using Solr 4.1 and solrj, I have always been
able to add documents with TrieDateField types using java.util.Date
objects, for instance,

doc.addField ( "date", new java.util.Date() );

Having recently upgraded to Solr 7.7 and updated my schema to
use DatePointField as my type, I find that this code no longer works.
It throws an exception with an error like:

Invalid Date String: 'Sun Jul 31 19:00:00 CDT 2016'

I understand that this String is not what solr expects, but in lieu of
formatting the correct String, is there no longer a way to pass in a
simple Date object?  Was there some kind of implicit conversion taking
place earlier that is no longer happening?

In fact, in some of the example code that comes with the Solr
distribution (SolrExampleTests.java), document timestamp fields are
added using the same addField call I am attempting to use, so I am
very confused.

Thanks for any advice!

Regards,
Bill

Re: problems with indexing documents

Zheng Lin Edwin Yeo
Hi Bill,

Previously, did you index the date in the same format as you are using now,
or in the Solr format of "YYYY-MM-DDTHH:MM:SSZ"?

Regards,
Edwin

Re: problems with indexing documents

Mark H. Wood
I'm also working on this with Bill.

On Tue, Apr 02, 2019 at 09:44:16AM +0800, Zheng Lin Edwin Yeo wrote:
> Previously, did you index the date in the same format as you are using now,
> or in the Solr format of "YYYY-MM-DDTHH:MM:SSZ"?

As may be seen from the sample code:

> > doc.addField ( "date", new java.util.Date() );

we were not using a string format at all, but passing a java.util.Date
object.  In the past this was interpreted successfully and correctly.
After upgrading, we get an error:

> > Invalid Date String: 'Sun Jul 31 19:00:00 CDT 2016'

which suggests to me that something in or below
SolrInputDocument.addField(String, Object) is applying Date.toString()
to the Object, which yields a string format that Solr does not
understand.

I am dealing with this by trying to hunt down all the places where
Date was passed to addField, and explicitly convert it to a String in
Solr format.  But we would like to know if there is a better way, or
at least what I did wrong.
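
The explicit conversion described above might look something like the following. This is only a minimal sketch using java.time from the JDK: the class name SolrDateStrings and the method toSolrDate are hypothetical helpers, not part of SolrJ, and the addField call appears only in a comment so that the example runs without Solr on the classpath.

```java
import java.time.Instant;
import java.time.format.DateTimeFormatter;
import java.util.Date;

// Hypothetical helper (not part of SolrJ): renders a java.util.Date as an
// ISO-8601 UTC instant string such as 2016-08-01T00:00:00Z.
final class SolrDateStrings {
    private SolrDateStrings() {}

    static String toSolrDate(Date date) {
        // A Date is just a moment in time; toInstant() exposes that moment in
        // UTC, and ISO_INSTANT formats it with a trailing 'Z'.
        return DateTimeFormatter.ISO_INSTANT.format(date.toInstant());
    }

    public static void main(String[] args) {
        // The moment from the error message: Sun Jul 31 19:00:00 CDT 2016.
        Date date = Date.from(Instant.parse("2016-08-01T00:00:00Z"));
        System.out.println(toSolrDate(date));
        // With SolrJ available, the call site would become, for example:
        //   doc.addField("date", SolrDateStrings.toSolrDate(date));
    }
}
```

Because the formatting goes through Instant, the result does not depend on the JVM's default time zone or locale, unlike Date.toString().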

The SolrJ documentation says nothing about how the field value Object
is handled.  It does say that it should match the schema, but I can
find no table showing what Java object types "match" the stock schema
fieldtype classes such as DatePointField.  I would naively suppose that
j.u.Date is a particularly *good* match for DatePointField.  What have
I missed?

--
Mark H. Wood
Lead Technology Analyst

University Library
Indiana University - Purdue University Indianapolis
755 W. Michigan Street
Indianapolis, IN 46202
317-274-0749
www.ulib.iupui.edu

Re: problems with indexing documents

Bill Tantzen
In reply to this post by Zheng Lin Edwin Yeo
Right, as Mark said, this is how the dates were indexed previously.
However, instead of passing in the actual String, we passed a
java.util.Date object which was automagically converted to the correct
string.

Now (the code on our end has not changed), Solr throws an exception
because the string it sees is of the form 'Sun Jul 31 19:00:00 CDT
2016' (which I believe is the Date.toString() result) instead of
the DatePointField or TrieDateField format.
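
To illustrate the difference between the two string forms, here is a small sketch. It assumes only the JDK: the class DateStringForms and its isIsoInstant method are made-up names, and Instant.parse stands in for a strict ISO-8601 check. It is not Solr's own date parser.

```java
import java.time.Instant;
import java.time.format.DateTimeParseException;
import java.util.Date;

// Hypothetical demo class: compares Date.toString() with the ISO-8601 form.
final class DateStringForms {
    private DateStringForms() {}

    // Stand-in for a strict ISO-8601 check (java.time, not Solr's parser).
    static boolean isIsoInstant(String s) {
        try {
            Instant.parse(s);
            return true;
        } catch (DateTimeParseException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        Date date = Date.from(Instant.parse("2016-08-01T00:00:00Z"));
        // Date.toString() depends on the JVM's default time zone.
        String viaToString = date.toString();
        // Instant.toString() always yields the UTC ISO-8601 form.
        String viaInstant = date.toInstant().toString();
        System.out.println(viaToString + " -> ISO instant? " + isIsoInstant(viaToString));
        System.out.println(viaInstant + " -> ISO instant? " + isIsoInstant(viaInstant));
    }
}
```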

~~ Bill

--
Human wheels spin round and round
While the clock keeps the pace... -- John Mellencamp
________________________________________________________________
Bill Tantzen    University of Minnesota Libraries
612-626-9949 (U of M)    612-325-1777 (cell)

Proper type(s) for adding a DatePointField value [was: problems with indexing documents]

Mark H. Wood
One difficulty is that the documentation of
SolrInputDocument.addField(String, Object) is not at all specific.
I'm aware of SOLR-2298 and I accept that the patch is an improvement,
but still...

  @param value Value of the field, should be of same class type as
  defined by "type" attribute of the corresponding field in
  schema.xml.

The corresponding <field/>'s 'type' attribute is an arbitrary label
referencing the 'name' attribute of a <fieldType/>.  It could be
"boysenberry" or "axolotl".  So we need to look at the 'class'
attribute of the fieldType?  So, if I have in my schema:

  <fieldType name='date' class='solr.DatePointField' bla bla/>
  <field name='created' type='date' bla bla/>

then I need to pass an instance of DatePointField?

  myDoc.addField("created", new DatePointField(bla bla));

That doesn't seem right, but go ahead and surprise me.

But I *know* that it accepts a properly formatted String value for a
field using DatePointField.  So, how can I determine the set of Java
types that is accepted as a new field value for a field whose field
type's class attribute is X?  And where should I have read that?
