solr tika extraction video creation date problem (hours ahead)

whisere
Hello, I was following the instructions at
https://lucene.apache.org/solr/guide/7_1/uploading-data-with-solr-cell-using-apache-tika.html
to upload files with their metadata stored and indexed in Solr. I checked
the extracted creation date (attr_meta_creation_date): for images (JPG,
etc.) the creation dates are correct, but all creation dates for videos are
11 hours ahead of the actual creation date. (The dates are correct when
viewed in other applications.) This inconsistency causes problems with
searching. Any ideas are much appreciated. Thanks!

Re: solr tika extraction video creation date problem (hours ahead)

Alexandre Rafalovitch
Sounds like a timezone normalization issue, possibly at the Tika stage.

Check what your SOLR_TIMEZONE variable is set to. I am not sure which file
it is in.
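
To make the timezone idea concrete, here is a minimal sketch of how one instant can show up as two wall-clock values 11 hours apart. It assumes (not confirmed anywhere in this thread) that one format stores its timestamp in UTC while the other stores local wall-clock time with no zone attached, and it uses a hypothetical UTC+11 local zone. The helper names are made up for illustration.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical local zone, 11 hours ahead of UTC (an assumption for illustration).
LOCAL_ZONE = timezone(timedelta(hours=11))


def naive_wall_clock(dt_aware):
    """Drop the zone but keep the wall-clock digits, as a zone-less field would."""
    return dt_aware.replace(tzinfo=None)


def offset_hours(wall_clock_a, wall_clock_b):
    """Difference between two zone-less wall-clock values, in hours."""
    return (wall_clock_a - wall_clock_b).total_seconds() / 3600


# One instant, written down once as UTC digits and once as local digits.
instant = datetime(2019, 4, 4, 1, 30, tzinfo=timezone.utc)
as_utc = naive_wall_clock(instant)
as_local = naive_wall_clock(instant.astimezone(LOCAL_ZONE))

# If both values are later indexed as though they shared a zone,
# they appear to be this many hours apart.
print(offset_hours(as_local, as_utc))
```

Which of the two values looks "ahead" depends on how each one is interpreted at index time, so the sign of the shift alone does not say which side is wrong.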

Regards,
     Alex

On Thu, Apr 4, 2019, 12:50 AM Where is Where, <[hidden email]> wrote:

> Hello , I was following the instruction
>
> https://lucene.apache.org/solr/guide/7_1/uploading-data-with-solr-cell-using-apache-tika.html
> to upload files with metadata stored and indexed in solr. I was checking
> the extracted creation date ( attr_meta_creation_date ), for image, jpg
> etc, the creation dates are correct but all creation dates for video are 11
> hours ahead of the actual creation date. (The dates are correct when viewed
> in other applications) It causes problem with searching due to this
> inconsistency. Any idea is much appreciated. Thanks!
>

Re: solr tika extraction video creation date problem (hours ahead)

whisere
Thanks Alex. The problem is that the image creation date is correct, but the
video creation date is wrong (hours behind). If I set the time zone, I think
the image creation date will then be wrong. I wonder what the difference is
between image and video extraction in Tika.



--
Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html

Re: solr tika extraction video creation date problem (hours ahead)

Alexandre Rafalovitch
Well, Tika would use different libraries to extract different formats,
so maybe there is a bug. I would just get a standalone Tika (of the
version matching the one in Solr) and see what the output from two
sample files is. Then I would check with the latest Tika, just in
case.
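
A minimal sketch of that comparison, assuming a standalone tika-app jar is on disk and that its `-j` option prints the metadata as a single JSON object. The jar name and sample file names are hypothetical, and the helper functions are illustrative rather than part of Tika.

```python
import json
import subprocess


def tika_command(jar_path, file_path):
    """Build the tika-app invocation; -j asks for metadata as JSON."""
    return ["java", "-jar", jar_path, "-j", file_path]


def extract_metadata(jar_path, file_path):
    """Run tika-app on one file and parse its JSON metadata output."""
    result = subprocess.run(
        tika_command(jar_path, file_path),
        check=True,
        capture_output=True,
        text=True,
    )
    return json.loads(result.stdout)


def date_fields(metadata):
    """Keep only the keys whose name mentions a date or creation time."""
    return {
        key: value
        for key, value in metadata.items()
        if "date" in key.lower() or "created" in key.lower()
    }


# Usage (hypothetical paths; requires Java and the jar to be present):
# for sample in ("sample.jpg", "sample.mp4"):
#     print(sample, date_fields(extract_metadata("tika-app.jar", sample)))
```

Running the same two files through the jar version bundled with Solr and through the latest release should show whether the shift comes from Tika itself or from a later step.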

I would also use some non-Tika way to check what the dates are, just
in case the date was written wrongly during encoding rather than changed
during indexing. It is a low-probability case, but it covers all the bases.
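
One possible non-Tika check, sketched under the assumption that the videos are MP4/QuickTime-style files: the movie header box (`mvhd`, inside `moov`) stores its creation time as seconds since 1904-01-01, which the ISO base media format defines as UTC. The function names are made up for illustration, and the sketch reads the whole file into memory, which is only reasonable for small samples.

```python
import struct
from datetime import datetime, timedelta, timezone

# Epoch used by the mvhd creation_time field.
MP4_EPOCH = datetime(1904, 1, 1, tzinfo=timezone.utc)


def iter_boxes(data, start, end):
    """Yield (type, payload_start, box_end) for each box in data[start:end]."""
    pos = start
    while pos + 8 <= end:
        size, box_type = struct.unpack(">I4s", data[pos:pos + 8])
        header = 8
        if size == 1:  # a 64-bit size follows the type field
            if pos + 16 > end:
                break
            size = struct.unpack(">Q", data[pos + 8:pos + 16])[0]
            header = 16
        elif size == 0:  # box runs to the end of its container
            size = end - pos
        if size < header or pos + size > end:
            break  # malformed box; stop scanning
        yield box_type, pos + header, pos + size
        pos += size


def mvhd_creation_time(data):
    """Return the mvhd creation time as an aware UTC datetime, or None."""
    for box_type, start, end in iter_boxes(data, 0, len(data)):
        if box_type != b"moov":
            continue
        for inner_type, inner_start, _ in iter_boxes(data, start, end):
            if inner_type == b"mvhd":
                version = data[inner_start]
                # Version 1 uses a 64-bit field, version 0 a 32-bit one.
                fmt, width = (">Q", 8) if version == 1 else (">I", 4)
                field = data[inner_start + 4:inner_start + 4 + width]
                seconds = struct.unpack(fmt, field)[0]
                return MP4_EPOCH + timedelta(seconds=seconds)
    return None


# Usage (hypothetical file name):
# with open("sample.mp4", "rb") as handle:
#     print(mvhd_creation_time(handle.read()))
```

If the value printed here already differs from the real recording time by a whole number of hours, the shift was present in the file before Tika or Solr touched it.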

Regards,
   Alex.

On Fri, 5 Apr 2019 at 01:39, whisere <[hidden email]> wrote:

>
> Thanks Alex. The problem is image creation date is correct, but the video
> creation date is wrong (hours behind), if I set the time_zone I think the
> image creation date will be wrong then. wonder what the difference between
> image and video extraction in tika.
>
>
>
> --
> Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html

Re: solr tika extraction video creation date problem (hours ahead)

whisere
Thank you very much Alex for the great suggestion.

On Fri, Apr 5, 2019 at 7:25 PM Alexandre Rafalovitch <[hidden email]>
wrote:

> Well, Tika would use different libraries to extract different formats.
> So maybe there is a bug. I would just get a standalone tika (of
> matching version to the one in Solr) and see what the output from two
> sample files are. Then, I would check with the latest Tika, just in
> case.
>
> I would also use some non-Tika way to check what the dates are, just
> in case the date is wrong during encoding rather than during indexing.
> A low-probability chance, but just covering all the bases.
>
> Regards,
>    Alex.
>
> On Fri, 5 Apr 2019 at 01:39, whisere <[hidden email]> wrote:
> >
> > Thanks Alex. The problem is image creation date is correct, but the video
> > creation date is wrong (hours behind), if I set the time_zone I think the
> > image creation date will be wrong then. wonder what the difference between
> > image and video extraction in tika.
> >
> >
> >
> > --
> > Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html
>