problem indexing GPS metadata for video upload

whisere
I am uploading video to Solr via Tika:
https://lucene.apache.org/solr/guide/7_7/uploading-data-with-solr-cell-using-apache-tika.html
The index has no GPS metadata for video, although that metadata is
extracted and indexed for images such as JPEG. I have checked both MP4 and
MOV files; all the files I checked have GPS Exif data embedded in the same
fields as the images. Any idea? Thanks!

Re: problem indexing GPS metadata for video upload

Alexandre Rafalovitch
What happens when you run it against a standalone Tika (recommended option
anyway)? Do you see the relevant fields?

Not every Tika field is captured; that is configured in solrconfig.xml. So
if Tika extracts them, the next step is to check the mapping.
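
A minimal sketch of that standalone check, assuming a tika-app jar is available locally and that its --json option prints the extracted metadata as one JSON object. The function names and the "gps"/"geo:" markers are illustrative choices, not anything Tika or Solr defines:

```python
import json
import subprocess


def tika_metadata(jar_path, file_path):
    """Run standalone tika-app on one file and return its metadata as a dict."""
    # Assumption: `--json` makes tika-app print the metadata as a JSON object.
    result = subprocess.run(
        ["java", "-jar", jar_path, "--json", file_path],
        capture_output=True, text=True, check=True,
    )
    return json.loads(result.stdout)


def gps_fields(metadata):
    """Keep only entries whose key looks GPS or geo related."""
    markers = ("gps", "geo:")  # illustrative markers, not an official list
    return {key: value for key, value in metadata.items()
            if any(marker in key.lower() for marker in markers)}
```

If gps_fields(tika_metadata("tika-app.jar", "1.mp4")) comes back empty while the same call on a JPEG does not, the gap is on the Tika side rather than in the Solr mapping.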

Regards,
     Alex

On Wed, May 1, 2019, 5:38 AM Where is Where, <[hidden email]> wrote:

> uploading video to solr via tika
>
> https://lucene.apache.org/solr/guide/7_7/uploading-data-with-solr-cell-using-apache-tika.html
> The index has no video GPS metadata which is extracted and indexed for
> images such as jpeg. I have checked both MP4 and MOV files, the files I
> checked all have GPS Exif data embedded in the same fields as image. Any
> idea? Thanks!
>

Re: problem indexing GPS metadata for video upload

Tim Allison
Related?

https://issues.apache.org/jira/plugins/servlet/mobile#issue/TIKA-2861


On Wed, May 1, 2019 at 8:09 AM Alexandre Rafalovitch <[hidden email]>
wrote:

> What happens when you run it against a standalone Tika (recommended option
> anyway)? Do you see the relevant fields?
>
> Not every Tika field is captured, that is configured in solrconfig.xml. So
> if Tika extracts them, next step is to check the mapping.
>
> Regards,
>      Alex

Re: problem indexing GPS metadata for video upload

whisere
Thank you Alex and Tim.
I have looked at the solrconfig.xml file (I am trying the techproducts demo
config); the only related place I can find is the extract handler:

<requestHandler name="/update/extract"
                  startup="lazy"
                  class="solr.extraction.ExtractingRequestHandler" >
    <lst name="defaults">
      <str name="lowernames">true</str>
      <!--<str name="uprefix">ignored_</str>-->

      <!-- capture link hrefs but ignore div attributes -->
      <str name="captureAttr">true</str>
      <str name="fmap.a">links</str>
      <str name="fmap.div">ignored_</str>
    </lst>
  </requestHandler>

I am using this command:

bin/post -c techproducts example/exampledocs/1.mp4 -params "literal.id=mp4_1&uprefix=attr_"
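
For debugging, the same request can be sent with extractOnly=true, which asks the extract handler to return what Tika produced instead of indexing it. A rough sketch, assuming Solr is reachable at the default http://localhost:8983/solr and accepts the raw file as the request body; both function names are made up for illustration:

```python
from urllib.parse import urlencode
from urllib.request import Request, urlopen


def extract_url(collection, params, base="http://localhost:8983/solr"):
    """Build the /update/extract URL for a collection and a dict of parameters."""
    # The base URL is an assumption (Solr's default port on localhost).
    return f"{base}/{collection}/update/extract?{urlencode(params)}"


def extract_only(collection, file_path, content_type="video/mp4"):
    """POST the file with extractOnly=true and return the raw response text."""
    url = extract_url(collection, {"extractOnly": "true", "wt": "json"})
    with open(file_path, "rb") as handle:
        request = Request(url, data=handle.read(),
                          headers={"Content-Type": content_type})
    with urlopen(request) as response:
        return response.read().decode("utf-8")
```

extract_url("techproducts", {"literal.id": "mp4_1", "uprefix": "attr_"}) reproduces the parameters passed to bin/post above.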

I have tried commenting out <str name="uprefix">ignored_</str> and changing
fmap.div to <str name="fmap.div">div</str>, but it still does not work. I
don't understand why images get GPS and other metadata while video behaves
differently, even though both use the same solrconfig and the GPS metadata
are in the same fields. Nothing in the solrconfig settings differentiates
between image and video.

Tim, yes, this is related to the TIKA issue you linked. Thank you!

Here is the output in Solr for the MP4:

{
        "attr_meta":["stream_size",
          "5721559",
          "date",
          "2019-03-29T04:36:39Z",
          "X-Parsed-By",
          "org.apache.tika.parser.DefaultParser",
          "X-Parsed-By",
          "org.apache.tika.parser.mp4.MP4Parser",
          "stream_content_type",
          "application/octet-stream",
          "meta:creation-date",
          "2019-03-29T04:36:39Z",
          "Creation-Date",
          "2019-03-29T04:36:39Z",
          "tiff:ImageLength",
          "1080",
          "resourceName",
          "/Volumes/Data/inData/App/solr/example/exampledocs/1.mp4",
          "dcterms:created",
          "2019-03-29T04:36:39Z",
          "dcterms:modified",
          "2019-03-29T04:36:39Z",
          "Last-Modified",
          "2019-03-29T04:36:39Z",
          "Last-Save-Date",
          "2019-03-29T04:36:39Z",
          "xmpDM:audioSampleRate",
          "1000",
          "meta:save-date",
          "2019-03-29T04:36:39Z",
          "modified",
          "2019-03-29T04:36:39Z",
          "tiff:ImageWidth",
          "1920",
          "xmpDM:duration",
          "2.64",
          "Content-Type",
          "video/mp4"],
        "id":"mp4_4",
        "attr_stream_size":["5721559"],
        "attr_date":["2019-03-29T04:36:39Z"],
        "attr_x_parsed_by":["org.apache.tika.parser.DefaultParser",
          "org.apache.tika.parser.mp4.MP4Parser"],
        "attr_stream_content_type":["application/octet-stream"],
        "attr_meta_creation_date":["2019-03-29T04:36:39Z"],
        "attr_creation_date":["2019-03-29T04:36:39Z"],
        "attr_tiff_imagelength":["1080"],
        "resourcename":"/Volumes/Data/inData/App/solr/example/exampledocs/1.mp4",
        "attr_dcterms_created":["2019-03-29T04:36:39Z"],
        "attr_dcterms_modified":["2019-03-29T04:36:39Z"],
        "last_modified":"2019-03-29T04:36:39Z",
        "attr_last_save_date":["2019-03-29T04:36:39Z"],
        "attr_xmpdm_audiosamplerate":["1000"],
        "attr_meta_save_date":["2019-03-29T04:36:39Z"],
        "attr_modified":["2019-03-29T04:36:39Z"],
        "attr_tiff_imagewidth":["1920"],
        "attr_xmpdm_duration":["2.64"],
        "content_type":["video/mp4"],
        "content":[" \n \n  \n  \n  \n  \n  \n  \n  \n  \n  \n  \n  \n
 \n  \n  \n  \n  \n  \n  \n  \n  \n \n   "],
        "_version_":1632383499325407232}]
  }}
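
The attr_meta value above is a flat list that alternates names and values, so it is easier to inspect once paired up. A small sketch under that assumption (the helper names are hypothetical):

```python
from collections import defaultdict


def pair_attr_meta(flat):
    """Turn [name, value, name, value, ...] into {name: [values]}."""
    meta = defaultdict(list)
    # Even positions hold names, odd positions hold the matching values.
    for name, value in zip(flat[0::2], flat[1::2]):
        meta[name].append(value)
    return dict(meta)


def has_gps(meta):
    """True if any metadata name mentions GPS (case-insensitive)."""
    return any("gps" in name.lower() for name in meta)
```

Repeated names such as X-Parsed-By end up as one key with several values, and has_gps gives a quick yes/no for the document.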

JPEG is getting these:
"attr_meta":[....
"GPS Latitude",
          "37° 47' 41.99\"",
....
"attr_gps_latitude":["37° 47' 41.99\""],

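The field names in both outputs are consistent with lowernames=true plus uprefix=attr_: the Tika name is lowercased, non-alphanumeric characters become underscores, and the prefix is added only when the schema does not already know the field. The sketch below is an approximation of that behaviour for reasoning about names, not Solr's actual code; schema_fields is a hypothetical stand-in for the fields the schema defines.

```python
import re


def solr_field_name(tika_name, schema_fields, uprefix="attr_"):
    """Approximate the field name Solr Cell would use for a Tika metadata name."""
    # lowernames: lowercase, then replace anything that is not a-z or 0-9.
    lowered = re.sub(r"[^a-z0-9]", "_", tika_name.lower())
    # uprefix: only applied to names the schema does not already define.
    return lowered if lowered in schema_fields else uprefix + lowered
```

With an empty schema set, "GPS Latitude" maps to attr_gps_latitude, which matches the JPEG output above.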


Re: problem indexing GPS metadata for video upload

Tim Allison
I just pushed a fix for TIKA-2861.  If you can either build locally or
wait a few hours for Jenkins to build #182, let me know if that works
with straight tika-app.jar.

On Thu, May 2, 2019 at 5:00 AM Where is Where <[hidden email]> wrote:

>
> Thank you Alex and Tim.

Re: problem indexing GPS metadata for video upload

Tim Allison
Sorry build #182: https://builds.apache.org/job/tika-branch-1x/

On Thu, May 2, 2019 at 12:01 PM Tim Allison <[hidden email]> wrote:

>
> I just pushed a fix for TIKA-2861.  If you can either build locally or
> wait a few hours for Jenkins to build #182, let me know if that works
> with straight tika-app.jar.

Re: problem indexing GPS metadata for video upload

whisere
Thank you very much, Tim. How do I make the Tika change apply to Solr? I
see the Tika core, parsers, and XML jar files (tika-core.jar,
tika-parsers.jar, tika-xml.jar) in Solr's contrib/extraction/lib folder. Do
we just replace these files? Thanks!


Re: problem indexing GPS metadata for video upload

Tim Allison
Unfortunately, It Depends(TM)*...these are the steps I take:
https://wiki.apache.org/tika/UpgradingTikaInSolr

There can be version conflicts and other awful, unforeseen things if
you don't get it right.
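
One cheap sanity check before and after swapping jars is to confirm that all Tika jars in the lib folder carry the same version. A sketch, assuming the jar file names end in a version suffix such as tika-core-<version>.jar; the pattern and function names are illustrative, and this does not catch conflicts in Tika's transitive dependencies:

```python
import re
from pathlib import Path

# Assumed naming scheme: tika-<module>-<version>.jar
JAR_PATTERN = re.compile(r"^(tika-[a-z0-9-]+?)-(\d[\w.-]*)\.jar$")


def list_jars(lib_dir):
    """Return the jar file names found directly in lib_dir."""
    return sorted(path.name for path in Path(lib_dir).glob("*.jar"))


def tika_versions(jar_names):
    """Map each Tika module name to the version found in its jar file name."""
    versions = {}
    for name in jar_names:
        match = JAR_PATTERN.match(name)
        if match:
            versions[match.group(1)] = match.group(2)
    return versions


def is_consistent(versions):
    """True when every Tika module found reports the same version."""
    return len(set(versions.values())) <= 1
```

For example, is_consistent(tika_versions(list_jars("contrib/extraction/lib"))) should hold once the upgrade steps are complete.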

We're on the cusp of the release for 1.21 (I mean it this time)...I'll
upgrade Solr as soon as Tika is out (I also mean it this time).


*TM by Erick Erickson

On Fri, May 3, 2019 at 3:44 AM Where is Where <[hidden email]> wrote:

>
> Thank you very much Tim, I wonder how to make the Tika change apply to
> Solr? I saw Tika core, parse and xml jar files tika-core.jar
> tika-parsers.jar tika-xml.jar in solr contrib/extraction/lib folder. Do we
> just  replace these files? Thanks!

Re: problem indexing GPS metadata for video upload

whisere
Sorry, Tim, I missed your last message about this issue! Thank you very
much for the information.
Does the latest Tika 1.21 already incorporate the change? And how about
Solr?

Thanks!
