Testing an ingest framework that uses Apache Tika

Testing an ingest framework that uses Apache Tika

Allison, Timothy B.
All,

I finally got around to documenting Apache Tika's MockParser [1]. As of Tika 1.15 (unreleased), add tika-core-tests.jar to your classpath, and you can simulate:

1. Regular catchable exceptions
2. OutOfMemoryErrors (OOMs)
3. Permanent hangs

This will allow you to determine whether your ingest framework is robust against these failure modes.

As always, we fix Tika when we can, but if history is any indicator, you'll want to make sure your ingest code can handle these issues if you are processing millions or billions of files from the wild.
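
As a rough illustration of what handling these three failure modes can look like on the ingest side, here is a minimal Java sketch. It does not use MockParser or any Tika API: RobustIngest, runGuarded, and Outcome are hypothetical names, and the lambdas in main are stand-ins for a parser call that throws, reports an OutOfMemoryError, or hangs. The OOM case only throws the error rather than actually exhausting the heap.

```java
import java.io.IOException;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class RobustIngest {

    /** Possible results of one guarded parse attempt (hypothetical names). */
    public enum Outcome { OK, EXCEPTION, OOM, TIMEOUT }

    /**
     * Runs one parse task on a daemon worker thread and classifies the result.
     * The task is a stand-in for whatever call your ingest code makes to a parser.
     */
    public static Outcome runGuarded(Callable<String> parseTask, long timeoutMillis) {
        ExecutorService pool = Executors.newSingleThreadExecutor(r -> {
            Thread t = new Thread(r, "parse-worker");
            t.setDaemon(true); // a hung worker must not keep the JVM alive
            return t;
        });
        Future<String> future = pool.submit(parseTask);
        try {
            future.get(timeoutMillis, TimeUnit.MILLISECONDS);
            return Outcome.OK;
        } catch (TimeoutException e) {
            // Best effort only: an uninterruptible hang ignores the interrupt.
            future.cancel(true);
            return Outcome.TIMEOUT;
        } catch (ExecutionException e) {
            // The worker's Throwable (including Errors) arrives as the cause.
            return (e.getCause() instanceof OutOfMemoryError) ? Outcome.OOM : Outcome.EXCEPTION;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the caller's interrupt status
            return Outcome.EXCEPTION;
        } finally {
            pool.shutdownNow();
        }
    }

    public static void main(String[] args) {
        // Stand-ins for the three failure modes plus a normal parse.
        System.out.println(runGuarded(() -> "extracted text", 1000));
        System.out.println(runGuarded(() -> { throw new IOException("simulated"); }, 1000));
        System.out.println(runGuarded(() -> { throw new OutOfMemoryError("simulated"); }, 1000));
        System.out.println(runGuarded(() -> { Thread.sleep(60_000); return "never"; }, 200));
    }
}
```

An in-process guard like this can only abandon a thread that is truly hung, and a real OutOfMemoryError may leave the JVM in a bad state, so frameworks that need stronger guarantees often run parsing in a separate process that can be killed and restarted.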

Cheers,

            Tim


[1] https://wiki.apache.org/tika/MockParser

Re: Testing an ingest framework that uses Apache Tika

Konstantin Gribov
Tim,

It's an awesome feature for downstream projects' integration tests. Thanks
for implementing it!

--

Best regards,
Konstantin Gribov

Re: Testing an ingest framework that uses Apache Tika

Luís Filipe Nassif
Excellent, Tim! Thank you for all your great work on Apache Tika!


Re: Testing an ingest framework that uses Apache Tika

Mattmann, Chris A (3010)
++1 awesome job

++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Chris Mattmann, Ph.D.
Principal Data Scientist, Engineering Administrative Office (3010)
Manager, NSF & Open Source Projects Formulation and Development Offices (8212)
NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
Office: 180-503E, Mailstop: 180-503
Email: [hidden email]
WWW:  http://sunset.usc.edu/~mattmann/
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Director, Information Retrieval and Data Science Group (IRDS)
Adjunct Associate Professor, Computer Science Department
University of Southern California, Los Angeles, CA 90089 USA
WWW: http://irds.usc.edu/
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 


RE: Testing an ingest framework that uses Apache Tika

Allison, Timothy B.
Thank you, Chris, Luís and Konstantin!


