Parser test resources

Parser test resources

Tyler Palsulich
Hi Folks,

This has irked me for a while -- do we need all of the tika-parser test
resources in a flat directory? Can we convert them to standard package
resource paths? Or do enough parsers have overlapping test resource
dependencies that it makes sense to have them _all_ under one directory?

It would be nice to easily know which files are used for which tests.

Tyler

RE: Parser test resources

Allison, Timothy B.
Hi Tyler,
 
This has started to irk me as well, a bit.  I don't think there's much overlap, although there is some.  I think navigating standard package resource paths might be cumbersome even with a good IDE... perhaps start with high-level subdirectories as chm is now doing?
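
For illustration, a dry run of the "high-level subdirectories" idea could be sketched in Python as follows. Several details are assumptions made only for this example: each subdirectory is keyed on the file extension, files without an extension go into a "misc" bucket, and the test-documents path is hypothetical. The chm layout may follow a different scheme. The script only prints proposed moves and touches no files.

```python
from collections import defaultdict
from pathlib import Path

# Hypothetical location of the flat directory; adjust to the real checkout.
TEST_DOCUMENTS = Path("tika-parsers/src/test/resources/test-documents")


def plan_subdirectories(names):
    """Group flat file names into proposed subdirectories keyed by extension.

    Names without an extension go into an assumed "misc" bucket.
    Returns {subdirectory: sorted file names}, with subdirectories in sorted order.
    """
    plan = defaultdict(list)
    for name in names:
        ext = Path(name).suffix.lower().lstrip(".")
        plan[ext or "misc"].append(name)
    return {subdir: sorted(files) for subdir, files in sorted(plan.items())}


if __name__ == "__main__":
    if TEST_DOCUMENTS.is_dir():
        names = [p.name for p in TEST_DOCUMENTS.iterdir() if p.is_file()]
        # Dry run: print the proposed moves without touching any files.
        for subdir, files in plan_subdirectories(names).items():
            for name in files:
                print(f"{name} -> {subdir}/{name}")
```

Grouping by extension keeps every file's name unchanged, so a test would only need its resource path prefix updated.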

-----Original Message-----
From: Tyler Palsulich [mailto:[hidden email]]
Sent: Tuesday, March 10, 2015 9:54 PM
To: [hidden email]
Subject: Parser test resources


Re: Parser test resources

Nick Burch-2
In reply to this post by Tyler Palsulich
On Tue, 10 Mar 2015, Tyler Palsulich wrote:
> Or, do enough parsers have overlapping test resource dependencies where
> it makes sense to have them _all_ under one directory?

I believe that most of the test files get used for both detection and
parsing unit tests.

> It would be nice to easily know which files are used for which tests.

5 lines of perl should give you that, or fewer if you don't want to be
able to understand the perl... ;-)
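
This is not the Perl Nick has in mind, but a rough Python sketch of the same idea. It assumes each test names its resource as a quoted string literal, for example getResourceAsStream("/test-documents/testPDF.pdf"), so names assembled at runtime are missed. The function name and both checkout paths are hypothetical.

```python
from pathlib import Path

# Hypothetical checkout layout; adjust both paths to the real tree.
RESOURCES = Path("tika-parsers/src/test/resources/test-documents")
TEST_SOURCES = Path("tika-parsers/src/test/java")


def map_resources_to_tests(resource_names, test_sources):
    """Map each resource file name to the test files whose source mentions it.

    resource_names: file names such as "testPDF.pdf".
    test_sources: {test file path: source text}.
    A name counts as used only if it appears as a quoted literal ("testPDF.pdf")
    or at the end of a quoted path ("/test-documents/testPDF.pdf").
    """
    usage = {}
    for name in resource_names:
        usage[name] = [
            path
            for path, text in sorted(test_sources.items())
            if f'"{name}"' in text or f'/{name}"' in text
        ]
    return usage


if __name__ == "__main__":
    if RESOURCES.is_dir() and TEST_SOURCES.is_dir():
        names = sorted(p.name for p in RESOURCES.iterdir() if p.is_file())
        sources = {
            str(p): p.read_text(encoding="utf-8", errors="replace")
            for p in TEST_SOURCES.rglob("*.java")
        }
        for name, users in map_resources_to_tests(names, sources).items():
            print(name, ", ".join(users) or "(no literal reference found)")
```

Files that map to an empty list are only candidates for "unused": they may still be referenced indirectly, so each one is worth checking by hand.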

Many, but not all, of the test files are of the form test<filetype>.<ext>
or test<filetype>_<special type/description>.<ext>, which I find makes it
fairly easy to spot which files go with which tests. Not all though. Would
fixing the few files not in that format help or hinder, do you think?
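
A quick way to list the files that don't follow that pattern, sketched in Python. The regular expression is one reading of the convention. Restricting <filetype> to letters and digits is an assumption, and the directory path is hypothetical.

```python
import re
from pathlib import Path

# One reading of the convention: test<filetype>.<ext> or test<filetype>_<description>.<ext>.
# Restricting <filetype> to letters and digits is an assumption.
CONVENTION = re.compile(
    r"test(?P<filetype>[A-Za-z0-9]+)(?:_(?P<description>.+))?\.(?P<ext>[^.]+)"
)

# Hypothetical location of the flat directory.
TEST_DOCUMENTS = Path("tika-parsers/src/test/resources/test-documents")


def nonconforming(names):
    """Return, sorted, the file names that do not match CONVENTION in full."""
    return sorted(name for name in names if CONVENTION.fullmatch(name) is None)


if __name__ == "__main__":
    if TEST_DOCUMENTS.is_dir():
        for name in nonconforming(p.name for p in TEST_DOCUMENTS.iterdir() if p.is_file()):
            print(name)
```

Because the description group is greedy, a name such as testPDF_Version.10.x.pdf still conforms, with "pdf" taken as the extension.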

Nick

Re: Parser test resources

Tyler Palsulich
Good points. Maybe it's a good idea to keep the new files organized, like
chm, but leave the old ones where they are? The test-documents directory
has 460 entries right now.

Tyler

On Wed, Mar 11, 2015 at 8:43 AM, Nick Burch <[hidden email]> wrote:

> On Tue, 10 Mar 2015, Tyler Palsulich wrote:
>
>> Or, do enough parsers have overlapping test resource dependencies where
>> it makes sense to have them _all_ under one directory?
>>
>
> I believe that most of the test files get used for both detection and
> parsing unit tests
>
>  It would be nice to easily know which files are used for which tests.
>>
>
> 5 lines of perl should give you that, or fewer if you don't want to be
> able to understand the perl... ;-)
>
> Many, but not all of the test files are of the form test<filetype>.<ext>
> or test<filetype>_<special type/description>.<ext>, which I find makes it
> fairly easy to spot what files go with what. Not all though. Would fixing
> the few files not in that format help, or hinder do you think?
>
> Nick
>