comparing Tika's file detect with other tools?

comparing Tika's file detect with other tools?

Allison, Timothy B.
Would it be frowned upon to compare Tika's file detection with other tools, like "file"?  Any concerns about effectively reverse engineering (when we find that Tika is wrong) from a non-Apache project?

Any other sensitivities I should be aware of?

Best,

              Tim

RE: comparing Tika's file detect with other tools?

kkrugler
Hi Tim,

I don't believe there's any issue with comparing results.

If you were looking at the source for "file", then it gets more gray, but I think even that would be OK as long as you weren't copying code or directly re-implementing algorithms.

-- Ken

> From: Allison, Timothy B.
> Sent: April 22, 2015 5:47:17am PDT
> To: [hidden email]
> Subject: comparing Tika's file detect with other tools?

--------------------------
Ken Krugler
+1 530-210-6378
http://www.scaleunlimited.com
custom big data solutions & training
Hadoop, Cascading, Cassandra & Solr
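
Comparing results in the sense discussed above only needs each tool's output, not its source. A minimal sketch, assuming the MIME types have already been collected into two maps keyed by file path (for instance from Tika's `detect` call and from `file --mime-type -b`); the class and method names here are hypothetical and not part of Tika:

```java
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical helper: lists the files on which two detectors disagree. */
class DetectionComparison {

    /**
     * For every path present in both maps, records the pair of MIME types
     * when the two tools disagree. Keys are sorted for stable output.
     */
    static Map<String, String> disagreements(Map<String, String> tikaTypes,
                                             Map<String, String> fileTypes) {
        Map<String, String> diffs = new TreeMap<>();
        for (Map.Entry<String, String> e : tikaTypes.entrySet()) {
            String other = fileTypes.get(e.getKey());
            // Skip files the second tool never saw; report only real conflicts.
            if (other != null && !other.equalsIgnoreCase(e.getValue())) {
                diffs.put(e.getKey(), e.getValue() + " vs " + other);
            }
        }
        return diffs;
    }

    public static void main(String[] args) {
        // Made-up sample results, for illustration only.
        Map<String, String> tika = Map.of(
                "a.pdf", "application/pdf",
                "b.bin", "application/octet-stream");
        Map<String, String> file = Map.of(
                "a.pdf", "application/pdf",
                "b.bin", "application/x-dosexec");
        disagreements(tika, file)
                .forEach((path, types) -> System.out.println(path + ": " + types));
    }
}
```

A disagreement only flags a file for closer inspection; it does not say which tool is right.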






Re: comparing Tika's file detect with other tools?

Tyler Palsulich
Hi Tim,

I do not know whether there would be licensing concerns, but we do have
TIKA-289 to track merging magic bytes from `file` into Tika.

Tyler
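
For readers unfamiliar with the term, "magic bytes" are fixed byte sequences at known offsets that identify a format. A minimal sketch of the idea, assuming only two well-known signatures taken from the PDF and PNG specifications; the class is hypothetical and is not how Tika or `file` implement detection:

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

/** Hypothetical sketch of magic-byte matching; not Tika's actual detector. */
class MagicSketch {

    // Leading bytes of two well-known formats, per their published specs.
    private static final byte[] PDF = "%PDF-".getBytes(StandardCharsets.US_ASCII);
    private static final byte[] PNG =
            {(byte) 0x89, 'P', 'N', 'G', 0x0D, 0x0A, 0x1A, 0x0A};

    /** True if data begins with the given magic sequence. */
    static boolean startsWith(byte[] data, byte[] magic) {
        return data.length >= magic.length
                && Arrays.equals(Arrays.copyOf(data, magic.length), magic);
    }

    /** Maps a file prefix to a MIME type, falling back to octet-stream. */
    static String detect(byte[] prefix) {
        if (startsWith(prefix, PDF)) return "application/pdf";
        if (startsWith(prefix, PNG)) return "image/png";
        return "application/octet-stream";
    }
}
```

Real detectors also handle patterns at non-zero offsets, masks, and priorities between overlapping matches, which this sketch leaves out.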

On Wed, Apr 22, 2015 at 10:40 AM, Ken Krugler <[hidden email]>
wrote:

> Hi Tim,
>
> I don't believe there's any issue with comparing results.
>
> If you were looking at the source for "file", then it gets more gray, but
> I think even that would be OK as long as you weren't copying code or
> directly re-implementing algorithms.
>
> -- Ken

Re: comparing Tika's file detect with other tools?

Jukka Zitting
Hi,

Copyright also covers databases, so we'll need to honor the license
terms equally when copying file's code or detection patterns. Luckily
file (from http://www.darwinsys.com/file/) comes under a BSD license,
so reusing the code or data is quite simple from a licensing
perspective. In fact we've already done some of that earlier, see
https://github.com/apache/tika/commit/f807af0ee947affd34d84b334bbdc32c11576b2e
for an example.

BR,

Jukka

RE: comparing Tika's file detect with other tools?

Allison, Timothy B.
In reply to this post by Tyler Palsulich
Ken,
  Thank you.

Tyler,
  I don't know why I had missed that issue.  Thank you!  Do we need to worry about licensing issues if we effectively copy and paste from /usr/share/misc/magic (say, on RHEL)?  I didn't see a license in the file, and I guess it is in the public domain?

  I realize that we can't just copy and paste wholesale based on Nick's points, but for those that we can "re-implement" by our methods, can we use that file?

         Best,

                    Tim

-----Original Message-----
From: Tyler Palsulich [mailto:[hidden email]]
Sent: Wednesday, April 22, 2015 11:34 AM
To: [hidden email]
Subject: Re: comparing Tika's file detect with other tools?


RE: comparing Tika's file detect with other tools?

Allison, Timothy B.
In reply to this post by Jukka Zitting
Oops, our emails passed in the ether.  Thank you, Jukka!

-----Original Message-----
From: Jukka Zitting [mailto:[hidden email]]
Sent: Wednesday, April 22, 2015 12:06 PM
To: [hidden email]
Subject: Re: comparing Tika's file detect with other tools?
