0.1 release?

0.1 release?

Chris Mattmann-3
Hi Guys,

 Okay, so it seems to be the consensus that everyone is OK with me being the
release manager (thanks, by the way! :) ).

 Right now looking in JIRA ( https://issues.apache.org/jira/browse/TIKA ) we
have 1 critical and 4 major issues pending. What should the plan be for
these with respect to the 0.1 release? Can we come to agreement as to which
issues should make it into the 0.1 release? Can we also come up with a
release schedule? I'd vote for beginning the release process by the end of
next week, if that's realistic. What do others think?

Thanks!

Cheers,
 Chris



Re: 0.1 release?

Bertrand Delacretaz-2
On 10/12/07, Chris Mattmann <[hidden email]> wrote:

> ... Right now looking in JIRA ( https://issues.apache.org/jira/browse/TIKA ) we
> have 1 critical and 4 major issues pending. What should the plan be for
> these with respect to the 0.1 release?...

I just applied the TIKA-52 patch, and I don't think TIKA-56 really
deserves to be marked "critical".

I'd be fine with a 0.1 release as is, or maybe including TIKA-53 if it
can be finished soon (I'm not planning on working on it right now but
maybe others are?).

> ...Can we also come up with a
> release schedule...

At this stage I'd go the other way round: decide which issues we want
to be in 0.1, and release when those are done. In the meantime, avoid
committing disruptive changes to the codebase.

-Bertrand

Re: 0.1 release?

Jukka Zitting
In reply to this post by Chris Mattmann-3
Hi,

On 10/12/07, Chris Mattmann <[hidden email]> wrote:
> [...]

Oops, I had this in my moderation queue and didn't notice that you had
already sent the same message from another address. So, please use the
other 0.1 thread for replies. Also, Chris, your gmail address is now
cleared for posting without moderation.

PS. Anyone willing to co-moderate the Tika mailing lists?

BR,

Jukka Zitting

Re: 0.1 release?

Bertrand Delacretaz-2
On 10/12/07, Jukka Zitting <[hidden email]> wrote:

> ...PS. Anyone willing to co-moderate the Tika mailing lists?...

Count me in.

-Bertrand

Re: 0.1 release?

Jukka Zitting
In reply to this post by Bertrand Delacretaz-2
Hi,

On 10/12/07, Bertrand Delacretaz <[hidden email]> wrote:
> I'd be fine with a 0.1 release as is, or maybe including TIKA-53 if it
> can be finished soon (I'm not planning on working on it right now but
> maybe others are?).

I just committed my current status with TIKA-53. I think it's good
enough for the 0.1 release.

BR,

Jukka Zitting

Re: 0.1 release?

Keith R. Bennett
In reply to this post by Bertrand Delacretaz-2
All -

I would like to propose as another criterion for releasing 0.1 that the features we expose in the API work correctly (or are documented to not work correctly), and are verified by unit tests.  I can volunteer to do some of this, but I am not familiar with some of the features, so I may need to consult with the authors of those features.

Bertrand, thanks for committing TIKA-52.  I marked TIKA-56 critical because until it is fixed, Tika will fail to parse documents, and do so in a way that would (reasonably IMO) confuse and surprise a user.  I apologize if 'critical' was excessive; major didn't seem urgent enough.  If we provide 'detect MIME type by extension' functionality, then even though I am (we are?) a fan of lower case extensions, a large number of users will complain that Tika is broken when it fails to parse files with upper case extensions.  Although from a purist's point of view Tika may not be broken, from a pragmatic point of view I think we should consider it so.  In my opinion, this should be fixed before 0.1 is released.  I believe the fix is trivial in any case.
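
As a rough illustration of the kind of fix meant here, the extension could be normalized to lower case before it is looked up. This is only a sketch: the class name, method name, and the three table entries are hypothetical and are not Tika's actual API or MIME type registry.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical helper, not Tika's actual API: maps file extensions to MIME types.
final class ExtensionMimeLookup {
    // Illustrative entries only; keys are stored in lower case.
    private static final Map<String, String> TYPES = new HashMap<>();
    static {
        TYPES.put("pdf", "application/pdf");
        TYPES.put("txt", "text/plain");
        TYPES.put("html", "text/html");
    }

    /** Returns the MIME type for the file name's extension, or null if unknown. */
    static String detect(String fileName) {
        int dot = fileName.lastIndexOf('.');
        if (dot < 0 || dot == fileName.length() - 1) {
            return null; // no extension to look up
        }
        // Normalize case so "REPORT.PDF" and "report.pdf" resolve identically.
        String ext = fileName.substring(dot + 1).toLowerCase(Locale.ROOT);
        return TYPES.get(ext);
    }

    public static void main(String[] args) {
        System.out.println(detect("report.pdf"));
        System.out.println(detect("REPORT.PDF"));
    }
}
```

Using Locale.ROOT keeps the lower-casing independent of the platform's default locale.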

I'd also suggest that another criterion be that the convenience methods in ParseUtils work as expected, and that performing common tasks is reasonably straightforward.  I'm not saying that there's a problem, just that as the architecture is changing, we want to make sure that we are not losing or breaking any conveniences, or failing to provide simplifications of common tasks that are trivial to simplify.

BTW, Jukka, I'm willing to help with forum moderation too.

Regards,
Keith




Re: 0.1 release?

Bertrand Delacretaz-2
On 10/12/07, Keith R. Bennett <[hidden email]> wrote:

> ...I would like to propose as another criterion for releasing 0.1 that the
> features we expose in the API work correctly (or are documented to not work
> correctly), and are verified by unit tests....

I (respectfully) disagree on having to be "perfect" in these areas for
our 0.1 release.

In my view this is clearly a "release early, release often" release.

As long as people can experiment with it on a reasonable number of use
cases, and hopefully give us feedback, doing the release (with all
required disclaimers about alpha quality) has value.

Having open issues like TIKA-56, for example, is fine IMHO. They are
just that: open issues that will hopefully be fixed before the next
release, but don't prevent people from experimenting with Tika 0.1.

-Bertrand

Re: 0.1 release?

robert burrell donkin-2
On 10/13/07, Bertrand Delacretaz <[hidden email]> wrote:

> On 10/12/07, Keith R. Bennett <[hidden email]> wrote:
>
> > ...I would like to propose as another criterion for releasing 0.1 that the
> > features we expose in the API work correctly (or are documented to not work
> > correctly), and are verified by unit tests....
>
> I (respectfully) disagree on having to be "perfect" in these areas for
> our 0.1 release.
>
> In my view this is clearly a "release early, release often" release.

+1

until tika has a release, other projects have to depend on SNAPSHOTs
which limits usage

> As long as people can experiment with it on a reasonable number of use
> cases, and hopefully give us feedback, doing the release (with all
> required disclaimers about alpha quality) has value.
>
> Having open issues like TIKA-56, for example, is fine IMHO. They are
> just that: open issues that will hopefully be fixed before the next
> release, but don't prevent people from experimenting with Tika 0.1.

but keith's proposal is definitely worthwhile: it's very easy to add
new features without ensuring that existing ones work. perhaps 0.2
tasks could be created along the lines suggested...?

- robert

Re: 0.1 release?

Bertrand Delacretaz-2
On 10/13/07, Robert Burrell Donkin <[hidden email]> wrote:

> ...but keith's proposal is definitely worthwhile: it's very easy to add
> new features without ensuring that existing ones work. perhaps 0.2
> tasks could be created along the lines suggested...?...

Sure, I didn't mean to dismiss Keith's proposal! Sorry if that's how I
came across.

I think adding more and more precise test cases is key in avoiding
regressions and making sure things continue to work as expected.  And
if things have to change, the diffs in the test cases show exactly
what happened.

-Bertrand

Re: 0.1 release?

robert burrell donkin-2
On 10/13/07, Bertrand Delacretaz <[hidden email]> wrote:
> On 10/13/07, Robert Burrell Donkin <[hidden email]> wrote:
>
> > ...but keith's proposal is definitely worthwhile: it's very easy to add
> > new features without ensuring that existing ones work. perhaps 0.2
> > tasks could be created along the lines suggested...?...
>
> Sure, I didn't mean to dismiss Keith's proposal! Sorry if that's how I
> came across.

FWIW i didn't think you came across that way :-)

> I think adding more and more precise test cases is key in avoiding
> regressions and making sure things continue to work as expected.  And
> if things have to change, the diffs in the test cases show exactly
> what happened.

+1

JIRA does tasks and versions well and it's worth taking advantage of
it. IMHO 0.2 is a reasonable target for perfection or rejection of
features introduced in 0.1.

- robert

Re: 0.1 release?

Keith R. Bennett
Bertrand -

I think I understand what you're saying -- if we wait until everything is "just right" before releasing, we'll delay release significantly, and that will be a disservice to our users.

I guess what we're really discussing is not a binary question, but where is the optimum balance between user friendliness and timeliness.

My concern is that if we release before Tika is more user friendly, users would need to expend more effort to understand it, and that would decrease their experimentation, and ultimately, acceptance.  First impressions are powerful.

Let's take as an example Chris' great feature of turning on or off byte array MIME type detection.   If I'm not mistaken, overriding the default setting now requires creation of a new XML file (since using the default settings does not require creation of, or even knowledge of, such a file).

I suppose even that is ok for a 0.1 release, but here's another thing.  I was having problems using this feature.  I looked through the code, and could not see where the byte array MIME type detection was being used.  If there had been a minimal test exercising the feature I could have run it, stepped through it, or just looked at it and been sure that it was me and not Tika that needed correction.  After some review of the code, I came to the conclusion that it was only used where the user passes the byte header him/herself; I was thinking that one of our utilities would have read the header.
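
A minimal test of the kind described above might look like the following sketch. It is not Tika's actual code: the class and both method names are hypothetical, and only the PDF signature ("%PDF-") is checked. It shows the two cases side by side, one where the caller passes the byte header directly and one where a utility reads the header from a stream itself.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical sketch, not Tika's actual API.
final class MagicByteSketch {
    // "%PDF-" is the signature found at the start of PDF files.
    private static final byte[] PDF_MAGIC = "%PDF-".getBytes(StandardCharsets.US_ASCII);
    private static final String FALLBACK = "application/octet-stream";

    /** Detects the type from a header the caller has already read. */
    static String detect(byte[] header) {
        if (header.length >= PDF_MAGIC.length
                && Arrays.equals(Arrays.copyOf(header, PDF_MAGIC.length), PDF_MAGIC)) {
            return "application/pdf";
        }
        return FALLBACK;
    }

    /** Convenience overload: reads the header itself, then delegates. */
    static String detect(InputStream in) {
        byte[] header = new byte[PDF_MAGIC.length];
        int read = 0;
        try {
            while (read < header.length) {
                int n = in.read(header, read, header.length - read);
                if (n < 0) {
                    break; // stream ended before a full header was available
                }
                read += n;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return detect(Arrays.copyOf(header, read));
    }

    public static void main(String[] args) {
        byte[] pdf = "%PDF-1.4\n".getBytes(StandardCharsets.US_ASCII);
        System.out.println(detect(pdf));
        System.out.println(detect(new ByteArrayInputStream(pdf)));
        System.out.println(detect("plain text".getBytes(StandardCharsets.US_ASCII)));
    }
}
```

Note that the stream overload consumes the bytes it reads; a real utility would need to buffer or reset the stream before handing it to a parser.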

If others with no familiarity with Tika internals will find it even more difficult to figure this kind of stuff out, then working with Tika may be frustrating for them.  There have been many times when I have explored a new piece of software, and the amount of effort to understand it and get it to work exceeded my patience.  I wouldn't want this to happen with Tika.

I don't mean this in any way as a criticism to anyone.  You are very generously giving your personal time and expertise to this code, and I truly appreciate it.  My point was to elevate the importance of user friendliness in our release criteria. I would like to be helpful in this area, creating unit tests, providing documentation, etc.

- Keith

Re: 0.1 release?

chrismattmann
Hi Guys,

 I'd like to chime in on this one:

> Let's take as an example Chris' great feature of turning on or off byte
> array MIME type detection.   If I'm not mistaken, overriding the default
> setting now requires creation of a new XML file (since using the default
> settings does not require creation of, or even knowledge of, such a file).

 I think we need to divorce ourselves from the idea that Tika (or any other
system that uses XML configuration files) requires XML to configure it.
Simply stated, there are APIs that exist in a lot of elements of Tika
configuration classes, e.g., TikaConfig, and MimeTypes, that control their
functionality. For good coding practices, and so that users don't have to
recompile Tika every time that they want to change a configuration property,
a lot of properties are factored out into separate XML configuration files,
that are read, and used to construct the user's desired internal Tika
configuration object. The use of XML here is not mandated: we could have
used a Java properties style file (a=b\nc=d\n,...etc), we could have sent a
communication to an external database to load the configuration, or we could
have neglected to provide any separate external configuration files at all,
and required Tika to be configured programmatically; and mandated that as
the only option.
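
 To make the point concrete, here is a minimal sketch in which the same
settings object is built either directly in code or from a Java properties
style source. The class name, the "magic.detection" key, and its default
value are assumptions made up for this illustration; they are not the
actual TikaConfig or MimeTypes API.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.Properties;

// Hypothetical settings holder, not Tika's actual configuration classes.
final class DetectorSettings {
    private final boolean magicDetection;

    /** Programmatic configuration: no external file involved. */
    DetectorSettings(boolean magicDetection) {
        this.magicDetection = magicDetection;
    }

    boolean isMagicDetectionEnabled() {
        return magicDetection;
    }

    /** Builds the same object from key=value pairs; the key name is made up. */
    static DetectorSettings fromProperties(Properties props) {
        // Defaults to enabled when the key is absent (an assumed default).
        String value = props.getProperty("magic.detection", "true");
        return new DetectorSettings(Boolean.parseBoolean(value));
    }

    public static void main(String[] args) throws IOException {
        DetectorSettings inCode = new DetectorSettings(false);

        Properties props = new Properties();
        props.load(new StringReader("magic.detection=false\n"));
        DetectorSettings fromFile = fromProperties(props);

        // Both routes end in an equivalent internal configuration object.
        System.out.println(inCode.isMagicDetectionEnabled());
        System.out.println(fromFile.isMagicDetectionEnabled());
    }
}
```

 The external format only decides how the object gets populated; the code
that consumes the settings does not need to know which route was taken.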

 However, it's important to understand as well that XML is a convenience
only: not a necessity. Additionally, just because Tika ships with a
tika-config.xml file, or a tika-mimetypes.xml, doesn't mean that it's the
end-all for configuration, or that it's generally applicable to everyone's
deployment environment or use case. We should probably emphasize that these
files, because they are easy and convenient to change, are meant to be
changed to meet users' use cases. We ship the XML configuration files with
the best, most general guesses that we could make; however, that doesn't
mean that they'll never need to be changed.

The best example of this I can imagine would be something like the Apache
webserver. It ships with a mime types configuration file -- however, it
doesn't try to include every possible mime type in there as a default. In
fact, if you have exotic content types that you create (e.g., within the
scientific domain there are a lot of .hdf files), then you need to manually
edit this file yourself and add in your new exotic mime types. Additionally,
think about the httpd.conf file that ships with Apache. There are parameters
in there such as ServerAdmin and Listen, things that can come with
default values, but most likely need to be configured by each person who
downloads and uses the software. It's the same case here, with Tika.

>
> I suppose even that is ok for a 0.1 release, but here's another thing.  I
> was having problems using this feature.  I looked through the code, and
> could not see where the byte array MIME type detection was being used.  If
> there had been a minimal test exercising the feature I could have run it,
> stepped through it, or just looked at it and been sure that it was me and
> not Tika that needed correction.  After some review of the code, I came to
> the conclusion that it was only used where the user passes the byte header
> him/herself; I was thinking that one of our utilities would have read the
> header.
>
> If others with no familiarity with Tika internals will find it even more
> difficult to figure this kind of stuff out, then working with Tika may be
> frustrating for them.  There have been many times when I have explored a new
> piece of software, and the amount of effort to understand it and get it to
> work exceeded my patience.  I wouldn't want this to happen with Tika.

While I understand what you're saying, I am basically in agreement with
Bertrand: release early, and release often. Releases are good to "get the
software out there", but more so from a perspective of having a tangible,
stable artifact. To tell someone, "Oh you should use Tika for your project.
Just go to the .../trunk and check out the latest source from there" doesn't
exactly exude confidence that what the user downloads/checks out will be
stable, or even the same code within a few hours time. Having a release is
something that we can always point back to, and something that we can use to
version track the differences in the software as it evolves over time. I'm
reminded of my work where some software deliveries aren't necessarily even
intended for outside use: they are simply "feature deliveries" that show
progress towards the overall deliverable. I think that's what this 0.1
release of Tika is: progress towards the overall 1.0 deliverable. We're not
mandating that Tika become a household name with this release -- just
showing that we are making measurable, tangible progress towards something
generally useful.

>
> I don't mean this in any way as a criticism to anyone.  You are very
> generously giving your personal time and expertise to this code, and I truly
> appreciate it.  My point was to elevate the importance of user friendliness
> in our release criteria. I would like to be helpful in this area, creating
> unit tests, providing documentation, etc.

Agree with this point, wholeheartedly, though I think we need to clarify
that 100% (even 10%) user-friendliness need not be part of a 0.1
(alpha-type) release. I think we can set user-friendliness as a measure for
each release, e.g., by saying, "by release 0.4 we'll be XXX user friendly,
by 0.6 we'll be YYY user friendly, and finally by 1.0, we'll be 100%
bona-fide user-buddy buddy" (beyond friendly) ;) But I don't think we should
stymie the 0.1 release by expecting it to be production quality from a user
friendliness point of view.

Cheers,
  Chris

______________________________________________
Chris Mattmann, Ph.D.
[hidden email]
Cognizant Development Engineer
Early Detection Research Network Project

_________________________________________________
Jet Propulsion Laboratory            Pasadena, CA
Office: 171-266B                     Mailstop:  171-246
_______________________________________________________

Disclaimer:  The opinions presented within are my own and do not reflect
those of either NASA, JPL, or the California Institute of Technology.