Planning Tika 0.2

Planning Tika 0.2

Jukka Zitting
Hi,

Tika has already come a long way since the 0.1 release, and I'd like
to push for the next release, 0.2. Any special wishes for features
to include?

My goals for the release would be finishing TIKA-115 (making a
runnable jar instead of using startup scripts), upgrading our parser
dependencies (especially POI), and closing some of the reported bugs.

It would be nice to get the media type registry and configuration
changes that I've been working on finished, but that's IMO not a
requirement before 1.0. A nice extra feature would be some light
integration with Lucene Java. Also, I've been thinking about
potentially splitting Tika into component libraries like tika-core,
tika-parsers, tika-lucene, etc. to better manage external dependencies
and to make it more attractive for parser libraries to directly
implement the Parser interface.

BR,

Jukka Zitting

Re: Planning Tika 0.2

robert burrell donkin-2
On Sun, May 25, 2008 at 2:50 PM, Jukka Zitting <[hidden email]> wrote:

> Hi,
>
> Tika has already come a long way since the 0.1 release, and I'd like
> to push for the next release, 0.2. Any special wishes for features
> to include?
>
> My goals for the release would be finishing TIKA-115 (making a
> runnable jar instead of using startup scripts), upgrading our parser
> dependencies (especially POI), and closing some of the reported bugs.
>
> It would be nice to get the media type registry and configuration
> changes that I've been working on finished, but that's IMO not a
> requirement before 1.0. A nice extra feature would be some light
> integration with Lucene Java. Also, I've been thinking about
> potentially splitting Tika into component libraries like tika-core,
> tika-parsers, tika-lucene, etc. to better manage external dependencies
> and to make it more attractive for parser libraries to directly
> implement the Parser interface.

components sound good to me :-)

- robert

Re: Planning Tika 0.2

chrismattmann
In reply to this post by Jukka Zitting
Hi Jukka,

>
> Tika has already come a long way since the 0.1 release, and I'd like
> to push for the next release, 0.2. Any special wishes for features
> to include?

Yes, you are right and I am really looking forward to Tika 0.2. I've got a
couple wishes:

TIKA-80 Utility method in MimeUtils to perform full mime resolution using
all available strategies

TIKA-74 Test Resources should be loaded by the class loader (e.g.
getResourceAsStream()).

TIKA-61 Add namespaces to our metadata keys

TIKA-121 MimeType.clean method no longer exists as a capability

TIKA-79 Mime type detection from file header appears to be failing.

TIKA-118 Bouncycastle binaries requires US exports regulation compliance


As for TIKA-80, TIKA-74, TIKA-61, TIKA-121, and TIKA-79, I've assigned them to
myself and will push hard to get them closed out within the next few weeks. I'm
not sure how much I can help with TIKA-118, but we have the same issue now in
Nutch (since Nutch now depends on the official apache-tika-0.1-incubating
release), so I will watch how you guys solve that problem and then follow
suit :)

>
> My goals for the release would be finishing TIKA-115 (making a
> runnable jar instead of using startup scripts), upgrading our parser
> dependencies (especially POI), and closing some of the reported bugs.

+1

>
> It would be nice to get the media type registry and configuration
> changes that I've been working on finished, but that's IMO not a
> requirement before 1.0. A nice extra feature would be some light
> integration with Lucene Java. Also, I've been thinking about
> potentially splitting Tika into component libraries like tika-core,
> tika-parsers, tika-lucene, etc. to better manage external dependencies
> and to make it more attractive for parser libraries to directly
> implement the Parser interface.

I think separate libraries is a very interesting and cool idea. I'm happy to
help out with the separation, but I don't think it's a req for 0.2.

Also, once we're ready to release, I volunteer to be the release manager if
everyone is +1 for it.

Thanks!

Cheers,
 Chris


>
> BR,
>
> Jukka Zitting

______________________________________________
Chris Mattmann, Ph.D.
[hidden email]
Cognizant Development Engineer
Early Detection Research Network Project
_________________________________________________
Jet Propulsion Laboratory            Pasadena, CA
Office: 171-266B                     Mailstop:  171-246
_______________________________________________________

Disclaimer:  The opinions presented within are my own and do not reflect
those of either NASA, JPL, or the California Institute of Technology.



Re: Planning Tika 0.2

Jukka Zitting
Hi,

On Fri, Jun 6, 2008 at 8:45 PM, Chris Mattmann
<[hidden email]> wrote:
> TIKA-118 Bouncycastle binaries requires US exports regulation compliance

Done.

> I think separate libraries is a very interesting and cool idea. I'm happy to
> help out with the separation, but I don't think it's a req for 0.2.

Agreed, we can do that later.

> Also, once we're ready to release, I volunteer to be the release manager if
> everyone is +1 for it.

Excellent, +1 from me.

BR,

Jukka Zitting

Re: Planning Tika 0.2

Sami Siren-2
In reply to this post by chrismattmann
Chris Mattmann wrote:
> Hi Jukka,
>
> Also, once we're ready to release, I volunteer to be the release manager if
> everyone is +1 for it.

+1

--
 Sami Siren

Re: Planning Tika 0.2

Niall Pemberton
On Mon, Jun 9, 2008 at 6:27 PM, Sami Siren <[hidden email]> wrote:

> Chris Mattmann wrote:
>>
>> Hi Jukka,
>>
>> Also, once we're ready to release, I volunteer to be the release manager
>> if everyone is +1 for it.
>>
>
> +1

+1 from me, sorry haven't found any time to help with Tika

Niall

Re: Planning Tika 0.2

Rida Benjelloun
+1
Rida.


2008/6/9 Niall Pemberton <[hidden email]>:

> On Mon, Jun 9, 2008 at 6:27 PM, Sami Siren <[hidden email]> wrote:
> > Chris Mattmann wrote:
> >>
> >> Hi Jukka,
> >>
> >> Also, once we're ready to release, I volunteer to be the release
> >> manager if everyone is +1 for it.
> >>
> >
> > +1
>
> +1 from me, sorry haven't found any time to help with Tika
>
> Niall
>

Re: Planning Tika 0.2

Keith R. Bennett
In reply to this post by chrismattmann
A *very* belated +1 from me too.

- Keith

Chris Mattmann wrote:
> Hi Jukka,
>
> Also, once we're ready to release, I volunteer to be the release manager if
> everyone is +1 for it.
>
> Thanks!
>
> Cheers,
>  Chris

Re: Planning Tika 0.2

Jukka Zitting
In reply to this post by Jukka Zitting
Hi,

The following issues were remaining on the 0.2 roadmap:

  TIKA-50  Unit tests are incomplete.
  TIKA-61  Add namespaces to our metadata keys
  TIKA-69  ParseUtils methods need to support Metadata
  TIKA-74  Test Resources should be loaded by the class loader ...
  TIKA-79  Mime type detection from file header appears to be failing
  TIKA-80  Utility method in MimeUtils to perform full mime resolution ...
  TIKA-121 MimeType.clean method no longer exists as a capability

None of them looked terribly urgent or blocking, so I just removed
them from the 0.2 roadmap.

I think the current trunk is good enough to be released.

BR,

Jukka Zitting

Re: Planning Tika 0.2

Sami Siren-2
Jukka Zitting wrote:
> I think the current trunk is good enough to be released.
>  
+1

--
 Sami Siren


Re: Planning Tika 0.2

David Meikle
2008/9/28 Sami Siren <[hidden email]>

> Jukka Zitting wrote:
>
>> I think the current trunk is good enough to be released.
>>
>>
> +1
>
> --
> Sami Siren
>
>
If my vote counted I would give it a +1, but since it doesn't I will
just give it a smile :-)