Contribute more code for TIKA

Uwe Schindler
Hello everyone,

As noted in my reply to the ODF thread, I think the SAX design of TIKA is
really great. I have submitted two patches that extend TIKA. If you like my
work and would like more SAX-enabled document parsers (such as one for
OpenXML, the Office 2007 format), just let me know. I am rather new to your
project and your coding style, but I hope my patches look good to you. They
still need some JavaDocs, but this is a start.

If you are interested, I could also work with SVN directly (I created my
patches with SVN) and commit my code myself. Just let me know whether you
would allow that, and what the workflow looks like.

I currently work in the core group of two other open-source projects:

- Inventor of http://www.panFMP.org (a metadata portal that uses Lucene). A
metadata portal based on it (http://sedis.iodp.org) needed full-text
support, so I started to study TIKA, but I ran into problems with whitespace
during indexing and with missing parsers for OpenXML and a correctly working
ODF parser. If you look into the source code of panFMP, you will see that it
sometimes intermixes SAX and DOM (in one parser!), using Commons Digester
for that. When I saw your MatchingContentHandler with streaming XPath
support, I started thinking about rewriting the Digester code in panFMP!
(You see how cool your implementation is :-] ).

- Maintainer of the Sun Java System Webserver SAPI for PHP
(http://www.php.net/credits)

-----
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: [hidden email]

Re: Contribute more code for TIKA

Grant Ingersoll
Inline

On Nov 15, 2008, at 1:02 PM, Uwe Schindler wrote:

> Hello everyone,
>
> As noted in my reply to the ODF thread, I think the SAX design of TIKA is
> really great. I have submitted two patches that extend TIKA. If you like my
> work and would like more SAX-enabled document parsers (such as one for
> OpenXML, the Office 2007 format), just let me know. I am rather new to your
> project and your coding style, but I hope my patches look good to you. They
> still need some JavaDocs, but this is a start.
>
> If you are interested, I could also work with SVN directly (I created my
> patches with SVN) and commit my code myself. Just let me know whether you
> would allow that, and what the workflow looks like.

Not to discourage you, but more to educate: generally speaking, we open
up SVN access after several contributions and the earning of "merit" as
determined by the Lucene PMC. This is often referred to as the Apache
Way, or at least it is a part of the Apache Way.

For guidelines on becoming a committer, see
http://cwiki.apache.org/MAHOUT/howtobecomeacommitter.html
(they are written for Mahout, a sister project of Tika under the Lucene
umbrella, but they more or less, unofficially, sum up the Lucene PMC's
position on adding committers).

So, basically, keep up the work and stick around and you will likely  
become a committer at some point in the (near) future.

Cheers,
Grant

RE: Contribute more code for TIKA

Uwe Schindler
Hi Grant,

I am not discouraged; I only wanted to know "how things work" at TIKA.
Creating patches with SVN and posting them to the issue tracker is not a
problem for me. For example, working with Mike McCandless on patches that
fixed some issues with CheckIndex (IndexWriter deprecation) before the
Lucene 2.4 release went well through the issue tracker together with a
local SVN checkout, so I have no problems with it!

I only wanted to express my interest in contributing to this project.

Uwe


> -----Original Message-----
> From: Grant Ingersoll [mailto:[hidden email]]
> Sent: Saturday, November 15, 2008 7:20 PM
> To: [hidden email]
> Subject: Re: Contribute more code for TIKA