What kind of files do you support?

Karl Heinz Marbaise
Hi there,

I have found the project and find it very interesting.

Are there any plans to integrate scanning of, e.g., Java files to
extract some meta information? Or maybe Perl, Python, PHP, etc.?

Kind regards
Karl Heinz Marbaise
--
SoftwareEntwicklung Beratung Schulung    Tel.: +49 (0) 2405 / 415 893
Dipl.Ing.(FH) Karl Heinz Marbaise        ICQ#: 135949029
Hauptstrasse 177                         USt.IdNr: DE191347579
52146 Würselen                           http://www.soebes.de

Re: What kind of files do you support?

Jukka Zitting
Hi,

On Thu, Mar 20, 2008 at 6:04 PM, Karl Heinz Marbaise wrote:
>  Are there any plans to integrate scanning of, e.g., Java files to
>  extract some meta information? Or maybe Perl, Python, PHP, etc.?

Currently the best we have is to treat such files as just plain text,
but I agree that more intelligent parsing would be really nice.

To implement that, we'd need some language parsers (or at least
patterns like the ones used by many syntax highlighters) preferably as
Java libraries from the central Maven repository. Or perhaps we could
use something like the enscript.st files to best leverage existing
work.
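As a rough illustration of the "patterns like the ones used by many syntax highlighters" idea, here is a minimal Java sketch, assuming ordered regex rules are good enough for a first pass. The class `HighlighterStyleTokenizer` and its rule set are hypothetical: they are not part of Tika and are not taken from enscript's `.st` files, and a real grammar (for example an ANTLR one) would handle cases these regexes miss.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch of highlighter-style rules for Java source.
// Each named group is one rule; the left-to-right scan plus alternation order decides precedence.
final class HighlighterStyleTokenizer {

    private static final Pattern RULES = Pattern.compile(
            "(?<comment>/\\*.*?\\*/|//[^\\n]*)"
            + "|(?<string>\"(?:\\\\.|[^\"\\\\\\n])*\")"
            + "|(?<number>\\b\\d[\\w.]*)"
            + "|(?<keyword>\\b(?:abstract|boolean|class|else|extends|final|for|if|implements"
            + "|import|int|interface|new|package|private|protected|public|return|static"
            + "|throws|void|while)\\b)"
            + "|(?<identifier>[A-Za-z_$][\\w$]*)",
            Pattern.DOTALL);

    private static final String[] TYPES = {"comment", "string", "number", "keyword", "identifier"};

    /** Returns "type:text" entries in source order; punctuation and whitespace are skipped. */
    static List<String> tokenize(String source) {
        List<String> tokens = new ArrayList<>();
        Matcher m = RULES.matcher(source);
        while (m.find()) {
            // Exactly one named group is non-null for each match.
            for (String type : TYPES) {
                if (m.group(type) != null) {
                    tokens.add(type + ":" + m.group(type));
                    break;
                }
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("public int x = 42; // note"));
    }
}
```

Because the scan always takes the leftmost match, a `//` inside a string literal is consumed by the string rule at the opening quote instead of starting a comment, which is why highlighter rule sets usually put strings and comments ahead of everything else.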

BR,

Jukka Zitting

Re: What kind of files do you support?

Karl Heinz Marbaise
Hi,

First of all, thanks for the reply.
> Currently the best we have is to treat such files as just plain text,
> but I agree that more intelligent parsing would be really nice.
>
> To implement that, we'd need some language parsers (or at least
> patterns like the ones used by many syntax highlighters) preferably as
> Java libraries from the central Maven repository. Or perhaps we could
> use something like the enscript.st files to best leverage existing
> work.
Maybe I can help with this. In my project http://supose.soebes.de I
have started parsing Java files using the ANTLR parser generator to
extract particular information (comments, method names, maybe more).
I think this would mean enhancing the Metadata object, which seems
to be no real problem.
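A minimal sketch of that idea, assuming regex heuristics in place of the ANTLR-generated parser: comments and method names are collected into a multi-valued map that stands in for Tika's Metadata object. The class `JavaSourceMetadata` and the keys `source:comment` and `source:method` are hypothetical, chosen only for illustration.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch: regex heuristics stand in for an ANTLR-generated Java parser,
// and a multi-valued Map stands in for Tika's Metadata object.
final class JavaSourceMetadata {

    // Javadoc/block comments and line comments (string literals are not special-cased here).
    private static final Pattern COMMENT =
            Pattern.compile("/\\*.*?\\*/|//[^\\n]*", Pattern.DOTALL);

    // Group 1: the word (or '>' / ']') before the name; group 2: a candidate method or
    // constructor name followed by a parameter list and an opening brace.
    private static final Pattern METHOD = Pattern.compile(
            "([\\w$]+|>|\\])\\s+([A-Za-z_$][\\w$]*)\\s*\\([^;{}()]*\\)"
            + "\\s*(?:throws\\s+[\\w$.,\\s]+?)?\\{");

    // Words the METHOD pattern would otherwise report as names (e.g. "else if (x) {").
    private static final Set<String> NOT_NAMES =
            Set.of("if", "for", "while", "switch", "catch", "synchronized");

    static Map<String, List<String>> extract(String source) {
        Map<String, List<String>> metadata = new LinkedHashMap<>();
        Matcher comments = COMMENT.matcher(source);
        while (comments.find()) {
            add(metadata, "source:comment", comments.group());
        }
        // Blank out comments so commented-out code is not reported as a method.
        String code = COMMENT.matcher(source).replaceAll(" ");
        Matcher methods = METHOD.matcher(code);
        while (methods.find()) {
            boolean anonymousClass = methods.group(1).equals("new"); // "new Runnable() {"
            if (!anonymousClass && !NOT_NAMES.contains(methods.group(2))) {
                add(metadata, "source:method", methods.group(2));
            }
        }
        return metadata;
    }

    // Multi-valued put: one key collects many values, in source order.
    private static void add(Map<String, List<String>> metadata, String key, String value) {
        metadata.computeIfAbsent(key, k -> new ArrayList<>()).add(value);
    }
}
```

The heuristics will misreport some constructs (annotated parameters, comment markers inside strings), which is exactly where a real parser pays off; the multi-valued shape is the part that carries over.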

Maybe I could integrate this into Tika?

I became aware of the Tika project because I need to extract
information from different kinds of documents as well, so I took a look
at it and found things I can use.


Kind regards
Karl Heinz Marbaise

Re: What kind of files do you support?

Jukka Zitting
Hi,

On Tue, Mar 25, 2008 at 3:25 PM, Karl Heinz Marbaise wrote:
>  Maybe I can help with this. In my project http://supose.soebes.de I
>  have started parsing Java files using the ANTLR parser generator to
>  extract particular information (comments, method names, maybe more).

Your project looks cool! I've quite often wanted something like that.
Of course there's Krugle, but a good open source repository search
tool would be really nice.

>  I think this would mean enhancing the Metadata object, which seems
>  to be no real problem.

Enhancing Metadata would be nice, but I think it would be even better
if you could annotate the XHTML output with <span class="..."> tags
(or something) to give the indexer more accurate context information.
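Since Tika parsers report their XHTML output as SAX events, a minimal sketch of such annotation can use the JDK's SAX and TrAX APIs directly. The class `SpanAnnotator` and the class values `method` and `comment` are hypothetical, not an existing Tika convention, and the html/body wrapper a real parser would emit is omitted for brevity.

```java
import java.io.StringWriter;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.TransformerConfigurationException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.sax.SAXTransformerFactory;
import javax.xml.transform.sax.TransformerHandler;
import javax.xml.transform.stream.StreamResult;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.AttributesImpl;

// Hypothetical sketch: wraps each extracted fragment in <span class="..."> while
// emitting XHTML as SAX events, serialized to a string for inspection.
final class SpanAnnotator {

    private static final String XHTML = "http://www.w3.org/1999/xhtml";

    /** Each entry is {cssClass, text}, e.g. {"method", "add"}. */
    static String annotate(String[][] fragments) {
        try {
            SAXTransformerFactory factory =
                    (SAXTransformerFactory) TransformerFactory.newInstance();
            TransformerHandler handler = factory.newTransformerHandler(); // identity transform
            handler.getTransformer().setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            handler.setResult(new StreamResult(out));

            handler.startDocument();
            handler.startPrefixMapping("", XHTML);
            handler.startElement(XHTML, "p", "p", new AttributesImpl());
            for (String[] fragment : fragments) {
                AttributesImpl attributes = new AttributesImpl();
                attributes.addAttribute("", "class", "class", "CDATA", fragment[0]);
                handler.startElement(XHTML, "span", "span", attributes);
                char[] text = fragment[1].toCharArray();
                handler.characters(text, 0, text.length); // serializer escapes '<' and '&'
                handler.endElement(XHTML, "span", "span");
            }
            handler.endElement(XHTML, "p", "p");
            handler.endPrefixMapping("");
            handler.endDocument();
            return out.toString();
        } catch (TransformerConfigurationException | SAXException e) {
            throw new IllegalStateException("could not serialize XHTML", e);
        }
    }
}
```

Calling `SpanAnnotator.annotate(new String[][] {{"method", "add"}})` yields a `<p>` whose `<span class="method">` wraps `add`; an indexer can then weight or field text by class instead of flattening everything into one plain-text stream.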

>  Maybe I could integrate this into Tika?

That would be great! You can contribute your work as a feature request
in https://issues.apache.org/jira/browse/TIKA.

More generally, it would be great if you could share your thoughts on
how Tika could best integrate with SupoSE. Is there anything we should
change in Tika to make your work easier?

BR,

Jukka Zitting