Logging in Tika

Logging in Tika

Jukka Zitting
Hi,

Any thoughts on logging within Tika components? We already have both
commons-logging and log4j as dependencies, and I wouldn't be surprised
to see some external parser libraries using JUL or SLF4J for logging.

See also http://kasparov.skife.org/blog/src/logging_for_libraries.html

BR,

Jukka Zitting

Re: Logging in Tika

chrismattmann
Hi Jukka,

 I think that we need to standardize on the underlying logging library in
the core Tika code. I'm actually a fan of JDK's logging facilities; however,
I'm open to other options as well.

 It's going to be difficult to control the logging libraries used by
external parsing plugins, agreed. However, that partly depends on how the
external library is called: if it's invoked through
Runtime.getRuntime().exec(), we'll have little programmatic control over its
logging, whereas if it's called through its Java APIs, we'll have more.

 In any case, my +1 for JDK logging.

Thanks,
  Chris



On 10/5/07 9:13 AM, "Jukka Zitting" <[hidden email]> wrote:

> Hi,
>
> Any thoughts on logging within Tika components? We already have both
> commons-logging and log4j as dependencies, and I wouldn't be surprised
> to see some external parser libraries using JUL or SLF4J for logging.
>
> See also http://kasparov.skife.org/blog/src/logging_for_libraries.html
>
> BR,
>
> Jukka Zitting

______________________________________________
Chris Mattmann, Ph.D.
[hidden email]
Cognizant Development Engineer
Early Detection Research Network Project

_________________________________________________
Jet Propulsion Laboratory            Pasadena, CA
Office: 171-266B                     Mailstop:  171-246
_______________________________________________________

Disclaimer:  The opinions presented within are my own and do not reflect
those of either NASA, JPL, or the California Institute of Technology.



Re: Logging in Tika

Jukka Zitting
Hi,

On 10/6/07, Chris Mattmann <[hidden email]> wrote:
> I think that we need to standardize on the underlying logging library in
> the core Tika code. I'm actually a fan of JDK's logging facilities; however,
> I'm open to other options as well.

JDK logging sounds good to me, especially since it avoids an extra dependency.

BR,

Jukka Zitting

Re: Logging in Tika

Bertrand Delacretaz-2
In reply to this post by Jukka Zitting
On 10/5/07, Jukka Zitting <[hidden email]> wrote:

> ...Any thoughts on logging within Tika components?...

> ...See also http://kasparov.skife.org/blog/src/logging_for_libraries.html...

The same blog entry came to mind when seeing your message's subject.

I wonder if we should not create our own very simple logging
interface, as Brian suggests in his blog, to allow client code to use
the library of their choice.

We'd define a very simple Tika-specific logging interface, provide an
implementation that logs to stderr, and allow client code to wrap any
other logger as they see fit.

That would allow us to remove all logging dependencies (unless the
parsers that we use require logging libs themselves, of course).

WDYT?
-Bertrand
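
A minimal sketch of what such an interface could look like, for illustration only. Every name here (TikaLog, StderrTikaLog, JulTikaLog, the three levels) is an assumption made for this example and not an existing Tika API. The JUL adapter only shows how client code might wrap another logger; JUL is used because it ships with the JDK.

```java
import java.io.PrintStream;

// Hypothetical names throughout; none of these types are part of Tika.
interface TikaLog {
    enum Level { DEBUG, INFO, WARN }

    void log(Level level, String message);
}

// Default implementation: prints to stderr unless another stream is supplied.
final class StderrTikaLog implements TikaLog {
    private final PrintStream out;
    private final Level threshold;

    StderrTikaLog() {
        this(System.err, Level.INFO);
    }

    StderrTikaLog(PrintStream out, Level threshold) {
        this.out = out;
        this.threshold = threshold;
    }

    @Override
    public void log(Level level, String message) {
        // Enum constants compare in declaration order: DEBUG < INFO < WARN.
        if (level.compareTo(threshold) >= 0) {
            out.println(level + ": " + message);
        }
    }
}

// Client-side adapter showing how another logger could be wrapped.
final class JulTikaLog implements TikaLog {
    private final java.util.logging.Logger delegate;

    JulTikaLog(java.util.logging.Logger delegate) {
        this.delegate = delegate;
    }

    @Override
    public void log(Level level, String message) {
        delegate.log(toJul(level), message);
    }

    private static java.util.logging.Level toJul(Level level) {
        switch (level) {
            case DEBUG: return java.util.logging.Level.FINE;
            case INFO:  return java.util.logging.Level.INFO;
            default:    return java.util.logging.Level.WARNING;
        }
    }
}
```

Library code would hold a TikaLog reference and default to new StderrTikaLog(); an integrator could pass in an adapter like JulTikaLog, or one written against any other logging framework.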

Re: Logging in Tika

Yonik Seeley-2
On 10/8/07, Bertrand Delacretaz <[hidden email]> wrote:

> I wonder if we should not create our own very simple logging
> interface, as Brian suggests in his blog, to allow client code to use
> the library of their choice.
>
> We'd define a very simple Tika-specific logging interface, provide an
> implementation that logs to stderr, and allow client code to wrap any
> other logger as they see fit.
>
> That would allow us to remove all logging dependencies (unless the
> parsers that we use require logging libs themselves, of course).
>
> WDYT?

A couple of thoughts:
1) would this mess up logging that displays the class and method?

Example:
Oct 9, 2007 1:13:23 AM org.apache.solr.search.SolrIndexSearcher warm
INFO: autowarming result for Searcher@830122 main

Now would they all say
Oct 9, 2007 1:13:23 AM org.apache.tika.util.LogWrapper log

2) if you go this route, be sure to include an isLoggable(level)...
some log messages are expensive to construct.  I know JDK logging
supports this; do other common log libs?

3) throw exceptions whenever possible instead of logging (small libs
should probably avoid logging altogether)

-Yonik
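
Points 1 and 2 can be sketched with java.util.logging. This is an illustration under assumptions: LogWrapper and ExampleParser are hypothetical classes, not Tika code. JUL infers the source class and method from the call stack when they are not supplied, so a naive wrapper gets its own name into every record; Logger.logp accepts them explicitly instead.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

// Hypothetical wrapper, named after the LogWrapper in the message above.
final class LogWrapper {
    private final Logger logger;

    LogWrapper(Logger logger) {
        this.logger = logger;
    }

    // Exposed so callers can skip building expensive messages.
    boolean isLoggable(Level level) {
        return logger.isLoggable(level);
    }

    // Naive forwarding: JUL infers the caller from the stack, so every
    // record is attributed to LogWrapper.log rather than the real caller.
    void log(Level level, String message) {
        logger.log(level, message);
    }

    // Logger.logp takes the source class and method explicitly, which
    // keeps the attribution correct at the cost of noisier call sites.
    void logp(Level level, String sourceClass, String sourceMethod, String message) {
        logger.logp(level, sourceClass, sourceMethod, message);
    }
}

// Hypothetical caller showing the isLoggable guard from point 2.
final class ExampleParser {
    private final LogWrapper log;
    int dumps = 0; // counts how often the expensive message was built

    ExampleParser(LogWrapper log) {
        this.log = log;
    }

    void parse() {
        // The dump is only built when FINE is actually enabled.
        if (log.isLoggable(Level.FINE)) {
            log.logp(Level.FINE, "ExampleParser", "parse", "state: " + expensiveDump());
        }
        log.logp(Level.INFO, "ExampleParser", "parse", "parse finished");
    }

    private String expensiveDump() {
        dumps++;
        return "...";
    }
}
```

Passing the class and method names by hand is the messy part: they are plain strings that nothing keeps in sync with the code.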

Re: Logging in Tika

Jukka Zitting
Hi,

On 10/9/07, Yonik Seeley <[hidden email]> wrote:
> 3) throw exceptions whenever possible instead of logging (small libs
> should probably avoid logging altogether)

I'm wondering if we could go down this route with Tika.

BR,

Jukka Zitting

Re: Logging in Tika

Christiaan Fluit-2
Some input for you guys: in Aperture we have been using SLF4J (.org) for
a while now, after an intensive debate on the pros and cons of various
frameworks. See this archived mail thread:
http://sourceforge.net/mailarchive/forum.php?thread_name=45D9CF4A.3060707%40aduna-software.com&forum_name=aperture-devel

Summarized, we're using SLF4J because:

- It's only a facade API to program against; the developer who is
integrating Aperture can still determine which logging implementation is
used by including the corresponding SLF4J driver jar file for that
logger in the classpath. As far as I can see, there is a driver for
every commonly used logger.

- There are serious classloading issues with java.util.logging (and
apparently also with Apache Commons Logging) that may make forwarding
its calls to another logger effectively impossible in some scenarios.
For example, if Tika were to use java.util.logging and a user
installed two webapps containing Tika code, the log files of the two
webapps could get mixed up: one log file would contain messages from both
apps, while the other would remain empty. This has happened to us in a
production system, so for us it's not a theoretical issue :)

I have had zero issues and complaints after the switch, so I guess it's
doing its job perfectly fine.


Regards,

Chris

Re: Logging in Tika

Bertrand Delacretaz-2
In reply to this post by Yonik Seeley-2
On 10/9/07, Yonik Seeley <[hidden email]> wrote:

> On 10/8/07, Bertrand Delacretaz <[hidden email]> wrote:
> >... We'd define a very simple Tika-specific logging interface, provide an
> > implementation that logs to stderr, and allow client code to wrap any
> > other logger as they see fit....

> A couple of thoughts:
> 1) would this mess up logging that displays the class method?...

Gotcha. We could require "this" to be included in the logging calls
to make sure we have the class name, but this starts to get messy...

> ...2) if you go this route, be sure to include an isLoggable(level)....

Sure, not all logging libraries are efficient if you don't use this.

> ...3) throw exceptions whenever possible instead of logging (small libs
> should probably avoid logging altogether)...

Not logging anything....why not, sounds like a good idea.

I'm all for throwing exceptions when things go wrong, and using
logging more for debugging and informational stuff.

And Tika will probably stay easy to debug, as most tests (for now all
of them) run in single-threaded mode with no rocket science stuff.

I'd be ok with the idea of not doing any logging in Tika. As this is
an important design decision, we might want to have a vote to make
sure we're all on the same wavelength.

What do people think?

-Bertrand

Re: Logging in Tika

Yonik Seeley-2
On 10/9/07, Bertrand Delacretaz <[hidden email]> wrote:
> I'm all for throwing exceptions when things go wrong, and using
> logging more for debugging and informational stuff.

I don't know if the never-log scenario fits with Tika or not (be nice
if it did though)...
If something goes wrong and you still want to continue on, you need to log.

-Yonik

Re: Logging in Tika

Bertrand Delacretaz-2
On 10/9/07, Yonik Seeley <[hidden email]> wrote:

> ...I don't know if the never-log scenario fits with Tika or not (be nice
> if it did though)...
> If something goes wrong and you still want to continue on, you need to log....

Agreed, but for this one case we could have our own warning() method
somewhere, that logs to stderr and can be replaced easily.

-Bertrand
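
One way such a replaceable warning() method could look, as a sketch only. TikaWarnings, setSink and WarningDemo are names invented for this example and are not part of Tika. The default sink writes to stderr, and the demo shows the continue-after-failure case where an exception alone is not enough.

```java
import java.util.List;
import java.util.Objects;
import java.util.function.Consumer;

// Hypothetical class; nothing like it is confirmed to exist in Tika.
final class TikaWarnings {
    // Default sink prints to stderr; volatile so a replacement is
    // visible to all threads.
    private static volatile Consumer<String> sink =
            message -> System.err.println("WARNING: " + message);

    private TikaWarnings() {}

    static void warning(String message) {
        sink.accept(message);
    }

    // Client code swaps in its own logger here.
    static void setSink(Consumer<String> newSink) {
        sink = Objects.requireNonNull(newSink);
    }
}

// Hypothetical caller: a bad input is reported, then processing continues.
final class WarningDemo {
    static int parseAll(List<String> inputs) {
        int parsed = 0;
        for (String input : inputs) {
            try {
                Integer.parseInt(input);
                parsed++;
            } catch (NumberFormatException e) {
                TikaWarnings.warning("skipping '" + input + "': " + e.getMessage());
            }
        }
        return parsed;
    }
}
```

A client that wants these messages in its own log would call TikaWarnings.setSink(...) once at startup with a lambda that forwards to the logger of its choice.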