Any thoughts on logging within Tika components? We already have both
commons-logging and log4j as dependencies, and I wouldn't be surprised
to see some external parser libraries using JUL or SLF4J for logging.
I think that we need to standardize on the underlying logging library in
the core Tika code. I'm actually a fan of JDK's logging facilities, however,
I'm open to other options as well.
Agreed, it's going to be difficult to control the underlying logging libraries
used by external parsing plugins. However, that's partially a function of how
the external library is called (e.g., if it's called through a
Runtime.getRuntime().exec(), then we'll have little programmatic control of
the external log library, versus if it's called programmatically, and has
> Any thoughts on logging within Tika components? We already have both
> commons-logging and log4j as dependencies, and I wouldn't be surprised
> to see some external parser libraries using JUL or SLF4J for logging.
> See also http://kasparov.skife.org/blog/src/logging_for_libraries.html
> Jukka Zitting
Chris Mattmann, Ph.D.
[hidden email] Cognizant Development Engineer
Early Detection Research Network Project
On 10/6/07, Chris Mattmann <[hidden email]> wrote:
> I think that we need to standardize on the underlying logging library in
> the core Tika code. I'm actually a fan of JDK's logging facilities, however,
> I'm open to other options as well.
JDK logging sounds good to me, especially since it avoids an extra dependency.
> I wonder if we should not create our own very simple logging
> interface, as Brian suggests in his blog, to allow client code to use
> the library of their choice.
> We'd define a very simple Tika-specific logging interface, provide an
> implementation that logs to stderr, and allow client code to wrap any
> other logger as they see fit.
> That would allow us to remove all logging dependencies (unless the
> parsers that we use require logging libs themselves, of course).
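For illustration, here is a minimal sketch of what such an interface might look like. All names (TikaLogger, StderrTikaLogger, JulTikaLogger) are hypothetical and nothing here is actual Tika code; it only shows the shape of the proposal: a tiny interface, a stderr default, and a client-side wrapper around java.util.logging.

```java
import java.io.PrintStream;
import java.util.logging.Level;
import java.util.logging.Logger;

/** Hypothetical Tika-specific logging interface (illustrative, not real Tika API). */
interface TikaLogger {
    void info(String message);
    void warn(String message, Throwable cause);
}

/** Default implementation: writes to stderr, so no logging library is required. */
final class StderrTikaLogger implements TikaLogger {
    private final PrintStream out;

    StderrTikaLogger() { this(System.err); }

    // The stream is injectable only to make the sketch easy to test.
    StderrTikaLogger(PrintStream out) { this.out = out; }

    public void info(String message) { out.println("INFO: " + message); }

    public void warn(String message, Throwable cause) {
        out.println("WARN: " + message);
        if (cause != null) {
            cause.printStackTrace(out);
        }
    }
}

/** What client code might write to route messages into java.util.logging. */
final class JulTikaLogger implements TikaLogger {
    private final Logger delegate;

    JulTikaLogger(Logger delegate) { this.delegate = delegate; }

    public void info(String message) { delegate.log(Level.INFO, message); }

    public void warn(String message, Throwable cause) {
        delegate.log(Level.WARNING, message, cause);
    }
}
```

A client that prefers log4j or SLF4J would write an equivalent adapter against its own library, so the core code would depend only on the interface.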
A couple of thoughts:
1) would this mess up logging that displays the class method?
Oct 9, 2007 1:13:23 AM org.apache.solr.search.SolrIndexSearcher warm
INFO: autowarming result for Searcher@830122 main
Now would they all say
Oct 9, 2007 1:13:23 AM org.apache.tika.util.LogWrapper log
2) if you go this route, be sure to include an isLoggable(level)...
some log messages are expensive to construct. I know JDK logging
supports this, do other common log libs?
3) throw exceptions whenever possible instead of logging (small libs
should probably avoid logging altogether)
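To make point 2 concrete, here is a small sketch of the guard pattern using java.util.logging's Logger.isLoggable(Level). The class GuardedLoggingSketch and the method expensiveDump() are made up for the example. log4j, commons-logging and SLF4J expose comparable checks such as isDebugEnabled().

```java
import java.util.logging.Level;
import java.util.logging.Logger;

/** Illustrative only: class and method names are hypothetical. */
final class GuardedLoggingSketch {
    // Counts how often the expensive message was actually built.
    static int dumpCalls = 0;

    /** Stand-in for a message that is costly to construct. */
    static String expensiveDump() {
        dumpCalls++;
        return "state=" + System.nanoTime();
    }

    static void logState(Logger log) {
        // Build the message only if a FINE record would actually be logged.
        if (log.isLoggable(Level.FINE)) {
            log.fine("Parser " + expensiveDump());
        }
    }
}
```

Without the guard, the string concatenation and the call to expensiveDump() would run on every invocation, even when FINE is disabled.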
- SLF4J is only a facade API to program against; the developer who is
integrating Aperture can still determine which logging implementation is
used by including the corresponding SLF4J driver jar file for that
logger on the classpath. As far as I can see, there is a driver for
every commonly used logger.
- There are serious classloading issues with java.util.logging (and
apparently also with Apache Commons Logging) that may make forwarding
its calls to another logger effectively impossible in some scenarios.
For example, if Tika were to use java.util.logging and a user installed
two webapps containing Tika code, the log files of the two webapps could
get mixed up: one log file would contain the messages of both apps, and
the other would remain empty. This problem has happened to us in a
production system, so for us it's not a theoretical issue :)
I have had zero issues and complaints after the switch, so I guess it's
doing its job perfectly fine.
> On 10/8/07, Bertrand Delacretaz <[hidden email]> wrote:
> >... We'd define a very simple Tika-specific logging interface, provide an
> > implementation that logs to stderr, and allow client code to wrap any
> > other logger as they see fit....
> A couple of thoughts:
> 1) would this mess up logging that displays the class method?...
Gotcha. We could require "this" to be passed in the logging calls,
to make sure we have the class name, but this starts to get messy...
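A rough sketch of the "pass this" idea, assuming the wrapper sits on top of java.util.logging. Logger.logp() is the standard JDK method that takes the source class and method explicitly; SourceAwareLog and ExampleParser are hypothetical names used only for illustration.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

/** Hypothetical wrapper: the caller identifies itself on every call. */
final class SourceAwareLog {
    private final Logger delegate;

    SourceAwareLog(Logger delegate) { this.delegate = delegate; }

    /** The record names the caller's class and method instead of this wrapper. */
    void info(Object source, String method, String message) {
        delegate.logp(Level.INFO, source.getClass().getName(), method, message);
    }
}

/** Hypothetical caller showing how every log call has to carry "this". */
final class ExampleParser {
    private final SourceAwareLog log;

    ExampleParser(SourceAwareLog log) { this.log = log; }

    void parse() {
        log.info(this, "parse", "parsing started");
    }
}
```

The messiness shows up at the call sites: every call has to repeat "this" and the method name, and nothing checks that the method name is right.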
> ...2) if you go this route, be sure to include an isLoggable(level)....
Sure, not all logging libraries are efficient if you don't use this.
> ...3) throw exceptions whenever possible instead of logging (small libs
> should probably avoid logging altogether)...
Not logging anything....why not, sounds like a good idea.
I'm all for throwing exceptions when things go wrong, and using
logging more for debugging and informational stuff.
And Tika will probably stay easy to debug, as most tests (for now all
of them) run in single-threaded mode with no rocket science stuff.
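As a small illustration of throwing instead of logging, the sketch below reports a failure through an exception and leaves the decision to log to the caller. ParseFailedException and NumberFieldParser are invented for the example and are not taken from Tika.

```java
/** Hypothetical exception type, used only for this sketch. */
class ParseFailedException extends Exception {
    ParseFailedException(String message, Throwable cause) {
        super(message, cause);
    }
}

/** Hypothetical parser helper that never logs. */
final class NumberFieldParser {
    /** Signals failure by throwing; the original cause is preserved for debugging. */
    static int parseWidth(String raw) throws ParseFailedException {
        try {
            return Integer.parseInt(raw.trim());
        } catch (NumberFormatException e) {
            throw new ParseFailedException("Invalid width: '" + raw + "'", e);
        }
    }
}
```

The calling application can then catch the exception and log it with whatever library it already uses.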
I'd be ok with the idea of not doing any logging in Tika. As this is
an important design decision, we might want to have a vote to make
sure we're all on the same wavelength.