HtmlMapper

Jukka Zitting
Hi,

See TIKA-347 for a nice alternative to the earlier TIKA-304 approach
to customizing the way Tika maps incoming HTML to XHTML.

You can now inject a custom mapping strategy through the parse
context, like this:

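    // Obtain a parser as usual
    Parser parser = ...;
    ParseContext context = new ParseContext();
    // Register the custom mapper under the HtmlMapper interface type
    context.set(HtmlMapper.class, new MyCustomHtmlMapper());
    // Pass the context along so the HTML parser can pick up the mapper
    parser.parse(..., context);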

The new HtmlMapper interface contains the same mapSafeElement() and
isDiscardElement() method signatures that we already used for the
overridable HtmlParser methods in TIKA-304. If a custom HtmlMapper
instance is not found in the parse context, then the existing TIKA-304
mechanism is used.
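
For illustration, here is a rough sketch of what a MyCustomHtmlMapper could look like. To keep it compilable without Tika on the classpath, it declares a hypothetical stand-in interface, SketchHtmlMapper, that carries only the two methods named above; the parameter and return types shown are my assumption of the signatures, and with Tika available you would implement the real HtmlMapper interface instead. The specific element choices (b to strong, dropping script and style) are made up for the example.

```java
// Hypothetical stand-in for Tika's HtmlMapper, limited to the two methods
// mentioned in this post. The signatures here are an assumption.
interface SketchHtmlMapper {
    // Returns the XHTML element name to emit, or null if the element
    // itself should not appear in the output.
    String mapSafeElement(String name);

    // Returns true if the element and everything inside it should be dropped.
    boolean isDiscardElement(String name);
}

// Example mapper; the element choices below are illustrative only.
class MyCustomHtmlMapper implements SketchHtmlMapper {

    @Override
    public String mapSafeElement(String name) {
        // Compare case-insensitively, since incoming HTML may use any case.
        switch (name.toLowerCase(java.util.Locale.ROOT)) {
            case "b":
                return "strong"; // presentational tag to semantic equivalent
            case "i":
                return "em";
            case "p":
            case "h1":
            case "h2":
            case "a":
                return name.toLowerCase(java.util.Locale.ROOT); // pass through
            default:
                return null; // anything else is not treated as safe
        }
    }

    @Override
    public boolean isDiscardElement(String name) {
        // Drop script and style blocks together with their content.
        String lower = name.toLowerCase(java.util.Locale.ROOT);
        return lower.equals("script") || lower.equals("style");
    }
}
```

An instance of such a class is what would be passed to context.set(HtmlMapper.class, ...) in the snippet earlier in this message, once it implements the real interface.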

BR,

Jukka Zitting