Remove support for building language identifier profiles?



kkrugler
Hi all,

As part of integrating language-detector into Tika (see TIKA-1723), I noticed TIKA-546 ("Add ability to create language profiles to tika-app").

If we switch over to language-detector, then this code no longer makes sense.

Also note that many language detectors need the full set of language data in order to generate the most relevant (discriminating) n-grams, since an n-gram only discriminates if it is much more frequent in one language than in the others. So the current support for passing in data for a single language doesn't work.
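To illustrate the point about discriminating n-grams, here is a minimal Java sketch. The scoring rule (keep an n-gram if it is at least `ratio` times more frequent in the target language than in any other) and all class and method names are assumptions for illustration, not how language-detector or Tika actually builds profiles.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical sketch, not code from Tika or language-detector: shows why
// choosing "discriminating" n-grams needs data for every language at once.
public class DiscriminatingNgrams {

    // Relative frequency of each character n-gram in one language's sample text.
    static Map<String, Double> profile(String text, int n) {
        String t = text.toLowerCase(Locale.ROOT);
        Map<String, Integer> counts = new HashMap<>();
        int total = 0;
        for (int i = 0; i + n <= t.length(); i++) {
            counts.merge(t.substring(i, i + n), 1, Integer::sum);
            total++;
        }
        Map<String, Double> freqs = new HashMap<>();
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            freqs.put(e.getKey(), e.getValue() / (double) total);
        }
        return freqs;
    }

    // Keeps the target language's n-grams that are at least `ratio` times more
    // frequent than in any other language (an assumed scoring rule). With only
    // one language supplied there is nothing to compare against, so every
    // n-gram passes and nothing is actually filtered out.
    static List<String> discriminating(String target,
                                       Map<String, Map<String, Double>> profiles,
                                       double ratio) {
        return profiles.get(target).entrySet().stream()
                .filter(e -> {
                    double maxOther = 0.0;
                    for (Map.Entry<String, Map<String, Double>> p : profiles.entrySet()) {
                        if (!p.getKey().equals(target)) {
                            maxOther = Math.max(maxOther,
                                    p.getValue().getOrDefault(e.getKey(), 0.0));
                        }
                    }
                    return e.getValue() >= ratio * maxOther;
                })
                .map(Map.Entry::getKey)
                .sorted()
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        Map<String, Map<String, Double>> all = new HashMap<>();
        all.put("en", profile("the cat and the hat", 3));
        all.put("de", profile("der hund und die katze", 3));
        System.out.println("en vs. de: " + discriminating("en", all, 2.0));

        Map<String, Map<String, Double>> enOnly = Map.of("en", all.get("en"));
        System.out.println("en alone:  " + discriminating("en", enOnly, 2.0));
    }
}
```

With only the English profile supplied, the filter has nothing to compare against and keeps every n-gram, which mirrors why building a profile from one language's data alone falls short.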

So, any suggestions on what to do? Leave the code as is, with @Deprecated annotations, even though the profiles it generates won't be useful?

Or wait for pluggable detectors, so that someone could port the current Tika code? Then this profile-building support might still make sense, though it should be moved into that specific plugin.

-- Ken
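To make the second option concrete, here is one way a pluggable design could keep profile building inside a specific plugin. It is a Java sketch under stated assumptions: every interface, class and method name below is hypothetical and does not exist in Tika, and the detection and profile-building bodies are placeholders.

```java
import java.util.List;
import java.util.Map;
import java.util.Optional;

// Hypothetical sketch: every type and method here is invented for illustration
// and does not exist in Tika.
public class PluggableDetectorSketch {

    // Capability every pluggable detector offers: detection only.
    interface LanguageDetectorPlugin {
        String detect(String text);
    }

    // Optional capability, implemented only by plugins that can build their own
    // profiles. It takes samples for all languages in one call, because the
    // discriminating n-grams depend on the whole language set.
    interface ProfileBuilder {
        void buildProfiles(Map<String, List<String>> samplesByLanguage);
    }

    // A front end such as tika-app would offer profile building only when
    // the configured plugin supports it.
    static Optional<ProfileBuilder> profileBuilderOf(LanguageDetectorPlugin plugin) {
        return plugin instanceof ProfileBuilder
                ? Optional.of((ProfileBuilder) plugin)
                : Optional.empty();
    }

    // Stand-in for a detector that ships fixed profiles (no building support).
    static class BundledProfileDetector implements LanguageDetectorPlugin {
        @Override
        public String detect(String text) {
            return "und"; // placeholder: real detection logic omitted
        }
    }

    // Stand-in for a port of the current Tika code, keeping profile building.
    static class PortedTikaDetector implements LanguageDetectorPlugin, ProfileBuilder {
        private int languageCount = 0;

        @Override
        public String detect(String text) {
            return "und"; // placeholder: real detection logic omitted
        }

        @Override
        public void buildProfiles(Map<String, List<String>> samplesByLanguage) {
            // Placeholder: a real port would compute n-gram profiles here.
            languageCount = samplesByLanguage.size();
        }

        int languageCount() {
            return languageCount;
        }
    }

    public static void main(String[] args) {
        // prints false: the bundled-profile plugin offers no profile building
        System.out.println(profileBuilderOf(new BundledProfileDetector()).isPresent());

        PortedTikaDetector ported = new PortedTikaDetector();
        profileBuilderOf(ported).ifPresent(b -> b.buildProfiles(
                Map.of("en", List.of("the cat"), "de", List.of("die katze"))));
        // prints 2: samples for both languages reached the plugin in one call
        System.out.println(ported.languageCount());
    }
}
```

Making profile building an optional capability keeps the core detector contract small, while a plugin that can build its own profiles exposes that explicitly instead of tika-app assuming every detector supports it.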



Re: Remove support for building language identifier profiles?

Oleg Tikhonov-2
Hi Ken,
I would choose the last option you mentioned.

-- Oleg

On Sat, Aug 29, 2015 at 7:58 PM, Ken Krugler <[hidden email]> wrote:

> Hi all,
>
> As part of integrating language-detector into Tika (see TIKA-1723), I
> noticed TIKA-546 ("Add ability to create language profiles to tika-app")
>
> If we switch over to language-detector, then this code no longer makes
> sense.
>
> Also note that many language detectors require the full set of language
> data in order to generate the most relevant (discriminating) ngrams, thus
> the current support for passing in data for one language doesn't work.
>
> So any suggestions for what to do? Leave the code as is, with deprecated
> annotations, even though the profiles generated won't be useful?
>
> Or wait for pluggable detectors, and someone could port the current Tika
> code - then this profile building support might still make sense, though it
> would want to be moved into the specific plugin.
>
> -- Ken