[jira] Created: (TIKA-125) Pass Locale information to parsers

[jira] Created: (TIKA-125) Pass Locale information to parsers

JIRA jira@apache.org
Pass Locale information to parsers
----------------------------------

                 Key: TIKA-125
                 URL: https://issues.apache.org/jira/browse/TIKA-125
             Project: Tika
          Issue Type: New Feature
          Components: parser
            Reporter: Jukka Zitting


Looking at TIKA-103, I realized that some file formats can contain data whose text rendering depends on the active Locale, which might not be explicitly specified in the file format or in the specific document being parsed.

It should be possible for a parser client to explicitly specify which Locale should be used as the default when extracting text from a document. Setting the global default with Locale.setDefault() is not an option in many cases, since it changes the default for the entire JVM.

I think the best way to pass Locale information to a parser is as a part of the Metadata object.
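
To make the idea concrete, here is a minimal sketch of a parser reading its Locale from metadata instead of the JVM-wide default. It does not use the actual Tika API: a plain Map stands in for the Metadata object, and the key name "locale", the class name, and both helper methods are assumptions made up for this illustration.

```java
import java.text.NumberFormat;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

public class LocaleMetadataSketch {
    // Hypothetical metadata key; not a name defined by Tika.
    static final String LOCALE_KEY = "locale";

    // An explicit metadata entry wins; otherwise fall back to the JVM default.
    static Locale resolveLocale(Map<String, String> metadata) {
        String tag = metadata.get(LOCALE_KEY);
        return tag != null ? Locale.forLanguageTag(tag) : Locale.getDefault();
    }

    // Stand-in for a parser rendering a locale-dependent value (a number) as text.
    static String renderNumber(double value, Map<String, String> metadata) {
        return NumberFormat.getInstance(resolveLocale(metadata)).format(value);
    }

    public static void main(String[] args) {
        Map<String, String> german = new HashMap<>();
        german.put(LOCALE_KEY, Locale.GERMANY.toLanguageTag());

        Map<String, String> american = new HashMap<>();
        american.put(LOCALE_KEY, Locale.US.toLanguageTag());

        // The same stored value renders differently per client-supplied Locale,
        // without touching Locale.setDefault().
        System.out.println(renderNumber(1234.5, german));
        System.out.println(renderNumber(1234.5, american));
    }
}
```

Because the Locale travels with each parse request, two clients in the same JVM can extract text with different Locales at the same time.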

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Commented: (TIKA-125) Pass Locale information to parsers

JIRA jira@apache.org

    [ https://issues.apache.org/jira/browse/TIKA-125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12569797#action_12569797 ]

Bertrand Delacretaz commented on TIKA-125:
------------------------------------------

> I think the best way to pass Locale information to a parser is as a part of the Metadata object.

Agreed. The default set via Locale.setDefault() could be a Locale that has nothing to do with the documents being analyzed, so what you suggest makes sense.

