[jira] Created: (NUTCH-162) country code "jp" is used instead of language code "ja" for Japanese

[jira] Created: (NUTCH-162) country code "jp" is used instead of language code "ja" for Japanese

JIRA jira@apache.org
country code "jp" is used instead of language code "ja" for Japanese
--------------------------------------------------------------------

         Key: NUTCH-162
         URL: http://issues.apache.org/jira/browse/NUTCH-162
     Project: Nutch
        Type: Bug
  Components: web gui  
    Versions: 0.7.1    
 Environment: n/a
    Reporter: KuroSaka TeruHiko
    Priority: Trivial


In the locale-switching link for Japanese, "jp" is used as the language code, but "jp" is an ISO country code.  The language code "ja" should be used.
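
To illustrate the distinction, here is a minimal sketch (not taken from the Nutch code) that checks a two-letter code against the ISO 639 language list and the ISO 3166 country list shipped with the JDK. The class and method names are hypothetical.

```java
import java.util.Arrays;
import java.util.Locale;

class IsoCodeCheck {
    // True if the code is a two-letter ISO 639 language code known to the JDK.
    static boolean isLanguageCode(String code) {
        return Arrays.asList(Locale.getISOLanguages())
                .contains(code.toLowerCase(Locale.ROOT));
    }

    // True if the code is a two-letter ISO 3166 country code known to the JDK.
    static boolean isCountryCode(String code) {
        return Arrays.asList(Locale.getISOCountries())
                .contains(code.toUpperCase(Locale.ROOT));
    }

    public static void main(String[] args) {
        // "ja" is a language code, "jp" is a country code, "ca" is both.
        for (String code : new String[] {"ja", "jp", "ca"}) {
            System.out.println(code + ": language=" + isLanguageCode(code)
                    + ", country=" + isCountryCode(code));
        }
    }
}
```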

By the way, I don't think many users are familiar with the ISO language codes.  A Canadian user may click on "ca" without knowing that it stands for Catalan, not Canadian English or French. Rather than listing the language codes, listing the language names in their respective languages may be better. (I say "may be" because the browser could show some language names as corrupted text if the current font does not support that language; this is a difficult problem.)
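
One possible way to produce such labels is sketched below, assuming the JDK's locale data contains a native name for each language (when it does not, getDisplayLanguage falls back to the English name or to the code itself). The class name and the list of codes are made up for the example.

```java
import java.util.Locale;

class LanguageLabels {
    // Builds a link label: the language's name written in that language,
    // followed by the code in parentheses.
    static String label(String languageCode) {
        Locale locale = Locale.forLanguageTag(languageCode);
        return locale.getDisplayLanguage(locale) + " (" + languageCode + ")";
    }

    public static void main(String[] args) {
        // Hypothetical list of languages offered by the web GUI.
        for (String code : new String[] {"ca", "de", "fr", "ja"}) {
            System.out.println(label(code));
        }
    }
}
```

Keeping the code next to the name lets a reader who cannot render the name still recognize the entry.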


--
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators:
   http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see:
   http://www.atlassian.com/software/jira

[jira] Commented: (NUTCH-162) country code "jp" is used instead of language code "ja" for Japanese

JIRA jira@apache.org
    [ http://issues.apache.org/jira/browse/NUTCH-162?page=comments#action_12361683 ]

KuroSaka TeruHiko commented on NUTCH-162:
-----------------------------------------

This causes undesired behavior for Japanese users.  If the Nutch main index.jsp is visited from a browser whose preferred language is set to Japanese, the app server's main page (Tomcat's welcome page, for example) is displayed instead.  This is because the Nutch index.jsp tries to redirect to the non-existent "ja/".
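
The failure can be pictured with the following sketch. It is not the actual index.jsp logic; the helper, the directory sets, and the choice of "en/" as the default are assumptions made for illustration. It redirects only to a language directory that is known to exist and otherwise falls back to the default.

```java
import java.util.Locale;
import java.util.Set;

class LanguageRedirect {
    // Assumption: English is the directory to fall back to.
    static final String DEFAULT_DIR = "en/";

    // Returns the directory to redirect to for the browser's preferred language.
    static String redirectTarget(String preferredLanguage, Set<String> existingDirs) {
        if (preferredLanguage == null) {
            return DEFAULT_DIR;
        }
        String lang = preferredLanguage.toLowerCase(Locale.ROOT);
        return existingDirs.contains(lang) ? lang + "/" : DEFAULT_DIR;
    }

    public static void main(String[] args) {
        // Directory named with the country code: "ja" finds nothing, so the default is used.
        System.out.println(redirectTarget("ja", Set.of("en", "jp")));
        // Directory named with the language code: the redirect succeeds.
        System.out.println(redirectTarget("ja", Set.of("en", "ja")));
    }
}
```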

[jira] Commented: (NUTCH-162) country code "jp" is used instead of language code "ja" for Japanese

JIRA jira@apache.org
In reply to this post by JIRA jira@apache.org
    [ http://issues.apache.org/jira/browse/NUTCH-162?page=comments#action_12362274 ]

Paul Baclace commented on NUTCH-162:
------------------------------------

The best practice for identifying a localization is to use the ISO language and country codes together: a lowercase language code followed by an uppercase country code.  This makes it possible to use idioms specific to particular countries.  English has over a dozen variants; a few examples are:

  enAU-English-Australia
  enIE-English-Ireland
  enJM-English-Jamaica
  enUS-English-United_States

Inexplicably, different codes were used for the Japanese language and the country Japan.   The locale is jaJP.  Meanwhile, Javanese in Java is jwJA.  

The web gui should obtain the user's preferred language and country combination from the HTTP request headers and use the nearest matching Locale:

  http://java.sun.com/docs/books/tutorial/i18n/locale/create.html

This is preferred over having the user pick the language and/or country from a list since the user might not be able to read the labels.
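
A minimal sketch of this idea follows, assuming the raw Accept-Language header value is available as a string (in a servlet it could come from the request, which also offers getLocale()). The supported-locale list, the English default, and the class name are assumptions; the matching itself uses the JDK's Locale.LanguageRange.parse and Locale.lookup.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

class AcceptLanguageMatcher {
    // Hypothetical set of locales the web GUI has translations for.
    static final List<Locale> SUPPORTED = Arrays.asList(
            Locale.forLanguageTag("en"),
            Locale.forLanguageTag("ja"),
            Locale.forLanguageTag("zh-CN"),
            Locale.forLanguageTag("zh-TW"));

    // Returns the supported locale nearest to the header value, or English if none fits.
    static Locale bestMatch(String acceptLanguage) {
        if (acceptLanguage == null || acceptLanguage.trim().isEmpty()) {
            return Locale.ENGLISH;
        }
        List<Locale.LanguageRange> ranges;
        try {
            // Ranges come back ordered by their q weights, highest first.
            ranges = Locale.LanguageRange.parse(acceptLanguage);
        } catch (IllegalArgumentException e) {
            return Locale.ENGLISH; // malformed header
        }
        // Lookup truncates each range step by step ("ja-jp" -> "ja") until a tag matches.
        Locale match = Locale.lookup(ranges, SUPPORTED);
        return match != null ? match : Locale.ENGLISH;
    }

    public static void main(String[] args) {
        System.out.println(bestMatch("ja-JP,ja;q=0.9,en;q=0.5").toLanguageTag());
        System.out.println(bestMatch("zh-TW,zh;q=0.9").toLanguageTag());
        System.out.println(bestMatch("en-AU").toLanguageTag());
    }
}
```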


[jira] Commented: (NUTCH-162) country code "jp" is used instead of language code "ja" for Japanese

JIRA jira@apache.org
In reply to this post by JIRA jira@apache.org
    [ http://issues.apache.org/jira/browse/NUTCH-162?page=comments#action_12362363 ]

KuroSaka TeruHiko commented on NUTCH-162:
-----------------------------------------

I agree with Paul in principle.  With the current way of designating a language by the language code alone, there is no way to distinguish Simplified Chinese from Traditional Chinese, the two written variants of the Chinese language.  These have traditionally been distinguished by zh_cc, where cc is a country code such as tw (Taiwan, which uses the Traditional form) or cn (People's Republic of China, which uses the Simplified form).  Nutch currently displays Simplified Chinese for "zh", which would disappoint Traditional Chinese readers.

I am not too sure about *always* using the llcc naming convention.

(1) To be compatible with web standards and practice, and to avoid inventing another naming convention, I would prefer using a hyphen (minus sign) as the delimiter, e.g. en-au.

(2) Not all languages need a country modifier.  Japanese, for example, is spoken (by a community large enough to develop its own dialect) only in Japan.  Major browsers send out "ja", not "ja-jp".

(3) We would still need the generic "en" (or "fr"), because many browsers have a generic English (French) setting that sends "en" without a country code, and because we need a fallback locale when the browser specifies an unsupported country variant of English.

(By the way, jwJA is an interesting example, but jw is not a registered ISO language code (it's jv), and Java is not a country.)

Another thing we need to consider is the impact of the new locale naming scheme.  The language identifier and analyzer plugins (in trunk) are consumers of the locale too.  The current code assumes two-letter language names.  This needs to be extended to accept both types of names, ll and ll-cc.
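
As a rough sketch of what accepting both kinds of names could look like (hypothetical helper names, not existing Nutch code), the following normalizes a locale name to the lowercase hyphenated form and lists the names to try from most to least specific, so that a consumer written for ll names can still fall back to the bare language code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

class LocaleNames {
    // Normalizes "ja", "en-au", "en_AU" or "EN-au" to the lowercase hyphenated form.
    static String normalize(String name) {
        return name.trim().replace('_', '-').toLowerCase(Locale.ROOT);
    }

    // Lists candidate names from most to least specific,
    // dropping the last hyphen-separated part at each step.
    static List<String> candidates(String name) {
        List<String> result = new ArrayList<>();
        String tag = normalize(name);
        while (!tag.isEmpty()) {
            result.add(tag);
            int hyphen = tag.lastIndexOf('-');
            tag = hyphen > 0 ? tag.substring(0, hyphen) : "";
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(candidates("zh_TW")); // tries zh-tw, then zh
        System.out.println(candidates("ja"));    // tries ja only
    }
}
```

A caller would walk the list and use the first name for which it has resources, falling back to a default such as "en" when none matches.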
