I18N Support in Koha 3

Themes

Koha's concept of themes allows template variants independent of language to be created. The default theme in Koha 3.0 is 'prog', which originally stood for 'Programmer's Theme'.

Terminology

theme

                      prog, etc.

lang_string

                      a directory name composed of a language string conforming
                      to RFC 4646, e.g., en (language), fr-FR (language + region), ar,
                      ar-Arab (language + script), etc. 'lang_string' is always
                      the fully composed string, not divided up into subtags (language, script,
                      region, variant, extension, privateuse subtags); valid values for
                      'lang_string' are detected by C4::Languages::_get_language_dirs

native_description

                      the language's language subtag description in the native language/script
                      e.g., en-GB and en share 'English' as their native language description

client_language_subtag

                      the user's language subtag
                      e.g., if the user has selected the language 'ar-Arab', their
                      client_language_subtag is 'ar'

client_lang_description

                      the lang_string's language subtag description in the current
                      client's language (keyed by language subtag only, not the client's
                      lang_string)
                      e.g., if the language under consideration is en-GB and the user's
                      current language is 'fr-CA', the client_lang_description is keyed
                      by language subtag 'fr' and resolves to 'Anglais'

language_subtag

                      the lang_string's language subtag
                      e.g., if the lang_string under consideration is 'en-Latn-US', the
                      language_subtag is 'en'

language_description

                      the lang_string's language subtag description as contained in
                      RFC4646, independent of the client or native descriptions.
                      'language_description' is keyed by the 'language_subtag'
                      e.g., if the 'lang_string' under consideration is 'fr-FR', the
                      language_description is 'French'

script_subtag

                      the lang_string's script subtag
                      e.g., if the lang_string under consideration is 'zh-Hans', the
                      'script_subtag' is 'Hans'

script_description

                      the lang_string's script subtag description
                      e.g., if the lang_string under consideration is 'zh-Hans', the
                      'script_subtag' is 'Hans' and the 'script_description' is 
                      'Han (Simplified Variant)'

region_subtag

                      the lang_string's region subtag
                      e.g., if the lang_string under consideration is 'en-GB', the
                      region_subtag is 'GB'

region_description

                      the lang_string's region subtag description
                      e.g., if the lang_string under consideration is 'en-GB', the
                      region_subtag is 'GB' and the region_description is
                      'United Kingdom'

variant_subtag

                      the lang_string's variant subtag - rare

variant_description

                      the lang_string's variant subtag description - rare
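
The subtag terminology above can be illustrated with a minimal sketch. It is written in Python for brevity and is not Koha's Perl implementation; the function name split_lang_string and the simplified pattern (no extension or privateuse subtags) are assumptions made for this example.

```python
import re

# Simplified RFC 4646 shape: language[-script][-region][-variant].
# Extension and privateuse subtags are deliberately left out of this sketch.
LANG_STRING_RE = re.compile(
    r"(?P<language>[A-Za-z]{2,3})"
    r"(?:-(?P<script>[A-Za-z]{4}))?"
    r"(?:-(?P<region>[A-Za-z]{2}|[0-9]{3}))?"
    r"(?:-(?P<variant>[A-Za-z0-9]{5,8}|[0-9][A-Za-z0-9]{3}))?"
)


def split_lang_string(lang_string):
    """Split a fully composed lang_string into its subtags (hypothetical helper)."""
    match = LANG_STRING_RE.fullmatch(lang_string)
    if match is None:
        raise ValueError("unsupported lang_string: %r" % lang_string)
    script = match.group("script")
    region = match.group("region")
    return {
        # Conventional casing: language lower, script Title, region UPPER.
        "language_subtag": match.group("language").lower(),
        "script_subtag": script.title() if script else None,
        "region_subtag": region.upper() if region else None,
        "variant_subtag": match.group("variant"),
    }


if __name__ == "__main__":
    for example in ("en", "en-GB", "zh-Hans", "en-Latn-US"):
        print(example, split_lang_string(example))
```

A missing subtag is reported as None, so callers can tell 'en' (no region) apart from 'en-GB'.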

Bi-Directional Scripts (BiDi)

Some scripts, notably Arabic and Hebrew, require right-to-left directionality. If a lang_string contains a script subtag, it is detected and the appropriate “dir=” attribute is added to the <head> tag of every page in the interface. If a lang_string does not contain a script subtag, a table of defaults is consulted and the script and directionality are set according to that table.
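
A minimal sketch of that lookup, in Python rather than Koha's Perl. The two tables and the name text_direction are hypothetical; the real defaults table may hold different entries.

```python
# Scripts written right-to-left (assumed sample, not Koha's actual table).
RTL_SCRIPTS = {"Arab", "Hebr"}

# Hypothetical table of default scripts, consulted when the lang_string
# carries no script subtag of its own.
DEFAULT_SCRIPT = {"ar": "Arab", "he": "Hebr", "en": "Latn", "fr": "Latn"}


def text_direction(lang_string):
    """Return 'rtl' or 'ltr' for a lang_string such as 'ar-Arab' or 'en-GB'."""
    subtags = lang_string.split("-")
    language = subtags[0].lower()
    script = None
    # A script subtag is four letters and directly follows the language subtag.
    if len(subtags) > 1 and len(subtags[1]) == 4 and subtags[1].isalpha():
        script = subtags[1].title()
    if script is None:
        # Unknown languages fall back to Latin in this sketch.
        script = DEFAULT_SCRIPT.get(language, "Latn")
    return "rtl" if script in RTL_SCRIPTS else "ltr"


if __name__ == "__main__":
    for example in ("ar-Arab", "ar", "he", "en-GB", "zh-Hans"):
        print(example, text_direction(example))
```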

Accept-Language Detection

The HTTP_ACCEPT_LANGUAGE value is read and compared to the list of available 'translated' languages in the system. If a good match is found, the system uses that as the default 'language'.
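
The text does not define what counts as a 'good match', so the following Python sketch makes its own assumptions: an exact match wins, then a truncated tag (fr-CA to fr), then any available language sharing the language subtag. The function names are hypothetical and this is not Koha's actual code.

```python
def parse_accept_language(header):
    """Return the language ranges of an Accept-Language value, best first."""
    ranges = []
    for position, item in enumerate(header.split(",")):
        parts = item.strip().split(";")
        tag = parts[0].strip()
        if not tag:
            continue
        quality = 1.0  # q defaults to 1 when not given
        for param in parts[1:]:
            name, _, value = param.strip().partition("=")
            if name.strip().lower() == "q":
                try:
                    quality = float(value)
                except ValueError:
                    quality = 0.0
        if quality > 0:
            ranges.append((quality, position, tag))
    # Highest q first; ties keep the order given in the header.
    ranges.sort(key=lambda entry: (-entry[0], entry[1]))
    return [tag for _, _, tag in ranges]


def best_language(header, available, default="en"):
    """Pick one of the available lang_strings for the given header value."""
    by_lower = {lang.lower(): lang for lang in available}
    for tag in parse_accept_language(header):
        if tag == "*":
            continue
        candidate = tag.lower()
        while candidate:  # exact match, then drop trailing subtags
            if candidate in by_lower:
                return by_lower[candidate]
            candidate = candidate.rpartition("-")[0]
        primary = tag.lower().split("-")[0]
        for lang in available:  # same language subtag, e.g. fr-CA vs fr-FR
            if lang.lower().split("-")[0] == primary:
                return lang
    return default


if __name__ == "__main__":
    print(best_language("fr-CA,fr;q=0.8,en;q=0.5", ["en", "fr-FR"]))
```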

Language Selection by the User

Each language is grouped under its language_subtag for top-level identification (language_subtag→native_description).

The link that shows up on the interface has all the 'language_subtag'-keyed 'language_description's. If there is only one language, the link changes language to the 'language'. The title for the link is set to contain the descriptions for all subtags associated with that language. These descriptions are given in the user's current locale; if descriptions don't exist in the user's locale, English descriptions are displayed; failing that, the fully composed 'language' code itself is displayed.
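
The fallback order for descriptions (user's locale, then English, then the raw code) can be sketched as follows. The DESCRIPTIONS table and the describe function are hypothetical sample data and naming, written in Python for illustration only; only the language subtag is described here to keep the sketch short.

```python
# Hypothetical sample data: DESCRIPTIONS[client_language_subtag][language_subtag]
DESCRIPTIONS = {
    "en": {"en": "English", "fr": "French"},
    "fr": {"en": "Anglais", "fr": "Français"},
}


def describe(lang_string, client_lang_string):
    """Describe lang_string's language subtag for the current client."""
    language_subtag = lang_string.split("-")[0].lower()
    client_language_subtag = client_lang_string.split("-")[0].lower()
    # Try the client's language first, then fall back to English.
    for locale in (client_language_subtag, "en"):
        description = DESCRIPTIONS.get(locale, {}).get(language_subtag)
        if description:
            return description
    # Last resort: show the fully composed code itself.
    return lang_string


if __name__ == "__main__":
    print(describe("en-GB", "fr-CA"))
    print(describe("en-GB", "de-DE"))
    print(describe("xx-YY", "fr"))
```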

If there is more than one 'language', the language_subtag is used for top-level identification (language_subtag→native_description) and a dropdown reveals all of the available 'language's. The language_descriptions for these are keyed by all subtags, just as in the title of the single-language link.
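
A rough Python sketch of the grouping: one entry per language_subtag, rendered as a plain link when the group holds a single 'language' and as a dropdown otherwise. NATIVE_DESCRIPTION, the function names, and the 'link'/'dropdown' labels are all assumptions made for this example.

```python
# Hypothetical sample of native descriptions keyed by language subtag.
NATIVE_DESCRIPTION = {"en": "English", "fr": "Français"}


def group_by_language_subtag(lang_strings):
    """Group installed lang_strings under their language subtag."""
    groups = {}
    for lang_string in lang_strings:
        subtag = lang_string.split("-")[0].lower()
        groups.setdefault(subtag, []).append(lang_string)
    return groups


def selector_entries(lang_strings):
    """Return (label, kind, members) tuples for the language chooser."""
    entries = []
    for subtag, members in group_by_language_subtag(lang_strings).items():
        # Fall back to the bare subtag when no native description is known.
        label = NATIVE_DESCRIPTION.get(subtag, subtag)
        kind = "link" if len(members) == 1 else "dropdown"
        entries.append((label, kind, members))
    return entries


if __name__ == "__main__":
    for entry in selector_entries(["en", "en-GB", "fr-FR"]):
        print(entry)
```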

When the user selects a 'language', that choice is stored in a cookie, and overrides the Accept-Language preference that was auto-detected.

Other Language Tags (ISO 639-1, 639-2, MARC Code List for Languages, etc.)

The codes defined in the RFC4646 registry are pulled from ISO 639-2. Two-character codes are used whenever possible. Sometimes, however, it's necessary to determine the two-character code given its three-character form. The classic example of this is reading the code from the MARC record in 008/35-37, a three-letter code that corresponds to the MARC Code List for Languages: http://www.loc.gov/marc/languages/
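
As an illustration, the lookup from a MARC 008 field might look like the Python sketch below. The mapping holds only a handful of sample codes, and the names MARC_TO_TWO_LETTER and language_from_008 are assumptions; a real table would cover the whole MARC Code List for Languages.

```python
# A few sample entries: MARC three-letter code -> two-letter language subtag.
MARC_TO_TWO_LETTER = {
    "eng": "en",
    "fre": "fr",
    "ger": "de",
    "spa": "es",
    "ara": "ar",
    "chi": "zh",
}


def language_from_008(field_008):
    """Return the two-letter code for 008/35-37, or None if unavailable."""
    # Character positions are counted from 0, so 35-37 is the slice [35:38].
    if len(field_008) < 38:
        return None
    marc_code = field_008[35:38].strip().lower()
    return MARC_TO_TWO_LETTER.get(marc_code)


if __name__ == "__main__":
    sample_008 = "0" * 35 + "fre" + " d"  # synthetic 40-character field
    print(language_from_008(sample_008))
```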

Translating Koha

FIXME: add overview of translation process with PO/POT files

(NOTE: is there a way to signal to gettext that something shouldn't be translated?)

PO File Naming Conventions

Starting with Koha 3.0, I'd like to propose a new naming convention for the PO/POT files in Koha that follows RFC4646 (have a look at http://www.w3.org/International/articles/language-tags/ for an intro). We can utilize RFC4646's concept of 'extensions' to tag for the 'theme' (t) and 'interface' (i) and 'version' (v). For example, if you're creating a French translation in France, the filename would be:

fr-FR-i-staff-t-prog-v-30000.po

Note that FR is a 'region' subtag … but in some languages, such as Chinese, it's also useful to be able to distinguish between 'script's, such as 'Hans' (Simplified Han) or 'Hant' (Traditional Han). So you might, if you are translating Koha into Chinese, end up with a filename like:

zh-Hans-i-staff-t-prog-v-30000.po

RFC4646 gives us a lot of flexibility in how we distinguish between various translations.
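
To make the proposed convention concrete, here is a small Python sketch that builds and parses such filenames. The helper names are hypothetical, and the parser assumes the interface, theme, and version values contain no hyphens.

```python
import re

# <lang_string>-i-<interface>-t-<theme>-v-<version>.po
PO_NAME_RE = re.compile(
    r"(?P<lang_string>.+?)"
    r"-i-(?P<interface>[^-]+)"
    r"-t-(?P<theme>[^-]+)"
    r"-v-(?P<version>[^-]+)\.po"
)


def build_po_filename(lang_string, interface, theme, version):
    """Compose a PO filename following the proposed naming convention."""
    return "%s-i-%s-t-%s-v-%s.po" % (lang_string, interface, theme, version)


def parse_po_filename(filename):
    """Split a PO filename into its parts, or return None if it does not fit."""
    match = PO_NAME_RE.fullmatch(filename)
    return match.groupdict() if match else None


if __name__ == "__main__":
    name = build_po_filename("fr-FR", "staff", "prog", "30000")
    print(name)
    print(parse_po_filename("zh-Hans-i-staff-t-prog-v-30000.po"))
```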

If you switch to a language that doesn't exist, re-route to English. If there's only one translated language, don't bother.

Preferences on time/date formats, bidirectional text, currency, decimal places

Addresses, Zip Codes, Telephone Numbers

Take for example a typical (fictive) Dutch address:

Jan Hoogslag
Jan van Beckerstraat 28a
9452 XJ Sappermeer

Regex used to validate a zip code vs. a postal code
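
One possible shape for such a check, sketched in Python: a table of per-country patterns. The country keys, the two patterns (US ZIP with optional +4, Dutch four digits plus two letters as in the address above), and the function name are assumptions made for illustration.

```python
import re

# Hypothetical per-country patterns; only two sample countries are listed.
POSTAL_CODE_PATTERNS = {
    "US": re.compile(r"\d{5}(-\d{4})?"),   # 12345 or 12345-6789
    "NL": re.compile(r"\d{4} ?[A-Z]{2}"),  # 9452 XJ or 9452XJ
}


def is_valid_postal_code(code, country):
    """Check a postal code against the pattern for its country, if any."""
    pattern = POSTAL_CODE_PATTERNS.get(country)
    if pattern is None:
        # No rule known for this country: accept rather than reject.
        return True
    return pattern.fullmatch(code.strip().upper()) is not None


if __name__ == "__main__":
    print(is_valid_postal_code("9452 XJ", "NL"))
    print(is_valid_postal_code("9452 XJ", "US"))
```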

Chinese / Japanese Format:

(postal code) (prefecture, city, street + number) (last name, first name)

Telephone numbers according to: XXX-YYY-ZZZZ or XXXX-YY-ZZZZ or XXXX-YYY-ZZZZ

Telephone numbers according to: XX-YYYYYYYY or XXX-YYYYYYY or XXXX-YYYYYY
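
The telephone layouts listed above can be expressed as digit-group lengths. The Python sketch below assumes every X, Y, and Z stands for one digit and that hyphens are the only separators; the names are hypothetical.

```python
import re

# Digit-group lengths for each layout listed in the text.
PHONE_GROUPS = [
    (3, 3, 4), (4, 2, 4), (4, 3, 4),  # XXX-YYY-ZZZZ, XXXX-YY-ZZZZ, XXXX-YYY-ZZZZ
    (2, 8), (3, 7), (4, 6),           # XX-YYYYYYYY, XXX-YYYYYYY, XXXX-YYYYYY
]

# One compiled pattern per layout, e.g. (3, 3, 4) -> \d{3}-\d{3}-\d{4}
PHONE_PATTERNS = [
    re.compile("-".join(r"\d{%d}" % length for length in groups))
    for groups in PHONE_GROUPS
]


def matches_phone_format(number):
    """Return True if the number fits one of the listed layouts."""
    return any(pattern.fullmatch(number) for pattern in PHONE_PATTERNS)


if __name__ == "__main__":
    for sample in ("555-123-4567", "0123-45-6789", "12-34567890", "1234567890"):
        print(sample, matches_phone_format(sample))
```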

Currency, Number, Date, Time Formatting

currency / number / date / time formatting

 
en/development/kohai18n.txt · Last modified: 2007/12/27 07:53 by kados