rspeer/langcodes: Version 3.3
Description
langcodes knows what languages are. It knows the standardized codes that refer to them, such as en for English, es for Spanish and hi for Hindi.
These are IETF language tags. You may know them by their old name, ISO 639 language codes. IETF has done some important things for backward compatibility and supporting language variations that you won't find in the ISO standard.
It may sound to you like langcodes solves a pretty boring problem. At one level, that's right. Sometimes you have a boring problem, and it's great when a library solves it for you.
But there's an interesting problem hiding in here. How do you work with language codes? How do you know when two different codes represent the same thing? How should your code represent relationships between codes, like the following?
- eng is equivalent to en.
- fra and fre are both equivalent to fr.
- en-GB might be written as en-gb or en_GB. Or as 'en-UK', which is erroneous, but should be treated as the same.
- en-CA is not exactly equivalent to en-US, but it's really, really close.
- en-Latn-US is equivalent to en-US, because written English must be written in the Latin alphabet to be understood.
- The difference between ar and arb is the difference between "Arabic" and "Modern Standard Arabic", a difference that may not be relevant to you.
- You'll find Mandarin Chinese tagged as cmn on Wiktionary, but many other resources would call the same language zh.
- Chinese is written in different scripts in different territories. Some software distinguishes the script. Other software distinguishes the territory. The result is that zh-CN and zh-Hans are used interchangeably, as are zh-TW and zh-Hant, even though occasionally you'll need something different such as zh-HK or zh-Latn-pinyin.
- The Indonesian (id) and Malaysian (ms or zsm) languages are mutually intelligible.
- jp is not a language code. (The language code for Japanese is ja, but people confuse it with the country code for Japan.)
One way to know is to read IETF standards and Unicode technical reports. Another way is to use a library that implements those standards and guidelines for you, which langcodes does.
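To make a few of the relationships listed above concrete, here is a toy sketch in plain Python. It is not how langcodes works internally: the alias tables and the names `normalize_tag` and `same_language_tag` are hypothetical, hand-written for illustration, and cover only a handful of codes. A real implementation draws on the full registry data and also handles scripts, macrolanguages, and near-matches such as en-CA versus en-US.

```python
# Toy illustration only: hand-written alias tables for a few codes from the
# list above. These tables and function names are hypothetical, not part of
# langcodes.
LANGUAGE_ALIASES = {"eng": "en", "fra": "fr", "fre": "fr"}
REGION_ALIASES = {"UK": "GB"}  # 'UK' is erroneous but commonly seen


def normalize_tag(tag: str) -> str:
    """Normalize separator, case, and known aliases in a simple tag.

    Handles only the shapes 'language' and 'language-REGION'.
    """
    parts = tag.replace("_", "-").split("-")
    if len(parts) > 2:
        raise ValueError(f"this toy sketch cannot handle {tag!r}")

    # Language subtags are conventionally lowercase.
    language = parts[0].lower()
    language = LANGUAGE_ALIASES.get(language, language)
    if len(parts) == 1:
        return language

    # Region subtags are conventionally uppercase.
    region = parts[1].upper()
    region = REGION_ALIASES.get(region, region)
    return f"{language}-{region}"


def same_language_tag(a: str, b: str) -> bool:
    """Report whether two simple tags normalize to the same form."""
    return normalize_tag(a) == normalize_tag(b)
```

Under these assumptions, `normalize_tag("en_GB")`, `normalize_tag("en-gb")`, and `normalize_tag("en-UK")` all produce the same string, while `same_language_tag("en-CA", "en-US")` is false because the sketch has no notion of "really, really close". Keeping such tables complete and current is the boring part that langcodes takes over; its `standardize_tag` function performs this kind of normalization using the actual standards data.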
When you're working with these short language codes, you may want to see what a language is called in a given language: fr is called "French" in English. That language doesn't have to be English: fr is called "français" in French. A supplement to langcodes, language_data, provides this information.
langcodes is maintained by Elia Robyn Lake a.k.a. Robyn Speer, and is released as free software under the MIT license.
Files
| Name | Size |
|---|---|
| rspeer/langcodes-v3.3.0.zip (md5:cd92fb1afe4dea7fdb2493d173114c2c) | 187.8 kB |
Additional details
Related works
- Is supplement to: https://github.com/rspeer/langcodes/tree/v3.3.0 (URL)