The naïveté of programming language internationalization

July 19, 2020

As a speaker of both English and a niche European language which has many interesting and archaic features that have eroded in other languages, I have my fair share of skepticism towards internationalization efforts that amount to simple string replacement, especially after I’ve tried to do i18n right, which took some effort.

Citrine is a programming language intended to be translatable and portable across supported human languages in order to both ease programming for non-English speakers and allow international teams to work together more effectively. It is a noble goal, and I do like the way that they’re bringing to light many of the concepts of Smalltalk, which is one of my favourite non-mainstream languages that I have also never worked with.

Unfortunately, their translation system seems not much more than basic string replacement. That is simpler to implement from a programming standpoint, I’ll grant them that, but string substitution works well only when the target language has a feature set that is near-identical to the source one.

There are many ways such an assumption could break down. The simplest of them is word order – forcing “subject verb: object.” as the canonical ordering could be awkward in languages whose natural order is different.

A more complex example is fusional languages, of which Latvian is one, where the same word (usually a noun or verb) can take on one of a multitude of suffixes that convey its properties and meaning within the sentence. English is also fusional to an extent, but that extent is low and contained, and, importantly, doesn’t extend to nouns; in languages with grammatical case distinguished by declension, using the nominative for variables might be unnatural and weird, and string replacement can’t really do different declensions.[1]
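To make the declension problem concrete, here is a minimal sketch in Python. It is not how Citrine works internally; the function names, the template syntax and the choice of “vienums” (item) and “saraksts” (list) as example words are my own assumptions for illustration. It only handles regular first-declension masculine nouns, which is a tiny slice of Latvian grammar.

```python
import re

# Singular endings for regular first-declension masculine Latvian nouns.
# Illustrative only: real declension has more classes and irregularities.
ENDINGS = {
    "nominative": "s",
    "genitive": "a",
    "dative": "am",
    "accusative": "u",
    "locative": "ā",
}


def decline(stem: str, case: str) -> str:
    """Attach the case ending to a noun stem (hypothetical helper)."""
    return stem + ENDINGS[case]


def naive_fill(template: str, stems: dict[str, str]) -> str:
    """Plain string replacement: every noun goes in as its dictionary
    (nominative) form, whatever role it plays in the sentence."""
    nominatives = {name: decline(stem, "nominative") for name, stem in stems.items()}
    return template.format(**nominatives)


def case_aware_fill(template: str, stems: dict[str, str]) -> str:
    """Placeholders of the form {name:case} say which case the slot needs."""
    def replace(match: re.Match) -> str:
        name, case = match.group(1), match.group(2)
        return decline(stems[name], case)

    return re.sub(r"\{(\w+):(\w+)\}", replace, template)


# Assumed example words: "vienums" (item) and "saraksts" (list).
stems = {"item": "vienum", "list": "sarakst"}

# Intended meaning: "add item to list".
print(naive_fill("pievienot {item} {list}", stems))
print(case_aware_fill("pievienot {item:accusative} {list:dative}", stems))
```

The first line printed is “pievienot vienums saraksts”, which leaves both nouns in the nominative and reads as broken Latvian; the second is “pievienot vienumu sarakstam”, with the object in the accusative and the destination in the dative. The point is that the translator has to be able to say which form each slot needs, and a word-for-word lookup table has nowhere to put that information.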

Meanwhile, on HN I saw a couple of comments noting that the translation was simplistic and didn’t account for differences between languages that might cause problems in applying the translation, like assuming prepositions exist in the target language and work similarly to English.

Though another comment pointed out that Citrine isn’t attempting natural language processing, one of its goals is adoptability by non-English speakers. Pushing assumptions baked into English onto other languages where they don’t hold true works against this goal.

Translating any one piece of text takes a mind of two – both a complete understanding of the text and all of its nuance in its original language, and a knowledge and mastery of the foreign language to be able to reproduce it faithfully. Programming languages are no different; just because they use words from one language doesn’t mean that you can just swap out these words for the same ones in a different language,[2] especially if the target language has no equivalent word, or the word’s meaning is overloaded in the source.

Actually, this makes me think that languages of the APL family might be far more universal in this regard. Their reliance on symbols instead of words obviates the need for translation among human languages; all that remains is to understand a symbolic one.

This related thread also lists a couple of interesting approaches to localized programming languages.