In this post I will say some things about localization that I've never felt it was politic to say before. (It still isn't, but what the hey.) But I first want to mention that one of the reasons Christian Perrier is awesome is because he manages to steer projects in ways that avoid (most of) the problems below, and because he has managed to scale rather impressively far. But not far enough that these things haven't gotten under my skin.
So, I hate localization. Hate, hate, hate. I wish it would die. I wish everyone spoke the same language. I hate that we have to be politically correct about it, to the point that even if everyone does speak English (hello, Denmark?), we still have to worry about localization into their native language. I especially hate that, after we waste time and resources doing that, everyone uses the software in English anyway (hello, most of Europe).
I hate that the typical best-effort localization of software is so incomplete that it's not actually usable by people who really do need the translation. Either because "best effort" == 80% translated, which is sometimes worse than 0%, or because the documentation doesn't get translated too. And that even with what we consider a complete translation, there are still all sorts of English bits that nobody would ever consider translating. Such as the names of commands, of command-line switches, of markup keywords and configuration files.
I hate that translating software and documentation always involves bolting an enormous, nasty build system onto its side, and that the bulk of translations always greatly outweighs the size of the code. I hate that these systems constantly produce deltas that I have to avoid cluttering my commits with. I hate the expectation that every programmer somehow become an expert in internationalization, rather than letting them devote their time to something they're good at.
But none of that is what really bothers me about localization. What really, really bothers me is that I often see the need to localize code inhibiting that code becoming better. Just two examples:
- Oh no, we worry, if we fix this message, we will have to do 100 times more work to update the translations. Surely, then, anyone who wants to fix a bad message is an anglocentric who should be ignored.
- Rather than allow the computer to dynamically generate correct and expressive English on the fly, to actually talk to us, we have to reduce everything to gettextable static strings. The poster child for this disease is that it becomes easier to tack on "(s)" or reword entirely, than to deal with the cumulative complexity of plural forms in every language.
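For readers who haven't met the plural-forms machinery mentioned above, here is a minimal sketch using Python's standard `gettext` module. The message text and the `files_deleted` helper are made up for illustration, and `NullTranslations` stands in for a real catalog, so only the English one-versus-many rule applies; a real catalog would supply its own plural rule and possibly more than two forms.

```python
import gettext

# Stand-in for a real message catalog (an assumption for this sketch).
# With no catalog loaded, ngettext falls back to the English rule:
# the singular form when n == 1, the plural form otherwise.
translations = gettext.NullTranslations()


def files_deleted(n):
    """Return a count message; the wording is a hypothetical example."""
    template = translations.ngettext(
        "%(n)d file deleted",   # singular msgid
        "%(n)d files deleted",  # plural msgid
        n,
    )
    return template % {"n": n}


print(files_deleted(1))
print(files_deleted(2))
```

Every countable noun in a message needs this treatment, which is why tacking on "(s)" starts to look attractive.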
(PS, I also hate that we have to keep "l10n" and "i18n" straight, and that the words are so unwieldy in this f5g modern English that we represent them based on their length.)
Hey, you're right. Localization is a pain and a lot of effort is spent translating the same sentences again and again.
Why can't we just get gettext to use translate.google.com or something similar? Programmers could then just change an error message, and the internet crowd would have to translate that sentence through translate.google.com, but only once! All programs could use the same base for translations.
And on top of this, don't use gettext! Just let printf print the untranslated text if LANG=C and if LANG!=C let printf ask translate.google.com!?
The point is, just unleash the programs from the translations. Let the programmers code and let the translators translate :-)
I enjoy localized applications. They are not terribly important to me, but I find English-language interfaces used by some (when they have a different mother tongue) somewhat uncultured or flat. However, for traditional reasons (probably), I expect GUI programs to be in Swedish and shell commands to be in English. Also I couldn't play freeciv in Swedish, since everything sounds dumb when translated too far.
This reminds me of the university course on operating systems I just took. The spoken language of the course and lectures was Swedish, but all slides, exercises, labs and the exam were in English. I ended up writing the exam in English. (If the question is posed in English, I answer in English, often without thinking about it.)
Another time I enjoyed a very small course in Germany (two students). The course had started with more students and was in English, but at the end only those of us fluent in German were left, and we still kept the lecture in English. The strange thing was the five-minute chat in German right before the start; then, right, now we start, and the lecture goes on in English; then at the pause the two students and the prof break out chatting in German again.
On programming: at minimum, all that is required of the uninterested developer is to set up gettext and wrap all strings in the gettext() function? That can't be so hard.
Consider a program that would like to output a message like: "You have one apple, three oranges, and 90 cents left."
Writing code to output this English is fairly easy, needing only to pull in a library to deal with English pluralisation, and optionally doing some simple mapping of small numbers to strings. But doing the same with gettext is basically impossible.
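To make the English-only side concrete, here is a small sketch in Python that builds that sentence without any library. The function names, the number-word table and the naive "add an s" plural rule are assumptions for illustration, not a general pluralisation scheme.

```python
# Spell out small counts; larger ones stay as digits.
SMALL_NUMBERS = {
    0: "no", 1: "one", 2: "two", 3: "three", 4: "four", 5: "five",
    6: "six", 7: "seven", 8: "eight", 9: "nine", 10: "ten",
}


def count_phrase(n, singular, plural=None):
    """Return e.g. 'one apple' or '90 cents' (naive English plural)."""
    if plural is None:
        plural = singular + "s"
    word = SMALL_NUMBERS.get(n, str(n))
    noun = singular if n == 1 else plural
    return f"{word} {noun}"


def join_list(items):
    """Join phrases as an English list with a serial comma."""
    if not items:
        return "nothing"
    if len(items) == 1:
        return items[0]
    if len(items) == 2:
        return f"{items[0]} and {items[1]}"
    return ", ".join(items[:-1]) + ", and " + items[-1]


def inventory_message(apples, oranges, cents):
    parts = [
        count_phrase(apples, "apple"),
        count_phrase(oranges, "orange"),
        count_phrase(cents, "cent"),
    ]
    return f"You have {join_list(parts)} left."


print(inventory_message(1, 3, 90))
# You have one apple, three oranges, and 90 cents left.
```

Each piece here bakes in an English rule: the number words, the plural suffix, the word order and the list punctuation. None of them survives being turned into a single static string that a translator can rework for another language.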
As you point out, an application that’s not 100% localized is much worse than the English version. Actually I wish we’d exclude translations that are too incomplete from being bundled at all, but setting a limit is hard and it depends on the application.
This is why fixing the localized message in okular is not a trivial matter. But the fact the KDE maintainers use it as a bad excuse for not changing it (the real reason being, they don’t want to change it) does not mean it is impossible. There are several GNOME packages with added strings from the upstream version, and there are simple ways to translate them without the package wreaking havoc - actually, I think the code I’m using for GNOME could be re-used verbatim for KDE.
As for the rest of your rant, Christian answered it all. If you want software written by geeks, for geeks that like their system in English rather than in a language of their choice, fine, but happily this view is not shared by all free software developers.