Seems like the most popular computer languages, from Assembly to XML, are all English-derived.
Do programmers from non-English speaking countries have a significantly harder time because of the language gap, or are English-based keywords simple enough to rote memorize that it’s not a big deal?
I understand that the logic behind code is what’s difficult, and pseudocode can be in any human language, but how much does it slow down work to have to try to remember what the words “case” vs “while” mean? Or maybe having to refer to a dictionary in addition to any reference doc to try to figure out what the words “System.TimeZoneInfo.ConvertTimeFromUTC” mean? Seems even if an IDE is in another language, the code itself remains English – at most they get a tooltip, but the function names and sorting are still unlocalized.
Not a big deal. I’ve worked with lots of coders from India and China (both here and outsourced) and English keywords in coding languages aren’t an issue. They’re really not very complex compared to the concepts of programming.
In fact, the people who win international coding challenges are frequently Indian and Chinese, IME disproportionately so. (Though I was under the impression that in India many people have a dialect of English as their first language; in fact, isn’t it one of India’s official languages?)
I am not a computer scientist, so perhaps I am wrong about this, but it seems to me that it ought to be fairly trivial to modify the compilers or interpreters used with programming languages so that you could write your programs using the foreign language equivalents of the quasi-English words used in a programming language (so where an English speaking programmer might use a Do…While loop, a French speaker could use a Faire…Pendant one, for instance). Is this not done? If not, why not?
It could be done fairly easily. The front end of a compiler is generally made of two parts: a lexer and a parser. A lexer converts strings of characters into symbols, and then the parser interprets those symbols and converts them into the program. So a lexer would convert the literal string “for” into the symbol FOR (which in reality is probably a unique numeric ID); it would be trivial to make the lexer interpret any other word, be it “cat”, “sheep” or “asdlkfjlckb”, and pass the symbol FOR to the parser. (With the caveat that you have to avoid overlap between keywords.)
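To make the lexer idea concrete, here is a minimal sketch in Python. The keyword tables, the French spellings, and the names KEYWORDS, TOKEN_RE and lex are all made up for illustration; no real compiler is being described. The point is only that two different spellings can reach the parser as the same symbol.

```python
import re

# Hypothetical keyword tables: each locale maps its own spelling to one shared symbol.
KEYWORDS = {
    "en": {"for": "FOR", "while": "WHILE", "do": "DO", "if": "IF"},
    "fr": {"pour": "FOR", "pendant": "WHILE", "faire": "DO", "si": "IF"},
}

# One token per match: a number, a word, or any other single non-space character.
TOKEN_RE = re.compile(r"\s*(?:(\d+)|([^\W\d]\w*)|(\S))")

def lex(source, locale="en"):
    """Turn source text into (symbol, text) pairs for a parser to consume."""
    table = KEYWORDS[locale]
    tokens = []
    for number, word, other in TOKEN_RE.findall(source):
        if number:
            tokens.append(("NUMBER", number))
        elif word:
            # A word is a keyword only if the active locale's table says so.
            tokens.append((table.get(word, "IDENT"), word))
        elif other:
            tokens.append(("PUNCT", other))
    return tokens

english = lex("while x < 10", locale="en")
french = lex("pendant x < 10", locale="fr")
# The parser only looks at the symbols, which are identical for both spellings.
print([symbol for symbol, _ in english] == [symbol for symbol, _ in french])
```

Everything after the lexer stays untouched in this sketch, which is why the change looks cheap when you consider a single compiler in isolation.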
Why is this not done? Largely portability. If you write your code with French variable names I may have difficulty reading it, but I absolutely can download your code, compile it, and run it, and it will do what it’s supposed to. To make it international you’d have to introduce a bunch of undesirable things.
The problem comes in with large programs that include external libraries. You could make a compiler flag like -locale=“fr_FR” or -locale=“en_US”, but what if your library is in French and mine is in English? Hell, what if my English one happens to use the word faire as a variable name? Suddenly you get conflicts where code doesn’t work when it’s compiled against code written under a different locale setting. At this point the only solutions are to introduce very complex conditional compilation or settings to combat this (obnoxious and needless) or to make the compiler reserve all possible keywords in all possible locales to prevent this sort of conflict (which introduces a ridiculous level of complexity and minefields in writing a program, where you have to play “guess whether or not this string of letters is secretly reserved”).
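The faire collision can be shown with a small sketch in Python. Again, the keyword tables and the helper names classify and reserved_everywhere are hypothetical, invented only to illustrate the conflict.

```python
import re

# Hypothetical per-locale keyword spellings, for illustration only.
KEYWORDS = {
    "en": {"do": "DO", "while": "WHILE"},
    "fr": {"faire": "DO", "pendant": "WHILE"},
}

def classify(source, locale):
    """Label each word as a keyword symbol or IDENT under the given locale."""
    table = KEYWORDS[locale]
    return [(table.get(w, "IDENT"), w) for w in re.findall(r"[^\W\d]\w*", source)]

def reserved_everywhere():
    """The 'reserve every keyword in every locale' workaround: one big set."""
    return {word for table in KEYWORDS.values() for word in table}

# A line from an imagined English-locale library that uses faire as a variable.
english_library_line = "faire = faire + 1"

print(classify(english_library_line, "en"))  # faire is an ordinary identifier
print(classify(english_library_line, "fr"))  # faire is now the DO keyword
print(sorted(reserved_everywhere()))         # grows with every locale added
```

The same line is valid under one setting and a syntax error under the other, and the reserved set in the second workaround gets bigger with every locale the compiler supports.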
In the end, it’s simply easier to expect foreign speakers to learn the like 20 words needed to program. Hell, it’s not like it’s immediately obvious what a “float” or “double” is to an English speaker either.
I believe Microsoft once used to have localized versions of MS Word’s scripting language (pre-VBA) - I seem to recollect it was a PITA, and did not convert automatically between different-language versions of Word.
One major drawback about translating programming language keywords is that with a translated version you are limited to a much smaller domain of information on the Web.
Frankly I hate translations not only in the development world but also in the user interface - if I want to search online for a solution for a software problem and don’t find it using the German user interface terms, I have to guess what the terms used in the English-language user interface would be - often there are several ways I could think of how to word a menu item, a button or a setting name in English.
Overall, if you are too dumb to learn the basics of a foreign language you are too dumb to use a computer.
There are so few reserved words in most languages that it should be trivial to remember them. About as hard as learning what the various Italian dishes are in your local restaurant. What’s more, the meanings are never quite the same as in general language use, and it might even help if they were unfamiliar words. Then you can look at a language like APL and shudder.
I have had to work through large scale code written in French. Luckily not debugging it, but we were using the system. All the comments and variable names were in French. That was hard going initially. But it wasn’t impossible, and you got used to it.
The Wiki link in the OP mentions that a few of the internationalized languages did, in fact, have multiple front ends (“lexers”) to map keywords from various languages, and a few even had some scheme for mapping library function names from various languages.
It should also be noted that source code is not really English even if it contains many recognizable words. Many keywords have a more specific, or just plain different, definition to their everyday spoken meaning.
Sure, “if” and “else” fit their everyday use very well, but what about “float”, “switch”, “void” etc?
Then you have all the OO and design pattern terms that would be virtually meaningless to someone that knows English but hasn’t studied these concepts. So it’s largely a foreign language to us all.
In terms of reading all source code (so including comments and variable names), yeah I’d say it’s an advantage being fluent in English as most code out there is in English. But it shows how big the field of programming is, and how much there is to learn, that it’s only a small advantage in the grand scheme of things.
It’s an entertaining story, but it should be noted that the cause of this bug was nothing to do with the fact that the development team’s first language wasn’t English.
An all-English team that program over tea and scones could’ve made the same programming / inadequate localization testing error.
I used such a program for my CoCo3. It was kind of cool to be programming in Vietnamese just for fun.
Why isn’t it done? Well, the programming language isn’t English. The user interface may use English words, but that’s not really English. The programming language has its own definitions for the words used and its own syntax and grammar.
One of the more interesting problems I’ve occasionally run into isn’t around the languages themselves - as noted, most non-English programmers can learn/memorize the needed keywords - but is instead what the non-US programmers choose as variable names. Those are often in the programmer’s native language* which can cause issues sometimes if the code then goes over to an English speaker and he/she misses some subtle variations or misspellings.
*I’ve worked with programmers in India, Eastern Europe and Central America and it seems mostly the latter two who do this the most. Not editorializing, just making an observation.
I’m not complaining about you posting the story, it’s relevant enough, I’m trying to head off any conclusions of “that’s why we use one language” or whatever.
The scenario in the story is no more or less likely to happen in an alternate world where programming languages / APIs are routinely localized. And no more or less likely to have happened with an entirely English-speaking team.
I once had a similar issue with a physics problem I was grading, with no programming involved. One student decided to simplify his answer by introducing some auxiliary variables, which is fine. The student was from Russia, and decided to use Cyrillic letters for those variables: Also fine (in fact, probably good, overall, since it reduced the chance of confusion with preexisting variables). But the two Cyrillic letters he happened to choose (ш and щ, or sha and shcha) looked very similar, so similar that I, not being familiar with Cyrillic, thought that they were the same letter.
If we assume that many of the Execs and Professionals immigrating to the US are IT professionals (and many start out as programmers), and also that the desire to immigrate to the US is comparable among IT professionals in India and China, then the following statistics may be useful here:
The table here is the Year — total immigrants / % in Execs & Prof category (taken from DHS Website)
At one time it was said that the Indians had an advantage because English was a common language of instruction / academics in India. I interpret the above table as: if there was any gap, China is catching up.
I would think the problem is more likely to come in learning opportunities - things like examples and textbooks. Fewer will be available in the native language, and a student who tries to learn from English source material may miss some of the language subtleties. Thus, you might have more problems with a Czech or Bulgarian programmer than, say, a Russian or Chinese one, where the market size means more native language materials are available (as well as more tools for learning English).
Beyond that, my impression after decades of programming is that it’s like mechanics. Almost anyone can learn the basics, but extremely talented people are rare and seem to learn instinctively regardless of the barriers.
One of the nice features of (good) editors and IDEs is if you select one occurrence of a variable, it’ll highlight other occurrences of the same variable in the document. Super-handy for finding those two variables that look alike but are completely different as far as the compiler is concerned.
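If your editor doesn’t do that, a rough check along the same lines can be scripted. This is only a sketch in Python, not how any particular IDE works: it lists the names in a piece of Python source and flags pairs that are spelled almost the same. The function names, the 0.8 cutoff and the sample variable names are all my own assumptions.

```python
import difflib
import io
import keyword
import tokenize

def identifiers(source):
    """Collect the distinct names used in a piece of Python source."""
    names = set()
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type == tokenize.NAME and not keyword.iskeyword(tok.string):
            names.add(tok.string)
    return names

def lookalikes(source, cutoff=0.8):
    """Return pairs of distinct names whose spellings are suspiciously close."""
    names = sorted(identifiers(source))
    pairs = []
    for i, name in enumerate(names):
        # Only compare against later names so each pair is reported once.
        for other in difflib.get_close_matches(name, names[i + 1:], n=5, cutoff=cutoff):
            pairs.append((name, other))
    return pairs

# Made-up sample: two French-looking variable names that differ by one letter.
sample = "compteur = 0\ncompteurs = compteur + 1\nprint(compteurs)\n"
print(lookalikes(sample))
```

It compares spellings, not shapes, so it would catch compteur vs compteurs but not two single letters that merely look alike on screen.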