What written language conveys the most information with the least characters?

jovan · April 4, 2007, 12:27am

I can read both Japanese and Chinese, and I assure you both are quite readable. As a matter of fact, all the excerpts I posted are shown at default size. Look up any Chinese web page, and you’ll see the characters are displayed at about the same size. Here:
http://www.yahoo.co.jp (Japanese)
http://cn.yahoo.com (Chinese)

The question in the OP was what language conveys the most information per page. Of course, Chinese characters have many more strokes than other writing systems. However, as far as space is concerned, Chinese is the winner. I think it should be noted that Hangul (Korean) is also a very compact script. However, the grammar of Korean isn’t as concise as that of Chinese, so you probably end up with texts taking more space.

Frylock · April 4, 2007, 12:39am

I think Monty (and everyone else on this thread) intends to discuss only actual, not merely possible, scripts.

-FrL-

jovan · April 4, 2007, 12:48am

If we use strokes as a unit, how do we deal with cursive scripts?

Staggerlee · April 4, 2007, 8:19am

Sorry to press the point, but doesn’t Ogham, the old Celtic script I mentioned in post #13, look like this? Most of the ‘letters’ are small strokes crossing a central line.

elmwood · April 4, 2007, 2:18pm

How about an abjad? Hebrew might be up there, if you’re looking at languages with written alphabets that will convey a lot of information in a little bit of space.

Hebrew text:
וַיְהִי כָל-הָאָרֶץ, שָׂפָה אֶחָת, וּדְבָרִים, אֲחָדִים.

Roman transliteration
Vayehi khol-ha’arets safa ekhat udvarim akhadim.

English
Now the whole world had one language and a common speech.
More Hebrew
עַל-כֵּן קָרָא שְׁמָהּ, בָּבֶל, כִּי-שָׁם בָּלַל יְהוָה, שְׁפַת כָּל-הָאָרֶץ; וּמִשָּׁם הֱפִיצָם יְהוָה, עַל-פְּנֵי כָּל-הָאָרֶץ.

More transliteration
Al-ken kara shmah bavel ki-sham balal Adonai shefat kol-ha’arets umisham hefitsam Adonai al-pnei kol-ha’arets.

English
That is why it was called Babel, because there the Lord confused the language of the whole world. From there the Lord scattered them over the face of the whole earth.

Frylock · April 4, 2007, 2:35pm

It looked to me like letters consisted in a series of strokes. So like, “A” might be three strokes, “B” four strokes, “C” two strokes, a loop, and a stroke, and so on.

-FrL-

yabob · April 4, 2007, 3:16pm

I know that elmwood mentioned “abjads” when he brought up Hebrew, but it probably bears amplification.

One of the things which helps reduce Hebrew is that the vowels aren’t represented, or only represented by the Masoretic points, which are optional (the excerpts above have them). They are not typically used in things like secular books and newspapers, as I understand it. It would compact the English, too, to write it like:

Nw th whl wrld hd n lngg nd cmmn spch.

Letting the reader puzzle out the intended vowels. Of course, it really wouldn’t work in English - too many words are disambiguated solely by their vowel sounds, and “a” and “I” are common words all by themselves.
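
To make the vowel-dropping idea concrete, here is a rough Python sketch. The helper name `drop_vowels` is made up for illustration, and treating only a, e, i, o, u as vowels is an assumption. It shows both the space saved and the ambiguity problem:

```python
VOWELS = set("aeiouAEIOU")  # assumption: only these five letters count as vowels

def drop_vowels(sentence: str) -> str:
    """Remove vowels from every word; words that become empty vanish entirely."""
    stripped = ("".join(ch for ch in word if ch not in VOWELS)
                for word in sentence.split())
    return " ".join(word for word in stripped if word)

original = "Now the whole world had one language and a common speech."
compact = drop_vowels(original)
print(compact)                          # Nw th whl wrld hd n lngg nd cmmn spch.
print(len(original), "->", len(compact), "characters")

# Four different words collapse into the same consonant skeleton,
# and one-letter words such as "a" and "I" disappear altogether.
print(drop_vowels("bad bed bid bud"))   # bd bd bd bd
```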

If you regard the Masoretic points as parts of the letters, the fact that they are so tiny in comparison to the rest of the letter limits practical font size, and adds strokes, if we’re using that for a measure. It’s hard to imagine shrinking the Hebrew text shown above much more without losing the points.
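
For anyone who wants to see how much of the pointed text is points, the following is a minimal Python sketch. It assumes the points are stored as separate Unicode combining marks (category `Mn`), and `strip_points` is a hypothetical helper name. Note that this filter also removes the dagesh and the shin/sin dots, not just the vowel marks:

```python
import unicodedata

def strip_points(text: str) -> str:
    """Drop nonspacing combining marks (Unicode category Mn) from the text."""
    # NFD splits any precomposed letter+point forms into letter followed by marks.
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")

pointed = "וַיְהִי כָל-הָאָרֶץ, שָׂפָה אֶחָת, וּדְבָרִים, אֲחָדִים."
bare = strip_points(pointed)
print(bare)
print(len(pointed), "code points with points,", len(bare), "without")
```

The count here is in Unicode code points, which says nothing about printed width: the points sit above, below, or inside the letters, so they cost legibility at small sizes rather than horizontal space.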

For alphabetic representations, I might still favor Korean, because of the compactification from the syllable blocks. Of course, that does nothing to help the stroke count.

Bambi_Hassenpfeffer · April 4, 2007, 4:32pm

Yet again, Nava stole my post. I don’t translate for a living, but when I’m reading something in Spanish, it’s almost always longer than the same paragraph would be in English. And she’s right – our verb system features shorter verbs that don’t require differing conjugations in most persons, so we make up a lot of length there. On the other hand, Spanish makes up some of it by dropping pronouns and being spoken very, very quickly.

I would venture to say that English is more compact when written, but Spanish takes less time to tell the same story out loud.

Colibri · April 4, 2007, 4:46pm

I’ve done a number of bilingual books and exhibitions in English and Spanish, and the rule of thumb is that the Spanish will run about 25-30% longer. (Of course this depends on the subject and style, technical texts in Spanish being closer to the length in English than more literary ones.) This has to be taken into account in running parallel columns of text in the two languages. We generally put photos, tables, and other stuff into the English column to help equalize the texts.
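
As a back-of-the-envelope sketch of that rule of thumb (Python; the function name and the 1,000-word figure are just illustrative assumptions, and the 25-30% range is the only number taken from experience):

```python
def spanish_length_range(english_length: int, low: float = 0.25, high: float = 0.30) -> tuple:
    """Estimate the Spanish length from the English length using the 25-30% rule of thumb."""
    return round(english_length * (1 + low)), round(english_length * (1 + high))

english_words = 1000  # hypothetical English text length
shortest, longest = spanish_length_range(english_words)
print(shortest, longest)  # 1250 1300

# Extra space the English column needs (photos, tables) to stay level with the Spanish.
print(shortest - english_words, longest - english_words)  # 250 300
```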

I don’t know about that, but even if it did Spanish-speakers almost never do.

Gary_Robson · April 4, 2007, 6:06pm

A quick side question, if you don’t mind. I’m currently reading this thread on my work computer: a Windows XP system running Firefox 2.0.0.3. I’ve been messing with my character encoding and auto-detect settings for the last five minutes and I just can’t make any of these Chinese characters show up. The Hebrew shows up fine, but the Chinese characters are just a string of boxes. I’ve yet to check my Mac at home to see if it’s a Windows issue or a Firefox issue.

Does anyone know how to set up Windows Firefox to show Chinese?

jovan · April 5, 2007, 1:06am

On Windows, you need to install Chinese and Japanese fonts on your system for the messages to display properly. I posted from a Mac, but verified the thread on Windows with Opera and it displayed fine.

Out of curiosity, I compared another excerpt from the UDHR, this time with Chinese, and different phonetic systems:

Chinese:

Arabic:

لكل إنسان حق التمتع بكافة الحقوق والحريات الواردة في هذا الإعلان، دون أي تمييز، كالتمييز بسبب العنصر أو اللون أو الجنس أو اللغة أو الدين أو الرأي السياسي أو أي رأي آخر، أو الأصل الوطني أو الإجتماعي أو الثروة أو الميلاد أو أي وضع آخر، دون أية تفرقة بين الرجال والنساء.وفضلاً عما تقدم فلن يكون هناك أي تمييز أساسه الوضع السياسي أو القانوني أو الدولي لبلد أو البقعة التي ينتمي إليها الفرد سواء كان هذا البلد أو تلك البقعة مستقلاً أو تحت الوصاية أو غير متمتع بالحكم الذاتي أو كانت سيادته خاضعة لأي قيد من القيود

Hebrew:

Korean:

Spanish:

Arabic hadn’t been mentioned so far. As a consonant alphabet, like Hebrew, it seemed likely to be fairly efficient.

elmwood · April 5, 2007, 4:00am

If anything, Greenlandic probably conveys the least information with the most characters. Article 1 from the UDHR:

Matses is also long.

In English:

Latin seems nice and compact.

Mazahua is VERY compact.

jamus_se · April 5, 2007, 5:29am

My language just encapsulated every single bit of information on the entire internet with this symbol:

O|-<

I win.

Seriously though, why are there extra spaces between each character in the Chinese translation of the UDHR when most common Chinese web sites don’t employ that kind of formatting? Something is fishy here.

人 人 有 资 格 享 有 本 宣 言 所 载

should probably be shown as

人人有资格享有本宣言所载

and I can read both just fine, and to be honest I can read the bottom row faster because the odd spacing forces me to read the characters separately.

jovan · April 5, 2007, 5:48am

I c&p the texts from the UN’s site. I, too, thought the kerning was a bit wide, but completely failed to notice that it was an extra space causing it.
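
For the record, stripping the spaces is trivial. Here is a minimal Python sketch (the helper name `remove_spaces` is made up, and the spaced sample is simulated rather than copied from the UN page) showing how much the padding inflates the character count:

```python
def remove_spaces(text: str) -> str:
    """Delete all whitespace, including the full-width ideographic space U+3000."""
    return "".join(ch for ch in text if not ch.isspace())

phrase = "人人有资格享有本宣言所载"
spaced = " ".join(phrase)  # simulate one space between every character
print(len(spaced), "->", len(remove_spaces(spaced)))  # 23 -> 12
```

This only makes sense for a script that doesn't use spaces between words; running it on the Arabic or Spanish text would glue the words together.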

Here’s Chinese vs. Arabic for article 2, with the spaces removed:

لكل إنسان حق التمتع بكافة الحقوق والحريات الواردة في هذا الإعلان، دون أي تمييز، كالتمييز بسبب العنصر أو اللون أو الجنس أو اللغة أو الدين أو الرأي السياسي أو أي رأي آخر، أو الأصل الوطني أو الإجتماعي أو الثروة أو الميلاد أو أي وضع آخر، دون أية تفرقة بين الرجال والنساء.وفضلاً عما تقدم فلن يكون هناك أي تمييز أساسه الوضع السياسي أو القانوني أو الدولي لبلد أو البقعة التي ينتمي إليها الفرد سواء كان هذا البلد أو تلك البقعة مستقلاً أو تحت الوصاية أو غير متمتع بالحكم الذاتي أو كانت سيادته خاضعة لأي قيد من القيود

Napier · April 5, 2007, 11:58am

I think these are the two logical extremes:

A picture is worth a thousand words. Or, more accurately, a well done painting or photograph can convey a great deal, albeit with ambiguity. If characters or ideographs or symbols can be carried to an extreme, let the picture be one character. For that matter, I read somewhere that we interpret written text in English a word at a time when we read, or even read some phrases without parsing the words per se. Not surprising - you read “you” so many times in a lifetime that you can easily recognize the shape of the word itself. So I think it’s arguable that many entire words are characters or symbols as written.

At the other extreme, in a sense, the two dimensional barcodes that put a 1 or a 0 in each cell of a matrix represent either a very big number of characters (if each cell is a character) or a great deal of information in the character (if the entire barcode is a character). Moreover, file compression algorithms that reduce the information content into a nearly minimized number of bits make such barcodes maximally efficient in some sense.
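
A small sketch of the compression point, in Python with the standard `zlib` module. The function name `compressed_bits` is hypothetical, and the sample is deliberately repetitive so the effect is easy to see; ordinary prose shrinks far less:

```python
import zlib

def compressed_bits(text: str) -> tuple:
    """Return (raw bits, compressed bits) for the UTF-8 encoding of the text."""
    raw = text.encode("utf-8")
    packed = zlib.compress(raw, 9)          # 9 = strongest compression level
    assert zlib.decompress(packed) == raw   # lossless: nothing was thrown away
    return len(raw) * 8, len(packed) * 8

sample = "Now the whole world had one language and a common speech. " * 20
raw_bits, packed_bits = compressed_bits(sample)
print(raw_bits, "bits raw ->", packed_bits, "bits compressed")
```

Each of those compressed bits could be one cell of a two-dimensional barcode, which is the sense in which such a barcode is close to maximally efficient, and also completely unreadable to a human.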

But the trouble with all of these is that raw information isn’t very useful. Hell, matter itself is made of information, something like 10^65 bits to the kilogram. It is only the way in which information resolves uncertainty that makes it useful. One lantern if by land, two if by sea, conveys volumes to the intended reader, and makes no sense to people having a different state of mind.

Hari_Seldon · April 5, 2007, 4:56pm

Usually yes. But Mrs. Seldon is a professional French-to-English translator and the English text invariably comes out shorter, as much as 10% shorter, than the French. Translators going from English to French expect the text to grow, by up to 20%. On the other hand, French is spoken so much faster, often with syllables dropped (qu’est-ce que c’est que ça has fragments of eight words compressed into five syllables) that I suspect spoken French goes at least as fast as English.

BrainGlutton · April 5, 2007, 6:52pm

I’ve heard that even spoken Chinese is more concise than most languages, because it relies on tone to make phonemic distinctions between similar words, most of which are of but one or two syllables. Is that true?

b3tour · April 6, 2007, 12:16am

This phenomenon of “rate” in languages has come up earlier in the thread as well; I believe it’s probably false that syllable-timed languages are somehow faster than stress-timed languages.

I can’t think just now of an easy way to confirm my suspicion, but from my own experience I have difficulty thinking of French, for example, as particularly “fast,” in relation to English.

NoCoolUserName · April 7, 2007, 2:11pm

Even with tones, there are fewer phonemes in Chinese. That means there are more two-syllable words to disambiguate. On the other hand, if the context is clear, you don’t need the extra syllables.

I’m not a languages scholar, and this is not very clear. I’m sure someone will elaborate shortly.

Polycarp · April 7, 2007, 3:18pm

All else equal, fewest-words syntax structure gives compactness. Morphemic/ideatic symbology compacts better than phonemic.

The above are awkward but validly constructed English sentences. They would be shorter in written Chinese (possibly spoken Chinese as well) and longer in any Romance language. German might shorten word count, but at the cost of more characters per phoneme (Russian ч = English ch = German tsch). Latin might compact them even better (ceteris paribus for “Everything else being kept equal,” my “All else equal,” for example).
