Of course I’m not actually inventing a language. It’s just a question that crossed my mind when reading about the Voynich Manuscript and cryptological attempts to find whatever meaning might be hidden in it.
Say I’ve invented an artificial language, Esperanto-style. I’ve created a new vocabulary (which does not borrow words from existing languages - my etymology is totally new, I’ve dreamed up fancy words for every term I want my vocab to include) and a grammar (which, although systematic and without exceptions, is not necessarily similar structurewise to the grammars of existing languages).
I don’t tell anyone any details about my language, but I translate a multi-page story that actually makes sense into it. If I present this translation to a fine cryptologist, could he decipher and re-translate it?
I’m inclined to say no, for how should he know that a given combination of letters corresponds to a certain English word? Then again, people say there is no 100% surefire code (except for one-time pads perhaps), and that cryptologists can crack anything provided that there are rules, regularities and cycles in the text, which there certainly are in my new language.
So what? Let’s give our cryptologist plenty of computing power and money. Will he find out what I wrote?
A cryptologist could not. A linguist could probably identify structures but not meaning. The difference between a code or cipher and a different language is that a different language is just that. A cryptologist who speaks only English probably could not apply crypto methods to read Chinese.
I doubt anyone could translate it. As William Poundstone pointed out in his discussion of the Voynich Manuscript, nobody was able to figure out Egyptian hieroglyphics until the Rosetta Stone was found.
The book The Code Book by Simon Singh has a chapter on the decipherment of ancient languages using code-breaking approaches. The most famous was the Rosetta Stone, but that wouldn’t apply in your case.
Also treated in this book are the Navajo code talkers in WWII who foiled Axis cryptologists by merely speaking their own native language. It was never deciphered.
BTW this is a great read for a non-fiction book. It’s hard to put down.
lucwarm: If the Poundstone book you’re talking about is Labyrinths of Reason, it’s exactly the one I was reading. Poundstone also mentions an American WW1 cryptologist named Herbert Yardley, who allegedly cracked a Japanese code without ever having learnt Japanese (it might be one of those codes that don’t encrypt words but that include nonsense letter combinations for certain meanings, though). That’s why I personally would not bet you cannot cryptologically analyse languages you don’t speak yourself.
I tend to agree with Dogface that cryptanalysis wouldn’t be too fruitful against a new language. Several examples exist where people studying a dead language have been completely stymied until they found a key such as the Rosetta Stone.
One comparison to crypto might be useful. “Good” crypto relies on problems which are mathematically difficult. The algorithms and implementations are open for review, but without knowing the specific key, it’s very difficult to decrypt a specific message. Some “bad” crypto relies on security-by-obscurity, which means it tries to keep the math, protocols, etc. secret. Security by obscurity is a bad approach because if your secret ever gets out, your entire system is blown whereas with “good” crypto, if your key is exposed you can change keys but your crypto system is still useful. Your language is comparable to security by obscurity. In order to be useful for communication, you have to teach the people you want to communicate with, and this risks exposure of your entire system.
This has no bearing on the OP about whether you could decipher a completely unknown language, but it addresses the big picture of what a language would be used for in the first place.
CookingWithGas is right… The Code Book was a very good read. For a good work of fiction incorporating cryptanalysis as a major theme, try Cryptonomicon by Neal Stephenson.
Yes and no. Yardley’s own account of the Japanese decipherment can be found in chapter XIV of his The American Black Chamber (1931; Ballantine, 1981). From this, it’s clear that he had some knowledge of the language, though it would have been uselessly minimal in any other context. David Kahn’s version (in the abridged The Codebreakers, Sphere, 1973) adds the detail that he could draw upon Frederick Livesey, who wasn’t initially fluent, but who was a gifted linguist and who had picked up the language by the time Yardley finished.
As an analogy, suppose you have a mass of encoded telegrams sent by the French diplomatic corps and a copy of Proust in the original. But you don’t speak French. However, you guess that the telegrams use a simple substitution cipher (like a=b, b=c, etc.). Can you test this? Yes, you can. Construct frequency tables of the letters used in both the telegrams and the novel. Is there a resemblance? You could even tart this up as hypothesis testing. If they are using such a cipher, the two tables will line up, and you can conclude both that they are and that you’ve cracked it.
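For anyone who wants to play with the frequency-table test, here is a rough sketch in Python. It only handles the simplest case from the example (every letter shifted by the same amount, a=b, b=c), and the function names are my own invention, not anything from Yardley or Kahn. The idea is to try each possible shift and keep the one whose letter frequencies look most like the reference text.

```python
from collections import Counter
import string

ALPHABET = string.ascii_lowercase

def letter_frequencies(text):
    """Map each letter a-z to its relative frequency in text (0.0 if absent)."""
    letters = [c for c in text.lower() if c in ALPHABET]
    counts = Counter(letters)
    total = len(letters) or 1  # avoid dividing by zero on empty input
    return {c: counts[c] / total for c in ALPHABET}

def shift_cipher(text, shift):
    """Encipher by moving each letter `shift` places along the alphabet."""
    out = []
    for c in text.lower():
        if c in ALPHABET:
            out.append(ALPHABET[(ALPHABET.index(c) + shift) % 26])
        else:
            out.append(c)  # leave spaces and punctuation alone
    return "".join(out)

def distance(freq_a, freq_b):
    """Sum of absolute differences between two frequency tables."""
    return sum(abs(freq_a[c] - freq_b[c]) for c in ALPHABET)

def best_shift(ciphertext, reference_text):
    """Try all 26 shifts; return the one whose decryption best matches the reference."""
    reference = letter_frequencies(reference_text)
    return min(
        range(26),
        key=lambda s: distance(
            letter_frequencies(shift_cipher(ciphertext, -s)), reference
        ),
    )
```

Feed `best_shift` the telegrams and the Proust and it hands back a candidate shift without you knowing a word of French. With short or unrepresentative texts the guess can easily be wrong, and a general substitution cipher (an arbitrary scrambling of the alphabet) needs more work than trying 26 shifts.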
As regards the general problem posed by Schnitte, the difficulty here is the Proust. Suppose you don’t actually have a copy of the novel in French, but a copy enciphered with a substitution cipher. If you know nothing about French, you can’t tell.
Back to Yardley. He had a set of 25 telegrams sent in the clear - his equivalent of the Proust. He also knew enough about the structure of Japanese written in Roman characters to know to concentrate on two-letter groups. His first step was therefore to do a vast frequency analysis of such groupings in both the clear and the encoded telegrams. After much additional effort, the breakthrough step was to guess particular words. Some of these were English words phonetised into Japanese (Ireland = airurando) and some were Japanese that he knew (independence = dokuritsu). This step also involved the crucial use of context: the knowledge, based on the likely content of the telegrams, that “independence” was likely to be in proximity to “Ireland”, given the events of the time.
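The two-letter-group counting can be sketched the same way. This is only an illustration of the counting step: I am assuming the text is chopped into consecutive, non-overlapping pairs, which may not be exactly how Yardley tallied his groups, and the function names are made up.

```python
from collections import Counter

def two_letter_groups(text):
    """Split text into consecutive, non-overlapping two-letter groups.

    Non-letters are ignored; a leftover single letter at the end is dropped.
    """
    letters = [c for c in text.lower() if c.isalpha()]
    return ["".join(letters[i:i + 2]) for i in range(0, len(letters) - 1, 2)]

def group_frequencies(text):
    """Count how often each two-letter group occurs in text."""
    return Counter(two_letter_groups(text))
```

Run it over both the clear and the encoded telegrams and compare the most common groups in each. The counting is the easy part; guessing that one group sequence means "airurando" and another "dokuritsu" is where the knowledge of the language and the context comes in.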
Yardley’s achievement, while brilliant, thus wasn’t entirely blind. He had to know he had some clear text and he had to know a little Japanese. In practice, he also had to know something of the structure of the language and what the telegrams might discuss.
What cryptographers can do in the absence of any information about the target language is identify structure. I haven’t looked for a cite, but I recall that many of those at Bletchley Park couldn’t speak German, Italian or Japanese. But they could dig out significant patterns.
They weren’t speaking just Navajo. Often Navajo terms were used as code-names for something else. IIRC, one such example was the Navajo word for “hawk” being used for a fighter plane. Also, they worked out a series of words to be used to stand for letters of the English alphabet, analogous to the “Alpha, Bravo, Charlie” system. Again, IIRC, the English translation of the Navajo word would begin with the letter of the alphabet it represented.
…Which completely eliminates the point of using the Navajo language in the first place. Any decent codebreaker given a sufficiently-large message spelled out in that way could figure out that a certain word stood for “D”. He could figure this out with equal ease regardless of whether he knew Navajo, and happened to know that that was the Navajo word for “dog”.
As for the OP, it’s theoretically possible, given a large enough body of text, to deduce a language without any other clues. But there’s no possible way you’re going to get a large enough body of text (it’d certainly take more than a single short story), and even given that, it’d be very difficult and time-consuming to do. If we knew what short story you started with, it’d be much easier, and even if we knew of a thousand short stories which it might be and had to pick from them, it would help.
Thanks, Chronos. I was thinking about changing the task a bit: We translate an existing novel of considerable length into the new language and tell the codebreaker to find out what novel it was.
All of this discussion makes me wonder even more about the value of the SETI program. Even if we ever hear a signal (doubtful), it will forever be meaningless to us.
Yes, a linguist would be able to deduce the language structure but would find it very hard to translate it if no reference is given.
Yet the more examples provided, the better the chances that vocabulary in combination with syntax/grammar starts to make sense. And this would be much easier if the inventor of the new language translated a number of pages from different languages (which he identifies).
As a linguist, I would say no. The Mayan writings were only recently deciphered. They would not have deciphered them if not for the discovery of writings which they knew to be the names of kings. However, in your example, there is no such parallel. You have created the language and there isn’t even a modicum of translated material.
As previously stated by many, a linguist could determine sentence structure, but little more could be derived from your language. Languages can vary so widely (particularly fake languages) that it would be near impossible to decipher a language without any starting point.
I agree it is doubtful that we would ever hear a signal, even if there were aliens a mere 100 light-years away. But if we did, some things might be decipherable: images, for instance, and perhaps telemetry. Language would be impossible to decipher, but an analysis of the complexity might give an idea of the level of civilisation you are dealing with.
If presented with an extraterrestrial e-mail containing holiday photos, for instance, the images might be decipherable, while you could perhaps make a guess that the text content was trivial.
There’s a big difference between translating something and recognising whether something is a language or not. The first is hard, the second is trivially easy.