How much text to decipher a made-up language?

This question is based on an assumption

First the assumption.

Let’s say I go to a group of, like, 5 linguists and employ them to “make up” a language, and after a year they do this and they can talk and write in this language. I am assuming that, since we have Ido and Esperanto among others, it is possible.

But let’s say I take this group of 5 linguists and tell them to write a text that will give another group of linguists the ability to decipher the “secret language” they made up.

How much text would it take? Books? One page? And what must be contained in this text for success? Also, would it make a difference if they used an alphabet like ours or if the alphabet was also made up, other than making it harder?

IANALinguist, but this’ll depend on a lot of things. Is the invented language related to any known languages? Are there any contextual clues to the meaning of the passage? Are there any parallel texts in other languages? And what counts as “decipher”… Is it OK if there’s one word, or 10, or 5,000, whose meaning the linguists aren’t sure of? What if the analysts only figure out a single word, does that count for anything? And is the sample text designed to teach the language, or is it just a random sample of usage?

The existence of parallel texts is crucial for deciphering most texts. Without one, it’s incredibly difficult: there are just so many things that anything could mean. We had plenty of examples of Egyptian, but it took the Rosetta Stone to finally crack it.

More than the Voynich Manuscript provides.

I am thinking that if you task five linguists with creating a usable language that is practically uncrackable, they can do so. If you task five linguists with creating a usable language that is easily decipherable, they will come back with something like English but with “th” in place of every “t” and “s”, and a “t” in place of every “th”.

Tat will be just fabulouth. :stuck_out_tongue:

So the question is kind of meaningless. What are you asking them to create?

(Informed WAG) The best you can do with made-up words is a frequency check: try to match the distribution against existing languages to figure out which parts are the verbs, nouns, etc. But having done that, you still haven’t accomplished much. Without knowing at least some percentage of the words, you have no way to guess what each thing on the page is meant to represent. Knowing where the noun is doesn’t do you any good if you don’t know which noun it is.
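To make the frequency-check idea concrete, here is a minimal sketch of the statistical side of it. The two sample strings are invented stand-ins for real corpora, and matching high-frequency short tokens to function words is roughly all that counting alone buys you:

```python
# Toy frequency check: count word frequencies in an unknown text and compare
# its most common tokens with a known-language sample. The sample strings are
# placeholders, not real corpora.
from collections import Counter
import re

unknown_text = "ka lumo ti reka, ti lumo ka seva"            # invented stand-in
english_text = "the dog saw the cat, and the cat saw a bird"

def word_freq(text):
    return Counter(re.findall(r"\w+", text.lower()))

print(word_freq(unknown_text).most_common(5))
print(word_freq(english_text).most_common(5))
# English corpora are topped by short function words ("the", "of", "and");
# if the unknown text shows a similar handful of very frequent short tokens,
# those are candidate articles, prepositions, and conjunctions, and that is
# about as far as statistics alone will take you, as the post says.
```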

Now if someone were to make a comic book using the new language, that might be a different pot of fish.

That reminds me of the time I convinced people I had made a Lisp converter. I made an awk script that converted “s” to “th” and did some other tricks. When you ran Pascal through it you got stuff like:


PROGRAM thtringth3 (output);
  { Extended Pathcal exampleth }
  {Thchematic thtring parameterth and 'domainth' }

  TYPE thtringp = ^thtring;
   VAR p1,p2: thtringp;

  FUNCTION ps (s: thtring) = p: thtringp;
  ...
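
For the record, the whole trick is just a blanket substitution; the original was an awk script whose “other tricks” aren’t described here, so this Python version is only a guess at the core of it:

```python
# A guess at the core of the "lisp converter": replace "s" with "th",
# preserving case, and pass everything else through untouched.
import sys

def lispify(text):
    return text.replace("S", "Th").replace("s", "th")

if __name__ == "__main__":
    # usage: python lispify.py < some_pascal_source.pas
    sys.stdout.write(lispify(sys.stdin.read()))
```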


If a language is unrelated to existing languages, then I suspect that it is uncrackable without some clues external to the text, such as parallel text in a known language, or a book with pictures in it.

Ok that answers my question.

So I would need a “Rosetta Stone,” so to speak.

How big a text in my secret language, together with its equivalent in English (or any other language), would it take to decipher the language in full?

An unabridged dictionary. If I can’t write a passage in English that you find completely incomprehensible, someone on here can.

For any given text size I can produce a made-up language to defeat it: hand me any piece of English text of any length, and I can custom-tailor a (most likely usable) made-up language and give you a translation of that piece which would still be undecipherable. You are venturing into the territory of the fuzzy cryptography that human brains can do on the fly, and the question “how big must a plaintext be to break any cryptosystem?” has the unfortunate answer “there is no such size.” You could have a fairly usable language in which two entire novels do not share a single common or related word, use different word order, or what have you.

Maybe not a completely ridiculous question, if you add ‘for a typical real human language’, and accept that answers are going to be very subjective.

But remember, now there are two variables: how long the ‘rosetta’ text is in both English and Language X, and how much other untranslated Language X text you have. You’ll also have to specify what kind of texts you have: free-verse experimental poetry won’t be as useful as an elementary math textbook with lots of illustrated word problems.

It is possible to design a language that teaches the reader how to read it, on the fly.

Here is an example of an attempt to do just that - a self-decoding message, theoretically readable by anyone with a knowledge of basic maths and science. When I first encountered it, I tried decoding before reading the key and explanation - and it really does work.

The reason it works is context - it starts off explaining itself in terms of something already known to the reader.
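As a toy illustration of that bootstrapping trick (this is not the linked message, just a sketch of the principle), here is a preamble in which two unknown symbols appear inside arithmetic the reader can verify with tally marks; the only consistent reading of the whole block pins down what the symbols mean:

```python
# Toy "self-decoding" preamble (not the linked message): every line is a sum
# written in tally marks, with an unknown symbol P between the addends and an
# unknown symbol Q before the total. The only reading consistent with every
# line is P = "plus" and Q = "equals", so a patient reader can bootstrap both
# symbols from context alone.
def preamble(max_total=5):
    lines = []
    for a in range(1, max_total):
        for b in range(1, max_total - a + 1):
            lines.append(f"{'.' * a} P {'.' * b} Q {'.' * (a + b)}")
    return "\n".join(lines)

print(preamble())
# e.g. ". P . Q .." and ".. P ... Q ....."
```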

And to provide an example opposite that of Mangetout, consider how we could invent a language that wouldn’t be decipherable without a “complete” dictionary. For example, let’s make a language that doesn’t have prefixes, suffixes, root words, possessives, or tense forms: every time we need a new word (even if it’s just a modified version of an old word), we have a random number generator make up a new stream of symbols for us (checking to see that they’re not already in use for some other word).
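A minimal sketch of that coining rule, assuming random strings over an ordinary alphabet and a simple reject-on-collision loop (the lengths and alphabet here are arbitrary choices):

```python
# Sketch of the coining rule: every new dictionary entry, even an inflected
# form of an existing word, gets a fresh random string, with a collision
# check so no two meanings ever share a form.
import random
import string

rng = random.Random(0)
lexicon = {}        # meaning -> invented form
used_forms = set()

def coin(meaning):
    if meaning in lexicon:
        return lexicon[meaning]
    while True:
        form = "".join(rng.choices(string.ascii_letters + string.digits,
                                   k=rng.randint(5, 9)))
        if form not in used_forms:      # reject anything already in use
            used_forms.add(form)
            lexicon[meaning] = form
            return form

# "dog", "dogs" and "dog's" come out as three completely unrelated strings:
print(coin("dog"), coin("dogs"), coin("dog's"))
```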

Given such a language, you could never figure out a word by structure; you’d have to either have a parallel translation of every word in any text you want to read, or else figure words out by context.

Next, let’s take the hundred thousand or so most common sentences, paragraphs, and phrases in this language, and represent them as single (unique) “words”, again with the “random sequence each time” rule. Now we’ve removed a lot of context, too – you see something you don’t recognize, you can’t even tell if it’s a word or a whole sentence. “I have a dog.” might be “XfGIJUY”, and “I had a dog.” might be “AKLUop345”.
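And the same move one level up, as in the example just given: whole sentences get their own opaque tokens, so near-identical sentences share nothing (again just a sketch, with arbitrary token lengths):

```python
# Phrase-level version of the same idea: each sentence gets its own random
# token, so "I have a dog." and "I had a dog." are completely unrelated.
import random
import string

rng = random.Random(1)
phrase_book = {}    # sentence -> opaque token

def token_for(sentence):
    if sentence not in phrase_book:
        phrase_book[sentence] = "".join(
            rng.choices(string.ascii_letters + string.digits,
                        k=rng.randint(6, 10)))
    return phrase_book[sentence]

print(token_for("I have a dog."))   # something like "XfGIJUY"
print(token_for("I had a dog."))    # an entirely unrelated token
```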

At this point, you’re getting close to the point where your parallel text needs to contain every word and sentence in your cipher text. And yet, to someone with a good memory who “knew” the language, it would still be intelligible.

Admittedly, we’re actively designing a language to be hard to translate here, but it’s one end of a spectrum.