# Lambda Calculus, CFGs, etc. - what are they?

I took some linguistics classes in college but I was always puzzled by its mathematical side (you damn near had to be a math or CS major to get by in a few of the classes). I’ve tried and tried to understand the significance of the following concepts, but I’ve never been able to wrap my brain around them:

1. Lambda calculus
2. Boolean algebra
3. Lattices
4. Context-free grammar
5. Context-sensitive grammar
6. Finite state machines/automata/Turing Machines

What the hell do these have to do with language? Please don’t link me to wikipedia articles, there are far too many symbols in those articles for me to make sense of them. Is there a beginner’s guide to any of this stuff? The textbooks that we used assumed a level of knowledge that most of us didn’t have. The textbooks also had the shortcoming that many textbooks do–there were a few trivial examples and everything else was left as an exercise to the reader.

Numbers four, five, and six have a lot to do with parsers (and lexical analyzers) that is – mathematical constructs that allow a machine to understand a language. Formalized context-free and context-sensitive grammars are used as the basis for creating programming languages, and the same ideas are used to try to understand natural languages from a machine perspective. (For example, to try to program a computer to understand English. (Good luck.))

I’m not sure how lambda calculus is directly related to linguistics; I’m only familiar with the very basics and its relation to functional programming. But functional programming languages are often used to create language parsers.

Anywho, here’s the very basic stuff, off the top of my head:

1. Lambda calculus is a mathematical system (a calculus) for finding the results of functions that operate on other functions, including themselves. You probably remember functions from basic algebra: f(x) = 3x + 2, for example. So f(5) = 3*5 + 2 = 17. The input of this function is a number (5) and the output is also a number (17). In computer science (especially in functional programming) we often want functions whose inputs and/or outputs are other functions, rather than numbers. Lambda calculus lets us talk about, evaluate, and solve problems with these kinds of functions.

2. Boolean algebra is a formalized system for dealing with truth tables. The idea is, given a list of things which are either true or false, and given the relationships between those things, what other things must be true or false? If it makes you feel any better, those boneheaded philosophy majors have to suffer through this one, also.

3. I dunno anything about lattices, except there’s usually too much in my salad.

4 & 5: When writing language parsers, we have to define a formal grammar for the language.

The grammar consists of tokens which must appear in a certain order; some tokens are terminals and some are compound, composed of multiple other tokens. A formal grammar defines all of these in a symbolic way. For example, we can define the following rules:

[ul]
[li]A sentence -> Adjective noun transitiveVerb[/li][/ul]

Adjective, noun and transitiveVerb are terminals because we can simply provide a list of all adjective, nouns and transitive verbs. Our parser can then look at a sentence like “Big dog barks” and find that it satisfies the rule of “a sentence.” But “The big ugly dog barks” would not satisfy the rule, because we haven’t defined any rules about definite articles and two adjectives next to each other.

With that in mind, a context free grammar is one in which all rules can be expressed as Foo -> Bar where Foo is a compound element and Bar is a terminal or list of terminals. In other words, it doesn’t matter in what context Foo appears, it’s always broken down the same way. A context-sensitive grammar is one which doesn’t satisfy that condition. They are much harder to write formal parsers for, but it’s still possible.

(The formal description of context-sensitive grammars is one of the only useful things Noam Chomsky has ever contributed to the world.)

1. FSMs also come into play when talking about parsers and languages, especially regular expressions, which you may or may not be familiar with. Put simply, an FSM is a machine (or computer program, which is a kind of machine) that always exists in one of a finite number of states. A light switch is a FSM with two states. Language parsers often use a simple FSM to keep track of what kind of token is being parsed at any given time.

…and that’s all I have time to write about or I’ll miss my train.

You’re in luck, I’ve published papers on the lambda calculus

Church originally intended to create a foundation for mathematics that dodged the paradoxes associated with naive set theory and seem more natural than the type theories proposed by Russel. Church chose the concept of the function as primitive, rather than set. Unfortunately, soon after, paradoxes in this theory were discovered, and Church’s dream of a new foundation for mathematics were dashed. However, all was not lost: the functional subset of the theory could be salvaged, and this became known as the lambda calculus.

Now, before the lambda calculus is explained, it’s probably better to explain it’s significance. The 1920’s and 1930’s saw a frantic effort to formalise the notion of computation, as researchers sought a definite answer to the Enscheitdungsproblem (“decision problem” in German). Basically, the exact form of the question is not really relevant. All that is relevant is the fact that a) the answer had grave consequences for the philosophy of mathematics, as well as its practice, b) loads of people were working on the problem and c) an answer entailed formalising the notion of computation/algorithm. Church was able to prove a negative answer for the Enscheitdungsproblem using the lambda calculus (Turing followed less than a month later, using Turing machines), and this is how the lambda calculus became significant.

Later, it was realised the the lambda calculus was also a prototypical computer programming language. To this end, the lambda calculus is still in widespread use today, with (typed) variants being the theoretical backbone of programming languages like ML and Haskell.

So, now we have a context for the development of the LC. Clearly, the LC holds a central place in theoretical computer science. Not only is it useful in computability theory, but it also is a simple programming language, to boot.

But what, exactly, is the lambda calculus? Well, it’s actually pretty simple. There’s only three formation rules. I’ll assume that we have fixed a set of variables. I’ll use lower case letters to range over these variables, and employ a permutative convention (i.e. “a” and “b” are two distinct variables).

• Any variable, a, is a term of the lambda calculus.

• If r’ and r are terms of the lambda calculus, then r’r is a term of the lambda calculus (application).

• If a is a variable and r is a term of the lambda calculus, then \a.r is a term of the lambda calculus (abstraction).

Here, “” in “\a.r” is usually read (and typeset) as a lambda (hence the name of the calculus).

Now, depending on your tastes and intentions, there’s a number of reduction rules (or conversion rules) that may be used to simplify a LC term. The simplest is that of alpha conversion, which encodes the notion that the name of abstracted variables does not matter. To whit:

\a.r =alpha= \b.r[a |-> b]

where [a |-> b] is a substitution mapping variable “a” to variable “b”.

The other rule that I will consider is beta reduction, which corresponds to computation:

(\a.r)t -beta-> r[a |-> t]

(There’s another rule, eta reduction/conversion, that corresponds to extensionality, which will not be considered).

ARGH: pressed the wrong button, see next post.

The short answer is that most of those are notions that are useful in studying programming languages, which are in some sense the simplest interesting languages. The thinking is that since natural languages have structure that’s in some way similar to the structure that programming languages have, they can be analyzed in similar ways.

The long answer involves a bit of history, which I’m going to oversimplify to avoid boring you to death. Back around 1940 or so, it was clear that computers were very powerful machines, and some people started to wonder if there was anything they couldn’t do. Real computers are too complex to reason about directly, so people set out to come up with simplified models that were more powerful than real computers and were much easier to work with. Turing machines and lambda calculus are two such models, and perhaps the most commonly used today. Programming language theorists love lambda calculus and modern variations, whereas most people who study computability theory study it in terms of Turing machines. For the record, there are things that Turing machines can’t do, and therefore real computers are similarly limited.

There were a bunch of these models proposed, and at the same time, Chomsky came up with his notion of generative grammars. As you probably know, Chomsky grammars can be placed in a hierarchy, and it turns out that each level in that hierarchy corresponds directly to a model of computation. That’s another reason why linguists and computer scientists took note of each other way back when.

It’s kinda tough to get into the specifics of each of these things without resorting to symbols. In brief, Turing machines and lambda calculus are two equivalent models of computability in the sense that any program that can be written as one can be written as the other. Furthermore, no one has ever written a finite-length program that can’t be expressed as one of those two objects.

Context-free grammars, context-sensitive grammars and finite state automata describe models of computation that are strictly less powerful than Turing machines. You can write every finite state automaton as a Turing machine, but only very simple Turing machines can be expressed as finite state automata.

Lattices are generalization of our notion of “greater than” in a technical sense. Boolean algebras are a special kind of lattice.

I don’t recommend buying it, but Ken Rosen’s “Discrete Mathematics and its Applications” is a great introductory textbook that will at least touch on all of these things (except maybe lambda calculus). If you’re comfortable with high school level math, you shouldn’t be totally lost reading it.

Now, using the lambda calculus, natural numbers (0, 1, 2, …) and booleans (true, false) may be encoded, as well as functions that act on them. In addition, it’s relatively simple to prove (once you know how), that the LC is consistent, and LC terms really do encode functions (that is, starting with any term, all reduction paths eventually end up at the same place).

So, whilst the LC is very simple, it’s also very expressive.

So why is the LC useful in linguistics? Well, I’m no linguist, but I’ve sat through conference talks where linguists have explained what they do with the LC. Basically, it seems that they assign to each LC term a type, corresponding to a grammatical formation rule. They then have a translation function that maps English sentences to typed terms of the LC. This then provides a precise semantics for the English sentence (the LC itself is very simple, and it’s semantics are generally well understood), and the reduction rules somehow correspond with the simplification of the sentence (I’m unclear on this point exactly what goes on from a linguistics viewpoint).

Do you want primarily to know what these things are in themselves or how they connect to language? (Or both?)

1. The lambda calculus is a mathematical system for describing functions/predicates, and particularly their “higher-order” properties. Connections to language come through Montague semantics, which treats many aspects of natural language as such functions and higher-order functions, and work such as Lambek’s on syntax.

For example, consider a sentence such as “Every kid likes candy”; we might want to know its truth conditions. We can think of “likes” as a function of two arguments; it takes in a ‘liker’ and a ‘likee’ and outputs TRUE or FALSE according as to whether the former likes the latter; “likes candy” is then a predicate of one argument, obtained by fixing the second argument of the former. Finally, “Every kid”, like other constructions using determiners, is a higher-order function: a predicate whose inputs are themselves predicates, in this case outputting TRUE or FALSE according as to whether the input predicate outputs TRUE of all input kids. This shows applications of lambda calculus to semantics, but similar applications exist at a purely syntactic level as well (for example, one can think of lexical classes/parts of speech as being particular “types”, some of which act like “function types”, accepting arguments on the left or the right so as to produce complex phrases of different types. ).

1. Boolean algebras are mathematical structures which capture the essence of propositional theories (i.e., those interrelationships between the truth conditions of various statements in a language which involve such logical connectives as OR, AND, NOT, etc.). Obviously, they come in useful in semantics as well.

2. Lattices are mathematical structures which capture the essence of the order-theoretic notions of “least upper bound” and “greatest lower bound”, which can model notions such as OR and AND, union and intersection, etc. In particular, Boolean algebras are very special kinds of lattices.

3. Thinking of a “grammar” as a set of rules saying which sentences (strings of lexemes) are grammatical, a context-free grammar is one which can be given by a bunch of rules of the form " An instance of the lexical class A can take the form B_1 B_2 … B_n (where each of those B_ks is another particular lexical class, and primitive lexemes all have particular lexical classes assigned to them). It is the second lowest level of Chomsky’s four-level hierarchy for complexity of grammars.

4. A context-sensitive grammar is like a context-free grammar, but the rules can also be restricted so as only to apply in particular “contexts” (i.e., only when surrounded by terms of particular lexical classes). These are the third lowest level of the Chomsky hierarchy.

5. Finite state machines are basically computer programs that only use a fixed amount of memory. The grammars they can “recognize” are the lowest level of the Chomsky hierarchy. Turing machines are just arbitrary computer programs. The grammars they can recognize are the highest level of the Chomsky hierarchy.

ETA: Man, I am way late to this thread. When I started typing that, there were no replies yet. Well, time to see how redundant I’ve been.

Lost in pre-posting edit-land: The main linguistic significance of finite state machine grammars/the lowest rung of the Chomsky hierarchy is that, being recognizable using only bounded memory, these cannot contain any nontrivially recursive constructions.