One area where pure functionality is a huge win, it might be mentioned, is in concurrent programming. Actually, one interesting place where I saw this recognized was in a talk given by Tim Sweeney (developer of the very popular Unreal Engine) entitled “The Next Mainstream Programming Language: A Game Developer’s Perspective”. It was interesting to see a mainstream, performance-minded, down-in-the-trenches (rather than up-in-the-ivory-tower) programmer recognizing the merits of Haskell-style programming (in addition to concurrency, he focused on reliability benefits which fall naturally out of type safety), making such statements as “Purely Functional is the right default”, and essentially expressing his admiration of Haskell and his desire to revamp UnrealScript in that direction. Nothing he says is a shock to seasoned functional programmers or PL researchers, but it makes me happy to see this kind of recognition from such canonically mainstream quarters (videogame programming! The example that always gets trotted out for people to say “You’d be crazy to use anything other than C/C++; think of the performance!”).
What is it that strikes you as unsupported nonsense?
I don’t know why any fan of functional programming languages wouldn’t be impressed with this move. It’s not like the FPL community has been cut out of the equation, either - Phil Wadler was an adviser on the project and Don Syme is well known, too.
Another advantage of FPLs is that they’re particularly amenable to analysis and synthesis. There’s a reason why nearly all research in the synthesis of provably correct programs has involved a pure fragment of some FPL, and it’s also no coincidence that Isabelle/HOL’s input language is remarkably similar to Standard ML.
Finally, programs in FPLs tend to be incredibly terse, dispensing with many of the quirks of traditional programming languages, like explicit type annotations. Type inference in languages such as those in the ML family also has the advantage of guaranteeing certain properties of programs that are correctly type-checked - SML programs are impervious to segmentation faults, for instance.
Actually, Backus’s suggestion was something slightly different. He proposed “function-level” programming, rather than functional programming. The two are not the same.
It still seems that Haskell programmers like to put explicit type annotations in their programs. Don’t ask me why.
(It should be noted for those playing at home that Standard ML, OCaml, and Haskell pretty much define the ML family these days, though there are others.)
Oh, yes. I love the type system except when I hate it.
The upside is that programs are guaranteed to not be absolute nonsense: The old C problem of passing arguments that make absolutely no sense to a function is gone entirely, and with it are gone some of the worst bugs C programs are heir to. Also gone is the extreme rigidity of C++ and Java: Functions only specify what they need, not irrelevant details (float? integer? Who cares! They’re numbers and you can add them, you stupid freaking compiler!) that only serve to trip you up. C++ templates come close, if you ignore the fact they’re an unholy hack resulting from the Common Lisp macro system getting drunk with the C preprocessor.
The downside is that you need to spend more time thinking about types and the compiler isn’t going to let you get away with anything. If the function really does need floating-point numbers, you can’t serve it integers and expect autoconversion to save you. (This is stupid to me, but some people like that kind of control. Freaks.) Plus, they all use the ML syntax for describing what types a function takes and what type it returns, which is very concise and mathematical but doesn’t look like anything at all to someone who comes from C# or C++.
If you mean the headers that they write for functions:
length :: [a] -> Int
length [] = 0
length (h:t) = 1 + (length t)
Then it’s to ensure that the code they write actually has the type that they want. A kind of type-directed programming. You get the same effect with signatures for modules in SML and OCaml.
I think you’ll often see explicit type annotations right before function definitions, but on something like “let x = f(a, b, c) in blah blah” you’re rarely going to see an explicit type annotation on the x (maybe if it’s a recursive definition), which is different from the C/Java style of explicit type annotations being required on every variable declared anywhere. [Not that it’s the biggest difference in the world or inextricably tied to the functional paradigm, but, for what it’s worth, that’s how things stand].
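For what it’s worth, here’s a small Haskell sketch of that difference (my own example, not from anyone’s post): the top-level function carries an explicit signature, while the let-bound locals carry no annotations and are inferred.

```haskell
-- Top-level definitions conventionally get explicit signatures...
area :: Double -> Double
area r =
  -- ...but local let bindings almost never do; both are inferred here.
  let rSquared = r * r       -- inferred as Double
      piApprox = 3.14159     -- inferred as Double from its use below
  in piApprox * rSquared

main :: IO ()
main = print (area 2.0)
```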
To be fair, out of ML-like languages, I think only Haskell does this at all decently (i.e., lets you create a function whose argument type is “Things which can be +ed”); Standard ML is just as ad hoc about operator overloading as anything else (except for where it has its disgustingly specific hack of equality types) and I don’t think OCaml has any operator overloading at all. Even Haskell occasionally has you wrangling semi-frustratedly with the compiler around the various levels of the Num hierarchy; but, in general, yeah, its system of type class polymorphism is quite beautiful (and convenient!), far and away a better approach to the issue than I’ve seen anywhere else.
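To make the “things which can be +ed” point concrete, here’s a minimal Haskell sketch (my own example) using the Num class: the one definition works at every numeric type.

```haskell
-- "Num a =>" says: any type a supporting the numeric operations will do.
double :: Num a => a -> a
double x = x + x

main :: IO ()
main = do
  print (double (3 :: Int))       -- works at Int
  print (double (1.5 :: Double))  -- and at Double, from the same definition
```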
An uninformed question if I may: what’s wrong with typing? I had always learned that it was in fact a good thing. How does a compiler know what to do when a string literal is added to a numeric constant, for example?
If that’s directed at the functional/ML camp, then, nothing’s wrong with typing. We’re the strongest supporters of strong static type systems there are; we love types. But what Dominic Mulligan was talking about was “type inference”; often, a compiler has enough information available to it to determine the type of an expression (or, perhaps, the most general possible type in a polymorphic system, or some such thing, depending on the details of the type system) without the programmer having to explicitly specify all or any of it. (For example, if you declare a new variable and then assign a string to it, it’s clear that the type of the variable is a string; if you declare a function which sometimes indexes into its input like a list and prints the third element it finds before returning some value, while some other times it prints the result of a recursive call to itself before returning some value, then it’s clear that the function’s type must be List of Strings -> String [assuming that the print function takes a string as its argument]; if you declare a function which takes in an ordered pair and returns a swapped version of it, then its most general possible type is the polymorphic type forall a, b. (a, b) -> (b, a).). In such cases, it can be greatly convenient, especially when rapidly whipping up code, for the programmer to be relieved of the unnecessary burden of redundantly specifying this type information, since the compiler can already figure it out without him; but don’t get the wrong idea, the types are very much there, even if not explicitly mentioned. The compiler knows about them and ideally the programmer should realize what they are as well. He just doesn’t have to write them down if he doesn’t want to (though, in many cases, it is good practice to do so anyway).
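The swap example above can be written out in Haskell (a sketch of my own; a swap function also happens to live in Data.Tuple): no type annotation is given anywhere, and the compiler infers the most general polymorphic type, swap :: (a, b) -> (b, a).

```haskell
-- No annotation written; GHC infers: swap :: (a, b) -> (b, a)
swap (x, y) = (y, x)

main :: IO ()
main = print (swap (1 :: Int, "hello"))
```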
Liberal: Type information is good. It should be available when the program is being run, in fact, something C and C++ can’t help you with. (I don’t know if the ML languages allow it, either.) That is because you want failing programs to give you good error messages, and you want to make it easy to write functions that don’t fail. (Given a string? Detect it and see if you can convert it to a numerical type before screaming and dying.)
What’s bad are two things: Type systems that don’t help you, and having to declare things too soon.
Type systems that don’t help you are type systems that don’t allow you to treat similar things the same way. For example, numbers can be added. It’s a property of all numbers from small integers all the way to large complex floating-point numbers. (Even quaternions and octonions, if you want to get freaky.) A type system that doesn’t allow functions that just add numbers to take all numbers is quite stupid. Look up the philosophy of “duck typing”: If it walks like a duck and quacks like a duck, a function can treat it like a duck. (It also allows you to duck typing out explicit annotations. And, presumably, duck soup as well.)
(The absolute iron-clad worst example of an unhelpful type system was the one attached to standard Pascal. It treated an array’s size as an immutable part of its type, meaning you could not write functions that handled arrays of arbitrary length. Specifically, you could not write functions that handled strings (arrays of characters) of arbitrary length. You had to decide on a length for every array and pad all of them out to that length, which meant string constants could have a lot of trailing spaces. Wirth has much to answer for.)
Type systems that make you declare things too soon are type systems that make you decide on optimizations before the basic design is done. Should this sequence be an array or a singly-linked list? Should this number be large or small? How in the blazing holy hells am I supposed to know that now?!? I haven’t even gotten the algorithms fully fleshed out yet! In fact, I may never make a firm decision on some things: Most of the program is not speed critical and does not need to be optimized. I might make second and later passes through the code with profilers and decide what needs to be tightened up, at which point I do want to choose the specific representation of sequences in memory. But not up-front. Never up-front.
Obviously, I’m more in tune with Common Lisp than any of the ML languages. (I even know CL better than Scheme at this point.) But I think the basic points should port over to some extent, and if they don’t it’s all the fault of those ML languages and their freaky disciples.
Thanks, both of you, for those explanations. A follow up question, please.
I reckon nothing is more untyped than VB Script. All variables are “variants” (a catch-all data type that can be anything from an integer to a date to an array to an object from a DLL). Things can get squirrely, though, with comparison operators. Maybe it’s just intrinsic to VB’s longstanding problem of using “=” for both comparison and assignment, but when you’re testing a variable’s value, weird shit sometimes happens.
For example, say you read a value passed from the previous page on a form post. It’s an explicit zero as opposed to an uninitialized zero, since variants initialize as empty strings. It’s hard to know which to expect, and If x = 0 may choke because the interpreter is expecting If x = “0”. Or even worse is an assignment from the field of a SQL table. The default property of a field is its value, but if an object is assigned from a field without the value property explicitly written, the default property of the object is what the assigned variable holds.
Are the functional languages y’all are talking about more robust than this? I assume they are, but what do they do exactly that blocks all these gotchas and surprises?
It sounds to me like VB Script is what is known as “dynamically typed”; rather than variables and other expressions having fixed types known at compile-time, types (to the extent they exist) are associated with particular data values instead. A variable is free to hold an integer at one moment and a string the next moment. The data itself will be tagged with some information about its “type” (as opposed to, say, assembly, where it’s all bits interpretable however you want), but there is no compile-time typechecking process, with type errors popping up at run-time instead (for example, the compiler won’t ensure that your program never attempts to add a numeric value to a string value; you won’t find out about the error until you actually run your program and it executes that attempt). The opposite of this (expressions have fixed types and programs are checked at compile-time to guarantee a complete lack of type errors in anything that makes it through compilation) is static typing. Just to make sure there’s no confusion, functional languages do exist in both these styles; the members of the ML family are the major statically typed functional languages, while the members of the Lisp/Scheme family are the major dynamically typed functional languages.
I’m not exactly sure what you mean by your examples (in particular, I find the second one difficult to parse), but I’ll try to speak to the first one. In a statically typed language, the variable x will have a fixed type assigned to it at compile-time (either explicitly specified by the programmer or inferred less directly from his code, but it all amounts to the same thing in the end). If that type is not one which allows an equality check against 0, then the compiler will stop when it sees If x = 0, and output an error message regarding the ill-typed expression. Similarly, if that type is not one which allows an equality check against “0”, then the compiler will report a type error in If x = “0”. Thus, it will not be possible to get the compiler to accept code which would only lead to heartbreak later because of an illegal comparison; such code is just as malformed as if it had a glaring syntax error. This ability to have the machine catch errors of this form with perfect reliability upfront rather than waiting for them to manifest at runtime or having to do a variety of unit tests or what have you to try to preemptively find them, well, I think it’s extraordinarily helpful, fundamentally The Right Way to do things, and I have difficulty understanding why anyone would forgo it (to me, in a high-level language, that would be a step backwards the way a full jettison of the concept of types and return to the anarchy of assembly would be). But, dynamic typing does have adherents in spades who somehow feel that static typing is more restrictive than helpful, so my own particularly colored views are certainly not universal. (To be sure, if a type system is not expressive enough to express the concepts you want [say, “Those things which can be added, checked for equality, and printed” rather than merely such concrete types as “Integer” or “Floating-point” or such], then you’re in trouble, but this seems to me an issue which is orthogonal to static vs. dynamic typing, despite occasionally being brought up in that context).
Type inference in a strong, statically typed functional programming language is usually some variant on Hindley-Milner type inference (lots of algorithms in programming language design are double-barrelled in their name - the first name is usually a logician who first discovered the algorithm for lambda calculi, and the second is usually a computer scientist who rediscovered it for practical programming languages). The type checker is basically a theorem prover that takes a string of the source language and deduces a proof of the type correctness of that string.
Type systems are normally presented in a natural deduction style. For a simple language of integer expressions with simple boolean expressions, you’ll have something like this:
|- n : int (INTAX)
|- b : bool (BOOLAX)
n : int, n' : int |- n + n' : int (INT+)
n : int, n' : int, n'' : int, n''' : int |- if n = n' then n'' else n''' : int (IF)
A simple deduction in this system would be the proof that the expression “5 + 3” has type int:
_______ (INTAX)   _______ (INTAX)
5 : int           3 : int
_________________________ (INT+)
5 + 3 : int
The advantages of this are that (a) you can now bring standard tools from mathematical logic to bear on your type system; (b) properties of your type system can be shown to hold as theorems - for example, it’s a theorem that SML’s type system prevents all segmentation faults; and (c) the type system is described completely and unambiguously. Contrast this with, for example, C++'s standard, where many things are ambiguous due to the use of natural language.
So, to answer your question, every term in the language has a type associated with it. The type system provides axioms and rules of inference, and the type checker functions as a theorem prover which uses those rules to deduce a proof that a term is well typed.
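As an illustration (a toy of my own, with invented names, not anything from the posts above), the rules for the little int/bool expression language can be transcribed almost directly into a Haskell checker, one equation per inference rule:

```haskell
-- A toy checker for the small int/bool expression language above.
-- The datatype and names here are my own invention.
data Expr = IntLit Int
          | BoolLit Bool
          | Add Expr Expr           -- n + n', per (INT+)
          | If Expr Expr Expr Expr  -- if e1 = e2 then e3 else e4, per (IF)

data Ty = TInt | TBool deriving (Eq, Show)

-- Nothing means "no proof of well-typedness exists".
typeOf :: Expr -> Maybe Ty
typeOf (IntLit _)  = Just TInt                       -- (INTAX)
typeOf (BoolLit _) = Just TBool                      -- (BOOLAX)
typeOf (Add e1 e2) =                                 -- (INT+)
  case (typeOf e1, typeOf e2) of
    (Just TInt, Just TInt) -> Just TInt
    _                      -> Nothing
typeOf (If e1 e2 e3 e4) =                            -- (IF)
  case (typeOf e1, typeOf e2, typeOf e3, typeOf e4) of
    (Just TInt, Just TInt, Just TInt, Just TInt) -> Just TInt
    _                                            -> Nothing

main :: IO ()
main = do
  print (typeOf (Add (IntLit 5) (IntLit 3)))    -- the "5 + 3" derivation
  print (typeOf (Add (IntLit 5) (BoolLit True)))  -- ill-typed, no derivation
```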
There’s another axis which you’ve not mentioned: that of strong typing versus weak typing. Java is relatively strongly typed, Haskell is extremely strongly typed, Prolog and C are very weakly typed.
Can you elaborate on that? Admittedly, it’s been 25 years since I struggled through that paper (and I’m not going to do it again unless someone gives me cookies), but my memory is that Backus just took the concept a bit further than most.
Backus drew a distinction between “value level” and “function level” programming languages. Haskell, Miranda, SML, Scheme, etc. are all value-level programming languages - they have variables for storing values, and a program’s core purpose is to evaluate a function on some value, obtain that value and store it, modify it further, then finally return that value.
Contrast this with FP, Joy, etc., which are “function level” (point-free programming). Here, there are no variables. Programs are written just through composition of existing functions. The programmer is provided with a fixed set of functionals, and new programs are written by combining these functionals. Programs therefore become easier to analyse and also even more terse than their functional-language counterparts.
For instance, in a value-level language, deciding equality of functions is equivalent to the halting problem. In a function-level language, equality of functions is potentially decidable (or at least “easier”, in his own words) as equality within the algebra of programs.
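Here is a tiny sketch of the point-free style (my own example, in Haskell; Haskell is a value-level language, but it supports this style well): the program is built purely by composing existing functions, with no variable naming the data flowing through.

```haskell
import Data.Char (toUpper)

-- Point-free: defined by composition alone; no argument is ever named.
-- Reads right to left: reverse, then uppercase each character, then append "!".
shout :: String -> String
shout = (++ "!") . map toUpper . reverse

main :: IO ()
main = putStrLn (shout "olleh")
```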
Straight from the horse’s mouth:
Object level is a synonym for value level.
I didn’t explain it very well. The variable is free to hold (or reference) an integer or a string or what-have-you, but once an assignment is made to it, the type cannot change. For example, the syntax to declare a variable in VB Script is “Dim [variable]”. So:
Dim X
X = 0
X = "0"
The output will be “Type Mismatch” (on the line X = “0”)
My fault again. In VB Script, the dot operator references a property or method of an object, and so the syntax to read from a field in a database is like this:
Dim X
X = MyDatabase.MyTable.MyField("CustomerName").Value
But this also works:
X = MyDatabase.MyTable.MyField("CustomerName")
You can leave off the “.Value”. Either one will give you, say, “Elmo” because the default property of a field object is its value. However, in this case the field is holding a simple string (say an MS-SQL type NVarChar — a variable length string). But if the field were holding, say, a bitmap, then:
X = MyDatabase.MyTable.MyField("SmallPicture")
That will not return the value property of the field (i.e., the bitmap); rather, it will return a reference to the bitmap, which is pretty meaningless for the script. To get the bitmap, the code must specify the value property explicitly:
X = MyDatabase.MyTable.MyField("SmallPicture").Value
OK, now I understand what your question is, I think. In answer, most functional languages will not allow a declaration of a variable without also initializing it to some value.
For example, val x; is not allowed. You must use val x = 5;
Also, to simulate VB’s lack of typing (somewhat), it would be possible to use an algebraic type. For example, suppose you knew you needed a variable that could hold either an integer or a string, then:
datatype VBVariant = Integer of int | String of string | Empty
val variant = Empty
val variant = Integer(5)
val variant = String("Hello")
etc.
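For comparison, here is the same idea sketched in Haskell (my own transliteration of the SML above): the sum type is closed, and pattern matching forces every case, including Empty, to be handled explicitly, so the “which kind of thing is in this variant?” gotchas can’t slip through to runtime.

```haskell
-- A Haskell analogue of the SML VBVariant datatype above.
data VBVariant = VInt Int
               | VString String
               | VEmpty

-- The compiler warns (or errors, with -Werror) if any case is forgotten.
describe :: VBVariant -> String
describe (VInt n)    = "integer: " ++ show n
describe (VString s) = "string: " ++ s
describe VEmpty      = "empty"

main :: IO ()
main = do
  putStrLn (describe (VInt 5))
  putStrLn (describe (VString "Hello"))
  putStrLn (describe VEmpty)
```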
In fact, OCaml has two distinct sets of arithmetic operators for integers and floats. You use (+) for integer addition, and (+.) for floating-point addition — and the same goes for subtraction, multiplication, and division. In each case, both arguments must be of the same type. Also, the built-in math functions like sqrt and cos take only floating-point arguments.
So although polymorphism is a key feature of the language, it’s much more limited when it comes to basic numerics. I believe this was a deliberate design decision.
F# is supposed to be most like OCaml, as compared to other ML dialects, but I don’t know whether it copies this behavior.
Thanks, Dominic, for that response. Just one small item to clean up, so as not to leave a wrong impression. What I described was VB Script, used like javascript on Microsoft ASP web pages. VB itself (i.e., Visual Basic and VB Dot Net) requires typing at declaration:
Dim X As String
X = 0
Won’t compile. (You can also make an assignment with the declaration, but it must be the correct type.)