Is !(p && q) exactly the same as !p || !q?
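For plain booleans, yes: that’s De Morgan’s law, and even the short-circuit behaviour matches, since q ends up being evaluated under exactly the same condition in both forms. A quick brute-force check (Python here purely for convenience):

# Exhaustive check over all boolean combinations; the assert never fires
for p in (False, True):
    for q in (False, True):
        assert (not (p and q)) == ((not p) or (not q))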

Seems like the most common usage of the term data type would include “int”. To say it’s not seems like a non-standard, less common definition.

From GNU C Reference manual:
“int
The 32-bit int data type can hold integer values in the range of -2,147,483,648 to 2,147,483,647. You may also refer to this data type as signed int or signed.”

And that shows the issue. int is totally a type. If you are a hardware processor.

int is totally *not* a type if you’re operating at the semantic level where business objects interact within their specific domains. And that’s true whether your “business” is payroll checks or spacecraft guidance or video rendering.

Loading an integer hardware register with bits that are actually representing a float is a type error.

Adding an int that represents a count of days to an int representing a weight in pounds is also a type error. Just at different levels of abstraction.
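To make the two levels concrete, here’s a small Python illustration (struct is standing in for the hardware register; the variable names are invented):

import struct

# Level 1: the bit pattern of the float 1.0, read back as if it were an int.
# The hardware is perfectly happy; the meaning is gone.
bits = struct.unpack("<i", struct.pack("<f", 1.0))[0]
print(hex(bits))          # 0x3f800000: a perfectly legal int, nonsense as a count

# Level 2: the same kind of error one abstraction up. To the machine these are
# just two ints; the fact that the sum is meaningless lives only in the domain.
days_until_due = 14
weight_in_pounds = 150
nonsense = days_until_due + weight_in_pounds   # nothing below the domain level objects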

50 years ago it was pretty magic to get compilers smart enough to enforce even simple low-level typing, like adding an int to a float. We’ve gotten better at higher levels now. A well-designed OOP object hierarchy designs out a lot of bugs. But far from all.

Last of all, once we have to send bits over a pipe or onto mass storage and retrieve them later, we don’t (generally) have round-trip type safety. Something like fully formatted XML docs with full-up XSDs that define absolutely everything and can perfectly audit their data structures is about as close as we get to type safety while data’s at rest.

But even then XSD is sadly lacking many necessary idioms.

One of my favorite bottom lines: Dev is about layers and ;layers of abstractions. downs or even a hundred of them. Invariants at one level usually propagate downwards pretty well (net of the at-rest problem). But they do not propagate upwards unless everybody involved is working very, very hard to make that so.

And even though there are very few computer systems that know about inches and days and that you can’t add them, you can enforce this sort of strict typing through style, using things like Hungarian notation. At least, if you use it correctly, which many people don’t, which has led to Hungarian notation getting a bad reputation.
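A tiny illustration of the style done “correctly”, with invented unit prefixes; the compiler sees only numbers, but the mismatched prefixes make the bad line look wrong at a glance:

# Units carried in the names, not in the type system
inPencilLength = 7.5        # inches
cmPageMargin = 2.0          # centimetres
dayRentalPeriod = 14        # days

total = inPencilLength + cmPageMargin         # in + cm: the prefix clash is the warning
# charge = dayRentalPeriod + inPencilLength   # even more obviously nonsense on sight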

And it can go even further than just units. I could measure both the length of my pencil and the radius of the Earth’s orbit in meters, and thus I can add those two lengths together. But it would be very, very rare that I would ever want to add two quantities like that together, and if someone told me to do it, it would give me pause. So one might want to create a meta-typing system that would give those two measurements different types, and allow them to be interconverted so they could be added, but make it just a little bit more difficult to do so, to make sure that that’s actually what is intended.
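Sketching that with two hypothetical wrapper types: both are metres underneath and can be interconverted, but mixing them takes a deliberate extra step:

from dataclasses import dataclass

@dataclass(frozen=True)
class PencilLength:
    metres: float
    def __add__(self, other):
        if not isinstance(other, PencilLength):
            raise TypeError("convert explicitly before mixing length scales")
        return PencilLength(self.metres + other.metres)
    def as_orbital(self):
        # the deliberate bit of friction: you can mix them, but you have to say so
        return OrbitalRadius(self.metres)

@dataclass(frozen=True)
class OrbitalRadius:
    metres: float
    def __add__(self, other):
        if not isinstance(other, OrbitalRadius):
            raise TypeError("convert explicitly before mixing length scales")
        return OrbitalRadius(self.metres + other.metres)

orbit = OrbitalRadius(1.496e11)
pencil = PencilLength(0.19)
# orbit + pencil                       # TypeError: the pause described above
total = orbit + pencil.as_orbital()    # fine, because the intent was stated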

…And on thinking about it, I’m probably preaching to the choir, since I’ve a hunch that it was probably from Derleth that I first saw that link in the first place. But it rings very true to me.

My worst coding headaches all came from a complicated simulation code I was working with in grad school, which just didn’t work as advertised. The innards were a mess of different variables, and some of them were physical quantities (in who-knew-what units), some were base-10 logs of physical quantities, and some of them were natural logs of physical quantities. A naming convention (that was actually consistently followed) would have helped tremendously with understanding, but I couldn’t go changing the code at the level that would be needed for that.

What I eventually did was to convert my copy of the code to RTF, and used color-coding for the different loggedness of variables. While I was at it, I also put in subscripts and superscripts in function calls, to keep track of which arguments were inputs to the function, and which were outputs (it was Fortran, so that sort of thing was commonplace). The first step of the compiling process was to collapse the code back down to plain text, so it was all the same to the compiler, but it made things much easier for me, the human, to read.

And when I found places where a variable of one color was assigned to a variable of a different color, I knew that I had unearthed one of the program’s (many) bugs.

Of course, almost all of the variables in this program, aside from trivialities like loop counters, were floats. But they were different kinds of floats, and that was important to keep track of.

Modern C++ allows all this and more. You can create a system that annotates numbers with units and does all the compile-time checks that you expect to happen. Like:

auto dist = 12.3_meters;
auto t = 4.1_seconds;
auto speed = dist / t; // speed has type MetersPerSecond
auto speed2 = speed + (5_meters / 2_seconds); // fine
auto speed3 = speed + dist; // compile error: incompatible types
auto speed4 = speed + (3_inches / 2_seconds); // compile error: incompatible types
auto speed5 = speed + (3_inches / 2_seconds).convert<MetersPerSecond>(); // fine
auto speed6 = speed + 3_inches.convert<MetersPerSecond>(); // compile error

That’s just one way of implementing things; you could eliminate the convert() function if you wanted to make all unit-safe conversions automatic, for instance.

I agree with eschereal here: validation of user input is not type checking. The reason I brought IO up before is as a demonstration that “can get at the bit representation” has nothing to do with type safety. Since all non-toy languages can do so for IO purposes (disk, network, etc.), regardless of where on the strong/weak spectrum they lie, that cannot possibly be an element that distinguishes them.

You cannot infer anything from the binary representation anyway. 0x3f800000 is a perfectly legitimate (32-bit) int and float. There is absolutely no way to tell which one it’s “supposed” to be without additional metainformation.

All human user input is a string. There’s no way to input an int at all: just a string that may or may not contain an int, in some range of acceptable formats. Decimal? Hex? Empty string converts to 0? Ignore trailing non-digit characters? Etc. The answers to these depend on the design requirements and have nothing to do with type safety. And it’s something that all languages have to address, although some may make it easier to arbitrarily choose one set of rules (atoi(), int(), to_number(), etc. have some behavior that may or may not match what you want).
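To make the “these are design decisions” point concrete, here’s one arbitrary set of answers, sketched in Python (parse_count is an invented name; another application could legitimately choose the opposite rule for every case):

def parse_count(text: str) -> int:
    """Decimal only, no empty input, no trailing junk, surrounding whitespace tolerated."""
    text = text.strip()
    if not text:
        raise ValueError("empty input is an error here; another app might silently use 0")
    if not text.isdigit():
        raise ValueError("decimal digits only; another app might accept hex or '12abc'")
    return int(text)

print(parse_count(" 42 "))     # 42
# parse_count("0x2A")          # rejected here, though int("0x2A", 16) would accept it
# parse_count("12abc")         # rejected here, though C's atoi() would quietly return 12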

If a machine communicates with another machine and platform type is uncertain, they will almost always exchange data in ASCII/UTF. However, to be picayune, in the modern age most human input is not text but three numbers (two of which can be either integers or floats, depending on the system). I imagine that some users can go whole days without entering any text.

“Please input your area code by moving your mouse that many pixels to the right” :slight_smile:

It isn’t all that hard to create type systems that manage such things as dimensions perfectly well. You could add an understanding of inches and SI, but usually you would not have separate types for inches and metres.

The problem with manifest constants is that they are the base type. You would end up with code that looks like:
length = Length.from_inches(0.1) + Length.from_SI(1.0)
but from then on in you could use “length” with perfect safety, and code that looked like:
wrong = Time.from_days(5) + length
would get you an error.
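A minimal sketch of that interface in Python (everything is normalised to one base unit internally; here the mistake surfaces as a runtime TypeError, whereas a compiled language or a static checker would reject it before the program ever ran):

from dataclasses import dataclass

@dataclass(frozen=True)
class Length:
    metres: float                                # single canonical base unit
    @classmethod
    def from_SI(cls, metres):
        return cls(metres)
    @classmethod
    def from_inches(cls, inches):
        return cls(inches * 0.0254)
    def __add__(self, other):
        if not isinstance(other, Length):
            raise TypeError(f"cannot add Length and {type(other).__name__}")
        return Length(self.metres + other.metres)

@dataclass(frozen=True)
class Time:
    seconds: float
    @classmethod
    def from_days(cls, days):
        return cls(days * 86400.0)
    def __add__(self, other):
        if not isinstance(other, Time):
            raise TypeError(f"cannot add Time and {type(other).__name__}")
        return Time(self.seconds + other.seconds)

length = Length.from_inches(0.1) + Length.from_SI(1.0)   # fine: 1.00254 m
# wrong = Time.from_days(5) + length                     # TypeError, as it should be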

Simple typing when you have named types can be used to simply change the effective name of your floating point number or integer, so that the compiler complains when you mix them. The nice thing about that is that you don’t lose any speed, it just compiles down to the base type.

But you can step up to add-ons like the uncertainties package in Python, which seamlessly (nearly) adds tracking of uncertainty in numerical calculations (and is very smart about how it does it - even calculating derivatives as needed). You can very nearly just import the package, change the type of the instantiated variables, and let the rest of the code roll. Very cool.
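Roughly what it looks like in use, going by the package’s ufloat interface (exact formatting of the output varies by version):

from uncertainties import ufloat

length = ufloat(2.00, 0.05)    # nominal value and standard deviation
width = ufloat(1.50, 0.02)
area = length * width          # propagation happens behind the scenes
print(area)                    # about 3.00, with a propagated uncertainty of roughly 0.085
print(area - area)             # exactly zero, with zero uncertainty: correlations are tracked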

Hmm, it took forever to add the above post, interrupted by multiple phone calls. So the conversation had moved on a bit.

I’ll note that IO is a funny beast. Most languages provide support for arbitrary IO of bytes, so you can always get yourself into trouble if you want. Most environments provide support for fully type safe persistent storage, and usually that extends to fully type safe inter-process and inter-machine communication. Python provides any number of pickling systems, minimally its own pickle, plus Numpy will serialise Python classes including Numpy arrays. Inter-machine can be a problem, as you can and will run into endianness and machine representation issues. Again, these are not difficult to abstract over; the underlying libraries just need to be written to cope.

Just about any modern data storage system maintains pretty good meta-data - where you get into some trouble is when moving between languages, as you very quickly find yourself working at the lowest common denominator. So high level semantics are hard to manage. But formats like NetCDF provide a pretty reasonable basis for working in a pretty safe manner. (I regularly have to cope with the same NetCDF files with Fortran, C++ and Python. It mostly works pretty well.)
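To show what the pickle path looks like, a quick sketch of a type-preserving round trip (as opposed to writing raw bytes and hoping):

import pickle

record = {"station": "A7", "samples": [1.5, 2.5, 3.75], "count": 3}
blob = pickle.dumps(record)              # bytes, ready for disk or a socket
restored = pickle.loads(blob)
assert restored == record
assert type(restored["count"]) is int    # types come back intact, nothing to re-parse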

Really high level languages support things like the ability to communicate entire object closures between machines. This gets pretty hairy, as you can get into all sorts of interesting strife. Your type checking and binding mechanisms can become interesting hybrids.

For some definition of “perfect safety”. As I’ve said, from my perspective, runtime errors aren’t good enough. They’re exactly as bad as just crashing, because from the user’s perspective the same thing happened; the app stopped working mysteriously. Python can’t even catch trivial cases at compile time, like:

print(1 + "hello")

If the call is buried in an infrequently-used function that slipped through my unit testing, I’m screwed.

The uncertainties package sounds pretty cool, though.

Well that is in part why you have coverage tests. Bad code buried in infrequently used bits is not peculiar to any language. You need coverage tests to weed it out.

You can get static types in Python, but it gets a trifle convoluted, and arguably it is no longer really Python. OTOH, unless you can reason about every possible exception your code can throw, none of your code is really reliable. Any code that can crash with any sort of unhandled exception is IMHO incomplete. This is a viewpoint I have developed over the years with a whole range of projects. Static typing helps with a certain class of problems, but only some, and usually the easiest.
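To illustrate that first point: with annotations, an external checker such as mypy can typically flag the earlier print(1 + "hello") case before the code ever runs, although nothing in the interpreter itself enforces it:

def greet(count: int) -> str:
    return count + "hello"    # a static checker reports an unsupported operand type here

print(greet(1))               # plain Python compiles this happily and only fails at runtime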

I usually quote the loss of the first Ariane 5 as a brilliant case of a mix of software engineering failures. Having a real time control executive that was unable to catch and handle every exception was perhaps the most obvious of a number of blunders.

No system is perfect–that much is certain. I don’t think anyone has produced a successful non-toy language where you can prove correctness.

One can certainly show that, for some given system, every exception is caught. I don’t know that I’ve ever seen an example where this wasn’t achieved by–in some fashion or other–having some universal “I have no idea what’s happening” handler.

So the rocket catches the numeric exception, say, but still blows up because it doesn’t know what to do in that case. The Ariane 5 (which I almost brought up earlier) is certainly a good example. The numeric exception happened despite the input being legitimate. And it wasn’t necessary to terminate the rocket anyway, because the code wasn’t necessary for the core guidance loop. And yet changing the language or exception model or whatever likely wouldn’t have made a lick of difference, because fundamentally they didn’t anticipate that input and so couldn’t possibly reason about what to do with it.

Static typing may well only handle the simplest class of problems, but nevertheless, most bugs are pretty dumb when it comes down to it. The hard bugs (race conditions, etc.) are rare almost by definition.

Spark is a subset of Ada that provides a pretty nice set of capabilities that gets you a long way towards a useful level of provable highly reliable code. It has been ages since I worried about Ada in anger, but reviewing where things are, you would have to say that it delivers in a lot of areas where I would shudder to imagine getting reliable code in something as evil as C++.

Ada sort of got sidelined in the popular view of programming languages - especially by Java. In retrospect I don’t think this was a totally good thing.

E67584653: Syntax error: “;” , expected noun
E57493242: No overload of operator “or” accepts operands of type “direction” and “int”.

*runs away, cackling*

Whoa, really? I’ve contemplated something like this before, but always came to the conclusion that it’d be way too hard. How does it deal with trig functions, for instance? A small error in an angle can lead to an infinite error in a tangent (in fact, any nonzero Gaussian error on an angle input leads to an infinite standard deviation of tangent’s output). It seems to me that, in order to be able to do this to any useful degree, you’d have to store not just summary information like standard deviations, but entire error distributions.
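My best guess at how the derivative-based part has to work, to first order at least, is something like this, which stays finite for any input short of the pole but blows up rapidly as you approach it:

import math

def tan_with_uncertainty(x, sigma_x):
    # first-order propagation: sigma_f ≈ |f'(x)| * sigma_x, and d(tan)/dx = 1/cos^2(x)
    return math.tan(x), abs(1.0 / math.cos(x) ** 2) * sigma_x

print(tan_with_uncertainty(0.5, 0.01))                 # modest, well-behaved uncertainty
print(tan_with_uncertainty(math.pi / 2 - 1e-3, 0.01))  # enormous uncertainty near the pole

Which is only a linearisation, of course, so it still wouldn’t capture the infinite-variance behaviour of a genuinely Gaussian input.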

From the user’s perspective, an exception that is handled gracefully is frequently much better than a crash.

If you are trying to save the spreadsheet you’ve been working on for a while, and there is a problem parsing the name of the file you want to save it as due to some mishandling of special characters, would you prefer that it crashes and you lose your work, or that it gracefully reports there was an error trying to validate that file name?
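Something along these lines, as a generic sketch (none of this is any particular app’s actual code):

def save_document(text, filename):
    if "\0" in filename or not filename.strip():
        # graceful path: report the problem and keep the document in memory
        return "Could not save: that file name is not valid. Please pick another."
    try:
        with open(filename, "w", encoding="utf-8") as f:
            f.write(text)
    except OSError as err:
        return f"Could not save ({err}). Your work is still open; try another name."
    return "Saved."

print(save_document("hours of spreadsheet work", "budget.txt"))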

For the environment I work in (business/ERP software), there are relatively few data type related bugs compared to higher level bugs.

Your compiler misdiagnosed the error. Read “downs” as “dozens” and suddenly it makes more sense. I forgot to proof & edit until it was too late.

Understood; preferences in this stuff depend greatly on the use case. My environment is essentially high performance computing, where user input validation plays almost no role, but it’s very important that not a single clock cycle goes to waste.

Of course I prefer that things fail gracefully rather than crash, but if you failed to validate the input (using traditional, not exceptional means) then we’re a little past the truly graceful part. Either you’ve fallen back to an outer, generic handler (with, in all likelihood, a less-than-useful error message) or the API you’re using fails with some error code (also useless to the user). Either way, things have failed mysteriously. It may be better than a crash, but with no indication of what’s wrong, the user may well have no idea how to fix things so that they can continue and save their work.

Plus we’re veering back into user validation stuff, which I don’t feel really belongs in the category of type errors. Type correctness exists to prevent programming errors. Users have no hope of detecting and working around internal errors of this kind.

HPC is indeed a very different beast. The trouble here is that there are no languages that really get you what you need, you absolutely have to know what the damned thing is going to compile down to, and have a very good idea of what the hardware is doing under the hood.

In this respect something like the C++11 template language is actually rather nice, as you can essentially write your own smarts into the templates and have some control over the paradigm. When you are worried about cache performance and pipeline stalls you are very much in a different world. Parallel is at least passably helped by things like OpenMP, but even here, if you don’t really know what is going on under the hood, you are not fully in control of the performance. Big parallel is still a bit in the stone age with things like MPI. Language level support for things like release consistency would not go astray. Or tagged memory, but that is something of a vain hope.

I’ve been in the position of coding my own coroutine kernel to slip in under C++ code for a discrete event simulator in order to claw back a pretty substantial amount of performance (it about doubled the speed of the sim). Here every cycle was crafted, with a very careful eye on cache and pipeline performance, as well as such mundane things as instruction decode and I-cache performance. You are way under the level of language definition, and right in the guts of language implementation.

I’m insane enough to still have Fortran 2000 code with OpenMP, forall and where clauses in it. For the right platform and problem this is actually a pretty nice way of coding. That it is called through a Python-to-Fortran binding API from a high-level driver harness is also altogether a nice way of working. It is actually not too bad from a type safety point of view.

HPC can be a huge amount of fun. I rather miss some of it now.