I thought that the OP was asking how to learn every library. A quote from the OP -
What more can they do than yes or no? What more does Java, C, C++, or VB (or scripting languages) do than process the yes or no and continue from there into yet another yes or no? (or just null)
An honest question: can machine language do quite a bit more? What? Maybe? Can’t be sure?
I think this is a good example of how C++ can be used well. You are not alone in using it this way. I remember the Chorus operating system (essentially an offshoot of work done at Inria in France, and then commercialised in the 90s) was written in C++, but they only used a limited subset of the language, essentially writing in C but with better control. The Choices OS used C++ more in anger, but again kept to a limited subset. I have done things in a similar way as well. Where C++ gets out of control is when programmers drink the full Kool-Aid and start madly using the less well-advised language features. C++ also has scalability problems: very large systems become hard to control because C++ simply doesn’t do things like interfaces properly.
An RTOS is a perfect example of where no true OO language can be easily used, if at all. There are sort-of-predictable garbage collectors, but you would want to be very clear about their capabilities, and I would still not be entirely happy about their use. OTOH, use of automatic storage allocators has been shown to significantly improve reliability in systems. Ultra-reliable real time might use a language like Erlang.
For scientific work, the go-to systems are often Fortran 95 and 2003, and nowadays many scientists are using Python with NumPy. The modern Fortrans are nothing like the Fortrans of old, and the presence of intrinsic parallel constructs (as well as complex numbers as an intrinsic data type) allows for performance that is difficult to realise in other languages. These Fortran systems can typically compile to use MPI across large-scale distributed-memory systems, or multiprocessor systems, or hybrids, something that is essentially impossible in other languages. Using OpenMP with C++ works OK on a single multiprocessor box, but only gets you so far, and remains much messier than Fortran.
I do a lot of scientific numerical work in just these systems, and the code is faster to write, and gives better performance, than in pretty much any other system. There was an effort in the early days of Java to convince the Java guys to add some high-performance numerical capabilities, but these were rebuffed by Gosling and co. In particular, the internal layout of multi-dimensional arrays is not conducive to either vectorised or parallel operations, and their answer to adding complex numbers was that it wasn’t needed, as you could do complex as a subclass of real. Which showed an appalling lack of understanding of the issues. Sadly the JVM was designed using technology that went back to the 80s and didn’t use a great deal of the more advanced ideas that did exist when it was designed (for instance, a separate pointer stack). This causes all sorts of issues. But work done on the JIT compilers, especially the HotSpot system, did claw back a lot of the run-time deficits. (I know one of the guys that worked on it; they were a very smart bunch.)
But at least Java is a real object oriented language - it does what it says on the tin. They all hail back to Smalltalk, but some do it better than others. Objective C is perhaps the best of the bunch in this respect. But you don’t get much choice in the matter. The environment you work in dictates a lot of what you can use. Using C++ in a Microsoft environment is not a lot of fun. Microsoft seem to only begrudgingly support the language, and in places admit that they would prefer not to. Things like exception handling don’t play well at all. I wrote one system that ended up explicitly slamming the (undocumented) internal exception handler pointer to get enough control of the system to allow things to work correctly. MS really prefer that you use their managed interfaces, and with good reason.
The OS for the Symbolics Lisp Machine was written in Lisp. Then again that machine had an architecture that directly supported the language.
The Lilith machine’s OS was coded in Modula-2.
Then you get into the question: what is systems programming versus the operating system proper? Writing code that executes in kernel mode, and against physical memory, is a different beast to writing code that simply accesses deep OS interfaces. Unix kernels are written in C, and that goes all the way down to things like the kernel exception handler; you need very few kernel-mode instructions. Device drivers are where it might get more hairy, but even here it isn’t hard to write them pretty much entirely in C. I have dim memories of writing the virtual memory manager for one machine in C. Really pretty easy, and the compiled code was no worse than hand-crafted assembler. But mistakes can be made in any language, and the bad mistakes are more often made in the specifications rather than bugs in the coding. 20 years ago I spent a week debugging the exception handler in the 680xx version of SunOS, and finally found an exquisite bug that was the result of a misunderstanding of the machine architecture. It would have trivially allowed an exploit to gain total subversion of the operating system from user code. It was present in every shipped machine, and to my knowledge never fixed.
Of course, let’s be honest, if you’re using non-homebrewed numerical code you’re probably still secretly using a little Fortran via BLAS or LAPACK.
I’m not sure what exactly you mean by “intrinsic parallel constructs”, but if you mean the strict aliasing rules, I believe C and C++ can be just as fast/safe with -fstrict-aliasing compilation (which is the default at -O2); pretty much everybody disables it with -fno-strict-aliasing, though, because the code is easier to reason about, at least until the memory bugs hit.
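For anyone who hasn’t bumped into it, here is a minimal C++ sketch of my own showing what the strict-aliasing rule actually bites on; the function names are purely illustrative. Reading a float’s bits through an integer pointer is undefined behaviour when the compiler is allowed to assume strict aliasing, while memcpy is the well-defined way to do the same thing (and optimising compilers lower it to a single move anyway).

```cpp
#include <cstdint>
#include <cstring>
#include <iostream>

// Undefined behaviour: under -O2 (-fstrict-aliasing) the compiler may assume
// a uint32_t* and a float* never point at the same object, so this access
// can be reordered or miscompiled.
std::uint32_t bits_ub(float f) {
    return *reinterpret_cast<std::uint32_t*>(&f);
}

// Well-defined: memcpy is the sanctioned way to reinterpret the bytes.
std::uint32_t bits_ok(float f) {
    std::uint32_t u;
    std::memcpy(&u, &f, sizeof u);
    return u;
}

int main() {
    std::cout << std::hex << bits_ok(1.0f) << '\n';   // prints 3f800000
}
```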
In my experience, no scientist under the age of 50 uses any flavor of Fortran any more, unless forced to work with ancient legacy code. For general-purpose coding, C and/or C++ is king, though specific fields might have other languages that are preferred (a lot of astronomy is done in something called IDL, for instance, because it happens to have a heck of a lot of library support that’s well-suited for astronomical tasks).
Well, no. Quite a few apps are written in much less complex languages (such as Lua if you use the Corona engine, which I like). The engine then translates it into Java, but that is transparent to the developer. And the additional bonus is that the same Lua code can be turned by the engine into an iOS app, again transparently to the developer.
Not entirely transparently. If you end up needing performance beyond simple algorithmic choices you’ll probably have to hack some stuff independently for each target platform.
You’d be surprised. Some very sophisticated apps, both game-type and others, have been written using pure Lua in the Corona engine. I’d give you mine, but they are under my name in the app stores, so I don’t want to disclose them here. Here is one I saw recently, but there are literally thousands.
Lua is a sinful pleasure to write in for any serious programmer. Yes, it is the antithesis of a proper strongly typed and structured programming language, but it is so much fun to use.
Add, subtract, multiply, divide, jump to a given memory location unconditionally, jump to a given memory location conditional on the status of some flag, convert from integer to floating-point, convert from floating-point to integer, and hundreds of other things detailed in processor data sheets.
I agree, the paths between switches and intermediate gates in a computer are more meaningful than the fact that the switches are binary. While you can get away with fewer instructions than modern processors provide in their external APIs, you need something like an ALU to get a computer to work. Saying “it’s just binary” is far too reductionist to be meaningful.
Lua’s pretty cool. I maintain that you’d probably have to hack at some point, but it’s also plausible that that point doesn’t come up much in practice. I’m used to transpilation being pretty bad.
I don’t know; Objective-C is about as OO as Java, and its behavior should be pretty predictable because it doesn’t have GC, just reference counting, which should be trivial to predict from the source code.
I should really have written for high performance scientific code. If you have code that runs fine on your PC, I don’t count that as HPC.
What the various modern Fortrans (starting with HPF and Fortran 95) get you is parallel constructs that the compiler can use to automatically create parallel code. So you get array operations that are not function calls but simply arithmetic expressions. You get the forall and where constructs. These allow you to express the mathematics easily, and in a way that expresses exactly what you mean, rather than what the syntax forces you to mean. You can use OpenMP to let many compilers get a little way here, but not all that far. A modern Fortran will generate MPI code, will automatically distribute your data across many separate nodes, and will manage the transfer of information as needed between nodes. Modern machines are fast enough that a lot of people simply don’t need this capability anymore. A two-socket x86 with a few tens of gigabytes of memory is faster than a top-500 supercomputer of a bit more than a decade ago. But for big data codes, and big problems, this sort of ability is nice.
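To make the comparison concrete, here is roughly what the C++/OpenMP side looks like; it is a sketch of my own, not anyone’s production code. The whole function is what Fortran 95 writes as a single masked array assignment (roughly `where (b /= 0.0) c = a / b`), which is exactly the kind of whole-array statement a Fortran compiler is free to map onto SIMD lanes, threads, or distributed data.

```cpp
// Compile with e.g.  g++ -O2 -fopenmp masked_divide.cpp
#include <vector>

// Element-wise masked divide: the pragma is only a hint, and the compiler
// still sees an index-by-index loop rather than a whole-array operation.
void masked_divide(const std::vector<double>& a,
                   const std::vector<double>& b,
                   std::vector<double>& c) {
    const long n = static_cast<long>(a.size());
    #pragma omp parallel for
    for (long i = 0; i < n; ++i) {
        if (b[i] != 0.0) {
            c[i] = a[i] / b[i];
        }
    }
}
```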
Writing this sort of thing by explicitly calling the MPI library is doable, and there are lots of codes that do exactly that, but if your problem maps to large-scale array manipulation, and a great many do, the Fortran way remains a much nicer paradigm.
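For contrast, the explicit-MPI style looks something like the toy sketch below (mine, and it assumes an MPI implementation such as Open MPI or MPICH is installed): the programmer, not the compiler, owns the data decomposition and every communication step.

```cpp
// Compile with mpicxx, run with e.g.  mpirun -np 4 ./a.out
#include <mpi.h>
#include <iostream>
#include <vector>

int main(int argc, char** argv) {
    MPI_Init(&argc, &argv);

    int rank = 0, size = 1;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    // Each rank owns its local slice; the decomposition is hand-written.
    std::vector<double> local(1000, 1.0 * (rank + 1));
    double local_sum = 0.0;
    for (double x : local) local_sum += x;

    // The communication is hand-written too.
    double global_sum = 0.0;
    MPI_Reduce(&local_sum, &global_sum, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);

    if (rank == 0) std::cout << "total = " << global_sum << '\n';

    MPI_Finalize();
    return 0;
}
```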
For smaller-scale stuff, or problems that don’t map neatly onto large parallel data structures, you need to get deeper. Any sort of performance-critical code needs control of code and data locality, and for this you often need to get much closer to the hardware. Cache-line alignment and cache behaviour can dominate the performance of your code, and you need to be able to control this. Languages with automatic storage management don’t exactly do well here.
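As a small illustration of the kind of control I mean (again a sketch of my own; 64 bytes is just the usual x86 cache-line size, not something the language guarantees): padding per-thread counters out to separate cache lines so that concurrent updates don’t false-share.

```cpp
#include <array>
#include <atomic>
#include <cstddef>
#include <iostream>
#include <thread>
#include <vector>

// Each counter gets its own cache line, so threads hammering their own
// counter don't keep invalidating each other's lines (false sharing).
struct alignas(64) PaddedCounter {
    std::atomic<long> value{0};
};

int main() {
    constexpr std::size_t kThreads = 4;
    std::array<PaddedCounter, kThreads> counters{};

    std::vector<std::thread> workers;
    for (std::size_t t = 0; t < kThreads; ++t) {
        workers.emplace_back([&counters, t] {
            for (int i = 0; i < 1000000; ++i)
                counters[t].value.fetch_add(1, std::memory_order_relaxed);
        });
    }
    for (auto& w : workers) w.join();

    long total = 0;
    for (const auto& c : counters) total += c.value.load();
    std::cout << total << '\n';   // 4000000
}
```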
In a previous life I was looking at possible new language mechanisms to cope with large scale scientific programming, but the simple reality is that there is a disconnect between the resources needed to do it well, and the number of people that would benefit (and thus fund it.)
Predictable, maybe, but trivially? Not so sure. Reference-counting implementations don’t have to be immediate, and so may delay their operation, which makes real-time guarantees difficult. And of course they don’t cope with cycles: Objective-C requires the use of weak references to break them, but that assumes you have found all the possible cycles. In all, reference counting is probably the best answer, but it isn’t a free answer. RTOS work is never easy; you get to work for your living more than most.
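The cycle problem isn’t specific to Objective-C, either. Here is the same trap in C++’s reference-counted smart pointers, as a minimal sketch of my own (the Parent/Child names are purely illustrative):

```cpp
#include <iostream>
#include <memory>

struct Child;

struct Parent {
    std::shared_ptr<Child> child;
    ~Parent() { std::cout << "Parent destroyed\n"; }
};

struct Child {
    // weak_ptr observes without owning; make this a shared_ptr and the
    // Parent/Child pair keep each other alive forever (a retain cycle).
    std::weak_ptr<Parent> parent;
    ~Child() { std::cout << "Child destroyed\n"; }
};

int main() {
    auto p = std::make_shared<Parent>();
    auto c = std::make_shared<Child>();
    p->child = c;
    c->parent = p;
}   // both destructors run, but only because someone spotted the cycle
```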
I think Rust could rival Fortran in a hypothetical future, since it has very strong guarantees about mutability and aliasing, and it also adopts ideas from modern programming language theory such as associated types and algebraic data types. It’s young, but if people put in the effort, it could marry the kinds of aggressive optimization that Haskell and Fortran compilers each perform, to varying degrees.
Of course, as always, people have to be willing to make these optimizations. There’s only so much that can be done in-library. (On the other hand, Rust makes it really easy to program and import compiler extensions, so it’s not out of the realm of possibility to see an open source scientific compiler extension).
I’d encourage you to look into a Data Structures course and an Algorithms course on something like Coursera. Those will give you the building blocks for most things you need.
You’ll note that libraries generally fall into three types:
Domain-specific libraries where if you know the domain you at least generally understand the library. These are things like Digital Signal Processing libraries. You don’t need to know these unless you know you’re working on a problem in this domain.
“Glue” libraries. These are things like GLFW, OpenGL, and such where they’re more or less heavily abstracted interfaces to something like your computer’s graphics card or a computing cluster or a web server. These generally take some effort to learn, but there’s generally little reason to learn them unless you’re working with them. You’d do well to look up the Twitter API instead of inventing a Twitter scraper from scratch, for instance, but knowing it isn’t required for programming in general.
General utility libraries. These are things like basic collections (hash maps, dynamic arrays/vectors), sorting, and general data manipulation (reading in files and such, which crosses over a bit with 2). These tend to be somewhat language-specific but generally map pretty well between languages, in that if you understand printing or sorting in one language, you can pretty quickly pick up another language’s version aside from some function-name or argument-order weirdness (there’s a quick sketch of this tier after the list). There are exceptions: game development has a bit of a plague of avoiding standard libraries in favour of in-house idiosyncratic collections with fancy things like “memory pools” and “arena allocators”, but that’s a deal-with-it-as-it-becomes-an-issue sort of thing.
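To put a face on that third tier, here is a small C++ sketch of my own (nothing authoritative) using the standard hash map, dynamic array, and sort; the same trio exists in almost every language under slightly different names (dict/list/sorted, HashMap/ArrayList/Collections.sort, and so on).

```cpp
#include <algorithm>
#include <iostream>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

int main() {
    // Hash map: count word occurrences.
    std::unordered_map<std::string, int> counts;
    for (const char* w : {"spam", "eggs", "spam"})
        ++counts[w];

    // Dynamic array + sort: order the (word, count) pairs by count, descending.
    std::vector<std::pair<std::string, int>> sorted(counts.begin(), counts.end());
    std::sort(sorted.begin(), sorted.end(),
              [](const auto& a, const auto& b) { return a.second > b.second; });

    for (const auto& entry : sorted)
        std::cout << entry.first << ": " << entry.second << '\n';
}
```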
You probably want to aim for keeping everything in 3 in your head first. There’s nothing wrong with looking up the documentation, but you use stuff from 3 so much that knowing it makes your life a lot easier. This simply comes from experience using them, as well as a general knowledge of common algorithms and data structures (hence why I recommended looking up those courses).
From there, 2 is important if you want to work in a specific domain. Maybe you want to learn some SQL interface or how to write Apache plugins or whatever.
Spending time “learning” things from 1 is generally pretty pointless since it’s usually better to just understand the underlying concepts and deal with the quirks as they come. I can’t think of a real reason to learn how to use the R package bnlearn instead of learning the theory behind Bayes Nets.
I’ll just throw in an off-topic comment slightly pertinent to C.
As operating systems, both Windows and Linux suck. I mean really suck! Massively bloated agglomerations of buggy, ill-documented, unreadable C code in a massive monolithic kernel; at least that’s what Linux is. I get the impression the Windows kernel is rather better designed and documented, but it’s still HUGE!
And then there are the GNU copies of the original Unix utilities! Horrific! Obscure, ugly C everywhere! We’re running code that is essentially 40 years old in concept, they are still finding bugs in it, and they keep using it because that’s what people are used to! We’re at the mercy of evangelical sandal-wearing tyrants.
And compiling code in Linux?? That horrific configure/make cycle from GNU should be consigned to the dustbin of history. CMake is better, but surely that area should be completely redone.
What the world needs is a new ultra-lightweight OS and new ultra-lightweight utilities and processes to make things happen in seconds rather than days.
I was looking at compiling Linux for the Raspberry Pi ARM processor (a nice little chip). I gave up when the estimates just for a cross-compile were measured in weeks to months!
At least Microsoft have demonstrated they can make a small OS, such as Windows CE (related to NT only at the API level). I don’t think anything like that has worked in the GNU/Linux arena. Maybe I’ll be pleasantly surprised? Probably not.
Configure/Make is a quirk of many languages that rely on things like highly configurable compiler toolchains. Haskell, Rust, Go, they all come with worry-free compilation and dependency management suites to varying degrees (unless they link with FFI C code, naturally). And of course interpreted languages like Lisp have a leg up here.
Part of the reason I dread working with C and C++ is because of cmake/configure/automake nonsense, but it’s not an inherent problem with Unix-likes so much as a small family of languages. You get similar problems if you do command line programming in Windows instead of IDE programming.
And OS kernels will just never be clean. For one thing, the market won’t allow it, because people rely too much on legacy software. For another, there’s just too wide an array of hardware and security quirks to make it feasible for general-purpose use. Could we do better with kernel code than we do now? Certainly, but any OS is going to accumulate a glut of bad stuff once it has to deal with real-world problems like buggy device drivers, security patches, hardware-specific optimizations and the rest.
OS design is a favourite hobby horse of mine. Back in the day I was involved in a number of OS research projects. What killed them all was Linux. And it shouldn’t have happened this way. Indeed, Linux is a very, very old design. Famously, Andrew Tanenbaum challenged Linus over why it was a monolithic kernel. In some senses Linus was right: for what he wanted to do, going the traditional route was the best answer. He wasn’t interested in the latest and greatest; he was interested in a workmanlike solution to get an unencumbered OS. Ironic when you consider the genesis of Linux.
The work on lightweight OS designs was in full swing when Linux hit: Mach, Chorus, the V kernel, and a host of others. Interestingly, Mach lives on, both inside Mac OS’s Darwin kernel and in the GNU Hurd. The big drop of the ball was the GNU Hurd. It got blindsided by Linux in a way that should not have happened, and that was, IMHO, a problem with personnel.
But you can buy an argument that some components belong in the kernel. I remember when the micro-kernel fraternity bragged about how many lines of code their kernels were (smaller being better, of course), which totally missed the point. Performance matters, and the overall OS structure matters. Just because your file system or network stack runs in a user-mode process doesn’t make it somehow intrinsically wonderful. Brent Welch’s paper “The File System Belongs in the Kernel” was an early observation on this.
In some senses we are still locked in the 70’s. Linux with its feet in the Unix kernel, and Windows as a clone of the VMS kernel. I always liked the VMS kernel design. But there were many aspects that were seated in a mix of very limited resources, and the need for special architectural features, that made the translation to x86 less satisfactory. But you did get a much more holistic design.
Things have moved on enormously, and many of the key ideas and abstractions just don’t really cover the ground anymore. But the ubiquity of Linux and Windows makes it very difficult to make progress.
Maybe you can explain why the hardware privilege rings in many processors are essentially ignored by Linux and Windows? It’s either userland or kernel-land.
Actually, showing my age, my first mainframe was a Burroughs B6700, which had 4 bits of protection on every word. It meant many of the stack-corruption tricks used by malware today would have been very difficult to perform, as the tag on every word specified, amongst other things, whether it could be executed or not.
I think the privilege rings would have had a similar effect on security?
Exactly. Simple binary privilege. So far as I can see it is simply a lowest-common-denominator argument: you can get away with two levels, and since you can’t guarantee that any given target architecture supports more than two, anything more complex is ignored. The historical mess of microprocessor architectures that got us here is also to blame. The VAX had four levels, so VMS used four. Windows NT threw that bit away.
Man, people really don’t know what they are missing when you look back to some of the older machines, and the Burroughs was one of the greats.
My wish list of architectural features has tags at number one. But there seems to be scant hope we will see them. We have x86 and ARM driving the market, MIPS waiting in the wings again, and nothing interesting. The Tera MTA was the last interesting architecture, and that is well over 20 years old. (And dead.) The only company on the planet that is in a position to change this is Apple, and I just don’t think they are interested.