Do you have to know the PARTICULAR computer's hardware for C programming?

One other point to make… from my “long time ago in a galaxy far, far away” memory of the topic. The compiler takes care of translating higher-level variable names/pointers into whatever registers are appropriate for the machine op codes, depending on the hardware of the machine. However, there is usually also a step called the optimizer, which removes excess and redundant actions. For example, if one action does the math and then stores the value back to RAM, and the next line of code loads that value from RAM back into the register, then that load is obviously not necessary. If that next line then modifies the value of the variable again (in the register) and saves the new value, then the first save to RAM was redundant too. The optimizer scans the compiled output for redundant code like that.
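
A toy illustration of that kind of redundancy (my own sketch, not anyone’s actual compiler output):

```c
/* Hypothetical example: compiled naively, the first statement stores
   `total` to RAM and the second reloads it before modifying it.  The
   optimizer keeps `total` in a register, dropping both the reload and
   the first store, which nothing ever observed. */
int scale_and_bump(int value)
{
    int total = value * 3;   /* compute, then (naively) store to RAM        */
    total = total + 7;       /* (naively) reload from RAM, modify, re-store */
    return total;            /* only this final value is ever observed      */
}
```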

Sometimes these optimizers get very clever. There’s the story of someone back in the mainframe days who was testing their FORTRAN code and decided to see how long a complex calculation would run. It should have run for a decent amount of time, but it finished in microseconds, with no errors. Why? Because they didn’t output the results; they were just looking at the runtime. The optimizer realized nothing was output and eliminated the whole calculation sequence, since none of it affected any output of the program.
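
The same trap is easy to reproduce today. A hedged sketch in C (the loop and the timing are my own invention, purely to illustrate the effect):

```c
#include <stdio.h>
#include <time.h>

/* Because `sum` is never printed or otherwise used, an optimizing
   compiler is free to delete the whole loop as dead code, and the
   "benchmark" finishes almost instantly. */
int main(void)
{
    clock_t start = clock();

    double sum = 0.0;
    for (long i = 1; i <= 100000000L; i++)
        sum += 1.0 / (double)i;          /* result never escapes */

    clock_t end = clock();
    printf("elapsed: %f s\n", (double)(end - start) / CLOCKS_PER_SEC);
    return 0;                            /* printing `sum` as well would
                                            force the work to happen */
}
```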

Optimisers are where a lot of the magic lives. Lifetimes of effort vanish into just this topic.

Optimisers and memory allocation are a trap for young players when debugging code. The optimiser may well leave a variable in a register and never allocate memory for it. The debugger comes along, you ask it what the variable’s value is, and: no such variable.
Debugging often requires that only the basic optimisation passes are applied. Optimisers are not foolproof either. Finding bugs introduced by optimisation is not for the faint-hearted.
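
A small sketch of the usual escape hatches, assuming gcc (the -O0/-Og flags and the volatile trick below are general practice, not anything specific to this thread):

```c
/* At -O2 the compiler may keep `scratch` entirely in a register, or fold
   it away altogether, and the debugger reports "no such variable".
   Building with -O0 or gcc's -Og, or marking the variable volatile as
   here, keeps it observable while debugging. */
int sum_squares(int n)
{
    volatile int scratch = 0;     /* volatile forces real loads/stores */
    for (int i = 1; i <= n; i++)
        scratch += i * i;
    return scratch;
}
```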

Optimisation is what made RISC architectures fly. The ability of the optimiser to allocate registers to data, and thus avoid a significant amount of data movement, was what elevated them past CISC architectures. When AMD created the x86-64 architecture they doubled the number of general-purpose registers, and that alone gave a significant speed improvement.

The other side of the coin is when memory is special: when you need the compiler to understand that it cannot optimise anything about the use of particular locations. Like with memory-mapped I/O. Certain locations in memory are mapped to device registers, and they don’t behave like ordinary memory.
Write a certain value to a location and it configures the device. Read from the same location and you don’t get the configuration back, you get the device status. Write to another location and the device emits that value (say over a serial link). Read from the same location and you get the last character received.
You need ways of telling the compiler not to assume anything about those locations.
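
In C that is what the volatile qualifier is for. A sketch, with made-up register addresses (the real ones come from the chip’s datasheet):

```c
#include <stdint.h>

/* Hypothetical UART register addresses, for illustration only. */
#define UART_STATUS ((volatile uint8_t *)0x40001000u)
#define UART_DATA   ((volatile uint8_t *)0x40001004u)
#define TX_READY    0x01u

/* `volatile` tells the compiler that every access matters: it must not
   cache the status read in a register, hoist it out of the loop, or
   merge repeated writes to the data register. */
static void uart_putc(uint8_t c)
{
    while ((*UART_STATUS & TX_READY) == 0)
        ;                    /* spin until the device reports "ready" */
    *UART_DATA = c;          /* writing here transmits the byte       */
}
```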

This is very common for things like embedded systems. So here we have hand-crafted memory layout for code and data. It goes further: you also need to configure the system not to cache such locations, so the code gets to go very deep. This becomes very architecture specific, right down to individual variants of an architecture. Again, C is probably the default language.
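
A taste of what that hand-crafted layout looks like with GCC. The section name, and the assumption that the linker script maps it to an uncached region, are made up for illustration; the real names and MPU/cache setup are specific to each board and chip:

```c
#include <stdint.h>

/* Ask the linker to place this buffer in a specific output section.
   Whether ".dma_buffers" exists, where it ends up, and whether that
   region is marked non-cacheable is decided by the project's linker
   script and MPU configuration, not by this attribute alone. */
__attribute__((section(".dma_buffers"), aligned(32)))
static uint8_t rx_buffer[1024];
```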

Worth noting that the GNU Compiler Collection (the software previously known as the GNU C Compiler, aka gcc, and its associated libraries) was the critical enabler of all of this. The advent of a free, open-source compiler system was a watershed moment. It enabled Linux, and pretty much all the embedded systems. Hardware vendors didn’t have to develop their own compilers (which were never provided for free, and whose IP they guarded ferociously) and the ecosystem was able to grow faster than anyone could have imagined.

You can even produce a compiler for hardware that doesn’t exist.

In one of my jobs in Silicon Valley we did just that: we wanted to write software for a processor that was still in development and didn’t exist in silicon form yet. So we wrote a software emulation of the hardware, and used that to run the actual software we were developing.

Millions of times slower than the real thing, of course, but it allowed us to make progress while the hardware folks were still trying to get their Verilog to converge.

No worries, that chunk is still in scope.

We will release it when we’re done: we are all careful programmers here…

Right, it’s not assembly, but it is about as low as you can go otherwise.

Basically you have to do all sorts of stuff that higher-level languages do for you. For example, if you’re creating a linked list in C, you have to define the list as a structure with data and pointer(s) to the next node; then when you create nodes of that structure, you have to explicitly allocate memory for each node. And when you remove them, you have to deallocate that memory.
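
Roughly like this (a minimal sketch, not production code):

```c
#include <stdlib.h>

/* A node carries its data plus a pointer to the next node. */
struct node {
    int data;
    struct node *next;
};

/* Push a value onto the front of the list; returns the new head,
   or NULL if allocation failed. */
struct node *push(struct node *head, int value)
{
    struct node *n = malloc(sizeof *n);   /* explicit allocation */
    if (n == NULL)
        return NULL;
    n->data = value;
    n->next = head;
    return n;
}

/* Walk the list and free every node we allocated. */
void free_list(struct node *head)
{
    while (head != NULL) {
        struct node *next = head->next;
        free(head);                       /* explicit deallocation */
        head = next;
    }
}
```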

In something like Java, you just instantiate an object of the LinkedList class, and then you can manipulate that with the built-in methods, which automatically handle the memory allocation and deallocation.

Metaphorically, it’s kind of like having a stick shift car with dial gauges, versus having a modern automatic transmission computerized car with idiot lights. One gives you much more control over things, but is more difficult and involved to deal with, while the other is more abstracted, but easier and simpler to deal with.

Oh, for sure. I suspect that in the good old days when divergent hardware was coming from somewhere every week, compilers were written before the hardware was ready. In some cases you could emulate the new hardware in microcode on some other machine and run the code from the new compiler on it.
I wrote code generators for hardware that did exist, but which I had no access to. And in my compiler class I’m pretty sure we generated code for a nonexistent and fairly simple virtual machine.

The C language combines all the power of assembly language with all the ease-of-use of assembly language.

– Mark Pearce

A lot of the stuff that required you to know the particulars (bits per byte, bytes per int, etc.) was packaged into standard headers (limits.h and stdint.h). Most C programmers seemed to be entirely unaware of them, and kept writing their own equivalents back when I was doing C programming. You can do a little pointer dancing in code to get the endianness, but headers and some other tools exist for that, on some platforms and versions of C.
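
For instance, something along these lines (a small sketch; the endianness probe is the usual pointer trick, not anything standardized):

```c
#include <stdio.h>
#include <stdint.h>
#include <limits.h>

int main(void)
{
    printf("bits per byte : %d\n", CHAR_BIT);        /* from limits.h */
    printf("bytes per int : %zu\n", sizeof(int));
    printf("INT_MAX       : %d\n", INT_MAX);
    printf("int32_t       : exactly 32 bits, guaranteed by stdint.h\n");

    /* Endianness probe: look at the first byte of a known 16-bit value. */
    uint16_t probe = 0x0102;
    unsigned char first_byte = *(unsigned char *)&probe;
    printf("byte order    : %s-endian\n",
           first_byte == 0x02 ? "little" : "big");
    return 0;
}
```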

So, technically, you don’t need to know the particulars for most things. However, depending on what you’re doing, you might need to understand the scope of the particulars, so that you can use the above tools to write cross-platform code that will execute correctly on every platform without having to care about the specifics of this one versus that one.

For most tasks, that’s probably not necessary, though. 95% of code that isn’t doing cryptography or parsing binary file formats won’t need any of that.

For some tasks - like if you’re writing a driver for an embedded system - even the headers and endianness check are still insufficient because you’re literally dealing with the hardware and need to communicate with it in its preferred protocol of boops and beeps.

But as for that case: if you’re asking this question, you’re not writing that sort of code anyway.

Dealing with modern code, it seems many devs manage to learn most of the syntax & semantics of the language but little of the libraries that are part and parcel of developing professional code in that language.

That was also true in the 1970s when I started, but the size & complexity of libraries has exploded relative to the size & complexity of the languages themselves. Even back in 1990, if you were rolling your own in C, you were doing it wrong. Now, 35 years later, that’s far more true. But folks keep wanting to re-invent date handling or string parsing or whatever.

Good programmers write good code. Great programmers steal good code.

If only there were a way to identify which libraries contain great code. And which will continue to be maintained so as to continue to be great.

I dunno, some libraries are pretty junky and you’d do better to roll your own than use them. A lot of them are just something made by some newbie and published to the web. On the inside they’re crap, with a horrible API.

Of course, there are others like Google’s Protocol Buffers that are god tier, and I shake my head every time I see people doing their own serialization. (And GSON… And Guice… Basically, just use Google libraries - not sponsored.)

The one thing that’s sure, though, I’ll throw a pen at the next person that tries to roll their own CSV parser. Read the f’in RFC y’idjut.

Agree completely. Same for people that roll their own CSV serializer.

Maybe we could sentence both groups of devs to have to produce / consume each other’s work exclusively. That’ll learn 'em. And get them the hell away from us so we can get real work done, not fight with their mistakes in corner cases.

Gosh I’m glad I’ve been out of that business for a while now (15 years!). My tolerance for frustration has never been high and is not getting better as I slowly geezerize.

I vividly remember my first 3-day Visual C class that my employer paid for me to attend.

We had a simple little program. Read three records until EOF from a text file and print the name, address, and DOB formatted as January xx, xxxx.

A 10-minute job in COBOL. I always have empty skeleton templates saved. Plug in the physical filename, create a month-names table in working storage. My priming read and next-record loop is already in the template as InFile-process. My loop to print to Staff-Report-File is also in the template.

I looked at what we had to write in C in horror. Why would I turn the clock back 15 years and put myself through that work? I wrote Assembly code in college.
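
For comparison, the C version of that little exercise looks something like the sketch below. This is my own reconstruction; the record layout (one pipe-separated line per record, with the date as mm/dd/yyyy) and the file name are assumptions, not the actual class material:

```c
#include <stdio.h>

static const char *month_names[12] = {
    "January", "February", "March",     "April",   "May",      "June",
    "July",    "August",   "September", "October", "November", "December"
};

int main(void)
{
    FILE *in = fopen("staff.txt", "r");       /* hypothetical input file */
    if (in == NULL) {
        perror("staff.txt");
        return 1;
    }

    char line[256];
    while (fgets(line, sizeof line, in) != NULL) {
        char name[64], address[128];
        int month, day, year;
        /* Assumed layout: name|address|mm/dd/yyyy */
        if (sscanf(line, " %63[^|]|%127[^|]|%d/%d/%d",
                   name, address, &month, &day, &year) == 5
            && month >= 1 && month <= 12)
            printf("%s  %s  %s %d, %d\n",
                   name, address, month_names[month - 1], day, year);
    }
    fclose(in);
    return 0;
}
```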

I know now that libraries that support data processing can be purchased and added to C. Our little 3-day class only had the most basic install.

I came back to my boss and explained migrating to C for any SQL database reporting was a very bad use of our staff resources. (Me)

I tell people that at least half of Google’s competitive advantage is due to Protocol Buffers. It turns a problem that people spend time worrying about to a problem that they don’t, which frees up a lot of time to work on everything else.

Part of the reason for that is that C is an example of what we used to call Systems Implementation Languages, which were meant to be able to go pretty low. It descends from BCPL, a language used on Multics, by way of B (BCPL → B → C). I used BCPL for work in grad school, where we used Multics. BCPL was even more primitive, but the similarities were clear.

K&R had a nice sentence or two in the first edition of the C book.

Something like: “C is a relatively low level programming language. Some programmers used to other languages may be horrified. ‘What! I have to call a function to just compare two strings??’”

But of course as Voyager said above, it was explicitly designed as a systems programming language.

If you try to create a one-size-fits-all language you end up with a horrible mess like C++…

I vaguely remember that creating a table of variable-length strings isn’t easy in C, then keeping the pointers to reference the string data.

Not bad at all in Python. A month-names table is routine.

It gets back to memory management. Fixed-length string tables require more storage than variable-length ones.

Add in a variable-occurrence table, like 1 to 2,000 entries, and allocating memory gets even more important at the systems level.
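
In C, the usual shape for that kind of table is an array of pointers with each string allocated to exactly the size it needs. A hedged sketch (the function names are mine):

```c
#include <stdlib.h>
#include <string.h>

/* Build a table of variable-length strings: an array of pointers, each
   entry malloc'd to just the length it needs.  A fixed-length table
   would instead reserve the maximum width for every entry. */
char **make_table(const char *const *source, size_t count)
{
    char **table = malloc(count * sizeof *table);
    if (table == NULL)
        return NULL;

    for (size_t i = 0; i < count; i++) {
        table[i] = malloc(strlen(source[i]) + 1);   /* just enough room */
        if (table[i] == NULL) {
            while (i-- > 0)                         /* unwind on failure */
                free(table[i]);
            free(table);
            return NULL;
        }
        strcpy(table[i], source[i]);
    }
    return table;
}

/* And of course you have to remember to give it all back. */
void free_table(char **table, size_t count)
{
    for (size_t i = 0; i < count; i++)
        free(table[i]);
    free(table);
}
```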

Gosh, I shudder to think about a three dimensional table in C.

20 columns, 6,000 rows with 9 tables.

I’ll hold your coat. Good luck. :grinning_face:
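
For what it’s worth, the usual trick is to allocate the whole thing as one flat block and index it by hand. A sketch using those dimensions (9 × 6,000 × 20 doubles is roughly 8.6 MB):

```c
#include <stdlib.h>

/* Nine tables of 6,000 rows by 20 columns, stored contiguously. */
enum { TABLES = 9, ROWS = 6000, COLS = 20 };

static size_t idx(size_t t, size_t r, size_t c)
{
    return (t * ROWS + r) * COLS + c;
}

int main(void)
{
    double *data = calloc((size_t)TABLES * ROWS * COLS, sizeof *data);
    if (data == NULL)
        return 1;

    data[idx(3, 1234, 7)] = 42.0;   /* table 3, row 1234, column 7 */

    free(data);
    return 0;
}
```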

Heh, on the other hand, the situation has seemingly reversed in web dev: many now only know and use libraries and frameworks, and don’t know how to do things in the vanilla language anymore. It’s overwhelmingly Meta’s React now, which started as a small UI lib but soon came to supersede both JavaScript itself and even HTML. Nowadays it’s all JSX, React’s XML-like syntax that abstracts over the web primitives. It’s very rare to find medium-sized or larger projects coded in raw HTML + JS anymore… it’s all just a big black box.

It exchanges one set of complexity (and classes of bugs) for its own, new complexities and quirks.

The downside of this approach is that the dependence (over-reliance?) on libraries has actually gone far enough in the other direction that software supply chain attacks are now sadly commonplace… Widespread Supply Chain Compromise Impacting npm Ecosystem | CISA

The libraries have libraries of their own, which have libraries of their own… the average web app now has literally thousands of libraries within libraries, written by tens of thousands of pseudonymous people across the world with different motives and levels of motivation and maintenance. If any one of them gets bribed or hacked, thousands of peer libraries and hundreds of thousands of projects are infected. There are some limited versioning & hash-checking systems in place, but mostly they are there to prevent mistakes; they provide little security against deliberate sabotage and malware-ification.

I guess we took the lessons of our ancestors (ahem, the giants before us :sweat_smile:) and followed them a little too closely… to the extent that many of my peers now could not write, understand, evaluate, optimize, or validate any of the library code that they use. I don’t think you could even read all the library code in a human lifetime anymore…