Do you have to know the PARTICULAR computer's hardware for C programming?

In C, I hear you have to manually allocate the memory, and possibly do other things by hand in a similar realm. This seems to be saying that you need to write C with the particular hardware of the computer you are actually using in mind (otherwise, how do you know where you are supposed to do the specific memory allocation and all that?). Or is this not the case? Just wanted some clarification.

Thanks.
Nublette

You can do a lot without assuming anything about the hardware that isn't defined by the language and its standard library. The biggest concern is endianness, but that can be tested.
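
For instance, here's a minimal sketch of a runtime endianness test (the `is_little_endian` name is just illustrative):

```c
#include <stdint.h>

/* Sketch of a runtime endianness test: store a known 32-bit value and
 * look at which byte ends up first in memory. */
static int is_little_endian(void) {
    uint32_t probe = 1;
    /* On a little-endian machine the least significant byte (0x01)
     * is stored at the lowest address. */
    return *(const unsigned char *)&probe == 1;
}
```

A program can call this once at startup and pick byte-swapping code paths accordingly.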

For memory allocation, you can request a big block and then manage it yourself.
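
A sketch of that idea, as a simple "bump" arena (the `Arena` type and function names are made up for illustration, not a standard API):

```c
#include <stdlib.h>
#include <stddef.h>

/* "Request a big block and manage it yourself": a bump allocator. */
typedef struct {
    unsigned char *base;        /* the one big block from malloc */
    size_t capacity;
    size_t used;
} Arena;

static int arena_init(Arena *a, size_t capacity) {
    a->base = malloc(capacity); /* the single big request */
    a->capacity = capacity;
    a->used = 0;
    return a->base != NULL;
}

static void *arena_alloc(Arena *a, size_t size) {
    size_t start = (a->used + 15u) & ~(size_t)15u;  /* 16-byte alignment */
    if (start + size > a->capacity)
        return NULL;                                /* arena exhausted */
    a->used = start + size;
    return a->base + start;
}

/* Freeing is wholesale: a single free(a->base) releases everything. */
```

Individual allocations are then just pointer arithmetic inside the block, with no hardware-specific assumptions.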

~Max

C is a bit higher level than that. The compiler will take care of the specifics, but you will have to tell the compiler to target a specific operating system and CPU architecture.

When you allocate memory in C, it is an abstraction that resolves to a call to the operating system.

When people talk about manual memory allocation in C, they are contrasting it with automatic memory allocation in even higher-level languages than C, usually via a process called garbage collection.

It can help depending on what you are writing. Even BASIC back in the day could peek and poke memory addresses for faster graphics. But I’d say for most applications you don’t need to know.

There is a lot going on under the covers, even with very simple languages like C.

Memory allocation comes in a range of forms. Every variable (or more abstractly program state) you use needs to be held somewhere. The compiler is responsible for some of these, utility libraries for more, and eventually the operating system.

Short-term variables, including variables that only exist as intermediate values inside expressions, are usually allocated to machine registers. This is the job of the compiler, which targets the exact machine architecture.
If your language supports subroutines/functions, the local variables are generally stored on the stack. Almost all architectures provide explicit support for a stack. But they didn’t always. So sometimes the compiler needed to create a stack from the raw capability of the machine.

A modern architecture has lots of registers, so the compiler has the freedom to use them for lots of stuff, with one huge performance boost coming from allocating most, if not all, local variables in a function to registers. Similarly parameters can be passed on the stack or in registers, with the compiler doing the leg work. Critically, recursive calls allocate a new area on the stack for each call, thus providing for automatic management of local variables. So much so that these variables were termed automatic in some parlances.
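
The automatic management of locals across recursive calls can be illustrated with a small sketch (the function is a made-up example):

```c
/* Each call to factorial gets its own copy of n (and of the temporary
 * holding the partial product), placed in a register or on the stack
 * frame by the compiler -- "automatic" allocation. */
static long factorial(int n) {
    if (n <= 1)
        return 1;                /* base case: this frame's own n */
    return n * factorial(n - 1); /* the recursive call gets a fresh frame */
}
```

No call's `n` ever clobbers another's; the compiler and stack discipline handle all of it.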

Your program likely has some need for state that is known a priori and used by the entire program. So the compiler sets up program execution in a manner where this memory is allocated before the program proper starts execution. This memory allocation does not change, so it is static, and it is usually allocated at program startup from a region of memory designated for this purpose, usually called something with "static" in the name.
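
As a small illustration (the `counter` variable and `next_id` function are made-up examples):

```c
/* `counter` is not on any stack frame.  It is placed in the static
 * data region before main() starts, zero-initialised, and lives for
 * the whole run of the program. */
static int counter;

static int next_id(void) {
    return ++counter;   /* state persists across calls */
}
```

Every call sees the same single allocation; nothing is created or destroyed at call time.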

Thus far you have enough to write and run lots of programs. The question then arises when you don't know ahead of time exactly what size or number of data items or program state you need. Hence dynamic memory allocation.

In its simplest form, the operating system provides a mechanism to delineate a region of memory that you can manage dynamically for program state. Pretty quickly you want a well-known library that does the heavy lifting for you. If you are coding in C, your friends are malloc and free. malloc allocates a region of memory for you (given the size of the region you desire) and returns to you the address of the start of the region you can use. free takes that pointer, and melds the region of memory back into the free memory, ready to use again when needed. It is up to you to make sure you use the memory so allocated responsibly:

- Don't accidentally access addresses past the end of the space you asked for. Don't access stuff before it either.
- Make sure you call free when you are done, lest you run out of memory to malloc.
- Make absolutely sure you never access memory using the address you were given once you have called free with it.
- Never ever call free twice on the same pointer.
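
A minimal sketch of that malloc/free contract (the `make_squares` function is invented for illustration):

```c
#include <stdlib.h>

/* Allocates room for n ints, fills them with squares, and hands
 * ownership of the region to the caller, who must free() it exactly
 * once and never touch it again afterwards. */
static int *make_squares(size_t n) {
    int *values = malloc(n * sizeof *values);  /* ask for exactly n ints */
    if (values == NULL)
        return NULL;                           /* malloc can fail */
    for (size_t i = 0; i < n; i++)
        values[i] = (int)(i * i);
    return values;                             /* caller now owns this */
}
```

The caller does something like `int *v = make_squares(5);`, uses `v[0]` through `v[4]`, then calls `free(v);` once, obeying all the rules above.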

Keep to the rules and things work generally quite well. Break the rules, and no mercy is shown. Subtle and devilishly hard to find bugs may lurk, ready to bite you at the most unexpected point. Discipline and care, and you can write good quality robust code.

malloc generally interacts with the operating system, so that when more memory is needed, the OS can allocate more virtual address space for it to operate in, only at this point allocating additional system resources to your running program.

More modern languages (Java, Python, JavaScript and so on) provide dynamic memory management in a more transparent and safe manner. The underlying language runtime keeps track of allocations and, critically, can keep track of which parts of data structures contain pointers. This is the key. By itself, there is nothing about a region of memory to tell you what it contains. In C, you can use type casts on a pointer value to make the memory look like any data structure you like; the underlying run-time has no idea what you are doing. Object-oriented languages do know what objects hold. To do so they add additional state information to every allocation that helps the runtime keep track, so it is always able to know where pointers are.

Languages like Java and Python also impose strict rules about pointers to objects. The language and runtime explicitly prohibit code that creates or modifies pointers; the only thing allowed to create a pointer is the object-management run-time. Once the management runtime is able to keep track of all pointers, no matter where they may reside, it can reason about what objects are no longer accessible, and can reclaim the memory used. Hence garbage collection. Python and Java go about this in different ways (mainly reference counting versus reachability tracing, respectively) but the broad idea is the same. Managing object allocation and garbage collection in parallel systems is a whole new level of fun.

You can go further. Program state might reasonably want to persist between runs of your code. In most languages support for this is pretty desultory: some form of serialisation that encodes the run-time data in a form that can be written to, and restored from, persistent media. JSON for instance. Similar problems arise if you wish to communicate state between programs. In the large, reasoning about allocation of program state in either persistent or distributed systems is a vastly more complex problem, with a range of solutions of greater or lesser satisfaction. Reasoning about garbage collection becomes fabulously evil.

The manner in which such systems are commonly implemented is to create a virtual machine. This is a machine architecture that is explicitly designed to run the higher level abstractions of the programming language. Most importantly, this is an architecture that itself implements the rules about pointers. For Java, this is the JVM (Java Virtual Machine). So now, every machine architecture that supports Java does so by running a JVM, and the JVM abstracts away everything about the hardware. There are of course efficiency imposts. But with modern hardware, it isn't as if that matters much.

Trying to summarize a bit …

C is a language that permits you to exploit the hardware details of this or that different CPU, to the degree the OS will allow it. If you're programming a Raspberry Pi or some random microcontroller for a machine you're building, the OS's answer is “Yes, absolutely.” On a phone, Mac, or PC, the OS's answer is “Yes, but only a little bit, within a lot of well-defended walls.”

C can be written in a generic machine-agnostic fashion. And absent a very strong reason to do otherwise should be written in a generic machine-agnostic fashion.

As noted above, C must be compiled in a machine-specific fashion. So one body of source code could be compiled with different compilers or different compiler settings depending on which hardware type(s) you intend to run the resulting executable(s) on. The term “cross compiler” is used for a compiler which runs on one type of hardware but which produces object code targeting different hardware.

Non-machine specific source code needing to be compiled in machine-specific fashion is actually true for nearly every language, ancient or modern. The exceptions are languages where there’s a defined intermediate virtual machine with virtual hardware, a virtual instruction set, and a virtual low-level OS interface to a virtual OS. In that case all compilers on all hardware produce object code targeting the virtual machine’s “hardware”. And then a hardware-specific app installed on the real hardware does the job of appearing to be that virtual machine + virtual OS and executes the virtual object code.

Python is a bit better at hiding object classes and the hardware.

I prefer Python over C because it feels more like a procedural language. That’s based on my training and IT work experience. Fortran and COBOL made my career.

Python has a very nice IDE that includes an editor with a debugger for development.

Huge memory is such a luxury today. We had to carefully manage our usage for so many years. You had to avoid restricted memory areas.

That’s all handled for us today in C and Python. Unless you’re trying to write very efficient and fast code.

You can include a C library and use it in Python when desired. ctypes or Cython are very useful.

Part of the whole point of a programming language is that you can write one program, and then compile it on whatever machine you want. In extreme-performance situations, you might occasionally need to dip into something lower-level, but between compilers getting smart enough to do that for you, and computers getting fast enough that you don’t care about inefficient code, this is getting extremely rare, these days.

That said, C does enable you to dig in to lower levels, if (for some reason) you really want to. Most modern languages are deliberately designed to not allow this, because it’s so rarely worthwhile, and attempting to do so is so likely to cause huge problems.

Yeah.
The OP asks: Do you have to know the PARTICULAR computer's hardware for C programming?
The answer is no.

But the converse: If you need to know the particulars of the hardware, do you need to code in C?
The answer is usually yes.

Of the common languages that will get you down to the metal, C is easily the most prevalent.

One of the keys to the success of Unix was the use of C. If you had a C compiler for a new machine architecture, you could port most of the OS trivially. There would be low level stuff to cope with things like exception handling, process dispatch and virtual memory, but most of that could be done in C. Only a tiny bit of really low level work in assembler was needed. Linux is still almost entirely C, with Rust just getting its foot in the door.

If I was designing a course in computer science, I would teach C and Python. Between them you can get pretty much everything needed covered. Part of the course would teach how to implement Python like systems in C.

It depends on what kind of program you are writing.

If it’s something that runs ‘on the bare metal’, like an operating system kernel or firmware for a small embedded system, then yes, you will need to know some details of the hardware.

On the other hand, if it’s an application, at ‘user level’, the heavy lifting has all been done for you by the operating system (and built-in library code).

You just call malloc() and you get your chunk of memory. The machine-specific details are all hidden under the covers.

There’s an old saying in the computer world that “FORTRAN programmers can write FORTRAN code in any language, and C programmers can write C code in any language”. Really, it applies to almost any language. Famously, for example, the classic Numerical Recipes was adapted for several languages, including C, but it was written by FORTRAN programmers, and even in C, it’s all FORTRAN code.

And of course, to get a compiler in the first place, someone needs to write some machine code at some point. But if you have to ask, it’s not you who’s doing that part. And even there, you keep the machine code to a minimum: You write a super-simple, bare-bones compiler in machine code, and then you use that to compile your real compiler with the much fancier error-checking and efficiency improvements and so on. And then, for good measure, you use your real compiler to compile itself.

You don’t decide where the memory is allocated. You just specify how much memory you want and get a pointer back to that block of memory. It’s all very high level and abstract, there are many layers of OS code between that pointer and the bits of actual physical memory.

And the physical memory you write to is probably in cache, so the actual physical memory is managed by the hardware, which updates the memory as the cache is altered.

That point is probably way back in history, since you can write a code generator for a compiler in another language, never going back to the hardware. As prep work for my dissertation I got the Pascal compiler on our Multics system to compile itself, before they translated Pascal to PL/1. The problem was that sets in the Zurich compiler assumed you were on a 60-bit CDC machine. I guess that is a hardware dependency, but I never had to figure out the assembly language for the machine we were running on.
My dissertation was along the lines of the question in the OP. I designed and wrote a compiler for an object-oriented microprogramming language where you could define an object representing a target machine hardware resource and compile code using it, or, if you were on another machine, used the code in the object to emulate the resource using regular microcode. (Hardware resources were handled also.) I rewrote the Jensen/Wirth Pascal compiler to do this, which is why I had to get it to compile itself.

And depending on the OS, there may not be any physical memory behind that pointer until you try to actually use it. Linux will happily let you malloc() far more memory than physically exists on the system; if more gets used than the system can physically provide, the Out Of Memory Killer will just start wiping out processes.

For a lot of the memory you allocate in your program - namely, for variables that are created with block, function, module or whole-program scope - the compiler will take care of all of the memory allocation and deallocation for you. You don't need to do anything for those classes of variables.

But there is some memory used in programming that cannot be allocated with a fixed size without loss of efficiency. Consider an image editor that might have to allocate 16K bytes to hold a 64x64 icon, or over 8 megabytes to hold an entire screenshot or video frame on a 1920x1080 screen. It’s better to allocate just enough space that any particular image needs.

That’s what we call dynamic memory. Managing dynamic memory in C involves calling malloc() to allocate space of a given size, and then later using free() to deallocate that space. malloc() returns a special kind of variable, called a pointer, so that you can use that space for the storage of variables and data, and there’s a special syntax you use to access that storage via the pointer.

So when people talk about "manual" memory management, they really mean that you have to keep track of dynamic memory regions allocated by malloc(), often along with their sizes (so you don't try to use more memory than you allocated), and you have to worry about the lifetime of the object (the time between malloc()ing space and free()ing the same space). You have to worry about ownership of a given region of memory, as opposed to merely referencing that memory, and know which pointer in your program's logic expresses which. Once you don't need a memory area anymore, free() must be called via the last owning pointer of that area, so that you don't leak memory (where your program consumes more and more memory over time); but you also need to make sure that your program's logic guarantees no other pointer, owning or referencing, can ever again be used to free() it, get data from it, or store data to it. And you have to keep track of this all by yourself. There is no aid in the programming language that will keep you from making these mistakes, mistakes that are likely to be fatal to your program or, worse, to create a horrid error while it looks like nothing has gone wrong when you run it.
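
One common defensive discipline, sketched here (the `destroy` helper is a convention programmers adopt, not a language feature):

```c
#include <stdlib.h>

/* Funnel every free() through a helper that also nulls the owning
 * pointer, so an accidental second call becomes a no-op instead of a
 * double free.  free(NULL) is defined by the C standard to do nothing. */
static void destroy(char **owner) {
    free(*owner);
    *owner = NULL;    /* the pointer can no longer dangle */
}
```

It's a small aid, but the language still won't stop you from copying the pointer elsewhere first; the discipline is still yours to enforce.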

Many programming languages offer features that will ensure that you cannot make these kinds of errors, or at least that making these kinds of errors will not cause the kinds of problems seen in C programs. These features include garbage collection (cleaning up memory that no pointer can use anymore) or sandboxing (ensuring that if a pointer is used incorrectly, you will have a predictable program crash rather than an outcome that could generate a bug). Rust, which I had mentioned in the previous thread, attempts to prove the validity of a program’s use of memory, and while there are some constructs that are valid that it will reject, it can ensure that memory bugs in a program are confined to what are hopefully small sections of code.

So, none of this forces knowledge of the computer hardware; memory management is abstracted away via malloc() and free(). But there are other languages that make different tradeoffs in how they allow the use of dynamic memory, which generally make it easier for programmers to use.

The memory allocation issue has been thoroughly addressed.

In the context of game dev, though (based on OP’s other post), you might also have to target specific hardware for low-level optimizations. Examples include graphics cards and their shaders, or at least a specific API level that they support (DirectX/OpenGL/Vulkan/etc.). You might also have to deal with sound card differences or CPU quirks, especially if you want to target older CPUs and their extended instruction sets.

For example, here’s the C source code to Quake 3 Arena, with some platform-specific optimizations for Win2k gamma and Linux netcode

But has it been de-addressed?

I count 6 malloc() calls in this thread but only 5 free() calls, so it looks like we have a memory leak. :stuck_out_tongue:

Heh heh :grin:

Likely from here:

Real programmers don’t use Pascal.

Real programmers only ever use FORTRAN, and never a version newer than F77. And even that is suspiciously easy to understand.
Real programmers would be horrified to see the namby pamby quiche eating language Fortran has become.

The bootstrap Pascal compiler was written in Fortran on a CDC 172. Its first and only job was to compile the Pascal compiler written in Pascal. (Something of a rite of passage for a language is to compile its own compiler.)
The CDC machines didn't implement a call stack. Early Fortran functions were not reentrant. (Recursion is for quiche eating Lisp programmers. No real programmer would ever code in Lisp. Seriously, the name alone should tell you that.)