If I have a variable, or a data structure, it would seem to me that in order to reference it, the compiler would have to know its address. So are any actual instructions generated to “take” the address of it, or is it just a matter of the compiler doing all the offset math during compilation?
I realize that if one is using malloc to store variables, then the address isn’t known at compile-time, but it’s still just a matter of a base address plus an offset.
Won’t the precise details depend on the type of processor, etc.?
I tried compiling a minimal program using gcc on an Intel processor. I defined a local variable and the compiler put it on the stack. Then I tried taking its address using & and this compiled into an leaq instruction.
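Roughly what that looks like, as a sketch of unoptimized gcc output on x86-64 (the exact offsets and registers will vary):
int x = 42;
        movl    $42, -12(%rbp)
int *p = &x;
        leaq    -12(%rbp), %rax
        movq    %rax, -8(%rbp)
The first mov stores 42 at an offset from the frame pointer; the leaq computes that same rbp-relative address instead of loading from it, and the last mov stores it into p.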
Right.
If you look at what lea is doing, it’s just a base + offset (essentially). For addresses on the stack, this makes sense, since the stack pointer’s value isn’t known at compile time. I wonder: if you allocate the variable in global memory, does the compiler generate the same code?
The global variable is allocated in the data section (the program code goes in .text), and its address is leaq’d relative to the instruction pointer rip, i.e. position-independent code is being generated.
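A sketch of what that position-independent version looks like, assuming a global int g and unoptimized output (AT&T syntax, registers illustrative):
int g;
int *p = &g;
        leaq    g(%rip), %rax
        movq    %rax, -8(%rbp)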
There is no one way this happens. Data can be allocated in a range of ways, and for each the operation to determine the address is different. But in general the compiler needs to know how to generate addresses for ordinary operations anyway, so taking an address just reuses whatever mechanism the compiler already uses.
LEA (load effective address) is useful when there is a complex addressing mode in use. The instruction decodes the addressing mode and applies the calculations as needed, but instead of actually accessing the resulting address (as most instructions would), it just delivers the address it calculated.
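For example, a single LEA can evaluate a full base-plus-scaled-index addressing mode without touching memory, while a MOV with the same operands would actually go to that address (Intel syntax, illustrative operands):
        lea     rax, [rdi + rsi*8 + 16]   ; rax = rdi + rsi*8 + 16, e.g. the address of array[index + 2] for 8-byte elements
        mov     rax, [rdi + rsi*8 + 16]   ; same addressing mode, but this one loads from that address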
Data is allocated with a few different mechanisms. Data on the stack is accessed with a stack-pointer-relative offset, and the compiler knows statically what the offset is. So even though the data could be anywhere on the stack (and for a reentrant routine there may be more than one instance at once), whenever the data is in scope the stack pointer provides the known base for determining its address.
Dynamically allocated data, well, you’re given the address outright, so you don’t need to take the address of anything with the & operator.
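For instance (a minimal sketch in C; the function name is just an example):
#include <stdlib.h>

void example(void)
{
    int *p = malloc(sizeof *p);   /* the address arrives as malloc's return value */
    if (p != NULL)
        *p = 42;                  /* used through the pointer; no & needed */
    free(p);
}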
Statically allocated data will be laid out by the linker and loader, and will be accessed via offsets calculated from a table of some kind that is filled in at run time from values determined by the linker. The precise mechanism varies significantly between languages and operating systems, but the overall idea is pretty constant. The x86 provides a set of segment registers that can be used to provide the base location of useful regions of memory for things like this.
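On Linux/x86-64, for example, a -fPIC build typically reaches an extern global defined in another module through the global offset table, roughly like this (AT&T syntax; the names are illustrative):
extern int g;
int v = g;
        movq    g@GOTPCREL(%rip), %rax    # fetch g's address from the GOT
        movl    (%rax), %eax              # then load the value through it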
If the address calculation is doable within the ability of the processor’s addressing modes, LEA is a perfect instruction. LEA gets used for a lot of things. There are many times the compiler needs to work out the base address of something, stick it into a register, and use it for a while.
If you have a RISC processor, there usually isn’t any sort of LEA; you get to do it all step by step. Again, this is the compiler’s job. What it does is still much the same.
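For instance, on AArch64 a compiler typically builds a global’s address in two instructions rather than one lea (a sketch; g is just an example name):
        adrp    x0, g              // x0 = page-aligned upper bits of g's address
        add     x0, x0, :lo12:g    // add in the low 12 bits to finish the address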
E.g., using “clang” instead of “gcc”, the example with the global variable did not use lea at all; it just had a hard-coded address by default, unless I compiled it with -fPIC, in which case it did use lea…
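For anyone who wants to reproduce it, something like this shows the difference (-S emits assembly instead of an object file; the file name is just an example):
clang -S -O0 global.c -o global.s             # hard-coded absolute address on that setup
clang -S -O0 -fPIC global.c -o global-pic.s   # rip-relative lea instead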
Note this is all for a trivial program, unoptimized, on a particular architecture.
The variable is the address. Aside from constants and a few other things, there is no representation of the variable outside of that. As Francis_Vaughan noted, the address might be computed in a few different ways, but is usually some kind of offset from a known spot. Here’s a quick example on x86 (compiler-generated assembly interleaved with the C source):
int a = 2;
00461782 mov dword ptr [ebp-0Ch],2
int b = 3;
00461789 mov dword ptr [ebp-18h],3
int c = a + b;
00461790 mov eax,dword ptr [ebp-0Ch]
00461793 add eax,dword ptr [ebp-18h]
00461796 mov dword ptr [ebp-24h],eax
int *d = &a;
00461799 lea eax,[ebp-0Ch]
0046179C mov dword ptr [ebp-30h],eax
int e = *d + b;
0046179F mov eax,dword ptr [ebp-30h]
004617A2 mov ecx,dword ptr [eax]
004617A4 add ecx,dword ptr [ebp-18h]
004617A7 mov dword ptr [ebp-3Ch],ecx
Every time you see a [ebp-x], that’s just x86 shorthand for “take the ebp register, subtract an offset, and use that as an address”. The only difference the & operator makes is that instead of loading from that address (via the MOV instruction), it simply copies the address itself (via LEA). Either way, the variable is the address.
One fun example: on x64, there is no integer multiply-add instruction. Some hashing schemes have as their inner loop something like “hash = (hash * small_prime) + new_data”, which would benefit from such an instruction.
LEA can compute “base + index*[1, 2, 4, 8] + constant” in a single instruction. That’s good for addressing calculations, like stepping through an array of 8-byte doubles or the like. But it can be used for hashing as well: have it compute “hash + 4*hash”, i.e. “5*hash”, and a plain add then folds in new_data. 5 is an effective “small prime” in this context.
The compiler is actually smart enough to do this automatically in many cases. It’s twice as fast as a series of mul/add instructions.
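A sketch of that kind of inner loop in C (the function is made up for illustration; 5 stands in for the small prime):
#include <stddef.h>

unsigned long hash_bytes(const unsigned char *p, size_t n)
{
    unsigned long h = 0;
    while (n--)
        h = h * 5 + *p++;   /* typically compiles to an lea for the *5, plus a plain add */
    return h;
}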
I don’t have a lot of specifics to add to this discussion, but wanted to point out that https://godbolt.org/ is a great, and not terribly well-known, tool for seeing how various bits of code get compiled by various compilers with various options.