Why Are Local Variables of Caller Stack Saved in Registers in Callee Stack?
I'm trying my best to learn about the call stack and how stack frames are structured in an ARM Cortex-M0. It's proving to be a little difficult, but with patience I'm learning. I have several questions throughout this one question, so hopefully you guys can help me out in all areas. The questions I have will be highlighted in bold throughout this explanation.
I'm using an ARM Cortex-M0 with GDB and a simple program to debug. Here is my program:
int main(void)
{
    static uint16_t myBits;
    myBits = 0x70;
    halInit();
    return 0;
}
I have a breakpoint set on halInit(). I then execute the command info frame on my GDB terminal to get this output:
Stack level 0, frame at 0x20000400:
pc = 0x80000d8 in main (src/main.c:63); saved pc 0x8002dd2
source language c.
Arglist at 0x200003e8, args:
Locals at 0x200003e8, Previous frame's sp is 0x20000400
Saved registers:
r0 at 0x200003e8, r1 at 0x200003ec, r4 at 0x200003f0, r5 at 0x200003f4, r6 at 0x200003f8, lr at 0x200003fc
I will explain how I am interpreting this. Please let me know if I am correct.
Stack level 0: Current level of the stack frame. 0 will always represent the top of the stack, in other words the current stack frame being used.
frame at 0x20000400: This represents the location of the stack frame in flash memory.
pc = 0x80000d8 in main (src/main.c:63): This represents the next instruction to be executed, i.e. the program counter value, since the program counter always holds the address of the next instruction to be executed.
saved pc 0x8002dd2: This one is a little confusing to me, but I think it means the return address, essentially the instruction to be executed when it returns from executing the halInit() function. However, if I type the command info reg into my GDB terminal I see that the link register is not this value, but the next address instead: lr 0x8002dd3. Why is that?
source language c.: This represents the language being used.
Arglist at 0x200003e8, args: This represents the starting address of my arguments that were passed to the stack frame. Since args: is blank, that means no arguments were passed. Which makes sense for two reasons: this is the first stack frame in the call stack and my function doesn't have any arguments, int main(void).
Locals at 0x200003e8: This is the starting address of my local variables. As you can see in my original code snippet, I should have one local variable, myBits. We'll come back to that later.
Previous frame's sp is 0x20000400: This is the stack pointer which points to the top of the caller's stack frame. Since this is the first stack frame, I expect this value should equal the current frame's address, which it does.
Saved registers:
r0 at 0x200003e8
r1 at 0x200003ec
r4 at 0x200003f0
r5 at 0x200003f4
r6 at 0x200003f8
lr at 0x200003fc
These are registers that have been pushed to the stack to be saved for use later by the current stack frame. This part I am curious about because it's the first stack frame, so why would it save so many registers? If I execute the command info reg I get the following output:
r0 0x20000428 0x20000428
r1 0x0 0x0
r2 0x0 0x0
r3 0x70 0x70
r4 0x80000c4 0x80000c4
r5 0x20000700 0x20000700
r6 0xffffffff 0xffffffff
r7 0xffffffff 0xffffffff
r8 0xffffffff 0xffffffff
r9 0xffffffff 0xffffffff
r10 0xffffffff 0xffffffff
r11 0xffffffff 0xffffffff
r12 0xffffffff 0xffffffff
sp 0x200003e8 0x200003e8
lr 0x8002dd3 0x8002dd3
pc 0x80000d8 0x80000d8 <main+8>
xPSR 0x21000000 0x21000000
This tells me that if I check the values stored in each of the memory addresses of the saved registers by executing the command p/x *(register), then the values should be equal to that of the values shown in the output above.
Saved registers:
r0 at 0x200003e8 -> 0x20000428
r1 at 0x200003ec -> 0x0
r4 at 0x200003f0 -> 0x80000c4
r5 at 0x200003f4 -> 0xffffffff
r6 at 0x200003f8 -> 0xffffffff
lr at 0x200003fc -> 0x8002dd3
It works, the values in each address represent the values shown by the info reg command. However, I notice one thing. I have one local variable myBits with a value of 0x70 and this appears to be stored in r3. However r3 is not pushed to the stack for saving.
If we step into the next instruction, a new stack frame is created for the function halInit(). This is shown by executing the command bt on my terminal. It generates the following output:
#0 halInit () at src/hal/src/hal.c:70
#1 0x080000dc in main () at src/main.c:63
If I execute the command info frame then I get the following output:
Stack level 0, frame at 0x200003e8:
pc = 0x8001842 in halInit (src/hal/src/hal.c:70); saved pc 0x80000dc
called by frame at 0x20000400
source language c.
Arglist at 0x200003e0, args:
Locals at 0x200003e0, Previous frame's sp is 0x200003e8
Saved registers:
r3 at 0x200003e0, lr at 0x200003e4
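The slot addresses in this output can be reproduced by hand. The following is a minimal sketch, assuming the prologue of halInit() is a single Thumb PUSH of r3 and lr, and relying on the fact that PUSH stores the lowest-numbered register at the lowest address. The names `count_regs` and `push_slot_address` are made up for illustration and are not part of GDB or any toolchain.

```c
#include <assert.h>
#include <stdint.h>

/* Number of registers in a 16-bit register list (bit n = rn, bit 14 = lr). */
static unsigned count_regs(uint16_t reglist)
{
    unsigned n = 0;
    for (; reglist != 0; reglist >>= 1)
        n += reglist & 1u;
    return n;
}

/* Address where `reg` ends up after PUSH {reglist}, given sp before the push.
   Assumption: one PUSH, lowest-numbered register at the lowest address. */
uint32_t push_slot_address(uint32_t sp_before, uint16_t reglist, unsigned reg)
{
    assert(reg < 16 && (reglist & (1u << reg)) != 0);
    uint32_t sp_after = sp_before - 4u * count_regs(reglist);
    uint16_t below = (uint16_t)(reglist & ((1u << reg) - 1u));
    return sp_after + 4u * count_regs(below);
}
```

With sp at 0x200003e8 before the push and the list {r3, lr}, this yields 0x200003e0 for r3 and 0x200003e4 for lr, which matches the "Saved registers" line above.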
Now we see that register r3 was pushed onto this stack frame. This register holds the value of the variable myBits. Why is r3 pushed onto this stack frame if the caller stack frame is what needs this register?
Sorry for the long post, I just want to cover all areas of required information.
Update
I think I might know why r3 was pushed onto the callee stack and not onto the caller stack even though the caller is the one that needs this value.
Is it because the function halInit() will be modifying the value in r3?
In other words, the callee stack frame knows that the caller stack frame requires this register value, so it will push it onto its own stack frame so that it can modify r3 for its own purpose, then when the stack frame is popped it will restore the value 0x70 that was pushed onto the stack frame back into r3 for the caller to use again. Is this correct and if so, how did the callee stack frame know that the caller stack frame will need this value?
I don't think I understand. Register r3 is not saved in the caller stack frame from what I can tell, it's saved in the callee stack frame. Or am I misinterpreting the last info frame output I show where the saved registers are that of the callee stack frame and not the caller stack frame? – eddie garcia
Aug 28 at 23:23
This is not a Cortex-M0 thing specifically; it is the ABI or calling convention used by the toolchain. While ARM recommends one, the compiler authors are free to do whatever they want, so this is not a C thing, this is a what-compiler-are-you-using thing. As far as exceptions/interrupts, yes, the ARM logic does have a known list of things it preserves in order on the stack so that the handler doesn't have to. And if the compiler used to generate the handler doesn't conform to a convention that the logic supports, then the programmer or the compiler has to wrap the handler with more code.
– old_timer
Aug 29 at 1:12
you are better off examining functions not named main(). as some compilers latch on to that name and implement that function differently from the rest, causing more confusion than understanding.
– old_timer
Aug 29 at 1:13
@old_timer so does this mean that my assumption in my update is correct? What about the question I had about the return address and the link register being offset by one byte?
– eddie garcia
Aug 29 at 1:14
2 Answers
I'm trying my best to learn about the call stack and how stack frames
are structured in an ARM Cortex-M0
So based on that quote, first off the ARM Cortex-M0 does not have stack frames; processors are really, really dumb logic. The compiler generates stack frames, which are a compiler thing, not an instruction set thing. The notion of a function is a compiler thing, not really anything lower. A compiler uses a calling convention, a basic set of rules designed so that for that language the caller and callee functions know exactly where parameters and return values are, and nobody trashes the other's data.
The compiler authors are free to do whatever they want so long as it works and fits within the rules of the instruction set, as in the logic, not the assembly language. (An assembler author is free to make up whatever assembly language they want, mnemonics, whatever, so long as the machine code conforms to the rules of the logic.) And they used to do that. The processor vendors have started making recommendations, let's say, and the compilers are conforming to them. It's not about sharing objects across compilers as much as it is 1) I don't have to come up with my own, and 2) we are trusting the IP vendor with their processor and hope that their calling convention was designed for performance and other reasons that we desire.
gcc so far has attempted to conform with ARM's ABI as it evolves and gcc evolves.
When you have "many" registers (what many means is a matter of opinion), you will see that the convention will use registers first, then the stack, for passed parameters. You will also see that some registers will be designated as volatile within a function to improve performance over having to use memory (the stack) so much.
By using a debugger and a breakpoint you are looking in the wrong place. Your statement was that you want to understand the call stack and stack frames, which is a compiler thing, not about how exceptions are handled in the logic. Unless that is really what you were after, in which case your question wasn't accurate enough to understand.
Compilers like GCC have optimizers, and despite the confusion they create with respect to dead code, learning from the optimized version is easier than from the non-optimized version. Let's dive in.
extern unsigned int more_fun ( unsigned int, unsigned int );
unsigned int fun ( unsigned int a, unsigned int b )
{
    return(a+b);
}
Optimized
<fun>:
0: 1840 adds r0, r0, r1
2: 4770 bx lr
Not optimized
00000000 <fun>:
0: b580 push {r7, lr}
2: b082 sub sp, #8
4: af00 add r7, sp, #0
6: 6078 str r0, [r7, #4]
8: 6039 str r1, [r7, #0]
a: 687a ldr r2, [r7, #4]
c: 683b ldr r3, [r7, #0]
e: 18d3 adds r3, r2, r3
10: 0018 movs r0, r3
12: 46bd mov sp, r7
14: b002 add sp, #8
16: bd80 pop {r7, pc}
First off why is the function at address zero? Because I disassembled the object not a linked binary, maybe I will later. And why disassemble vs compile to assembly? If the disassembler is any good, then you actually get to see what was produced rather than the assembly which will contain, certainly with compiled code, a lot of non-instruction language as well as pseudo code that gets changed when finally assembled.
A stack frame IMO is when there is a second pointer, a frame pointer. You often see this with instruction sets that have instructions or limitations that lean toward this. For example an instruction set might have a stack pointer register but you cant address from it, there may be another frame register pointer and that you can. So the typical entry would be to save the frame pointer on the stack because the caller may have been using it for their frame and we want to return it as found, then copy the address of the stack pointer to the frame pointer, then move the stack pointer as far as needed for this function so that interrupts or calls to other functions the stack pointer is on the boundary between used and unused stack space, as it should be at all times. The frame pointer would be used in this case to access any passed in parameters or return addresses in a frame pointer plus offset fashion (for downward growing stacks) and in the negative offset direction for local data.
Now it does look like the compiler is using a frame pointer, what a waste, lets ask it not to:
00000000 <fun>:
0: b082 sub sp, #8
2: 9001 str r0, [sp, #4]
4: 9100 str r1, [sp, #0]
6: 9a01 ldr r2, [sp, #4]
8: 9b00 ldr r3, [sp, #0]
a: 18d3 adds r3, r2, r3
c: 0018 movs r0, r3
e: b002 add sp, #8
10: 4770 bx lr
So first off the compiler determined there were 8 bytes of things to save on the stack. Unoptimized, pretty much everything gets a place on the stack, the passed parameters as well as local variables. There weren't any locals in this case so we just have the passed-in ones, two 32-bit numbers, so 8 bytes. The calling convention used attempts to use r0 for the first parameter and r1 for the second if they fit; in this case they do. So the stack frame is formed when 8 is subtracted from the stack pointer; the stack frame pointer is the stack pointer in this case. The calling convention used here allows r0-r3 to be volatile in the function. The compiler does not have to return to the caller with those registers as they were found; they can be used within the function at will. The compiler chose in this case to pull the addition operands from the stack using the next two registers (r2 and r3) rather than the first two free ones. Once r0 and r1 are saved to the stack, one would assume the "pool" of free registers starts with r0, r1, r2, r3. So yes, it does appear to be broken, but it is what it is. It is functionally correct, and that is the job of a compiler: to produce code that functionally implements the source code. The calling convention used by this compiler states that the return value goes in r0 if it fits, which it does.
So the stack frame is setup, 8 is subtracted from sp. Passed in parameters are saved to the stack. Now the function starts by pulling the passed in parameters from the stack, adding them, and placing the result in the return register.
Then bx lr is used to return; look that instruction up along with pop. (For armv6-m a pop into pc can be used to return. For armv4t pop can't be used to switch modes, so compilers will, if they can, pop the return address into a low register and then bx to that register.)
For armv4t thumb you can't use pop to return in case this code is mixed with arm, so the return pops into a volatile register and does a bx on that register; you can't pop directly into lr in thumb. It is possible that you might be able to tell the compiler you are not mixing this with arm code, so it's safe to use pop to return. Depends on the compiler.
00000000 <fun>:
0: b580 push {r7, lr}
2: b082 sub sp, #8
4: af00 add r7, sp, #0
6: 6078 str r0, [r7, #4]
8: 6039 str r1, [r7, #0]
a: 687a ldr r2, [r7, #4]
c: 683b ldr r3, [r7, #0]
e: 18d3 adds r3, r2, r3
10: 0018 movs r0, r3
12: 46bd mov sp, r7
14: b002 add sp, #8
16: bc80 pop {r7}
18: bc02 pop {r1}
1a: 4708 bx r1
to see a frame pointer
00000000 <fun>:
0: b580 push {r7, lr}
2: b082 sub sp, #8
4: af00 add r7, sp, #0
6: 6078 str r0, [r7, #4]
8: 6039 str r1, [r7, #0]
a: 687a ldr r2, [r7, #4]
c: 683b ldr r3, [r7, #0]
e: 18d3 adds r3, r2, r3
10: 0018 movs r0, r3
12: 46bd mov sp, r7
14: b002 add sp, #8
16: bd80 pop {r7, pc}
First off you save the frame pointer to the stack as the caller or the caller's caller, etc. may be using it; it's a register we have to preserve. Now some calling convention comes into play right off the start. The compiler knows that we are not calling another function, so we don't need to preserve the return address (stored in the link register, r14). So why push it on the stack, why waste the space and the clock cycles? Well, the convention changed not long ago to say the stack should be 64-bit aligned, so you basically push and pop in pairs of registers (an even number of registers). Sometimes they use more than one instruction for a pair, as we see in the armv4t return. So the compiler needed to push another register. It could, and you will see sometimes that it does, just pick some register it is not using and push that on the stack; maybe we can get it to do that here in a bit. In this case, being armv6-m, you can switch modes with a pop, so it is safe to generate a return using a pop pc, and you save an instruction by using the link register here instead of some other register. A little optimization despite being unoptimized code.
save the frame pointer then associate the frame pointer with the stack pointer, in this case it moves the stack pointer first and makes the frame pointer match the stack pointer then uses the frame pointer for stack accesses. Oh how wasteful, even for unoptimized code. But perhaps this compiler defaults to a frame pointer when told to compile like this.
While here one of your questions and I have commented on this thus far indirectly. The full sized arm processors armv4t through armv7 support both arm instructions and thumb instructions. Not everyone supports every one there was an evolution, but you can have arm and thumb instructions coexist as part of the rules defined by the logic for that core. The ARM design to support this is since arm instructions have to be word aligned, the lower two bits of the address of an arm instruction are always zeros. A desired 16 bit instruction set, also aligned, would always have the lower bit of the address zero. So why not use the lsbit of the address as a way to switch modes. And that is what they chose to do. With a few instructions at first, then became more that are allowed by the armv7 architecture, if the address of the branch (look up bx first, branch exchange) has an lsbit of 1 then the processor switches to thumb mode when it begins to fetch instructions at that address, the program counter does not retain this one, it is stripped by the instruction, it is just a signal used to tell the instruction to switch modes. if the lsbit is a 0 then the processor switches to arm mode. If it was already in the said mode it just stays in that mode.
Now comes these cortex-m cores which are thumb only machines, no arm mode. The tools are in place, it all works no reason to change, if you try to go into arm mode on a cortex-m you get a fault.
now look at the code above, sometimes we return with a bx lr and sometimes a pop pc, in both cases lr held the "return address". for the bx lr case to have worked the lsbit of lr must be set. The caller cant know which instruction we are going to use for the return, and the caller doesnt have to but likely used a bl to make the call so the logic actually set the bit not the compiler. That is why your return address is off by one byte.
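To make the "off by one" concrete, here is a small sketch of the bit-0 convention described above. The function names are invented for illustration, and the sketch assumes the call was made with a 4-byte Thumb BL.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Bit 0 of a branch target selects the instruction set: 1 means Thumb. */
bool is_thumb_target(uint32_t target)
{
    return (target & 1u) != 0;
}

/* Address the core actually fetches from: bit 0 is stripped. */
uint32_t fetch_address(uint32_t target)
{
    return target & ~1u;
}

/* Value a Thumb BL leaves in lr: address of the next instruction, bit 0 set. */
uint32_t lr_after_bl(uint32_t bl_address)
{
    assert((bl_address & 1u) == 0);   /* Thumb instructions are halfword aligned */
    return (bl_address + 4u) | 1u;    /* BL is a 4-byte encoding */
}
```

Applied to the question's numbers, fetch_address(0x8002dd3) gives 0x8002dd2, the "saved pc" GDB prints, and a BL at 0x80000d8 leaves 0x80000dd in lr, which GDB reports as saved pc 0x80000dc.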
If you want to learn about compilers and stack frames though, while unoptimized code definitely uses the stack as you can see, optimized output from a compiler with a decent optimizer can be easier to understand, once you learn not to write dead code.
00000000 <fun>:
0: 1840 adds r0, r0, r1
2: 4770 bx lr
r0 and r1 are the passed in parameters, r0 is where the return value goes, link register is the return address. This is what you would hope a compiler would produce for a function like that.
So now lets try something more complicated.
extern unsigned int more_fun ( unsigned int, unsigned int );
unsigned int fun ( unsigned int a, unsigned int b )
{
    return(more_fun(a,b));
}
00000000 <fun>:
0: b510 push {r4, lr}
2: f7ff fffe bl 0 <more_fun>
6: bd10 pop {r4, pc}
A few things. First, why didn't the optimizer do this:
fun:
b more_fun
I dont know.
Second, why does it say bl 0 when more_fun is not at zero? This is an object, not linked code. Once linked, the linker will modify that bl instruction to point at more_fun().
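The bytes f7ff fffe can be decoded by hand to see what the unlinked instruction holds. This is a sketch of the Thumb BL offset decoding as I understand the encoding (S, J1, J2, imm10, imm11 fields); `thumb_bl_offset` is an illustrative name, and the final conversion to a signed value assumes the usual two's complement behaviour.

```c
#include <assert.h>
#include <stdint.h>

/* Decode the signed byte offset from the two halfwords of a Thumb BL.
   The offset is relative to the address of the BL plus 4. */
int32_t thumb_bl_offset(uint16_t hw1, uint16_t hw2)
{
    assert((hw1 >> 11) == 0x1eu);          /* first halfword:  11110 S imm10      */
    assert((hw2 & 0xd000u) == 0xd000u);    /* second halfword: 11 J1 1 J2 imm11   */

    uint32_t s     = (hw1 >> 10) & 1u;
    uint32_t imm10 = hw1 & 0x3ffu;
    uint32_t j1    = (hw2 >> 13) & 1u;
    uint32_t j2    = (hw2 >> 11) & 1u;
    uint32_t imm11 = hw2 & 0x7ffu;

    uint32_t i1 = !(j1 ^ s);
    uint32_t i2 = !(j2 ^ s);

    uint32_t imm = (s << 24) | (i1 << 23) | (i2 << 22) | (imm10 << 12) | (imm11 << 1);
    if (s)
        imm |= 0xfe000000u;                /* sign-extend from bit 24 */
    return (int32_t)imm;
}
```

For f7ff fffe this gives -4, so before linking the instruction does not point at any real function; the value is consistent with a placeholder that the linker replaces once the address of more_fun() is known.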
Third, we already got the compiler to push a register we didn't use. It is pushing and popping r4 so that it can keep the stack aligned per the calling convention used by this compiler. It could have chosen almost any one of the registers, and you may find a gcc or llvm/clang version that uses, say, r3 instead of r4. gcc has been using r4 for a bit now. It's the first in the list of registers you have to preserve, and so the first register the compiler will use if it wants to preserve something across a call (as we will see in a second). So perhaps that's why; who knows, ask the author.
extern unsigned int more_fun ( unsigned int, unsigned int );
unsigned int fun ( unsigned int a, unsigned int b )
{
    more_fun(a,b);
    return(a);
}
00000000 <fun>:
0: b510 push {r4, lr}
2: 0004 movs r4, r0
4: f7ff fffe bl 0 <more_fun>
8: 0020 movs r0, r4
a: bd10 pop {r4, pc}
Now we are making progress. We tell the compiler it has to save the passed-in parameter across a function call. Each function starts the rules over, so each function called can trash r0-r3; if you are using r0-r3 for something you need to save them somewhere. So a very wise choice: instead of saving the passed-in parameter on the stack and possibly having to do multiple costly memory cycles to access it, save a caller's (or caller's caller's, etc.) value on the stack and use a register within our function to hold that parameter. As a design it saves a lot of wasted cycles. We needed the stack to be aligned anyway, so this all worked out: preserve r4, and save the return address since we are making a call ourselves which will trash it. Save the parameter we need after the call into r4. Make the call, place the return value in the return register and return, cleaning up the stack as you go. So the stack frame here is minimal, if it is one at all. Not using the stack much.
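The same sequence can be played out on a toy register file to see who preserves what. This is only a model under stated assumptions: the `Cpu` struct, the `push`/`pop` helpers and the body of this stand-in more_fun() are invented for illustration, and the stack is modelled as a small array rather than real memory.

```c
#include <assert.h>
#include <stdint.h>

typedef struct {
    uint32_t r[16];      /* r0-r12, sp (r13), lr (r14), pc (r15) */
    uint32_t stack[8];   /* toy stack, one word per slot */
    unsigned depth;      /* number of words currently pushed */
} Cpu;

static void push(Cpu *c, uint32_t v)
{
    assert(c->depth < 8);
    c->stack[c->depth++] = v;
}

static uint32_t pop(Cpu *c)
{
    assert(c->depth > 0);
    return c->stack[--c->depth];
}

/* Hypothetical callee: free to destroy r0-r3, must leave r4-r11 alone. */
static void more_fun(Cpu *c)
{
    c->r[0] = c->r[0] + c->r[1];                  /* return value in r0 */
    c->r[1] = c->r[2] = c->r[3] = 0xdeadbeefu;    /* trash the volatile registers */
}

/* Mirrors: push {r4, lr}; movs r4, r0; bl more_fun; movs r0, r4; pop {r4, pc} */
void fun(Cpu *c)
{
    push(c, c->r[4]);      /* preserve the caller's r4 */
    push(c, c->r[14]);     /* preserve our own return address */
    c->r[4] = c->r[0];     /* park `a` where more_fun must not touch it */
    c->r[14] = 0x1001u;    /* bl overwrites lr (placeholder return address) */
    more_fun(c);
    c->r[0] = c->r[4];     /* return(a) */
    c->r[15] = pop(c);     /* saved lr goes to pc */
    c->r[4] = pop(c);      /* caller gets its r4 back */
}
```

After fun() runs, r0 holds the original first parameter, r4 holds whatever the caller had in it, and pc holds the caller's return address, even though more_fun() destroyed r1-r3 and the bl destroyed lr.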
extern unsigned int more_fun ( unsigned int, unsigned int );
unsigned int fun ( unsigned int a, unsigned int b )
{
    b<<=more_fun(a,b);
    return(a+b);
}
00000000 <fun>:
0: b570 push {r4, r5, r6, lr}
2: 0005 movs r5, r0
4: 000c movs r4, r1
6: f7ff fffe bl 0 <more_fun>
a: 4084 lsls r4, r0
c: 1960 adds r0, r4, r5
e: bd70 pop {r4, r5, r6, pc}
we did it again we got the compiler to have to save a register we didnt use to keep the alignment. And we are using more of the stack but would you call that a stack frame? We forced the compiler to have to preserve both incoming parameters through a subroutine call.
extern unsigned int more_fun ( unsigned int, unsigned int );
unsigned int fun ( unsigned int a, unsigned int b, unsigned int c, unsigned int d )
{
    b<<=more_fun(b,c);
    c<<=more_fun(c,d);
    d<<=more_fun(b,d);
    return(a+b+c+d);
}
0: b5f8 push {r3, r4, r5, r6, r7, lr}
2: 000c movs r4, r1
4: 0007 movs r7, r0
6: 0011 movs r1, r2
8: 0020 movs r0, r4
a: 001d movs r5, r3
c: 0016 movs r6, r2
e: f7ff fffe bl 0 <more_fun>
12: 0029 movs r1, r5
14: 4084 lsls r4, r0
16: 0030 movs r0, r6
18: f7ff fffe bl 0 <more_fun>
1c: 0029 movs r1, r5
1e: 4086 lsls r6, r0
20: 0020 movs r0, r4
22: f7ff fffe bl 0 <more_fun>
26: 4085 lsls r5, r0
28: 19a4 adds r4, r4, r6
2a: 19e4 adds r4, r4, r7
2c: 1960 adds r0, r4, r5
2e: bdf8 pop {r3, r4, r5, r6, r7, pc}
what is it going to take? we at least did get it to save r3 to even out the stack. I bet we can push it now...
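The padding seen in the listings so far follows a simple rule of thumb, sketched below. It assumes the push is the only stack adjustment in the prologue and that each register takes 4 bytes; the function names are illustrative, not anything the compiler exposes.

```c
#include <assert.h>

/* Registers actually pushed when `needed` registers must be saved:
   round up to an even count so sp stays a multiple of 8 bytes. */
unsigned padded_push_count(unsigned needed)
{
    assert(needed > 0);
    return (needed + 1u) & ~1u;
}

/* Bytes the stack pointer moves for that push. */
unsigned push_bytes(unsigned needed)
{
    return 4u * padded_push_count(needed);
}
```

This matches the examples: lr alone becomes a push of r4 and lr (1 to 2), r4, r5 and lr gain r6 (3 to 4), and r4-r7 plus lr gain r3 (5 to 6, or 24 bytes).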
extern unsigned int more_fun ( unsigned int, unsigned int );
unsigned int fun ( unsigned int a, unsigned int b, unsigned int c, unsigned int d, unsigned int e, unsigned int f )
{
    b<<=more_fun(b,c);
    c<<=more_fun(c,d);
    d<<=more_fun(b,d);
    e<<=more_fun(e,d);
    f<<=more_fun(e,f);
    return(a+b+c+d+e+f);
}
00000000 <fun>:
0: b5f0 push {r4, r5, r6, r7, lr}
2: 46c6 mov lr, r8
4: 000c movs r4, r1
6: b500 push {lr}
8: 0011 movs r1, r2
a: 0007 movs r7, r0
c: 0020 movs r0, r4
e: 0016 movs r6, r2
10: 001d movs r5, r3
12: f7ff fffe bl 0 <more_fun>
16: 0029 movs r1, r5
18: 4084 lsls r4, r0
1a: 0030 movs r0, r6
1c: f7ff fffe bl 0 <more_fun>
20: 0029 movs r1, r5
22: 4086 lsls r6, r0
24: 0020 movs r0, r4
26: f7ff fffe bl 0 <more_fun>
2a: 4085 lsls r5, r0
2c: 9806 ldr r0, [sp, #24]
2e: 0029 movs r1, r5
30: f7ff fffe bl 0 <more_fun>
34: 9b06 ldr r3, [sp, #24]
36: 9907 ldr r1, [sp, #28]
38: 4083 lsls r3, r0
3a: 0018 movs r0, r3
3c: 4698 mov r8, r3
3e: f7ff fffe bl 0 <more_fun>
42: 9b07 ldr r3, [sp, #28]
44: 19a4 adds r4, r4, r6
46: 4083 lsls r3, r0
48: 19e4 adds r4, r4, r7
4a: 1964 adds r4, r4, r5
4c: 4444 add r4, r8
4e: 18e0 adds r0, r4, r3
50: bc04 pop {r2}
52: 4690 mov r8, r2
54: bdf0 pop {r4, r5, r6, r7, pc}
56: 46c0 nop ; (mov r8, r8)
Okay, so that is how it is going to be...
extern unsigned int more_fun ( unsigned int, unsigned int );
extern void not_dead ( unsigned int *);
unsigned int fun ( unsigned int a, unsigned int b )
{
    unsigned int x[16];
    unsigned int ra;
    for(ra=0;ra<16;ra++)
    {
        x[ra]=more_fun(a+ra,b);
    }
    not_dead(x);
    return(ra);
}
00000000 <fun>:
0: b5f0 push {r4, r5, r6, r7, lr}
2: 0006 movs r6, r0
4: b091 sub sp, #68 ; 0x44
6: 0004 movs r4, r0
8: 000f movs r7, r1
a: 466d mov r5, sp
c: 3610 adds r6, #16
e: 0020 movs r0, r4
10: 0039 movs r1, r7
12: f7ff fffe bl 0 <more_fun>
16: 3401 adds r4, #1
18: c501 stmia r5!, {r0}
1a: 42b4 cmp r4, r6
1c: d1f7 bne.n e <fun+0xe>
1e: 4668 mov r0, sp
20: f7ff fffe bl 0 <not_dead>
24: 2010 movs r0, #16
26: b011 add sp, #68 ; 0x44
28: bdf0 pop {r4, r5, r6, r7, pc}
2a: 46c0 nop ; (mov r8, r8)
And there is your stack frame, but it doesn't really have a frame pointer and doesn't use the stack to access stuff. We would have to keep working harder to see that; very doable. But hopefully by now you see my point. Your question is about how stack frames are structured in compiled code, in particular how a compiler might implement that for a particular target.
BTW this is what clang did with that code.
00000000 <fun>:
0: b5b0 push {r4, r5, r7, lr}
2: af02 add r7, sp, #8
4: b090 sub sp, #64 ; 0x40
6: 460c mov r4, r1
8: 4605 mov r5, r0
a: f7ff fffe bl 0 <more_fun>
e: 9000 str r0, [sp, #0]
10: 1c68 adds r0, r5, #1
12: 4621 mov r1, r4
14: f7ff fffe bl 0 <more_fun>
18: 9001 str r0, [sp, #4]
1a: 1ca8 adds r0, r5, #2
1c: 4621 mov r1, r4
1e: f7ff fffe bl 0 <more_fun>
22: 9002 str r0, [sp, #8]
24: 1ce8 adds r0, r5, #3
26: 4621 mov r1, r4
28: f7ff fffe bl 0 <more_fun>
2c: 9003 str r0, [sp, #12]
2e: 1d28 adds r0, r5, #4
30: 4621 mov r1, r4
32: f7ff fffe bl 0 <more_fun>
36: 9004 str r0, [sp, #16]
38: 1d68 adds r0, r5, #5
3a: 4621 mov r1, r4
3c: f7ff fffe bl 0 <more_fun>
40: 9005 str r0, [sp, #20]
42: 1da8 adds r0, r5, #6
44: 4621 mov r1, r4
46: f7ff fffe bl 0 <more_fun>
4a: 9006 str r0, [sp, #24]
4c: 1de8 adds r0, r5, #7
4e: 4621 mov r1, r4
50: f7ff fffe bl 0 <more_fun>
54: 9007 str r0, [sp, #28]
56: 4628 mov r0, r5
58: 3008 adds r0, #8
5a: 4621 mov r1, r4
5c: f7ff fffe bl 0 <more_fun>
60: 9008 str r0, [sp, #32]
62: 4628 mov r0, r5
64: 3009 adds r0, #9
66: 4621 mov r1, r4
68: f7ff fffe bl 0 <more_fun>
6c: 9009 str r0, [sp, #36] ; 0x24
6e: 4628 mov r0, r5
70: 300a adds r0, #10
72: 4621 mov r1, r4
74: f7ff fffe bl 0 <more_fun>
78: 900a str r0, [sp, #40] ; 0x28
7a: 4628 mov r0, r5
7c: 300b adds r0, #11
7e: 4621 mov r1, r4
80: f7ff fffe bl 0 <more_fun>
84: 900b str r0, [sp, #44] ; 0x2c
86: 4628 mov r0, r5
88: 300c adds r0, #12
8a: 4621 mov r1, r4
8c: f7ff fffe bl 0 <more_fun>
90: 900c str r0, [sp, #48] ; 0x30
92: 4628 mov r0, r5
94: 300d adds r0, #13
96: 4621 mov r1, r4
98: f7ff fffe bl 0 <more_fun>
9c: 900d str r0, [sp, #52] ; 0x34
9e: 4628 mov r0, r5
a0: 300e adds r0, #14
a2: 4621 mov r1, r4
a4: f7ff fffe bl 0 <more_fun>
a8: 900e str r0, [sp, #56] ; 0x38
aa: 350f adds r5, #15
ac: 4628 mov r0, r5
ae: 4621 mov r1, r4
b0: f7ff fffe bl 0 <more_fun>
b4: 900f str r0, [sp, #60] ; 0x3c
b6: 4668 mov r0, sp
b8: f7ff fffe bl 0 <not_dead>
bc: 2010 movs r0, #16
be: b010 add sp, #64 ; 0x40
c0: bdb0 pop {r4, r5, r7, pc}
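The two compilers reserve different amounts for the same 64-byte array (gcc subtracts 68, clang subtracts 64), and the difference lines up with how many registers each one pushed. A sketch of that arithmetic, assuming the only goal is to keep the total stack adjustment a multiple of 8 bytes; `locals_area_size` is an illustrative name:

```c
#include <assert.h>

/* Smallest locals area of at least `locals_bytes` such that the pushed
   registers (4 bytes each) plus the area keep sp 8-byte aligned. */
unsigned locals_area_size(unsigned pushed_regs, unsigned locals_bytes)
{
    assert(locals_bytes % 4u == 0);
    unsigned total = 4u * pushed_regs + locals_bytes;
    unsigned padding = (8u - total % 8u) % 8u;
    return locals_bytes + padding;
}
```

gcc pushed five registers (20 bytes), so 64 bytes of array needs 4 bytes of padding, giving 68. clang pushed four registers (16 bytes), so 64 is already enough.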
Now you used the term call stack. The calling convention used by this compiler says to use r0-r3 when possible to pass in the first parameters, then use the stack after that.
unsigned int fun ( unsigned int a, unsigned int b, unsigned int c, unsigned int d, unsigned int e )
{
    return(a+b+c+d+e);
}
00000000 <fun>:
0: b510 push {r4, lr}
2: 9c02 ldr r4, [sp, #8]
4: 46a4 mov r12, r4
6: 4463 add r3, r12
8: 189b adds r3, r3, r2
a: 185b adds r3, r3, r1
c: 1818 adds r0, r3, r0
e: bd10 pop {r4, pc}
so having more than four parameters the first four are in r0-r3 and then the "call stack" assuming that is what you were referring to is the fifth parameter. The thumb instruction set uses bl as its main call instruction which uses r14 as the return address, unlike other instruction sets that might use the stack to store the return address, arm uses a register. And the popular arm calling conventions use registers for the first few operands then use the stack after that.
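The sp-relative offsets used to reach the stacked parameters follow from how far the prologue moved the stack pointer. A minimal sketch, assuming every parameter is a 32-bit word and that the caller placed the fifth parameter at the top of its stack just before the call; `stack_arg_offset` is an illustrative name:

```c
#include <assert.h>

/* sp-relative byte offset of parameter `index` (0-based) after the prologue
   has pushed `pushed_regs` registers and reserved `locals_bytes` of locals. */
unsigned stack_arg_offset(unsigned index, unsigned pushed_regs, unsigned locals_bytes)
{
    assert(index >= 4);   /* parameters 0-3 arrive in r0-r3, not on the stack */
    return 4u * pushed_regs + locals_bytes + 4u * (index - 4u);
}
```

With two registers pushed, the fifth parameter is at offset 8, matching ldr r4, [sp, #8] above. In the earlier six-parameter example six registers were pushed in total, which puts e at offset 24 and f at offset 28, matching the [sp, #24] and [sp, #28] loads in that listing.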
You would want to look at other instruction sets to see more of a call stack
00000000 <_fun>:
0: 1d80 0008 mov 10(sp), r0
4: 6d80 000a add 12(sp), r0
8: 6d80 0006 add 6(sp), r0
c: 6d80 0004 add 4(sp), r0
10: 6d80 0002 add 2(sp), r0
14: 0087 rts pc
thanks a lot! This really helps understanding how stack frames are built. I don't understand 100% of it, but I'll dissect it and study up on the parts I'm unfamiliar with. Thanks again!
– eddie garcia
Aug 29 at 22:00
no problem.....
– old_timer
Aug 29 at 22:05
On ARM systems, many automatic variables are stored in registers rather than being allocated space on the stack. ARM has a lot of registers compared to other processors. When a function (context) calls into another function, those registers could be overwritten. Compiler writers have two choices: 1) save all registers on entry into (at the top of) every function, or 2) save the registers that the function is using at whatever point it calls into another function.
The caller has full context, so it is more efficient to save only the registers that are in use. The ARM ABI defines the conventions that most compilers use. This enables function libraries from different compilers to inter-operate.
The registers r0-r3 are caller-saved, so the values of these registers have to be saved before calling any function if they are still needed after the function call. – Ctx
Aug 28 at 23:20