Which Register Refers To Top Of Stack X86_64

Lecture 8: Assembly Language, Calling Convention, and the Stack

» Lecture video (Brown ID required)
» Lecture code
» Post-Lecture Quiz (due 6pm Monday, Feb 24).

Assembly, continued

Last time, we looked at assembly code and developed an intuition for how to read assembly language instructions. But all programs we looked at contained only straight-line control flow, meaning that the assembly instructions just execute one after another until the processor hits the ret instruction. Real programs contain conditional (if) statements, loops (for, while), and function calls. Today, we will understand how those concepts in the C language translate into assembly, and then build up an understanding of the resulting memory layout that reveals how a dangerous class of computer security attacks is enabled by seemingly innocuous C programs.

Control Flow

Your computer's processor is incredibly dumb: given the memory address of an instruction, it goes and executes that instruction, then executes the next instruction in memory, then the next, etc., until there are no more instructions to run. Control flow instructions change that default behavior by changing where in memory the processor gets its next instruction from.

The role of the %rip register
The %rip register on x86-64 is a special-purpose register that always holds the memory address of the next instruction to execute in the program's code segment. The processor advances %rip automatically past each instruction it executes, and control flow instructions like branches set the value of %rip to change the next instruction.
Perhaps surprisingly, %rip also shows up when an assembly program refers to a global variable. See the sidebar under "Addressing modes" below to understand how %rip-relative addressing works.

Deviations from sequential instruction execution, such as function calls, loops, and conditionals, are called control flow transfers.

A branch instruction jumps to the instruction following a label in the assembly program. Recall that labels are lines that end with a colon (e.g., .L3:) in the assembly generated by the compiler. In an executable or object file, the labels are replaced by actual memory addresses, so if you disassemble such a file (objdump -d FILE), you will see memory addresses as the branch target instead.

Here is an example of the assembly generated by a program that contains an if statement (controlflow01.c):

          .LFB0:
                  movl    a(%rip), %eax
                  cmpl    b(%rip), %eax
                  jl      .L4
          .L1:
                  rep ret
          .L4:
                  movl    $0, %eax
                  jmp     .L1

Two of these lines contain branch instructions: jl .L4 and the final jmp .L1.

There are two kinds of branches: unconditional and conditional. The jmp or j instruction (the last line above) executes an unconditional branch, and control flow always jumps to the branch target (here, .L1). All other branch instructions are conditional: they only branch if some condition holds. That condition is represented by condition flags that are set as a side effect of most arithmetic operations the processor runs. In the example program above, the instruction that sets the flags is cmpl, which is a "compare" instruction that the processor internally executes as a subtraction of its first argument from its second argument, setting the flags and throwing away the result.

Arithmetic instructions change part of the %rflags register. The most commonly used flags are:

ZF (zero flag): set iff the result was zero.
SF (sign flag): set iff the result, when considered as a signed integer, was negative, i.e., iff the most significant bit (the sign bit) of the result was one.
CF (carry flag): set iff the result overflowed when considered an unsigned value (i.e., the result was greater than 2^{W}-1 for a value of width W bits); for a subtraction such as cmp, CF is set iff a borrow occurred, i.e., the unsigned result would have been negative.
OF (overflow flag): set iff the result overflowed when considered a signed value (i.e., the result was greater than 2^{W-1}-1 or less than -2^{W-1} for a value of width W bits).

Although a few instructions let you load specific flags into the flags register, code usually accesses flags via a conditional jump or a conditional move instruction.

You will frequently encounter the test and cmp instructions before a conditional branch. As mentioned above, these operations perform arithmetic but throw away the result (rather than storing it in the destination register), but set the flags. test performs binary AND, while cmp performs subtraction, and both set the flags according to the result.

Below is a table of the common branch instructions on the x86-64 architecture and the flags they look at to decide whether to branch and execute the next instruction at the branch target, or whether to continue execution with the next sequential instruction after the branch.

Instruction	Mnemonic	C example	Flags
j (jmp)	Jump	`break;`	(Unconditional)
je (jz)	Jump if equal (zero)	`if (x == y)`	ZF
jne (jnz)	Jump if not equal (nonzero)	`if (x != y)`	!ZF
jg (jnle)	Jump if greater	`if (x > y)`, signed	!ZF && !(SF ^ OF)
jge (jnl)	Jump if greater or equal	`if (x >= y)`, signed	!(SF ^ OF)
jl (jnge)	Jump if less	`if (x < y)`, signed	SF ^ OF
jle (jng)	Jump if less or equal	`if (x <= y)`, signed	(SF ^ OF) || ZF
ja (jnbe)	Jump if above	`if (x > y)`, unsigned	!CF && !ZF
jae (jnb)	Jump if above or equal	`if (x >= y)`, unsigned	!CF
jb (jnae)	Jump if below	`if (x < y)`, unsigned	CF
jbe (jna)	Jump if below or equal	`if (x <= y)`, unsigned	CF || ZF
js	Jump if sign bit	`if (x < 0)`, signed	SF
jns	Jump if not sign bit	`if (x >= 0)`, signed	!SF
jc	Jump if carry bit	N/A	CF
jnc	Jump if not carry bit	N/A	!CF
jo	Jump if overflow bit	N/A	OF
jno	Jump if not overflow bit	N/A	!OF

Loops

Conditional branch instructions and flags are sufficient to support both conditional statements (if (...) { ... } else { ... } blocks in C) and loops (for (...) { ... }, while (...) { ... }, and do { ... } while (...)). For a conditional, the branch jumps if the condition is true (or false, depending on how the compiler lays out the assembly) and falls through to the next instruction otherwise. For a loop, the assembly will contain a conditional branch at the end of the loop body that checks the loop condition; if it is still satisfied, the branch jumps back to a label (or address) at the top of the loop.

When you see a conditional branch in assembly code whose target is a label or address above the branch instruction, it is almost always a loop.

Consider the example in controlflow02.s, and the corresponding program in controlflow02.c. Let's focus on the assembly code following the .L3 label:

          .L3:
                  movslq  (%rdx), %rcx
                  addq    %rcx, %rax
                  addq    $4, %rdx
                  cmpq    %rsi, %rdx
                  jne     .L3
                  rep ret
          [...]

Here, the loop variable is held in register %rdx, and the value that the loop variable is compared to on each iteration is in %rsi. (You can infer this from the fact that these registers are the only ones that appear in a comparison.) The instruction above cmpq increments the loop variable by 4 every time the loop executes. Finally, the loop's body consists of the two instructions above the addq $4, %rdx instruction: the first dereferences a pointer in %rdx and puts the value at the memory address it points to into register %rcx (sign-extending the 4-byte value to 8 bytes, as the "s" and "lq" in movslq indicate), and the second adds that value to the contents of %rax. Since nothing else modifies %rax inside the loop, it is incremented by the value pointed to by %rdx on every iteration: this loop iterates over integers in memory via pointer arithmetic.

Addressing Modes

We have already seen a few ways in which an assembly instruction's operands can be written. In particular, the loop example contains (%rdx), which dereferences the address stored in register %rdx.

The full, general form of a memory operand is offset(base, index, scale), which refers to the address offset + base + index*scale. In 0x18(%rax, %rbx, 4), %rax is the base, 0x18 the offset, %rbx the index, and 4 the scale. The offset (if used) must be a constant and the base and index (if used) must be registers; the scale must be either 1, 2, 4, or 8. In other words, if we write this as N(%reg1, %reg2, M), the address computed is %reg1 + N + %reg2 * M.

The default offset, base, and index are 0, and the default scale is 1, and instructions omit these parts if they have their default values. You will most frequently see instructions of the form offset(%register), which add the offset to the address in the register and then dereference the result. But occasionally, you may come across instructions that use both base and index registers, or which use the full general form.

Beneath is a handy overview table containing all the possible ways of writing operands to assembly instructions.

Type	Example syntax	Value used
Register	`%rbp`	Contents of `%rbp`
Immediate	`$0x4`	0x4
Memory	`0x4`	Value stored at address 0x4
	`symbol_name`	Value stored in global `symbol_name` (the compiler resolves the symbol name to an address when creating the executable)
	`symbol_name(%rip)`	`%rip`-relative addressing for globals (see below)
	`symbol_name+4(%rip)`	Simple computations on symbols are allowed (the compiler resolves the computation when creating the executable)
	`(%rax)`	Value stored at address in `%rax`
	`0x4(%rax)`	Value stored at address `%rax + 4`
	`(%rax,%rbx)`	Value stored at address `%rax + %rbx`
	`(%rax,%rbx,4)`	Value stored at address `%rax + %rbx*4`
	`0x18(%rax,%rbx,4)`	Value stored at address `%rax + 0x18 + %rbx*4`

%rip-relative addressing for global variables
x86-64 code often refers to globals using %rip-relative addressing: a global variable named a is referenced as a(%rip). This style of reference supports position-independent code (PIC), which makes it possible to load a program at a random address, a security feature. It specifically supports position-independent executables (PIEs), which are programs that work independently of where their code is loaded into memory.

When the operating system loads a PIE, it picks a random starting point and loads all instructions and globals relative to that starting point. The PIE's instructions never refer to global variables using direct addressing: there is no movl global_int, %eax. Globals are referenced relatively instead, using deltas relative to the next %rip: to load a global variable into a register, the compiler emits movl global_int(%rip), %eax. These relative addresses work independently of the starting point! For instance, consider an instruction located at (starting-point + 0x80) that loads a variable g located at (starting-point + 0x1000) into %rax. In a non-PIE, the instruction might be written as movq g, %rax; but this relies on g having a fixed address. In a PIE, the instruction might be written movq g(%rip), %rax, which works without knowing the starting address of the program's code in memory at compile time: %rip always holds an address a known number of bytes past the starting point, so any address relative to %rip is also relative to the starting point.


At starting point…	The `mov` instruction is at…	The next instruction is at…	And `g` is at…	So the delta (`g` - next `%rip`) is…
0x400000	0x400080	0x400087	0x401000	0xF79
0x404000	0x404080	0x404087	0x405000	0xF79
0x4003F0	0x400470	0x400477	0x4013F0	0xF79

Calling Convention

We discussed conditionals and loops, but there is a third type of control flow: function calls. Assembly language has no functions, just sequences of instructions. Function calls therefore translate into control flow involving branches, but we need a bit more than that: functions can take arguments, and the compiler had better make sure that the arguments are available after it jumps to a function's instructions!

Defining how function calls and returns work, where a function can expect to find its arguments, and where it must place its return value is the business of a calling convention. A calling convention governs how functions on a particular architecture and operating system interact in assembly code. This includes rules on how function arguments are placed, where return values go, what registers functions may use, how they may allocate local variables, and others.

Why do we need calling conventions?
Calling conventions ensure that functions compiled by different compilers can interoperate, and they ensure that operating systems can run code from different programming languages and compilers. For instance, you can call into C code from Python, or link C code compiled with gcc and code compiled with clang. This is possible only because the Python libraries that call into C code understand its calling convention, and because the gcc and clang compilers' authors agree on the calling convention to use.

Some aspects of a calling convention are derived from the instruction set itself and embedded into the architecture (e.g., via special-purpose registers modified as a side effect of certain instructions), but some are conventional, meaning they were decided upon by people (for instance, at a convention), and may differ across operating systems and compilers.

Programs call01.c to call06.c and their corresponding assembly in call01.s to call06.s help us figure out the calling convention for x86-64 on the Linux operating system!

Some basic rules are:

The first six (integer or pointer) function arguments are passed in registers %rdi, %rsi, %rdx, %rcx, %r8, and %r9 (in this order; see the register list from last lecture).
The seventh and subsequent arguments are passed on the stack (more on this below).
The return value is passed in register %rax.

There are actually several other rules, which govern things like how to pass data structures that are larger than a register (e.g., a struct), floating point numbers, etc. If you're interested, you can find all the details in the AMD64 ABI, section 3.2.3.

call04.s illustrates the rule about the first six arguments best: they are passed directly in registers. The other examples (e.g., call01 to call03) are compiled without optimizations and have somewhat more complex assembly code, which takes the values from registers, writes them onto the stack (more on that below), then moves them into registers again. The unoptimized programs seemingly pointlessly write all their arguments to memory in the stack segment because arguments are local variables of a function, and since local variables have automatic lifetime, they're technically stored in the stack segment. With optimizations, the compiler is smart enough to realize that it can skip actually storing them, so it uses the registers containing the arguments directly.

The Stack

You will recall the stack segment of memory from earlier lectures: it is where all variables with automatic lifetime are stored. These include local variables declared within functions, but importantly also function arguments.

Recall that call01.s to call03.s contained a bunch of instructions referring to %rsp, such as this implementation of the function f() (from call01.s):

                  movl    %edi, -4(%rsp)
                  movl    -4(%rsp), %eax
                  ret

The first movl stores the first argument (a 4-byte integer, passed in %edi) at an address four bytes below the address stored in register %rsp; the second movl instruction takes that value in memory and loads it into register %eax.

The %rsp register is called the stack pointer. It always points to the "top" of the stack, which is at the lowest (leftmost) address currently used in the stack segment. At the start of the function, any memory to the left of where %rsp points is therefore unused; any memory to the right of where it points is used. This explains why the code stores the argument at address %rsp - 4: it's the first four-byte slot available on the stack, to the left of the currently used memory. (The code can use this slot without moving %rsp because, on x86-64 Linux, a function that calls no other functions may use the 128 bytes below %rsp, known as the "red zone".)

In other words, these instructions added the blue parts of the picture below to the stack memory.

We can give names to the memory on the left and right of the address where %rsp points in the stack. These are called stack frames, where each stack frame corresponds to the data associated with one function call. The memory on the right of the address pointed to by %rsp at the point f() gets called is the stack frame of whichever function calls f(). This function is named the caller (the function that calls), while f() is the callee (the function being called).

The memory on the right of the %rsp address at the point of f() being called (we refer to this as "entry %rsp") is the caller's stack frame (red below), and the memory to its left is the callee's stack frame.

The arguments and local variables of f() live inside f()'s stack frame. Subsequent arguments (second, third, fourth, etc.) are stored at successively lower addresses below %rsp (see call02.s and call03.s for examples with more arguments), followed eventually by any local variables of the callee.

How does %rsp modify?
The convention is that %rsp always points to the lowest (leftmost) stack address that is currently used. This means that when a function declares a new local variable, %rsp has to move down (left), and when a function returns, %rsp has to move up (right) and back to where it was when the function was originally called.

Moving %rsp happens in two ways: explicit modification via arithmetic instructions, and implicit modification as a side effect of special instructions. The former happens when the compiler knows exactly how many bytes a function requires %rsp to move by, and involves instructions like subq $0x10, %rsp, which moves the stack pointer down by 16 bytes. The latter, side-effect modification happens when the push and pop instructions run. push writes the contents of a register onto the stack memory immediately to the left of the current %rsp and moves %rsp to point to the beginning of this new data; pop does the reverse, reading the value at %rsp into a register and moving %rsp back up. For example, pushq %rax writes the 8 bytes from register %rax at address %rsp - 8 and sets %rsp to that address; apart from subq also setting the flags, it is equivalent to movq %rax, -8(%rsp); subq $8, %rsp or subq $8, %rsp; movq %rax, (%rsp). Likewise, popq %rax is equivalent to movq (%rsp), %rax; addq $8, %rsp (again apart from the flags).

As an optimization, the compiler may choose to avoid writing arguments onto the stack. It does this for up to six arguments, which per calling convention are held in specific registers. call04.s shows this: the C code we compile it from (call04.c) is identical to the code in call03.c; only the optimization settings differ.

But there is a limited number of registers in the x86-64 architecture, and you can write functions in C that take any number of arguments! The calling convention says that at most the first six arguments are passed in registers, and that the 7th and subsequent arguments are always passed in memory on the stack. Specifically, these arguments go into the caller's stack frame, so they are stored above the entry %rsp at the point where the function is called (see call05.{c,s} and call06.{c,s}).

Return Address

As a function executes, it eventually reaches a ret instruction in its assembly. The effect of ret is to return to the caller (a form of control flow, as the next instruction needs to change). But how does the processor know what instruction to execute next, and what to set %rip to?

It turns out that the stack plays a role here, too. In a nutshell, each function call stores the return address as the very first (i.e., rightmost) data in the callee's stack frame. (If the function called takes more than 6 arguments, the return address is to the left of the 7th argument in the caller's stack frame.)

The stored return address makes it possible for each function to know exactly where to continue execution once it returns to its caller. (However, storing the return address on the stack also has some dangerous consequences, as we will see shortly.)

We can now define the full function entry and exit sequence. Both the caller and the callee have responsibilities in this sequence.

To prepare for a function call, the caller performs the following tasks:

The caller stores the first six arguments in the respective registers.
If the callee takes more than 6 arguments, or if some of its arguments are large, the caller must store the surplus arguments on its stack frame (in increasing order). The 7th argument must be stored at (%rsp) (that is, the top of the stack) when the caller executes its callq instruction.
The caller saves any caller-saved registers (see last lecture's list). These are registers whose values the callee might overwrite, but which the caller needs to retain for later use.
The caller executes callq FUNCTION. This has an effect like pushq $NEXT_INSTRUCTION; jmp FUNCTION (or, equivalently, subq $8, %rsp; movq $NEXT_INSTRUCTION, (%rsp); jmp FUNCTION), where NEXT_INSTRUCTION is the address of the instruction immediately following callq.

To return from a function, the callee does the following:

The callee places its return value in %rax.
The callee restores the stack pointer to its value at entry ("entry %rsp"), if necessary.
The callee executes the retq instruction. This has an effect like popq %rip, which removes the return address from the stack and jumps to that address (because the instruction writes it into the special %rip register).
Finally, the caller cleans up any space it prepared for arguments and restores caller-saved registers if necessary.

Base Pointers and the `%rbp` Register

Keeping track of the entry %rsp can be tricky with more complex functions that allocate lots of local variables and modify the stack in complex ways. For these cases, the x86-64 Linux calling convention allows for the use of another register, %rbp, as a special-purpose register.

%rbp holds the address of the base of the current stack frame: that is, the rightmost (highest) address that points to a value still part of the current stack frame. This corresponds to the rightmost address of an object in the callee's stack, and to the first address that isn't part of an argument to the callee or one of its local variables. It is called the base pointer, since the address points at the "base" of the callee's stack frame (if %rsp points to the "top", %rbp points to the "base" (= bottom)). The %rbp register maintains this value for the whole execution of the function (i.e., the function may not overwrite the value in that register), even as %rsp changes.

This scheme has the advantage that when the function exits, it can restore its original entry %rsp by loading it from %rbp. In addition, it also facilitates debugging because each function stores the old value of %rbp to the stack at its point of entry. The 8 bytes holding the caller's %rbp are the very first thing stored inside the callee's stack frame, and they are right below the return address in the caller's stack frame. This means that the saved %rbps form a chain that allows each function to locate the base of its caller's stack frame, where it will find the %rbp of the "grand-caller's" stack frame, etc. Backtraces like the ones you see in GDB and in Address Sanitizer error messages can be generated by walking this chain!

Therefore, with a base pointer, the function entry sequence becomes:

The first instruction executed by the callee on function entry is pushq %rbp. This saves the caller's value for %rbp into the callee's stack. (Since %rbp is callee-saved, the callee is responsible for saving it.)
The second instruction is movq %rsp, %rbp. This saves the current stack pointer in %rbp (so %rbp = entry %rsp - 8).

This adjusted value of %rbp is the callee's "frame pointer" or base pointer. The callee will not change this value until it returns. The frame pointer provides a stable reference point for local variables and caller arguments. (Complex functions may need a stable reference point because they reserve varying amounts of space.)

Note, also, that the value stored at (%rbp) is the caller's %rbp, and the value stored at 8(%rbp) is the return address. Debuggers can use this information to trace backwards through the calls (a process called "stack unwinding").
The function ends with movq %rbp, %rsp; popq %rbp; retq, or, equivalently, leave; retq. This sequence is the last thing the callee does, and it restores the caller's %rbp and entry %rsp before returning.

You can find an example of this in call07.s. Lab 3 also uses the %rbp-based calling convention, so make sure you keep the extra 8 bytes for storing the caller's %rbp on the stack in mind!

Buffer overflow attacks

Now that we understand the calling convention and the stack, let's take a step back and think about some of the consequences of this well-defined memory layout. While a callee is not supposed to access its caller's stack frame (unless it's explicitly passed a pointer to an object within it), there is no principled mechanism in the x86-64 architecture that prevents such access.

In particular, if you can guess the address of a variable on the stack (either a local within the current function or a local/argument in a caller of the current function), your program can just write data to that address and overwrite whatever is there.

This can happen accidentally (due to bugs), but it becomes a much bigger problem if done deliberately by malicious actors: a user might provide input that causes a program to overwrite important data on the stack. This kind of attack is called a buffer overflow attack.

Consider the code in attackme.cc. This program computes checksums of strings provided to it as command line arguments. You don't need to understand in deep detail what it does, but notice that the checksum() function uses a 100-byte stack-allocated buffer (as part of the buf union) to hold the input string, which it copies into that buffer.

A sane execution of attackme might look like this:

          $ ./attackme hey yo CS131
          hey: checksum 00796568, sha1 7aea02175315cd3541b03ffe78aa1ccc40d2e98a  -
          yo: checksum 00006f79, sha1 dcdc24e139db869eb059c9355c89c382de15b987  -
          CS131: checksum 33315374, sha1 05ab4d9aea4f9f0605dc4703ae8cfc44aab7a5ef  -

But what if the user provides an input string longer than 99 characters (remember that we also need the zero terminator in the buffer)? The function just keeps writing, and it will write over whatever is adjacent to buf on the stack.

From our prior pictures, we know that buf will be in checksum's stack frame, below the entry %rsp. Moreover, directly above the entry %rsp is the return address! In this case, that is an address in main(). So, if checksum writes beyond the end of buf, it will overwrite the return address on the stack; if it keeps going further, it will overwrite data in main's stack frame.

Why is overwriting the return address dangerous? It means that a clever attacker can direct the program to execute any function within the program. In the case of attackme.cc, note the run_shell() function, which runs a string as a shell command. This has a lot of nefarious potential: what if we could cause that function to execute with a user-provided string? We could print a lot of sad face emojis to the shell, or, more dangerously, run a command like rm -rf /, which deletes all data on the user's computer!

If we run ./attackme.unsafe (a variant of attackme built with the safety features that modern compilers add to combat these attacks disabled), it behaves as normal with sane strings:

          $ ./attackme.unsafe hey yo CS131
          hey: checksum 00796568, sha1 7aea02175315cd3541b03ffe78aa1ccc40d2e98a  -
          yo: checksum 00006f79, sha1 dcdc24e139db869eb059c9355c89c382de15b987  -
          CS131: checksum 33315374, sha1 05ab4d9aea4f9f0605dc4703ae8cfc44aab7a5ef  -

But if we pass a very long string with more than 100 characters, things get a bit more unusual:

          $ ./attackme.unsafe sghfkhgkfshgksdhrehugresizqaugerhgjkfdhgkjdhgukhsukgrzufaofuoewugurezgureszgukskgreukfzreskugzurksgzukrestgkurzesi
          Segmentation fault (core dumped)

The crash happens because the return address for checksum() was overwritten by garbage from our string, which isn't a valid address. But what if we figure out a valid address and put it in exactly the right place in our string?

This is what the input in attack.txt does. Specifically, using GDB, I figured out that the address of run_shell in my compiled version of the code is 0x400734 (an address in the code/text segment of the executable). attack.txt contains a carefully crafted "payload" that puts the value 0x400734 into the right bytes on the stack. The attack payload is 115 characters long because we need 100 characters to overrun buf, 3 bytes for the malicious return address, and 12 bytes of extra payload because stack frames on x86-64 Linux are aligned to 16-byte boundaries.

Executing this attack works as follows:

          $ ./attackme.unsafe "$(cat attack.txt)"
          OWNED OWNED OWNED OWNED OWNED OWNED
          sh: 7: ��v��: not found
          Segmentation fault (core dumped)

The cat attack.txt shell command simply pastes the contents of the attack.txt file into the string we're passing to the program. (The quotes are required to make sure our attack payload is processed as a single string even if it contains spaces.)

Summary

Today, we concluded our brief tour of assembly language and the low-level concepts of program execution.

We first looked at control flow in assembly, where instructions change what other instructions the processor executes next. In many cases, control flow first involves a flag-setting instruction and then a conditional branch based on the values of the flags register. This allows for conditional statements and loops.

Function calls in assembly are governed by the calling convention of the architecture and operating system used: it determines which registers hold specific values such as arguments and return values, which registers a function may modify, and where on the stack certain data (such as the return address) is stored.

We also understood in more detail how the stack segment of memory is structured and managed, and discussed how it grows and shrinks. Finally, we looked into how the very well-defined memory layout of the stack can become a danger if a program is compromised through a malicious input: by carefully crafting inputs that overwrite part of the stack memory via a buffer overflow, we can change important data and cause a program to execute arbitrary code.

In Lab 3, you will craft and execute buffer overflow attacks on a program yourself!