
Introduction

Control-flow Integrity (CFI) still allows the attacker control over execution along two dimensions: the imprecision of the underlying analysis and CFI’s statelessness; in addition, data-only attacks allow an attacker to influence conditional branches arbitrarily. With CFI, exploits become highly program dependent: the set of reachable gadgets is severely limited by the CFI policy, so exploits must follow valid paths in the CFG.

We present BOPC (Block Oriented Programming Compiler), an automatic framework to evaluate a program’s remaining attack surface under strong control-flow hijacking mitigations. BOPC compiles an “exploit” into a program trace, which is executed on top of the original program’s CFG. To express the desired exploits flexibly, BOPC provides a Turing-complete, high-level language: the SPloit Language (SPL). BOPC takes as input an SPL payload and a starting point and returns a trace through the program that encodes the SPL payload.

First, BOPC translates the SPL payload into constraints for individual statements and, for each statement, searches for basic blocks in the target binary that satisfy these constraints (called candidate blocks). At this point, SPL abstracts register assignments from the underlying architecture. Second, BOPC infers a resource (register and state) mapping for each SPL statement, iterating through the set of candidate blocks and turning them into functional blocks, which can be used to execute a concrete instantiation of the given SPL statement. Third, BOPC constructs a trace that connects each functional block through dispatcher blocks. Since the mapping process is NP-hard, to find a solution in reasonable time BOPC first prunes the set of functional blocks per statement to constrain the search space and then uses a ranking based on the proximity of individual functional blocks as a heuristic when searching for dispatcher gadgets.

Assumptions and threat model

Our threat model consists of a binary with a known memory corruption vulnerability that is protected with state-of-the-art control-flow hijacking mitigations, such as CFI along with a shadow stack. Furthermore, the binary is also hardened with DEP, ASLR, and stack canaries.

We assume that the target binary has an arbitrary memory write vulnerability: the attacker can write any value to any (writable) address, which is called an Arbitrary memory Write Primitive (AWP). To bypass probabilistic defenses such as ASLR, we assume that the attacker has access to an information leak, i.e., a vulnerability that allows her to read any value from any memory address, which is called an Arbitrary memory Read Primitive (ARP).

We also assume that there exists an entry point, i.e., a location that the program reaches naturally after completion of all AWPs (and ARPs). Thus, BOPC does not require code pointer corruption to reach the entry point. Determining an entry point is considered part of the vulnerability discovery process.

These assumptions enable BOPC to inject a payload into a target binary’s address space, modifying its memory state to execute the payload. BOPC assumes that the AWP (and/or ARP) may be triggered multiple times to modify the memory state of the target binary. After the state modification completes, the SPL payload executes without using the AWP (and/or ARP) further.

Design

  1. SPL payload
  2. Selecting functional blocks
  3. Searching for dispatcher blocks
  4. Stitching BOP gadgets

First, BOPC provides an exploit programming language, called SPL, that enables analysts to define exploits independent of the target program or underlying architecture. Second, to automate SPL gadget discovery, BOPC finds basic blocks from the target program that implement individual SPL statements, called functional blocks. Third, to chain basic blocks together in a manner that adheres to CFI and shadow stacks, BOPC searches the target program for sequences of basic blocks that connect pairs of neighboring functional blocks, which we call dispatcher blocks. Fourth, BOPC simulates the BOP chain to produce a payload that implements the SPL payload from a chosen AWP.

The BOPC design builds on two key ideas: Block Oriented Programming and Block Constraint Summaries. First, BOP constructs exploit programs called BOP chains from basic block sequences in the valid CFG of a target program. Each BOP gadget is one functional block that implements a statement in an SPL payload plus zero or more dispatcher blocks that connect the functional block to the next BOP gadget in a manner that complies with the CFG.

Second, BOPC abstracts each basic block from its individual instructions into a Block Constraint Summary, enabling blocks to be employed in a variety of different ways. A single block may perform multiple functional and/or dispatching operations by utilizing different sets of registers for different purposes: a basic block that modifies a register in a manner that may fulfill an SPL statement can be used as a functional block; otherwise, it may serve as a dispatcher block.

BOPC leverages Block Constraint Summaries to apply blocks in multiple contexts. There are two cases: either the candidate dispatcher block’s summary constraints indicate that it will modify the register and/or memory state set by the functional blocks (called the SPL state), or they indicate that it will not, enabling the computation to proceed without disturbing the effects of the functional blocks.

An important distinction between BOP and ROP is that the problem of computing BOP chains is NP-hard. Conventional ROP assumes that indirect control flow may target any executable byte in memory, while BOP must follow a legal path through the CFG for any chain of blocks, resulting in the need for automation.

Expressing Payloads

BOPC provides a programming language, called the SPloit Language, that allows analysts to express exploit payloads in a compact high-level language that is independent of the target program or processor architecture.

The architecture independence of SPL has important advantages. First, the same payload can be executed under different ISAs or operating systems. Second, SPL uses a set of virtual registers, accessed through reserved volatile variables.

The environment for SPL differs from that of conventional languages. Instead of running code directly on the CPU, our compiler encodes the payload as a mapping of instructions to functional blocks. That is, the underlying runtime environment is the target binary and its program state, where payloads are executed as side effects of the underlying binary.

Selecting functional blocks

To generate a BOP chain for an SPL payload, BOPC must find a sequence of blocks that implement each statement in the SPL payload, called functional blocks.

Conceptually, BOPC must compare each block to each SPL statement to determine if the block can implement the statement. However, blocks are expressed in machine code while SPL statements are high-level program statements. BOPC therefore computes Block Constraint Summaries, which define the possible impacts that a block would have on SPL state. Block Constraint Summaries provide flexibility in matching blocks to SPL statements because there are multiple possible mappings of SPL statements and their virtual registers to the block and its constraints on registers and state.

The constraint summaries of each basic block are obtained by isolating and symbolically executing it. The effect of executing a basic block creates a set of constraints, mapping inputs to the resultant outputs. Such constraints refer to registers, memory locations, jump types, and external operations (e.g., library calls).
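As a toy illustration of what a summary captures (the real BOPC derives summaries by symbolically executing lifted VEX blocks with angr), the sketch below summarizes a tiny block as a mapping from output registers to symbolic expressions over its inputs; the block encoding and register names are invented for the example.

```python
# Toy model of a Block Constraint Summary. A block is a list of
# (dst, op, src) register operations; the summary maps each output
# register to an expression string over the block's input registers.

def summarize_block(block):
    """Return {output_register: expression over the block's inputs}."""
    env = {}  # register -> symbolic expression built so far

    def val(x):
        return env.get(x, x) if isinstance(x, str) else str(x)

    for dst, op, src in block:
        if op == "mov":
            env[dst] = val(src)
        elif op == "add":
            env[dst] = f"({val(dst)} + {val(src)})"
    return env

# Block equivalent to: mov rax, rbx ; add rax, 1
summary = summarize_block([("rax", "mov", "rbx"), ("rax", "add", 1)])
print(summary)  # {'rax': '(rbx + 1)'}
```

Such a summary lets the matcher ask "can this block implement `__r0 = __r1 + 1` under some register mapping?" without re-examining the machine code.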

Finding BOP gadgets

The combination of a functional block and its dispatcher blocks is called a BOP gadget. However, a dispatcher path between two functional blocks may not exist, either because there is no legal path in the CFG between them, or because the control flow cannot reach the next block due to unsatisfiable runtime constraints.

BOP gadgets are volatile: gadget feasibility changes based on the selection of prior gadgets for the target binary. The problem of selecting a suitable sequence of functional blocks, such that a dispatcher path exists for every possible control flow transfer in the SPL payload, is NP-hard.

We propose several heuristics and optimizations to find solutions in a reasonable amount of time. BOPC leverages basic block proximity as a metric to “rank” dispatcher paths and organizes this information into a special data structure called a delta graph.

Searching for dispatcher blocks

While each functional block executes a statement, BOPC must chain multiple functional blocks together to execute the SPL payload. Functional blocks are connected through zero or more basic blocks that do not clobber the SPL state computed thus far.

Finding such blocks is challenging; thus, we propose a graph data structure, called the delta graph, to represent the state of the search for dispatcher blocks. The delta graph stores, for each functional block, the shortest path to the next candidate blocks.

The delta graph is a multi-partite, directed graph that has a set of functional block nodes for every payload statement. An edge between two functional blocks represents the minimum number of executed basic blocks to move from one functional block to the other, while avoiding clobbering blocks.

Calculating the shortest path between two basic blocks in a CFG faces a challenge: while indirect transfers in the CFG statically allow multiple targets, at runtime they are context sensitive and have only one concrete target.

Our context-sensitive shortest path algorithm is a recursive version of Dijkstra’s shortest path algorithm that avoids all clobbering blocks. Initially, each edge in the CFG has a cost of 1. When the algorithm encounters a basic block with a call instruction, it recursively calculates the shortest path starting from the called function’s entry block (Be). If the destination block, Bd, is inside the callee, the shortest path is the concatenation of the two individual shortest paths from the beginning to Be and from Be to Bd. Otherwise, our algorithm finds the shortest path from Be to the closest return point and uses this value as an edge weight for that callee.
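A minimal sketch of the intra-procedural core of this search, assuming unit edge costs and a made-up CFG; the recursive expansion of call instructions described above is only noted in a comment, not implemented.

```python
import heapq

# Dijkstra over a CFG that skips clobbering blocks (unit edge cost).
# In BOPC proper, a block ending in a call is expanded recursively: its
# outgoing edge is weighted by the callee's shortest entry-to-return
# path, or the search descends into the callee if the destination lies
# there. The CFG and block names below are invented for the example.

def shortest_dispatch(cfg, src, dst, clobbering):
    dist = {src: 0}
    pq = [(0, src)]
    while pq:
        d, b = heapq.heappop(pq)
        if b == dst:
            return d
        if d > dist.get(b, float("inf")):
            continue
        for nxt in cfg.get(b, []):
            if nxt in clobbering and nxt != dst:
                continue  # would destroy the SPL state built so far
            if d + 1 < dist.get(nxt, float("inf")):
                dist[nxt] = d + 1
                heapq.heappush(pq, (d + 1, nxt))
    return None  # no CFG-legal, non-clobbering path exists

cfg = {"A": ["B", "C"], "B": ["D"], "C": ["D"], "D": []}
print(shortest_dispatch(cfg, "A", "D", clobbering={"B"}))  # 2 (A->C->D)
```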

After creation of the delta graph, our algorithm selects exactly one node (i.e., functional block) from each set (i.e., payload statement) to minimize the total weight of the resulting induced subgraph.

Stitching BOP gadgets

To find a dispatcher between two functional blocks, BOPC leverages concolic execution, which collects the required constraints that are needed to lead the execution to the next functional block.

To construct an SPL payload from a BOP chain, BOPC launches concolic execution from the first functional block in the BOP chain, starting with an empty state. At each step, BOPC tries the first K shortest dispatcher paths until it finds one that reaches the next functional block. The corresponding constraints are added to the current state. The search therefore incrementally adds BOP gadgets to the BOP chain.

Implementation

BOPC requires three distinct inputs:

  • The exploit payload expressed in SPL,
  • The vulnerable application on top of which the payload runs,
  • The entry point in the vulnerable application, which is a location that the program reaches naturally and occurs after all AWPs have been completed.

The output of BOPC is a sequence of (address, value, size) tuples that describes how the memory should be modified during the state modification phase to execute the payload.
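The shape of this output can be illustrated with a small sketch; the addresses and values below are fabricated, and the little-endian byte store mirrors how an attacker would apply each tuple with the AWP.

```python
# Illustrative shape of BOPC's output: (address, value, size) writes
# performed during the state-modification phase. All addresses and
# values here are made up for the example.

writes = [(0x0804a0c0, 0x0806d4b0, 8),  # e.g., seed a pointer for gadget 1
          (0x0804a0c8, 0x00000000, 8)]  # e.g., NULL-terminate an array

def apply_writes(mem, writes):
    """Apply each tuple as a little-endian byte-by-byte store."""
    for addr, value, size in writes:
        for i in range(size):
            mem[addr + i] = (value >> (8 * i)) & 0xFF
    return mem

mem = apply_writes({}, writes)
print(hex(mem[0x0804a0c0]))  # 0xb0 (low byte of the first value)
```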

Binary Frontend

BOPC uses angr to lift the target binary into the VEX intermediate representation and expose the application’s CFG. Then, we translate each basic block into a block constraint summary: BOPC executes each basic block in an isolated environment, where every action (e.g., register and memory accesses) is monitored. Therefore, instead of working with the instructions of each basic block, BOPC utilizes this abstraction for all operations; we refer to the resulting abstracted CFG as CFGa.

SPL Frontend

The SPL Frontend translates the exploit payload into a graph-based IR for further processing. For each statement sequence we build a dependence graph, based on a customized version of Kahn’s topological sorting algorithm, to infer all groups of independent statements, which can be executed out-of-order.
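The grouping idea can be sketched with a Kahn-style sort that emits, at each round, the whole set of currently dependency-free statements as one out-of-order group; statement IDs and dependencies are invented for the example.

```python
from collections import defaultdict

# Kahn-style topological sort that returns groups of statements with no
# remaining dependencies; statements inside a group may be reordered.

def independent_groups(stmts, deps):
    """deps: {stmt: set of statements it depends on}."""
    indeg = {s: len(deps.get(s, set())) for s in stmts}
    users = defaultdict(set)
    for s, ds in deps.items():
        for d in ds:
            users[d].add(s)
    groups = []
    ready = {s for s in stmts if indeg[s] == 0}
    while ready:
        groups.append(sorted(ready))
        nxt = set()
        for s in ready:
            for u in users[s]:
                indeg[u] -= 1
                if indeg[u] == 0:
                    nxt.add(u)
        ready = nxt
    return groups

# s1 and s2 are independent; s3 needs both.
print(independent_groups(["s1", "s2", "s3"], {"s3": {"s1", "s2"}}))
# [['s1', 's2'], ['s3']]
```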

Locating candidate block sets

SPL is a high-level language that hides the underlying ABI. Therefore, BOPC finds all possible ways to map individual elements from the SPL environment to the ABI (through candidate blocks).

Once CFGa and the IR are generated, BOPC searches for and marks candidate basic blocks. For a block to be a candidate, it must semantically match one (or more) payload statements. Note that variable assignments, unconditional jumps, and returns do not require a basic block and are therefore excluded from the search.

All statements that assign or modify registers require the basic block to apply the same operation on the same registers. For function calls, the requirement for the basic block is to invoke the same call, either as a system call or as a library call.

Upon a successful matching, BOPC builds the following data structures:

  • The Register Mapping Graph, which is a bipartite undirected graph. The nodes in the two sets represent the virtual and hardware registers respectively. The edges represent potential associations between them.

  • The Variable Mapping Graph, which associates payload variables to underlying memory addresses.

  • The Memory Dereference Set, which contains all memory addresses that are dereferenced and whose values are loaded into registers.

Identifying functional block sets.

BOPC iteratively identifies, for each SPL statement, which candidate blocks can serve as functional blocks. This step determines for each candidate block if there is a resource mapping that satisfies the block constraints.

BOPC identifies the concrete set of hardware registers and memory addresses that execute the desired statement.

This step determines, for each statement, which concrete registers and memory addresses are reserved. Merging this information with the set of candidate blocks constructs each block’s SPL state, enabling the removal of candidate blocks that are unsatisfiable.

Selecting functional blocks

Given the functional block set Fb, this step searches for a subset that executes all payload statements. The goal is to select exactly one functional block for every IR statement and find dispatcher blocks to chain them together.

Once the delta graph is generated, this step locates the minimum induced subgraph that contains the complete set of functional blocks to execute the SPL payload. If the subgraph does not result in a solution, the algorithm tries the next minimum induced subgraph, until a solution is found or a limit is reached.

If the resulting delta graph does not lead to a solution, this step “shuffles” out-of-order payload statements and builds a new delta graph.

Discovering dispatcher blocks

The simulation phase takes the individual functional blocks and tries to find appropriate dispatcher blocks to compose the BOP gadgets. It returns a set of memory assignments for the corresponding dispatcher blocks.

BOPC is called to find a dispatcher path for every edge in the minimum induced subgraph. That is, we need to simulate every control flow transfer in the adjacency matrix of the SPL payload. However, dispatchers are built on the binary’s execution state so far, so BOP gadgets must be stitched with respect to the program’s current flow, originating from the entry point.

Finding dispatcher blocks relies on concolic execution. Our algorithm utilizes functional block proximity as a metric for dispatcher path quality. However, it cannot predict which constraints will take exponential time to solve. Therefore, concolic execution selects the K shortest dispatcher paths relative to the current BOP chain and tries them in order until one produces a set of satisfiable constraints.
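The trial loop can be sketched as follows, with the satisfiability check standing in for BOPC's concolic execution over angr states; paths, K, and the predicate are all illustrative.

```python
# Try up to K candidate dispatcher paths, shortest first, and keep the
# first whose collected constraints are satisfiable. `satisfiable` is a
# stand-in for running the constraint solver on the concolic state.

def pick_dispatcher(paths, satisfiable, K=8):
    for path in sorted(paths, key=len)[:K]:
        if satisfiable(path):
            return path  # its constraints join the current state
    return None  # fall back: try another functional-block selection

paths = [["a", "b", "c"], ["a", "x", "y", "c"], ["a", "c"]]
# Pretend the solver fails on any path through block "b".
print(pick_dispatcher(paths, lambda p: "b" not in p))  # ['a', 'c']
```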

When simulation starts, it also initializes any SPL variables at locations that are reserved during the variable mapping. These addresses are marked as immutable, so any unintended modification raises an exception.

Simulation traverses the minimum induced subgraph and incrementally extends the SPL state from one BOP gadget to the next, ensuring that newly added constraints remain satisfiable. When encountering a conditional statement, BOPC clones the current state and continues building the trace for both paths independently.

Synthesizing exploits

If the simulation module returns a solution, the final step is to encode the execution trace as a set of memory writes in the target binary.

Discussion and future work

BOPC is limited by the granularity of basic blocks. That is, a combination of basic blocks could potentially lead to execution of a desired SPL statement, while individual blocks might not.

BOPC sets several upper bounds defined by user inputs. These configurable bounds include the upper limit of (1) SPL payload permutations P, (2) the length of continuous blocks L, (3) the number of minimum induced subgraphs extracted from the delta graph N, and (4) the number of dispatcher paths between a pair of functional blocks K. These upper bounds, along with the timeout for symbolic execution, reduce the search space, but prune some potentially valid solutions.

Introduction

Control-flow hijacking attacks are the predominant category of memory exploits today. In response, numerous principled defenses that aim to ensure legal control flow have been proposed.

A natural question is to analyze the limits of protection offered by control-flow defenses, and the remaining capabilities of the adversary. Control-flow defenses aim to ensure that the execution of the program stays legitimate, by protecting the integrity of control-plane memory or by directly checking the targets of control transfers. However, the data plane of memory can offer an additional source of advantage for adversaries (e.g., non-control data attacks).

In this paper, we show that non-control data attacks with rich expressiveness can be crafted using systematic construction techniques. The key idea in our contribution is to find data-oriented gadgets, and then gadget dispatchers: fragments of logic that chain together disjoint gadgets in an arbitrary sequence.

Problem

Non-control Data Attacks

Non-control data attacks tamper with or leak security-sensitive memory, which is not directly used in control-flow transfer instruction. The attack payloads, however, exhibit limited expressiveness, such as writing a target variable of choice or leaking contents of a sensitive memory region. Such simple payloads can enable privilege escalation and sensitive data-leakage attacks.

However, non-control data attacks cannot divert the control flow to arbitrary locations; unlike ROP attacks, their expressiveness is believed to be very limited.

Example

In fact, non-control data attacks can offer rich exploits from common vulnerabilities.

struct server { int *cur_max, total, typ; } *srv;
int connect_limit = MAX_CONN;
int *size, *type;
char buf[MAXLEN];
size = &buf[8]; type = &buf[12];
...
while (connect_limit--) {
    readData(socketfd, buf);        // stack bof
    if (*type == NONE) break;
    if (*type == STREAM) {          // condition
        *size = *(srv->cur_max);    // dereference
    } else {
        srv->typ = *type;           // assignment
        srv->total += *size;        // addition
    }
    ...
}

This code does not invoke any security-critical functions in its benign control-flow, and the vulnerability just corrupts a handful of local variables.

The assignment, dereference, and addition operations can be considered data-oriented gadgets. The loop structure serves as a gadget dispatcher.

Data-Oriented Programming

DOP Overview

The key is to manipulate non-control data such that the executed instructions do the attacker’s bidding. In order to give a concrete and systematic construction, we define a simple mini-language called MINDOP, with a virtual instruction set and virtual register operands, in which payloads can be specified.

The MINDOP language has 6 kinds of virtual instructions, each operating on virtual register operands. Each virtual operation is simulated by real x86 instruction sequences available in the vulnerable program, which we call data-oriented gadgets. The control structure allows chaining of gadgets, and the x86 code sequences that simulate the virtual control operations are referred to as gadget dispatchers. None of the gadgets or dispatchers modify any code-pointers or violate CFI in the real program execution.

Semantics            Instructions in C    Data-Oriented Gadgets in DOP
arithmetic/logical   a op b               *p op *q
assignment           a = b                *p = *q
load                 a = *b               *p = **q
store                *a = b               **p = *q
jump                 goto L               vpc = &input
conditional jump     if a goto L          vpc = &input if *p
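To make the memory-to-memory semantics of the table concrete, here is a toy MINDOP interpreter: “registers” are memory cells named by address, and every operation goes through pointers. The program, addresses, and encoding are invented for the example.

```python
# Toy interpreter for MINDOP's memory-to-memory semantics.
# mem maps addresses to values; prog is a list of (op, p, q) tuples.

def run_mindop(mem, prog):
    vpc = 0                      # virtual PC over the payload
    while vpc < len(prog):
        op, p, q = prog[vpc]
        vpc += 1
        if op == "add":          # *p += *q
            mem[p] += mem[q]
        elif op == "assign":     # *p = *q
            mem[p] = mem[q]
        elif op == "load":       # *p = **q
            mem[p] = mem[mem[q]]
        elif op == "store":      # **p = *q
            mem[mem[p]] = mem[q]
        elif op == "jnz":        # if *p: jump to payload index q
            if mem[p]:
                vpc = q
    return mem

mem = {0: 3, 1: 4, 2: 0, 3: 7, 4: 0}
run_mindop(mem, [("load", 2, 0),   # *2 = **0 -> mem[2] = mem[3] = 7
                 ("add", 2, 1)])   # *2 += *1 -> mem[2] = 11
print(mem[2])  # 11
```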

Data-Oriented Gadgets

Virtual operations in MINDOP are simulated using concrete x86 instruction sequences in the vulnerable program’s execution. Such instruction sequences, or gadgets, read inputs from and write outputs to memory locations, which simulate the virtual register operands of MINDOP. Gadgets lie within the legitimate CFG of the program; therefore, between two gadgets there may be several uninteresting instructions which are out of the attacker’s control.

Conceptually, a data-oriented gadget simulates three logical micro-operations: the load micro-operation, the intended virtual operation’s semantics, which are different for each gadget, and the final store micro-operation.

C code:   srv->total += *size;

ASM code: mov (%esi), %ebx       // load micro-op
          mov 0x4(%edi), %eax    // load micro-op
          add %ebx, %eax         // addition
          mov %eax, 0x4(%edi)    // store micro-op

Data-oriented gadgets are similar to the code gadgets employed in return-oriented programming or jump-oriented programming. However, there are two differences between data-oriented gadgets and code gadgets. First, data-oriented gadgets deliver operation results through memory rather than registers. Second, data-oriented gadgets must execute in at least one legitimate control flow, and need not execute immediately one after another.

Simulating Arithmetic Operations. With addition over arbitrary values, it is possible to simulate multiplication efficiently if the language supports conditional jumps. MINDOP supports conditional jumps that allow checking whether a value is smaller/greater than a constant. This helps compute the bit decomposition of a finite-size integer. With bit decomposition, simulating a multiplication a * b reduces to the efficient shift-and-add procedure, adding a to itself in each step conditionally on the bits of b.
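The shift-and-add reduction can be sketched as below, using only addition and comparisons as MINDOP provides; the halving of b stands in for the comparison-based bit decomposition described above (Python's `%`/`//` are shorthand here, not a claimed MINDOP primitive).

```python
# Shift-and-add multiplication built from addition and conditional
# branching only: doubling is self-addition, and each low bit of b
# conditionally adds the running addend to the result.

def mindop_mul(a, b, bits=16):
    result, addend = 0, a
    for _ in range(bits):
        if b % 2 == 1:                # low bit of b is set
            result = result + addend
        addend = addend + addend      # a << 1, via self-addition
        b = (b - b % 2) // 2          # b >> 1, via subtract-and-halve
    return result

print(mindop_mul(13, 11))  # 143
```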

Simulating Assignment Operations. In MINDOP, assignment gadgets read data from one memory location and directly write to another memory location, e.g., load -> move -> store.

Simulating Dereference (Load/Store) Operations. In data-oriented programming, registers are simulated by memory; therefore a memory dereference is simulated by two memory dereferences.

Gadget Dispatcher

Various gadgets can be chained sequentially by gadget dispatchers to enable recursive computation. Gadget dispatchers are sequences of x86 instructions that equip attackers with the ability to repeat gadget invocations and, for each invocation, to selectively activate specific gadgets.

Each iteration executes a subset of gadgets using outputs from gadgets in the previous iteration; this is done by changing the load addresses of iteration i+1 to the store addresses of iteration i. Additionally, the selector’s behavior is controlled by the attacker through the memory error. The corruption is done in a way that enables only the gadgets of the attacker’s choice; these gadgets take as input the outputs of the previous round’s gadgets, by selectively corrupting operand pointers. The remaining gadgets may still get executed, but their inputs and outputs are set up such that they behave like NOPs.

An interactive attack means that the attacker can prepare the memory state at the start of loop iteration i in a way that the desired gadget works as required and the other gadgets operate on unused memory.

Another class of DOP attacks are non-interactive, whereby the attacker provides the entire malicious input as a single data transmission. In such a scenario, all the memory setup and conditions for deciding loop termination and selective gadget activation need to be encoded in a single malicious payload. To support such attacks, MINDOP has two virtual operations that enable conditional chaining of operations, or virtual jumps. The basic idea is as follows: the attacker provides, in the input payload, the memory configuration Mj necessary for each gadget j to be selectively executed in a particular iteration. In addition, it keeps a pointer, called the virtual PC, which points to the desired configuration Mj at the start of each iteration. It suffices to corrupt only the virtual PC, so that the program execution in that iteration operates on the configuration Mj. To decide how to switch to Mk in the next iteration, MINDOP provides virtual operations that set the virtual PC, conditionally or unconditionally.
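A sketch of this non-interactive scheme: each dispatcher-loop iteration reads one configuration at the virtual PC, which selects the gadget, its operand addresses, and the next virtual PC. Memory layout, configurations, and gadget names are all illustrative.

```python
# Sketch of a non-interactive DOP payload: configs[vpc] is the
# configuration M_j consumed in one dispatcher-loop iteration.

def dispatcher(mem, configs, vpc):
    while vpc is not None:
        gadget, p, q, nxt = configs[vpc]
        if gadget == "assign":   # *p = *q
            mem[p] = mem[q]
        elif gadget == "add":    # *p += *q
            mem[p] += mem[q]
        elif gadget == "nop":    # deselected gadgets run on scratch memory
            pass
        vpc = nxt                # virtual jump to the next configuration
    return mem

mem = {10: 5, 11: 2, 12: 0}
configs = {0: ("assign", 12, 10, 1),   # *12 = *10
           1: ("add", 12, 11, None)}   # *12 += *11, then halt
print(dispatcher(mem, configs, 0)[12])  # 7
```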

Simulating Jump Operations. The key here is to identify a suitable variable to implement a virtual PC which can be corrupted in each loop iteration. To simulate an unconditional jump, attackers just prepare the memory configuration to trigger another operation gadget (e.g., addition, assignment) to change the value of the virtual PC.

DOP Attack Construction

Challenges

Though the concept of data-oriented programming is intuitive, it is challenging to construct data-oriented attacks in real-world programs:

  • Data-oriented gadget identification. We use static analysis as an aid in identifying these gadgets.

  • Gadget dispatcher identification. Our gadget dispatcher requires a loop with various gadgets and a selector controlled by the memory error. But it is possible to have the selector and gadgets inside functions called from the loop body.

  • Data-oriented gadget stitching. The reachability of gadgets depends on concrete memory errors. We need to find malicious input that makes the program execute selected gadgets with the expected addresses and order.

Gadget Identification

A useful data-oriented gadget needs to satisfy the following requirements:

  • MINDOP semantics. It should have instructions for the load micro-operation, the store micro-operation, and others simulating semantics of MINDOP.
  • Gadget internal order. The three micro-operations should appear in the load-operation-store order, and this correct order should show up in at least one legitimate control flow.
Input:  G - the vulnerable program
Output: S - data-oriented gadget set

S = ∅
FuncSet = getFuncSet(G)
foreach f ∈ FuncSet do
    cfg = getCFG(f)
    foreach instr = getNextInstr(cfg) do
        if isMemStore(instr) then
            gadget = getBackwardSlice(instr, f)
            input = getInput(gadget)
            if isMemLoad(input) then
                S = S ∪ {gadget}

We compile the program source code into the LLVM intermediate representation (IR) and perform our analysis on LLVM IR. Our analysis iterates through all functions in the program. We treat each store instruction in a function as the store micro-operation of a new potential gadget. Then our analysis uses a backward data-flow analysis to identify the definitions of the operands of the store instruction. The generated data flow contains the instructions that derive the operands, e.g., values loaded from memory or calculated from registers.
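A toy rendition of this search over a simplified straight-line IR (a stand-in for LLVM IR): each store is treated as a candidate gadget's final micro-op, the stored value is traced backward through definitions, and the candidate is kept if the chain starts at a memory load.

```python
# Keep a store as a gadget if a backward walk over its value's
# definitions reaches a memory load (load -> ... -> store pattern).

def find_gadgets(ir):
    """ir: list of (dst, op, srcs) tuples in program order."""
    defs = {}        # name -> defining instruction
    gadgets = []
    for i, (dst, op, srcs) in enumerate(ir):
        if op == "store":
            v = srcs[0]  # the value being stored
            # backward slice: follow v to its defining instruction
            while v in defs and defs[v][1] != "load":
                v = defs[v][2][0]
            if v in defs and defs[v][1] == "load":
                gadgets.append(i)
        else:
            defs[dst] = (dst, op, srcs)
    return gadgets

ir = [("t0", "load", ["p"]),         # t0 = *p
      ("t1", "add", ["t0", "t0"]),   # t1 = t0 + t0
      (None, "store", ["t1", "q"]),  # *q = t1  -> load...store: gadget
      (None, "store", ["c", "q"])]   # *q = c (undefined const): not one
print(find_gadgets(ir))  # [2]
```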

Gadget Classification. We classify data-oriented gadgets into different categories based on their semantics and computed variables. Gadgets with the same semantics are functionally equivalent for simulating one MINDOP operation. There are no function call gadgets in data-oriented programming, as it does not change control data. Based on the computed variables, we further classify gadgets into three categories: global gadgets, function-parameter gadgets, and local gadgets.

Dispatcher Identification

We use static analysis on LLVM IR for the initial phase of dispatcher identification. Since loops are necessary for attackers to repeatedly connect gadgets, we first identify all possible loops in the program. For each loop, we scan the instructions in the loop body to find interesting gadgets. For function calls within the loop, we step into the functions through the call graph and iterate through all instructions inside.

Input:  G - the vulnerable program
Output: D - gadget dispatcher set

D = ∅
FuncSet = getFuncSet(G)
foreach f ∈ FuncSet do
    foreach loop = getLoop(f) do
        loop.gadgets = ∅
        foreach instr = getNextInstr(loop) do
            if isMemStore(instr) then
                loop.gadgets ∪= getGadget(instr)
            else if isCall(instr) then
                target = getTarget(instr)
                loop.gadgets ∪= getGadget(target)
        if loop.gadgets != ∅ then
            D = D ∪ {loop}

The second phase of dispatcher identification correlates the identified dispatcher candidates with a known memory error. In this phase, we use a static-dynamic approach to provide identification results with varying degrees of coverage and precision. We mark a loop as reachable if it enfolds the given memory error.

Attack Construction

For a given concrete memory error, the available gadgets and dispatchers depend on the location of the vulnerable code in the program, while the stitchability of gadgets depends on the corruptibility of the memory error. To connect two disjoint data-oriented gadgets, attackers should have control over the address in the load micro-operation of the second gadget or the address in the store micro-operation of the first gadget.

  1. Gadget preparation (semi-automated). Given a memory error, we locate the vulnerable function from the source code. Then we identify the gadget dispatchers that enfold the vulnerable code and collect data-oriented gadgets.

  2. Exploit chain construction (manual). We take the expected malicious MINDOP program as input.

  3. Stitchability verification (manual). We feed concrete input to the program to trigger memory errors that connect the expected gadgets.

Evaluation

Feasibility of DOP

We study 9 applications and measure how many x86 gadgets in these programs can simulate MINDOP operations. We aim to evaluate the following four aspects in our analysis:

  • Empirically justify the choice of operations in MINDOP based on the prevalence of x86 gadgets.
  • Study the distribution of various types of gadgets based on the scope of input operands.
  • Measure the reachability of these x86 gadgets in concrete executions in presence of an exploitable memory corruption.
  • Verify whether the memory errors (in the public CVEs) have the capability to control the input operands and activate the gadgets in concrete executions.

Gadgets classification. We classify data-oriented gadgets into three categories based on the scope of the operands: global gadgets, function-parameter gadgets, and hybrid gadgets.

Execution reachability from memory errors. We run the vulnerable program with the given CVE PoC to get the dynamic function call trace, including the vulnerable function. From the function call trace, we identify the functions invoked by the vulnerable function during the execution. The gadgets inside the invoked functions and their enfolding loops are the gadgets reachable from the dispatcher.

Turing-Complete examples

ProFTPD is a file server; its versions 1.2-1.3 have a stack-based buffer overflow vulnerability in the sreplace function.

char *sreplace(char *s, ...) {
    char *src, *cp, **mptr, **rptr;
    char *marr[33], *rarr[33];
    char buf[PR_TUNABLE_PATH_MAX] = {'\0'};
    src = s; cp = buf; mptr = marr; rptr = rarr;
    ...
    while (*src) {
        for (; *mptr; mptr++, rptr++) {
            sstrncpy(cp, *rptr, blen - strlen(pbuf));
        }
    }
}
/* memory error & assignment */
  • Conditional assignment operation. We use the sstrncpy function to simulate an assignment which moves data from one arbitrary location to another. In the first iteration of the while loop, the memory error corrupts the variable cp and the contents of the array rarr. So in the next iteration, both the source and the target of the string copy sstrncpy are controlled by the attacker. This way, the attack simulates an assignment operation, which is conditional because the attacker can corrupt src.
int pr_display_file(...) {
  ...
  outs = sreplace(p, buf, ..., "%V", main_server->ServerName);
  pr_response_send_raw("%s-%s", code, outs);
}

void pr_response_send_raw(const char *fmt, ...) {
  vsnprintf(resp_buf, size, fmt, msg);
}
  • Dereference operations (Load/Store). The load operation takes two memory addresses as input (p and q) and performs the operation *p = **q. We decompose it into two sub-operations, *ptmp = *q and *p = *ptmp, where ptmp holds the address of a temporary tmp. We use the assignment gadget to move data from resp_buf to &ServerName as the first dereference. Then we use the function pr_display_file, which reads the contents of ServerName into the buffer resp_buf, as the second dereference. Together these two dereferences form the MINDOP load operation *resp_buf = **resp_buf.
MODRET xfer_log_retr(cmd_rec *cmd) {
  session.total_bytes_out += session.xfer.total_bytes;
}
  • Addition operation. The structure session is a global variable, hence all operands of this gadget are under the attacker's control. To achieve an addition on arbitrary memory locations, we use the MINDOP assignment operation to load operands from the desired source locations into the session structure, perform the addition, and then move the result to the desired destination location.
void cmd_loop(server_rec *server, conn_t *c) {
  while (TRUE) {
    pr_netio_telnet_gets(buf, ...);
    cmd = make_ftp_cmd(buf, ...);
    pr_cmd_dispatch(cmd);
  }
}

char *pr_netio_telnet_gets(char *buf, ...) {
  while (*pbuf->current != '\n' && toread > 0)
    *buf++ = *pbuf->current++;
}
  • (Conditional) jump operation. pbuf->current is a pointer to the next command in the input, thus forming a virtual PC for the attacker's MINDOP program. By corrupting pbuf->current, the attacker can select a particular input that invokes a specific MINDOP operation. We use the assignment operation to conditionally update the virtual PC, thus simulating a conditional jump.

Wireshark is a widely used network packet analyzer, and its versions before 1.8.0 have a stack-based buffer overflow vulnerability. The buffer pd in the function packet_list_dissect_and_cache_record accepts frame data from an mpeg trace file. If the attacker sends a malicious trace file containing an overly large frame, the frame data overflows the buffer. This is used to overwrite the variables col and cinfo and the parameter packet_list with malicious input.

void packet_list_dissect_and_cache_record(PacketList *packet_list, ...) {
  gint col; column_info *cinfo;
  guint8 pd[WTAP_MAX_PACKET_SIZE]; // vulnerable buffer
  // memory error function
  cf_read_frame_r(..., fdata, ..., pd);
  packet_list_change_record(packet_list, ..., col, cinfo);
}

void packet_list_change_record(PacketList *packet_list, ..., gint col, column_info *cinfo) {
  record = packet_list->physical_rows[row];
  record->col_text[col] = (gchar *)cinfo->col_data[col];

  if (!record->col_text_len[col])
    ++packet_list->const_strings;
}

void gtk_tree_view_column_cell_set_cell_data(...) {
  for (cell_list = tree_column->cell_list; cell_list;
       cell_list = cell_list->next) {
    ...
    // finally calls the vulnerable function
    show_cell_data_func();
  }
}
  • Assignment operation. The line record->col_text[col] = (gchar *)cinfo->col_data[col]; is the gadget. The attacker corrupts record and cinfo to point to attacker-controlled memory locations.

  • Dereference operations (Load/Store). To simulate a load operation, the attacker corrupts record->col_text and cinfo. To simulate a store operation, the attacker changes the values of record and cinfo->col_data.

  • Conditional addition operation. The lines if (!record->col_text_len[col]) and ++packet_list->const_strings; form the gadget: if the attacker corrupts packet_list, she can invoke it to perform an addition on the target location.

  • Conditional jump operation. The memory error is triggered by the file read, and the program maintains a file position indicator in the FILE structure. The attacker can change this file position indicator, which serves as a virtual PC.

To chain a large number of gadgets together, we identify a gadget dispatcher in the parent function gtk_tree_view_column_cell_set_cell_data. In the first invocation of the memory error, the attacker uses the assignment operation to corrupt the loop condition cell_list and point it to a fake linked list in the malicious payload, turning the loop into an infinite loop.

Why are expressive payloads useful

Bypassing Randomization Defenses. We show how to defeat ASLR with DOP without leaking any addresses to the network. As a real example, consider the vulnerable ProFTPD server, which internally uses OpenSSL for authentication. Our goal is to leak the server's OpenSSL private key. We found that the private key is reachable through a chain of 8 pointers ending at the private key buffer.

The key idea is to use a short MINDOP virtual program that starts from the base pointer (at a known location) and dereferences it 7 times within the server's memory to determine the randomized location of the private key. Once we have the private key buffer's address, we simply replace an address used by a public output function, causing it to leak the private data to the network.

Introduction

SGX isolates sensitive code and data from the OS, hypervisor, BIOS, and other applications. Moreover, SGX code and data are encrypted as soon as they leave the CPU. Sensitive data of applications, e.g. cryptographic keys, is protected by SGX containers called enclaves, which can be created dynamically while the application, known as the host, is running. Enclaves provide predefined entry points for hosts to perform sensitive computation.

Ideally, enclave code includes only minimal, carefully inspected code, which could be formally proven free of vulnerabilities. However, legacy applications can also be adapted to run inside SGX enclaves with the necessary modifications. Formally proving or manually inspecting legacy applications is not feasible, meaning that memory corruption vulnerabilities occur in enclaves with high probability.

Recently, Dark-ROP[1] was proposed to leverage memory corruption against SGX. Dark-ROP is based on return-oriented programming and several oracles that inform the attacker about the internal state of enclaves. However, Dark-ROP requires a non-randomized memory layout to locate secret code and data after crashing. Therefore, an implementation of SGX randomization called SGX-Shield[2] mitigates the Dark-ROP attack.

However, SGX-Shield does not randomize the part of the SGX SDK that handles transitions between host code and enclave code, which contains a number of gadgets usable to mount a ROP attack. This paper demonstrates that this interface code is enough to mount powerful run-time attacks and bypass SGX-Shield without requiring kernel privileges.

The Guard’s Dilemma

Controlling registers is essential in any code-reuse attack, as it prepares data for subsequent gadgets or sets arguments for function calls. Thus, attackers typically use specific register-setting gadgets, e.g. pop gadgets, to control registers. This paper instead allows the attacker to use whole functions in the tRTS as building blocks rather than small gadgets. Here lies the dilemma: the SDK is an essential part of creating enclaves, but in this case it actually exposes them to attacks.

Two new exploitation primitives:

  • The ORET primitive allows the attacker to gain access to a critical set of CPU registers by exploiting a stack overflow vulnerability.
  • The CONT primitive allows the attacker to gain access to all general-purpose registers, given control of one register (on x86_64, rdi).

Overview and Attack Workflow

Primitives

  • ORET primitive. The asm_oret function is used to restore the CPU context after an OCALL. Once the attacker controls the instruction pointer (hijacking control flow) and the stack contents, e.g. via a stack overflow or format string, she can set a subset of CPU registers, e.g. rdi and rip.

  • CONT primitive. The continue_execution function is meant to restore the CPU context after an exception. The prerequisite is calling this function with a controlled rdi register, e.g. by exploiting a memory corruption affecting a function pointer. The attacker gains control over all general-purpose CPU registers.

  • ORET+CONT loop. The basic idea behind the attack is to use the CONT primitive repeatedly to invoke the various gadgets with the correct register values.

Each iteration of this loop executes one gadget and is structured as follows:

  1. A CONT primitive manipulates the stack pointer, pivoting it into attacker-controlled memory, and executes a gadget.
  2. Once the gadget completes, the previous stack manipulation causes the execution of an ORET primitive.
  3. The ORET primitive triggers the CONT primitive for the next gadget, continuing the cycle from the first step.

Workflow

  1. Payload preparation
    The attacker performs static analysis on the enclave binary to determine the gadgets in the non-randomized part of the binary, e.g. the tRTS. Next, she constructs a gadget chain and defines the register state that must be set before executing each gadget, e.g. the function argument registers. According to the threat model, the attacker knows the memory layout, including the enclave binary's offset. She also has to determine the offsets of asm_oret and continue_execution (both in the tRTS).

  2. Fake structures preparation
    The primitives work by abusing functions intended to restore CPU contexts, tricking them into restoring fake contexts. In contrast to a standard ROP exploit, the attacker requires a number of memory structures to hold the fake contexts and drive the execution primitives.

  • Multiple fake exception information structures

  • Fake stack is a supporting structure for the ORET+CONT loop that serves two purposes. On the one hand, it is used to bring control back to an ORET primitive after a gadget executes. On the other hand, it contains the fake contexts for the transitions from the ORET primitive to the CONT primitive that continue the loop.

  3. Attack execution
    When the vulnerability satisfies the CONT preconditions (e.g., exploitation of an indirect function call), the attacker can execute the first CONT directly. When the vulnerability satisfies the ORET preconditions (e.g., a stack overflow), the attacker can set the first function argument register and the instruction pointer.

Details

ORET Primitives

The ORET primitive abuses the asm_oret function to restore the CPU context from the OCALL frame saved on the stack. The prototype of this function is:

sgx_status_t asm_oret(uintptr_t sp, void *ms);

The first argument (sp) points to the OCALL frame, which contains the partial CPU context to be restored, including saved values for rbp, rdi, rsi, rbx, and r12 to r15.

typedef struct _ocall_context_t {
  uintptr_t r15;
  uintptr_t r14;
  uintptr_t r13;
  uintptr_t r12;
  uintptr_t xbp; // rbp
  uintptr_t xdi; // rdi
  uintptr_t xsi; // rsi
  uintptr_t xbx; // rbx

  uintptr_t ocall_ret;
} ocall_context_t;

An attacker able to control the OCALL frame can set all the registers mentioned above; moreover, the new instruction pointer (rip) can also be set.

The values of rsp and rip after asm_oret depend on the SGX SDK version. For versions earlier than 2.0, the stack pointer is set to point to the ocall_ret field before issuing a ret instruction. Hence, the new instruction pointer will be the value of ocall_ret, and the new stack pointer will point to the memory location immediately following the OCALL frame.

From version 2.0 on, a more traditional epilogue is used: the base pointer (rbp) is moved into rsp, rbp is popped from the stack, and finally a ret is issued. Therefore, rbp in the OCALL frame must point to a memory area containing two 64-bit words: the new value for rbp and the return address (the new instruction pointer).

The first operation done by asm_oret is setting rsp to the sp argument, i.e., the top of the OCALL frame. An attacker can instead jump to the code after the function prologue, making asm_oret believe that the OCALL frame is at the top of the current stack. Thus it is always possible to abuse asm_oret to restore a fake OCALL frame placed at the top of the stack, without the need to control the first argument, by jumping to an appropriate instruction inside asm_oret.

An attacker who has control over the stack contents can therefore reuse asm_oret to set registers. Suppose the application is vulnerable to a stack buffer overflow. The attacker exploits it to overwrite the return address with the address of asm_oret, properly adjusted to account for the skipped instructions, and places a fake ocall_context_t immediately after the return address. Once the function returns, control is transferred to asm_oret with the fake OCALL frame at the top of the stack.

CONT Primitive

The CONT primitive is based on the exception handling function continue_execution, which is used to restore the CPU context from an exception information structure.

continue_execution(sgx_exception_info_t *info);
typedef struct _sgx_exception_info_t {
  sgx_cpu_context_t cpu_context;
  sgx_exception_vector_t exception_vector;
  sgx_exception_type_t exception_type;
} sgx_exception_info_t;

typedef struct _cpu_context_t {
  uint64_t rax;
  uint64_t rcx;
  uint64_t rdx;
  uint64_t rbx;
  uint64_t rsp;
  uint64_t rbp;
  uint64_t rsi;
  uint64_t rdi;
  uint64_t r8;
  // ...
  uint64_t r15;
  uint64_t rflags;
  uint64_t rip;
} cpu_context_t;

Note that the stack pointer (rsp) and the instruction pointer (rip) are part of this context, so the attacker can control the stack pointer and pivot it into attacker-controlled memory (a fake stack). This technique is known as stack pivoting.

As an example, continue_execution can be reused by corrupting a function pointer so it points to continue_execution; in addition, the attacker needs to control rdi or the memory pointed to by rdi.

Fake stack

The fake stack is used to chain CONT to ORET. It is composed of a sequence of frames, each consisting of the address of asm_oret (properly adjusted) followed by an ocall_context_t structure. Each CONT in the loop invokes a gadget with the stack pointer set to the top of the fake stack, so the address of asm_oret is on top of the stack when the gadget returns. The gadget therefore returns into asm_oret, launching an ORET primitive that restores the fake context from the frame. The fake context is set up so that rdi points to a fake exception structure and the instruction pointer is set to continue_execution.

Fake exception information

For each gadget, the attacker sets up a fake sgx_exception_info_t structure with the desired register values and the instruction pointer set to the gadget's address. The stack pointer is set to the top of the fake stack.

Attacking SGX-Shield

Overview on SGX-Shield

  • Fine-grained randomization
  • Software DEP
  • Software Fault Isolation
  • Coarse-grained Control Flow Integrity

Because SGX-Shield needs writable code pages during loading, the enclave code stays writable for the enclave's whole lifecycle.

Exploit

Assumption: a stack overflow vulnerability in the enclave.
Observation: SGX-Shield enclaves feature writable code pages.
Idea: the first stage, based on code reuse, injects the second-stage code, also known as shellcode.

First stage

  1. Payload preparation. The attacker starts by determining the offsets of asm_oret and continue_execution. Next, for the code-injection attack, the attacker needs a gadget (from the do_rdrand function in the tRTS) that writes to memory.

    mov %eax, (%rcx)
    mov $1, %eax
    ret

    Our chain repeatedly invokes this gadget to write the shellcode 4 bytes at a time. The address at which to place the shellcode is taken from the writable SGX-Shield code pages.

  2. Fake structures preparation. The attacker starts by creating a fake stack that contains the address of continue_execution repeated n-1 times, where n is the number of gadgets in the chain. An sgx_exception_info_t structure is set up for all registers, with rip set to the shellcode's address and the other registers at the attacker's discretion.

  3. Attack execution. The attacker triggers the stack overflow in the enclave. She overwrites a return address with the address of asm_oret and places a fake ocall_context_t after the address. Its ocall_ret field is set to the address of continue_execution, and rdi is set to the fake sgx_exception_info_t, which serves as the argument. This results in continue_execution being called on the fake exception structure, which starts the chain. The fake exception structure sets rip to the gadget's address, and rax and rcx to the proper values to place the attacker's code in the SGX-Shield code pages. rsp points to the fake stack, whose entries are all addresses of continue_execution.

Second stage

The shellcode has full control over the enclave. In this case, the attacker uses the shellcode to extract the cryptographic keys used during the remote attestation process; she can then send these keys to a remote server.

Installation

Via source code

  1. Download the tarball from the release link

  2. Dependencies

    apt install libssl-dev liblzo2-dev libpam0g-dev
    apt install easy-rsa
  3. Build source code

    tar xfz ./openvpn<version>.tar.gz
    cd openvpn<version>
    ./configure
    make
    make install

Via official apt repositories

wget -O - https://swupdate.openvpn.net/repos/repo-public.gpg|apt-key add -
echo "deb http://build.openvpn.net/debian/openvpn/stable xenial main" > /etc/apt/sources.list.d/openvpn-aptrepo.list
apt update && apt install openvpn

Via script

curl -O https://raw.githubusercontent.com/Angristan/openvpn-install/master/openvpn-install.sh
chmod +x openvpn-install.sh
./openvpn-install.sh

Setup

CA

make-cadir CA && cd CA

In CA directory, edit the vars file and set the KEY_COUNTRY, KEY_PROVINCE, KEY_CITY, KEY_ORG and KEY_EMAIL parameters.
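For example, the relevant lines in vars might look like this (all values are placeholders to adapt to your own organization):

```shell
# easy-rsa vars: identity fields baked into the generated certificates
export KEY_COUNTRY="US"
export KEY_PROVINCE="CA"
export KEY_CITY="SanFrancisco"
export KEY_ORG="Example-Org"
export KEY_EMAIL="admin@example.com"
```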

source ./vars
./clean-all
./build-ca

Key generation

In the keys directory, generate the private keys, certificates, and Diffie-Hellman parameters.

  1. Server key generation
    ./build-key-server server
  2. Client key generation
    ./build-key client1
    ./build-key client2
    ./build-key client3
  3. Diffie Hellman parameters generation
    ./build-dh

Here is an explanation of the relevant files in the keys subdirectory:

Filename     Needed By                 Purpose                    Secret
ca.crt       server + all clients      Root CA certificate        NO
ca.key       key signing machine only  Root CA key                YES
dh{n}.pem    server only               Diffie-Hellman parameters  NO
server.crt   server only               Server Certificate         NO
server.key   server only               Server Key                 YES
client1.crt  client1 only              Client1 Certificate        NO
client1.key  client1 only              Client1 Key                YES

Configuration

  • On the VPN server, put server.conf, ca.crt, server.crt, server.key and dh{n}.pem in the same directory (e.g. /etc/openvpn/).
  • On each client, put client.conf, client{n}.crt, client{n}.key and ca.crt in the same directory.
  • In client.conf, set the remote value to the VPN server's IPv4 address, and set cert and key to the corresponding filenames.

Running

  1. In VPNServer:
    openvpn --config /etc/openvpn/server.conf --daemon
  2. In Client:
    openvpn --config /etc/openvpn/client{n}.conf --daemon

CTF

Wargame

Wargames are similar to CTFs but are always ongoing. Typically, they are organized into levels.

  1. Microcorruption

  2. Smashthestack

CTF

Follow the steps in Resources.

And read the Tips.

LLVM

CPP

  1. Finish the tutorial on cplusplus.com.

  2. Practice on Codewars.

Compiler

Compilers, Principles, Techniques and Tools.pdf

Paper Notes

  1. Code reuse attack
  2. side-channel related