Block-oriented programming
Introduction
Control-flow Integrity still allows the attacker control over the execution along two dimensions: the imprecision in the analysis and CFI’s statelessness; data-only attacks allow an attacker to influence conditional branches arbitrarily. With CFI, exploits become highly program dependent as the set of reachable gadgets is severely limited by CFI policy, so exploits must therefore follow valid paths in the CFG.
We present BOPC (Block Oriented Programming Compiler), an automatic framework to evaluate a program’s remaining attack surface under strong control-flow hijacking mitigation. BOPC compiles an “exploit” into a program trace, which is executed on top of the orignal program’s CFG. To express the desired exploits flexibly, BOPC provides a Turing-complete, high-level language: SPloit language (SPL). BOPC takes as input an SPL payload and a starting point and returns a trace through the program that encodes the SPL payload.
First, BOPC translates the SPL payload into constraints for individual statements and, for each statement, searches for basic blocks in the target binary that satisfy these constraints (called candidate blocks). At this point, SPL abstract register assignments from underlying architecture. Second, BOPC infers a resource (register and state) mapping for each SPL statement, iterating through the set of candidate blocks and turing them into functional blocks which can be used to execute a concrete instantiation of the given SPL statement. Third, BOPC constructs a trace that connect each functional block through dispatcher blocks. Since the mapping process is NP-hard, to find a solution in reasonable time BOPC first prunes the set of functional blocks per statement to constrain the search space and then use a ranking based on the proximity of individual function blocks as a heuristic when searching for dispatcher gadgets.
Assumption and threat model
Our threat model consists of a binary with a known memory corruption vulnerability that is protected with the state-of-art control-flow hijacking mitigation, such as CFI along with a shadow stack. Futhermore, the binary is also hardened with DEP, ASLR and Stack Canaries.
We assume that the target binary has an arbitrary memory write vulnerability. The attacker can write any value to any (writable) address which called an Arbitrary memory Write Primitive (AWP). To bypass probabilistic defenses such as ASLR, we assume that the attacker has access to an information leak, i.e., a vulnerability that allows her to read any value from any memory address which called Arbitrary memory Read primitive (ARP).
We also assume that there exist an entry point, i.e., a location that the program reaches naturally after completion of all AWPs (and ARPs). Thus BOPC does not require code pointer corruption to reach the entry point. Determining an entry point is considered to be a part of the vulnerability discovery process.
These assumption enable BOPC to inject a payload into a target binary’s address space, modifying its memory state to execute the payload. BOPC assumes that the AWP (and/or ARP) may be triggered multiple times to modify the memory state of the target binary. After the state modification completes, the SPL payload executes without using the AWP (and or ARP) further.
Design
- SPL payload
- Selecting functional blocks
- Searching for dispatcher blocks
- Stiching BOP gagdets
First, BOPC providing an exploit programming language, called SPL, that enables analysts to define exploits independent of the target program or underlying architecture. Second, to automate SPL gadget discovery, BOPC finds basic blocks from the target program that implement individual SPL statements, called functional blocks. Third, to chain basic blocks together in a manner that adheres with CFI and shadow stacks, BOPC searches the target program for sequences of basic blocks that connect pairs of neighboring functional blocks, which we called dispatcher blocks. Fourth, BOPC simulates the BOP chain to produce a payload that implements that SPL payload from a chosen AWP.
The BOPC design builds on two key ideas: Block Oriented Programming and Block Constraint Summaries. First, BOP constructs exploit programs called BOP chains from basic block sequences in the valid CFG of a target program. Each BOP gadget is one functional block that implements a statement in an SPL payload and zero or more dispatcher blocks that connect the functional block to the next BOP gadget in a manner that compiles with the CFG.
Second, BOPC abstracts each basic block from individual instruction into Block Constraint Summaries, enabling blocks to be employed in a varity of different ways. That is, a single block may perform multiple functional and/or dispatching operations by utilizing different sets of register for different purposes. That is, a basic block that modifies a register in a manner that may fulfill an SPL statement may be used as a functional block, otherwise it may be considered to serve as a dispatcher block.
BOPC leverages Block Constraint Summaries to apply blocks in multiple contexts. There are two cases: either the candidate dispatcher block’s summary constraints indicate that it will modify the register state set and/or the memory state by the functional blocks, called the SPL state, or it will not, enabling the computation to proceed with out disturbing the effects of the functional blocks.
An important distinction between BOP and ROP is that the problem of computing BOP chains is NP-hard. Conventional ROP assumes that indirect control-flow may target any executable byte in memory while BOP must follow a legal path through the CFG for any chain of blocks, resulting in the need of automation.
Expressing Payloads
BOPC provides a programming language, called SPloit language that allows analysts to express exploit payloads in a compact high-level language that independent of target program or processor architecture.
The architecture independence of SPL has important advantages. First, the same payload can be executed under different ISAs or operating systems. Second, SPL uses a set of virtual registers, accessed through reserved volatile variables.
The environment for SPL differs from that of conventional languages. Instead of running code directly on CPU, our compiler encodes the payload as a mapping of instructions to functional blocks. That is, the underlying runtime environment is the target binary and its program state, where payloads are executed as side effects of the underlying binary.
Selecting functional blocks
To generate a BOP chain for a SPL payload, BOPC must find a sequnce of blocks that implement each statement in the SPL payload, which called functional blocks.
Conceptually, BOPC must compare each block to each SPL statement to detemine if the block can implement the statement. However, blocks are in terms of machine code and SPL statements are high-level program statements. BOPC computes Block Constraint Summaries, which define the possible impacts that the block would have on SPL state. Block Constraint Summaries provide flexibly in the matching blocks to SPL statements because there are multiple possible mappings of SPL statements and thier virtual registers to the block and its constraints on register and state.
The constraint summaries of each basic block are obtained by isolating and symbolically executing it. The effect of executing a basic block creates a set of constraints, mapping input to the resultant output. Such constraints refer to registers, memory locations, jump types and external operatons(e.g., library calls).
Finding BOP gadgets
The combinaton of a functional block and dispatcher blocks is called a BOP gadgets. However, dispatcher paths between two functional blocks may not exist either because there is no legal path in the CFG between them, or the control flow cannot reach the next block due to unsatisfiable runtime constraint.
BOP gadgets are volatile: gadget feasibility changes based on the selection of prior gadgets for the target binary. The problem of selecting a suitable sequence of functional blocks, such as that a dispatcher path exists between every possible control flow tansfer in the SPL payload, is NP-hard.
We propose several heuristics and optimizations to find solutions in reasonable amount of time. BOPC leverages basic block proximity as a metric to “rank” dispatcher paths and organizes this information into a special data structure called a delta graph.
Searching for dispatcher blocks
While each functional block executes a statement, BOPC must chain multiple functional blocks together to execute the SPL payload. Functional blocks are connected through zero or more basic blocks that do not clobber the SPL state computed thus far.
Finding such blocks is challenging, thus, we propose a graph data structure called delta graph, to represent the state of the search for dispatcher blocks. The delta graph stores, for each functional block, the shortest path to the next candidate blocks.
The delta graph is a multi-partite, directed graph that has a set of functional block nodes for every payload statement. An edge between two functional blocks represents the minimum number of executed basic blocks to move from one functional block to the other, while avoiding clobbering blocks.
Calculating the shortest path between two basics in a CFG faces a challenge: while the CFG statically allow multiple targets, at runtime they are context sensitive and only have an concrete target.
Our context-sensitive shortest path algorithm is a recursive version of Dijkstra’s shortest path algorithm that avoids all clobbering blocks. Initially, each edge on the CFG has a cost of 1. When it encounters a basic block with a call instruction, it recursively calculates the shortest path starting from the calling function’s entry block (Be). if the destination block, Bd, is inside the callee, the shortest path is the concatenation of the two individual shortest paths from the beginning to Be and from Be to Bd. Otherwise, our algorithm finds the shortest path from the Be to the closest return point and uses this value as an edge weight for that callee.
After creation of delta graph, out algorithm selects exactly one node (i.e, functional block) from each set (i.e payload statement) to minimize the total weight of the resulting induced subgraph.
Stiching BOP gadgets
To find a dispatcher between two functional blocks, BOPC leverages concolic execution, which collects the required constraints that are needed to lead the execution to the next functional block.
To construct an SPL payload from a BOP chain, BOPC launches concolic execution from the first functional block in the BOP chain, starting with an empty state. At each step BOPC tries the first K shortest dispather path until it finds one that reaches the next functional block. The corresponding constraints are added to the current state. The search therefore incrementally adds BOP gadgets to the BOP chain.
Implementation
BOPC requires three distinct inputs:
- The exploit payload expressed in SPL,
- The vulnerable application on top of which the payload runs,
- The entry point in the vulnerable application, which is a location that the program reaches naturally and occurs after all AWPs have been completed.
The output of BOPC is a sequence of (address, value, size) tuples that describes how the memory should be modified during the state modification phase to execute the payload.
Binary Frontend
Using angr to lift the target binary into the VEX intermediate representation to expose the application’s CFG. Then, we translate each basic block into a block constraint summary. BOPC executes each basic block in an isolated environment, where every action (e.g., register and memory accesses) is monitored. Therefore, instead of working with the instructions of each basic block, BOPC utilizes its abstraction for all operations.(CFGa)
SPL Frontend
The SPL Frontend translates the exploit payload into a graph-based IR for further processing. For each statement sequence we build a dependence graph based on a customized version of Kahn’s topological sorting algorithm, to infer all groups of independent statements, Which can be executed out-of-order.
Locating candidate block sets
SPL is a high level language that hides the underlying ABI. Therefore, BOPC finds all possible ways to map individual elements from the SPL environment to the ABI (though candidate blocks).
Once CFGa and IR are generated, BOPC searches for and marks candidate basic blocks. For a block to be a candidate, it must “semantic match” with one (or more) payload statement. Note that variable assignments, unconditional jumps, and returns do not require a basic block and therefore are excluded from the search.
All statement that assign or modify registers require the basic block to apply the same operation on the same. For function calls, the requirement for the basic block is to invoke the same call, either as a system call or as a library call.
Upon a successful matching, BPC builds the following data stuctures:
The Register Mapping Graph, which is a bipartite undirected graph. The nodes in the two sets represent the virtual and hardware registers respectively. The edges represent potential associations between them.
The Variable Mapping Graph, which associates payload variables to underlying memory addresses.
The Memory Dereference Set, which has all memory addresses that are dereferenced and their values are loaded into registers.
Identifying functional block sets.
BOPC iteratively identifies, for each SPL statement, which candidate blocks can serve as functional blocks. This step determines for each candidate block if there is a resource mapping that satisfies the block constraints.
BOPC identifiers the concrete set of hardware registers and memory addresses that execute the desired statement.
This step determines, for each statement, which concrete registers and memory addresses are reserved. Merging this information with the set of candidate blocks constructs each block’s SPL state, enabling the removal of candidate blocks that are unsatisfiable.
Selecting functional blocks
Given the functional block set Fb, this step searches for a subset that executes all payload statements. The goal is to select exactly one functional block for every IR statement and find dispather blocks to chain them together.
Once the delta graph is generated, this step locates the minimum induced subgraph, that contains the complete set of functional blocks to execute the SPL payload. If the subgraph does not result in a solution, the algorithm tries the next minimum induced subgraph, until a solution is found or a limit is reached.
If the resulting deleta graph does not lead to a solution, this step “shuffle” out-of-order payload statement and build a new delta graph.
Discovering dispatcher blocks
The simulation phase takes the individual functional blocks and tried to find appropriate dispatcher blocks to compose the BOP gadgets. It returns a set of memory assignments for the corresponding dispatcher blocks.
BOPC is called to find a dispatcher path for every edge in the minimum induced subgraph. That is, we need to simulate every control flow transfer in the adjacency matrix of SPL payload. However, dispatchers are built on the binary’s execution state so far, so BOP gadgets must be stiched with the respect to the program’s current flow originating from the entry point.
Finding dispatcher blocks relies on concolic execution. Our algorithm utilizes functional block proximity as a metric for dispatcher path quality. However, it cannot predict which constrains will take exponential time to solve. Therefore, concolic execution selects the K shortest dispatcher paths relative to current BOP chain, and tries them in order until one produces a set of satisfiable constraints.
When simulation starts it also initializes any SPL variables at locations that are reserved during the variable mapping. These addresses are marked as immutable, so any unintended modification raises an exception.
Simulation traverse the minimum induced subgraph, and incrementally extends the SPL state from one BOP gadget to the next, ensuring that newly added constraints remain satisfiable. When encounting a conditional statement, BOPC clones the current state and continues building the trace for both paths independently.
Synthesizing exploits
If the simulation module returns a solution, the final step is to encode the execution trace as a set of memory writes in the target binary.
Discussion and future work
BOPC is limited by the granularity of basic blocks. That is, a combination of basic blocks could potentially lead to execution of a desired SPL statement, while individual blocks might not.
BOPC sets several upper bounds defined by user inputs. These configurable bounds include the upper limit of (i) SPL payload permutations P, (2) length of continuous block L, (3) of minimum induced subgraphs extracted from the delta graph N, (4) dispatcher paths between a pair of functional blocks K. These upper bounds along with the timeout for symbolic execution, reduce the search space, but prune some potentially valid solutions.