Securing software by enforcing data-flow integrity
Introduction
Most software is written in unsafe languages like C and C++, which is vulnerable to attacks. Almost all these attacks subvert the intended data-flow in the program. Control-data attacks exploit buffer overflows or other vulnerabilities to overwrite a return address, a functional pointer. Non-control-data attacks exploit similar vulnerabilities to overwrite security critical data without subverting the intended control-flow in the program.
This paper present a technique that prevent both attack by enforcing data-flow integrity. This technique computes a data-flow graph for a vulnerable program using static analysis, and instruments the program to ensure that flow of data at runtime is allowed by the data-flow graph.
We implemented data-flow integrity enforcement which using reaching definition analysis to compute a static data-flow graph. For each value read by an instruction, it computes the sets of instructions that may write the value.
To enforce data-flow integrity at runtime, our implementation instruments the program to compute the definition that actually reaches each use at runtime. It maintains a table with the identifier of the last instruction to write to each memory location. The program is instrumented to update this table before every write and prevent attacker from tampering this table. We also instrument reads to check if the identifier of the instruction that wrote the value being read is an element of the set computed by the static analysis.
Data flow integrity enforcement
Overview
The first phase uses static analysis to compute a data-flow graph for the vulnerable program. The second instruments the program to ensure that the data-flow at runtime is allowed by the graph. The last one runs the instruments program and raises an exception if data-flow integrity is violated.
1 | int authenticated = 0; |
We use reaching definitions analysis to compute the static data-flow graph. An instruction that writes to a memory position defines the value in the memory position, and an instruction that reads the value is said to use the value. The analysis computes the set of reaching definitions for each use and assigns an identifier to each definition. It returns a map from instructions to definition identifiers and a set of reaching definition identifiers for each use, which we call the static data-flow graph.
The analysis can be imprecise but it is important that it be conservative. It must include in the set all definitions that may reach a use at runtime but it may include additional definitions.
The second phase instruments the program to enforce a simple safety property that we call data flow integrity, i.e., whenever a value is read, the definition identifier of the instruction that wrote the value is in the set of reaching definitions for the read.
The program is instrumented to compute the definition that reaches each read at runtime and to check if this definition is in the set of reaching definition identifiers that was computed statically. We maintain the runtime definitions table (RDT) that records the identifier of the last instruction to write to each memory position. Every write is instrumented to update the RDT. The instrumentation before reads uses the address of the value being read to retrieve the identifier from the RDT. Then, it checks if the identifier is in the statically-computed set.
The attacker must be prevented from tampering with the RDT (Any attempt to write to the RDT generates an exception), tampering with the code (Read-only protection for code pages) or bypassing the instrument (Instrumentation of control-data, e.g., CFI).
Static analysis
We compute reaching definitions using a combination of two analyses: a flow-sensitive intra-procedural analysis and a flow-insensitive and context-insensitive inter-procedural analysis.
The intra-procedural analysis takes flow control into account. We use this analysis to compute reaching definitions for uses of local variables that have no definitions external to the function in which they are declared. The inter-procedural analysis is used to compute reaching definitions for all other uses.
The inter-procedural analysis is less precise to allow it to scale to large programs. It ignore control-flow and it does not take the calling context into account when analyzing functions. We implemented points-to analysis to compute the set of objects that each pointer can point to, and we use these point-to sets to compute reaching definitions.
The points-to analysis makes a global pass over all source files to collect subset constraints. Each assignment x = y results in a subset constraints x ⊇ y, which means that the set of possible values of x contains the set of possible values of y. The analysis compiles each source file to IR, and it writes all subset constraints in the IR to a file. After this global pass, it computes the points-to sets by iterating over all the constraints until it reaches a fix point.
During the global pass, we also collect the target locations and identifiers of instruction that write to locations that may be read in other functions.
We compute inter-procedural reaching definitions using the points-to sets and information about write instructions collected during global pass.
For uses of variables, the set of reaching definitions is the union of the set containing the identifiers of all writes to the variables with the sets containing the identifiers of all writes to dereferences of pointers that may point to the variable. For pointer dereferences, the set of reaching definitions is the union of set containing the identifier of all writes to the dereferences pointer with the sets of reaching definitions of all the variable the pointer can point to.
Both the intra-procedural and the inter-procedural analyses assume that correct programs do not use pointer arithmetic to navigate between independent objects in memory. However, it is precisely this assumption that is violated by most attacks.
Since we control code generation, we can ensure that these temporaries are placed in registers beyond the reach of an attack. The attacker cannot voilate data-flow integrity by overwriting these registers because our instrumentation prevents it from subverting thr control flow. We only compute reaching definitions and instrument accesses of temporaries that are spilled to memory.
Instrumentation
We add instrumentation by inserting new instruction into the IR of the program.
- SETDEF opnd id
- CHECKDEF opnd setName
The first instruction sets the RDT entry for opnd to id. The second retrieves the runtime definition identifier for opnd from the RDT and checks if the identifier is in the reaching definitions set with name setName. The compiler maintains a map from the setName to set values that is used when lowering CHECKDEF instructions to the assembly of the target machine.
We do not instrument temporaries that we can ensure are allocated to registers, and we also do not instrument the use of local variables of which address are computed by adding a constant to the frame pointer.
To enable efficient accesses, the RDT is implemented as an array with a definition identifier for each 32-bit memory word in the instrumented program. Each definition identifier is two bytes long, which seems sufficient even for large programs (2 ^ 16 identifiers).
Since programs can access memory at byte granularity, it would seem necessary to record a 2-byte definition identifier for every byte of memory. This would result in a space overhead of approximately 200%, which is not practical. We are able to record a single identifier for each 32-bit word because we can generate code in which no two variables with distinct reaching definition sets share the same aligned 32-bit memory word. We only changed the compiler to use a minimum alignment of 32 bits when laying out local variables in a stack frame and globals in the data segment.
In current implementation, we allocate the lowest 1 GB of virtual address space to the program being instrumented and 512 MB to the RDT with a guard page between them, that is, the guard page is at address 4000 0000h and the base address of RDT is at 4000 1000h. This layout also enables efficient bounds checking of the target addresses: we raise an exception if the bitwise and of the target address with c000 0000h is non-zero.
The high-level instrumentation is lowered to x86 assembly as illustrated by the following examples:
1 | # SETDEF authenticated 1 |
Optimizations
A naive implementation of a data-flow integrity enforcement can perform poorly: each definition introduces a write to the RDT and each use check introduces a read from the RDT followed by comparisons against each identifier in the set of reaching definitions for the use.
Renaming equivalent definitions
The first Optimization partitions definitions into equivalence classes in a way that allows us to safely assign the same identifier to all definitions in the same class. Two definitions are equivalent if they have a exactly the same set of use.
Removing bounds check on writes
We can optimize SETDEFs by removing these checks from all safe writes. In the current implementation, a write is safe if the target address is obtained by adding a small constant offset (possibly) zero to the stack pointer, frame pointer, or to the address of a global or static variable.
Removing SETDEFs and CHECKDEFs
To identify redundant instrumentation, we use symbolic execution of the native code augmented with SETDEF and CHECKDEF operations. Two addresses are equal if they are syntactically equal. They are different if they are computed by adding different offsets to the same symbolic register state. Otherwise, they may refer to aliased memory locations. A write to memory invalidates the symbolic state of a register if the state refers to the content of a memory position that may be aliased with the write’s target. Additionally, it removes mappings for any memory that may be aliased with the write’s target from the symbolic RDT state. We apply the rules to eliminate redundant instrumentation after each SETDEF and CHECKDEF by examining the symbolic RDT state.
Optimizing membership checks
Another optimization renames definitions to reduce the cost of membership checks in the CHECKDEFs. Membership checks can be implemented more efficiently when sets contain ranges of consecutive identifier: a check against {0 .. n} can be implemented by a single unsigned integer camparison against n.
Removing SETDEFs for safe difinitions
The last optimization identifies local variables that have no definitions outside the function and that are written only by safe writes. It replaces all SETDEFs for such a variable by a single SETDEF with identifier 0 that is placed on function entry.
Evaluation
Overhead
We used serveral programs from the SPEC CPU 2000 benchmark suite to measure overhead added by our instrumentation. These benchmarks are CPU-intensive and they spend most time executing instrumented code at user level.
The first set of experiments measured two variants of the overhead of data-flow integrity enforcement: intraproc DFI only instruments uses of control-data and use of local variables without definitions outside their function, and interproc DFI is the variant described earlier in the paper.
The average overhead of execution time is 43% for intraproc DFI and 104% for interproc DFI.