Defenses against memory corruption, such as W⨁X, CFI, CPI, and DFI, typically reduce the attack surface by preventing the adversary from corrupting the parts of the application’s memory that are essential for a successful attack.
Some of these defenses can be implemented efficiently using mechanisms that reside entirely outside the underlying application process. For instance, the kernel configures W⨁X and the hardware enforces it. However, an external mechanism is not always practical because of its high performance overhead. For example, CFI requires run-time checks and a shadow stack, which is updated every time a function is invoked or returns. CPI requires run-time checks and a safe region, which contains metadata about the program’s variables. This data cannot be stored outside the process, e.g., in kernel memory, because every access would then require a context switch, imposing an impractical performance overhead. Hence, some form of in-process memory isolation is needed to prevent the adversary from accessing this data.
Goals and Contribution. In this paper, we present IMIX, which enables lightweight in-process memory isolation for memory-corruption defenses that target the x86 architecture. IMIX introduces isolated pages, which are marked with a special flag. Isolated pages can only be accessed using a single new instruction called smov.
Memory corruption. We assume the presence of a memory-corruption vulnerability, which the adversary can repeatedly exploit to read and write data according to memory access permissions.
Sandboxed code execution. We assume memory-corruption mitigations cannot be bypassed unless the attacker can corrupt the mitigation’s metadata.
Immutable code. The adversary cannot inject new code or modify existing code.
Like applications, defenses rely on the integrity of their code and data to function correctly. Thus, the attacker may leverage a memory-corruption vulnerability in the application to bypass those defenses. Traditionally, defense developers enforce code integrity using W⨁X or execute-only memory; for the defenses’ data, however, they are forced to choose between high performance overhead and compromised security. IMIX allocates data belonging to run-time mitigations in isolated pages, which can only be accessed by the smov instruction.
In addition to the smov instruction and the associated access permissions, IMIX includes a kernel extension and compiler support.
Hardware. For IMIX, we extend two of the CPU’s main responsibilities: instruction processing and memory management. We add the smov instruction to the instruction set, reusing the logic of the regular memory-access instructions. The memory-access logic is modified to generate a fault if (1) an instruction other than smov is used to access a page protected by IMIX, or (2) an smov instruction is used to access a normal page.
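The two fault conditions above reduce to a single predicate: an access is allowed only when the kind of instruction matches the kind of page. The following C++ sketch states that rule in software; `ImixAccess` and `imix_check` are hypothetical names used for illustration, not part of any real CPU or simulator interface.

```cpp
#include <cassert>

// Outcome of a memory access under the IMIX rule (hypothetical model).
enum class ImixAccess { kAllowed, kFault };

// is_smov:  the access is performed by the smov instruction.
// isolated: the target page carries the IMIX protection flag.
// Fault (1): a regular instruction accesses an isolated page.
// Fault (2): smov accesses a normal page.
inline ImixAccess imix_check(bool is_smov, bool isolated) {
    return is_smov == isolated ? ImixAccess::kAllowed : ImixAccess::kFault;
}
```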
Kernel. We extend the kernel to support an additional access permission, which identifies all pages protected by IMIX. This enables protected memory allocation for code generated at runtime.
Compiler. IMIX provides two high-level primitives: one for allocating protected memory and one for accessing it. Mitigations like CPI are implemented as LLVM optimization passes that operate at the IR level. For application developers, IMIX provides source-code annotations.
Developers can build programs with IMIX using our extended Clang compiler, whose backend we also modified to support smov instructions. Programs protected by IMIX mark isolated pages using the mprotect system call with a special flag. We therefore extended the kernel’s existing page-level memory protection functionality to support this flag and mark isolated pages appropriately. Finally, the CPU must be modified to support the smov instruction and to perform the appropriate checks when accessing memory.
We mapped the IMIX protection flag to an ignored bit in the PTE: specifically, bit 52, as it is the first non-reserved bit and is normally ignored by the MMU. We used a hardware simulator to show the feasibility of our design.
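Since bit 52 lies above the physical-address field of an x86-64 PTE (bits 12 to 51), setting it does not change which physical frame the entry maps. A minimal C++ sketch of encoding and testing the flag, assuming 4 KiB pages; the constant and helper names are hypothetical:

```cpp
#include <cassert>
#include <cstdint>

constexpr uint64_t kPtePresent   = 1ULL << 0;
constexpr uint64_t kPteWritable  = 1ULL << 1;
constexpr uint64_t kPteImix      = 1ULL << 52;              // first ignored bit, used for IMIX
constexpr uint64_t kPteFrameMask = 0x000FFFFFFFFFF000ULL;   // physical frame, bits 12..51

constexpr uint64_t pte_mark_isolated(uint64_t pte) { return pte | kPteImix; }
constexpr bool pte_is_isolated(uint64_t pte) { return (pte & kPteImix) != 0; }
constexpr uint64_t pte_frame(uint64_t pte) { return pte & kPteFrameMask; }

// Setting the IMIX bit must leave the mapped physical frame unchanged.
static_assert(pte_frame(pte_mark_isolated(0x123456000ULL | kPtePresent)) == 0x123456000ULL,
              "IMIX bit must not alias the frame address");
```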
Simulated hardware. We use Wind River Simics, a full-system simulator, to simulate a complete computer that supports IMIX. We also use the complementary Intel Simulation and Analysis Engine (SAE) add-on to boot the Linux kernel and test our Linux extension. SAE can emulate an x86 system running a full operating system within its processes while allowing various architectural instrumentations. These instrumentations are implemented as extensions called ztools.
To instrument a simulated system, ztools register callbacks for specific hooks, either at initialization time or dynamically. First, we ensure that our ztool is initialized by registering a callback for the initialization hook. Then, we register a callback that is executed whenever an instruction is added to the CPU’s instruction cache. If either an smov instruction or a regular instruction that accesses memory is found, we register an instruction-replacement callback.
In this callback, we first check the protection flag of the memory accessed by the instruction. To identify protected memory, we look up the corresponding PTE by walking the page-table hierarchy, starting from its base address in the CR3 register and indexing each level with the virtual address.
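The PTE lookup described above is essentially the 4-level page walk that the x86-64 MMU performs in hardware. The sketch below models it in C++ over a toy physical memory, assuming 4 KiB pages only (large pages are ignored); `PhysMem`, `walk`, `map_page`, and `is_isolated` are hypothetical names and do not correspond to the Simics or SAE interfaces.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <unordered_map>

// Toy model of x86-64 4-level paging (4 KiB pages only, no large pages).
// "Physical memory" maps a table's physical address to its 512 entries.
// Hypothetical names; this is not the Simics/SAE interface.
using Table = std::array<uint64_t, 512>;
using PhysMem = std::unordered_map<uint64_t, Table>;

constexpr uint64_t kPresent  = 1ULL << 0;
constexpr uint64_t kImix     = 1ULL << 52;                 // IMIX protection flag
constexpr uint64_t kAddrMask = 0x000FFFFFFFFFF000ULL;      // bits 12..51

// Index into the table at `level` (3 = PML4, 2 = PDPT, 1 = PD, 0 = PT).
inline unsigned table_index(uint64_t va, int level) {
    return static_cast<unsigned>((va >> (12 + 9 * level)) & 0x1FF);
}

// Walks from the root table (address taken from CR3) to the leaf PTE.
// Returns 0 if any level along the way is not present.
uint64_t walk(const PhysMem& mem, uint64_t cr3, uint64_t va) {
    uint64_t table = cr3 & kAddrMask;
    for (int level = 3; level >= 0; --level) {
        auto it = mem.find(table);
        if (it == mem.end()) return 0;
        uint64_t entry = it->second[table_index(va, level)];
        if (!(entry & kPresent)) return 0;
        if (level == 0) return entry;      // leaf PTE reached
        table = entry & kAddrMask;         // descend to the next level
    }
    return 0;
}

bool is_isolated(const PhysMem& mem, uint64_t cr3, uint64_t va) {
    return (walk(mem, cr3, va) & kImix) != 0;
}

// Test helper: maps `va` to `frame`, allocating missing intermediate
// tables at consecutive page-aligned addresses starting at `next_free`.
void map_page(PhysMem& mem, uint64_t cr3, uint64_t va, uint64_t frame,
              uint64_t flags, uint64_t& next_free) {
    uint64_t table = cr3 & kAddrMask;
    for (int level = 3; level > 0; --level) {
        uint64_t entry = mem[table][table_index(va, level)];
        if (!(entry & kPresent)) {
            entry = next_free | kPresent;
            mem[table][table_index(va, level)] = entry;
            mem[next_free] = Table{};      // new, zeroed table
            next_free += 0x1000;
        }
        table = entry & kAddrMask;
    }
    mem[table][table_index(va, 0)] = (frame & kAddrMask) | flags | kPresent;
}
```

Each level consumes 9 bits of the virtual address (bits 47 to 39 select the PML4 entry, down to bits 20 to 12 for the page-table entry), which is why the shift is `12 + 9 * level`.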
If a regular instruction attempts to access regular memory, we execute the original instruction to avoid changes to the instruction cache. For an smov instruction attempting to access an isolated page, we first remove the instruction from the instruction cache and then execute our ztool implementation of this instruction.
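The decisions made by the ztool callbacks can be summarized as a small dispatch function. This C++ sketch is a hypothetical model, not the ztool API: the `Insn` fields stand in for information the simulator would provide, and returning a fault for the two mismatched cases is an assumption based on the access rule described for the hardware.

```cpp
#include <cassert>

// Hypothetical summary of a decoded instruction as seen by the ztool.
struct Insn {
    bool is_smov;          // the new IMIX instruction
    bool accesses_memory;  // regular load/store (smov always accesses memory)
};

enum class Action {
    kNoCallback,       // no replacement callback registered
    kExecuteOriginal,  // regular instruction on regular memory
    kEmulateSmov,      // evict from icache, then run the ztool's smov code
    kFault             // instruction kind and page kind do not match
};

// Decided when an instruction is added to the instruction cache.
bool needs_replacement_callback(const Insn& insn) {
    return insn.is_smov || insn.accesses_memory;
}

// Decided inside the replacement callback, once the page's flag is known.
Action dispatch(const Insn& insn, bool page_isolated) {
    if (!needs_replacement_callback(insn)) return Action::kNoCallback;
    if (!insn.is_smov && !page_isolated) return Action::kExecuteOriginal;
    if (insn.is_smov && page_isolated) return Action::kEmulateSmov;
    return Action::kFault;
}
```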
Real hardware. Adding IMIX support to a real CPU would require extending the CPU’s instruction decoder to make it aware of our smov instruction. Moreover, the MMU would need to be modified to perform the necessary checks.
The isolated pages need to be marked as such in the PTEs, which are located in kernel memory. We add a dedicated PROT_IMIX flag to the mprotect system call. Note that once a page is marked as PROT_IMIX, the only way to remove this flag is to unmap the page.
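The rule that PROT_IMIX can only be dropped by unmapping can be modeled as a small state machine over per-page protection bits. In this C++ sketch, `PageTableModel` and the numeric flag values are hypothetical, and rejecting an mprotect call that would clear PROT_IMIX is an assumption about how the kernel enforces the rule; the text only states that unmapping is the sole way to remove the flag.

```cpp
#include <cassert>
#include <cstdint>
#include <map>

// Hypothetical flag values; the real PROT_IMIX value comes from the IMIX kernel patch.
constexpr unsigned kProtRead = 0x1, kProtWrite = 0x2, kProtImix = 0x100;

// Toy per-process page table: virtual page number -> protection bits.
class PageTableModel {
public:
    void map(uint64_t vpn, unsigned prot) { pages_[vpn] = prot; }

    // Returns false (request rejected) if the page is unmapped, or if the
    // call would clear PROT_IMIX from an already isolated page (assumption).
    bool mprotect(uint64_t vpn, unsigned prot) {
        auto it = pages_.find(vpn);
        if (it == pages_.end()) return false;
        if ((it->second & kProtImix) && !(prot & kProtImix)) return false;
        it->second = prot;
        return true;
    }

    // Unmapping is the only way to get rid of PROT_IMIX.
    void munmap(uint64_t vpn) { pages_.erase(vpn); }

    bool is_isolated(uint64_t vpn) const {
        auto it = pages_.find(vpn);
        return it != pages_.end() && (it->second & kProtImix);
    }

private:
    std::map<uint64_t, unsigned> pages_;
};
```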
Our modifications mainly concern the IR, to give mitigations like CPI access to the smov instruction, and the x86 backend, to emit the instruction. Further, we introduced an attribute that can be used to protect a single variable by allocating it in an isolated page.
IR Extension. Runtime defenses are usually implemented as LLVM optimization passes that interact with and modify LLVM’s IR. To allow those defenses to generate smov instructions, we extended the IR instruction set with two IMIX instructions: sload and sstore.
LLVM IR instructions are implemented as C++ classes and therefore support inheritance. We implemented our IR instructions as subclasses of their regular counterparts in order to reuse LLVM’s existing translation from IR to machine code, called lowering in LLVM parlance.
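To illustrate why subclassing lets sload and sstore reuse the existing lowering, here is a self-contained C++ toy. `Instruction`, `Load`, `Store`, `SLoad`, and `SStore` are hypothetical stand-ins, not LLVM’s real instruction classes, and the emitted strings are only illustrative x86 assembly. The lowering code is written once in the base classes; the IMIX subclasses override only the opcode name and the mnemonic they emit.

```cpp
#include <cassert>
#include <string>
#include <utility>

// Toy stand-ins for IR instruction classes; NOT LLVM's real API.
struct Instruction {
    virtual ~Instruction() = default;
    virtual std::string opcode_name() const = 0;
    virtual std::string mnemonic() const = 0;  // x86 instruction to emit
};

struct Load : Instruction {
    explicit Load(std::string ptr) : ptr(std::move(ptr)) {}
    std::string opcode_name() const override { return "load"; }
    std::string mnemonic() const override { return "mov"; }
    // Lowering is written once here and inherited by SLoad.
    std::string lower() const { return mnemonic() + " reg, [" + ptr + "]"; }
    std::string ptr;
};

struct Store : Instruction {
    explicit Store(std::string ptr) : ptr(std::move(ptr)) {}
    std::string opcode_name() const override { return "store"; }
    std::string mnemonic() const override { return "mov"; }
    std::string lower() const { return mnemonic() + " [" + ptr + "], reg"; }
    std::string ptr;
};

// IMIX variants: only the opcode name and the emitted mnemonic differ.
struct SLoad : Load {
    using Load::Load;
    std::string opcode_name() const override { return "sload"; }
    std::string mnemonic() const override { return "smov"; }
};

struct SStore : Store {
    using Store::Store;
    std::string opcode_name() const override { return "sstore"; }
    std::string mnemonic() const override { return "smov"; }
};
```

Code that already handles a `Load` also accepts an `SLoad`, which mirrors the reuse the text describes.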
To allocate memory in isolated pages, we implemented an LLVM function that can be called from an optimization pass; it allocates memory at page granularity using malloc and sets the IMIX permission using mprotect.
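A minimal POSIX C++ sketch of such an allocation helper follows; the name `imix_alloc` is hypothetical. Two deviations from the description are deliberate: it uses posix_memalign rather than malloc, because mprotect requires a page-aligned address, and PROT_IMIX is a placeholder defined as 0, because stock Linux has no such flag, so on an unmodified kernel the pages remain ordinary read/write memory. On IMIX hardware, the returned memory could then only be accessed through smov.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdlib.h>      // posix_memalign, free
#include <sys/mman.h>    // mprotect, PROT_READ, PROT_WRITE
#include <unistd.h>      // sysconf

// PROT_IMIX would come from the IMIX kernel patch. The placeholder 0 lets
// this sketch compile and run on an unmodified kernel (no isolation).
#ifndef PROT_IMIX
#define PROT_IMIX 0
#endif

// Hypothetical helper: allocates `size` bytes rounded up to whole pages
// and marks them as isolated. Returns nullptr on failure.
void* imix_alloc(std::size_t size) {
    const std::size_t page = static_cast<std::size_t>(sysconf(_SC_PAGESIZE));
    const std::size_t len = (size + page - 1) / page * page;  // round up
    void* p = nullptr;
    // mprotect() needs a page-aligned start address, which malloc()
    // does not guarantee, hence posix_memalign().
    if (len == 0 || posix_memalign(&p, page, len) != 0) return nullptr;
    if (mprotect(p, len, PROT_READ | PROT_WRITE | PROT_IMIX) != 0) {
        free(p);
        return nullptr;
    }
    return p;
}
```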
Attribute support. We added an IMIX attribute that can be used to annotate C/C++ variables that should be allocated in isolated pages. All instructions accessing those annotated variables use the IMIX IR instructions instead of regular ones. We implemented this as an LLVM optimization pass that replaces regular variable allocations with indexed slots in an IMIX-protected safe region (one per compilation module).
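The slot-based safe region can be sketched as follows. `SafeRegion`, `reserve_slot`, `sload`, and `sstore` are hypothetical names; the region is an ordinary byte vector here, and the two accessor templates stand in for the sload and sstore IR instructions that would be lowered to smov on IMIX hardware.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical model of a per-module safe region: each annotated variable
// becomes a fixed offset ("slot") into one region that, under IMIX, would
// live in isolated pages. Here it is a plain byte vector.
class SafeRegion {
public:
    // Conceptually done by the compiler pass, once per annotated variable.
    std::size_t reserve_slot(std::size_t size, std::size_t align) {
        std::size_t off = (bytes_.size() + align - 1) / align * align;
        bytes_.resize(off + size);
        return off;
    }

    // Stand-ins for the sstore/sload IR instructions (smov on IMIX hardware).
    template <typename T>
    void sstore(std::size_t slot, const T& value) {
        std::memcpy(&bytes_[slot], &value, sizeof(T));
    }
    template <typename T>
    T sload(std::size_t slot) const {
        T value;
        std::memcpy(&value, &bytes_[slot], sizeof(T));
        return value;
    }

private:
    std::vector<unsigned char> bytes_;
};
```

Addressing variables by offset into a single region means each module needs only one isolated allocation rather than one per variable.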
Modifications to the x86 backend. In the backend, we added the code needed to process sload and sstore instructions. Lowering IR instructions to machine code is a two-stage process. In the first stage, the FastEmit mechanism is used. It consists of transformation rules, explicitly coded in C++, that are too complex to be expressed as declarative patterns. The mechanism can be used either to generate machine code directly or to assign a rule that should be applied in the next stage. In the second stage, LLVM applies rule-based lowering using pattern matching: the IR instructions and their operands are matched against the patterns in LLVM’s TableGen definitions. We modified both stages of the lowering process, similarly to how load and store are handled.