Fast and generic metadata management with mid-fat pointers

Introduction

We present mid-fat pointers, an approach to support targeted but composable metadata-based software defenses. The key idea is to strike a balance between traditional fat pointers and low-fat pointers by preserving the pointer size but changing its representation to embed data. This is done by piggybacking on software fault isolation (SFI) to decode the pointers efficiently. When the deployed defenses already require SFI, its use by mid-fat pointers incurs no additional overhead.

With mid-fat pointers, each pointer can embed the location of the metadata for the pointed-to memory object. Compared to recent generic metadata management schemes, we improve both performance and security: performance because the metadata location is cached in the pointer, so we need fewer lookups, and security because an out-of-bounds pointer is still associated with the same metadata.

Threat model

Since mid-fat pointers provide a general defense framework, the threat model depends on the particular defenses deployed. With regard to these defenses, we only assume they require an SFI baseline. As a result, the metadata is inherently protected against both arbitrary memory reads and writes.

Design

Pointer encoding

One core insight for our framework is that, given the presence of SFI to protect metadata from attackers, we can use the unused bits in pointers without an additional performance impact. In particular, whenever the program allocates memory, we store a pointer to the metadata for that chunk of memory in the high bits of the pointer. This design has three benefits: (1) it requires a full metadata lookup only once, at allocation time, and caches the result for the more frequent lookups needed by defenses, increasing efficiency; (2) since we know the base address of newly allocated objects, we do not need range queries; and (3) pointers that go out of bounds still point to the metadata for the original object.

By default, we use the lower 32 bits of every pointer to encode where the data is (data pointer), and the upper 32 bits to encode where the metadata is (metadata pointer). The downside is that we can use only 32-bit addresses out of the 48 bits implemented on x86-64, which restricts the address space to 4 GB. Note, however, that we could use fewer bits for the metadata pointer by assuming that every object is aligned on 8-byte boundaries.
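The default 32/32 split can be sketched in C as follows. This is a minimal illustration under stated assumptions, not the framework's actual code: pointers are modelled as plain 64-bit integers, and the names `midfat_encode`, `midfat_data`, and `midfat_meta` are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed layout from the text: low 32 bits hold the data pointer,
   high 32 bits hold the metadata pointer. All names are hypothetical. */
#define MIDFAT_DATA_BITS 32
#define MIDFAT_DATA_MASK ((UINT64_C(1) << MIDFAT_DATA_BITS) - 1)

/* Pack a data address and a metadata address into one 64-bit word. */
static inline uint64_t midfat_encode(uint64_t data_addr, uint64_t meta_addr) {
    assert(data_addr <= MIDFAT_DATA_MASK);  /* data must live below 4 GB */
    assert(meta_addr <= MIDFAT_DATA_MASK);  /* metadata pointer must fit in 32 bits */
    return (meta_addr << MIDFAT_DATA_BITS) | data_addr;
}

/* Recover the data pointer by masking off the high bits. */
static inline uint64_t midfat_data(uint64_t encoded) {
    return encoded & MIDFAT_DATA_MASK;
}

/* Recover the metadata pointer by shifting the high bits down. */
static inline uint64_t midfat_meta(uint64_t encoded) {
    return encoded >> MIDFAT_DATA_BITS;
}
```

Ordinary pointer arithmetic on the encoded value, such as `p + 4096`, changes only the data pointer as long as it does not carry out of the low 32 bits, so `midfat_meta` still returns the metadata of the original object even when the result is out of bounds.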

Allocation instrumentation

We use a compiler pass that searches for calls to memory allocation functions and adds a call to a hook in our static libraries. The hook looks up the metadata for the allocated pointer and stores the metadata pointer in the upper bits of the returned pointer.
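The effect of the allocation hook can be illustrated with a small simulation. Everything here is an assumption made for the sake of a self-contained sketch: the bump allocator `toy_malloc`, the table-based metadata store, the `struct meta` layout, and the use of a table index as the "metadata pointer". A real deployment would hook the program's actual allocator and use whatever metadata scheme the defense provides.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical per-object metadata record; real defenses define their own. */
struct meta { uint32_t base; uint32_t size; };

#define MAX_OBJECTS 16
static struct meta meta_table[MAX_OBJECTS];  /* stand-in metadata store */
static uint32_t next_slot = 1;               /* slot 0 means "no metadata" */
static uint32_t next_addr = 0x1000;          /* toy heap below 4 GB */

/* Stand-in for the program's allocator: returns a simulated 32-bit address. */
static uint32_t toy_malloc(uint32_t size) {
    uint32_t addr = next_addr;
    next_addr += (size + 7u) & ~7u;          /* keep 8-byte alignment */
    return addr;
}

/* Stand-in for the full metadata lookup, performed once per allocation. */
static uint32_t metadata_register(uint32_t base, uint32_t size) {
    assert(next_slot < MAX_OBJECTS);
    meta_table[next_slot].base = base;
    meta_table[next_slot].size = size;
    return next_slot++;
}

/* The hook the pass would add after each allocation call: it tags the
   returned address with the metadata pointer in the upper 32 bits. */
static uint64_t midfat_alloc_hook(uint32_t data_addr, uint32_t size) {
    uint32_t meta = metadata_register(data_addr, size);
    return ((uint64_t)meta << 32) | data_addr;
}
```

An instrumented call site would then look like `uint64_t p = midfat_alloc_hook(toy_malloc(24), 24);`, after which a defense can reach the object's metadata with `meta_table[p >> 32]` and no further lookup.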

Load/store instrumentation. As the program can only dereference encoded pointers safely after decoding, our framework includes a compiler pass that identifies load and store operations and adds the data pointer decoder to each instance. We effectively implement SFI, shielding all memory past the first 4 GB from attackers.
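The decoder added before each access amounts to a single mask, which is also what gives the SFI property: whatever the upper bits contain, the access lands in the low region. The sketch below models this with a small byte array standing in for the first 4 GB; the array, its size, the toy bounds check, and the function names are assumptions made to keep the example runnable.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define DATA_MASK    UINT64_C(0xFFFFFFFF)  /* keeps the low 32 bits */
#define SANDBOX_SIZE 4096u                  /* toy stand-in for the low 4 GB */

static uint8_t sandbox[SANDBOX_SIZE];

/* The decoder the pass would insert before each access:
   drop the metadata bits and keep the data pointer. */
static inline uint32_t decode(uint64_t encoded) {
    return (uint32_t)(encoded & DATA_MASK);
}

/* Instrumented store: decode first, then write. */
static void store_u32(uint64_t encoded, uint32_t value) {
    uint32_t addr = decode(encoded);
    assert((uint64_t)addr + sizeof value <= SANDBOX_SIZE);  /* toy-only bound */
    memcpy(&sandbox[addr], &value, sizeof value);
}

/* Instrumented load: decode first, then read. */
static uint32_t load_u32(uint64_t encoded) {
    uint32_t addr = decode(encoded);
    uint32_t value;
    assert((uint64_t)addr + sizeof value <= SANDBOX_SIZE);  /* toy-only bound */
    memcpy(&value, &sandbox[addr], sizeof value);
    return value;
}
```

In this model, two encoded values that differ only in their upper 32 bits access the same location, so an attacker who corrupts the upper bits cannot redirect a load or store outside the sandboxed region.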

Call instrumentation. While instrumented code can safely dereference encoded pointers due to automatic masking, protected applications may well use unprotected libraries, and we must not pass encoded pointers to them to avoid segmentation faults. We therefore instrument calls to external functions by decoding any pointer-type parameters. Note that we can identify which functions are imported from libraries because our pass is part of LLVM’s link-time optimization. For indirect calls, our instrumentation might not know whether the current value of a function pointer refers to an external function. Instead, when a function is address-taken, we provide a replacement function that calls the original function with masked arguments.
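Both cases can be sketched in a few lines. The "external" function below is a hypothetical stand-in that simply reports whether it received a plain (decoded) address; the wrapper and type names are likewise assumptions for illustration.

```c
#include <stdint.h>

#define DATA_MASK UINT64_C(0xFFFFFFFF)

/* Stand-in for an uninstrumented library function. It returns 1 if the
   address it received is a plain address it could dereference, 0 if the
   upper bits are still set (where a real library would likely fault). */
static int external_accepts(uint64_t addr) {
    return (addr >> 32) == 0;
}

/* Direct call: the instrumented call site decodes pointer arguments. */
static int call_external_instrumented(uint64_t encoded) {
    return external_accepts(encoded & DATA_MASK);
}

/* Indirect call: code that takes the function's address receives this
   replacement, which masks its arguments and forwards to the original. */
typedef int (*unary_fn)(uint64_t);

static int external_accepts_replacement(uint64_t encoded) {
    return external_accepts(encoded & DATA_MASK);
}
```

With the replacement in place, a call through `unary_fn fp = external_accepts_replacement;` is safe regardless of whether the call site knows that the target is external.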

Pointer comparison and cast instrumentation. Beyond dereferencing operations, compatibility issues arise when comparing pointers and when performing integer operations on pointers. We could use static analysis to determine which pointers could be used in integer arithmetic and exclude them from our tagging scheme and SFI.
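The comparison issue can be made concrete with a small example. It assumes a scenario in which the same address is held once as an encoded pointer and once as a plain one (for instance, after passing through uninstrumented code). The masked comparison shown is only one possible way to restore the expected semantics and is not necessarily what the framework does; both function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

#define DATA_MASK UINT64_C(0xFFFFFFFF)

/* Uninstrumented comparison: the metadata bits take part in the result. */
static bool raw_equal(uint64_t a, uint64_t b) {
    return a == b;
}

/* Hypothetical instrumented comparison: compare only the data pointers. */
static bool decoded_equal(uint64_t a, uint64_t b) {
    return (a & DATA_MASK) == (b & DATA_MASK);
}
```

The same reasoning applies to casts: converting an encoded pointer to an integer yields a value that includes the metadata bits, so code that expects the integer to equal the address would misbehave unless the pointer is decoded first or excluded from tagging.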