SGXBounds: Memory Safety for Shielded Execution

Introduction

Intel SGX provides the abstraction of a secure enclave, which can be used to achieve shielded execution of unmodified legacy applications on untrusted infrastructure.

Shielded execution aims to protect the confidentiality and integrity of applications executed in an untrusted environment. The main idea is to isolate the application from the rest of the system and to communicate with the outside only through a narrow interface.

However, shielded execution does not protect the program against memory safety attacks. To validate this claim, we reproduced publicly available memory safety exploits inside a secure enclave. These examples highlight that a single exploit can completely compromise the integrity and confidentiality guarantees of shielded execution.

To prevent exploitation of these bugs, we experimented with two prominent memory protection mechanisms in the context of shielded execution: the software-based AddressSanitizer and the hardware-based Intel MPX.

Both of them, however, incur high performance overheads because of the additional metadata they use to track object bounds.

In this paper, we present SGXBOUNDS. The SGXBOUNDS approach is based on a simple combination of tagged pointers and an efficient memory layout that reduces overheads inside enclaves. In particular, we note that SGX enclaves routinely use only the lower 32 bits of a pointer to represent the program address space, leaving the upper 32 bits unused. We use these upper bits to store the upper bound of the referent object (or, more broadly, the beginning of the object's metadata area); the lower bound is stored right after the object. This metadata layout requires only 4 additional bytes per object and, unlike Intel MPX and AddressSanitizer, does not break cache locality.

Furthermore, we show that our design naturally extends to: (1) “synchronization-free” support for multi-threaded applications, (2) increased availability instead of the usual fail-stop semantics, by tolerating out-of-bounds accesses, and (3) generic APIs for per-object metadata management to support new use-cases.

Background

SCONE is a shielded execution framework that enables unmodified legacy applications to take advantage of the isolation offered by SGX. With SCONE, the program is recompiled against a modified standard C library (SCONE libc), which facilitates the execution of system calls.

Clearly, the combination of SCONE and SGX is not a silver bullet: bugs in the enclave code itself can render these mechanisms useless.

AddressSanitizer is an extension to GCC and Clang/LLVM that detects the majority of object bounds violations. It surrounds objects with unaddressable redzones, records which bytes are addressable in shadow memory, and on each memory access checks whether the address belongs to a valid object.
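The per-access check can be illustrated with a toy model of AddressSanitizer's shadow memory. This is a deliberately simplified sketch, not ASan's runtime. It assumes a small simulated heap addressed by offsets and uses ASan's documented shadow encoding, one shadow byte per 8-byte granule: 0 means the whole granule is addressable, k in 1..7 means only the first k bytes are addressable, and a negative value marks a redzone. The helper names `toy_poison`, `toy_unpoison` and `toy_is_addressable` are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Toy model of AddressSanitizer-style shadow memory (not the real runtime).
 * A simulated heap of HEAP_SIZE bytes is described by one shadow byte per
 * 8-byte granule:  0 -> fully addressable, 1..7 -> first k bytes addressable,
 * negative -> unaddressable redzone. */
#define HEAP_SIZE 1024
static int8_t shadow[HEAP_SIZE / 8];

/* Mark [off, off + size) as redzone; off and size are multiples of 8. */
static void toy_poison(size_t off, size_t size) {
    for (size_t g = off / 8; g < (off + size) / 8; g++)
        shadow[g] = -1;
}

/* Mark an object [off, off + size) as addressable; off is a multiple of 8. */
static void toy_unpoison(size_t off, size_t size) {
    size_t g = off / 8;
    for (; size >= 8; size -= 8)
        shadow[g++] = 0;
    if (size > 0)
        shadow[g] = (int8_t)size;   /* partially addressable last granule */
}

/* The check conceptually inserted before every 1-byte access at offset off. */
static bool toy_is_addressable(size_t off) {
    int8_t s = shadow[off / 8];
    if (s == 0) return true;
    if (s < 0) return false;
    return (int8_t)(off % 8) < s;
}
```

For example, after poisoning the whole heap and unpoisoning a 13-byte object at offset 16, bytes 16 to 28 are addressable while offsets 15 and 29 fall into redzones.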

Intel MPX detects all possible spatial memory vulnerabilities, including intra-object ones (when one member of a structure corrupts other members). Its approach differs from AddressSanitizer's. Instead of separating objects by unaddressable redzones, MPX keeps bounds metadata for all pointers and checks against these bounds on each memory access.

One major limitation of the current Intel MPX implementation is the small number of bounds registers (four, BND0 to BND3). If an application contains many distinct pointers, bounds must frequently be spilled to and reloaded from memory.

SGXBOUNDS

We built SGXBOUNDS based on the following three insights. First, shielded application memory (specifically, its working set) must be kept minimal due to the very limited size of the Enclave Page Cache (EPC) in current SGX implementations. Second, applications spend a considerable amount of time iterating through the elements of an array, and a smartly chosen metadata layout can significantly reduce the overhead of bounds checking. Third, we rely on the SCONE infrastructure with its monolithic build process: all application code is statically linked without external dependencies. The first and second insights dictate the use of per-object metadata combined with tagged pointers to keep memory overhead minimal.

Design overview

All modern SGX CPUs operate in 64-bit mode, meaning that all pointers are 64 bits in size. In SGX enclaves, however, only 36 bits of virtual address space are currently addressable, so the upper bits of a pointer carry no information. SGXBOUNDS therefore relies on tagged pointers and restricts enclave addresses to 32 bits (a 4 GB address space): a 64-bit pointer contains the pointer itself in its lower 32 bits and the referent object's upper bound in the upper 32 bits.

The value stored in the upper 32 bits (UB) serves not only for the upper-bound check but also as a pointer to the object's other metadata, namely the lower bound (LB), which is stored right after the referent object.

This metadata layout has important benefits: (1) it minimizes the amount of memory used for metadata, (2) it requires no additional memory accesses, and (3) it alleviates the problems of fat pointers concerning multi-threading and memory layout changes.
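A worked example of the bit layout may help. The following is a minimal sketch, assuming 32-bit enclave addresses represented as plain integers, so no memory is touched. The names `tagged_t`, `tag`, `tag_ptr` and `tag_ub` are hypothetical helpers, not SGXBOUNDS code.

```c
#include <assert.h>
#include <stdint.h>

/* Sketch of the tagged-pointer layout, assuming 32-bit enclave addresses:
 *   bits 63..32  upper bound UB (first byte past the object, where LB lives)
 *   bits 31..0   the pointer value itself                                  */
typedef uint64_t tagged_t;

static tagged_t tag(uint32_t ptr, uint32_t ub) {
    return ((uint64_t)ub << 32) | ptr;
}

static uint32_t tag_ptr(tagged_t t) { return (uint32_t)t; }          /* low half  */
static uint32_t tag_ub(tagged_t t)  { return (uint32_t)(t >> 32); }  /* high half */
```

For a 16-byte object at address 0x1000, UB is 0x1010, the 4-byte LB (0x1000) occupies 0x1010 to 0x1013, and the tagged pointer is 0x0000101000001000.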

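The following pseudocode shows how SGXBOUNDS instruments a loop that copies M elements from s to d:

```
// Original code: for (i = 0; i < M; i++) d[i] = s[i];
int *s[N], *d[N]
s = specify_bounds(s, s + N)        // tag s with its UB, store LB right after the array
d = specify_bounds(d, d + N)
for (i = 0; i < M; i++)
    si = s + i                      // pointer arithmetic changes only the lower 32 bits
    di = d + i
    sp, sLB, sUB = extract(si)      // untagged pointer, lower bound, upper bound
    if bounds_violated(sp, sLB, sUB)
        crash(si)
    val = load sp                   // the access itself uses the untagged pointer
    dp, dLB, dUB = extract(di)
    if bounds_violated(dp, dLB, dUB)
        crash(di)
    store val, dp
```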
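To see the instrumented loop run, here is a minimal simulation. It rests on several assumptions that are not part of SGXBOUNDS itself. The enclave is modelled as a byte array whose offsets play the role of 32-bit enclave addresses, elements are 4-byte ints, the check takes the 4-byte access size into account, and a violation returns the failing index instead of crashing. `copy_checked` is a hypothetical name.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Simulated enclave: offsets into this array stand in for 32-bit addresses. */
static uint8_t enclave[4096];

/* Store LB right after the object and return the pointer tagged with UB. */
static uint64_t specify_bounds(uint32_t p, uint32_t ub) {
    memcpy(&enclave[ub], &p, sizeof p);
    return ((uint64_t)ub << 32) | p;
}

/* Size-aware check: [p, p + size) must lie inside [LB, UB). */
static int bounds_violated(uint64_t tagged, uint32_t size) {
    uint32_t p  = (uint32_t)tagged;
    uint32_t ub = (uint32_t)(tagged >> 32);
    uint32_t lb;
    memcpy(&lb, &enclave[ub], sizeof lb);          /* LB lives at UB */
    return p < lb || (uint64_t)p + size > ub;
}

/* Instrumented version of: for (i = 0; i < m; i++) d[i] = s[i];
 * Returns -1 on success or the first index whose access is out of bounds. */
static int copy_checked(uint64_t s, uint64_t d, int m) {
    const uint64_t TAG = 0xFFFFFFFF00000000ULL;
    for (int i = 0; i < m; i++) {
        /* pointer arithmetic on the low 32 bits only; the tag is kept */
        uint64_t si = (s & TAG) | (uint32_t)(s + 4u * i);
        uint64_t di = (d & TAG) | (uint32_t)(d + 4u * i);
        if (bounds_violated(si, 4)) return i;
        int32_t val;
        memcpy(&val, &enclave[(uint32_t)si], sizeof val);   /* load  */
        if (bounds_violated(di, 4)) return i;
        memcpy(&enclave[(uint32_t)di], &val, sizeof val);   /* store */
    }
    return -1;
}
```

With N = 4 ints per array, copying M = 4 elements succeeds, while M = 5 is stopped at index 4, before the store could overwrite d's lower bound.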

Design details

Pointer creation. Whenever an object is created, SGXBOUNDS associates the pointer with the bounds of this object.

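For global and stack-allocated variables, we change their memory layout so that they are padded with 4 bytes, and we initialize this padding at run time. More specifically, we wrap such variables in two-member structures, e.g., int x is transformed into struct xwrap {int x; uint32_t LB;}, and the lower bound is written with specify_bounds():

```c
#include <stdint.h>

// Assumes enclave addresses fit in 32 bits (the enclave starts at 0x0).
void *specify_bounds(void *p, void *UB) {
    uint32_t *LBaddr = (uint32_t *)UB;             // LB lives right after the object
    *LBaddr = (uint32_t)(uintptr_t)p;              // store the 4-byte lower bound
    uint64_t tagged = ((uint64_t)(uintptr_t)UB << 32) | (uint32_t)(uintptr_t)p;
    return (void *)(uintptr_t)tagged;              // pointer tagged with its UB
}
```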

For dynamically allocated variables, SGXBOUNDS wraps memory management functions to append 4 bytes to each newly created object, initialize these with the lower-bound value, and make the pointer tagged with the upper bound:

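```c
#include <stddef.h>

void *malloc(size_t size) {
    char *p = malloc_real(size + 4);     // 4 extra bytes for the lower bound
    return specify_bounds(p, p + size);  // LB stored at p + size, pointer tagged with UB
}
```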
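A runnable sketch of the wrapper's effect on the heap layout follows. It assumes a toy bump allocator standing in for malloc_real and a simulated enclave array whose offsets act as 32-bit addresses. Alignment, free() and out-of-memory handling are omitted, and `sgx_malloc` is a hypothetical name chosen to avoid clashing with the real malloc.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

static uint8_t enclave[4096];      /* simulated enclave memory            */
static uint32_t heap_top = 16;     /* keep address 0 free so it acts as NULL */

/* Toy stand-in for malloc_real: a bump allocator over the simulated array. */
static uint32_t malloc_real(uint32_t size) {
    uint32_t p = heap_top;
    heap_top += size;
    assert(heap_top <= sizeof enclave);
    return p;
}

/* SGXBOUNDS-style wrapper: reserve 4 extra bytes, write LB there, tag the pointer. */
static uint64_t sgx_malloc(uint32_t size) {
    uint32_t p  = malloc_real(size + 4);
    uint32_t ub = p + size;
    memcpy(&enclave[ub], &p, sizeof p);        /* LB right after the object */
    return ((uint64_t)ub << 32) | p;
}
```

Two consecutive allocations show the layout: the 4-byte lower bound of the first object sits between the two objects, and the second object starts right after it.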

Note that there is no need to instrument free as the 4 bytes of metadata are removed together with the object itself.

Run-time bounds checks. SGXBOUNDS inserts run-time bounds checks before each memory access: loads, stores, and atomic operations. For this, the original pointer and the upper and lower bounds are first extracted. To extract the original pointer:

```c
#include <stdint.h>

void *extract_p(void *tagged) {
    return (void *)((uintptr_t)tagged & 0xFFFFFFFF);   // keep the lower 32 bits
}
```

To extract the upper bound:

```c
void *extract_UB(void *tagged) {
    return (void *)((uintptr_t)tagged >> 32);   // the upper 32 bits hold UB
}
```

To extract the lower bound, which is stored in the padded region:

```c
void *extract_LB(void *UB) {
    return (void *)(uintptr_t)*(uint32_t *)UB;   // 4-byte LB stored at address UB
}
```

Finally, the bound check:

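```c
#include <stdbool.h>

bool bounds_violated(void *p, void *LB, void *UB) {
    // UB points one past the object (at its LB metadata), so p == UB is
    // already out of bounds. This simplified check ignores the access size.
    if ((char *)p < (char *)LB || (char *)p >= (char *)UB) {
        return true;
    }
    return false;
}
```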
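The bounds_violated check takes no access size, so a multi-byte access that starts inside the object but ends past UB would spill into the 4-byte lower bound. The sketch below adds the size; it is an assumption about how a complete check could look, not the paper's listing. Its signature mirrors the on_access hook, which does receive a size. `bounds_violated_sized` is a hypothetical name.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Size-aware variant (illustrative): an access of `size` bytes at p is legal
 * only if the whole range [p, p + size) lies within [lb, ub). */
static bool bounds_violated_sized(uint32_t p, uint32_t size,
                                  uint32_t lb, uint32_t ub) {
    /* 64-bit sum so that p + size cannot wrap around 2^32 */
    return p < lb || (uint64_t)p + size > ub;
}
```

For a 16-byte object at 0x100 (UB = 0x110), a 4-byte access at 0x10C is legal, but a 4-byte access at 0x10E straddles the metadata and is rejected.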

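Pointer arithmetic. SGXBOUNDS instruments pointer arithmetic so that only the lower 32 bits of a tagged pointer are affected:

```
UB = extract_UB(s)                 // save the upper-bound tag of the base pointer
si = s + i                         // ordinary pointer arithmetic
si = (UB << 32) | extract_p(si)    // re-attach the tag; only the lower 32 bits change
```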
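The reason for confining arithmetic to the lower 32 bits is easiest to see with a negative offset. In plain 64-bit addition, a borrow can leak into the tag and corrupt UB. The sketch assumes 32-bit enclave addresses and uses hypothetical helper names `tagged_add` and `naive_add`.

```c
#include <assert.h>
#include <stdint.h>

/* Tag-preserving arithmetic: only the low 32 bits move (modulo 2^32); the
 * upper-bound tag in the high 32 bits is copied through unchanged. */
static uint64_t tagged_add(uint64_t t, int32_t delta) {
    uint32_t p = (uint32_t)t + (uint32_t)delta;    /* wraps modulo 2^32 */
    return (t & 0xFFFFFFFF00000000ULL) | p;        /* keep the UB tag   */
}

/* For contrast: plain 64-bit addition lets a borrow propagate into the tag. */
static uint64_t naive_add(uint64_t t, int32_t delta) {
    return t + (uint64_t)(int64_t)delta;
}
```

With pointer 0x2 and UB 0x10, subtracting 4 yields the low half 0xFFFFFFFE while the tag stays 0x10, so the next bounds check flags the access as below LB. The naive version instead turns the tag into 0xF.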

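Type casts. Pointer-to-integer and integer-to-pointer casts are a curse for fat- and tagged-pointer approaches: once a tagged pointer is treated as a plain integer, arithmetic on it can silently modify or discard the upper-bound tag.

Function calls. SGXBOUNDS does not need to instrument function calls or alter calling conventions. Thanks to SCONE's monolithic build, all application code is compiled with SGXBOUNDS; the only uninstrumented code is libc, for which we provide wrappers. Consequently, any tagged pointer passed as a function argument is still treated as a tagged pointer in the callee.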

Advanced Features of SGXBOUNDS

Multi-threading support

AddressSanitizer does not require any specific treatment of multi-threading, but its redzones can hurt cache locality when a multi-threaded application was specifically designed to be cache-friendly.

All fat-pointer and disjoint-metadata techniques similar to Intel MPX suffer from multi-threading issues: an update of a pointer and its associated metadata must be performed as one atomic operation, otherwise a concurrent thread may observe the new pointer together with stale bounds.

SGXBOUNDS does not experience this problem. The pointer and its upper bound are always updated atomically because they are stored in the same 64-bit tagged pointer, and the lower bound is written once, when the object is created, and never changes afterwards.
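The point about atomicity can be shown with C11 atomics. Because pointer and upper bound share one 64-bit word, a single atomic store publishes both, and a reader can never combine a new pointer with an old bound. This is a hedged sketch: the shared variable and the `publish`/`snapshot` helpers are hypothetical, and addresses are assumed to be 32 bits. The release/acquire pairing additionally makes a lower bound written before publication visible to the reader.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* A tagged pointer shared between threads (hypothetical example). */
static _Atomic uint64_t shared_ptr;

/* One atomic store updates pointer and upper bound together. */
static void publish(uint32_t p, uint32_t ub) {
    atomic_store_explicit(&shared_ptr, ((uint64_t)ub << 32) | p,
                          memory_order_release);
}

/* One atomic load yields a consistent (pointer, upper bound) pair. */
static void snapshot(uint32_t *p, uint32_t *ub) {
    uint64_t t = atomic_load_explicit(&shared_ptr, memory_order_acquire);
    *p  = (uint32_t)t;
    *ub = (uint32_t)(t >> 32);
}
```

A fat-pointer scheme that keeps the bound in a separate word would need a lock or a double-width atomic to offer the same guarantee.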

Tolerating bugs with boundless memory

To allow applications to survive most bugs and attacks and continue correct execution, SGXBOUNDS can switch to failure-oblivious computing by using the concept of boundless memory blocks. In this mode, whenever an out-of-bounds memory access is detected, SGXBOUNDS redirects it to a separate “overlay” memory area, which prevents corruption of adjacent objects and creates the illusion of “boundless” memory allocated for the object.

Consider a classic off-by-one error: instead of letting the access read or overwrite the object's metadata (the lower bound stored right after the object), SGXBOUNDS redirects the out-of-bounds load or store to an overlay address.
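A minimal sketch of boundless memory blocks follows, under simplifying assumptions that are not SGXBOUNDS's actual runtime. The enclave is a simulated byte array, accesses are single bytes, and the overlay is a tiny fixed-size table keyed by address. The sketch is single-threaded, so it omits the global lock a real runtime would need. Reads of never-written overlay bytes return a manufactured 0. All helper names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

static uint8_t enclave[1024];                  /* simulated enclave memory */

#define OVERLAY_SLOTS 64
static struct { bool used; uint32_t addr; uint8_t val; } overlay[OVERLAY_SLOTS];

static uint64_t make_object(uint32_t p, uint32_t size) {
    uint32_t ub = p + size;
    memcpy(&enclave[ub], &p, sizeof p);        /* LB right after the object */
    return ((uint64_t)ub << 32) | p;
}

static bool out_of_bounds(uint64_t t) {
    uint32_t p = (uint32_t)t, ub = (uint32_t)(t >> 32), lb;
    memcpy(&lb, &enclave[ub], sizeof lb);
    return p < lb || p >= ub;
}

/* Find the overlay slot for addr, optionally allocating a fresh one. */
static int overlay_slot(uint32_t addr, bool create) {
    for (int i = 0; i < OVERLAY_SLOTS; i++)
        if (overlay[i].used && overlay[i].addr == addr) return i;
    if (!create) return -1;
    for (int i = 0; i < OVERLAY_SLOTS; i++)
        if (!overlay[i].used) {
            overlay[i].used = true;
            overlay[i].addr = addr;
            overlay[i].val = 0;
            return i;
        }
    return -1;                                 /* overlay full: write dropped */
}

static void store_u8(uint64_t t, uint8_t v) {
    if (!out_of_bounds(t)) { enclave[(uint32_t)t] = v; return; }
    int s = overlay_slot((uint32_t)t, true);   /* redirect out-of-bounds write */
    if (s >= 0) overlay[s].val = v;
}

static uint8_t load_u8(uint64_t t) {
    if (!out_of_bounds(t)) return enclave[(uint32_t)t];
    int s = overlay_slot((uint32_t)t, false);  /* redirect out-of-bounds read */
    return s >= 0 ? overlay[s].val : 0;        /* never written: manufacture 0 */
}
```

An off-by-one write just past an 8-byte object lands in the overlay. The object's lower bound stays intact, and a later read of the same address returns the written value.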

Metadata management support

So far, we have discussed only one metadata item kept per object: the lower bound. However, our memory layout allows us to add an arbitrary number of metadata items for each object to implement additional functionality.

All instrumentation in SGXBOUNDS is implemented as calls to auxiliary functions, which we refer to as instrumentation hooks. One can think of these hooks as a metadata management API: (1) on_create() is called at run time whenever a new object is created. In the context of SGXBOUNDS, it corresponds to the specify_bounds() function, which initializes our only metadata item (the lower bound). (2) on_access() is called at each memory access, be it a read, a write, or both (for atomic instructions such as compare-and-swap). In SGXBOUNDS, this hook roughly corresponds to the bounds_violated() function. (3) on_delete() is called whenever an object is deallocated; we support this hook only for heap objects.

| Function | Description |
|---|---|
| on_create(base, size, type) | Called after object creation (global, heap, or stack) |
| on_access(addr, size, metadata, access_type) | Called before a memory access |
| on_delete(metadata) | Called before object destruction (heap objects only) |
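The metadata area can hold more than the lower bound. The sketch below implements on_create and on_access with one extra, purely hypothetical metadata item: a per-object access counter stored next to LB. The signatures are simplified from the table (no type, metadata, or access_type arguments), on_delete is omitted, and the enclave is a simulated byte array. None of this is the SGXBOUNDS runtime.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

static uint8_t enclave[1024];      /* simulated enclave memory */

typedef struct {                   /* metadata area placed right after each object */
    uint32_t lb;                   /* lower bound, as in SGXBOUNDS             */
    uint32_t accesses;             /* hypothetical extra item for a new use-case */
} metadata_t;

/* memcpy avoids unaligned or type-punned access to the byte array */
static metadata_t load_meta(uint32_t ub) {
    metadata_t m;
    memcpy(&m, &enclave[ub], sizeof m);
    return m;
}

static void save_meta(uint32_t ub, metadata_t m) {
    memcpy(&enclave[ub], &m, sizeof m);
}

/* on_create: initialize every metadata item and return the tagged pointer. */
static uint64_t on_create(uint32_t base, uint32_t size) {
    uint32_t ub = base + size;
    save_meta(ub, (metadata_t){ .lb = base, .accesses = 0 });
    return ((uint64_t)ub << 32) | base;
}

/* on_access: bounds check plus extra bookkeeping; returns true if allowed. */
static bool on_access(uint64_t tagged, uint32_t size) {
    uint32_t addr = (uint32_t)tagged, ub = (uint32_t)(tagged >> 32);
    metadata_t m = load_meta(ub);
    if (addr < m.lb || (uint64_t)addr + size > ub) return false;
    m.accesses++;
    save_meta(ub, m);
    return true;
}
```

With two 4-byte metadata items, a heap wrapper would have to reserve size + 8 bytes per object instead of size + 4.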

Implementation

SGXBOUNDS implementation

SGXBOUNDS is a compile-time transformation pass implemented in LLVM 3.8.

Compiler Support. We treat inline assembly as an opaque memory instruction: all pointer arguments to inline assembly are bounds-checked. SGXBOUNDS does not yet fully support C++ exception handling.

Run-time Support. We implement the boundless memory feature entirely in the run-time support library. To prevent data races, all read and update operations on the overlay cache are synchronized via a global lock. For the tagged-pointer scheme, SGXBOUNDS relies on the enclave (and thus its virtual address space) starting at address 0x0. To allow this, we set the Linux kernel parameter vm.mmap_min_addr to zero for our applications. We also modified the original Intel SGX driver to always start the enclave at address 0x0.