Intel MPX is designed to allow a system to run both Intel MPX-enabled software and legacy software. An MPX-enabled application can link with, call into, or be called from legacy software. MPX associates bounds with pointers in a novel manner, and uses the bounds to check that pointer-based accesses are suitably constrained. Furthermore, programmers can selectively use MPX to protect a subset of pointers.
The initial goal of Intel MPX is twofold: 1) provide a means to defend a system against attacks, and 2) provide a means to pinpoint accidental logic defects in pointer usage.
Programming Environment
Intel MPX can be enabled for user mode and supervisor mode. It is designed to allow software to associate bounds with pointers, and to check memory references against those bounds to prevent out-of-bounds accesses. Each bounds register holds a lower bound and an upper bound. An out-of-bounds (OOB) access causes a #BR (Bound Range Exceeded) exception.
The BNDLDX and BNDSTX instructions each take an operand whose bits are used to traverse a data structure in memory. In 64-bit mode, these instructions operate only on the lower bits of the supplied 64-bit addresses. The number of bits used is 48 plus a value called the MPX address-width adjust (MAWA), which depends on CPL: if CPL < 3 then MAWA = 0, else MAWA = CPUID.(EAX=07H,ECX=0H):ECX.MAWAU[bits 21:17].
Bounds Registers
The Intel MPX architecture defines four bounds registers BND0-BND3, each of which stores a pair of 64-bit values: a lower bound (LB) and an upper bound (UB). The bounds are inclusive. The upper bounds are architecturally represented in 1's complement form; thus lower bound = 0, upper bound = 0 allows access to the entire address space.
Configuration and Status Registers
Intel MPX defines two configuration registers, one for user mode (BNDCFGU) and one for supervisor mode (BNDCFGS). It also defines a status register (BNDSTATUS), primarily used to communicate status information for a #BR exception.
Intel MPX Instruction Summary
Usage and Examples
BNDMK is typically used after memory is allocated for a buffer. However, many other usages are possible such as when accessing an array member of a structure.
```asm
// assume the array A is allocated on the stack at 'offset' from RBP
int A[100];

// the instruction to store the starting address of the array will be
LEA RAX, [RBP+offset]
// the instruction to create the bounds for array A will be
BNDMK BND0, [RAX+399]   // stores RAX into BND0.LB and ~(RAX+399) into BND0.UB
```
BNDMOV is typically used to copy bounds from one bound register to another when a pointer is copied from one GP register to another, or to spill/fill bounds into memory corresponding to a spill/fill of a pointer.
BNDCL/BNDCU/BNDCN are typically used before writing to a buffer, but can be used in other instances as well. The pointer used to write to memory is compared against the lower bound. For the upper-bound check, however, software must add (operand size - 1) to the pointer before the comparison.
BNDSTX is used to store the bounds associated with a buffer and the “pointer value” of the pointer to that buffer onto a bound table entry via address translation using a two-level structure.
```asm
// assume a buffer whose bounds are stored in BND0 and whose pointer is in
// ESI; the following sequence stores the "pointer value" and the bounds
// into a configured bound table entry.

// store the pointer value in the index register ECX
MOV ECX, DWORD ptr [ESI]
// store the pointer in the base register EAX
MOV EAX, ESI
// translate the linear address of the base EAX, and store the bounds and
// pointer value ECX into a bound table entry
BNDSTX DWORD ptr [EAX+ECX], BND0

// similarly, to retrieve a buffer and its associated bounds from a bound
// table entry:
MOV EAX, DWORD ptr [EBX]
// translate the linear address of the base EBX, and load the bounds and
// pointer value from a bound table entry
BNDLDX BND0, DWORD ptr [EBX+EAX]
```
Loading and Storing Bounds in Memory
Intel MPX defines two instructions to load and store of the linear address of a pointer to a buffer, along with the bounds of the buffer into a data structure of extended bounds. When storing these extended bounds, the processor parses the address of the pointer (where it is stored) to locate an entry in a bound table in which to store the extended bounds.
An extended bound is a 4-tuple consisting of a lower bound, an upper bound, a pointer value, and a reserved field. In 64-bit paging mode, a bound table entry is 4*64 bits (32 bytes). The linear address of a bound table is stored in a bound directory entry. The linear address of the bound directory is derived from either BNDCFGU or BNDCFGS.
The bound directory and bound tables are stored in application memory and are allocated by the application. In 64-bit mode, the size of a bound table is 4 MBytes (2^17 entries * 32 bytes), and the size of a bound directory is 2^(1+MAWA) GBytes (2^(28+MAWA) entries * 8 bytes).
Bounds in memory are associated with the memory address where the pointer is stored.
BNDLDX and BNDSTX in 64-Bit Mode
The linear address of the bound directory is derived from BNDCFGU or BNDCFGS. In 64-bit mode, each bound-directory entry (BDE) is 8 bytes, and the number of entries in the bound directory is 2^(28+MAWA).
In 64-bit mode, the processor uses a two-level structure to access extended bounds, as follows:
In the first stage, the corresponding BD entry has to be loaded. For that, the CPU: (1) extracts the offset of the BD entry from bits 47:20 of the pointer address and shifts it left by 3 bits, (2) loads the base address of the BD from the BNDCFGx register, and (3) sums the base and the offset and loads the BD entry from the resulting address.
In the second stage, the CPU: (4) extracts the offset of the BT entry from bits 19:3 of the pointer address and shifts it left by 5 bits, (5) shifts the loaded BD entry right by 3 bits to remove the metadata contained in its low 3 bits, (6) sums the base and the offset, and (7) finally loads the BT entry from the resulting address.
Remember to mark the parts of the paper's reference list that you have not read. This is an effective way to map out the background knowledge you are missing.
This pass should take about an hour. Afterwards, you should be able to grasp the paper's key points and briefly describe its structure and reasoning to someone else. If the paper is outside your research area, this level of understanding is usually sufficient.
Sometimes you still won't understand the paper after this pass. This may be because you have not encountered the material before and don't understand some of its terminology and abbreviations, because the authors use proofs or experimental methods you are unfamiliar with, or simply because the paper is poorly written. You now have three choices: a) set the paper aside, b) go read some background material first, or c) proceed directly to the third pass.
The third pass
Fully understanding a paper requires the third pass. The key to this pass is to virtually re-implement the paper: make the same assumptions the authors made and recreate their work. By comparing your recreation with the paper's actual implementation, you can identify not only the paper's innovations but also its hidden failings and assumptions.
This pass demands great attention to detail: challenge the assumptions behind every statement. You should also sketch your own implementation of each idea, then compare it with the one in the paper. This way you come to understand the proofs and techniques the paper uses and can add them to your own toolbox. The comparison can also suggest ideas for future work.
At first, this pass may take four or five hours; with practice it drops to about an hour. At the end, you should be able to reconstruct the entire structure of the paper from memory, and identify its strong and weak points. In particular, you should be able to pinpoint hidden assumptions, missing citations, and problems with its experiments.
Pointer analysis promises significant benefits for optimizing and parallelizing compilers. However, the analysis must be efficient without sacrificing the accuracy of its results. Moreover, a pointer analysis must handle real C programs.
Our approach is based on the insight that procedures are typically called with the same aliases among their inputs. Our idea is to generate incomplete transfer functions that only cover the input conditions that exist in the program. These incomplete transfer functions are made up of simple partial transfer functions (PTFs) that are only applicable to calling contexts that exhibit certain alias relationships.
The example function int f(int *, int *, int *) illustrates two points. First, the aliases among the inputs to a procedure determine its behavior. Second, even for this simple two-statement procedure the complete summary is fairly complex.
The main idea behind our algorithm is that we need not summarize a procedure for all possible aliases among its inputs. Instead, we only need to find summaries that apply to the specific alias patterns that occur in the program.
Design of PTFs
There is a tradeoff between the complexity of the individual PTFs and their applicability. Whenever the analysis encounters another call to the procedure with the same input values, it can reuse the final values recorded in the PTF; this is known as memoization.
Since it is not the specific input values but their alias patterns that determine the behavior of a procedure, we use symbolic names called extended parameters to abstract the input values. An extended parameter represents the locations reached through an input pointer at the beginning of a procedure.
Extended parameters play three important roles:
Extended parameters make up part of the name space of a procedure.
The initial points-to function for a PTF specifies the input domain.
The final points-to function at the procedure exit summarizes the pointer assignments.
Besides the aliases, the input domain of a PTF is also defined by the values of function pointers passed in and used in calls within the procedure, because those values determine what code could be executed.
Algorithm Outline
When we begin analyzing a procedure, the initial points-to function is empty and the parameter mapping only records the actual values for the formal parameters. When we need to know the initial value of an input pointer and it is not already recorded, we add an entry to the initial points-to function. To check for aliases, we need to look up the initial values in the calling context. The process continues recursively up the call graph until the pointer values are known.
When the iterative analysis encounters a procedure call, it needs to find the effects of the call on the points-to function. If the call is through a pointer passed as an input to one or more of the PTFs on the call stack, we add the potential values of that pointer to the specifications of the input domains for those PTFs. For each potential callee procedure, we first check whether any of its PTFs apply. This involves building up a parameter mapping and comparing the input aliases to those recorded for the PTF.
Pointer Representations
Instead of using types to identify locations that may contain pointers, we assume that any memory location could potentially hold a pointer. We also refer to locations within a block of memory by their positions, not by their field names. We define a new abstraction, the location set, to specify both a block of memory and a set of positions within that block.
We divide memory into blocks of contiguous storage, whose positions relative to one another are undefined. A block of memory may be one of three kinds: a local variable, a locally allocated heap block, or an extended parameter (global variables).
We distinguish heap blocks based on their allocation context, grouping together all the blocks allocated in each context. The minimal context information is simply the statement that creates the block.
Location Sets
Our goal with regard to aggregates is to distinguish between different fields within a structure, but not between different elements of an array. Our pointer analysis can be combined with data dependence tests to distinguish between different array elements.
A location set is a triple (b, f, s), where the base b is the name of a block of memory, f is an offset within that block, and s is the stride. That is, it represents the set of locations {f + is | i ∈ ℤ} within block b.
| Expression | Location Set |
| --- | --- |
| scalar | (scalar, 0, 0) |
| struct.F | (struct, f, 0) |
| array | (array, 0, 0) |
| array[i] | (array, 0, s) |
| array[i].F | (array, f, s) |
| struct.F[i] | (struct, f % s, s) |
| *(&p + X) | (p, 0, 1) |
Extended Parameters
Every object is represented by at most one extended parameter. When adding an entry to the initial points-to function, we first find the values of the pointer in the calling context. If the parameter mapping shows that an existing parameter already includes the same values, we reuse the old parameter instead of creating a new one.
For simplicity and efficiency, each initial points-to entry points to a single extended parameter. In cases where the initial values are aliased with more than one parameter, we create a new extended parameter that subsumes all the aliased parameters, and we replace all references to the subsumed parameters. Likewise, when the initial values are aliased with an existing parameter but also include new values, we subsume the aliased parameter with the new one.
Aliases involving fields of structures can be a bit more complicated. When updating the initial points-to function, if we find a pointer to a field of an existing parameter, we can easily record that fact. However, if we encounter a pointer to a field before a pointer to the enclosing structure, we will create an extended parameter to represent the field. We simply allow the location set offsets to be negative.
Points-to Functions
A points-to function is a mapping from location sets to sets of location sets. It is important for the analysis to know which locations may contain pointers. Since we do not use high-level types, and since the points-to functions only contain entries for pointers that have been referenced, we record separately, for each block of memory, the location sets that may hold pointers.
Within the pointer analysis itself, that information can enable strong updates, where assignments overwrite the previous contents of their destinations. We only need that information at the point where a pointer is dereferenced.
Analyzing a Procedure
```c
int EvalProc(proc *pr, PTF *ptf) {
    /* iteratively analyze a procedure */
    do {
        changed = false;
        foreach cfgnode nd in pr.flowgraph {
            if (no predecessors of nd evaluated)
                continue;
            if (nd is meet)   EvalMeet(nd, ptf);
            if (nd is assign) EvalAssign(nd, ptf);
            if (nd is call)   EvalCall(nd, ptf);
        }
    } while (changed);
}
```
Strong Updates
Unless we know that an assignment will definitely occur and that it assigns to a unique location, we must conservatively assume that location potentially retains its old values.
Sparse Representation
Since we analyze heap data as well as global and stack variables, many possible memory locations could be included in the points-to functions. Fortunately, the information stored in the points-to function is very sparse.
Because of the sparse points-to function representation, looking up the values of a pointer requires searching back for the most recent assignment to that location. Beginning at the current node, we search back through the dominating flow graph nodes.
Evaluating Dereferences
Because location sets may overlap, more than one location set may refer to the same location. Values assigned to one location set must be observed by references to overlapping location sets. Thus, when a pointer is dereferenced, we iterate through all of the overlapping location sets and look up in the current points-to function the values that have been assigned to each one. However, if the location being referenced is a unique location, values assigned to overlapping location sets may have been overwritten by a strong update.
Evaluating Assignments
An assignment node in a flow graph specifies both the source and destination expressions, as well as the size of the value to be assigned. When building the flow graph from the intermediate representation of a program, we automatically convert the assignments to a "points-to" form. That is, since a variable reference on the right-hand side of an assignment refers to the contents of that variable, we add an extra dereference to each expression on the right-hand side.
The process of evaluating an assignment begins by finding the locations identified by the source and destination expressions.
Interprocedural Algorithm
To evaluate the effects of a call on the points-to function, we need to find a PTF that applies in the calling context.
Calls Through Pointers
The first step in evaluating procedure calls is to determine the target procedures. Function pointer values may be expressed in terms of extended parameters. This is one situation where we need to know the specific values represented by extended parameters.
Testing if a PTF Applies
The input domain of a PTF is specified by its initial points-to function and by the value of the parameters used as call targets. To test if a PTF applies, we check if these things are the same in the current context as they were when the PTF was created.
Control-flow transfer instructions between basic blocks are called COFI (Change of Flow Instructions) and fall into three kinds: 1. Direct transfer COFI, 2. Indirect transfer COFI, 3. Far transfer COFI.
Grouped by type:
| COFI Type | Instructions |
| --- | --- |
| Conditional Branch | Jcc, J*CXZ, LOOP |
| Unconditional Direct Branch | JMP (E9 xx, EB xx), CALL (E8 xx) |
| Indirect Branch | JMP (FF /4), CALL (FF /2) |
| Near Ret | RET (C3, C2 xx) |
| Far Transfers | INT1, INT3, INT n, INTO, IRET/IRETD/IRETQ, JMP (EA xx, FF /5), CALL (9A xx, FF /3), RET (CB, CA xx), SYSCALL, SYSRET, SYSENTER, SYSEXIT, VMLAUNCH, VMRESUME |
Direct Transfer COFI
Direct transfer COFIs are relative branches, meaning their target address is the current IP plus an offset. Since the target IP of a direct jump can be recovered by disassembling the binary, there is no need to record the IP in the trace. For conditional branches, only whether the branch was taken needs to be recorded; unconditional direct branches need no record at all.
Conditional Branch (Jcc, J*CXZ) and LOOP: a single TNT (Taken/Not-Taken) bit is enough to indicate the program's control flow. The processor compresses multiple TNT bits into a single packet.
Unconditional Direct Jumps: generate no trace output.
Indirect Transfer COFI
Indirect transfer instructions update the IP from a register or a memory location, so PT must record the destination IP for the debugger to determine the COFI's target. This IP may be a linear address or an effective address.
An indirect transfer instruction generates a Target IP Packet (TIP) containing the destination address.
Near JMP Indirect and Near CALL Indirect: generate a TIP.
Near RET: when a CALL executes, it pushes the address of the instruction following it onto the stack. On return, the RET instruction pops this return address and jumps to it. Because the callee may itself modify the return address, the actual return target cannot be determined from disassembly alone, so RET also generates a TIP packet.
RET Compression: if the return address of a RET is guaranteed to match the address pushed by the corresponding CALL, a single TNT bit indicating the branch was taken can be generated instead. This requires extra work on the decoder's side.
Far Transfer COFI
Any non-near-jump instruction that changes the instruction pointer is a "far transfer". This includes exceptions, interrupts, traps, TSX aborts, and instructions that perform far transfers.
All of these generate a TIP packet. Some far transfers cannot be inferred from the binary, such as asynchronous events (exceptions and interrupts); for these, a FUP packet carrying the source IP of the event is generated first, followed by a TIP.
```
ContextEn = !( (IA32_RTIT_CTL.OS = 0 AND CPL = 0) OR
               (IA32_RTIT_CTL.USER = 0 AND CPL > 0) OR
               (IS_IN_A_PRODUCTION_ENCLAVE) OR
               (IA32_RTIT_CTL.CR3Filter = 1 AND
                IA32_RTIT_CR3_MATCH does not match CR3) )
```
```
IF (IA32_PERF_GLOBAL_STATUS.ToPA)
    Save IA32_RTIT_CTL value;
    IF (IA32_RTIT_CTL.TraceEn)
        Disable Intel PT by clearing TraceEn;
    FI;
    IF (there is space available to grow the current ToPA table)
        Add one more ToPA entry after the last entry in the ToPA table;
        Point the new ToPA entry address fields to the new output region base;
    ELSE
        Modify an upcoming ToPA entry in the current table to have END = 1;
        IF (output should transition to a new ToPA table)
            Point the address of the "END = 1" entry of the current table
                to the new table base address;
        ELSE /* make the current table circular */
            Point the address of the "END = 1" entry to the base of the
                current table;
            Modify the ToPA entry address fields for filled output regions
                to point to new, unused output regions;
        FI;
    FI;
    Restore the saved IA32_RTIT_CTL value;
FI;
```
FUP only for asynchronous events; the order of the middle packets may vary; PIP/VMCS/MODE only if the operation modifies the state tracked by these respective packets.
| Event | Beginning | Middle | End | Comment |
| --- | --- | --- | --- | --- |
| TSX Update | MODE.TSX, and (FUP or none) | None | TIP, TIP.PGD, or none | FUP, TIP/TIP.PGD only for the TSX abort case |
| Overflow | OVF | PSB, PSBEND, or none | FUP or TIP.PGE | FUP if overflow resolves while ContextEn = 1, else TIP.PGE |
```cpp
switch (CurTok) {
case tok_identifier:
    return ParseIdentifierExpr();
case tok_number:
    return ParseNumberExpr();
case '(':
    return ParseParenExpr();
case tok_if:
    return ParseIfExpr();
default:
    return LogError("unknown token when expecting an expression");
}
```
```cpp
// Convert condition to a bool by comparing non-equal to 0.0.
CondV = Builder.CreateFCmpONE(
    CondV, llvm::ConstantFP::get(TheContext, llvm::APFloat(0.0)), "ifcond");

// Create blocks for the then and else cases. Insert the 'then' block at
// the end of the function.
llvm::BasicBlock *ThenBB = llvm::BasicBlock::Create(TheContext, "then", TheFunction);
llvm::BasicBlock *ElseBB = llvm::BasicBlock::Create(TheContext, "else");
llvm::BasicBlock *MergeBB = llvm::BasicBlock::Create(TheContext, "ifcont");
Builder.CreateCondBr(CondV, ThenBB, ElseBB);

// Emit the then value.
Builder.SetInsertPoint(ThenBB);
llvm::Value *ThenV = Then->codegen();
if (!ThenV) {
    return nullptr;
}
```
```cpp
llvm::Value *ForExprAST::codegen() {
    // Emit the start code first, without 'variable' in scope.
    llvm::Value *StartVal = Start->codegen();
    if (!StartVal) {
        return nullptr;
    }

    // Make the new basic block for the loop header, inserting after the
    // current block.
    llvm::Function *TheFunction = Builder.GetInsertBlock()->getParent();
    llvm::BasicBlock *PreheaderBB = Builder.GetInsertBlock();
    llvm::BasicBlock *LoopBB =
        llvm::BasicBlock::Create(TheContext, "loop", TheFunction);

    // Insert an explicit fall-through from the current block (PreheaderBB)
    // to LoopBB.
    Builder.CreateBr(LoopBB);

    // Start insertion in LoopBB.
    Builder.SetInsertPoint(LoopBB);
    // Start the PHI node with one incoming basic block.
    llvm::PHINode *Variable =
        Builder.CreatePHI(llvm::Type::getDoubleTy(TheContext), 2, VarName.c_str());
    Variable->addIncoming(StartVal, PreheaderBB);

    // Within the loop, the variable is defined equal to the PHI node. If it
    // shadows an existing variable, we have to restore it, so save it now.
    llvm::Value *OldVal = NamedValues[VarName];
    NamedValues[VarName] = Variable;

    // Emit the body of the loop. This, like any other expr, can change the
    // current BB. Note that we ignore the value computed by the body, but
    // don't allow an error.
    if (!Body->codegen()) {
        return nullptr;
    }

    // Emit the step value.
    llvm::Value *StepVal = nullptr;
    if (Step) {
        StepVal = Step->codegen();
        if (!StepVal) {
            return nullptr;
        }
    } else {
        // If not specified, use 1.0.
        StepVal = llvm::ConstantFP::get(TheContext, llvm::APFloat(1.0));
    }

    // Add the step value.
    llvm::Value *NextVal = Builder.CreateFAdd(Variable, StepVal, "nextvar");

    // Compute the end condition.
    llvm::Value *EndCond = End->codegen();
    if (!EndCond) {
        return nullptr;
    }

    // Convert condition to a bool by comparing non-equal to 0.0.
    EndCond = Builder.CreateFCmpONE(
        EndCond, llvm::ConstantFP::get(TheContext, llvm::APFloat(0.0)), "loopcond");

    // Create the "after loop" block and insert it.
    llvm::BasicBlock *LoopEndBB = Builder.GetInsertBlock();
    llvm::BasicBlock *AfterBB =
        llvm::BasicBlock::Create(TheContext, "afterloop", TheFunction);

    // Insert the conditional branch into the end of LoopEndBB.
    Builder.CreateCondBr(EndCond, LoopBB, AfterBB);

    // Any new code will be inserted in AfterBB.
    Builder.SetInsertPoint(AfterBB);

    // Add a new entry to the PHI node for the backedge.
    Variable->addIncoming(NextVal, LoopEndBB);

    // Restore the unshadowed variable.
    if (OldVal) {
        NamedValues[VarName] = OldVal;
    } else {
        NamedValues.erase(VarName);
    }
```
```cpp
void InitializeModuleAndPassManager() {
    // Open a new Module.
    TheModule = std::make_unique<llvm::Module>("my cool jit", TheContext);

    // Create a new pass manager attached to it.
    TheFPM = std::make_unique<llvm::FunctionPassManager>(TheModule.get());

    // Combine instructions to form fewer, simple instructions. This pass
    // does not modify the CFG. This pass is where algebraic simplification
    // happens.
    // (https://llvm.org/docs/Passes.html#instcombine-combine-redundant-instructions)
    TheFPM->addPass(llvm::InstCombinePass());

    // This pass reassociates commutative expressions in an order that is
    // designed to promote better constant propagation, GCSE, LICM, PRE, etc.
    // (https://llvm.org/docs/Passes.html#reassociate-reassociate-expressions)
    TheFPM->addPass(llvm::ReassociatePass());

    // This pass performs global value numbering to eliminate fully and
    // partially redundant instructions. It also performs redundant load
    // elimination.
    // (https://llvm.org/docs/Passes.html#gvn-global-value-numbering)
    TheFPM->addPass(llvm::GVN());

    // Performs dead code elimination and basic block merging.
    // (https://llvm.org/docs/Passes.html#simplifycfg-simplify-the-cfg)
    TheFPM->addPass(llvm::SimplifyCFGPass());

    // Use a PassBuilder and a FunctionAnalysisManager to register the
    // analysis passes.
    ThePB = std::make_unique<llvm::PassBuilder>();
    TheFAM = std::make_unique<llvm::FunctionAnalysisManager>();
    ThePB->registerFunctionAnalyses(*TheFAM.get());
}

int main() {
#ifdef JIT
    // Initialize the native target.
    llvm::InitializeNativeTarget();
    llvm::InitializeNativeTargetAsmPrinter();
    llvm::InitializeNativeTargetAsmParser();
#endif
    InitTokPrecedence();

    // Prime the first token.
    fprintf(stderr, "ready> ");
    getNextToken();

    TheJIT = std::make_unique<llvm::orc::KaleidoscopeJIT>();
    InitializeModuleAndPassManager();

    MainLoop();

#ifdef JIT
    // Print out all of the generated code.
    TheModule->print(llvm::errs(), nullptr);
#endif
    return 0;
}
```
We also need to set the data layout in the InitializeModuleAndPassManager() function:
```cpp
// Open a new Module.
TheModule = std::make_unique<llvm::Module>("my cool jit", TheContext);
TheModule->setDataLayout(TheJIT->getTargetMachine().createDataLayout());

// Create a new pass manager attached to it.
TheFPM = std::make_unique<llvm::FunctionPassManager>(TheModule.get());
```
class KaleidoscopeJIT is a simple JIT construction class that we will extend in later chapters. For now, the API it provides is very simple: addModule() adds a Module to the JIT, making its functions available for execution; removeModule() removes a Module, freeing the associated memory; and findSymbol() lets us look up a pointer to compiled code by its string name.
```cpp
if (auto FnAST = ParseTopLevelExpr()) {
    if (FnAST->codegen()) {
        // JIT the module containing the anonymous expression, keeping a
        // handle so we can free it later.
        auto H = TheJIT->addModule(std::move(TheModule));
        // Open a new module and set up its optimization passes.
        InitializeModuleAndPassManager();

        // Search the JIT for the __anon_expr symbol.
        auto ExprSymbol = TheJIT->findSymbol("__anon_expr");
        assert(ExprSymbol && "Function not found");

        // Get the symbol's address and cast it to the right type (takes no
        // arguments, returns a double) so we can call it as a native function.
        double (*FP)() =
            (double (*)())static_cast<intptr_t>(ExprSymbol.getAddress().get());
        fprintf(stderr, "Evaluated to %f\n", FP());

        // Delete the anonymous expression module from the JIT.
        TheJIT->removeModule(H);
    }
}
```
```cpp
std::unique_ptr<FunctionAST> ParseTopLevelExpr() {
    if (auto E = ParseExpression()) {
        // Make an anonymous nullary function.
        auto Proto = std::make_unique<PrototypeAST>("__anon_expr",
                                                    std::vector<std::string>());
```
```cpp
llvm::Function *FunctionAST::codegen() {
    // Transfer ownership of the prototype to the FunctionProtos map, but
    // keep a reference to it for use below.
    auto &P = *Proto;
    FunctionProtos[Proto->getName()] = std::move(Proto);
    // Check for an existing function from a previous `extern` declaration.
    llvm::Function *TheFunction = getFunction(P.getName());

    if (!TheFunction) {
        return nullptr;
    }
    ...
}

llvm::Value *CallExprAST::codegen() {
    // Look up the name in the global module table.
    llvm::Function *CalleeF = getFunction(Callee);
    ...
}

llvm::Function *getFunction(std::string Name) {
    // First, check whether the function has already been added to the
    // current module.
    if (auto *F = TheModule->getFunction(Name)) {
        return F;
    }

    // If not, check whether we can codegen the declaration from some
    // existing prototype.
    auto FI = FunctionProtos.find(Name);
    if (FI != FunctionProtos.end()) {
        return FI->second->codegen();
    }

    // If no existing prototype exists, return null.
    return nullptr;
}
```
We present mid-fat pointers, an approach to support targeted but composable metadata-based software defenses. The key idea is to strike a balance between traditional fat pointers and low-fat pointers by preserving the pointer size but changing its representation to embed data. We piggyback on SFI to decode the pointers efficiently: since SFI already masks pointers before dereferencing them, its reuse by mid-fat pointers incurs no additional overhead.
With mid-fat pointers, each pointer can embed the location of the metadata for the pointed-to memory object. Compared to recent generic metadata management schemes, we improve performance, because the metadata location is cached in the pointer and fewer lookups are needed, and security, because an out-of-bounds pointer is still associated with the same metadata.
Threat model
Since mid-fat pointers provide a general defense framework, the threat model depends on the particular defenses deployed. With regard to these defenses, we only assume they require an SFI baseline. As a result, the metadata is inherently protected against both arbitrary memory reads and writes.
Design
Pointer encoding
One core insight of our framework is that, given the presence of SFI to protect metadata from attackers, we can use the unused bits in pointers without additional performance impact. In particular, whenever the program allocates memory, we store a pointer to the metadata for that chunk of memory in the high bits of the pointer. This design has three benefits: (1) it requires a full metadata lookup only once, at allocation time, caching the result for the more frequent lookups needed by defenses and increasing efficiency; (2) since we know the base address of newly allocated objects, we do not need range queries; and (3) pointers that go out of bounds still point to the metadata of the original object.
By default, we use the lower 32 bits of every pointer to encode where the data is (the data pointer), and the higher 32 bits to encode where the metadata is (the metadata pointer). The downside is that we can use only 32-bit addresses out of the 48 bits implemented on x86-64, which restricts the address space to 4 GB. Note, however, that we could use fewer bits for the metadata pointer by assuming that every object is aligned on an 8-byte boundary.
Allocation instrumentation
We use a compiler pass that searches for calls to memory allocation functions and adds a call to a hook in our static libraries. The hook looks up the metadata for the allocated pointer and places the metadata pointer in the uppermost bits of the returned pointer.
Load/store instrumentation. As the program can only dereference encoded pointers safely after decoding, our framework includes a compiler pass that identifies load and store operations and adds the data-pointer decoding to each instance. We effectively implement SFI, shielding all memory past the first 4 GB from attackers.
Call instrumentation. While instrumented code can safely dereference encoded pointers due to automatic masking, protected applications may well use unprotected libraries, and we must not pass encoded pointers to them, to avoid segmentation faults. We therefore instrument calls to external functions by decoding any pointer-type parameters. Note that we can identify which functions are imported from libraries because our pass is part of LLVM's link-time optimization. For indirect calls, our instrumentation might not know whether the current value of a function pointer targets an external function. Instead, when a function is address-taken, we provide a replacement function that calls the original function with masked arguments.
Pointer comparison and cast instrumentation. Besides dereferencing operations, compatibility issues arise when comparing pointers and in integer operations on pointers. We could use static analysis to determine which pointers might be used in integer arithmetic and exclude them from our tagging scheme and SFI.
Intel SGX provides the abstraction of a secure enclave, which can be used to achieve shielded execution for unmodified legacy applications on untrusted infrastructure.
Shielded execution aims to protect confidentiality and integrity of applications when executed in an untrusted environment. The main idea is to isolate the application from the rest of the system, using only a narrow interface to communicate to the outside.
However, shielded execution does not protect the program against memory safety attacks. To validate this claim, we reproduced publicly available memory safety exploits inside a secure enclave. These examples highlight that a single exploit can completely compromise the integrity and confidentiality guarantees of shielded execution.
To prevent exploitation of such bugs, we experimented with two prominent software- and hardware-based memory protection mechanisms in the context of shielded execution: AddressSanitizer and Intel MPX, respectively.
However, both of them incur high performance overhead due to the additional metadata used to track object bounds.
In this paper, we present SGXBOUNDS. The SGXBOUNDS approach is based on a simple combination of tagged pointers and an efficient memory layout to reduce overheads inside enclaves. In particular, we note that SGX enclaves routinely use only the lower 32 bits to represent the program address space, leaving the higher 32 bits of pointers unused. We utilize these high bits to represent the upper bound of the referent object (or, more broadly, the beginning of the object's metadata area); the lower bound value is stored right after the object. Such a metadata layout requires only 4 additional bytes per object and does not break cache locality, unlike Intel MPX and AddressSanitizer.
Furthermore, we show that our design naturally extends to: (1) "synchronization-free" support for multithreaded applications, (2) increased availability instead of the usual fail-stop semantics, by tolerating out-of-bounds accesses, and (3) generic APIs for object metadata management to support new use cases.
Background
SCONE is a shielded execution framework that enables unmodified legacy applications to take advantage of the isolation offered by SGX. With SCONE, the program is recompiled against a modified standard C library (SCONE libc), which facilitates the execution of system calls.
Clearly, the combination of SCONE and SGX is not a silver bullet: bugs in the enclave code itself can render these mechanisms useless.
AddressSanitizer is an extension to GCC and Clang/LLVM that detects the majority of object bounds violations. It keeps track of all objects and, on each memory access, checks whether the address lies within one of the live objects.
Intel MPX detects all possible spatial memory vulnerabilities, including intra-object ones (where one member of a structure corrupts other members). Its approach differs from AddressSanitizer's: instead of separating objects with unaddressable redzones, MPX keeps bounds metadata for all pointers and checks against these bounds on each memory access.
One major limitation of the current Intel MPX implementation is the small number of bounds registers. If an application contains many distinct pointers, this causes frequent loads and stores of bounds to and from memory.
SGXBOUNDS
We built SGXBOUNDS based on the following three insights. First, shielded application memory (specifically, its working set) must be kept minimal due to the very limited EPC size in current SGX implementations. Second, applications spend a considerable amount of time iterating through the elements of an array, and a smartly chosen metadata layout can significantly reduce the overhead of bounds checking. Third, we rely on the SCONE infrastructure with its monolithic build process: all application code is statically linked without external dependencies. The first and second insights dictate the use of per-object metadata combined with tagged pointers to keep memory overhead minimal.
Design overview
All modern SGX CPUs operate in 64-bit mode, meaning that all pointers are 64 bits in size. In SGX enclaves, however, only 36 bits of virtual address space are currently addressable. Thus, SGXBOUNDS relies on the idea of tagged pointers: a 64-bit pointer contains the pointer itself in its lower 32 bits and the referent object's upper bound in the upper 32 bits.
The value stored in the higher 32 bits (UB) serves not only for the upper-bound check, but also as a pointer to the object’s other metadata (LB). The metadata is stored right after the referent object.
This metadata layout has important benefits: (1) it minimizes the amount of memory used for metadata, (2) it requires no additional memory accesses, and (3) it alleviates the problems of fat pointers concerning multithreading and memory layout changes.
```
int *s[N], *d[N]
s = specify_bounds(s, s + N)
d = specify_bounds(d, d + N)
for (i = 0; i < M; i++)
    si = s + i
    di = d + i
    sp, sLB, sUB = extract(si)
    if bounds_violated(sp, sLB, sUB)
        crash(si)
    val = load si
    dp, dLB, dUB = extract(di)
    if bounds_violated(dp, dLB, dUB)
        crash(di)
    store val, di
```
Design details
Pointer creation. Whenever an object is created, SGX-BOUNDS associates a pointer with the bound of this object.
For global and stack-allocated variables, we change their memory layout so that they are padded with 4 bytes, and initialize them at runtime. More specifically, we wrap such variables in two-member structures, e.g., int x is transformed into struct xwrap {int x; void* LB}.
For dynamically allocated variables, SGXBOUNDS wraps memory management functions to append 4 bytes to each newly created object, initializes them with the lower-bound value, and tags the returned pointer with the upper bound.
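A minimal sketch of such a wrapper (the name `boundsified_malloc` and the exact tagging arithmetic are our illustration, not the paper's code; enclave addresses fit in 32 bits, so on an ordinary 64-bit host the truncation below is purely illustrative):

```c
#include <stdint.h>
#include <stdlib.h>

/* Sketch of a bounds-aware allocation wrapper, assuming the SGXBOUNDS
 * layout: the object is followed by 4 bytes holding the lower bound (LB),
 * and the returned pointer carries the upper bound (UB) in its high 32
 * bits. UB points just past the object, i.e., at the LB slot itself. */
void *boundsified_malloc(size_t size) {
    char *p = malloc(size + 4);              /* 4 extra bytes for metadata */
    if (p == NULL)
        return NULL;
    uint32_t lb = (uint32_t)(uintptr_t)p;    /* lower bound: object start */
    uint32_t ub = lb + (uint32_t)size;       /* upper bound: metadata slot */
    *(uint32_t *)(p + size) = lb;            /* store LB right after object */
    /* tag the pointer: UB in the high 32 bits, address in the low 32 bits */
    return (void *)(((uint64_t)ub << 32) | lb);
}
```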
Note that there is no need to instrument free as the 4 bytes of metadata are removed together with the object itself.
Run-time bounds checks. SGXBOUNDS inserts a run-time bounds check before each memory access: loads, stores, and atomic operations. For this, the original pointer and the upper and lower bounds are first extracted from the tagged pointer.
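The original pointer can be recovered by masking off the tag; a minimal sketch (the helper names follow the listing's `extract` convention, and the mask constant is our assumption):

```c
#include <stdint.h>

/* Recover the untagged pointer from a tagged one: keep the low 32 bits
 * (the enclave address) and drop the upper-bound tag in the high 32 bits. */
void *extract_p(void *tagged) {
    return (void *)((uintptr_t)tagged & 0xFFFFFFFFu);
}

/* The companion upper-bound extraction, for symmetry. */
void *extract_UB(void *tagged) {
    return (void *)((uintptr_t)tagged >> 32);
}
```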
To extract the lower bound, which is stored in the padded region just past the object:

```c
void *extract_LB(void *UB) {
    return *(void **)UB;
}
```
Finally, the bounds check:

```c
bool bounds_violated(void *p, void *LB, void *UB) {
    if (p < LB || p > UB)
        return true;
    return false;
}
```
Pointer arithmetic. SGXBOUNDS instruments pointer arithmetic so that only the low 32 bits are affected:

```
UB = extract_UB(si)
si = s + i
si = (UB << 32) | extract_p(si)
```
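The retagging step above can be sketched as a single C helper (a standalone illustration under our assumptions, not the paper's actual instrumentation):

```c
#include <stdint.h>

/* Perform pointer arithmetic on the low 32 bits of a tagged pointer,
 * then re-apply the upper-bound tag so the result stays tagged. */
uint64_t tagged_add(uint64_t tagged, uint32_t offset) {
    uint32_t ub = (uint32_t)(tagged >> 32);   /* preserve the bound */
    uint32_t p  = (uint32_t)tagged + offset;  /* arithmetic on low bits */
    return ((uint64_t)ub << 32) | p;
}
```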
Type casts. Pointer-to-integer and integer-to-pointer casts are a curse for fat/tagged pointer approaches.
Function calls. SGXBOUNDS does not need to instrument function calls or alter calling conventions. The only uninstrumented code is libc, for which we provide wrappers. This implies that any tagged pointer passed as a function argument will be treated as a tagged pointer in the callee.
Advanced Features of SGXBOUNDS
Multi-threading support
AddressSanitizer does not require any specific treatment of multi-threading, but it can negatively affect cache locality if a multi-threaded application was specifically designed to be cache-friendly.
All fat-pointer or disjoint-metadata techniques similar to Intel MPX suffer from multi-threading issues. An update of a pointer and its associated metadata must be implemented as one atomic operation.
SGXBOUNDS does not experience this problem. Indeed, the pointer and the upper bound are always updated atomically since they are stored in the same 64-bit tagged pointer.
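This property can be illustrated with C11 atomics (a standalone illustration, not the paper's code): because the pointer and its bound travel in one 64-bit word, a single atomic store updates both, and no reader can ever observe a pointer paired with a stale bound.

```c
#include <stdatomic.h>
#include <stdint.h>

/* A tagged pointer is one 64-bit word: upper bound in the high half,
 * address in the low half. */
static _Atomic uint64_t shared_tagged;

void publish(uint32_t addr, uint32_t ub) {
    uint64_t word = ((uint64_t)ub << 32) | addr;
    atomic_store(&shared_tagged, word);   /* pointer + bound in one update */
}

uint64_t observe(void) {
    return atomic_load(&shared_tagged);   /* always a consistent pair */
}
```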
Tolerating bugs with boundless memory
To allow applications to survive most bugs and attacks and continue correct execution, SGXBOUNDS reverts to failure-oblivious computing using the concept of boundless memory blocks. Whenever an out-of-bounds memory access is detected, SGXBOUNDS redirects it to a separate "overlay" memory area to prevent corruption of adjacent objects, creating the illusion of "boundless" memory allocated for the object.
Consider a classic off-by-one error: instead of letting the access touch the metadata adjacent to the object, SGXBOUNDS redirects the load or store to an overlay address.
Metadata management support
So far, we discussed only one metadata item kept per object: the lower bound. However, our memory layout allows us to add an arbitrary number of metadata items for each object to implement additional functionality.
All instrumentation in SGXBOUNDS is implemented as calls to auxiliary functions, which we refer to as instrumentation hooks. One can think of these hooks as a metadata management API: (1) on_create() is called at run time whenever a new object is created; in the context of SGXBOUNDS, it corresponds to the specify_bounds() function, which initializes our only metadata item (the lower bound). (2) on_access() is called at each memory access, be it a write, a read, or both (for atomic instructions such as compare-and-swap); in SGXBOUNDS, this hook roughly corresponds to the bounds_violated() function. (3) on_delete() is called whenever the object is deallocated; we support this hook only for heap objects.
| function | description |
| --- | --- |
| on_create(base, size, type) | called after object creation (global, heap, or stack) |
| on_access(addr, size, metadata, access_type) | called before memory access |
| on_delete(metadata) | called before object destruction (only for heap) |
Implementation
SGXBOUNDS implementation
SGXBOUNDS is a compile-time transformation pass implemented in LLVM 3.8.
Compiler Support. We treat inline assembly as an opaque memory instruction: all pointer arguments to inline assembly are bounds checked. SGXBOUNDS does not yet completely support C++ exception handling.
Run-time Support. We implement the boundless memory feature entirely in the run-time support library. To prevent data races, all read/update operations on the overlay cache are synchronized via a global lock. For the tagged pointer scheme, SGXBOUNDS requires SGX enclaves (and thus their virtual address space) to start at 0x0. To allow this, we set the Linux security flag vm.mmap_min_addr to zero for our applications. We also modified the original Intel SGX driver to always start the enclave at address 0x0.
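For reference, the low-address mapping can be enabled as follows (requires root; note that lowering this sysctl weakens the kernel's protection against NULL-pointer-dereference exploits, so it should be confined to dedicated machines):

```shell
# allow user-space mappings at virtual address 0,
# so the enclave can be placed at 0x0
sysctl -w vm.mmap_min_addr=0

# or make the setting persistent across reboots
echo "vm.mmap_min_addr = 0" >> /etc/sysctl.conf
```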