The sixth assignment for SLAE certification is to take up 3 shellcodes from Shell-Storm and create polymorphic versions of them to beat pattern matching. The polymorphic versions cannot be larger 150% of the existing shellcode. Bonus points for making it shorter in lenght than original
The assignment was written on an Ubuntu Linux 18.04, with a Linux kernel 4.15 version.
A polymorphic shellcode is a a code derivated by an original code with some instructions substituted with equivalent piece of code.
As example, this piece of code
can be rewritten as
The outcome of the second assembly fragment is to put 0x11 in the EAX register in an indirect way.
The purpose of such rewriting is to bypass signature based antivirus checks and having our shellcode to be executed on our target system. The goal is to have a different code on every execution but equivalen in terms of effect.
If pid equals -1, then sig is sent to every process for which the calling > process has permission to send signals, except for process 1 (init), but see below.
So, our purpose is to write a C piece of code that execute a kill(-1, 9) since 9 is the SIGKILL signal.
Original code is 11 byte long. For the assignment engagement rule, the modified code can’t be longer than 150% of the orignal one, that it is 16 bytes (and an half :-)).
To create a first alternate version, we can separate shellcode parts in smaller ones and then rewrite in an alternative way:
set EAX to 0x25 (that is 37 in decimal, the code to kill(2) system call as defined in /usr/include/i386-linux-gnu/asm/unistd_32.h)
set EBX to -1, since we want to launch the signal to every process
set ECX to 9, since it is the code for the SIGKILL signal (as defined in /usr/include/asm-generic/signal.h)
call INT 0x80
This first kill_em_all variant is 18 bytes long, so it doesn’t fit my assignment rules and it can’t be a feasible solution.
I’ll change then the byte assignment, not using a define byte approach but a direct load, instead:
This shellcode, when compiled into binary, is only 13 bytes long that is inside our engagement rules.
Now with those two fixes values loaded into cl and al it’s easy to derive other variations. We can start from this to create some other polymorphic versions, using some strategies like:
swap system call parameter positions
add some NOPs
perform NOP equivalent operations
use math to calculate values to fill EAX, EBX and ECX to reach the desired value with some more clock cycles
Kill shellcode can be divided in three atomic patterns that can be swapped each other:
XOR-ing EBX and decrementing it
MOV AL to 0x9
MOV CL to 0x25
Given this, the following python snippet realize the swapping strategy for our kill -9 -1 shellcode:
As you can see in the following picture, some polymorphic variant of the original shellcode can be easily generated, without penalty on lenght:
Another transformation strategy can be adding NOPs. However, in order to have some more magic, I’ll mix the NOPs strategy with shuffling.
I’ll add a number of NOPs that let our shellcode to stay under 16 bytes in lenght.
Some quick improvements
With only three words that they can be swapped each other, there is no too much space for entropy. Let’s split the original shellcode into two main section:
system call setup
With this approach, the two strategies described before will result in:
Wasting CPU time
As a PoC for a polymorphic generator script, I wrote a bunch of instruction that they can be placed everywhere in the shellcode without interfering with the final result.
This is the python script filling the shellcode with some NOP equivalent instruction. Of course, to meet our SLAE assignment rule in size, we can select only 3 of them, but in a real case scenario it will be fun feeding a buffer with a very large nonsense opcode list.
Using math to substitute MOV instructions is too much expensive in terms of byte added to original shellcode and it could not be considered as a successfull strategy to deal with the < +150% rule engagement.
However, in a real case scenario, it can be a feasible way to make reversing much difficult if there are no constraints in target buffer size.
So please, not consider this for SLAE but only for fun. I just implemented, as PoC a math strategy based on DIV instruction. This piece of python code will create the assembly code needed to load DL with a random number between 1 and FF since we want to store DIV result in AL and the reminder, 0, to AH. We will load AX with the target number multiplied by this random number and then call “DIV DL”.
Since the division reminder is 0, there is no need to XOR the register to init it to 0.
This line will create the kill them all shellcode with the div strategy. Please note that we first invoke the CL set and then AL value.
The polymorphic generator for kill(-9, -1)
Here it is the python script using the described strategies to create a polymorphic version of kill -9, -1 shellcode.
Here it is some shellcodes created using the aforementioned script.
It is a 30 bytes shellcode for a partially obfuscated execve() call.
The basic idea is to use the call trick, as seen on assignment 5. After setting EAX to be filled with 0x11, that it is the opcode for execve() shellcode, there is a JUMP on a CALL instruction that set the EIP to a pop $esi, since in that register it will be stored the memory address of the executable to be called, “/bin/sh” that seems to be obfuscated, this way.
The basic idea here is that:
EAX must be filled with 0xb;
EBX must point to a location filled with “/bin/sh”;
ECX and EDX can be set to 0.
As we did it for the kill(), we can split the shellcode into pieces so that it would be easier to write variations. For SLAE engagement rules, we must create shellcodes not longer than 45 bytes.
First step, let’s init registers.
After this, there is the “call” trick:
And then, after the last call, there is the “/bin//sh” payload encoded as they were assembly opcodes.
The execve() shellcode variant is 25 byte long, instead of 30 bytes of the original shellcode taken from the website.
The disassembled executable is:
To fulfill our constraints, we’ve got 20 bytes to spend in NOP equivalent instructions. We will use the same approach as used as in the kill example.
The first polymorphic strategy is to swap system call parameter position. For such a reason, we will change the init part of our assignment first solution, in order to have an indipendent init routine for EDX. This way, the number of changes we can do it, it will raise.
Let’s rewrite the first block this way:
However, since the init part is very small here and we’ve got only one register to setup, there are only 3 possible shellcodes using this strategy:
The other shellcode sections can’t be shuffled, so this is not a wise strategy to create different polymorphic version of our obfuscated execve shellcode.
Here there are some shellcode generated this way:
NOP dope for exec
Here we’ve got 10 bytes to waste and a single NOP is 1 byte opcode, so we can add x90 ten times and in a very large number of ways.
However, please note that this strategy is very easy to detect and defeat. Adding tons of NOPs make a clever analyist to easily detect there is some other stuff under the wood.
However, since this is just a PoC for an exam, let’s use this strategy also for this shellcode.
After generated a shellcode and tried with my C skeleton program, no shell popped out. Lucky enough we’ve got GDB in help to understand why.
As you can see from the picture, the NOPs we inserted they break our jump and call offsets. In particular, the “xebx03| opcode, make a 3 byte relative jump because the shell label was just a “x5bxcdx80” bytes away. We have to adjust the xeb parameter accordingly to have our “jmp shell” logical step to work as expected.
Now that the jump works as expected, our shellcode still goes in an infinite loop.
After a GDB session, that turns to be my best friend in those latest days, it turns that I made a big logical mistake. The “x5b” instruction, pop $ebx, must be executed right after the call and the added NOPs break also the relative jump back, so eventually some generated shellcodes don’t execute the pop instruction.
The call instruction, from the documentation founds on the Internet, add a signed offset to the instruction following the call itself, to jump at the label we want to reach.
In our original shellcode we’ve got:
The “xe8” is the opcode for the CALL instruction and the other part is a 32 bit signed offset, relative to the instruction following the CALL.
We need to make a jump long 7 bytes in a backward direction, that is xFF-x07=xF8
Then we need to adjust our jump accordingly. When we corrected the first JMP, we want to reach the beginning of the call, now I want to reach the “x5b”.
After all modifications, and some more GDB sessions to fight against offsets, it turned out that this is the correct JMP/CALL offset adjustment implementation:
Even more wasted CPU time
The third strategy I want to use in my exec shell shellcode polymorphic generator, is to add some NOP equivalent operations.
The lesson learnt in the previous NOP dope chapter is that I have to deal carefully with JMP and CALL offset tuning.
Please note how the opcode is xchg eax, eax is 0x90 like NOP. I choose not to use PUSH and POP instructions, because not to mess it up with legal push and pop I executed in the shellcode. Of course, it is possible to inject stack operations preserving the ones we need to do, however I want to make it simple just to show this assignment solution.
No black magic here. It’s the same approach used with NOPs but with some more instructions having no effect on the program logical flow.
Do some math
Last strategy I would use to create polymorphic version of my first execve variant is to do some math to fill registers with the desired values.
In this payload, we want to achieve the following register status:
Unfortunately, the shellcode is too short to shuffle operands. So this strategy doesn’t work this time.
Add some NOPs
Even this strategy lacks of interest. In our variation, the MOV instruction must be executed after the XOR and both of them, they must be executed right before the INT.
So we can inject some NOPs in between each instructions but creating a sled, since it would be easy for an analyst to detect.
Add some NOP equivalent instruction
To solve this, I used the same NOP equivalent instructions used in execve() shellcode.
The python script used to create polymorphic versions using this strategy is this one. A big improvement can be rewriting the JMP instruction with some jump equivalent, maybe adding some always true condition.
Do some math
For the math strategy, I used the same approach used in execve example, adding some add and sub instructions for EBX, ECX and EDX registers that doesn’t need to have a meaningful value.
This strategy can lead our shellcode to be very large, so it can’t be useful due to engagement rules.
Anyway, it is an interesting PoC and it leads to a very large number of shellcode variations.
The python method implementing this strategy is the following. In this case, we’ve got a lot of freedom in choosing values for EBX, ECX and EDX.
This assignment was the most complex one in terms of solutions. Creating a single poliymorphic variation was easy, but generating some arbitrary shellcodes requires a lot of attentions in not mangling program logic flow.
Another important thing I learnt in this assignment is that, GDB is one of my best friends and the .gdbinit file with the hook-stop function is one of the customization you have to care most of it.
SLAE Exam Statement
This blog post has been created for completing the requirements of the SecurityTube Linux Assembly Expert certification: