Paolo PeregoFollowSpecialista di sicurezza applicativa e certificato OSCE e OSCP, amo spaccare e ricostruire il codice in maniera sicura. Sono cintura nera di taekwon-do, marito e papà. Ranger Caotico Neutrale, scrivo su @codiceinsicuro.
Assignment #6: Generate polymorphic shellcodes
7890
parole - Lo leggerai in 43 minuti
The sixth assignment for SLAE certification is to take up 3 shellcodes from
Shell-Storm and create polymorphic versions of them to beat pattern matching.
The polymorphic versions cannot be larger 150% of the existing shellcode. Bonus
points for making it shorter in lenght than original
The assignment was written on an Ubuntu Linux 18.04, with a Linux kernel 4.15
version.
Definition
A polymorphic shellcode is a a code derivated by an original code with some
instructions substituted with equivalent piece of code.
As example, this piece of code
can be rewritten as
The outcome of the second assembly fragment is to put 0x11 in the EAX register
in an indirect way.
The purpose of such rewriting is to bypass signature based antivirus checks and
having our shellcode to be executed on our target system. The goal is to have a
different code on every execution but equivalen in terms of effect.
If pid equals -1, then sig is sent to every process for which the calling >
process has permission to send signals, except for process 1 (init), but see
below.
So, our purpose is to write a C piece of code that execute a kill(-1, 9) since
9 is the SIGKILL signal.
Original code is 11 byte long. For the assignment engagement rule, the modified
code can’t be longer than 150% of the orignal one, that it is 16 bytes (and an
half :-)).
To create a first alternate version, we can separate shellcode parts in smaller ones and then rewrite in an alternative way:
set EAX to 0x25 (that is 37 in decimal, the code to kill(2) system call as
defined in /usr/include/i386-linux-gnu/asm/unistd_32.h)
set EBX to -1, since we want to launch the signal to every process
set ECX to 9, since it is the code for the SIGKILL signal (as defined in
/usr/include/asm-generic/signal.h)
call INT 0x80
This first kill_em_all variant is 18 bytes long, so it doesn’t fit my
assignment rules and it can’t be a feasible solution.
I’ll change then the byte assignment, not using a define byte approach but a
direct load, instead:
This shellcode, when compiled into binary, is only 13 bytes long that is inside
our engagement rules.
Now with those two fixes values loaded into cl and al it’s easy to derive other
variations. We can start from this to create some other polymorphic versions,
using some strategies like:
swap system call parameter positions
add some NOPs
perform NOP equivalent operations
use math to calculate values to fill EAX, EBX and ECX to reach the desired
value with some more clock cycles
Kill shake
Kill shellcode can be divided in three atomic patterns that can be swapped each other:
XOR-ing EBX and decrementing it
MOV AL to 0x9
MOV CL to 0x25
Given this, the following python snippet realize the swapping strategy for our kill -9 -1 shellcode:
As you can see in the following picture, some polymorphic variant of the
original shellcode can be easily generated, without penalty on lenght:
NOP dope
Another transformation strategy can be adding NOPs. However, in order to have
some more magic, I’ll mix the NOPs strategy with shuffling.
I’ll add a number of NOPs that let our shellcode to stay under 16 bytes in
lenght.
Some quick improvements
With only three words that they can be swapped each other, there is no too much
space for entropy. Let’s split the original shellcode into two main section:
initialization
system call setup
With this approach, the two strategies described before will result in:
Wasting CPU time
As a PoC for a polymorphic generator script, I wrote a bunch of instruction
that they can be placed everywhere in the shellcode without interfering with
the final result.
This is the python script filling the shellcode with some NOP equivalent
instruction. Of course, to meet our SLAE assignment rule in size, we can select
only 3 of them, but in a real case scenario it will be fun feeding a buffer
with a very large nonsense opcode list.
Doing math…
Using math to substitute MOV instructions is too much expensive in terms of
byte added to original shellcode and it could not be considered as a
successfull strategy to deal with the < +150% rule engagement.
However, in a real case scenario, it can be a feasible way to make reversing
much difficult if there are no constraints in target buffer size.
So please, not consider this for SLAE but only for fun. I just implemented, as
PoC a math strategy based on DIV instruction. This piece of python code will
create the assembly code needed to load DL with a random number between 1 and
FF since we want to store DIV result in AL and the reminder, 0, to AH. We will
load AX with the target number multiplied by this random number and then call
“DIV DL”.
Since the division reminder is 0, there is no need to XOR the register to init
it to 0.
This line will create the kill them all shellcode with the div strategy.
Please note that we first invoke the CL set and then AL value.
The polymorphic generator for kill(-9, -1)
Here it is the python script using the described strategies to create a
polymorphic version of kill -9, -1 shellcode.
Some shellcodes
Here it is some shellcodes created using the aforementioned script.
It is a 30 bytes shellcode for a partially obfuscated execve() call.
The basic idea is to use the call trick, as seen on assignment
5. After
setting EAX to be filled with 0x11, that it is the opcode for execve()
shellcode, there is a JUMP on a CALL instruction that set the EIP to a pop
$esi, since in that register it will be stored the memory address of the
executable to be called, “/bin/sh” that seems to be obfuscated, this way.
First variation
The basic idea here is that:
EAX must be filled with 0xb;
EBX must point to a location filled with “/bin/sh”;
ECX and EDX can be set to 0.
As we did it for the kill(), we can split the shellcode into pieces so that it
would be easier to write variations.
For SLAE engagement rules, we must create shellcodes not longer than 45 bytes.
First step, let’s init registers.
After this, there is the “call” trick:
And then, after the last call, there is the “/bin//sh” payload encoded as they
were assembly opcodes.
The execve() shellcode variant is 25 byte long, instead of 30 bytes of the
original shellcode taken from the website.
The disassembled executable is:
To fulfill our constraints, we’ve got 20 bytes to spend in NOP equivalent
instructions. We will use the same approach as used as in the kill example.
Shake execution
The first polymorphic strategy is to swap system call parameter position. For such a reason, we will change the init part of our assignment first solution, in order to have an indipendent init routine for EDX. This way, the number of changes we can do it, it will raise.
Let’s rewrite the first block this way:
However, since the init part is very small here and we’ve got only one register
to setup, there are only 3 possible shellcodes using this strategy:
(
The other shellcode sections can’t be shuffled, so this is not a wise strategy
to create different polymorphic version of our obfuscated execve shellcode.
Here there are some shellcode generated this way:
NOP dope for exec
Here we’ve got 10 bytes to waste and a single NOP is 1 byte opcode, so we can
add \x90 ten times and in a very large number of ways.
However, please note that this strategy is very easy to detect and defeat.
Adding tons of NOPs make a clever analyist to easily detect there is some other
stuff under the wood.
However, since this is just a PoC for an exam, let’s use this strategy also for
this shellcode.
After generated a shellcode and tried with my C skeleton program, no shell
popped out.
Lucky enough we’ve got GDB in help to understand why.
As you can see from the picture, the NOPs we inserted they break our jump and
call offsets. In particular, the “\xeb\x03| opcode, make a 3 byte relative jump
because the shell label was just a “\x5b\xcd\x80” bytes away. We have to adjust
the \xeb parameter accordingly to have our “jmp shell” logical step to work as
expected.
Now that the jump works as expected, our shellcode still goes in an infinite
loop.
After a GDB session, that turns to be my best friend in those latest days, it
turns that I made a big logical mistake. The “\x5b” instruction, pop $ebx, must
be executed right after the call and the added NOPs break also the relative
jump back, so eventually some generated shellcodes don’t execute the pop
instruction.
The call instruction, from the documentation founds on the Internet, add a signed offset to the instruction following the call itself, to jump at the label we want to reach.
In our original shellcode we’ve got:
The “\xe8” is the opcode for the CALL instruction and the other part is a 32
bit signed offset, relative to the instruction following the CALL.
We need to make a jump long 7 bytes in a backward direction, that is
\xFF-\x07=\xF8
Then we need to adjust our jump accordingly. When we corrected the first JMP,
we want to reach the beginning of the call, now I want to reach the “\x5b”.
After all modifications, and some more GDB sessions to fight against offsets, it turned out that this is the correct JMP/CALL offset adjustment implementation:
Even more wasted CPU time
The third strategy I want to use in my exec shell shellcode polymorphic
generator, is to add some NOP equivalent operations.
The lesson learnt in the previous NOP dope chapter is that I have to deal
carefully with JMP and CALL offset tuning.
Please note how the opcode is xchg eax, eax is 0x90 like NOP. I choose not to
use PUSH and POP instructions, because not to mess it up with legal push and
pop I executed in the shellcode. Of course, it is possible to inject stack
operations preserving the ones we need to do, however I want to make it simple
just to show this assignment solution.
No black magic here. It’s the same approach used with NOPs but with some more
instructions having no effect on the program logical flow.
Do some math
Last strategy I would use to create polymorphic version of my first execve
variant is to do some math to fill registers with the desired values.
In this payload, we want to achieve the following register status:
EBX is filled up with a pointer, so I won’t use math for it.
As simple PoC for my polymorphic generator python script, I will use SUB
instruction for EAX electing a random number and subtracting the correspondant
to reach 0xb.
To init ECX and EDX to 0, I’ll use the following instructions:
The following python routine is the one used to create shellcode variations.
The code is very easy. The fork() system call has no parameter and its listed
with number 2 in the system call list.
The shellcode is only 7 bytes long (“\x6a\x02\x58\xcd\x80\xeb\xf9”), so our
variants can’t be longer than 10,5 bytes. This will limit us very hard when
creating the generator script.
First variation
The idea behind this variation is super easy: put in EAX the value 2 and call
INT 0x80 then repeat in a loop the whole process.
The first easy variation is not using the push/pop technique to init EAX, but
XOR-ing the register and moving 0x2 instead.
This results in a 8 bytes shellcode, so just 1 bytes more than the original one:
Unfortunately, the shellcode is too short to shuffle operands. So this strategy
doesn’t work this time.
Add some NOPs
Even this strategy lacks of interest. In our variation, the MOV instruction
must be executed after the XOR and both of them, they must be executed
right before the INT.
So we can inject some NOPs in between each instructions but creating a sled,
since it would be easy for an analyst to detect.
Add some NOP equivalent instruction
To solve this, I used the same NOP equivalent instructions used in execve()
shellcode.
The python script used to create polymorphic versions using this strategy is
this one. A big improvement can be rewriting the JMP instruction with some jump
equivalent, maybe adding some always true condition.
Do some math
For the math strategy, I used the same approach used in execve example, adding
some add and sub instructions for EBX, ECX and EDX registers that doesn’t need
to have a meaningful value.
This strategy can lead our shellcode to be very large, so it can’t be
useful due to engagement rules.
Anyway, it is an interesting PoC and it leads to a very large number of
shellcode variations.
The python method implementing this strategy is the following. In this case,
we’ve got a lot of freedom in choosing values for EBX, ECX and EDX.
Lessons learnt
This assignment was the most complex one in terms of solutions. Creating a
single poliymorphic variation was easy, but generating some arbitrary
shellcodes requires a lot of attentions in not mangling program logic flow.
Another important thing I learnt in this assignment is that, GDB is one of my
best friends and the .gdbinit file with the hook-stop function is one of the
customization you have to care most of it.
SLAE Exam Statement
This blog post has been created for completing the requirements of the SecurityTube Linux Assembly Expert certification:
Simili a "Assignment #6: Generate polymorphic shellcodes"
Se questo post ti è piaciuto, sono abbastanza sicuro che troverai questi contenuti altrettanto interessanti. Buona lettura.
Episodio 32: Quando l'EDR fa crock
Introduzione
Ciao caro lettore. Ero come al solito in ritardo nella creazione di questo
numero della newsletter di cybersecurity più aperiodica dell’universo, quando
Internet si è rotta ancora.
Eh già… in questi mesi ho dato anima e corpo al canale YouTube ed ho trascurato un po’ il mio blog. Questa però è una delle cose che voglio prima raccontare qui, nella mia versione digitale di un Bullet Journal.