Retrospective on r2wars 2018

September 10, 2018

Between 5th and 8th of September I attended r2con, the annual radare2 conference in Barcelona, Spain. This highly-technical, single-track event consisted of multiple training sessions, workshops, and talks, all revolving around topics such as reverse engineering, binary exploitation, malware analysis, software instrumentation, niche hardware platforms, and even chiptune music production. Check out the event page and the r2con2018 Github repository for more details.

This post is about my entries in the r2wars competition that took place during r2con2018. If you’re into wargames, CTFs, competitive programming, low-level programming tricks, or preparing for a future r2wars event, keep reading.

The Game

This is the “official” description of the game:

r2wars is a game similar to corewars, where 2 programs run on a shared memory space trying to catch each other in order to trash their code and make them crash.

More details can be found here, but the crux of the matter is the following:

The bots must be written in x86, ARM or MIPS assembly.
On every run, exactly 2 programs compete against each other.
Both programs share the same address space of 1024 bytes, and a separate but shared (I think) stack.
On every run, both programs are loaded at random locations within the shared address space.
The r2wars scheduler will execute one instruction from each program, on every cycle. Execution of complex instructions, such as the REP instruction family on x86, gets adjusted accordingly in order to reflect the “one operation per cycle” principle.
In order to win a run, a program must either overwrite the opponent, or simply outlive it by waiting for it to execute an illegal instruction or reference an illegal memory address.
There is a limit of 8000 cycles per run, which results in a draw when reached.
Every game iteration is a tournament, where the winners are the top 3 highest-scoring bots.

You’re doing it wrong!

If you’re a seasoned assembly programmer or spent the best years of your life optimizing shellcode, you’ll notice that these bots are far from optimal -and you’ll be right. All my submissions were hacked together during breaks, in crowded and noisy rooms, in an attempt to meet submission deadlines.

Development & Debugging

The r2wars environment that was used during the event can be found here. Unfortunatelly, it’s written in C# and I did not have the patience to resolve the dependencies and the build errors on my FreeBSD laptop.

I don't always test my code - But when I do, I do it in production

I also have to admit that my r2 skills are not something to write home about, so I used the following ghetto approach to develop and debug my bots:

Bang-out x86 instructions in vim
Encode them into hex-pairs:
```
$ rasm2 -b 32 -a x86 -f <filename>
```
Start radare2 with 1024 bytes of addressable memory, initialize ESIL, write the hex-pairs into the address space, initialize the stack and program counter registers, fire-up the debugger in visual mode:
```
$ radare2 -ax86 -b32 malloc://1024
[0x00000000]> aei
[0x00000000]> aeim
[0x00000000]> wx [hex-pairs] @100
[0x00000000]> aer PC=100
[0x00000000]> aer SP=200
[0x00000000]> V!
```
Step through the code using F8 while monitoring the content of the registers
Repeat Ad Nauseum

The Good One

After contemplating various strategies, I decided on a 2-stage offensive approach:

Copy the active payload to a known location within the 1KB address space.
Jump into the active payload and start trashing the address space, from top to bottom.

I used variations of this strategy in 4 out of 5 games in total. This bot made it into the winning triplet every time that it was used, so it’s safe to say it wasn’t a bad approach.

You can find all my bots on Github

Game #1 [2nd place]

This is what my initial submission looked like:

mov eax, 0xc3c3c3c3	; [1]
mov ebx, eax
mov ecx, eax
mov edx, eax
mov ebp, eax
mov edi, eax
mov esp, 1020		; [2]
mov esi, 1021		; [3]	
mov [esi], 0xe6ff60	; [4]
jmp esi			; [5]

My intent here is to keep the active payload as small and fast as possible. The payload consists of only 3 bytes, and it’s the following:

pushad	; 60
jmp esi	; ffe6

The pushad instruction is amazing. Not only you get to write 32 bytes worth of data (all x86 general-purpose registers) wherever your stack register is pointing, it also decreases the stack pointer by the amount of bytes it pushed, all in a single byte.

As you can see in [3][4][5], I copy the aforementioned active payload into addresses 1021-1023, set the esi register to always point to the active payload, and start executing it. I’ve set the stack pointer to point right below my active payload ([2]), so this loop will trash the entire address space, top-to-bottom, 32 bytes at a time. (Remember that x86 stack grows towards smaller addresses.)

Because of [1], the stack will be mostly full of 0xc3 bytes, which is the ret instruction in x86 ISA. So, if another x86 bot attempts to execute instructions within an area I already wrote-through, it will pop a random value from the stack and try to return there, which will most likely trigger an illegal memory access and result in a crash.

Now, the obvious problem with this bot is that it can trash the executable address space only once, before the stack pointer reaches 0 and wraps-up, triggerring memory access violation. So nothing fancy here, but it was a good start and it earned me the 2nd place in that round.

Game #2 [1st place]

In this iteration of the game I patched the aforementioned issue. This increased the size of my active payload and made my bot a bit slower, but now, as long as my active payload is not corrupted, my bot will not crash itself.

mov eax, 0xc3c3c3c3	; [1]
mov ebx, eax
mov ecx, eax
mov edx, eax
mov edi, 32		; [2]
mov ebp, 0x3fc		; [3]
mov edx, 1014		; [4]
mov esp, edx		; [5]
mov esi, 1015		; [6]
mov [esi],   0x7ffc3960	; [7]
mov [esi+4], 0xffd489fb	; [8]
mov [esi+8], 0xe6	; [9]
jmp esi			; [10]

The new active playload is now 9 bytes ([7][8][9]) and it lives at address range [1015-1023]. It consists of the following instructions:

pushad		; 60
cmp esp, edi	; 39fc
jg 0x3f7 	; 7ffb
mov esp, edx 	; 89d4
jmp esi 	; ffe6

The reason behind comparing registers to registers, instead registers to constant values, is that the opcodes for register-to-register comparison are shorter, resulting in a smaller payload.

I’m checking whether the stack pointer has surpassed the 32nd byte of the address space, since performing a pushad beyond this point will result in memory access violation. If that’s the case, I set the stack pointer back to its initial state -just below the active payload- and start trashing the address space once again.

Now, there are 2 issues with this approach:

I don’t overwrite the last 32 bytes of the address space, so a competitor can hide there and eventually take me out.
There are no measures to prevent other x86-based competitors from executing my payload. This happened a couple of times and resulted in time-outs and draws.

Game #3 aka “The One With The Stack” [3rd place]

My bot now trashes the entire address space, except the region of my active payload. It also trashes the active payload delivery sequence before jumping into the active payload. This prevents my opponents from successfully executing my code since their registers are out-of-sync with mine.

mov eax, 0xc3c3c3c3	; [1]
mov ebx, eax
mov ecx, eax
mov edx, eax
mov edi, 32
mov ebp, 0x3fc
mov edx, 1011
mov esi, 1012
mov [esi],   0x7ffc3960	; [2]
mov [esi+4], 0x60fc89fb	; [3]
mov [esi+8], 0xe6ffd489 ; [4]
call self_pwn		; [5]
self_pwn:		; [6]
pop esp			; [7]
sub esp, 8		; [8]
pushad			; [9]
mov esp, edx		; [10]
jmp esi			; [11]

The active payload this time is 12 bytes ([2][3][4]):

pushad		; 60
cmp esp, edi	; 39fc
jg 0x3f4 	; 7ffb
mov esp, edi 	; 89fc
pushad		; 60
mov esp, edx 	; 89d4
jmp esi 	; ffe6

Whenever my stack pointer ends-up below the first 32 bytes of the address space, I set it back to 32 and perfom a pushad which successfully overwrites the area that I missed earlier. Then I set it again just below my active payload, and start trashing the address space again.

I also overwrite the initial sequence that generates the active payload and jumps into it, in [5][6][7][8][9]. I use the call/pop esp sequence to acquire the address of that initial sequence, tweak the stack pointer, and proceed to trash it with pushad. Once this is done, I set the stack pointer back to its upper limit, jump into my active payload and start trashing the address space.

Now this is where things get rather dramatic. @Aissn and @kbeckmann come up with the same idea: use the stack for code execution. This increased the duration of the game from the typical runtime of 10m, to 1h 15m. No other bots were actively using the stack, so this game had a lot of time-outs and draws. The game ended with @aissn and @kbeckmann getting the first 2 places, and my bot was ranked 3rd.

After this, SkUaTeR patched the game environment and set the stack to be non-executable.

Game #4 [1st place]

The 4th game of r2wars took place on Saturday morning. I just used the same version and ranked 1st.

This was probably a good time to stop changing my bot. Instead, I rewrote my bot to use a completely different strategy. Although I can’t know for sure, I suspect this led to my demise during the final game. Which brings us to the next section…

The Not-So-Good One

Mistakes

At some point during the conference, @kbeckmann mentioned something about keeping the payload within general-purpose registers at all times, and somehow executing it. I decided to experiment with this idea. The bad news is that my bot ranked 4th in this game. The good news is that I suspect that I found bugs in ESIL, which was pretty much the entire point of r2wars.

Game #5 [4th place]

The main strategy behind this bot is the following:

Keep the active payload within edi and esi registers
Descend the address space 32 bytes at a time using pushad, and jump to esp, which will always be pointing into the contents of edi and esi following a pushad.

This approach overwrites the entire address space, and the active payload is constantly on the move. It’s like surfing a wave:

mov eax, 0xc3c3c3c3	; [1]
mov ebx, 1024		; [2]
mov ecx, 0		; [3]
mov edx, 0x90909000	; [4]
mov esi, 0xe4ff60e3	; [5]
mov edi, 0x440fcc39	; [6]
call pwn		; [7]
pwn:			; [8]
pop esp			; [9]
sub esp,4		; [10]
pushad			; [11]
mov esp, ebx		; [12]
pushad			; [13]
jmp esp			; [14]

I reused multiple ideas from my previous bots:

Filling the stack with ret instrictions: ([1])
Keeping the payload short by using register-to-register comparison and storing the limits in ebx, ecx ([2][3])
Trashing the initial stage to prevent enemy bots from executing my code ([7][8][9[10])

The active payload is the following:

cmp esp, ecx	; 39cc
cmove esp, ebx	; 0f44e3
pushad		; 60
jmp esp 	; ffe4

In this scenario, I start at the very top of the address space (1024), and trash it by 32 bytes at a time. Part of the data that I use to trash the address space is my actual payload, and esp is always pointing there, so every time I overwrite the address space I jump into esp, shifting my active payload by 32 bytes. Whenever the stack pointer reaches the address 0, I reset it back to 1024 with a cmove, and start all over.

Final Words

The r2wars competition was definitelly one of the highlights of r2con2018. The organization team was also very generous and rewarded r2wars competitors with some really cool radare2 merchandise.

Something that I didn’t pay attention to during the games but realized during this retrospective, is that the r2wars event was great demonstration of the power of the radare2 framework. Working through this competition with r2 and rasm2 was a breeze and saved me a ton of time and effort. I could just concentrate on what I was actually doing instead of wasting time and energy fighting my toolchain, which would be the case if I played this event using some combination of gdb, objdump, Python, C and mmap(3).

So yeah. Come and fight me on r2wars @ r2con2019.

fight me bruh

You can comment here.