RCE Without Native Code: Exploitation of a Write-What-Where in Internet Explorer
May 21, 2019 | Simon Zuckerbraun
On the last day of 2018, I discovered a type confusion vulnerability in Internet Explorer that yields a clean write-what-where primitive. It was patched this April as CVE-2019-0752. As an exercise, I wrote a full exploit for this vulnerability using an original exploitation technique. Even though the vulnerability itself produces only a controlled write and cannot be triggered to produce an info leak, there is a direct and highly reliable path to code execution. Furthermore, the exploit uses no shellcode. In this article, please join me for a tour of the details of both the vulnerability and the exploit I wrote to demonstrate it.
Background
At an emulation level of IE=8 or lower, Internet Explorer executes DOM methods and properties via the IDispatchEx mechanism. Although this is the most natural implementation choice, it leaves much to be desired in terms of performance. To help alleviate this performance bottleneck just a bit, there is a “fast path” implemented for a subset of DOM properties and methods. These are invoked through function pointers found within the static table mshtml!_FastInvokeTable. When available, the fast path provides a modest speed-up by avoiding part of the usual dispatch machinery. The following decompilation is from the implementation of IDispatchEx::InvokeEx in mshtml!CBase::ContextInvokeEx:
As seen in the snippet above, the fast invoke mechanism will not be used if the operation requested is a property put. The apparent reason for this is that _FastInvokeTable can contain only one entry for a given method or property, and it was decided that, in the case of a property, the entry would point to the more frequently called property getter rather than the setter.
The Vulnerability
The vulnerability in the code shown above stems from the fact that IDispatchEx allows two different kinds of property puts. The typical property put assigns a scalar value into a property, for example, an integer or string. This operation type is indicated by the flag DISPATCH_PROPERTYPUT, having a value of 0x4. The second type of property put operation is one that assigns an object reference into the property. This is indicated by a call with the flag value DISPATCH_PROPERTYPUTREF, having a value of 0x8. Somewhat confusingly, the flag values are defined as if these are two unrelated operation types, so testing for the presence of the DISPATCH_PROPERTYPUT bit fails to detect putref-type operations. Hence, in the code shown above, an operation of type DISPATCH_PROPERTYPUTREF will be mistakenly routed via the _FastInvokeTable entry for the property, which contains a pointer to the property’s get method. The get method and the put method will surely have different function signatures, so type confusion occurs on the value passed for assignment into the property.
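The decompiled check itself is not reproduced in this copy of the article, so the following Python sketch models only its logic. It assumes the fast-path test is a single bitwise AND against DISPATCH_PROPERTYPUT; the flag values are the standard ones from oleauto.h, while the function names and the stricter variant are hypothetical illustrations, not Microsoft's actual code or patch.

```python
# DISPATCH_* flag values as defined in oleauto.h.
DISPATCH_METHOD = 0x1
DISPATCH_PROPERTYGET = 0x2
DISPATCH_PROPERTYPUT = 0x4
DISPATCH_PROPERTYPUTREF = 0x8


def takes_fast_path_flawed(flags: int, has_fast_entry: bool) -> bool:
    """Model of the flawed test: only the PROPERTYPUT bit rules out the fast path,
    so a PROPERTYPUTREF call reaches the getter stored in the table."""
    return has_fast_entry and not (flags & DISPATCH_PROPERTYPUT)


def takes_fast_path_strict(flags: int, has_fast_entry: bool) -> bool:
    """A stricter test (illustrative only) that treats both kinds of put as puts."""
    return has_fast_entry and not (flags & (DISPATCH_PROPERTYPUT | DISPATCH_PROPERTYPUTREF))


for name, flags in [("get", DISPATCH_METHOD | DISPATCH_PROPERTYGET),
                    ("put", DISPATCH_PROPERTYPUT),
                    ("putref", DISPATCH_PROPERTYPUTREF)]:
    print(name, takes_fast_path_flawed(flags, True), takes_fast_path_strict(flags, True))
# get True True
# put False False
# putref True False
```

The last line of output is the bug in miniature: a putref is not recognized as a put, so it is served by the getter's table entry.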
What happens next depends on the signatures of the confused get/put functions that correspond to the particular property being called. I found three subcases for possible function signatures, as follows:
In each case, we have the ability to cause the get method to be invoked in place of the put method.
In case 1, there is no security implication. The function get_className_direct will be invoked, and for its out parameter, which has type BSTR *, there will instead be passed a value of the incompatible type BSTR. When get_className_direct executes, it instantiates a new BSTR to hold the result of the get operation and writes a pointer to this new string at the address specified by the BSTR *value parameter. In our case, the effect is to overwrite the first four bytes of character data of the supplied BSTR. Aside from overwriting this character data, no other memory corruption takes place. Note that a 4-byte pointer value is never large enough to overflow the character data portion of a BSTR allocation and infringe upon an adjacent memory allocation. Furthermore, script cannot access the corrupted string data for the purpose of information leakage, because the BSTR passed to get_className_direct is a temporary BSTR. It is not the same BSTR allocation that script can later access. Therefore case 1 is not exploitable.
Case 2 offers little more. The object assigned by means of the property put will be passed as struct tagVARIANT value, but since the get method will be invoked instead, the first four bytes of the tagVARIANT structure will be interpreted as a VARIANTARG*, a pointer to a VARIANTARG structure to be filled in with a result value. It might be possible to exert partial control over the first 4 bytes of the tagVARIANT to make it equal to an address pointing to data we wish to corrupt. Nevertheless, exploitation is virtually impossible because the confused get and put functions in this case have different total stack argument sizes. When the getter returns, the stack pointer will not be adjusted properly. The caller will detect this discrepancy immediately and safely shut down the process.
Case 3, by contrast, offers excellent exploitability. The value passed in when setting the property will be passed to CElement::get_scrollLeft, which will interpret it as an int* indicating where to write the result. Thus, the current value of scrollLeft will be written to memory at an address of our choosing. Afterwards, control returns cleanly to script. This gives the attacker a clean write-what-where primitive. The only restriction seems to be that the value of scrollLeft cannot be set to anything greater than 0x001767dd, so this is the maximum DWORD value we can write. As we will see, this will not pose a serious difficulty.
The following PoC demonstrates the write-what-where primitive as described above. Note the use of VBScript. As far as I know, this is the only way to produce the needed DISPATCH_PROPERTYPUTREF.
To trigger the vulnerability, we assign an instance of MyClass to scrollLeft. This produces a dispatch call with flag DISPATCH_PROPERTYPUTREF. Due to the bug in mshtml!CBase::ContextInvokeEx, CElement::get_scrollLeft will be invoked instead of CElement::put_scrollLeft. Knowing that CElement::put_scrollLeft expects an integer argument, the dispatch mechanism will coerce our MyClass instance to an integer. When received by CElement::get_scrollLeft, the latter function will interpret this integer as a pointer indicating where in memory to place the current value of scrollLeft. In sum, the value 0x1234 will be written to 0xdeadbeef. Due to implementation details, there will first be some extraneous reads and writes to 0xdeadbeef. To see the full effect, it is easiest to replace 0xdeadbeef in the PoC with a known valid address.
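To make the mechanics concrete, here is a Python simulation of the confused call. It is a model under simplifying assumptions: memory is a dictionary of bytes, the coercion of the MyClass instance to an integer is taken as already done, and the extraneous reads and writes mentioned above are not modeled. All names are hypothetical.

```python
import struct

SCROLLLEFT_MAX = 0x001767DD  # largest scrollLeft value reported in the article


class FakeMemory:
    """Sparse simulated address space (address -> byte); no real memory is touched."""

    def __init__(self) -> None:
        self.cells = {}

    def write_dword(self, addr: int, value: int) -> None:
        for i, b in enumerate(struct.pack("<I", value)):
            self.cells[addr + i] = b

    def read_dword(self, addr: int) -> int:
        raw = bytes(self.cells.get(addr + i, 0) for i in range(4))
        return struct.unpack("<I", raw)[0]


def get_scroll_left(mem: FakeMemory, out_ptr: int, current: int) -> None:
    """Getter semantics: store the current scrollLeft through an int* out-parameter."""
    mem.write_dword(out_ptr, current)


def putref_scroll_left(mem: FakeMemory, coerced_arg: int, current: int) -> None:
    """What the buggy fast path does with a putref: the integer that dispatch
    coerced for put_scrollLeft is handed to the getter as its out-pointer."""
    if not 0 <= current <= SCROLLLEFT_MAX:
        raise ValueError("scrollLeft cannot hold this value")
    get_scroll_left(mem, coerced_arg, current)


mem = FakeMemory()
putref_scroll_left(mem, coerced_arg=0xDEADBEEF, current=0x1234)
print(hex(mem.read_dword(0xDEADBEEF)))  # 0x1234
```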
Exploitation, Part 1: From Arbitrary Write to Arbitrary Read
The main barrier to exploiting this vulnerability is that it provides a write primitive, but no read primitive or information leak. Thus, to start with, the attacker does not have knowledge of any address that is safe or useful to write to.
This can be remedied quickly and inexpensively, however, simply by allocating a very large array such that a chosen constant address is nearly certain to fall within the array allocation:
Creating ar1 allocates a contiguous buffer of 0x3000000 VARIANT structures in memory, for a total length of 0x30000000 bytes. Starting with a clean process, this will surely include our chosen address of 0x28281000.
Initially, all the VARIANT structures in ar1 contain all zeroes, so that each element has type VT_EMPTY. If we write a new value, say 0x4003 (VT_BYREF | VT_I4) at 0x28281000, then it will change the type of one element of ar1 so that it is no longer an empty value. By iterating over the array, we can find the corrupted element. We will call this element the “gremlin”, because “gremlin” has panache. In our exploit, the variable gremlin is used for the index, so that the gremlin itself is referenced as ar1(gremlin).
Note that the variability of the starting address of the array allocation is constrained in that it will always be at a page boundary, which is to say, a multiple of 0x1000. Accordingly, we do not need to examine every array element to find the gremlin. Instead we can examine every 0x100th element (0x1000 divided by 0x10, the size of a VARIANT), as long as we start at the appropriate index. Using this approach, the search for the gremlin can be concluded rapidly, generally in less than one second. Incidentally, this constraint on address variability is also the reason why we can be certain that 0x28281000 will fall specifically at the beginning of one of the VARIANT elements instead of somewhere in the middle of a VARIANT.
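The page-stride search can be sketched in Python as follows. The article says only that the scan must start at the appropriate index; here that is modeled by assuming the array data begins a fixed header_offset bytes after a page boundary (a multiple of 0x10). The base addresses and the 0x20 offset in the example are hypothetical, and the array is not actually allocated: a dictionary stands in for the single corrupted element.

```python
VARIANT_SIZE = 0x10         # sizeof(VARIANT) in a 32-bit process
PAGE_SIZE = 0x1000
ELEMENT_COUNT = 0x3000000   # number of elements in ar1
TARGET = 0x28281000         # fixed address that receives VT_BYREF | VT_I4
VT_I4 = 0x0003
VT_BYREF = 0x4000


def find_gremlin(vt_at, element_count, header_offset):
    """Return the index of the first element with a non-zero vt, or None.

    The buffer begins header_offset bytes past a page boundary, so only indices
    congruent to -(header_offset // VARIANT_SIZE) modulo 0x100 can begin a page.
    Stepping by 0x100 elements visits exactly those indices.
    """
    stride = PAGE_SIZE // VARIANT_SIZE                  # 0x100 elements per page
    start = (-(header_offset // VARIANT_SIZE)) % stride
    for index in range(start, element_count, stride):
        if vt_at(index) != 0:
            return index
    return None


def simulate(page_base, header_offset):
    """Model ar1 at page_base + header_offset with one corrupted vt at TARGET."""
    base = page_base + header_offset
    if not base <= TARGET < base + ELEMENT_COUNT * VARIANT_SIZE:
        raise ValueError("TARGET does not fall inside ar1 for this base")
    written = {TARGET: VT_BYREF | VT_I4}    # every other element stays VT_EMPTY

    def vt_at(index):
        return written.get(base + index * VARIANT_SIZE, 0)

    return find_gremlin(vt_at, ELEMENT_COUNT, header_offset)


print(hex(simulate(page_base=0x10000000, header_offset=0x20)))  # 0x18280fe
```

Because the stride skips 255 of every 256 elements, the scan in this example inspects roughly 99,000 elements rather than about 25 million.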
Now, why did I choose to give the gremlin the type of VT_BYREF | VT_I4? Because that type of VARIANT produces a read of an integer value via one level of indirection. In other words, suppose we write the gremlin’s memory as follows:
Then, when reading the value of the gremlin, ar1(gremlin), it will dereference our chosen address of 0x12345678 and return the 4-byte integer found there. Done – we’ve got our read primitive.
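A small Python model shows why VT_BYREF | VT_I4 yields a read. It assumes the standard 32-bit VARIANT layout (a 2-byte vt, three reserved WORDs, then the data field at offset 8), which matches the target address being written at 0x28281008. The probe address and its contents are made up, since the simulated memory does not extend to 0x12345678.

```python
import struct

VT_I4 = 0x0003
VT_BYREF = 0x4000
DATA_OFFSET = 8   # 32-bit VARIANT: vt (2 bytes) + 3 reserved WORDs, then data


class Memory:
    """A flat simulated address range [base, base + size)."""

    def __init__(self, base: int, size: int) -> None:
        self.base = base
        self.buf = bytearray(size)

    def u16(self, addr: int) -> int:
        return struct.unpack_from("<H", self.buf, addr - self.base)[0]

    def u32(self, addr: int) -> int:
        return struct.unpack_from("<I", self.buf, addr - self.base)[0]

    def put_u16(self, addr: int, value: int) -> None:
        struct.pack_into("<H", self.buf, addr - self.base, value)

    def put_u32(self, addr: int, value: int) -> None:
        struct.pack_into("<I", self.buf, addr - self.base, value)


def read_variant(mem: Memory, addr: int) -> int:
    """Value a script sees when it reads the VARIANT at addr (two vt cases only)."""
    vt = mem.u16(addr)
    if vt == VT_BYREF | VT_I4:
        return mem.u32(mem.u32(addr + DATA_OFFSET))  # one level of indirection
    if vt == VT_I4:
        return mem.u32(addr + DATA_OFFSET)
    raise NotImplementedError(f"vt 0x{vt:04x} is not modeled")


mem = Memory(base=0x28280000, size=0x3000)
GREMLIN = 0x28281000
PROBE = 0x28282000                 # hypothetical address we want to read
mem.put_u32(PROBE, 0xCAFEBABE)
mem.put_u16(GREMLIN, VT_BYREF | VT_I4)
mem.put_u32(GREMLIN + DATA_OFFSET, PROBE)
print(hex(read_variant(mem, GREMLIN)))  # 0xcafebabe
```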
Actually, I glossed over one fine point. To construct the gremlin as shown in Figure 1, we need to write the desired target address into location 0x28281008. However, as mentioned earlier, our write primitive has a limitation in that it cannot write values greater than 0x001767dd. The solution I used was to write one small value at a time, each value in the range 0x00-0xff, with each write starting at the next higher address. By repeating this process 4 times, we can build up an arbitrary 4-byte value in memory, with the caveat that the 3 bytes following it will end up overwritten with zeroes. In the case of the VARIANT structure as seen in Figure 1, there is an unused 4-byte field following the field at 0x28281008, so the unwanted zeroes cause no harm (and what is more, they are all zero to begin with). The following figure shows how an arbitrary DWORD of 0x12345678 can be built up by means of four separate restricted DWORD writes.
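The byte-at-a-time construction can be modeled in Python like this. It assumes each use of the primitive stores a full little-endian DWORD, as a write of an int value would; the 16-byte buffer with 0xAA filler is hypothetical and stands in for the gremlin VARIANT.

```python
import struct

WRITE_LIMIT = 0x001767DD   # largest DWORD the primitive can write, per the article


def restricted_write(mem: bytearray, offset: int, value: int) -> None:
    """One use of the primitive: a full little-endian DWORD store whose value
    must not exceed WRITE_LIMIT."""
    if not 0 <= value <= WRITE_LIMIT:
        raise ValueError(f"0x{value:x} exceeds the write primitive's limit")
    struct.pack_into("<I", mem, offset, value)


def write_arbitrary_dword(mem: bytearray, offset: int, value: int) -> None:
    """Build any 32-bit value from four single-byte writes at ascending offsets.

    Each store also writes three zero bytes after its low byte; the next store
    overwrites those, so only the three bytes after offset + 3 end up zeroed.
    """
    for i, byte in enumerate(struct.pack("<I", value)):
        restricted_write(mem, offset + i, byte)


variant = bytearray(b"\xAA" * 0x10)   # hypothetical VARIANT with non-zero filler
write_arbitrary_dword(variant, 8, 0x12345678)
print(variant[8:].hex())  # 78563412000000aa
```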
To address the next challenge: we still don’t know any interesting addresses to read. Again, this is quickly remedied. Since we know that array element ar1(gremlin) is at 0x28281000, the following array element ar1(gremlin+1) is at 0x28281010. We can place an arbitrary object into ar1(gremlin+1), then use the gremlin as a read primitive to disclose the address of the object:
Figure 3 shows how I use the gremlin together with the subsequent array element. After performing this setup, reading the value of ar1(gremlin) yields the address of the target object.
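The following Python model sketches the address disclosure. Since Figure 3 is not reproduced here, it assumes the gremlin's data field is pointed at the data field of ar1(gremlin+1), that is, 0x28281018, and that assigning an object to an array element stores a VT_DISPATCH VARIANT whose data field holds the object pointer. The object address is hypothetical, and the gremlin's pointer is stored directly rather than via the byte-at-a-time writes.

```python
import struct

VARIANT_SIZE = 0x10
DATA_OFFSET = 8                     # data field of a 32-bit VARIANT
VT_I4, VT_DISPATCH, VT_BYREF = 0x0003, 0x0009, 0x4000

BASE = 0x28281000                   # address of ar1(gremlin)
mem = bytearray(2 * VARIANT_SIZE)   # simulated ar1(gremlin) and ar1(gremlin+1)


def store_variant(index: int, vt: int, data: int) -> None:
    off = index * VARIANT_SIZE
    struct.pack_into("<H", mem, off, vt)
    struct.pack_into("<I", mem, off + DATA_OFFSET, data)


def load_dword(addr: int) -> int:
    return struct.unpack_from("<I", mem, addr - BASE)[0]


def read_gremlin() -> int:
    """Reading ar1(gremlin): VT_BYREF | VT_I4 dereferences its data field once."""
    if struct.unpack_from("<H", mem, 0)[0] != VT_BYREF | VT_I4:
        raise ValueError("gremlin is not VT_BYREF | VT_I4")
    return load_dword(load_dword(BASE + DATA_OFFSET))


# Script does the equivalent of: Set ar1(gremlin+1) = someObject
OBJECT_ADDR = 0x0BADF00D            # hypothetical heap address of someObject
store_variant(1, VT_DISPATCH, OBJECT_ADDR)

# Point the gremlin at the data field of the next element (0x28281018).
store_variant(0, VT_BYREF | VT_I4, BASE + VARIANT_SIZE + DATA_OFFSET)

print(hex(read_gremlin()))          # 0xbadf00d
```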
We have shown how the attacker can read and write memory at will, including disclosure of the addresses of arbitrary objects. In effect, the attacker now has arbitrary read and write access to the memory space of the current process, and at that point it’s pretty much game over.
Exploitation, Part 2: From Memory Control to Code Execution
The next step at this point, traditionally, would be to leverage our memory read and write capabilities to initiate ROP-style execution, leading ultimately to a native payload stage. Such methods have already been well explored. I was curious to try something a bit more innovative.
I was inspired by the “Vital Point Strike” [PDF slides] technique described back in 2014 by tombkeeper. The essential idea of that attack was to use memory read/write capabilities to locate and change a data structure in memory, thereby turning off “SafeMode”. Once accomplished, script can simply instantiate an arbitrary ActiveX object such as `WScript.Shell` and make use of the abundant functionality it provides.
Since his 2014 Black Hat talk, Microsoft has added strong tamper protection to the “Vital Point” of tombkeeper’s original presentation, so I do not believe this is a viable technique any longer. The question remains, though, as to what other “vital points” can be found.
I conjecture that once an attacker has arbitrary read/write access to the process’s address space, there will always be ways to construct hazardous objects in memory that yield easy code execution. Upon considering this, I set out to find a concise new exploitation technique that can be used against Internet Explorer today that achieves code execution easily without use of any ROP or shellcode.
The idea I settled on was to subvert the dispatch mechanism. When invoking a method or property on an object, the dispatch mechanism packages up arguments provided by script, turns them into native stack-based parameters, and finally calls the native function that implements the desired method or property. Thus, the dispatch mechanism does all the heavy lifting that is necessary for making a clean call from script to a native function. Can we subvert it to call into native code of our choosing?
In fact, altering the native target address of a dispatch is the easy part. Typically, during dispatch, the target function will be found by looking it up in a vtable. With the ability to read and write memory, we can create a fake vtable where some entry has been changed to point to a native API of our choice. I decided that WinExec is the API that could be used most easily for code execution. By changing a vtable entry to point to WinExec, we can in fact invoke this API via dispatch from script.
However, there is one major hitch in this plan: the function signature is not exactly correct. Whenever calling a function via dispatch, the first parameter will be a pointer to the COM object upon which the method is being invoked (the “this” parameter). That’s bad news for us, because we typically need to have full control over the first stack parameter passed to our target API. This is certainly the case for WinExec, where the first stack parameter is a pointer to the command string to be executed.
My solution to this problem was head-on: I prepared the COM object in memory to be simultaneously a usable object and also a valid ANSI command string – a sort of in-memory polyglot. This is far simpler than it sounds. Consider: By the time we are ready to invoke WinExec via the forged vtable, we no longer require the COM object to be in a functioning state. No methods of the COM object will be called, precisely because WinExec will execute in place of the object’s original method. Thus, we are free to overwrite all fields of the COM object in memory with whatever we choose. The only parts of the COM object that we must be careful to leave in a usable state are those few fields that are actually needed for the dispatch mechanism itself to function properly.
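To see why the “this” pointer is a problem, consider a toy Python model of dispatch through a forged vtable. The addresses, the slot number, and fake_winexec are all hypothetical; the model only captures the calling-convention issue: whatever sits at the start of the object, beginning with its vtable pointer, becomes the start of the command string.

```python
class SparseMemory:
    """Simulated address space: address -> byte, unset bytes read as zero."""

    def __init__(self) -> None:
        self.cells = {}

    def write(self, addr: int, data: bytes) -> None:
        for i, b in enumerate(data):
            self.cells[addr + i] = b

    def u32(self, addr: int) -> int:
        return int.from_bytes(bytes(self.cells.get(addr + i, 0) for i in range(4)), "little")

    def c_string(self, addr: int) -> str:
        """Read a NUL-terminated single-byte string, as WinExec would."""
        out = bytearray()
        while self.cells.get(addr, 0) != 0:
            out.append(self.cells[addr])
            addr += 1
        return out.decode("latin-1")


mem = SparseMemory()
natives = {}   # hypothetical "code address" -> Python stand-in for native code


def dispatch(this: int, slot: int, *args):
    """Look up vtable[slot] and call it with the object pointer prepended."""
    vtable = mem.u32(this)
    return natives[mem.u32(vtable + 4 * slot)](this, *args)


def fake_winexec(lp_cmd_line: int, u_cmd_show: int):
    """Stand-in for WinExec(LPCSTR, UINT): just report what it was given."""
    return mem.c_string(lp_cmd_line), u_cmd_show


# All addresses and the slot number below are made up for illustration.
WINEXEC, OBJ, FORGED_VTABLE, SLOT = 0x7E000000, 0x01000000, 0x6A3B1234, 7
natives[WINEXEC] = fake_winexec
mem.write(FORGED_VTABLE + 4 * SLOT, WINEXEC.to_bytes(4, "little"))
mem.write(OBJ, FORGED_VTABLE.to_bytes(4, "little") + b"rest of object\0")
print(dispatch(OBJ, SLOT, 1))   # ('4\x12;jrest of object', 1)
```

The first four characters of the “command” are simply the bytes of the vtable pointer, which is why that pointer must be chosen so that it also reads as sensible text.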
I chose the ActiveX object Scripting.Dictionary to work with. I reasoned that it would be a good choice due to its simplicity, and in particular, due to the fact that it has a relatively uncomplicated implementation of IDispatch.
Experimenting with the memory layout of a Scripting.Dictionary instance reveals the following:
The entire object is 0x40 bytes in size, and only three DWORD-size fields are vital to the dispatch mechanism. The first, shown in red, is the main vtable pointer. We will replace this with a pointer to a forged vtable in which one of the function pointers has been replaced with a pointer to WinExec. The second, shown in blue, is the reference counter. This will be incremented by one for the duration of the dispatch call. Its precise value is unimportant. The final field, shown in green, is a pointer to a small structure (size 0xc) which seems to be called a “Pld”. To take an educated guess, I think this stands for “Per-LCID Dispatch”.
In total, this shows that we are in fairly good shape. We can overwrite the entire object with almost anything we choose, with the exception of the first and last fields, which must point to a usable (forged) vtable and to an intact pld structure, respectively. Recall that, to conduct our attack, the very same memory of this COM object must also be a valid ANSI command string to pass to WinExec.
Our first challenge will be this: In the first field, how can we write a 4-byte value that is simultaneously a vtable pointer and also the first four characters of an ANSI command string? My solution was to write the first 8 bytes of the object as follows:
See what I did there? The first four bytes can be read as the pointer value 0x28282828, and we can put our forged vtable at that location. When read as ANSI characters, though, they represent the string ((((. This is a valid Win32 path component. Right afterwards, we place the string \..\, using path traversal to cancel the bogus path component ((((. Note that there is no need for a folder named (((( to exist on disk. I refer the reader to this article by James Forshaw for an excellent treatment of the subtleties of path handling in Windows.
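The dual reading of the first 8 bytes can be checked in Python. Python's ntpath.normpath is used here only as an analogy for how a .. component cancels the preceding one; it is not the Win32 path canonicalization code, and example.exe is a placeholder name.

```python
import ntpath
import struct

FORGED_VTABLE = 0x28282828   # where the forged vtable is placed, per the article


def polyglot_prefix(vtable_addr: int) -> bytes:
    """First 8 bytes of the object: a vtable pointer that is also path text."""
    return struct.pack("<I", vtable_addr) + b"\\..\\"


prefix = polyglot_prefix(FORGED_VTABLE)
print(prefix)                                                  # b'((((\\..\\'
print(struct.unpack_from("<I", prefix)[0] == FORGED_VTABLE)    # True
# Lexical normalization drops "((((" together with the following "..".
print(ntpath.normpath(prefix.decode("ascii") + "example.exe"))  # example.exe
```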
The next hurdle to clear is the reference count, shown in blue in Figure 4, but that is a low bar indeed. Any value we place there is acceptable, as long as we bear in mind that the DWORD will be incremented prior to the call to WinExec. Therefore we place pre-shrunk data there so it will be incremented to our desired value. I decided I’d like to run some PowerShell, hence what we have so far is:
where .ewe will be incremented so as to read .exe (the byte 0x77 is the character w and it is the low-order byte of the DWORD shown above at 199e3fd4).
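The refcount trick is just a little-endian increment. This Python sketch uses a hypothetical command text and assumes the reference-count DWORD starts at the w; no carry reaches the next byte because 0x77 + 1 still fits in a byte.

```python
import struct


def after_addref(region: bytes, refcount_offset: int) -> bytes:
    """Bytes after the little-endian DWORD at refcount_offset is incremented,
    wrapping at 2**32 like a 32-bit reference count."""
    buf = bytearray(region)
    (count,) = struct.unpack_from("<I", buf, refcount_offset)
    struct.pack_into("<I", buf, refcount_offset, (count + 1) & 0xFFFFFFFF)
    return bytes(buf)


text = b"example.ewe args"         # hypothetical command text
offset = text.index(b"w")          # suppose the refcount DWORD begins at the 'w'
print(after_addref(text, offset))  # b'example.exe args'
```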
After this, we start to place PowerShell script. Unfortunately, by now we are running out of space. There are only 0x1c more bytes available until we hit the third hurdle, which is the pld pointer. How can we prevent the appearance of the pld pointer from ruining the text of our PowerShell script? I solved this by opening up a PowerShell comment:
Afterwards, we can close the PowerShell comment and write our desired PowerShell script without any further limitations. At that point we will be writing past the end of the Scripting.Dictionary allocation, but that doesn’t pose any problem as long as we prepare the heap properly.
One problem that did crop up was that the pld pointer would sometimes happen to contain a byte such as 0x00 or 0x22 (double quote) that would prematurely terminate the PowerShell command. To prevent this, I wrote a bit more script to copy the pld structure and rewrite it at the fixed location of 0x28281020. Then I placed 0x28281020 as the pld pointer into the Scripting.Dictionary. After taking care of this detail, the exploit worked with complete reliability when starting with a clean process.
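The byte check that motivated relocating the pld structure can be expressed in a few lines of Python. It tests only the two byte values named above; the second pointer in the example is hypothetical.

```python
import struct

FORBIDDEN = {0x00, 0x22}   # NUL ends the string; 0x22 is a double quote


def pointer_is_safe(ptr: int) -> bool:
    """True if none of the pointer's four little-endian bytes is forbidden."""
    return not any(b in FORBIDDEN for b in struct.pack("<I", ptr))


print(pointer_is_safe(0x28281020))  # True  (bytes 20 10 28 28)
print(pointer_is_safe(0x0A220040))  # False (bytes 40 00 22 0a; hypothetical)
```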
A Surprise Bonus
I developed this exploit on Windows 7, since VBScript is not permitted on Windows 10. Soon afterwards, however, James Forshaw disclosed a bypass he had discovered that allows VBScript to run on Windows 10 as well. This allowed me to write a version of the exploit for IE on Windows 10. Microsoft has since patched this bypass with CVE-2019-0768, but we can still use it for this demonstration.
On Windows 10 there is one final line of defense before code execution: CFG. Would CFG block an attempt to call WinExec from a vtable? It did not. It seems that Microsoft did not see fit to use CFG to restrict calls to WinExec the way they did with GetProcAddress and a smattering of other APIs that traditionally have been used for exploitation. I do not fault them for this decision. Once an attacker has full read/write access to the memory space of the process, it is not a worthy venture to attempt to lock down all possible avenues to code execution.
Shown here is the full exploit for Internet Explorer on Windows 10 1809 at the February 2019 patch level. This PoC may also be found on our GitHub repository. It is highly reliable when starting with a clean process. It works with Enhanced Protected Mode either off or on, except when Enhanced Protected Mode is configured to use 64-bit renderer processes. When Enhanced Protected Mode is on, the resulting code execution will be constrained by the IE EPM AppContainer.
Conclusion
I sense that we have only scratched the surface of what can be accomplished through the use of read/write access to address space. This level of access makes it possible to arbitrarily corrupt data structures, and even to manually craft new object instances that did not exist in memory beforehand. Attackers can use this to achieve their objectives without needing to execute even a single machine-level instruction.
You can find me on Twitter at @HexKitchen, and follow the team for the latest in exploit techniques and security patches.