Understanding Risk in the Unintended Giant: JavaScript
July 19, 2017 | Simon Zuckerbraun
Defending a network means understanding the attack surface, but that attack surface may be broader than you think – all thanks to a scripting language that grew beyond expectations.
JavaScript is among the premier computer languages in use today. According to the results of the 2016 StackOverflow Developer Survey, “JavaScript is the most commonly used programming language on Earth.” Even this lofty assessment does not do full justice to the ubiquity of JavaScript, because in addition to its role as a programming language, JavaScript often serves as the intermediate representation for dozens of other compiled languages. In this way, even code written in other languages such as Java or C++ can execute through the agency of a JavaScript engine; typically, this is done so that code written with various languages and toolsets can run directly in the browser. As has been remarked many times, “JavaScript is the assembly language of the web.”
JavaScript’s rise to prominence has been most unlikely, and I state this without malice. Brendan Eich, the creator of the language, readily acknowledged in his talk at Fluent 2015, “I did JavaScript in such a hurry, I never dreamed it would become this assembly language for the Web.” Originally conceived as a scripting language with a low barrier to entry and intended for lightweight tasks in web page design, JavaScript took on a life of its own. It completed a thorough conquest of the client-side web space in what Eich calls “perverse, merciless history.” As of 2016, the StackOverflow survey cited above identifies JavaScript as the most widely used technology among back-end developers as well.
Perhaps most remarkably of all, JavaScript can now run even reasonably intensive graphical applications such as games. With the assistance of updated browser standards including HTML5 and WebGL, highly dynamic graphical applications can now be delivered to the browser as JavaScript code without relying on plugins such as Java, Flash, Google Native Client or Silverlight. The combination of JavaScript and HTML5, in particular, provides a potent, standards-based application delivery platform with a nearly ubiquitous install base.
So the whirlwind of technical history has taken JavaScript far from its roots in lightweight web design. For those laboring behind the scenes, a vast amount of work has been required to make this a reality. The JavaScript standards have gone through numerous revisions, compromises have been made between the behaviors of competing implementations, and major features have been added. The latest edition of the governing standard, ECMA-262, runs to nearly 900 pages.
On the implementation side, tremendous labor has also been invested. Before we discuss why, a short digression into the nature of the language is helpful. As a language, JavaScript is highly dynamic. Variables have no declared types. Functions are “first-class objects.” Though a function will often be couched in syntax resembling a declaration, this appearance is mainly an illusion. “Declaring” a JavaScript function is, in reality, another way of programmatically creating a “function object” and assigning it to a variable, which plays the role of the “name” of the function. A JavaScript function has no set signature (declared number and types of arguments, and return type); one can pass any number and any type of arguments to any function without a hard error. “Classes” and “superclasses” are similarly first-class objects, and they can be created, attached to each other, and detached from each other at will, at any point during program execution. The same applies to their members. Arbitrary string values can also be evaluated as code. All the above characteristics are appropriate for an interpreted scripting language, as JavaScript was originally envisioned. However, none of these attributes lends itself to a high-performance implementation that compiles down to machine code.
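A few lines of ordinary JavaScript make these characteristics concrete. The sketch below uses illustrative names (`greet`, `arity`, `Base`, `Other`, `Derived` are not from any real code base): it rebinds a “declared” function, calls a function with the wrong number of arguments, rewires a class’s superclass after an instance already exists, attaches a new member at runtime, and evaluates a string as code. Every line is legal JavaScript that an optimizing compiler must be prepared to handle.

```javascript
// "Declaring" a function creates a function object bound to a name.
function greet(name) {
  return "hello " + name;
}
const alias = greet; // the same function object, reachable under a second name

// The "name" is just a variable and can be rebound at runtime.
greet = function () {
  return "replaced";
};

// No fixed signature: missing arguments are undefined, extras are still passed.
function arity(a, b) {
  return [a, b, arguments.length];
}

// Superclass relationships can be rewired after instances exist.
class Base {
  who() { return "base"; }
}
class Other {
  who() { return "other"; }
}
class Derived extends Base {}
const d = new Derived();
const before = d.who(); // resolved through Base.prototype
Object.setPrototypeOf(Derived.prototype, Other.prototype);
const after = d.who(); // now resolved through Other.prototype

// Members can be attached to a class at any point during execution.
Derived.prototype.extra = function () { return 42; };

// Arbitrary strings can be evaluated as code.
const computed = eval("6 * 7");

console.log(alias("x"), greet(), arity(1), arity(1, 2, 3), before, after, d.extra(), computed);
```

None of these operations raises an error, which is precisely why a compiler cannot treat a function’s name, arity, or class hierarchy as fixed facts.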
On the basis of everything discussed to this point, we can begin to glimpse the enormity of the demands that are placed on the modern JavaScript engine, or runtime:
- For some use cases, such as interactive graphical client-side applications or server-side applications, high performance is a necessity. As a result, the engine cannot rely on interpretation alone.
- For general web browsing, the key to responsiveness is an extremely fast start time for newly loaded code. Since compilation introduces significant start-up delays, the engine cannot rely exclusively on compilation, either.
- To achieve any degree of efficiency in optimized, compiled code, the compiler must make basic assumptions about the types of variables and the structure of program flow. Since the language lacks the syntactic means to declare these assumptions and fix them in place, there must always be fallback mechanisms whereby already-executing optimized code can be “de-optimized” or “re-optimized” to account for a change in assumptions. (Note: asm.js is a relatively recent, statically typed subset of JavaScript that directly addresses this deficiency in JavaScript syntax. A caveat, however, is that it is not intended to be written by hand, but rather to serve as a compilation target for transpilers.)
- Especially given the importance of the mobile market, a JavaScript engine must be frugal with memory consumption.
- The need to run the engine on multiple architectures multiplies the effort of machine code generation. The V8 JavaScript engine, used in the Chrome browser, can generate machine code for nine different architectures.
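The speculate-then-fall-back pattern from the third point can be modeled in a few lines. This is a toy sketch, not how any particular engine is implemented: `makeSpeculativeAdd`, its `state` object, and the `SMI_MAX` bound are hypothetical stand-ins for logic that engines such as V8 emit as machine code. The `AsmAdder` function at the end shows how asm.js expresses types through coercions such as `x | 0` (int32), which is how that subset supplies the missing type declarations.

```javascript
// Toy model of speculative optimization with a deoptimization fallback.
// Hypothetical names; real engines do this in generated machine code.

// The generic path implements full "+" semantics (numbers, strings, objects...).
function genericAdd(a, b) {
  return a + b;
}

function makeSpeculativeAdd() {
  // SMI_MAX is an assumed bound standing in for an engine's "small integer" type.
  const SMI_MAX = 0x3fffffff;
  const state = { optimized: true, deopts: 0 };

  function add(a, b) {
    if (state.optimized) {
      // Guard: the fast path is only valid if both operands are small integers.
      if (Number.isInteger(a) && Number.isInteger(b) &&
          Math.abs(a) <= SMI_MAX && Math.abs(b) <= SMI_MAX) {
        return a + b; // stands in for a single machine-level integer add
      }
      // Assumption violated: abandon the fast path for this call site.
      state.optimized = false;
      state.deopts += 1;
    }
    return genericAdd(a, b);
  }

  return { add, state };
}

// asm.js expresses types through coercions: "x | 0" marks an int32 value.
function AsmAdder(stdlib) {
  "use asm";
  function add(a, b) {
    a = a | 0;
    b = b | 0;
    return (a + b) | 0;
  }
  return { add: add };
}

const site = makeSpeculativeAdd();
const adder = AsmAdder(globalThis);
```

In this model, `site.add(1, 2)` stays on the fast path, while `site.add("a", 1)` violates the integer assumption and permanently switches the site to the generic path. A real engine might later re-optimize it under a broader assumption; each such transition is another piece of state the engine must get exactly right.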
To meet these towering demands – and indeed, to push the envelope of JavaScript performance far beyond what was imagined a decade ago – the developers of major JavaScript engines have produced advanced designs of remarkable ingenuity and complexity. In that very success story lies the seed of a new challenge. The danger: complexity is the enemy of security.
Language-based security concerns are a well-known phenomenon. Some languages, such as C and C++, make it all too easy to write memory-unsafe code and challenging to consistently choose safe programming constructs. Other languages, notably Java, have been plagued with implementation bugs in their class libraries or sandboxing mechanisms. Still others lack facilities important for secure application development.
A new class of risk is emerging in connection with JavaScript: the danger of vulnerabilities in the execution engine itself.
If JavaScript is the assembly language of the web, then the JavaScript engine is the processor. Its complexity is immense, owing both to the sheer size of the JavaScript standard it must implement and to the many layers of advanced optimization described above. The net result is a combinatorial increase in the potential for unintended interactions between features of the engine’s code base.
We have begun to receive reports of vulnerabilities of this class from submitters to the ZDI program, as well as from contestants in the Pwn2Own competition. Over the next few months, I’ll be taking a closer look at JavaScript topics, including vulnerabilities and reversing techniques, through a series of blog posts. The team is also preparing to release a tool that we use while debugging JavaScript engines, to aid in your own reversing efforts. The world of the JavaScript engine is fascinating territory for vulnerability research. Exploring these vulnerabilities reveals the true extent of this attack surface and leads to a better understanding of the techniques that will help secure it.
You can find me on Twitter at @HexKitchen, and follow the team for the latest in exploit techniques and security patches.