among different instances of the same version of a piece of popu-
lar software that large-scale exploitation of vulnerabilities becomes
infeasible. We view that work, which is still in progress, as com-
plementary to the work we presented in this paper.
A second drawback of our approach is that applications have to
be statically linked, thus increasing their size. In our prototype
of Section 3, we worked around this issue by using a single-image
version of OpenBSD. In practice, we would seek to randomize (and
thus statically-link) only those programs that are exposed to remote
exploits, i.e., network daemons, thus minimizing the overall impact
of static linking on the system. Note, however, that avoiding
static linking would reduce only disk usage, not runtime memory
requirements. Each randomized process
will need to acquire (as part of process loading) and maintain its
own copy of the encrypted libraries using either of the following
two mechanisms, neither of which is desirable. First, the process
loader can load just the main program image initially, and dynam-
ically load and encrypt libraries on demand. This will incur
considerable runtime overhead and also require complex process
management logic, in the absence of which it will degrade to the
other mechanism, described next. The second approach is to load
and encrypt the program code, and all libraries that are referenced,
right at the beginning. It is obvious that this will result in large
amounts of physical RAM being used to store multiple copies of
the same code, some of which may never be executed during the
entire life of the process.
Also note that our form of code randomization effectively pre-
cludes polymorphic and self-modifying code, as the randomization
key and relevant code would then have to be encoded inside the
program itself, potentially allowing an attacker to use them.
Instruction randomization should be viewed as a self-destruct
mechanism: the program under attack is likely to go out of con-
trol and be terminated by the runtime environment. Thus, this tech-
nique cannot protect against denial of service attacks and should be
considered as a safeguard of last resort, i.e., it should be used in
conjunction with other techniques that prevent vulnerabilities lead-
ing to code-injection attacks from occurring in the first place.
Debugging is made more difficult by the randomization process,
since the debugger must be aware of it. The most straightforward
solution is to de-randomize the executable prior to debugging; the
debugger can do this in a manner transparent to the user, since the
secret key is embedded in the ELF executable. Similarly, the de-
bugger can use the key to de-randomize core dumps.
Finally, our approach does not protect against all types of buffer
overflow attacks. In particular, overflows that only modify the con-
tents of variables in the stack or the heap and cause changes in
the control flow or logical operation of the program cannot be de-
fended against using randomization. Similarly, our scheme does
not protect against attacks that cause bad data to propagate in the
system, e.g., not checking for certain Unix shell characters on input
that is passed to the system() call. None of the systems we exam-
ined in Section 5 protect against the latter, and very few can deter
the former type of attack. A more insidious overflow attack would
transfer the control flow to some library function (e.g., system()).
To defend against this, we propose to combine our scheme with
randomizing the layout of code in memory at the granularity of in-
dividual functions, thus denying to an attacker the ability to jump
to an already-known location in existing code.
5. RELATED WORK
In the past, encrypted software has been proposed in the con-
text of software piracy, treating code as content (see digital rights
management). Users were considered part of the threat model, and
those approaches focused on the complex task of key management
in such an environment (typically requiring custom-made proces-
sors with a built-in decryption engine such as DES [47]). Our re-
quirements for the randomization process are much more modest,
allowing us to implement it on certain modern processors, emula-
tors, and interpreters. However, we can easily take advantage of
any built-in functionality in the processor that allows for encrypted
software (e.g., the Transmeta Crusoe TM5800 processor).
PointGuard [22] encrypts all pointers while they reside in mem-
ory and decrypts them only before they are loaded to a CPU reg-
ister. This is implemented as an extension to the GCC compiler,
which injects the necessary instructions at compilation time, al-
lowing a pure-software implementation of the scheme. Another
approach, address obfuscation [18], randomizes the absolute loca-
tions of all code and data, as well as the distances between different
data items. Several transformations are used, such as randomizing
the base addresses of memory regions (stack, heap, dynamically-
linked libraries, routines, static data, etc.), permuting the order of
variables/routines, and introducing random gaps between objects
(e.g., randomly pad stack frames or malloc()’ed regions). Although
very effective against jump-into-libc attacks, it is less so against
other common attacks, because the amount of possible randomiza-
tion is relatively small (especially when compared to our key sizes).
However, address obfuscation can protect against
attacks that aim to corrupt variables or other data. This approach
can be effectively combined with instruction randomization to offer
comprehensive protection against all memory-corrupting attacks.
Forrest et al. [27] discuss the advantages of bringing diversity
into computer systems, and liken its effects to those of diversity
in biological systems. They observe how the lack of
diversity among computer systems can facilitate large-scale repli-
cation of exploits due to the identical weakness being present in
all instances of the system. Some ideas are presented regarding us-
ing randomized compilation to introduce sufficient diversity among
software systems to severely hamper large-scale spreading of ex-
ploits. Among these, the ones most relevant to this paper involve
transformations to memory layout.
There has been considerable research in preventing buffer over-
flow attacks. Broadly, it can be classified into four categories: safe
languages and libraries, source code analysis, process sandboxing,
and, for lack of a better term, compiler tricks.
Safe Languages and Compilers Safe languages (e.g., Java)
eliminate various software vulnerabilities altogether by introduc-
ing constructs that programmers cannot misuse (or abuse). Unfor-
tunately, programmers do not seem eager to port older software to
these languages. In particular, most widely-used operating systems
(Windows, Linux, *BSD, etc.) are written in C and are unlikely to
be ported to a safe language anytime soon. Furthermore, the need
to learn a new language is often a significant barrier to its adoption.
Java has arguably overcome this barrier, and other safe languages
that are more C-like (e.g., Cyclone [35]) may result in wider use of
safe languages. In the short and medium term however, “unsafe”
languages (in the form of C and C++) are unlikely to disappear, and
they will in any case remain popular in certain specialized domains,
such as programmable embedded systems.
One step toward the use of safe constructs in unsafe languages is
the use of “safe” APIs (e.g., the strl*() API [44]) and libraries (e.g.,
libsafe [16]). While these are, in theory, easier for programmers to
use than a completely new language (in the case of libsafe, they are
completely transparent to the programmer), they only help protect
specific functions that are commonly abused (e.g., the str*() family
of string-manipulation functions in the standard C library). Vulner-
abilities elsewhere in the program remain open to exploitation.