Mobile Code Security

With the growth of distributed computer and telecommunications systems, there have been increasing demands to support the concept of "mobile code", sourced from remote, possibly untrusted, systems, but executed locally. The best known examples of this are WWW applets, but it is also manifest in dynamic email and, more recently, in supporting third party suppliers in the emerging Telecommunications Information Networking Architecture (TINA). Supporting mobile code introduces a number of serious security and safety issues that must be addressed. This paper will introduce some of these issues, and outline some of the proposed solution approaches, as utilised in languages such as Safe-TCL, Java, and Omniware.
Introduction
"Mobile Code" is code sourced from remote, possibly "untrusted" systems, but executed on your local system. Examples include: web applets, dynamic email, and TINA building blocks.

The concept of "mobile code" has been called by many names: mobile agents, mobile code, downloadable code, executable content, active capsules, remote code, and others. All these deal with the local execution of remotely sourced code.
Mobile Code Examples
Examples of mobile code include:

Web Applets
Mini-programs written in Java, which are automatically loaded & run on being named in an HTML document. A document can include a number of applets, and these may be sourced from a number of different servers, and run virtually without the user being aware of them.
Dynamic Email
One proposal for the provision of dynamic email suggested incorporating Safe-TCL scripts as components of MIME email. These scripts could be run either on mail delivery, or when the mail is read by the recipient.
TINA Building Blocks
The evolving "Telecommunications Information Networking Architecture" (see NDC95) includes support for 3rd party service providers who can supply TINA Building Blocks (objects), which can manipulate network resources in order to provide value added services to clients. An outline of some of the security and safety issues is given in Shah96.

All of these examples illustrate that the use of mobile code raises a number of serious security and safety issues. This paper will outline some general approaches to, and specific examples of, "safe" systems. I finish by mentioning some flaws which have been found in existing systems, in order to derive some lessons for future designs.
Low-level Security Issues
The use of "mobile code" raises a number of obvious security issues:

* access control -- is the use of this code permitted
* user authentication -- to identify valid users
* data integrity -- to ensure the code is delivered intact
* non-repudiation -- of use of the code, for both the sender and the receiver, especially if its use is being charged
* data confidentiality -- to protect sensitive code
* auditing -- to trace uses of mobile code

Techniques for providing these security services are well known. Their provision is not a technical problem, but rather a political and economic one. It involves the use of cryptographic extensions to communications protocols. These are well described in the OSI Security Framework, the ISO 10181 and CCITT X.810-X.816 standards, in the IETF IP-SEC proposals, and the Secure Web protocols.

Clearly a system which supports "mobile code" will need to provide these services. Before too long, I believe we will see them. A more interesting question, though, is how to safely execute the code once it has been validly and correctly delivered to the end-user's system.
Mobile Code Safety
The prime focus of this paper is on the techniques which can be used to provide for the safe execution of imported code on the local system. This has to address threats due to rogue code being loaded and run. Of course in many ways, these problems are not new: they have been a key component of operating systems design on multi-user systems for many years. The traditional approach to addressing these problems has been to use heavy address space protection mechanisms, along with user access rights to the file system and other resources. The difference between the traditional problems, and those posed by mobile code, is one of volume and responsiveness. Mobile code is intended for quick, lightweight execution, which conflicts with the cost of heavy address space mechanisms in most current operating systems. Also, each mobile code unit can, in one sense, be thought of as running as its own unique user, to provide protection between the various mobile code units and the system. Traditional methods of adding new users cannot cope with this demand.

The types of attacks which need to be guarded against include:

* denial of service
* disclosure of confidential information
* damage or modification of data
* annoyance attacks

Some example scenarios which can be imagined include: a Video-on-Demand service which discreetly scans local files for information; an online game which opens a covert connection to run programs locally; an invisible program that captures system activity information.
Resource Access & Safety
Fundamentally, the issue of safe execution of code comes down to a concern with access to system resources. Any running program has to access system resources in order to perform its task. Traditionally, that access has been to all normal user resources. "Mobile Code" must have restricted access to resources for safety. However, it must be allowed some access in order to perform its required functions. Just which types of access, and how these are to be controlled, is a key research issue. The types of resources to which access is required include:

* file system
* network
* random memory
* output devices (entire display, various windows, speaker ...)
* input devices (keyboard, mic ...)
* process control (access to CPU cycles)
* user environment
* system calls

Language Support for Safety
When considering means of providing safe execution, if heavy address space protection mechanisms are not being used, then considerable reliance is going to be placed on the verified use of type-safe programming languages. These ensure that arrays stay in bounds, that pointers are always valid, and that code cannot violate variable typing (such as placing code in a string and then executing it). These features are needed to ensure that various code units do not interfere with each other, and with the system.

If type-safe languages are being used, we want assurance of the type-system's soundness and safety, want validation of type-checking implementations, and of course, all without compromising efficiency.

In addition, the usual range of sound programming procedures needs to be followed. The system should be designed in a modular fashion, separating interfaces from implementations in programs, and with appropriate layering of libraries and module groups, with particular care being taken at the interfaces between security boundaries.

One general approach to designing "safe" execution environments is to remove general library routines which could compromise security, and replace them with more specific, safer ones, e.g. replace a general file access routine with one that can write files only in a temporary directory.

Great care is needed with this approach to ensure that unforeseen interactions or implementation flaws do not negate the desired security. This has been an area where failures have occurred on a number of occasions.
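As an illustration of the restricted file routine described above, the following is a minimal sketch in Python. The names make_restricted_writer and SandboxViolation are hypothetical, introduced only for this example; it assumes the host hands the mobile code this routine in place of a general file-open call.

```python
import os


class SandboxViolation(Exception):
    """Raised when mobile code asks for a path outside the sandbox."""


def make_restricted_writer(sandbox_dir):
    """Return an open-for-write routine confined to sandbox_dir."""
    root = os.path.realpath(sandbox_dir)

    def open_for_write(name):
        # Resolve symlinks and ".." first, so the check is made on the
        # path that would really be opened.
        target = os.path.realpath(os.path.join(root, name))
        if os.path.commonpath([root, target]) != root:
            raise SandboxViolation(name)
        return open(target, "w")

    return open_for_write
```

A writer built over a temporary directory accepts a name such as "notes.txt" and refuses "../escape.txt". Even this small routine shows why care is needed: a window remains between the check and the open, during which a link could be changed, so a production version would need a stronger mechanism than a path comparison.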
Granting Access to Resources
One of the key issues in providing for safe execution of "mobile code" is determining exactly which resources a particular code unit is to be granted access to. That is, there is a need for a security policy which determines what type of access any "mobile code" unit has. This policy may be:

fixed for all "mobile code" units
very restrictive but easy, and the approach currently used to handle applet security in web browsers such as Netscape.
user verifies each security-related access request
relatively easy, but rapidly gets annoying, and eventually is self-defeating when users stop taking notice of the details of the requests. Whilst there is a place for querying the user, it should be used exceedingly sparingly.
negotiate for each "mobile code" unit
much harder, as some basis is needed for negotiation, perhaps based on various profiles, but ultimately this is likely to be the best approach.

In the longer term, some mechanisms are needed to permit negotiation of appropriate accesses. How this is expressed is, I believe, one of the key research issues. Initially this is likely to be based on a simple tabular approach, based on the various categories mentioned above. While adequate for the simplistic applets seen to date, this is unlikely to be sufficient for more complex "mobile code" applications. For these, some fairly powerful language is going to be needed to express the required types of accesses, along with a means of reasoning about those requests. For example, consider a simple "mobile code" text-editor -- it should be able to change any textual file specified by the user, have access perhaps to a preferences file, but otherwise be denied access to all other files. How can this be expressed and reasoned about? This is an area that needs considerable additional work, but will be a key to the successful use of "mobile code".
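To make the text-editor example concrete, here is a rough sketch in Python of how such a policy might be written down in the simple tabular style. The FilePolicy class, its method names, and the file paths are all hypothetical. Treating the preferences file as read-only is an assumption made for the example, and the sketch does not attempt the harder problem of reasoning about requests.

```python
from dataclasses import dataclass, field


@dataclass
class FilePolicy:
    """Hypothetical per-unit file policy: explicit grants, default deny."""
    read_write: set = field(default_factory=set)  # exact paths
    read_only: set = field(default_factory=set)   # exact paths

    def grant_user_choice(self, path):
        # Called by the trusted host when the user names a file,
        # for example through a file-selection dialog.
        self.read_write.add(path)

    def allows(self, path, mode):
        if path in self.read_write:
            return mode in ("read", "write")
        if path in self.read_only:
            return mode == "read"
        return False  # anything not granted is denied


# Policy for the editor: a preferences file, plus whatever the user picks.
editor_policy = FilePolicy(read_only={"/home/user/.editor-prefs"})
editor_policy.grant_user_choice("/home/user/report.txt")
```

The important design point is that the grant for a user-specified file is made by the trusted host at the moment the user chooses it, not requested by the mobile code itself.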
Mobile Code Technologies
Having considered some of the issues raised by the need for "safe" execution of "mobile code", I will now summarise some approaches that have been tried. One method of categorising "mobile code" technologies, given in TW96, is based on the type of code distributed:

* Source Code
* Intermediate Code
* Platform-dependent Binary Code
* Just-in-time compilation

Source Code
The first approach is based on distributing the source for the "mobile code" unit used. This source will be parsed and executed by an interpreter on the user's system. The interpreter is responsible for vetting source to ensure it obeys the required language syntactic and semantic restrictions; and then for providing a safe execution "sand-box" environment. The safety of this approach relies on the correct specification and implementation of the interpreter.
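The two duties of such an interpreter, vetting the source and then running it in a contained environment, can be sketched in Python as follows. This is an illustration only: the "language" is an assumed subset of arithmetic expressions, and the function names vet and run are hypothetical. Python's standard ast module is used purely as a convenient parser.

```python
import ast
import operator

# Constructs the assumed mini-language permits: arithmetic on numbers.
ALLOWED_NODES = (ast.Expression, ast.BinOp, ast.UnaryOp, ast.Constant,
                 ast.Add, ast.Sub, ast.Mult, ast.USub)
BINARY_OPS = {ast.Add: operator.add, ast.Sub: operator.sub,
              ast.Mult: operator.mul}


def vet(source):
    """Parse the source and reject any construct outside the subset."""
    tree = ast.parse(source, mode="eval")
    for node in ast.walk(tree):
        if not isinstance(node, ALLOWED_NODES):
            raise ValueError("forbidden construct: " + type(node).__name__)
        if isinstance(node, ast.Constant) and type(node.value) not in (int, float):
            raise ValueError("forbidden literal")
    return tree


def run(tree):
    """Evaluate a vetted tree with our own evaluator, not the host's eval."""
    def walk(node):
        if isinstance(node, ast.Expression):
            return walk(node.body)
        if isinstance(node, ast.Constant):
            return node.value
        if isinstance(node, ast.UnaryOp):
            return -walk(node.operand)  # only unary minus passes vetting
        return BINARY_OPS[type(node.op)](walk(node.left), walk(node.right))
    return walk(tree)
```

Here run(vet("1 + 2 * 3")) evaluates the expression, while vet("__import__('os')") is rejected before anything executes. The safety of the whole arrangement rests on the allowed list and the evaluator being correct.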

The main advantages of the source code approach are the distribution of relatively small amounts of code; the fact that since the user has the full source, it is easier to vet the code; and that it is easier for the interpreter to contain the execution environment.

Disadvantages include the fact that it is slow, since the source must first be parsed; and that it is hard to expand the core functionality, since the interpreter's design limits this.
Programmable MUDs
One early example which included aspects of "mobile code" was the family of programmable MUDs (see Bro93), e.g. MUCK, MOO, and UberMUD. These systems could execute source authored by arbitrary users anywhere in the world, manually transferred to the MUD system, and subsequently executed in the MUD interpreter environment. Safeguards were provided by the fact that the MUD interpreter had no access to the host system apart from the single MUD database file. However, any MUD program had full access (as the running user) to MUD data. One limitation was that users needed explicit permission to author code: once granted, however, they were trusted not to abuse the privilege.

These systems are early illustrations of some of the concepts: the use of a "sand-box" interpreter, and restrictions on the source of code.
Safe-TCL
The most widespread and common example of the source code approach is Safe-TCL, a subset of the TCL language with restricted features for safety. TCL was designed by John Ousterhout as a simple, clean, interpreted, embeddable command language, with graphical toolkit (Tk) (see Ous94). Safe-TCL is restricted by having limited file system access, and is prevented from executing arbitrary system commands. Safe-TCL code is usually executed by the "untrusted interpreter" (that is the interpreter which executes code from an untrusted source). A key component of the Safe-TCL system is the provision of another "trusted interpreter" (which executes code from a trusted source). Trusted code can be used to extend the capabilities of the Safe-TCL system. Such extension code can be invoked by any code running on the "untrusted interpreter", but the extension code uses the "trusted interpreter". This provides a clean mechanism for extending the system.
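The division of labour between the two interpreters can be sketched in Python. This is a toy model, not Safe-TCL's actual interface: the Interpreter class, the command names, and the logging extension are all hypothetical and chosen only to show the shape of the mechanism.

```python
class Interpreter:
    """A toy command interpreter: a table from command names to functions."""

    def __init__(self, commands):
        self.commands = dict(commands)

    def run(self, name, *args):
        if name not in self.commands:
            raise PermissionError("no such command: " + name)
        return self.commands[name](*args)


def make_interpreters(log):
    # Trusted interpreter: holds the powerful operation.
    trusted = Interpreter({"append_log": lambda text: log.append(text)})

    def safe_log(text):
        # Extension code from a trusted source. It validates the request,
        # then carries it out through the trusted interpreter.
        if len(text) > 80:
            raise ValueError("log entry too long")
        trusted.run("append_log", text)
        return len(log)

    # Untrusted interpreter: harmless commands plus the vetted extension.
    untrusted = Interpreter({"upper": str.upper, "log": safe_log})
    return trusted, untrusted
```

Code running on the untrusted interpreter can call "log", which does its work through the trusted interpreter, but it cannot name "append_log" directly. Adding a capability therefore means writing one small, reviewable piece of trusted code rather than widening the untrusted interpreter itself.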

Safe-TCL was designed by Nathaniel Borenstein and Marshall Rose as a means of augmenting email to include active messages, termed "Dynamic Email" (see Bor94). With the addition of new MIME types: application/safe-tcl and multipart/enabled-mail, Safe-TCL programs could be incorporated into email messages, and executed either on delivery or access by the recipient. The concepts in Safe-TCL were subsequently adopted by the Tcl group, and are now incorporated as standard in the latest Tcl/Tk releases.

More recently, Safe-TCL has been adapted for use on the web to execute "Tclets" -- Safe-TCL code downloaded by a web browser and executed by an interpreter on the user's system (see Tclet96). It is currently handled by a plug-in on common browsers such as Netscape (see Lev96). Safe-TCL has also been extensively used in the First Virtual Internet payment system.
JavaScript
JavaScript is a source-level scripting language, which is embedded in an HTML document. It is NOT Java! It is interpreted by the user's web browser, and allows control over most of the features of the web browser. It has access to most of the content of its HTML document, and has full interaction with the displayed content. It can access Java methods (and vice versa), providing access to features not present as standard in JavaScript. Currently there is only a very coarse level of security management: it is either enabled or disabled. Its security features are not yet well documented.
Intermediate Code
A second approach to providing "mobile code" is to have the programs compiled to a platform-independent intermediate code, which is then distributed to the user's system. This intermediate code is executed by an interpreter on the user's system.

Advantages are that it is faster to interpret than source, since no textual parsing is required, and the intermediate code is semantically much closer to machine code. The interpreter provides a safe execution "sand-box", and again, the safety of the system depends on the interpreter. The code in general is quite small, and the user's system can vet the code to ensure it obeys the safety restrictions. Disadvantages of this approach are its moderate speed, since an interpreter is still being used, and the fact that less semantic information is available to assist in vetting the code than if source were available.
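The vetting of intermediate code before execution can be illustrated with a small Python sketch. The instruction set, the stack limit, and the function names verify and interpret are all assumptions made for this example. It handles straight-line code only and is far simpler than the verification performed by any real system.

```python
# Hypothetical instruction set: opcode -> (values popped, values pushed).
STACK_EFFECT = {"push": (0, 1), "add": (2, 1), "pop": (1, 0), "halt": (0, 0)}
MAX_STACK = 16


def verify(program):
    """Check straight-line code before running it: known opcodes, integer
    operands, no stack underflow or overflow, and a final halt."""
    depth = 0
    for index, (opcode, arg) in enumerate(program):
        if opcode not in STACK_EFFECT:
            raise ValueError(f"unknown opcode at {index}: {opcode}")
        if opcode == "push" and type(arg) is not int:
            raise ValueError(f"non-integer operand at {index}")
        pops, pushes = STACK_EFFECT[opcode]
        if depth < pops:
            raise ValueError(f"stack underflow at {index}")
        depth = depth - pops + pushes
        if depth > MAX_STACK:
            raise ValueError(f"stack overflow at {index}")
    if not program or program[-1][0] != "halt":
        raise ValueError("program must end with halt")


def interpret(program):
    """Run code that has already passed verify()."""
    stack = []
    for opcode, arg in program:
        if opcode == "push":
            stack.append(arg)
        elif opcode == "add":
            right, left = stack.pop(), stack.pop()
            stack.append(left + right)
        elif opcode == "pop":
            stack.pop()
        elif opcode == "halt":
            break
    return stack
```

A program such as push 2, push 3, add, halt passes verification and leaves 5 on the stack, while a program that begins with add is rejected without being run. Because the checks are made once, ahead of time, the interpreter loop itself can stay simple.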
Java
Probably the best known intermediate code technology today is Java. It is Sun Microsystems' "executable content" technology, using an interpreted, dynamic, type-safe object-oriented language (see GM95). Its safety features include the use of runtime bytecode verification, late dynamic binding of modules, automatic memory management, and exception processing. Considerable effort has gone in to ensuring its safety in design and implementation. This safety is, however, dependent on the correct specification and implementation of both the verifier/interpreter AND the standard library implementation (esp. SecurityManager). Failures in these areas have led to some security flaws, as described later.
Telescript
Telescript is a technology for creating distributed applications using "mobile agents" (see Tar95). A key difference between Telescript and Java is that a Telescript "mobile agent" is a migrating process that is able to autonomously transfer its execution to a different system by asking to "go" elsewhere. Like Java, it is an interpreted, dynamic, type-safe object-oriented language, compiled to an intermediate code, with runtime type checking and late dynamic binding, automatic memory management, and exception processing. Additional features include object persistence and remote access, enabling objects to access each other over the network. Because of its migratory and remote access features, authentication and protection features are integral. Currently, Telescript is also supported via Netscape plugins for web applications, as well as using dedicated interpreters for other distributed applications.
Native Binary Code
The final category of code distribution uses native binary code, which is then executed on the user's system. This gives the maximum speed, but means that the code is platform dependent. Safe execution of binary code requires:

* restricted use of instruction set
* restricted address space access

Approaches to ensuring this can rely upon:

* traditional heavy address space protection, which is costly in terms of system performance and support;
* the verified use of a trusted compiler, which guarantees to generate safe code that will not violate the security restrictions;
* the use of "software fault isolation" technologies (see WLAG93) which augment the instruction stream, inserting additional checks to ensure safe execution (Ste92).

A combination of verified use of a trusted compiler, and the software fault isolation approach has created considerable interest, especially when used with a Just-in-time Compiler.
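The idea behind software fault isolation can be shown with a small Python model of the address check inserted before a store. The segment size and the names Memory and sandboxed_store are assumptions for illustration. A real implementation performs the equivalent masking in a few machine instructions on dedicated registers.

```python
SEGMENT_BITS = 8                       # assume 256-byte segments
OFFSET_MASK = (1 << SEGMENT_BITS) - 1


class Memory:
    """Flat memory shared by several untrusted modules."""

    def __init__(self, segments):
        self.cells = bytearray(segments << SEGMENT_BITS)

    def sandboxed_store(self, segment, address, value):
        # The inserted check: keep only the offset bits of the address the
        # module computed, then force the module's own segment number on
        # top. A wild address lands inside the module's segment instead of
        # being trapped.
        safe = (segment << SEGMENT_BITS) | (address & OFFSET_MASK)
        self.cells[safe] = value & 0xFF
        return safe
```

With a module confined to segment 1, a store aimed at address 0x305 in another module's segment is redirected to 0x105. The faulty module may corrupt its own data, but it cannot touch anything outside its protection domain.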
Just-in-time Compilation
Just-in-time Compilation (JIT) is an approach that combines the portability of intermediate or source code with the speed of binary code. The source or intermediate code is distributed, but is then compiled to binary on the user's system before being executed. If source is used, it is slower, but easier to check. If intermediate code is used, then it is faster. Another advantage is that the user can utilise their own trusted compiler to verify code, and insert the desired software fault isolation run-time checks.

This approach is being used with Java JIT compilers, and also in the Omniware system.
Omniware
Omniware is yet another technology for "mobile code" (see LSW95). Omniware code is written in C++, which is then compiled to an intermediate code for the OmniVM. This is distributed, and at run-time is translated to native code for execution. It relies on "software fault isolation" techniques to enforce safe execution of binaries. This adds special checking code which emulates an MMU in software, placing each module in its own protection domain. The run-time environment vets access to resources. The major advantages claimed for Omniware are that it uses a standard, well known language, C++; that it is fast, since binary code is actually being executed; and yet that it is safe, due to the use of the "software fault isolation" techniques.
Theory vs Practice
There are a number of good proposals for providing safe execution of "mobile code". However, some flaws have been found in practice. Most of the recent effort has focused on Java (see below), although the researchers believe that other systems would be likely to have similar flaws if they were as closely scrutinised. By examining the flaws found, some lessons may be drawn to assist with future designs.
Java Implementation Flaws
A number of implementation flaws have been found in the Java system (see DFW96, Ban95, Yel95). These include:

* Problems with network security, mostly created using DNS spoofing to subvert the interpreter's view of the domain namespace, and subsequently to violate the restriction on opening connections only back to the source of the Java code.
* There were some early problems with buffer overflows in sprintf in the original JDKs. These have mostly been fixed, except in javap (where care is still needed).
* Some of the standard routines provide information about the layout of storage for objects. This is probably not a serious flaw, but more information is revealed than is perhaps necessary.
* In HotJava, the proxy variables were public, which meant that any Java program could change them, and thus redirect all requests from the user's browser.
* There are some problems with inter-applet security. Applets are supposed to be quarantined from each other. However, using the thread manager, an applet can discover which other applets have running threads, control attributes of these threads, and even discover the applets' names (since these are encoded in the thread names).

All of these flaws can be corrected by changes to the standard Java run-time environment: many have already been made.
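As an illustration of the kind of run-time change involved, the following Python sketch shows one possible way to harden the "connect only back to the source" rule against a spoofed name lookup. It is not necessarily the fix adopted in any Java release. The class OriginPolicy and the resolve parameter are hypothetical, and the addresses used with it would be supplied by the host.

```python
class OriginPolicy:
    """Allow connections only to the address the code was loaded from."""

    def __init__(self, origin_address):
        # Record the numeric address used for the download,
        # not the host name.
        self.origin_address = origin_address

    def may_connect(self, host_name, resolve):
        # `resolve` is a hypothetical lookup returning a list of addresses.
        # Every answer must equal the pinned address, so a spoofed reply
        # that adds an extra address cannot widen the code's reach.
        addresses = resolve(host_name)
        return bool(addresses) and all(
            address == self.origin_address for address in addresses
        )
```

The design choice is to base the decision on the address recorded at load time rather than on whatever a later lookup of the name returns, since the lookup is the part an attacker is able to influence.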
Java Language vs Bytecodes
More serious are some deficiencies in the design of the Java language itself, or more correctly, in differing semantics between the Java language, and the bytecodes of the Java Virtual Machine (JVM) to which the language is compiled. DFW96 have identified two significant flaws.

The first, and most serious, relates to superclass constructors. Whenever an object is created, a constructor is called for it. These constructors are required to call the constructor for the super (parent) class first. Unfortunately the Java language prohibits, but the bytecode verifier allows, the creation of a partially initialised class loader, which can then be used to thwart some of the security checks on object creation, and to violate the strong typing of objects. DFW96 have found that, by using this attack, they can get and set the value of any non-static variable, and call any method (including native methods with fewer security restrictions).

The second flaw identified relates to the Java package names. Again the bytecode verifier allows a leading "/" on package names, which is interpreted by the run-time system as an absolute pathname to some package. Since the package is on the local system, it is regarded as trusted code. If a user is running Java on a system that allows any other type of network file access (eg FTP server with an incoming directory), then that can be used to place code on the system which can then be executed by the user's Java interpreter.
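A check of the sort that would close this hole can be sketched in Python. The pattern below is an assumed, simplified shape for package names (identifiers joined by "/") and the function name is hypothetical. It is meant only to show the kind of validation that was missing.

```python
import re

# Assumed shape of a well-formed name: identifiers joined by single "/".
_SEGMENT = r"[A-Za-z_$][A-Za-z0-9_$]*"
_PACKAGE = re.compile(rf"{_SEGMENT}(/{_SEGMENT})*")


def is_safe_package_name(name):
    """Reject names that could be read as an absolute or relative path."""
    return _PACKAGE.fullmatch(name) is not None
```

A name such as "java/lang" is accepted, while "/tmp/incoming/evil" is refused before the run-time system has any chance to treat it as a local, and therefore trusted, pathname.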

Also identified were some problems with object initialisation, where object constructors are working with partially initialised objects.

All of these suggest that some further work is needed on the design of the Java language, and particularly on its relation to the JVM bytecodes.
Security Failure Lessons
Experience with systems (esp. Java) has highlighted some dangers, showing that failures can occur in both the implementation and the specification of the system. A correct specification does not prevent a poor implementation from weakening security. Great care is needed. Ideally it should be possible to formally verify the language design, and then validate its implementation. In practice, this is unlikely to be possible for some time. Some of the methods and procedures used in the IT Security Evaluation community may, however, assist in the creation of more reliable systems.
Conclusions
"Mobile code" is here with increasing demands for its use. Safe execution of "mobile code" implies a need for controlled access to resources, access which ideally should be negotiated for each "mobile code" unit. The means for achieving this is a subject for considerable additional research.

Approaches taken so far to providing "mobile code" include the distribution of: source, intermediate code, or binary code, and the use of Just-In-Time compilers.

Experience with these systems has shown that safe and secure systems need both correct specification and correct implementation. There is still considerable research and development needed in these systems. However, I believe the goal of safe and secure "mobile code" execution is reasonable and achievable.
