Lisp/C Integration in Eclipse: Core Technologies


2: Core Technologies

The Eclipse strategy is to tightly integrate Lisp and C within a single address space, that is, within a single running program.1 This integration involves more than simply implementing Lisp within C. It requires Lisp programs and utilities to behave compatibly with other C programs and utilities. In order to accomplish all this, Elwood has synthesized a number of technologies for Eclipse:

1. A naming convention that reflects the differences between C and Lisp identifiers.

2. A core of C utilities for representing, initializing, and using run-time-typed data; closures over lexical functions; multiple values; dynamic binding; non-local exits; and cleanups.

3. Conservative garbage collection.

4. A run-time function-based interface to such Lisp utilities as the object system and environment initialization.

Few C programmers need go into the details of these; it is enough to know that they exist. The descriptions in this section do, however, give experienced Lisp programmers some understanding of how Lisp issues are handled in C by Eclipse. In all cases, each technology was implemented for maximum portability and so that the design would seem understandable and usable to both Lisp and C programmers.

2.1: Identifiers

Eclipse defines a naming convention that maps Lisp names to C identifiers. Eclipse uses this convention for each C identifier it generates, including the entry points accessible in the Eclipse library and identifiers appearing in translated user code. These transformations apply only to C identifiers, all of which are compile-time entities. Run-time data, such as the strings that form SYMBOL-NAMEs, are not transformed.

The transformation of identifiers is necessary because C imposes much harsher restrictions on identifiers than Lisp.

Lisp separates functions, variables, types, blocks, and labels into separate namespaces, while similarly named C objects conflict.
Lisp has a package system to modularize namespaces.
The scoping rules for names are different.
C has several globally reserved words.
C identifiers use a smaller character set, and do not allow identifiers to begin with numbers.

Eclipse addresses these differences by translating Lisp symbol names to C identifiers using a convention that follows standard C practice.

Normal function names do not have underscores, but capitalize the first letter of each word. A short package prefix is used. For example, CL:LOGICAL-PATHNAME-TRANSLATIONS => clLogicalPathnameTranslations(), USER::FOO-BAR => usrFooBar().
The names of ``constant'' variables holding Lisp symbols are in upper case with words separated by underscores. A short package prefix is used. For example, clNIL, clCALL_ARGUMENTS_LIMIT, clLOGICAL_PATHNAME_TRANSLATIONS, usrFOO_BAR.
Lexical variables are in lower case, and underscores are used. For example, foo_bar.

Eclipse uses package prefixes only when needed for scope or distinction, or because the identifier would be illegal or reserved without it. The shortest package nickname is used by default as the prefix. The system defines the package prefix ``cl'' for all system utilities and ``usr'' for utilities defined in the Lisp COMMON-LISP-USER package.
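The function-name rule can be sketched in C as follows. This is only an illustration of the convention described above, not Eclipse's own translator: the helper name mangle_function_name is hypothetical, and the sketch handles only names made of letters, digits, and hyphens (it rejects characters such as * or | rather than spelling them out).

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper, not part of eclipse.h. Builds a C function
   identifier from a package prefix and a Lisp symbol name made of
   letters, digits, and hyphens. Returns 0 on success, or -1 if the
   buffer is too small or the name contains a character that this
   sketch does not handle. */
int mangle_function_name(const char *prefix, const char *lisp_name,
                         char *out, size_t out_size)
{
    size_t n = 0;
    int start_of_word = 1;

    assert(prefix != NULL && lisp_name != NULL && out != NULL);
    if (out_size == 0)
        return -1;

    /* Copy the package prefix unchanged, e.g. "cl" or "usr". */
    for (; *prefix != '\0'; prefix++) {
        if (n + 1 >= out_size)
            return -1;
        out[n++] = *prefix;
    }

    for (; *lisp_name != '\0'; lisp_name++) {
        unsigned char c = (unsigned char)*lisp_name;
        if (c == '-') {             /* word separator: drop it */
            start_of_word = 1;
            continue;
        }
        if (!isalnum(c))            /* '*', '|', etc. are out of scope */
            return -1;
        if (n + 1 >= out_size)
            return -1;
        /* Capitalize the first letter of each word, lower-case the rest. */
        out[n++] = (char)(start_of_word ? toupper(c) : tolower(c));
        start_of_word = 0;
    }
    out[n] = '\0';
    return 0;
}
```

Called with the prefix "usr" and the name "FOO-BAR", the helper writes usrFooBar into the output buffer.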

Lexical scope issues in function names (i.e., nested FLET/LABELS) are handled by preserving the nested chain of function names, separated by underscores. For example, usrOuterFunction_InnerFunction(). Eclipse preserves method specializers and qualifiers, and SETF function names, in a similar manner.

The ``pipe'' escape characters (e.g., #\|) used by the Lisp printer when a symbol has non-default case are also part of an Eclipse C identifier name.

Finally, when characters appear in a Lisp symbol that cannot appear in a C identifier, Eclipse replaces the characters with an alphabetic name in a contrasting case. Eclipse defines names for all the non-alphabetic members of the BASE-CHAR repertoire, but uses hex codes for extended characters (i.e., Unicode). For example, *DEBUG-IO* => clstarDEBUG_IOstar, user::|lower-CASE| => usrpipelowerpipe_CASE, LIST* => clLISTstar, clListSTAR().

Lisp names that are interned in the C package are exempt from this ``name mangling.'' This allows Lisp programs to reference C utilities that do not follow these conventions.

2.2: Representation

Eclipse provides a single C header file, ``eclipse.h,'' which defines Lisp data representations. Any C code that uses the Eclipse library must include this file. The following sections describe some of the representations defined in ``eclipse.h.''

2.2.1: Objects

The header file defines a C typedef called clObject, which is used by Eclipse to represent each Lisp datum. clObject is defined as a machine word that can be treated as either a pointer or as immediate data.2

Most kinds of clObjects are implemented in Eclipse as pointers to a heap-allocated structure. The first component of this structure contains type information, including a pointer to the class metaobject. On some architectures, Eclipse saves space for some data types by using the least significant bits of the pointer for typing information. Eclipse also represents some data, such as fixnums and characters, by storing them directly in the clObject word as immediate data. Again, the least significant bits of the word are used for typing. In these latter cases, Eclipse reaches the class metaobject through a globally known array, indexed by the low clObject bits.
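A minimal sketch of this kind of low-bit tagging is given below. Every name in it (Obj, TAG_FIXNUM, HeapHeader, class_name_of, and so on) is hypothetical, as are the two-bit tag width and the use of a class name string in place of a class metaobject; the actual layout is whatever ``eclipse.h'' defines for a given architecture.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* All names here are hypothetical stand-ins, not eclipse.h definitions. */
typedef uintptr_t Obj;                 /* one machine word */

#define TAG_BITS    2
#define TAG_MASK    ((Obj)3)
#define TAG_POINTER ((Obj)0)           /* word is an aligned heap pointer */
#define TAG_FIXNUM  ((Obj)1)           /* word holds a small integer */
#define TAG_CHAR    ((Obj)2)           /* word holds a character code */

/* First component of every heap object: its type information. */
struct HeapHeader { const char *class_name; };

/* Class table for immediate data, indexed by the low tag bits. */
static const char *const immediate_class[4] = {
    NULL, "FIXNUM", "CHARACTER", NULL
};

Obj make_fixnum(intptr_t n) { return ((Obj)n << TAG_BITS) | TAG_FIXNUM; }

intptr_t fixnum_value(Obj o)
{
    assert((o & TAG_MASK) == TAG_FIXNUM);
    /* Clear the tag, then divide instead of shifting so that negative
       values do not rely on arithmetic right shift. Assumes the usual
       two's complement conversion from uintptr_t to intptr_t. */
    return (intptr_t)(o - TAG_FIXNUM) / ((intptr_t)1 << TAG_BITS);
}

Obj make_char(unsigned char c) { return ((Obj)c << TAG_BITS) | TAG_CHAR; }

Obj make_pointer(struct HeapHeader *h)
{
    assert(((Obj)h & TAG_MASK) == TAG_POINTER);   /* must be aligned */
    return (Obj)h;
}

/* Heap objects carry their class in the header; immediates use the table. */
const char *class_name_of(Obj o)
{
    Obj tag = o & TAG_MASK;
    if (tag == TAG_POINTER)
        return ((const struct HeapHeader *)o)->class_name;
    return immediate_class[tag];
}

int has_class(Obj o, const char *name)
{
    const char *actual = class_name_of(o);
    return actual != NULL && strcmp(actual, name) == 0;
}
```

The point of the sketch is that both kinds of datum fit in the same word, so C code can pass them around without knowing which representation is in use.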

The header file defines a macro to access the class metaobject of all the built-in clObjects, including structure and instance objects. The header file and the Eclipse library define a number of macros and functions for creating different kinds of clObjects from corresponding C data, and for accessing the internal C data from different kinds of clObjects.3

To aid in linting, and to shield code from changes in clObject implementation, the header file defines an assignment macro, clSetq(place, value), which usually expands into ((place) = (value)).

2.2.2: Functions

Eclipse compiles each Lisp function to a corresponding C function. The generated C function uses the C variable argument mechanism (varargs/stdarg) to accept clObjects as arguments. The arguments are exactly the same as they are in Lisp, except that an additional argument, the ``symbol'' clEOA, is appended as an End-Of-Arguments marker. It is used by the function during argument parsing. Eclipse COMPILE-FILE automatically adds this clEOA marker to Lisp function calls in generated code. This use of clEOA is less error-prone in hand-written or hand-modified C code than insisting that an explicit argument count be provided. Some Lisp implementations pass data on a special Lisp data stack; Eclipse programs pass arguments as ordinary C data.
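The sentinel-terminated calling convention can be illustrated with standard C alone. In the sketch below, Obj and EOA are hypothetical stand-ins for clObject and clEOA, and count_args is an invented example function; only the <stdarg.h> machinery is real.

```c
#include <assert.h>
#include <stdarg.h>
#include <stddef.h>

typedef const void *Obj;            /* hypothetical stand-in for clObject */

/* A unique address serves as the End-Of-Arguments marker. */
static const char eoa_marker = 0;
#define EOA ((Obj)&eoa_marker)      /* hypothetical stand-in for clEOA */

/* Counts the arguments it receives, in the way a function with a
   &REST parameter might walk its argument list. A call with no Lisp
   arguments passes EOA as the first and only C argument. */
size_t count_args(Obj first, ...)
{
    va_list ap;
    size_t n = 0;
    Obj arg;

    va_start(ap, first);
    for (arg = first; arg != EOA; arg = va_arg(ap, Obj))
        n++;
    va_end(ap);
    return n;
}
```

A call such as count_args(a, b, EOA) needs no argument count; forgetting the marker is the one mistake the caller can make, which is why generated code always appends it.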

No extra arguments are needed in Eclipse to represent the Lisp function's defining environment. C programmers can call any Lisp function without needing to know if or how the function refers to an enclosing environment. Eclipse handles this automatically as follows.

Eclipse represents Lisp functions at run-time as closure clObjects that contain the code to be executed (i.e., a C function pointer) and the ``closed-over'' environment. The closure environment is defined by Eclipse as a vector of those variable clBindings (addresses of clObjects) that were defined in the function's enclosing lexical environment and used within the inner function code. For many functions, the environment is empty; that is, there are no lexical variables used within the function that are defined outside of it. When Eclipse creates a closure clObject, it fills the closure's environment with any necessary bindings.

In general, Eclipse uses ordinary C variables to represent local Lisp variables. Eclipse COMPILE-FILE uses the same variable name and scope in the generated C code as was present in the original Lisp source. Eclipse declares these variables as being of type clObject. Eclipse uses the address of the C variable as the clBinding of a closed-over Lisp variable with dynamic-extent. However, for a closed-over Lisp variable with indefinite-extent, Eclipse generates code that heap allocates a clBinding. clBindings are shared by all closures over them, but environments are not.

For each function defined in a non-empty environment, Eclipse COMPILE-FILE generates two ``environment hooks'' that point to a closure's environment. One hook is defined statically, outside the C function and the other is a local variable within the C function definition. The static environment hook is initialized by Eclipse when the closure clObject is created. Generated code initializes the local environment hook from the static hook immediately upon entering the function on each call. All references to enclosing variables from within the inner generated function use this local environment hook to access the clBinding. Using a cached local variable allows for reentrant calls to identical code that is closed over different bindings.

For ``top-level'' functions that only create closures once, Eclipse initializes the static hook once and it is never changed. For arbitrary Lisp closure objects created at run time, it is necessary to call such functions through their closure objects using FUNCALL or APPLY.4 FUNCALL and APPLY set the environment hook if necessary before calling the implementing C function. The address of the static environment hook is stored by Eclipse in the closure clObject.
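The hook mechanism can be sketched as follows, under simplifying assumptions: Obj is a plain long, functions take one fixed argument rather than a clEOA-terminated list, and the environment vector is supplied by the caller instead of being heap allocated. All names (Closure, funcall1, make_adder, add_n_hook) are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

typedef long Obj;                   /* hypothetical stand-in for clObject */
typedef Obj *Binding;               /* a binding is the address of an object */
typedef Obj (*Code)(Obj arg);

struct Closure {
    Code code;                      /* C function implementing the body */
    Binding *env;                   /* vector of closed-over bindings */
    Binding **hook;                 /* address of the static environment hook */
};

/* Static environment hook for add_n, set before a call through a closure. */
static Binding *add_n_hook;

/* Sketch of code generated for (lambda (x) (+ x n)), closed over n. */
static Obj add_n(Obj x)
{
    Binding *env = add_n_hook;      /* local hook, cached on entry */
    return x + *env[0];             /* env[0] is the binding of n */
}

/* Plays the role of FUNCALL: install the environment, then call. */
Obj funcall1(const struct Closure *f, Obj arg)
{
    assert(f != NULL && f->code != NULL);
    if (f->hook != NULL)
        *f->hook = f->env;
    return f->code(arg);
}

/* Creates a closure over the given binding of n. The caller provides
   storage for the one-element environment vector. */
struct Closure make_adder(Binding n_binding, Binding *env_storage)
{
    struct Closure c;
    env_storage[0] = n_binding;
    c.code = add_n;
    c.env = env_storage;
    c.hook = &add_n_hook;
    return c;
}
```

Two closures made by make_adder share the code of add_n but have separate environments, and because a binding is an address, assigning to the closed-over variable is visible on the next call through the closure.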

2.2.3: Multiple Values

Eclipse defines each generated C function to return the ``primary'' Lisp value as a clObject value. Eclipse also defines a globally known pointer to a buffer of multiple clObject values. Some functions just return the values returned by other functions (i.e., are tail calls). However, if a function returns a single value (e.g., the value of a variable), then a macro from ``eclipse.h'' must be used to indicate in the multiple values buffer that only one value is returned. The function clValues() can also be used to fill the multiple value buffer with zero or more values. Receiving multiple values is accomplished by using a macro from ``eclipse.h'' that introduces a new multiple values buffer as a local (automatic, stack) C variable. The macro stores the location of this new buffer in the globally known pointer.
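The protocol can be sketched with a global pointer to the current values buffer. The names below (ValuesBuffer, current_values, RETURN_ONE, truncate2, remainder_of) are hypothetical, and the receiving side is written out by hand where the real header would supply a macro.

```c
#include <assert.h>

typedef long Obj;                   /* hypothetical stand-in for clObject */
#define MAX_VALUES 8

struct ValuesBuffer { int count; Obj values[MAX_VALUES]; };

static struct ValuesBuffer default_buffer;
/* Globally known pointer to the buffer currently receiving values. */
static struct ValuesBuffer *current_values = &default_buffer;

/* Records that exactly one value is being returned, then returns it. */
#define RETURN_ONE(v) \
    return (current_values->count = 1, current_values->values[0] = (v))

/* Like Lisp TRUNCATE: the primary value is the quotient and the
   second value is the remainder. */
Obj truncate2(Obj num, Obj den)
{
    assert(den != 0);
    current_values->count = 2;
    current_values->values[0] = num / den;
    current_values->values[1] = num % den;
    return current_values->values[0];
}

Obj identity(Obj x) { RETURN_ONE(x); }

int values_count(void) { return current_values->count; }

/* A receiver: introduce a local buffer, point the global at it for the
   duration of the call, read the second value, then restore. */
Obj remainder_of(Obj num, Obj den)
{
    struct ValuesBuffer local;
    struct ValuesBuffer *saved = current_values;
    Obj result;

    current_values = &local;
    (void)truncate2(num, den);
    result = local.count > 1 ? local.values[1] : 0;
    current_values = saved;
    RETURN_ONE(result);
}
```

A caller that wants only the primary value simply uses the C return value and never looks at the buffer, so the common single-value case costs nothing extra at the call site.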

2.2.4: Dynamic Environment

Eclipse uses a Lisp-specific control stack to keep dynamic environment information such as dynamic bindings, active cleanups, and exits such as catchers and closed over blocks/labels. The elements of this stack are pointers to data identifying the kind of information.

Shallow binding is used for dynamic variables. Eclipse binds special variables by placing the symbol and its old value on the control stack, and setting the SYMBOL-VALUE to a new value.
Exits are implemented with setjmp()/longjmp(). Eclipse initializes a C jmp_buf and caches the state of the multiple values machinery.

Eclipse defines macros in ``eclipse.h'' for using the control stack to establish dynamic bindings, blocks, tagbodies, catchers, and cleanups. The header file also defines macros for non-local transfers such as RETURN-FROM, GO, and THROW that unwind the control stack as necessary. Besides using the appropriate longjmp() machinery, these transfers take care of unwinding dynamic bindings and executing UNWIND-PROTECT cleanup forms.
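The following sketch combines shallow binding with a setjmp()/longjmp() exit that unwinds bindings on the way out. It is a simplification under stated assumptions: the control stack holds only binding frames, cleanups and the multiple values state are omitted, and every name (Frame, bind_special, unwind_to, Catcher, throw_to) is hypothetical rather than taken from ``eclipse.h''.

```c
#include <assert.h>
#include <setjmp.h>

typedef long Obj;                   /* hypothetical stand-in for clObject */
struct Symbol { const char *name; Obj value; };

/* One control stack entry: a symbol and the value it had before binding. */
struct Frame { struct Symbol *symbol; Obj old_value; };

#define STACK_SIZE 64
static struct Frame control_stack[STACK_SIZE];
static int stack_top = 0;

int stack_mark(void) { return stack_top; }

/* Shallow binding: save the old value, then overwrite the symbol's value. */
void bind_special(struct Symbol *s, Obj new_value)
{
    assert(stack_top < STACK_SIZE);
    control_stack[stack_top].symbol = s;
    control_stack[stack_top].old_value = s->value;
    stack_top++;
    s->value = new_value;
}

/* Pop frames down to a saved mark, restoring each old value. */
void unwind_to(int mark)
{
    assert(mark >= 0 && mark <= stack_top);
    while (stack_top > mark) {
        stack_top--;
        control_stack[stack_top].symbol->value =
            control_stack[stack_top].old_value;
    }
}

/* An exit point: the jump target plus the stack depth to unwind to.
   The thrown value is volatile because it changes after setjmp(). */
struct Catcher { jmp_buf target; int mark; volatile Obj thrown; };

void throw_to(struct Catcher *c, Obj value)
{
    unwind_to(c->mark);             /* undo bindings made since the catch */
    c->thrown = value;
    longjmp(c->target, 1);
}

struct Symbol depth_symbol = { "*DEPTH*", 0 };

static void inner(struct Catcher *c)
{
    bind_special(&depth_symbol, 2);
    throw_to(c, depth_symbol.value + 40);   /* does not return */
}

Obj run_with_catch(void)
{
    struct Catcher c;
    c.mark = stack_mark();
    c.thrown = 0;
    if (setjmp(c.target) == 0) {
        bind_special(&depth_symbol, 1);
        inner(&c);                  /* throws; control resumes at setjmp */
        unwind_to(c.mark);          /* normal exit path, unused here */
    }
    return c.thrown;
}
```

After run_with_catch() returns, both bindings of the symbol have been undone even though neither binding function returned normally.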

2.2.5: C Implementations

COMPILE-FILE does not generate platform- or compiler-specific code. Eclipse abstracts any platform or compiler dependencies into conditionally defined macros within ``eclipse.h.'' These macros cover such issues as word size, variable argument mechanism, and function prototypes. This allows the same Eclipse code to be processed by ANSI/ISO C compilers [Harbison], traditional classic/K&R C compilers [Kernighan], or C++ compilers.

2.3: Memory Management

Eclipse uses a conservative, non-relocating garbage collector, publicly available from Xerox PARC.[Boehm] In this case, ``conservative'' means that C data, including that held on the C stack or in registers, are traced by the garbage collector. The system assumes that anything that looks like it could be a pointer to data is live, and the data there is not collected. This allows user-written and Eclipse-generated C code to pass Lisp data around as ordinary C data without any need to ``register'' them by hand with the garbage collector. In addition, the collector recognizes a pointer that appears to point to within a heap allocated datum. This allows the collector to work with ``mangled pointers'' such as those described in Section 2.2.1.
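The ``looks like a pointer'' test can be shown with a toy model. This is not the collector's actual algorithm or interface: the Block table, find_block, and mark_from_roots are hypothetical, addresses are plain integers, and the sketch marks only blocks referenced directly from the roots without tracing further.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* A toy heap block: its address range and a mark bit. */
struct Block { uintptr_t start; size_t size; int marked; };

/* Returns the index of the block containing the address, or -1.
   Interior addresses count, so tagged pointers are still recognized. */
static int find_block(const struct Block *heap, size_t nblocks,
                      uintptr_t word)
{
    size_t i;
    for (i = 0; i < nblocks; i++)
        if (word >= heap[i].start && word - heap[i].start < heap[i].size)
            return (int)i;
    return -1;
}

/* Treats every root word as a possible pointer and marks the block it
   falls in. Returns the number of blocks newly marked. A word that
   merely happens to look like an address keeps its block alive, which
   is the price of being conservative. */
size_t mark_from_roots(struct Block *heap, size_t nblocks,
                       const uintptr_t *roots, size_t nroots)
{
    size_t i, newly_marked = 0;

    assert(heap != NULL || nblocks == 0);
    assert(roots != NULL || nroots == 0);
    for (i = 0; i < nroots; i++) {
        int b = find_block(heap, nblocks, roots[i]);
        if (b >= 0 && !heap[b].marked) {
            heap[b].marked = 1;
            newly_marked++;
        }
    }
    return newly_marked;
}
```

In the toy model a root word equal to a block's start address and a root word pointing a few bytes into another block both keep their blocks alive, while a small integer that falls outside every block is ignored.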

Non-relocating garbage collectors do not move heap allocated data during collection. This allows Eclipse to implement clObjects as pointers to data, as opposed to ``indexes,'' ``handles,'' ``pointers to pointers,'' or other more complex things.

The garbage collector uses incremental/generational collection when supported by the operating system. This means that only a small amount of work is done during each collection, which reduces delays.

The garbage collector used was written for use in arbitrary C/C++ programs, and was not modified for use in Eclipse. It can be used directly by C programmers without using the rest of Eclipse.

The collector defines alternatives to the standard malloc() utilities. An application that allocates memory with malloc() or sbrk() will not work with Eclipse as written; it must be changed to use GC_malloc() instead. Care must be taken when calling certain operating system utilities from C code, because they sometimes use incompatible malloc-like utilities internally.

If the provided garbage collector is undesirable for some reason, it can be replaced with any user-provided, conservative, non-relocating system with a malloc-like interface.

2.4: Function-Based Interface

The Common Lisp Object System (CLOS) and the semantics of file loading are two examples of Lisp utilities that have no analog in C. Eclipse defines functions so that these utilities may be used within C programs.

2.4.1: CLOS-MOP

Eclipse implements not just the Common Lisp Object System (CLOS), but its complete MetaObject Protocol (MOP).[Kiczales] The CLOS-MOP defines a function-based interface for defining, instantiating, and accessing classes, and for defining and using generic functions and methods. It is through these MOP functions that Eclipse allows object-oriented Lisp to be accessed by C, which defines only function-like interfaces to data.5

2.4.2: Initialization

Unlike C, Lisp source files can contain not only function definitions, but arbitrary data as compile-file literals, and arbitrary top-level code that is not part of a function definition. When a program or user loads a Lisp file into any Lisp implementation, the system creates the literal data, initializes it, and executes the top level code. C provides no similar mechanism.

Consider, for example, a file that contains the following function definition:

   (DEFUN MY-FUNCTION (A B) 
     (LIST A B))

DEFUN is a macro that essentially expands this function definition into something like:

   (SETF (SYMBOL-FUNCTION 'MY-FUNCTION)
     (LAMBDA (A B)
       (LIST A B)))

When this is loaded into any Lisp implementation, the system creates a function object (a closure) that is stored in the SYMBOL-FUNCTION of the symbol MY-FUNCTION.

In Eclipse, when COMPILE-FILE generates a C file, it generates one C function for each Lisp function in the source. It also generates one extra C function, taking no arguments and returning no value. This ``initialization function'' executes the ``load-time'' code. Calling this function is semantically equivalent to loading the corresponding Lisp file. For example, the Eclipse-generated initialization function for a file containing the previously discussed DEFUN will intern the symbol MY-FUNCTION and initialize it with a closure clObject. (See Section 2.2.2.)
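The shape of such an initialization function might be sketched as below. This is not Eclipse's generated output: the symbol table, intern(), and init_my_file() are hypothetical, the symbol holds a bare C function pointer rather than a closure clObject, and the stand-in body returns a number instead of building a list, so that the sketch needs no list representation.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef long Obj;                   /* hypothetical stand-in for clObject */
typedef Obj (*Code)(Obj a, Obj b);

/* A toy symbol: a name and a function cell (cf. SYMBOL-FUNCTION). */
struct Symbol { const char *name; Code function; };

#define MAX_SYMBOLS 16
static struct Symbol symbol_table[MAX_SYMBOLS];
static size_t symbol_count = 0;

/* Returns the unique symbol with this name, creating it if needed.
   The name must outlive the table; string literals satisfy this. */
struct Symbol *intern(const char *name)
{
    size_t i;
    for (i = 0; i < symbol_count; i++)
        if (strcmp(symbol_table[i].name, name) == 0)
            return &symbol_table[i];
    assert(symbol_count < MAX_SYMBOLS);
    symbol_table[symbol_count].name = name;
    symbol_table[symbol_count].function = NULL;
    return &symbol_table[symbol_count++];
}

/* Stand-in for the C function generated from the DEFUN. */
static Obj my_function(Obj a, Obj b) { return a * 10 + b; }

/* Stand-in for the per-file initialization function: it performs at
   run time what loading the Lisp file would have done. */
void init_my_file(void)
{
    struct Symbol *s = intern("MY-FUNCTION");
    s->function = my_function;
}
```

Until init_my_file() has run, looking up MY-FUNCTION finds a symbol with an empty function cell, which is why initialization must precede any call made by name.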

Before an application calls any ``Lisp functions,'' it must first call the initialization function for each user-generated file, as well as those for the Eclipse library. For example:

   /* Initialize Eclipse system code. */
   clInit();       /* Initialize Eclipse run-time library. */
   /* The next line is not needed in most applications. */
   clInitD();      /* Initialize Eclipse development library. */
   /* Initialize user code. */
   usrMyFile();    /* An Eclipse-generated initialization function
                      for "my-file.lisp" */
   /* Now Lisp can be used. */
   clPrint(clEval(clRead(clEOA), clEOA), clEOA); ...

Execution of the Eclipse-generated initialization function ensures that:

Symbols are interned.
Other compile-time and load-time data are created and initialized, including function closures for top-level functions.
The closures of top-level functions are stored in the symbols.
Any necessary environment hooks for these closures are initialized.

The top-level Eclipse read-eval-print loop program is essentially a call to clInit() and clInitD(), followed by a call to the read-eval-print loop.
