C++ and Lisp

Introduction

C++ as a programming language has undergone a huge amount of evolution in the fifteen or so years of its existence. The end result of this evolution is a uniquely flexible programming language which supports a variety of programming styles—"old-fashioned" procedural programming, object oriented programming, generic programming, some aspects of functional programming, and even extending the language itself.

However, there is another, older, programming language which is equivalently flexible: Lisp. Lisp also supports a variety of different programming styles, but in a somewhat different manner to C++.

This article investigates these different approaches in a variety of areas, with two intents. Firstly, I find that it is always illuminating to see concepts from an alternative viewpoint, to be reminded that things can be done differently when a different set of constraints apply. Neither is right or wrong, just appropriate in different circumstances.

Secondly, I find the study of Lisp interesting in itself and I believe that many other programmers who have an academic interest in the tools of their trade may also do so.

I start with an introduction to Lisp from the point of view of the interpreter program as background; the sections following that each examine the two languages in the light of one particular language feature.

The Lisp Interpreter

Before we move on to the rest of the article, it is worth explaining a little about how Lisp works. For C++ programmers, this is most easily done by describing how the Lisp interpreter works.

The Lisp interpreter is constantly sitting in the 'read-eval-print' loop. This means that the interpreter reads in an expression, evaluates it, prints the result, and then returns to waiting for more input to read. The key step in this is the evaluation of each expression.

Evaluation of expressions in Lisp starts simply: if an expression consists of just an identifier, then the 'eval' part will return the value of that variable (or raise an error if no such variable is found).

If an expression consists of a list, namely a collection of things delimited by parentheses, then the interpreter normally performs the following steps for the 'eval' phase:

  1. take the first element of the list to be the name of a function
  2. evaluate each of the remaining expressions in the list
  3. call that function with the resulting values as its arguments, and return the result.

The key point to remember is that step 2 of this sequence can be recursive—if one of the expressions in the list is itself a list, then this sub-list will be evaluated in the same way.

The net of this is that for a function call (fn arg1 arg2 arg3), the Lisp interpreter will evaluate each of arg1, arg2 and arg3 to find their values, and will then pass these values to the function named fn.

However, evaluating all of the arguments of a function is not always appropriate—for example, in the following conditional statement, we only want one of (fn1 a b) and (fn2 a b) to be evaluated.

(if (equal a b) 
   (fn1 a b)  ; call fn1 when a==b
   (fn2 a b)) ; call fn2 when a!=b

To cope with this and other constructs where the 'evaluate everything' approach is inappropriate, the Lisp interpreter has a number of special forms built in. Special forms allow different patterns of evaluation to be performed, and are so called because they are special exceptions to the standard 'eval' steps above.

The most common special form is quote, whose pattern of evaluation is: don't. That is, a list like (quote (a b c)) will be evaluated as (a b c) rather than as the function a applied to the values of b and c; this is so common that there is even a built-in piece of syntactic sugar to make it easy: '(a b c).

Special forms affect the interpreter's main 'read-eval-print' loop in that the 'eval' step for a list becomes:

  1. examine the first element of the list; if it names one of the built-in special forms, evaluate the expression according to that form's own rules
  2. otherwise, evaluate each of the remaining expressions in the list and call the named function with the resulting values, as before.

Object Oriented Programming

There are a number of factors which make up object-oriented programming, and it is interesting to compare the approaches taken by C++ and by the OO-support in Lisp, known as the Common Lisp Object System (CLOS).

In C++, the heart of object-orientation is encapsulation. All of the data, and all of the functions which operate on that data, are encapsulated together as a class. Commonly, important parts of the class data are then subject to data hiding, by use of the private keyword.

These classes can be arranged in inheritance hierarchies; the other key OO concept is then dynamic binding: if several classes in a hierarchy provide the same virtual function, then which class-specific version of that function gets called is decided at run-time, not at compile-time.

CLOS is also built around the concepts of class hierarchies and dynamic binding, but without the C++ emphasis on encapsulation. Instead dynamic binding, in the form of generic functions, takes the pre-eminent role as the driving paradigm for object-orientation.

Encapsulation

Encapsulation in C++ accomplishes a number of related objectives:

  1. putting all of the data for an object together in one lump, so that multiple instances of such lumps can be allocated
  2. preventing external code from accessing some parts of this data
  3. putting all of the functions which operate on the object together in one lump, which can't be added to later
  4. preventing the global namespace of the program from being filled up.

However, these objectives don't necessarily all have to go together. In C++, namespaces also implement a subset of these encapsulation objectives—specifically objectives 3 and 4.
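
As a minimal sketch of this distinction (an illustration of my own, with hypothetical names): a class bundles data and operations together and can hide both, while a namespace merely groups related functions and keeps their names out of the global scope.

class Account {                  // objectives 1-4: data plus operations, with hiding
  double balance;                // objective 2: not accessible from outside
public:
  Account() : balance(0) {}
  void deposit(double amount) { balance += amount; }
  double current() const { return balance; }
};

namespace accounting {           // objectives 3 and 4 only: grouping and naming
  void audit();
  double convert(double amount, double rate);
}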

Unlike C++, Lisp does not have the C legacy of separate compilation. The typical C++ encapsulation of a class—with the interface defined in a single header file, and the implementation defined in a single code file—is not appropriate in Lisp, reflecting the fact that Lisp programmers rarely have to deal with binary-only libraries. This in turn reflects the facts that Lisp has traditionally been an "Open-Source" language used in academia, and that Lisp is notionally a partly-interpreted language.

Hence, the CLOS concept of a class covers objectives 1 and 4; the data-hiding objective 2 and the method-limiting objective 3 are orthogonal to the Lisp concept of encapsulation.

Consequently, it is quite legal and normal in Lisp to introduce new methods which are specialized for a particular class long after the class has been defined. In C++ terminology, CLOS is an amiable language—all code is a friend of every class.

This is just a slightly different approach, but it is useful for reminding us that object orientation can encompass a variety of styles.

Dynamic Binding and Generic Functions

When encapsulated classes are arranged in hierarchies, it is then possible to write virtual functions. These are a collection of functions with the same name, but where different functions are used depending on where this particular object fits in the class hierarchy. The important point to note is that this decision about which function to call is made at run-time.

In C++, a function can only be virtual on its "first" parameter, the object through which it is invoked. This leads to the double dispatch problem: what do you do if you want a function to be virtual in more than one parameter? In C++ practice, you have to work around this limitation by some fancy footwork (for example, the Visitor pattern from Gamma et al).
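
As a rough sketch of the problem (the classes here are hypothetical, not taken from any particular library): the run-time choice of function is driven only by the object to the left of the dot, so the other operand always arrives as a plain Shape&.

class Shape {
public:
  virtual ~Shape() {}
  // dispatch happens only through 'this'; 'other' is always seen
  // as a plain Shape&, whatever its dynamic type
  virtual bool intersects(const Shape& other) const = 0;
};

class Square : public Shape {
public:
  virtual bool intersects(const Shape& other) const
  {
    // here we know *this is a Square, but not whether 'other' is a
    // Square, a Circle or something else; recovering that needs a
    // dynamic_cast, a type tag, or a Visitor-style second virtual call
    return false;                // placeholder for a general-purpose test
  }
};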

In CLOS, the name for a virtual function is a generic function. Since CLOS does not subscribe to objective 3 of the previous section, these generic functions are not tightly associated with a particular class. Consequently, a generic function can be specialized on several classes at once; that is, it is quite possible to have an arbitrary number of virtual parameters. For example:

(defclass shape 
  ()                                    ; no superclasses
  (centre-x centre-y))                  ; member fields, known as slots

(defclass square 
  (shape)                               ; derives from shape
  (width))                              ; extra slot 

(defclass circle
  (shape)                               ; derives from shape
  (radius))                             ; extra slot

(defmethod intersects? ((a shape) (b shape))
  ;; some slow convoluted general-purpose intersection algorithm
  ;; for determining if two arbitrary shapes intersect
  )
(defmethod intersects? ((a square) (b square))
  ;; fast special case test for intersection of two squares
  )
(defmethod intersects? ((a square) (b circle))
  ;; fast special case test for intersection of a square and a circle
  )
(defmethod intersects? ((a circle) (b square))
  ;; refer the honourable member to the answer I gave earlier
  (intersects? b a))
(defmethod intersects? ((a circle) (b circle))
  ;; another fast special case test for two circles
  )

Indeed, in CLOS it is even possible to have a specialization on a particular instance of a class.

(setf whole-screen (make-instance 'square))
(setf (slot-value whole-screen 'centre-x) 500)
(setf (slot-value whole-screen 'centre-y) 500)
(setf (slot-value whole-screen 'width)    500)
;; (there are less verbose ways of doing this)

(defmethod intersects? ((a (eql whole-screen)) (b shape))
  t                                     ; all valid shapes intersect the screen
  )

This flexibility makes generic functions the mainstay of the Lisp approach to object orientation—a slightly different slant than C++'s emphasis on encapsulation.

Of course, there are disadvantages to this approach. The algorithm used by CLOS to determine which version of a generic function to invoke is complicated, particularly in the presence of multiple inheritance. However, the fact remains that Lisp has a more powerful approach in this area—and similar mechanisms have been suggested for inclusion into C++ (see section 13.8 of Stroustrup's "The Design and Evolution of C++").

Generic Programming: Implicit and Explicit Interfaces

The most important addition to the C++ language since its debut is the support for templates. Templates have proved to be an extraordinarily powerful tool, allowing a huge amount of re-use of carefully written code.

The key advantage that templates have over encapsulating code in a base class (cf. Java's Object class) is that they have an implicit rather than explicit interface.

To illustrate this, consider a function which spins through an array of an arbitrary type, looking for a particular matching entry. To perform the comparison test on a particular entry, an equiv member function is required. In pre-template C++ (or in Java), we might produce:

class Comparable {
public:
  virtual bool equiv(const Comparable&) const = 0;
};

// an example of a concrete implementation of this abstract
// base class
class ComparableInt : public Comparable {
  int x;
public:
  ComparableInt(int xx) : x(xx) {}
  virtual bool equiv(const Comparable& other) const
  {
    // the override must match the base class signature exactly, so the
    // dynamic type of the other object has to be checked by hand
    const ComparableInt *p = dynamic_cast<const ComparableInt*>(&other);
    return (p != 0 && x == p->x);
  }
};

// one version of the code, regardless of what type the
// objects are (an array of pointers, since Comparable is abstract
// and objects of the derived classes may differ in size)
bool find_entry(const Comparable* const *pList, int numInList,
                const Comparable& find)
{
  for (int ii=0; ii<numInList; ii++)
    if (pList[ii]->equiv(find))
      return(true);
  return(false);
}

This approach requires anything passed to the find_entry function to be a type derived from Comparable. That is, anything passed into find_entry has an explicit interface—namely that specified by the Comparable abstract base class (in Java, such abstract base classes are sensibly called interfaces).

The templatized version of this function is rather simpler:

template <class T>
bool template_find_entry(T *pList, int numInList, const T& find)
{
  for (int ii=0; ii<numInList; ii++)
    if (pList[ii].equiv(find))
      return(true);
  return(false);
}

Here, the template_find_entry function can be used for any type T which has an equiv member function. The type T does not have to derive from any specific class; it just has an implicit interface: if the type fits, use it.

The Ruby and Python folks refer to this idea as duck typing: if it walks like a duck and quacks like a duck, then it might as well be a duck.

For this simple example, it is difficult to see the advantage of the templatized approach. However, once the size of the interface gets larger, the advantage becomes more distinct. If the Comparable base class had 15 pure virtual functions, then any type passed into find_entry would have to implement all 15 of those functions. But for template_find_entry, the type T still only needs to provide an equiv member function, and no others, in order to compile and work correctly.

This approach leads to the enormous power of the standard template library, where a few conventions about consistent naming of parts lead to a large collection of interoperable, interchangeable, efficient containers, iterators and algorithms.
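
For instance (a small illustration of my own, not part of the original text), std::find needs only iterators that can be incremented, dereferenced and compared, plus an element type supporting operator==:

#include <algorithm>
#include <list>
#include <string>
#include <vector>

bool contains_value(const std::vector<int>& v, int wanted)
{
  return std::find(v.begin(), v.end(), wanted) != v.end();
}

bool contains_name(const std::list<std::string>& names, const std::string& wanted)
{
  // same algorithm, entirely different container and element type
  return std::find(names.begin(), names.end(), wanted) != names.end();
}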

However, this templatizing approach is a solution for a problem which simply does not arise in Lisp. Lisp is dynamically typed (or latently typed) rather than statically typed (or manifestly typed) like C++.

What this means is that in Lisp, variables do not have types, only values do. That is, a variable myvar can have a value which is an integer at one point in the program, and a value which is a string later in the execution of the program (many scripting languages have the same behaviour).

The "implicit interface" model described above falls out for free as an immediate consequence of this. The algorithm above can be might be written in Lisp as follows (substituting recursion for iteration).

(defun find-entry (list-of-things thing-to-find)
  (dolist (ii list-of-things)              ; iterate ii through the list
          (if (equiv ii thing-to-find)     
              (return-from find-entry t))) ; found it, return t (true)
  nil)                                     ; not found, return nil (false)
;; Note: in Lisp, this would more commonly be done by recursion than iteration

In this code, there is no mention of types. The equiv function is called for each test with the values found in list-of-things and with thing-to-find. If the values which are passed to the equiv function are of a type which cannot be dealt with (for example, if one value is a string and the other is a complex number) then this will be dealt with at runtime—either by signalling an error, or by quietly returning false for any such comparison. As for the templatized C++ version, if the code as written works with the things passed to it, no more work on the part of the programmer is required.

Generic programming involves writing your algorithms in such a way that they rise above the concerns of specific data types. This is a big deal in a statically-typed language such as C++, but in a dynamically-typed language like Lisp, all code can be written this way.

Function Objects

The Evolution of Function Objects

One of the key features of C++ which I use extensively is the ability to pass around function objects (also known as functors).

These are an invaluable solution to a number of scenarios. Suppose you need to perform some operation on a piece of data, but you won't know exactly which operation until runtime. For example, sometimes you might need to apply the "add 2 to the given integer" operation, and sometimes you might need to apply the "multiply the given integer by 3" operation.

For purely procedural languages such as FORTRAN and Pascal, the only way to deal with this situation was to pass around some form of identifier for the operation to be performed, and have an operate() function which contained a big switch statement to deal with all of the possibilities, decided in advance.
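
A sketch of what that looks like in practice (the identifiers here are hypothetical):

typedef enum { OP_ADD_TWO, OP_TRIPLE } operation_id;

int operate(operation_id op, int i)
{
  switch (op) {                  /* every possible operation decided in advance */
  case OP_ADD_TWO: return i + 2; /* "add 2 to the given integer" */
  case OP_TRIPLE:  return i * 3; /* "multiply the given integer by 3" */
  default:         return i;
  }
}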

The C programming language provided a great leap forwards, by allowing the use of function pointers. Programmers could write code like this:

typedef int (*operate_on_int_fn)(int);

void f(operate_on_int_fn pFn, int i)
{
  int j = (*pFn)(i);
}

But this wasn't quite enough. What if you needed to apply the "add N to the given integer" operation, where N is only determined at run-time?

Of course, this is possible in C. All things are possible, except skiing through a revolving door. However, the alternatives are ugly. If we want to continue using the previous declaration of operate_on_int_fn, then we need a global variable, set up in advance:

int N_to_add_with = 0;
int add_N_fn(int i)
{ 
  return(i+N_to_add_with);
}

N_to_add_with = 3;
f(add_N_fn, 7);

Alternatively, if we adjust the declaration of operate_on_int_fn to take an extra parameter (even when it might not always be needed) we can use:

typedef int (*operate_on_int_fn_with_param)(int i, int N);

/* example of a function which needs the extra parameter */
int add_N_fn(int i, int N)
{ 
  return(i+N);
}

/* example of a function which does not use the extra parameter */
int do_nothing_operator(int i, int N /*unused*/)
{
  return(i);
}

void f(operate_on_int_fn_with_param pFn, int N, int i)
{
  int j = (*pFn)(i, N);
}

f(add_N_fn, 7, 3);
f(do_nothing_operator, 7, 0 /*value irrelevant*/);

Both of these alternatives are ungainly, and are likely to be difficult to maintain.

The C++ equivalent using a function object is much cleaner:

class Operator {
public:
  virtual int operator()(int i) const = 0;
};

class AddNOperator : public Operator {
  int N;
public:
  AddNOperator(int n) : N(n) {}
  virtual int operator()(int i) const {return i+N;}
};

void f(const Operator &Op, int i)
{
  int j = Op(i);
}

f(AddNOperator(3), 7);

This version illustrates all of the code required; for real code, however, we can use the C++ standard library's templatized base class std::unary_function instead of the hand-rolled Operator class, as follows:

#include <functional>

template <int N>
struct AddNOperator : public std::unary_function<int, int> {
  int operator()(int i) const {return i+N;}
};

template <class T>
void f(const T &op, int i)
{
  int j = op(i);
}

f(AddNOperator<3>(), 7);

With this version of the code, we can now easily add new operators that can be passed to the function f.
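
For example, a further (hypothetical) operator slots straight in, with no change to f:

template <int N>
struct MultiplyByNOperator : public std::unary_function<int, int> {
  int operator()(int i) const {return i*N;}
};

f(MultiplyByNOperator<3>(), 7);  // multiplies rather than adds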

This paradigm of treating chunks of code as another piece of data to be manipulated and passed around is built into the very heart of Lisp. Lisp treats code and data on an equivalent level, by the use of lambda expressions.

The following Lisp code defines a simple function which returns its argument plus one.

(defun addone (x) (+ 1 x))

With this definition, the expression (addone 5) will return 6.

However, we can also get the same result with:

(setf (symbol-function 'addone) #'(lambda (x) (+ 1 x)))

That is, the value (in a function context) of the addone variable is a nested list, where the first entry in the top-level list is the special marker symbol lambda, the second entry is the list of arguments, and the third entry is the body of the function.

As a consequence of this, the scenarios described in the C++ examples above become:

(defun f (op i)
  (funcall op i))               ; call the functional value of op with arg i and return the result

(f 'addone 7)                   ; returns 8
(defun addn (n)                 ; returns a lambda expression
  (lambda (x) (+ x n)))
(f (addn 3) 7)                  ; returns 10

Here, addn is defined as a function which returns a nested list, the first element of which is the lambda marker. That is, (addn n) returns a function which adds n to its argument. The funcall function treats its first argument as a function (or the name of one), and calls it with the remaining parameters as arguments.

Obviously, this mechanism is more powerful than the function objects available in C++. If your code is sophisticated enough, you can build up any function object at all at runtime, rather than just the parametrized function objects that you have decided upon at compile time. This uniformity of Lisp's treatment of code and data is one of the keys to its strengths.

However, it is worth pointing out that it is unusual for this level of flexibility to be required.

Extending the Language

The final example concerns one of the key features of C++, namely its ability to allow a programmer to extend the language itself by the use of overloaded operators. This facility has been present in C++ since the beginning, and is one of the key differentiating features between C++ and other object-oriented programming languages.

A canonical example of the use of this feature is for complex numbers. Unlike FORTRAN, C++ currently has no built-in support for complex numbers; however, in C++ it is easy to add support for complex numbers in a manner which is transparent for future programmers. This support is even included in the C++ standard library, in the header file <complex>.

However, I include the bare bones of an illustrative equivalent here:

class mycomplex {
  double re, im;
public:
  mycomplex(double r=0, double i=0) : re(r), im(i) {}
  double real() const {return re;}
  double imag() const {return im;}
  
};
inline mycomplex operator+(const mycomplex &c1, 
                           const mycomplex &c2) 
{return mycomplex(c1.real() + c2.real(), 
                  c1.imag() + c2.imag());}

This class allows the transparent use of complex numbers in a program as follows:

#include "mycomplex.h"

mycomplex c1(0,1);
mycomplex c2(1,0);
mycomplex c3;

c3 = c1 + c2;

This is one of the simplest examples; there are many other examples of overloading operators to extend the language itself (such as the iostream library's use of the << and >> operators, and various "smart pointer" implementations).
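
To give a flavour of the iostream case, a minimal inserter for the mycomplex class above might look like this (an illustrative sketch of my own, not the standard library's version):

#include <iostream>

inline std::ostream& operator<<(std::ostream& os, const mycomplex& c)
{
  return os << "(" << c.real() << "," << c.imag() << ")";
}

std::cout << c3 << std::endl;    // prints (1,1)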

Lisp has a considerable advantage for allowing extension of the language, in that it is (at least theoretically) an interpreted language. In fact, such extensions of the programming language itself are an area in which Lisp excels—one of the common uses for Lisp is to prototype new programming languages.

To understand this facility, we need to return to the explanation of how the Lisp interpreter works from the first section. There, we noticed that there were special forms which allowed the normal rules for evaluation of expressions to be altered.

Lisp allows the programmer to bridge the gap between defining a function and adding a new special form through the macro facility—which is similar to but more sophisticated than C's macro preprocessor.

To illustrate this we start simply, with a macro which just squares its argument.

(defmacro sqr (x) (list '* x x))

Exactly as in C/C++, this macro form behaves differently from the equivalent version expressed as a function

(defun sqr-f (x) (* x x))

in that the argument will be evaluated twice for the macro, but only once for the function.
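
The C equivalent of this pitfall, for comparison (a sketch of my own):

#define SQR(x) ((x) * (x))          /* macro: pure textual substitution */

int sqr_f(int x) { return x * x; }  /* function: argument evaluated once */

/* With i == 3:
 *   sqr_f(i++)  evaluates i++ once: returns 9, and i becomes 4
 *   SQR(i++)    expands to ((i++) * (i++)): i++ appears twice,
 *               and the behaviour is in fact undefined
 */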

Under the covers, the interpreter will deal with an expression like (sqr x) by replacing the expression with the body of the macro, and re-evaluating the new expression.

To increase the utility of macros, you need to take advantage of the backquote. As for the quote special form described above, the backquote prevents its argument from being evaluated. However, anything in a backquoted expression which is prefixed with a comma will be evaluated—so if x has value 1, then `(x is ,x) will yield (x is 1). Further, if something in a backquoted expression is prefixed with ,@ then that thing will be evaluated, assumed to be a list, and unravelled. Hence:

(setf xl '(x y z))
`(xl has entries ,@xl)

will yield (xl has entries x y z).

The net of all this is that it is easy to add new control structures to the language—or to prototype entire new languages. For example, to add a while control structure, we just need:

(defmacro while (test &rest body)       ; &rest is like ... in C - body gets set
  `(do ()                               ; to all remaining parameters, in a list
       ((not ,test))
       ,@body))

When the Lisp interpreter encounters the while in:

(setf ii 0)
(while (< ii 10)
  (princ ii)                            ; print out ii
  (incf ii))                            ; increment ii

it first notes that test is (< ii 10) and body is ((princ ii) (incf ii)). Then it transforms the expression into (do () ((not ,test)) ,@body). Finally, the evaluation of the comma-prefixed terms converts this to (do () ((not (< ii 10))) (princ ii) (incf ii)), which in turn is re-evaluated and behaves in the desired way.

The key difference here from the C preprocessor is that the macro writer has a choice between evaluating the macro arguments when the macro is expanded, or when the expanded version is itself evaluated.

One consequence of this flexibility is that most of the features of Lisp which involve non-standard evaluation patterns are themselves macros; there are only 25 or so completely built-in special forms in Lisp, but around four times as many standard macros.

The facilities in C++, even combined with obfuscated macro trickery, still do not approach this level of flexibility. Given how difficult it can be already to support C++, this may be a Good Thing. . .

Where Next?

My intent with this article has been to show C++ programmers that there are other approaches and points of view which can be useful and interesting. Some of these alternatives have influenced the development of C++ itself; some of them may yet influence the future development of C++.

My own personal candidate for a feature from Lisp which could be usefully employed in C++ is automatic garbage collection. Hunting memory leaks has long been a traditional C/C++ programmer's pastime for whiling away winter evenings. It's past time for this to change, and Java has been leading the way in this direction.

It is also possible to combine the flexibility and convenience of Lisp with C/C++'s efficiency and type safety by using the two languages in parallel.

In this manner, a C/C++ program can provide the core functionality for a program in an efficient manner, while still allowing the user to extend the software with their own scripts and enhancements.

An example of this is Guile, which is now the official GNU extension language. Guile provides an interpreter for Scheme (a dialect of Lisp) as a shared library, together with a comprehensive interface for calling functions and passing data between C and Scheme. GNU Emacs is another famous example of such a symbiosis.
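
As a rough sketch of what such a symbiosis can look like with Guile (the function names here are from the libguile C interface as I remember it, and may differ between Guile versions):

#include <libguile.h>

/* a C function exported to Scheme as (add-n n i) */
static SCM add_n(SCM n, SCM i)
{
  return scm_from_int(scm_to_int(n) + scm_to_int(i));
}

int main(void)
{
  scm_init_guile();                                        /* start the interpreter */
  scm_c_define_gsubr("add-n", 2, 0, 0, (scm_t_subr) add_n);

  /* users could just as well load their own extension scripts here */
  SCM result = scm_c_eval_string("(add-n 3 7)");
  return (scm_to_int(result) == 10) ? 0 : 1;
}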

Programming language comparisons are usually pointless exercises; however, there is no reason not to take the best of both worlds.

Further Reading and Acknowledgements

The definitive reference for any serious Lisp programmer is "Common Lisp: The Language" by Guy L. Steele. It's not for beginners though, who'd be better off with the compact "ANSI Common Lisp" by Paul Graham.

To understand the internals of Lisp-like languages (specifically the Scheme variant of Lisp), Abelson & Sussman's classic "Structure and Interpretation of Computer Programs" has a fascinating exposition of how to build your own Scheme interpreter.

Skilled Lisp programmers will notice that not everything in this article is absolutely accurate. However, I judged it more helpful to skip over details which do not assist understanding of the basic principles.

Garbage collection is discussed in section C.9.1 of Stroustrup's "The C++ Programming Language" (3rd edition); also, check out this C/C++ garbage collector.

The Guile home page is available at the FSF website.

I would like to thank Franz Lisp Inc. for making their excellent Lisp product available for free download for personal use.

I'd also like to thank Neil Jerram and Lee Griffiths for offering helpful comments on earlier drafts of this article, and Scott Meyers for originally explaining to me the implicit interface view of template programming.

Copyright (c) 2000 David Drysdale
