SECD Virtual Machine 


An interesting approach to programming language implementation is 
to compile programs of a language into the code of a virtual machine. 
A virtual machine is not a real, hardware machine. It has its own machine 
instructions, and eventually it needs to be implemented on a real machine. 

The idea of compiling a language into a virtual machine is very old. 
However, more recently this idea has become popular in some object-oriented
languages such as Java, where a program is compiled into what is called 
byte code, the language of a virtual machine called the 
Java Virtual Machine (JVM). The purpose of the JVM 
is portability. If you want your computer (any computer) to run Java programs,
all you need to do is implement an interpreter for the byte code.
This is especially important for web applications, for which one basically
cannot assume what type of machine will run the code. 

Although the technical details in this lecture are not directly
related to the Java Virtual Machine, the idea is important.


1. SECD Architecture

Purpose: 
   To execute the compiled code on an abstract machine.

Difference from an interpreter: 
   The code is compiled once and can then be executed many times;
   code optimization is also possible. 

The SECD machine is built using four stacks:

s    Stack used for evaluation of expressions
e    Environment used to store the current value list
c    Control used to store the instructions
d    Dump used to store suspended invocation context.


The main operations that are supported by SECD machine are:

OP      Description                Purpose
______________________________________________________________
NIL     push a nil pointer
LD	load from the environment /* get a value from context*/
LDC     load constant
LDF     load function             /* get a closure */

AP 	apply function
RTN     return                    /* restore calling env */

SEL     select in if statement
JOIN    rejoin main control       /* used with SEL */

RAP     recursive apply  
DUM     create a dummy env        /* used with RAP */
_______________________________________________________________

Plus built-in functions such as +, -, *, ATOM, CAR, CONS, EQ, etc.
(Some examples below write + as ADD and - as SUB.)



Each operation is defined by its effect on the four stacks:

      s e c  d -> s' e' c' d'

Each stack is represented by an s-expression in dot notation, where
the leftmost position denotes the top of the stack. 



A.  Push Objects onto the Stack


Compilation Rules:
    (a) A nil is compiled to (NIL);
    (b) A number (or a constant) x is compiled to (LDC x);
    (c) An identifier is compiled to (LD (i.j))
        where (i.j) is an index into stack e.


Stack Operations:

NIL	s e (NIL.c) d        ->  (nil.s) e c d
LDC	s e (LDC x.c) d      ->  (x.s) e c d
LD 	s e (LD (i.j).c) d   ->  (locate((i.j),e).s) e c d

"Locate" is an auxiliary function. It returns the jth element of 
the ith sublist in e. 

Note: roughly, e is a list of sublists, each of which is a list of 
      actual parameters. Thus, e corresponds to the value list in
      our interpreter. There is no name list here, since each occurrence
      of a formal parameter is compiled to LD (i.j), and
      locate((i.j),e) finds the corresponding actual parameter.

      More details will be given soon.
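A minimal Python sketch of locate, assuming e is represented as a Python list of lists (leftmost sublist first) and that both indices are 1-based, as in the text. The example environment is the one used in section B below.

```python
def locate(index, e):
    """Return the jth element of the ith sublist of e (both indices 1-based)."""
    i, j = index
    return e[i - 1][j - 1]


# Assumed representation: e as a Python list of frames, leftmost (most recent) first.
e = [[1, 3], [4, [5, 6]]]
print(locate((2, 2), e))  # [5, 6]
print(locate((1, 1), e))  # 1
```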
      

B.  Built-in Functions

Compilation Rule: 
  A built-in function call of the form
        (OP e1 ... ek)
is compiled to 
        ek' || ... || e1' || (OP)
where A||B means append(A,B) and ei' is the compiled code for ei.

This is reverse Polish (postfix) notation, except that the arguments
are listed in reverse order, so that the value of e1 ends up on top
of s when OP is executed.


Example: 
    (* (+ 6 2) 3) is compiled to (LDC 3 LDC 2 LDC 6 + *).
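The compilation rule for built-ins can be sketched in Python as follows. The representation is an assumption made for illustration: a constant is a Python int, a built-in call (OP e1 ... ek) is a tuple ("OP", e1, ..., ek), and compiled code is a flat Python list. The name compile_builtin is hypothetical.

```python
def compile_builtin(expr):
    """Compile constants and nested built-in calls into SECD code (a flat list)."""
    if isinstance(expr, int):
        return ["LDC", expr]               # a constant x -> (LDC x)
    op, *args = expr
    code = []
    for arg in reversed(args):             # ek' || ... || e1'
        code += compile_builtin(arg)
    return code + [op]                     # ... || (OP)


print(compile_builtin(("*", ("+", 6, 2), 3)))
# ['LDC', 3, 'LDC', 2, 'LDC', 6, '+', '*']
```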

To perform an operation, the machine pops the top element(s) off s,
performs the operation, and pushes the result back onto s.

For a unary operator OP,
       (a.s) e (OP.c) d   ->   ((OP a).s) e c d

For a binary operator OP,
       (a b.s) e (OP.c) d   ->   ((a OP b).s) e c d



Example. 
    s e (LDC 3 LDC 2 LDC 6 + *.c) d
-> (3.s) e (LDC 2 LDC 6 + *.c) d
-> (2 3.s) e (LDC 6 + *.c) d
-> (6 2 3.s) e (+ *.c) d
-> (8 3.s) e (*.c) d 
-> (24.s) e c d 



Example. 

Suppose we have e = ((1 3) (4 (5 6))). We will explain how we get such an e
later.

s	  ((1 3) (4 (5 6))) (LD (2.2) CAR LD (1.1) ADD.c) d   ->
((5 6).s) ((1 3) (4 (5 6))) (CAR LD (1.1) ADD.c)          d   ->
(5.s)     ((1 3) (4 (5 6))) (LD (1.1) ADD.c)              d   ->
(1 5.s)   ((1 3) (4 (5 6))) (ADD.c)                       d   ->
(6.s)     ((1 3) (4 (5 6))) c                             d
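Both traces above can be replayed by a small Python interpreter for LDC, LD and the built-ins used so far. This is a sketch under stated assumptions: each stack is a Python list whose index 0 is the top (matching the leftmost position in the dot notation), index pairs are tuples, + and ADD are treated as the same operation, and the dump d is omitted because none of these instructions touch it. The names run, UNARY and BINARY are hypothetical.

```python
import operator

# Binary built-ins: (a b.s) e (OP.c) d -> ((a OP b).s) e c d, with a on top.
BINARY = {"+": operator.add, "ADD": operator.add, "*": operator.mul}
# Unary built-ins: (a.s) e (OP.c) d -> ((OP a).s) e c d.
UNARY = {"CAR": lambda x: x[0]}


def locate(index, e):
    i, j = index
    return e[i - 1][j - 1]


def run(c, e, trace=False):
    """Execute straight-line code c in environment e; return the final stack s."""
    s = []
    while c:
        if trace:
            print(s, c)
        op, c = c[0], c[1:]
        if op == "LDC":                    # s e (LDC x.c) d -> (x.s) e c d
            s, c = [c[0]] + s, c[1:]
        elif op == "LD":                   # s e (LD (i.j).c) d -> (locate((i.j),e).s) e c d
            s, c = [locate(c[0], e)] + s, c[1:]
        elif op in UNARY:
            s = [UNARY[op](s[0])] + s[1:]
        elif op in BINARY:
            s = [BINARY[op](s[0], s[1])] + s[2:]
        else:
            raise ValueError(f"unsupported instruction: {op!r}")
    return s


print(run(["LDC", 3, "LDC", 2, "LDC", 6, "+", "*"], e=[]))                      # [24]
print(run(["LD", (2, 2), "CAR", "LD", (1, 1), "ADD"], e=[[1, 3], [4, [5, 6]]]))  # [6]
```

Passing trace=True prints each (s, c) pair before an instruction executes, which mirrors the rows of the traces above with s starting empty.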




C. The special function IF_THEN_ELSE

Compilation Rule: 
      (if e1 e2 e3) 
is compiled to 
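     e1' || (SEL) || (e2' || (JOIN)) || (e3' || (JOIN))
where (e2' || (JOIN)) and (e3' || (JOIN)) each appear in the code as a
single sublist rather than being spliced in, as the example shows.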

Example.  (if (atom 5) 9 7) is compiled to  
          (LDC 5 ATOM SEL (LDC 9 JOIN) (LDC 7 JOIN))


Stack Operations:
SEL 	(x.s) e (SEL ct cf.c) d   ->  s e c' (c.d)
                 where c' = ct if x is T, and cf if x is F
JOIN 	s e (JOIN.c) (cr.d)  ->  s e cr d

Example. 
s     e  (LDC 5 ATOM SEL (LDC 9 JOIN) (LDC 7 JOIN).c)   d ->
(5.s) e  (ATOM SEL (LDC 9 JOIN) (LDC 7 JOIN).c)         d ->
(T.s) e  (SEL (LDC 9 JOIN) (LDC 7 JOIN).c)              d ->
s     e  (LDC 9 JOIN)                               (c.d) ->
(9.s) e  (JOIN)                                     (c.d) ->
(9.s) e  c                                              d
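The IF rule and the SEL/JOIN transitions can be sketched together in Python. Assumptions, all for illustration only: stacks are Python lists with the top at index 0, T and F are modeled by Python True and False, a non-empty Python list plays the role of a cons cell for ATOM, and e is omitted because this example never reads it. The names compile_if, is_atom and run are hypothetical.

```python
def compile_if(c1, c2, c3):
    """e1' || (SEL) || (e2' || (JOIN)) || (e3' || (JOIN)), branches kept as sublists."""
    return c1 + ["SEL", c2 + ["JOIN"], c3 + ["JOIN"]]


def is_atom(x):
    # Assumption: a non-empty Python list is a cons cell; anything else is an atom.
    return not (isinstance(x, list) and len(x) > 0)


def run(c):
    """Execute c with empty s and d; return the final stack s."""
    s, d = [], []
    while c:
        op, c = c[0], c[1:]
        if op == "LDC":
            s, c = [c[0]] + s, c[1:]
        elif op == "ATOM":
            s = [is_atom(s[0])] + s[1:]
        elif op == "SEL":                  # (x.s) e (SEL ct cf.c) d -> s e c' (c.d)
            x, s = s[0], s[1:]
            ct, cf, rest = c[0], c[1], c[2:]
            d = [rest] + d                 # save the code that follows the if
            c = ct if x else cf
        elif op == "JOIN":                 # s e (JOIN.c) (cr.d) -> s e cr d
            c, d = d[0], d[1:]
        else:
            raise ValueError(f"unsupported instruction: {op!r}")
    return s


code = compile_if(["LDC", 5, "ATOM"], ["LDC", 9], ["LDC", 7])
print(code)       # ['LDC', 5, 'ATOM', 'SEL', ['LDC', 9, 'JOIN'], ['LDC', 7, 'JOIN']]
print(run(code))  # [9]
```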



D.  Nonrecursive Functions

Compilation Rule:

A lambda function (lambda plist body) is compiled to 
     (LDF) || (body' || (RTN))
where body' is the compiled code for body. As the example below shows,
(body' || (RTN)) appears after LDF as a single sublist.

Example. (lambda (x y) (+ x y)) is compiled to
               (LDF (LD (1.2) LD (1.1) + RTN))


We will explain the choice of LD (1.2) and LD (1.1) a little later.


A function application (e e1 ... ek) is compiled to
(NIL) || ek' || (CONS) || ... || e1' || (CONS) || e' || (AP)


Stack Operations:

LDF 	s e (LDF f.c) d            ->  ((f.e).s) e c d
AP  	((f.e') v.s) e (AP.c) d    ->  NIL (v.e') f (s e c.d)
RTN	(x.z) e' (RTN.q) (s e c.d) ->  (x.s) e c d


Example. ((lambda (x y) (+ x y)) 2 3) compiles to
      (NIL LDC 3 CONS LDC 2 CONS LDF (LD (1.2) LD (1.1) + RTN) AP)

When this is executed:

--    The code
           NIL LDC 3 CONS LDC 2 CONS 
      collectively pushes the list (2 3) onto the s stack.

--    LDF pushes a closure onto s: the compiled code of the body (the
      symbol f in the stack operations above) paired with the current
      environment e.

--    AP pops the closure and the argument list v, saves s, e and c
      on the dump, and executes the body (the code in f) in the
      extended environment (v.e'), where v is the list of (fully
      evaluated) actual parameters.


The reason for 
         LD (1.2) LD (1.1) + 

We are compiling (+ x y). According to the formal parameter list in 
(lambda (x y) ...), any occurrence of y in the body gets its actual
parameter from the second position in the parameter list; this is
why j = 2 in LD (1.2). When the body runs, the argument list (2 3) is
the most recently pushed sublist of e, i.e., the first one from the
left; this is why i = 1. The same reasoning gives LD (1.1) for x.



Example.  Let BODY below denote (LD (1.2) LD (1.1) + RTN).

s e (NIL LDC 3 CONS LDC 2 CONS LDF BODY AP.c)     d   ->
(nil.s) e (LDC 3 CONS LDC 2 CONS LDF BODY AP.c)   d   ->
(3 nil.s) e (CONS LDC 2 CONS LDF BODY AP.c)       d   ->
((3).s) e (LDC 2 CONS LDF BODY AP.c)              d   ->
(2 (3).s) e (CONS LDF BODY AP.c)                  d   ->
((2 3).s) e (LDF BODY AP.c)                       d   ->
((BODY.e) (2 3).s) e (AP.c)                       d   ->
NIL ((2 3).e) BODY                        (s e c.d)   =  
NIL ((2 3).e) (LD (1.2) LD (1.1) + RTN)   (s e c.d)   ->
(locate((1.2),((2 3).e)).NIL) ((2 3).e) (LD (1.1) + RTN) 
                                        (s e c.d) = 
(3.NIL) ((2 3).e) (LD (1.1) + RTN)        (s e c.d)   ->
(locate((1.1),((2 3).e)) 3.NIL) ((2 3).e) (+ RTN) (s e c.d)=   
(2 3.NIL) ((2 3).e) (+ RTN)               (s e c.d)   ->   
(5.NIL) ((2 3).e) (RTN)                   (s e c.d)   ->   
(5.s) e c d
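The trace above can be replayed with a Python sketch of NIL, LDC, LD, CONS, +, LDF, AP and RTN. Assumptions for illustration: stacks and environments are Python lists with the top (leftmost element) at index 0, Python lists stand in for proper Lisp lists (so CONS prepends), and a closure (f.e) is modeled as a Python tuple (f, e). The names run, BINARY and BODY are hypothetical.

```python
import operator

# Binary built-ins with a on top: (a b.s) e (OP.c) d -> ((a OP b).s) e c d.
BINARY = {
    "+": operator.add,
    "CONS": lambda a, b: [a] + b,          # Python lists stand in for proper lists
}


def locate(index, e):
    i, j = index
    return e[i - 1][j - 1]


def run(c):
    """Execute c starting from empty s, e and d; return the final stack s."""
    s, e, d = [], [], []
    while c:
        op, c = c[0], c[1:]
        if op == "NIL":
            s = [[]] + s
        elif op == "LDC":
            s, c = [c[0]] + s, c[1:]
        elif op == "LD":
            s, c = [locate(c[0], e)] + s, c[1:]
        elif op in BINARY:
            s = [BINARY[op](s[0], s[1])] + s[2:]
        elif op == "LDF":                  # s e (LDF f.c) d -> ((f.e).s) e c d
            s, c = [(c[0], e)] + s, c[1:]
        elif op == "AP":                   # ((f.e') v.s) e (AP.c) d -> NIL (v.e') f (s e c.d)
            (f, e_closure), v = s[0], s[1]
            d = [(s[2:], e, c)] + d
            s, e, c = [], [v] + e_closure, f
        elif op == "RTN":                  # (x.z) e' (RTN.q) (s e c.d) -> (x.s) e c d
            x = s[0]
            (s_saved, e, c), d = d[0], d[1:]
            s = [x] + s_saved
        else:
            raise ValueError(f"unsupported instruction: {op!r}")
    return s


BODY = ["LD", (1, 2), "LD", (1, 1), "+", "RTN"]
code = ["NIL", "LDC", 3, "CONS", "LDC", 2, "CONS", "LDF", BODY, "AP"]
print(run(code))  # [5]
```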




Since a let expression has the same semantics as an 
application, the compilation is essentially the same:

(let (x1 ... xk) (e1 ... ek) exp) 
is compiled to 
(NIL) || ek' || (CONS) || ... || e1' 
             || (CONS LDF) || (exp' || (RTN)) || (AP)
where exp' is the compiled code for exp.
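The let rule only assembles pieces that are already compiled, so it can be sketched as a Python function over code lists. The name compile_let is hypothetical; arg_codes holds e1' ... ek' in source order, body_code holds exp', and code is represented as Python lists with sublists for nested code. For (let (x y) (2 3) (+ x y)) it yields the same code as the application example in section D.

```python
def compile_let(arg_codes, body_code):
    """(NIL) || ek' || (CONS) || ... || e1' || (CONS LDF) || (exp' || (RTN)) || (AP)"""
    code = ["NIL"]
    for arg in reversed(arg_codes):        # ek' first, e1' last, each followed by CONS
        code += arg + ["CONS"]
    return code + ["LDF", body_code + ["RTN"], "AP"]


# (let (x y) (2 3) (+ x y)): inside the body, x is (1.1) and y is (1.2).
print(compile_let([["LDC", 2], ["LDC", 3]], ["LD", (1, 2), "LD", (1, 1), "+"]))
# ['NIL', 'LDC', 3, 'CONS', 'LDC', 2, 'CONS', 'LDF', ['LD', (1, 2), 'LD', (1, 1), '+', 'RTN'], 'AP']
```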




E.  Recursive Functions (Optional)

Compilation Rule:

     (letrec (f1 ... fk) (e1 ... ek) exp) 
is compiled to 
     (DUM NIL) || ek' || (CONS) || ... || e1' 
                      || (CONS LDF) || (exp' || (RTN)) || (RAP)

This is exactly the same as let, except for DUM and RAP. It is 
also like the application
	((lambda (f1 ... fk) exp) e1 ... ek)
but permits recursion.


Stack Operations:

DUM	s e (DUM.c) d  ->  s (W.e) c d
           where W has been called PENDING earlier
 
RAP  	((f.(W.e)) v.s) (W.e) (RAP.c) d  
		->  nil rplaca((W.e),v) f (s e c.d)


We know that each ei, a lambda function, is compiled to 
        (LDF (... code for the body of ei ... RTN))

As a whole, a letrec expression is compiled to something
that looks like the following code:
        (DUM NIL LDF (... code for the body of ek ... RTN) CONS
                 ......
                 LDF (... code for the body of e2 ... RTN) CONS
                 LDF (... code for the body of e1 ... RTN) CONS
                 LDF (... code for exp ... RTN) RAP)
	
When executed, DUM creates a dummy structure (W.e). The purpose 
is to have a dummy car part that will later be redirected to 
form a circular structure.

NIL in the compiled code above is used again to build a new 
sublist for the values corresponding to the names (f1 ... fk).  
For each actual parameter ei, LDF pushes a closure onto s, 
consisting of the code of ei and the environment (W.e) at that 
moment. Let Cei denote the code of ei and Ce the code of exp;
just before RAP, the stack S is then
        S = ((Ce.(W.e)) v.s)
where v is the argument list ((Ce1.(W.e)) ... (Cek.(W.e))).
The cdrs of all these closures point to the same cell (W.e):

	_______
      ->|  |  | -> e
	-------
         |
	 W

In executing RAP, the car of this cell, W, is replaced by v using
rplaca. This creates a circular structure: the environment of each
closure Cei is now the argument list v itself, followed by the old
environment e. This is how recursive calls are executed: if, while
executing Cei, there is a reference to fj (recall that an identifier
is compiled to the form (LD (n.j))), the index pair (n.j) leads to
the jth value in v (n identifies the position of v in the environment).
That value is a closure whose code is Cej and whose environment has v
as its car and e as its cdr, so Cej can in turn refer to f1 ... fk.
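To see the circular structure at work, here is a sketch of a compact SECD interpreter in Python covering the instructions needed here, including DUM and RAP, running a hand-compiled recursive factorial. Assumptions, all illustrative: stacks and environments are Python lists with the top (leftmost element) at index 0; a closure (f.e) is a Python tuple; T and F are Python booleans; and rplaca is modeled by assigning to index 0 of the list object created by DUM, which every closure built by LDF in that environment shares. The factorial program and the names run, program, FACT and PENDING are not part of the lecture.

```python
import operator

# Binary built-ins with a on top: (a b.s) e (OP.c) d -> ((a OP b).s) e c d.
BINARY = {
    "-": operator.sub,
    "*": operator.mul,
    "EQ": operator.eq,
    "CONS": lambda a, b: [a] + b,       # Python lists stand in for proper lists
}
PENDING = None                          # the dummy car W pushed by DUM


def locate(index, e):
    i, j = index
    return e[i - 1][j - 1]


def run(c):
    s, e, d = [], [], []
    while c:
        op, c = c[0], c[1:]
        if op == "NIL":
            s = [[]] + s
        elif op in ("LDC", "LD"):
            x = c[0] if op == "LDC" else locate(c[0], e)
            s, c = [x] + s, c[1:]
        elif op in BINARY:
            s = [BINARY[op](s[0], s[1])] + s[2:]
        elif op == "SEL":
            x, s = s[0], s[1:]
            d = [c[2:]] + d
            c = c[0] if x else c[1]
        elif op == "JOIN":
            c, d = d[0], d[1:]
        elif op == "LDF":                   # the closure keeps a reference to e itself
            s, c = [(c[0], e)] + s, c[1:]
        elif op == "AP":
            (f, env), v = s[0], s[1]
            d = [(s[2:], e, c)] + d
            s, e, c = [], [v] + env, f
        elif op == "RTN":
            x = s[0]
            (s, e, c), d = d[0], d[1:]
            s = [x] + s
        elif op == "DUM":                   # s e (DUM.c) d -> s (W.e) c d
            e = [PENDING] + e               # a fresh list object plays the cell (W.e)
        elif op == "RAP":                   # ((f.(W.e)) v.s) (W.e) (RAP.c) d -> ...
            (f, env), v = s[0], s[1]
            env[0] = v                      # rplaca: every closure sharing env now sees v
            d = [(s[2:], e[1:], c)] + d     # e[1:] is the old environment without W
            s, e, c = [], env, f
        else:
            raise ValueError(f"unsupported instruction: {op!r}")
    return s


# (letrec (fact) ((lambda (n) (if (eq n 0) 1 (* n (fact (- n 1)))))) (fact k))
# Inside fact's body, n is (1.1) and fact is (2.1); inside exp, fact is (1.1).
FACT = ["LDC", 0, "LD", (1, 1), "EQ", "SEL",
        ["LDC", 1, "JOIN"],
        ["NIL", "LDC", 1, "LD", (1, 1), "-", "CONS", "LD", (2, 1), "AP",
         "LD", (1, 1), "*", "JOIN"],
        "RTN"]


def program(k):
    return ["DUM", "NIL", "LDF", FACT, "CONS",
            "LDF", ["NIL", "LDC", k, "CONS", "LD", (1, 1), "AP", "RTN"], "RAP"]


print(run(program(5)))  # [120]
```

The design choice that matters is that LDF stores a reference to the environment list rather than a copy, so the single assignment in RAP is visible to every closure created after DUM.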




F.  Generating Indices for Identifiers

  Let's explain this by an example.

  Example. 
	((lambda (z) ((lambda (x y) (+ (- x y) z)) 3 5)) 6)

  In this example, x, y and z in the expression (+ (- x y) z) 
  are compiled respectively to

            (LD (1.1)), (LD (1.2)), (LD (2.1)) 
    
  This is because the indices are generated according to the order
  in which functions are called. In this case, the outer function
  (lambda (z) ...) is called first. The variable z is global to the
  inner function (lambda (x y) ...), so the binding of z to 6 is
  pushed onto e earlier. When the inner body runs, e begins with
  ((3 5) (6) ...): (3 5) is the first sublist and (6) the second,
  which gives x, y and z the indices (1.1), (1.2) and (2.1).


  Example.  Compiling 
      ((lambda (z) ((lambda (x y) (+ (- x y) z)) 3 5)) 6) 
  into SECD code

  (e e1 ... ek) (a function application) is compiled to
      (NIL) || ek' || (CONS) || ... || e1' || (CONS) || e' || (AP)

  Thus, we should have 
      (NIL) || (LDC 6) || (CONS) || e' || (AP)
  where e' is the compiled code for the function 
      (lambda (z) ((lambda (x y) (+ (- x y) z)) 3 5))

  This is a lambda function which should be compiled to 
      (LDF) || (body' || (RTN)). 
  Thus, we have  
      e' =  (LDF) || (body' || (RTN)) 
  where body' is the compiled code for 
      ((lambda (x y) (+ (- x y) z)) 3 5)
  i.e.,
      body' = (NIL) || (LDC 5) || (CONS) || (LDC 3) || (CONS) 
              ||  exp' || (AP)
  where exp' is the compiled code for (lambda (x y) (+ (- x y) z)),
  i.e.,
      exp' = (LDF) || (body1' || (RTN))  
  where
      body1' = (LD (2.1) LD (1.2) LD (1.1) SUB ADD)

  We then get the compiled code for the whole expression by 

  (a) substitute body1' into exp' to get

  exp' = (LDF 
            (LD (2.1) LD (1.2) LD (1.1) SUB ADD RTN)
         )

  (b) substitute exp' into body' to get

  body' = (NIL LDC 5 CONS LDC 3 CONS 
              LDF 
              (LD (2.1) LD (1.2) LD (1.1) SUB ADD RTN)
              AP
          )

  (c) substitute body' into e' to get

  e' =  (LDF  (NIL LDC 5 CONS LDC 3 CONS 
               LDF 
               (LD (2.1) LD (1.2) LD (1.1) SUB ADD RTN)
               AP
               RTN)
        )



  Thus, the original expression is compiled to 

  (NIL LDC 6 CONS LDF  
                   (NIL LDC 5 CONS LDC 3 CONS 
                     LDF 
                      (LD (2.1) LD (1.2) LD (1.1) SUB ADD RTN) 
                     AP 
                    RTN) 
                  AP
  )
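The index-generation scheme can be sketched as a small Python compiler that carries a compile-time name list shaped like e: a list of frames, innermost (most recently bound) first. Entering a lambda pushes its parameter list as a new frame, so an identifier's (i.j) is the position of the first frame that contains it and its position inside that frame. The tuple-based syntax, the names comp and index_of, and the mapping of + and - to ADD and SUB are assumptions chosen to match the code above; only numbers, identifiers, built-ins, lambda and application are handled.

```python
BUILTINS = {"+": "ADD", "-": "SUB"}   # hypothetical source-to-mnemonic mapping


def index_of(name, names):
    """(i, j): i-th frame from the left (innermost first), j-th name in it; 1-based."""
    for i, frame in enumerate(names, start=1):
        if name in frame:
            return (i, frame.index(name) + 1)
    raise NameError(f"unbound identifier: {name}")


def comp(expr, names):
    if isinstance(expr, int):                       # constant -> (LDC x)
        return ["LDC", expr]
    if isinstance(expr, str):                       # identifier -> (LD (i.j))
        return ["LD", index_of(expr, names)]
    head, *args = expr
    if head == "lambda":                            # (LDF) || (body' || (RTN))
        params, body = args
        return ["LDF", comp(body, [list(params)] + names) + ["RTN"]]
    if head in BUILTINS:                            # ek' || ... || e1' || (OP)
        code = []
        for arg in reversed(args):
            code += comp(arg, names)
        return code + [BUILTINS[head]]
    code = ["NIL"]                                  # (NIL) || ek' || (CONS) || ... || e' || (AP)
    for arg in reversed(args):
        code += comp(arg, names) + ["CONS"]
    return code + comp(head, names) + ["AP"]


# ((lambda (z) ((lambda (x y) (+ (- x y) z)) 3 5)) 6)
expr = (("lambda", ("z",),
         (("lambda", ("x", "y"), ("+", ("-", "x", "y"), "z")), 3, 5)),
        6)
print(comp(expr, []))
# ['NIL', 'LDC', 6, 'CONS', 'LDF', ['NIL', 'LDC', 5, 'CONS', 'LDC', 3, 'CONS', 'LDF',
#  ['LD', (2, 1), 'LD', (1, 2), 'LD', (1, 1), 'SUB', 'ADD', 'RTN'], 'AP', 'RTN'], 'AP']
```

With + mapped to ADD, compiling (lambda (x y) (+ x y)) in the same way gives the code of section D with ADD in place of +.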