Saturday, March 29, 2008

Concepts, Techniques and Models of Computer Programming

In the book Concepts, Techniques and Models of Computer Programming (CTM), Peter van Roy and Seif Haridi build a programming language based on a very small kernel language. The kernel language is extended as the models needs become more complex but it remains simple and understandable. The language used in the book is called Oz and it is implemented by the Mozart Programming System.

The models are described by three different views:
  • The computation model; the language and how sentences of the language are executed by the abstract machine.
  • The programming model; the techniques and design principles used to write programs.
  • Reasoning techniques; techniques for reasoning about the program to increase confidence that they work as expected.

The Kernel Language

The full kernel language, supporting the models is shown below.

<Statement> ::=

% Statements needed in declarative model
    <Statement1> <Statement2>           % Statement sequence
    | X = <number>|<atom>|<boolean>     % Variable and value

    | X = f(I1:Y1 ... In:Yn)|           % Record and tuple
    | X = Y                             % Variable variable binding
    | local X1 ... Xn in S1 end         % Variable declaration

    | proc {X Y1 ... Yn} S1 end         % Procedure declaration
    | {X Y1 ... Yn}                     % Procedure application (call)
    | if B then S1 else S2 end          % Conditional

    | case X of pat then S1 else S2 end % Pattern matching
    | {NewName X}                       % Creates a new locally unique name


% Statements needed in declarative model with exceptions
    | try S1 catch X then S2 end        % Try a statement and catch the possible exception
    | raise X end                       % Raise an exception


% Statements needed in concurrent declarative model
    | thread S1 end                     % Run statement in new thread.

% Statements needed in stateful model
    | {NewCell I C}                     % Creates a cell C with content I
    | {Exchange C Old New}              % Replaces the value of C with New; Old is returned




The Practical Language

Apart from extending the kernel language, the practical language of Oz is also extended by adding linguistic abstractions and syntactic sugar.

A linguistic abstraction is a new linguistic construct such as a function fun instead of a proc. This construct is both an abstraction and an addition to the language.

fun {Max X Y}
    if X>=Y then X else Y end

end

% Translates into:
proc {Max X Y ?R}
    R = if X>=Y then X else Y end

end

Syntactic sugar on the other hand does not involve a new abstraction. It simply allows us to write the same statements in a more convenient and readable way An example is:

if N==1 then [1]
else
    local L in

    ...
    end
end

% Translates into:
if N==1 then [1]
else L in

    ...
end

Both are added by means of transformations into the kernel language. While building the language all the transformations are shown in detail making it very clear what is going on.

The Declarative Model

The basic model of CTM is called the declarative model. Van Roy and Haridi defines a declarative model as evaluating functions over partial data structures. It covers both functional programming as in Scheme without state and logic programming as in Prolog without search.

The Abstract Machine

The abstract machine of the declarative model consists of: A single-assignment store, a value-store extended with single-assignment or dataflow variables. An environment, a mapping from variable identifiers to entities in the store. A semantic stack, containing pairs of statements in execution and the associated environment.

When a program is executing different values are added to the stack, the store and the environment according to the specified semantics of the statement. A thorough walk-through is done in the book.

Dataflow Variables

The single-assignment store may contain variables that are unbound, dataflow variables. This means that they can be used as out-parameters from procedures and partial values in records. It is also possible to bind two variables to each other. When one becomes bound then all variables see that binding.

When an unbound variable is accessed before it is bound execution waits until the variable is bound and then continues. This will enable both concurrent declarative programming and relational programming later in the book.

Practical Language

To make the language practical several syntactic conveniences are added to the kernel language:
  • Nested partial values: person(name:"George" age:25) instead of local A B in A="George" B=25 X=person(name:A age:B) end
  • expr1 andthen expr2 translates into if expr1 then expr2 else false end
  • expr1 orelse expr2 translates into if expr1 then true else expr2 end
  • if and case statements can be nested concisely.
  • Statements can be turned into expressions using a nesting marker $: proc {name} skip end translates into name = proc {$} skip end
  • Records are written animal(name:”Tapir” age:77)
  • Tuples are records with integer values animal(1:”Tapir” 2:77) may be written as animal(“Tapir” 77)
    • Anonymous tuples are commonly written with infix notation with # as the separator
    • ’#’ (“Tapir” 77) equals “Tapir”#77
  • Lists are either the atom nil or a tuple ’|’ (H T).
    • In infix that is H|T.
    • 1 | 2 | 3 | nil may be abbreviated [1 2 3]

Procedures are selected over functions since they are more flexible. They allow multiple outputs or none. In combination with the strong support for records, procedures will later enable modules and object oriented programming through simple transformations.

When adding functions as linguistic abstractions, notice how the dataflow variable R is unbound until inside the procedure. This is impossible if using a normal value store where the variables are bound at definition time.

% Functions as linguistic abstractions
fun {Max X Y}
    if X>=Y then X else Y end

end

% Translates into:
proc {Max X Y ?R}
    R = if X>=Y then X else Y end

end

Loops as linguistic abstractions (definitions of the procedures inf Programming Techniques below):

% The integer loops
for I in A..B do statements end

% and
for I in A..B;S do statements end

% translates into (with S=1 in the first case)
{For A B S proc {$ I} statements end}



% The list loop
for X in L do statements end

% translates into
{ForAll L proc {$ X} statements end}


Programming Techniques

Since the declarative model is similar to the functional (stateless) model, standard functional programming techniques such as higher order functions and recursion is used.

Iteration

Iteration is done with recursion:

% Standard iteration
fun {Iterate S IsDone Transform}
    if {IsDone S} then S
    else S1 in

        S1={Transform S}
        {Iterate S1 IsDone Transform}
    end
end

% A list loop
proc {ForAll L P}
    case L
    of nil then skip
    [] X|L2 then

        {P X}
        {ForAll L2 P}
    end
end

% An integer loop
proc {For A B S P}
    proc {LoopUp C}
            if C=<B then {P C} {LoopUp C+S} end

        end
    proc {LoopDown C}
        if C>=B then {P C} {LoopDown C+S} end
    end

in
    if S>0 then {LoopUp A} end
    if S<0 then {LoopDown A} end

end

Difference Lists

Another technique that is enabled by choosing single-assignment variables is programming with difference lists. A difference list is a 2-tuple where the first element is a list and the second element is a tail of the first list.

  • nil#nil % Represents the empty list
  • [a]#[a] % so does this
  • X#X % and this
  • (a|b|c|X)#X % Represents [a b c]
  • (a|b|c|d|X)#(d|X) % so does this
  • [a b c d]#[d] % and this

Notice how unbound variables can be placed at the end of the lists. This enables some very cool features. To append (a|b|c|X)#X and (d|e|f|Y)#Y, just bind X to (d|e|f|Y). This creates the difference list (a|b|c|d|e|f|Y)#Y. We have just appended the lists [a b c] and [d e f] with a single binding. Here is a function that appends any two difference lists:

% Function to append to difference lists
fun {AppendD D1 D2}
    S1#E1=D1
    S2#E2=D2
in
    E1=S2
    S1#E2
end

% It can be used like this (Browse means evaluate in Oz)
local X Y in {Browse {AppendD (1|2|3|X)#X (4|5|Y)#Y}} end

Constant time append versus linear time of the first list with a value store. Nice!

Abstract Data Types

Simple implementations of ADT leak implementation details to the user. This makes it possible to preform unsafe operations.

% Simple implementation of a stack.
fun {NewStack} nil end
fun {Push S E} E|S end
fun {Pop S E} case S of X|S1 then E=X S1 end end

fun {IsEmpty S} S==nil end

One way to avoid this is by using keys for accessing the ADT. Oz introduces the concept of names, created with {NewName}, to create tokens that are unique within the system. {NewName} is not declarative, since it returns a new value every time it is called, but programs using it may still behave declaratively.

% A procedure for creating a wrap/unwrap pair, for protecting values.
proc {NewWrapper ?Wrap ?Unwrap}
    Key={NewName}
in
    fun {Wrap X}
        fun {$ K} if K==Key then X end end

    end
    fun {Unwrap W} {W Key} end
end

% Protect the value
ProtectedValue={Wrap Value}

% Retrieve the value
Value={Unwrap ProtectedValue}


% A secure version of the stack

local Wrap Unwrap in
    {NewWrapper Wrap Unwrap}
    fun {NewStack} {Wrap nil} end
    fun {Push S E} {Wrap E|{Unwrap S}} end

    fun {Pop S E}
        case {Unwrap S} of X|S1 then E=X {Wrap S1} end
    end
    fun {IsEmpty S} {Unwrap S}==nil end

end

Programming in the large

Oz supports structuring programs by modules and functors. A module is implemented as a record with procedures. Only the interface is put into the export record.

declare MyList in
local
    proc {Append ... } ... end
    proc {MergeSort ...} ... end

    proc {Sort ... } ... {MergeSort ...} ... end
    proc {Member ...} ... end
in
    MyList=´export´(append:Append sort:Sort member:Member)

end

Using procedural abstraction this module can be turned into a software component or functor.

fun {MyListFunctor}
    proc {Append ... } ... end

    proc {MergeSort ...} ... end
    proc {Sort ... } ... {MergeSort ...} ... end
    proc {Member ...} ... end

in
    ´export´(append:Append sort:Sort member:Member)
end

Every time MyListFunctor is called it creates and returns a new MyList module.
  • A functor definition can be evaluated at runtime, giving a functor.
  • A functor can have external references to other entities through closures.
  • A functor can have arguments, which are the other modules needed.
  • A functor is lightweight; it can be used to encapsulate a single entity.

A linguistic abstraction is added to the practical language.


functor
export
    append:Append
    sort:Sort
    member:Member
define
    proc {Append ... } ... end
    proc {MergeSort ...} ... end

    proc {Sort ... } ... {MergeSort ...} ... end
    proc {Member ...} ... end
end

Apart from {NewName}, everything up to now has been declarative, that is without side-effects. To enable programming with functors we need to be able to load functors from multiple files. To do this we need IO.

Values in Oz can be stored and loaded with the Pickle module. Note that loading takes a URL and hence can be loaded over the network.

{Pickle.save X FN} % Save X in file FN
{Pickle.load FNURL ?X} % Load X from file (or URL) FNURL


And since procedures are values and functors procedures, they to can be saved and loaded. Since modules are so common additional functions exists in the Module module.

% Loads the MyList module into the variable MyListModule
MyListModule = Module.link [´MyList.ozf´]}

% System modules are loaded from a known path while others may be loaded explicitly.
functor
import
    Browser
    FO at ´file:///home/mydir/FileOps.ozf´
define

    {Browser.browse {FO.countLines ´/etc/passwd´}}
end

The Pickle module is also used when implementing distributed computing in a very elegant way in the chapter about distribution.

The Other Models

Van Roy and Haridi investigates many models apart from the declarative model: The concurrent declarative model, the concurrent message-passing model, the stateful model, the object-oriented model, the stateful concurrent model, the relational model, the distributed model and the constraint programming model. Here is quick overview of some of the other models:

The Concurrent Declarative Model

The abstract machine of the declarative model is extended to include multiple semantic stacks instead of just one. Nothing else is changed. The kernel language is extended with the statement: thread S1 end.

thread
    proc {Count N} if N>0 then {Count N-1} end end

in
    {Count 1000000}
end

Streams are also implemented in the concurrent declarative model with the use of flow variables.

fun {Generate N Limit}
    if N<Limit then

        N|{Generate N+1 Limit}
    else nil end
end
fun {Sum Xs A}
    case Xs
    of X|Xr then {Sum Xr A+X}
        [] nil then A
    end

    end
local Xs S in
    thread Xs={Generate 0 150000} end % Producer thread
    thread S={Sum Xs 0} end % Consumer thread

    {Browse S}
end

Laziness cab be added to the model by one more instruction, ByNeed, to the kernel language. The execution state is extended with a by-need trigger, a pair trig(x, y) consisting of a dataflow variable y and a one-argument procedure x. Next to the single-assignment store, a new store called the trigger store is added to the abstract machine. The trigger store contains all the by-need triggers and is initially empty.

lazy is a linguistic abstraction that is defined in terms of ByNeed.

fun lazy {Generate N} N|{Generate N+1} end

% Is transformed into
fun {Generate N}
    {ByNeed fun {$} N|{Generate N+1} end}

end

The Stateful Model

Explicit state is added as one new basic type to the computation model. We call the type a cell. A cell is a pair of a constant, which is a name value, and a reference into the single-assignment store. The set of all cells lives in the mutable store, which is added to the abstract machine.

Compared to the declarative model, it adds just two new statements, the cell operations NewCell and Exchange. Syntactic sugar is added with: X=@C, bind X to the content of cell C and C:=X: set x to be the content of cell C.

The Object Oriented Model

Object oriented programming is supported by adding linguistic abstractions for class, attr and meth.

% A class
class Counter
    attr val
    meth init(Value)
        val:=Value
    end

    meth browse
        {Browse @val}
    end
    meth inc(Value)
        val:=@val+Value
    end
end

% A class without semantic support
local

    proc {Init M S}
        init(Value)=M in (S.val):=Value
    end
    proc {Browse2 M S}
        {Browse @(S.val)}
    end
    proc {Inc M S}
        inc(Value)=M in (S.val):=@(S.val)+Value
    end

in
    Counter=c(attrs:[val]
    methods:m(init:Init browse:Browse2 inc:Inc))
end

The methods of Oz supports variable length argument lists, optional arguments and an otherwise method that will be called if no other method is appropriate. This gives strong support for delegation and meta-programming.

As can be seen above, classes are implemented as a set of methods and a set of attributes.

An object can be implemented like this:

fun {New WClass InitialMethod}
    State Obj Class={Unwrap WClass}
in
    State = {MakeRecord s Class.attrs}
    {Record.forAll State proc {$ A} {NewCell _ A} end}
    proc {Obj M}
        {Class.methods.{Label M} M State Obj}
    end

    {Obj InitialMethod}
    Obj
end

Inheritance is implemented by calculating a new class record starting from existing class records. They are combined according to the preferred inheritance rules.

Summary

Van Roy and Haridi has produced an impressive book in the spirit of the Structure and Interpretation of Computer Programs. The book is notably free from critique of other languages. The authors let their impressive work speak for itself.

This short walk-through of the book, with most of the models and details left out, cannot do justice to this incredible work. A must read for anyone interested in programming language design.

Five out of five!

No comments:

Post a Comment