Pocket Smalltalk - A White Paper

A Smalltalk System for Small Computers

By Andrew Brault, Copyright (c) 1998

1 - Preface
- 1.1 - Introduction
- 1.2 - Smalltalk, the Language
- 1.3 - How it Works
- 1.4 - Built-in Classes
2 - The System
- 2.1 - Declarative Smalltalk
- 2.2 - Starting Up
- 2.3 - Symbols
- 2.4 - Image-Based Persistence
- 2.5 - The System Dictionary
- 2.6 - Exception Handling
- 2.7 - Proxies, nil, and doesNotUnderstand:
- 2.8 - become:
- 2.9 - Method Dictionaries
- 2.10 - Global Variables
- 2.11 - #ifNil:
3 - The Implementation
- 3.1 - Static Optimizations
- 3.2 - Optimized Messages
- 3.3 - Memory Management
- 3.4 - Characters
- 3.5 - Classes and Metaclasses
- 3.6 - Blocks and Contexts
4 - Conclusions

1 - Preface

1.1 - Introduction

Current Smalltalk systems require large amounts of computer memory to operate. This white paper describes an implementation of Smalltalk which can run in very little memory, yet supports all the essential features of Smalltalk. It is therefore ideal for deploying applications on small machines such as handheld PDA devices. Memory overhead for a simple "hello world" program is just 5 kilobytes plus a 20 kilobyte virtual machine (on the PalmPilot PDA).

1.2 - Smalltalk, the Language

This white paper assumes that you are familiar with the basics of the Smalltalk programming language. There are many books on Smalltalk, and several free implementations. A little research will turn up many sources of information, but the best introduction to the language is still the original book on Smalltalk:

           "Smalltalk-80, the Language"
           by Adele Goldberg and David Robson
           Published by Addison-Wesley
           ISBN 0-201-13688-0

1.3 - How it Works

This implementation, called "Pocket Smalltalk", consists of an integrated development environment (similar to the usual Smalltalk IDE) running under Windows 95/NT, plus a virtual machine which runs on the PalmPilot. The IDE contains a cross-compiler which will, at any point, convert the classes and methods defined by the IDE into a .PDB (Pilot database) file which can then be transferred to the PalmPilot and executed by the virtual machine.

The IDE currently has a few limitations compared to traditional Smalltalk environments. First, there is no way to evaluate expressions in a workspace, because the virtual machine running on the PalmPilot makes use of many features of the Palm operating system which would be very difficult (and slow) to simulate within the IDE. In the future, the system may support either simulation of a limited subset of the virtual machine within the IDE, or a serial link to the PalmPilot (or an emulator) permitting workspace-like functionality.

In addition, there is no way to create complex objects at compile time. Simple literals such as strings and arrays may be embedded in methods as usual, but initialization of class variables and related things must be performed at runtime. This is generally not a very severe limitation, and it is eased somewhat by the fact that the IDE allows you to define some extra kinds of objects at compile time which are not allowed by traditional Smalltalks (for example, IdentityDictionaries can be created at compile time).

Despite these limitations, Pocket Smalltalk is a very practical alternative to C or C++ for developing palmtop or PDA applications.

1.4 - Built-in Classes

Pocket Smalltalk can work with a very small class library (compared to other Smalltalks). The following classes are present in a base image: (classes marked by a * can be removed if desired)

 Object
   Behavior
     Class
     Metaclass
   BlockClosure
     FullBlockClosure
   Boolean
     True
     False
   Character
   Collection
     KeyedCollection
       ArrayedCollection
         Array
         ByteArray
         List*
           OrderedCollection
         String
       Dictionary*
         IdentityDictionary*
     Set*
       IdentitySet*
   ExceptionHandler
   Message
   MiniDebugger*
   Number
     Integer
       SmallInteger
       LongInteger
     Fraction*
     Point*
   Smalltalk
   Stream*
     ReadStream*
     WriteStream*
   UndefinedObject

2 - The System

Pocket Smalltalk differs in some important respects from other Smalltalk systems. This section details the differences you will encounter.

2.1 - Declarative Smalltalk

Pocket Smalltalk is an example of a declarative Smalltalk system. The new ANSI Smalltalk standard is moving toward declarative systems. Whereas a traditional Smalltalk program is built by extending the base image---in the process building arbitrary graphs of objects---a Pocket Smalltalk program is completely defined by its classes and methods. As a consequence, a Pocket Smalltalk program can be represented as a simple file in "file-out" format. In contrast, a traditional Smalltalk program cannot be represented declaratively at all, and can only be recreated by reapplying the series of actions used to extend the base image.

2.2 - Starting Up

When your compiled program first starts up, it begins by sending the selector #start to the class Smalltalk. You must replace the default #start method with your own in order to get your program running.

It is illegal to return from the Smalltalk class>>#start method. Doing so will likely crash the virtual machine. Instead, you should execute "Smalltalk quit" to stop your program.

2.3 - Symbols

Symbols are used for three main purposes in Pocket Smalltalk:

Naming methods (i.e., selectors)
Naming classes
Serving as unique "tokens"

A symbol is written as #symbolName, where symbolName is any valid Smalltalk identifier.

Since symbol data accounts for a large fraction of the total size of a typical Smalltalk program, Pocket Smalltalk will replace each symbol by a unique SmallInteger. As a consequence, there is no Symbol class, nor is there a way to distinguish between symbols and integers. This is almost never a problem, though, and if necessary, symbols can be "wrapped" in another class to provide the desired functionality.

This optimization saves a considerable amount of memory space and execution time. The only real disadvantage, aside from losing the distinction between Symbols and SmallIntegers, is that you will be unable to access the characters of a symbol (for example, in order to print the symbol). Again, this should not be a problem, since Symbols should not be printed at runtime anyway (Strings should be used for this purpose).

For debugging purposes, you may choose to include the text of each symbol (select the "emit debug info" option in the IDE). Then you may use the expression:

        Context textOfSymbol: symbol

to recover the String containing the symbol's characters. The default MiniDebugger uses this feature to provide a symbolic stack trace when an error occurs.
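The following Python sketch models the compile-time symbol replacement described above: each distinct symbol text is mapped to a unique integer, and the text itself is retained only when debug information is requested. The class name SymbolTable, its methods, and the numbering scheme (starting at 1) are illustrative assumptions, not Pocket Smalltalk's actual encoding.

```python
class SymbolTable:
    """Hypothetical model of compile-time symbol replacement.

    Each distinct symbol text is assigned a unique integer. The text is
    kept only when debug info is requested, mirroring the
    "emit debug info" option.
    """

    def __init__(self, emit_debug_info=False):
        self.emit_debug_info = emit_debug_info
        self._ids = {}    # symbol text -> integer (exists only at compile time)
        self._text = {}   # integer -> symbol text (debug builds only)

    def intern(self, name):
        """Answer the integer for a symbol, assigning a new one if needed."""
        if name not in self._ids:
            sym_id = len(self._ids) + 1   # numbering scheme is an assumption
            self._ids[name] = sym_id
            if self.emit_debug_info:
                self._text[sym_id] = name
        return self._ids[name]

    def text_of_symbol(self, sym_id):
        """Analogue of 'Context textOfSymbol: symbol'; None if the text was discarded."""
        return self._text.get(sym_id)
```

Because the same text always yields the same integer, comparing two symbols for identity reduces to comparing two SmallIntegers, which is where the time savings come from.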

2.4 - Image-Based Persistence

You may save the entire state of a Pocket Smalltalk program by creating a "snapshot" of memory. This is accomplished at runtime by executing:

        Smalltalk snapshot

The next time your program is started, it will resume exactly where it left off.

2.5 - The System Dictionary

In Pocket Smalltalk, the system dictionary called "Smalltalk" is actually a class. You may look up classes "dynamically" by name at runtime with the methods:

        Smalltalk at: #ClassName

or:

        Smalltalk includesKey: #ClassName

You cannot add or remove classes (or methods) at runtime.

The Smalltalk class also provides various other system operations which can be seen by browsing the class.

2.6 - Exception Handling

Exception handling is present in a very simplified form. Class-side methods of class ExceptionHandler are invoked by the virtual machine when various exceptions occur. For example, ExceptionHandler class>>#divisionByZero is sent when division by zero occurs. The usual result of an exception is to terminate the program with an error message (but this may be overridden).

Nonlocal returns through arbitrary stack frames may be accomplished by the following methods:

        ExceptionHandler class>>#catch:during:
        ExceptionHandler class>>#throw:

#catch:during: takes two arguments: a 1-argument handler block and a 0-argument block to execute. If a #throw: message is sent while the second block is executing, the handler block is evaluated with the thrown object as its argument. Any object may be "thrown". For example:

ExceptionHandler
   catch: [:object | ('caught: ', object printString) print]
   during: [ExceptionHandler throw: 12345].

If no #catch:during: is in effect when a #throw: message is sent, the method ExceptionHandler class>>#uncaughtThrow: is invoked instead (with the thrown object as its argument).

There is no provision for "unwind protection" or proceedable exceptions.

2.7 - Proxies, nil, and doesNotUnderstand:

When an object is sent a message that it cannot interpret, a Message object representing the message send is created, and the message #doesNotUnderstand: is then sent to the object with the Message as its argument.

By default, #doesNotUnderstand: just signals an error, but certain classes may want to intercept #doesNotUnderstand: and take some special action.

A class which provides very little behavior of its own, but instead forwards most messages on to another object, is called a "proxy". Proxies and other classes with similar needs can subclass from nil instead of from Object (or a subclass of Object), thereby inheriting none of the "basic" methods defined by class Object. They may then use #doesNotUnderstand: to forward messages to another object.

A class inheriting from nil need only implement the one selector: #doesNotUnderstand:. The cross compiler does not allow you to create a subclass of nil which does not implement this selector.
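Python offers a close analogue of this mechanism: __getattr__ is called only when normal attribute lookup fails, much as the virtual machine sends #doesNotUnderstand: only when method lookup fails. The sketch below, with hypothetical names ForwardingProxy and does_not_understand, reifies each failed send as a Message, records it, and forwards it to a target object. Unlike a Pocket Smalltalk class that inherits from nil, a Python class still inherits from object, so special methods such as __eq__ are not forwarded.

```python
class Message:
    """Reified message send, like the argument of #doesNotUnderstand:."""

    def __init__(self, selector, arguments):
        self.selector = selector
        self.arguments = arguments


class ForwardingProxy:
    """Hypothetical proxy that defines almost no behavior of its own."""

    def __init__(self, target):
        self._target = target
        self._log = []   # selectors intercepted, to show the mechanism at work

    def does_not_understand(self, message):
        # The "special action": record the send, then forward it.
        self._log.append(message.selector)
        return getattr(self._target, message.selector)(*message.arguments)

    def __getattr__(self, selector):
        # Python calls __getattr__ only after normal lookup fails, which
        # plays the role of the VM sending #doesNotUnderstand:.
        def send(*arguments):
            return self.does_not_understand(Message(selector, arguments))
        return send


items = ForwardingProxy([3, 1, 2])
items.append(4)          # forwarded to the list
print(items.index(1))    # forwarded as well
```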

2.8 - become:

The primitive operation Object>>#become: swaps the identities of two objects. In Pocket Smalltalk this is an efficient operation. You must be careful with it, however, because it can crash the virtual machine if you "become" the receiver of a currently executing method into an object with fewer named instance variables. The "become" primitive will fail if you try to convert from a pointerless object (e.g., a string or a byte array) to a pointer object, or vice versa. It will also fail if you try to swap the identities of SmallIntegers, or if either object is a statically allocated object (i.e., a class, metaclass, or literal).

The built-in collection classes implement expansion by means of #become:. This allows a collection to be represented by a single object, rather than two objects as in some Smalltalk implementations.
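The reason #become: can be cheap is the indirect object table described in 3.3: every reference to an object goes through its table index, so swapping two identities only means swapping two table entries. The Python sketch below models this, together with a hypothetical grow helper showing how a collection could expand while remaining a single object. It omits the safety checks listed above (pointer versus byte objects, SmallIntegers, statically allocated objects).

```python
class ObjectMemory:
    """Toy object memory with an indirect object table.

    References are table indices, never direct pointers, so become:
    only has to swap two table entries.
    """

    def __init__(self):
        self.table = []   # object table: index -> object body

    def allocate(self, body):
        self.table.append(body)
        return len(self.table) - 1

    def fetch(self, index):
        return self.table[index]

    def become(self, a, b):
        # Constant time: every holder of index a now reaches b's old body,
        # and vice versa. No scan of the heap is needed.
        self.table[a], self.table[b] = self.table[b], self.table[a]


def grow(memory, index, extra):
    """Hypothetical collection expansion: allocate a larger body, then swap
    identities so that existing references see the larger object."""
    larger = memory.allocate(memory.fetch(index) + [None] * extra)
    memory.become(index, larger)
    # `larger` now names the old, smaller body, which is garbage.
```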

2.9 - Method Dictionaries

Pocket Smalltalk does not have traditional "method dictionaries" attached to each class. Methods are compiled into a special, efficient format and are not available as Smalltalk objects. Nevertheless, you may perform some limited operations on methods, as illustrated below:

        Collection includesSelector: #do:
        Set canUnderstand: #keys
        15 respondsTo: #+
        Array methodCount
        Object selectors

Note that #perform: and related methods can be used as usual.
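The difference between these queries can be modeled with per-class selector sets. In the Python sketch below (the ClassInfo record and its fields are hypothetical), includes_selector checks only a class's own methods, while can_understand walks the superclass chain as method lookup would; respondsTo: then amounts to asking the receiver's class whether it can understand the selector. The example classes and selectors are illustrative only.

```python
class ClassInfo:
    """Hypothetical compiled-class record: a superclass link and the set of
    selectors implemented directly (no method dictionary objects)."""

    def __init__(self, name, superclass, selectors):
        self.name = name
        self.superclass = superclass
        self.selectors = frozenset(selectors)

    def includes_selector(self, selector):
        # Only methods defined in this class itself.
        return selector in self.selectors

    def can_understand(self, selector):
        # Walk the superclass chain, as method lookup does.
        cls = self
        while cls is not None:
            if selector in cls.selectors:
                return True
            cls = cls.superclass
        return False


def responds_to(receiver_class, selector):
    """respondsTo: asks the receiver's class whether it can understand the selector."""
    return receiver_class.can_understand(selector)


obj = ClassInfo("Object", None, ["printString", "="])
collection = ClassInfo("Collection", obj, ["do:", "isEmpty"])
set_class = ClassInfo("Set", collection, ["add:", "remove:"])
```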

2.10 - Global Variables

Global variables are handled a bit differently in Pocket Smalltalk than in other Smalltalk systems. Rather than being Associations in the Smalltalk system dictionary, global variables are class variables of Object. Since all classes (except root classes that subclass nil; see 2.7) are subclasses of Object, they can all access these class variables in the same way as global variables.

At compile time, references to class (and "global") variables are converted into single instructions. No separate objects are created for global variables; nor can you set the value of a global variable at compile time (it must be done with some kind of initialization code at runtime).

Unlike other Smalltalks, classes are not global variables. For most purposes, you can treat them as such, but the major difference is that you cannot assign to them. Class names must not conflict with class variable names.

2.11 - #ifNil:

A very common idiom in Smalltalk is the following:

        object isNil ifTrue: [...]

This expression can be simplified by defining a new message #ifNil: so that you can write:

        object ifNil: [...]

The problem with doing this is that ordinary Smalltalks cannot apply compiler optimizations to this expression, so the above code will execute more slowly and take more space than the usual isNil ifTrue: case. Pocket Smalltalk, however, knows how to optimize ifNil: and related messages, so you can use them without any penalty. The message forms it recognizes are as follows:

ifNil: [...]
ifNotNil: [...]
ifNil: [...] ifNotNil: [...]
ifNotNil: [...] ifNil: [...]
orIfNil: [...]

The last message deserves some explanation. orIfNil: answers the receiver if the receiver is not nil, but if the receiver is nil it answers the result of evaluating the argument block. This can be used to provide "default" values for possibly-nil variables:

        ^name orIfNil: ['anonymous']
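In host-language terms, orIfNil: behaves like the Python function below, with None standing in for nil. The function names are hypothetical; the point is that the argument is a block, so the default is computed only when the receiver is actually nil. Pocket Smalltalk compiles this test inline rather than performing a real message send.

```python
def or_if_nil(receiver, default_block):
    """Model of 'receiver orIfNil: [...]', with None standing in for nil.

    The default is a zero-argument callable (a block), so it is evaluated
    only when the receiver is nil.
    """
    return default_block() if receiver is None else receiver


def display_name(name):
    # Equivalent of: ^name orIfNil: ['anonymous']
    return or_if_nil(name, lambda: "anonymous")
```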

3 - The Implementation

The small amount of memory available on the machines for which Pocket Smalltalk is targeted causes many implementation difficulties. This section describes how Pocket Smalltalk gets around these difficulties to allow the creation of small and efficient Smalltalk programs.

3.1 - Static Optimizations

Pocket Smalltalk combines aspects of the usual Smalltalk development environment with a global compiler which can statically analyze your program and perform various optimizations. When you are ready to test your program, you instruct Pocket Smalltalk to emit a .PDB file which you then HotSync to your PalmPilot. At this time, the global compiler examines your classes and methods and performs the following optimizations:

Frequently used selectors are extracted; message sends which use these frequently used selectors take less space than ordinary message sends
Messages sent to 'self' in a leaf class are inlined if they are small enough (e.g., simple accessor and mutator methods)
References to literal objects (arrays, strings, byte arrays, etc) are replaced with instructions to access an equivalent statically allocated literal object (therefore, no "literal frame" is needed)
Methods and method dictionaries are replaced with a special optimized form which can be used directly at runtime -- no separate objects for methods or method dictionaries are needed
Symbols are replaced with integer values (see 2.3) and the text of each symbol and selector is discarded (unless kept for debugging purposes)
"Clean" blocks (those which do not reference nonlocal variables) are statically allocated at compile time. "Full" blocks (those which do reference nonlocal variables) are converted into a single instruction (and do not create objects at compile time)
"subclass responsibility" and "should not implement" methods are removed (if specified by the programmer)
Classes and metaclasses are compressed; compressed classes take only 10 bytes (plus methods) and compressed metaclasses take only 8 bytes
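The first optimization in this list can be sketched as a frequency count over all message-send sites in the program. In the Python below, the number of short slots and the 1-byte and 3-byte encoding sizes are hypothetical; the actual instruction formats are not described in this paper.

```python
from collections import Counter


def choose_short_selectors(send_sites, short_slots):
    """Assign the most frequently sent selectors to a compact send form.

    send_sites: one selector per message-send site in the whole program.
    short_slots: how many selectors the compact form can name (hypothetical).
    Returns a dict mapping each chosen selector to its slot number.
    """
    counts = Counter(send_sites)
    # Most frequent first; ties broken alphabetically so the result is stable.
    ranked = sorted(counts, key=lambda sel: (-counts[sel], sel))
    return {sel: slot for slot, sel in enumerate(ranked[:short_slots])}


def encoded_size(send_sites, short_table, short_bytes=1, long_bytes=3):
    """Total bytes for all sends under the hypothetical encoding sizes."""
    return sum(short_bytes if sel in short_table else long_bytes for sel in send_sites)
```

Because the choice is made by the global compiler over the whole program, the short table can be tuned to each application rather than fixed in the virtual machine.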

3.2 - Optimized Messages

Some control-flow messages can be converted into an optimized form if they match certain patterns. The patterns matched are as follows (the ... represent arbitrary expressions):

... ifTrue: [...]
... ifFalse: [...]
... ifTrue: [...] ifFalse: [...]
... ifFalse: [...] ifTrue: [...]
... and: [...]
... or: [...]
[...] repeat
... timesRepeat: [...]
... to: ... do: [:index | ...]
... to: ... by: ... do: [:index | ...]
[...] whileTrue: [...]
[...] whileFalse: [...]
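A compiler check for these patterns might look like the Python sketch below. It encodes the list above: positions written as [...] must be literal blocks with the given number of arguments, while positions written as ... accept any expression. The Block and EXPR representations are hypothetical. A send that fails the check (for example, ifTrue: with a block held in a variable) is compiled as an ordinary message send.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Block:
    """A literal block in the source; [:index | ...] has num_args == 1."""
    num_args: int


EXPR = "expr"   # any expression that is not a literal block

# selector -> (receiver requirement, argument requirements).
# An int means "literal block with that many arguments"; None means
# "any expression", matching the ... in the patterns above.
PATTERNS = {
    "ifTrue:": (None, [0]),
    "ifFalse:": (None, [0]),
    "ifTrue:ifFalse:": (None, [0, 0]),
    "ifFalse:ifTrue:": (None, [0, 0]),
    "and:": (None, [0]),
    "or:": (None, [0]),
    "repeat": (0, []),
    "timesRepeat:": (None, [0]),
    "to:do:": (None, [None, 1]),
    "to:by:do:": (None, [None, None, 1]),
    "whileTrue:": (0, [0]),
    "whileFalse:": (0, [0]),
}


def _matches(node, requirement):
    if requirement is None:
        return True
    return isinstance(node, Block) and node.num_args == requirement


def is_optimizable(receiver, selector, args):
    """True if this send can be compiled inline instead of as a real send."""
    pattern = PATTERNS.get(selector)
    if pattern is None:
        return False
    receiver_req, arg_reqs = pattern
    if len(args) != len(arg_reqs):
        return False
    return _matches(receiver, receiver_req) and all(
        _matches(arg, req) for arg, req in zip(args, arg_reqs)
    )
```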

3.3 - Memory Management

The PalmPilot platform (and small machines in general) poses difficult problems for memory management. The PalmPilot has a dynamic heap of at most about 64 kilobytes, much of which is used by the system (for display memory, stacks, PalmOS UI objects, etc.). Older Pilots have only 32 kilobytes for their dynamic heap.

Pocket Smalltalk bypasses the dynamic memory scheme imposed by PalmOS and allocates chunks of memory as database records. It then disables write protection on these records in order to write into them. In this manner, the entire memory of the PalmPilot can be used to store objects, and no extra memory is needed to make image snapshots, since the objects are already stored in an appropriate format within the database records. Records are allocated and de-allocated as the memory needs of the Smalltalk program change.

In order to support the automatic memory management (garbage collection) required by Smalltalk, an indirect object table is used. This object table is configured (at compile time) to support a fixed maximum number of simultaneously live objects. Each object table index identifies an object and is a 15-bit value, so there are 2^15 (32768) possible indices. One of these is reserved as a special tag for certain system operations, which leaves up to 32767 different objects that can be active at one time (not counting methods, symbols, small integers, or method dictionaries).

Each object table entry contains a 4-byte pointer to the beginning of the corresponding object. The objects reside in database records as described above. Because of limitations in PalmOS versions prior to 3.0, the maximum size of the object table is 64k, and probably less in practice due to heap fragmentation. Therefore, you should not rely on more than 32k being available for the object table, giving 32k / 4 = 8192 object table entries. Although 8192 objects may not seem like much, very few programs will need this many. You are more likely to run out of memory to store the objects in than to run out of object table slots to refer to them. Typical programs of around 200 classes can run quite happily in under 1000 object table slots and perhaps 20k of heap memory. If you need more object table slots, it is generally safe to increase the maximum to around 16000 slots, although this may force users of PalmOS v2.0 (and earlier versions) to "defragment" their machines before using your program.
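The sizing figures in this section follow from simple arithmetic on the 4-byte table entry and the 15-bit index implied by the 2^15 figure. The Python below reproduces them; the constant and function names are illustrative.

```python
ENTRY_BYTES = 4    # each object table entry is a 4-byte pointer
INDEX_BITS = 15    # implied by the 2^15 figure above

possible_indices = 2 ** INDEX_BITS        # 32768
max_live_objects = possible_indices - 1   # one index is reserved as a tag


def slots_for(table_bytes):
    """Object table entries that fit in a table of the given size."""
    return table_bytes // ENTRY_BYTES


def bytes_for(slots):
    """Table size needed for the given number of entries."""
    return slots * ENTRY_BYTES


print(slots_for(32 * 1024))   # 8192: the safe budget on older PalmOS versions
print(bytes_for(16000))       # 64000: "around 16000 slots" stays under 64k
print(bytes_for(1000))        # 4000: a typical program from the text
```

Note that even a full 64k table holds only 16384 entries, so on PalmOS versions prior to 3.0 the table size, not the index width, is the binding limit.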

Garbage collection is implemented by a two-pass mark-compact collector. First, the reachable objects are discovered by tracing pointers starting at the roots (basically the objects on the data stack). Then, each database record containing objects is scanned. Unreachable objects are discarded (and their object table entries recycled), while reachable objects are "slid" down toward the beginning of each database record. After each record is scanned, all the reachable objects will be consolidated at the beginning of the record, while the rest of the record will consist of free space. As a consequence, allocation of new objects is very quick, as heap fragmentation cannot occur.
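A simplified model of the two passes is sketched below in Python. Objects are identified by object table indices, refs records which objects each object points to, and each database record is a list of objects in address order. Only offsets are tracked; a real collector would also copy each surviving object's bytes down to its new offset. All names here are hypothetical.

```python
def mark(roots, refs):
    """Pass 1: find every object reachable from the roots.

    refs maps an object table index to the indices it points to.
    An explicit stack avoids deep recursion on long chains.
    """
    marked = set()
    stack = list(roots)
    while stack:
        index = stack.pop()
        if index in marked:
            continue
        marked.add(index)
        stack.extend(refs.get(index, ()))
    return marked


def collect(roots, refs, sizes, records, table):
    """Two-pass mark-compact collection.

    records: record id -> object indices in address order.
    table:   object index -> (record id, byte offset), i.e. the object table.
    sizes:   object index -> size in bytes.
    Returns (recycled object table indices, first free offset per record).
    """
    marked = mark(roots, refs)
    recycled = []
    free_offsets = {}
    for record_id, objects in list(records.items()):
        offset = 0
        survivors = []
        for index in objects:
            if index in marked:
                # Slide the survivor down; only its table entry changes,
                # so references to it (table indices) stay valid.
                table[index] = (record_id, offset)
                offset += sizes[index]
                survivors.append(index)
            else:
                del table[index]          # entry can be reused
                recycled.append(index)
        records[record_id] = survivors
        free_offsets[record_id] = offset  # the rest of the record is free
    return recycled, free_offsets
```

Since survivors only ever move toward the start of their record, the free space in each record is a single contiguous run, and allocation can simply bump an offset.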

The objects statically allocated at compile time (classes, metaclasses, and literals) are never scanned or reclaimed. Therefore, having more literals or classes in your program will not slow down garbage collection.

Garbage collection of a typical 20k heap takes less than 0.1 seconds on the PalmPilot, so you will probably never even notice a pause due to memory management.

3.4 - Characters

There are 256 Character objects, one for each extended (8-bit) ASCII character. They are immutable (unchangeable) objects. Pocket Smalltalk does not represent Characters as "immediate" values as do some other Smalltalk systems. Instead, a novel approach is taken which avoids allocating 256 separate Character objects. Upon startup, the system allocates a single Character object and 256 object table pointers to the object. Each pointer then points to the same Character object. The ASCII value of a Character is determined by its index within the 256 object pointers. Therefore, the total memory used to store Characters is just 1030 bytes (6 bytes for the Character object and 1024 bytes for the object table entries) in contrast to the 3072 bytes that would normally be required.
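The Character scheme can be modeled as a block of 256 consecutive object table entries that all hold the same pointer, with a character's code recovered from its position within that block. In the Python sketch below, CharacterTable and the base index are hypothetical; the byte counts are the ones given above.

```python
TABLE_ENTRY_BYTES = 4        # one object table pointer
CHARACTER_OBJECT_BYTES = 6   # the single shared Character object


class CharacterTable:
    """Hypothetical model: 256 consecutive object table entries that all
    point at one Character object. `base` is the index of character 0."""

    def __init__(self, base):
        self.base = base
        shared = object()    # stands in for the one Character object
        self.entries = {base + code: shared for code in range(256)}

    def index_for(self, code):
        """Object table index of the Character with this code."""
        if not 0 <= code < 256:
            raise ValueError("not an 8-bit character code")
        return self.base + code

    def value_of(self, index):
        """Character code, recovered from the position in the block of 256."""
        if index not in self.entries:
            raise ValueError("not a Character")
        return index - self.base


total_bytes = CHARACTER_OBJECT_BYTES + 256 * TABLE_ENTRY_BYTES   # 1030
```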

A slightly extended syntax is provided for Character literals. In addition to the usual $x syntax, you can use the following "escape" sequences to get special characters:

        $\newline, $\space, $\tab, $\backspace, $\cr, $\lf,
        $\escape, $\backslash

You can also specify a character directly by ASCII code by giving it in the form: $\xxx where xxx is the ASCII code.
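A parser for these literals might look like the Python sketch below. Two details are assumptions, since the text does not specify them: $\newline is mapped to linefeed (10), and the digits in $\xxx are read as a decimal code from 0 to 255.

```python
NAMED_ESCAPES = {
    "newline": 10,      # assumption: the text does not give this code
    "space": 32,
    "tab": 9,
    "backspace": 8,
    "cr": 13,
    "lf": 10,
    "escape": 27,
    "backslash": 92,
}


def parse_char_literal(src):
    """Return the character code of a literal such as $a, $\\tab or $\\65."""
    if len(src) < 2 or src[0] != "$":
        raise ValueError(f"not a character literal: {src!r}")
    body = src[1:]
    if len(body) == 1:
        return ord(body)                    # ordinary $x form
    if body[0] != "\\":
        raise ValueError(f"not a character literal: {src!r}")
    name = body[1:]
    if all(c in "0123456789" for c in name):
        code = int(name)                    # assumption: decimal code
        if code > 255:
            raise ValueError(f"code out of range: {src!r}")
        return code
    if name in NAMED_ESCAPES:
        return NAMED_ESCAPES[name]
    raise ValueError(f"unknown escape: {src!r}")
```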

3.5 - Classes and Metaclasses

Pocket Smalltalk implements the full class/metaclass system of Smalltalk-80. Classes such as SmallInteger, String, and WriteStream are instances of their corresponding metaclass (SmallInteger class, String class, and WriteStream class, respectively). Each metaclass is an instance of the class Metaclass. The class of Metaclass is Metaclass class, and the class of Metaclass class is Metaclass. The superclass of Object class is Class, therefore allowing all classes to create instances of themselves via the messages #new and #new: implemented in Behavior (the superclass of Class). As usual, you may obtain the class for any object by sending it #class, and you may store classes in variables and refer to them by name in methods.
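The parallel class/metaclass hierarchy described above can be checked with a small Python model. Cls, build_kernel, and inherits_from are hypothetical names, only a handful of kernel classes are built, and the klass attribute plays the role of #class.

```python
class Cls:
    """Minimal model of a class or metaclass: `klass` is what #class
    answers, `superclass` is the superclass link."""

    def __init__(self, name, superclass=None):
        self.name = name
        self.superclass = superclass
        self.klass = None   # filled in once metaclasses exist


def build_kernel():
    object_ = Cls("Object")
    behavior = Cls("Behavior", object_)
    class_ = Cls("Class", behavior)
    metaclass = Cls("Metaclass", behavior)
    collection = Cls("Collection", object_)
    # Superclasses come before subclasses in this list.
    classes = [object_, behavior, class_, metaclass, collection]
    for c in classes:
        # The metaclass hierarchy parallels the class hierarchy, and the
        # superclass of "Object class" is Class.
        meta_super = c.superclass.klass if c.superclass else class_
        meta = Cls(c.name + " class", meta_super)
        meta.klass = metaclass   # every metaclass is an instance of Metaclass
        c.klass = meta
    return {c.name: c for c in classes}


def inherits_from(cls, ancestor):
    c = cls.superclass
    while c is not None:
        if c is ancestor:
            return True
        c = c.superclass
    return False
```

Because "Collection class" inherits from Behavior through "Object class" and Class, a message such as #new sent to a class is found in Behavior.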

Names of classes are discarded at runtime to avoid taking space, but if you instruct the compiler to emit debugging information the class names will be available by sending the class the message #name. Metaclasses respond to #name by answering the name of their unique instance followed by 'class', e.g.:

        Collection class name  =>  'Collection class'

3.6 - Blocks and Contexts

In order to minimize the space requirements of user programs, Pocket Smalltalk imposes a few (relatively minor) restrictions on blocks and contexts. First of all, contexts (stack frames) are not represented directly as objects as they are in some other Smalltalks. Therefore the pseudo-variable thisContext will not work as expected. Referring to thisContext will give an integer index which can be used with the MiniDebugger class to access the receiver, selector, method class, arguments, sender, etc. of various stack frames. The main uses for this are to provide "walkback" functionality when an error occurs, and to give more meaningful error messages in certain cases (for example, "#do: is a subclass responsibility").

Most of the functionality of blocks is available. In particular, blocks may access (and assign to) variables in enclosing scopes. Nonlocal returns are also supported, as are "recursive" (reentrant) blocks. The main restriction on blocks is that they may not refer to variables in enclosing scopes which have already returned. This is almost never a problem in practice provided that you do not hold onto blocks in instance variables or global variables. For such purposes you should use a Message or a MessageSend object instead of a block.

Attempts to access a variable (from a block) in an enclosing scope which has already returned will result in the message #outerScopeInvalid being sent to the block, which by default signals an error. Similarly, attempting to perform a nonlocal return from a block whose home context has already returned will send #alreadyReturned to the block.
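Presumably this restriction follows from contexts being stack frames rather than heap objects, so that a frame's variables disappear when it returns. The Python sketch below models the check with a returned flag on a hypothetical Frame; a hypothetical FullBlock that touches its home frame after the frame has returned raises an error standing in for the #outerScopeInvalid send.

```python
class BlockContextError(Exception):
    """Stands in for the error signaled by #outerScopeInvalid."""


class Frame:
    """Hypothetical model of a method's stack frame (context)."""

    def __init__(self, **variables):
        self.variables = dict(variables)
        self.returned = False


class FullBlock:
    """Hypothetical full block that refers to variables in its home frame."""

    def __init__(self, home):
        self.home = home

    def _check_home(self):
        if self.home.returned:
            raise BlockContextError("outerScopeInvalid")

    def read(self, name):
        self._check_home()
        return self.home.variables[name]

    def write(self, name, value):
        self._check_home()
        self.home.variables[name] = value


def method_returning_block():
    frame = Frame(count=0)
    block = FullBlock(frame)
    block.write("count", block.read("count") + 1)   # fine: home frame still active
    frame.returned = True                            # the method returns
    return block                                     # the block now outlives its scope
```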

Blocks are currently classified into 4 categories depending on what kinds of actions they perform:

"Optimized" blocks are part of optimized control structures such as #ifTrue: or #timesRepeat:. They are converted directly into virtual machine instructions and do not create BlockClosure objects. (Note that cascaded control structures, or control structures which do not have a recognizable pattern, will generate ordinary blocks.)

"Clean" blocks are those which do not access any variables in outer scopes, and which do not perform nonlocal returns. Clean blocks are instances of BlockClosure and are allocated statically at compile time. They do not cause any memory allocation when they are evaluated at runtime.

"Full" blocks either access variables in outer scopes, or perform nonlocal returns (or both). Full blocks are allocated at runtime when the block expression is evaluated, resulting in a FullBlockClosure object.
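The three categories described above amount to a simple decision procedure, sketched here in Python. BlockInfo and its fields are hypothetical summaries of what a compiler would know about a block literal. The order of the checks follows the descriptions above: a block that is part of a recognized control structure is optimized even if it uses outer variables.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BlockInfo:
    """Hypothetical facts a compiler gathers about one block literal."""
    in_optimized_pattern: bool     # receiver/argument of e.g. ifTrue: or timesRepeat:
    outer_variable_refs: frozenset # enclosing-scope variables it reads or assigns
    has_nonlocal_return: bool      # contains a ^ return


def classify(block):
    if block.in_optimized_pattern:
        return "optimized"   # compiled to VM instructions; no BlockClosure
    if block.outer_variable_refs or block.has_nonlocal_return:
        return "full"        # FullBlockClosure allocated when the block expression is evaluated
    return "clean"           # BlockClosure allocated statically at compile time
```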