Chapter 7: Collections


Return to [ Table of Contents] [ Main Page] [ Previous Chapter] [ Next Chapter]

Collection: What is it?

Collections are an important set of classes. These classes manage groups of objects; it is almost impossible to write Smalltalk code without using a collection. A collection is simply a set of elements where each element is a pointer to an object.
  
           

Smalltalk has a number of different kinds of collections including the Bag, Set, Array, OrderedCollection, SortedCollection, String, Symbol, and Dictionary. The following diagram illustrates the class hierarchy of these collection classes:

  


Most collections do not care what class of object they are managing. If you wanted, each element in a collection could contain a different object from a different class. Some collections are specific about the kind of object they manage. For example, the class String is a collection that must contain only Character objects.

Collections vary in the capabilities they provide. Some collections can grow and shrink in size and are useful for groups of objects that require this behavior. For example, if every instance of Customer were in a list, the addition or deletion of customers would cause this list to increase or shrink. To implement this list, you would need a variable-sized collection which has this behavior. With a variable-sized collection, the initial allocation can be specified or will default to something appropriate. Other collections cannot change their size and, as a result, often have a more efficient management scheme. Fixed-size collections are appropriate when the number of elements in a group is known and is stable.

A collection can be either ordered or unordered. An ordered collection has some ordering to the elements. This can be a simple index, as in an Array, or a key in a Dictionary, or enforced by some internal ordering such as sorting.

Indexing allows you to refer to an element at a fixed location. For example, in an array of three strings #('red' 'white' 'blue'), you can request the element at the index of 1. All indexing starts at 1 and goes to the size of the collection. Indexed collections are always ordered and do not arbitrarily change the order of elements in the collection. An object stays at its assigned location unless explicitly moved.

Collections can also be unordered. Unordered collections have no ordering and, therefore, the elements cannot be indexed.

Some collections have additional features such as not allowing duplicates in the collection.

Note: It should not be surprising that most collections that are indexable manage their elements as indexed instance variables. Thus, you will see support for the new:, at:, and at:put: messages discussed in Chapter 4: Data Operations.

Common Collection Classes

Smalltalk has several different common collections that have variations of the behaviors discussed above. The following table shows these collections and their behaviors:

 

BEHAVIORS

COLLECTIONSIndexedVariable-size DuplicatesSortingHolds
Bag NYY NAny object but nil
Set NYN NAny object but nil
Array YNY NAny object
OrderedCollection YYY NAny object
SortedCollection YYY YAny object
String YNY NCharacters
Symbol YNN NCharacters
Dictionary NYN NKey + any object

Key: Y = Behavior supported, N = Behavior not supported

Collection Classes do have several similar behaviors. They all:

Bag

A Bag is an unordered collection of elements that are organized for efficient lookup. A Bag acts like a container into which things can be placed and withdrawn. They store their elements in random order and cannot be indexed. A Bag can increase or shrink and allows duplicates- it can contain the same object multiple times.

A Bag is a good collection to use if you need a collection that can change in size, but the order of the objects in the collection is not important, and it does not matter that there are duplicate objects.

For example, a Bag is a good choice for managing a list of customer objects. It does not matter in what order these objects are placed in the list, it is not important to index into this list, and the list needs to grow and shrink.

As an example, aBag is used for tracking the number of occurrences of each element:

  aBag := #( 1 2 3 3 3 3) asBag. 
  aBag size  -> 6 
  aBag occurrencesOf: 3  
    Output:  4 

Set

A Set is identical to a Bag except that it does not allow duplicate objects. A Set ignores any request that would add a duplicate to the collection.

For example, you may decide it is important to check to ensure there are no duplicate customers in the list whenever a new customer object is added. In this case a Set is more appropriate than a Bag. However, ensuring that there are no duplicates adds overhead every time an object is added to the list.

As an example, aSet is used for tracking unique items:

  aSet := #( 1 2 3 3 3 3 ) asSet. 
  aSet size  -> 3 
  aSet occurrencesOf: 3  
    Output:  1 
 

Array

An Array is a fixed-size collection of elements that can be indexed by integer keys that begin at 1 and increase. An Array cannot grow or shrink. The elements of an Array can be any object, duplicate objects are allowed, and elements are stored in random order.

An Array is useful when you know the size of the collection and that size rarely changes. For example, let's assume you are designing an application to manage rooms in a building. The number of rooms is fixed and you want a fast way to index to each room to check for information, such as occupant or phone number. An Array is a good match for this situation.

An Array is an example of a class implemented using indexed instance variables. Remember that the number of indexed instance variables is set at instance creation time. That is why an Array is fixed in size. It cannot grow the number of indexed instance variables without creating a new instance.

As an example, anArray is an array of 3 places with 'hi' in the first position. This example also demonstrates the indexing ability of an Array.

  anArray := Array new: 3 
  anArray at: 1 put: 'hi' 
  anArray at: 1  
    Output:  'hi' 
  anArray at: 3  
    Output:  nil 
 

OrderedCollection

An OrderedCollection offers the most comprehensive protocol of any of the collection classes, and is consequently the "work horse" used most often by programmers. An OrderedCollection is a variable-size collection of elements in which the user of the collection specifies the location of each element. An OrderedCollection can be indexed by integer keys that begin at 1 and increase. The elements of an OrderedCollection can be any class of object. An OrderedCollection also allows duplicated objects and stores the elements in random order.

This collection is heavily used for variable-size lists that require control over the location of elements in the collection. For example, an OrderedCollection is often used in user interfaces for holding information to be displayed in a list box. This allows information to be added or removed from the list. The indexing support is useful when the user selects a particular element in the OrderedCollection.

An OrderedCollection does not contain indexed instance variables. It manages its list of objects by creating a collection that is indexable. An OrderedCollection object points to this collection from one of its instance variables. Thus it manages its elements as indexed instance variables with one level of indirection. That is why an OrderedCollection can grow or shrink.

As an example, an OrderedCollection of cars is used to store three car types and demonstrate the ability to index these variables:

  cars := OrderedCollection new. 
  cars add: 'Cadillac' 
  cars addFirst: 'Lexus' 
  cars addLast: 'Corvette' 
  cars display  
    Output:  ('Lexus' 'Cadillac' 'Corvette') 
 

SortedCollection

A SortedCollection stores objects in the order specified by a block of code called the sort block. The sort block is a two-argument block that indicates the order the two arguments should be stored in the collection relative to each other. The collection stores the first argument ahead of the second argument in the collection when the sort block evaluates to true. The sort block can contain multiple statements, but it must return a value true or false.

For example, the following block compares the object referenced by the variable a with the object referenced by the variable b

  [:a :b | a <= b]     "return true when a is less than 
                                 or equal to b"   

The block returns a value true if the first object is less than or equal to the second object. This block causes a collection to store elements in ascending order. This is the default sort block for a SortedCollection.

This collection uses the sort block whenever a new element is added to the collection. The first argument in the sort block always points to the new object and the second argument in the sort block points to an existing object. The collection runs the sort block, iterating sequentially through all existing objects, until the sort block returns true or there are no more existing objects in the collection.

The method of writing new sort blocks is:

  #(4 3 5 2 1) asSortedCollection: [:a :b | a >= b] 
  #( 'one' 'two' 'three' )asSortedCollection: [ :x :y | xsize >= ysize] 
 

A SortedCollection would be a good choice for maintaining a customer list sorted by name. However, this adds significant overhead whenever a new object is added to the collection, so it should be used prudently.


Collection Messages

Collections have a variety of messages for managing themselves. The paragraphs below summarize only the most commonly used messages.

new

The new message allocates a collection with a default size, which varies depending on the type of the collection. It returns the new collection. For example, the following statement allocates a Bag. Display the message:

  Bag new  
  

This message can be inappropriate for fixed-size collections since it allocates an empty collection (no elements). Remember, fixed-size collections cannot grow. For example, display the following:

  |aCollection|  
  aCollection := Array new.  
  aCollection at: 1 put: 'a string'.  

new: aNumber

The new: message allocates a collection with a size specified by the argument aNumber. The message returns the new collection.

For a fixed-size collection, the argument specifies the number of elements in the collection. Display the following example:

  Array new: 10  
  

This expression allocates an array with 10 elements. A size message sent to this collection will return the number 10. Display the following example:

  (Array new: 10) size  
  

You can index this array up to a maximum of 10. Display the following example:

  |aCollection|   
  aCollection := Array new: 10.   
  aCollection at: 1 put: 'string 1'.   
  aCollection at: 10 put: 'string 10'.   
  aCollection  

You cannot index outside of this range. Both of the following statements are not valid for the above example:

  aCollection at: 0 put: 'string 0'.  
  aCollection at: 11 put: 'string 11'.  
  

For variable-size collections, the new: message allocates the number of elements that can be added to the collection before additional space is needed. This is a performance issue: allocating too much space wastes memory and allocating too little causes additional overhead whenever the collection needs more memory.

A variable-size collection, even though it has space reserved, is empty until elements are added to it. Display the following:

  |aCollection|   
  aCollection := OrderedCollection new: 10.   
  aCollection size   
  

This example will return a size of zero, indicating the collection is empty. Now display the following:

  |aCollection|   
  aCollection := OrderedCollection new: 10.   
  aCollection add: 'string 1'.   
  aCollection size   
  

with: anObject

These four messages create a collection to hold the specified objects:

The size of the collection is equal to the number of keywords in the message. For example, the message with: creates a collection with one element and places anObject in the collection at that one location. The message returns the new collection. This message is valid for every collection. Display the following examples:

  Array with:'string 1' with:'string 2'.   
   
  OrderedCollection with:1 with:2 with:3 with:4.   
  

size

The size message returns the number of elements currently in the collection. It is supported by all collections. If the collection is empty, its size will be zero. This message is in several of the previous examples.

do: aBlock

The do: message is a general iteration that loops through each element in a collection and runs a one-argument block of code specified by the aBlock argument with each element as the argument. This message is used for side effects mostly, with no interest in the returned object.

For example, the following code will count the number of vowels in the string 'now is the time' and output a value of 5. Display the example:

  "Count the number of vowels in a string."  
  |vowels|  
  vowel := 0.  
  'now is the time' do: [ :each |  
    each isVowel ifTrue: [  
      vowels := vowels + 1]]  
  

As another example, set a variable numbers to a collection of numbers. The following example computes the sum of all those numbers for a result of 15. Display the example:

  "Compute the total sum of numbers in numbers."   
  |numbers sum|   
  numbers := #(1 2 3 4 5).   
  sum := 0.   
  numbers do: [ :aNumber | sum := sum + aNumber].   
  sum  
  

This example runs the block of code for each number in the collection with the variable aNumber pointing to that number. The following diagram illustrates this iteration.


 

Look at another example. The following code counts the number of even numbers in a collection. It produces a result of 2. Display the example:

  "Count the number of even numbers in numbers."   
  |numbers evenNums|   
  numbers := #(1 2 3 4 5).   
  evenNums := 0.   
  numbers do: [ :aNumber | aNumber even ifTrue: [evenNums := evenNums + 1]].   
  evenNums  
   

Note: Numbers recognizes a message called even that returns true if the number is even and returns false otherwise.

add: anObject

The add: message adds an object specified by the anObject argument to a collection. The message is valid only for collections that can grow in size. It returns the object that was added. Display the following example:

  |aCollection|   
  aCollection := Bag new.   
  ^aCollection add: 1   
  

remove: anObject

The remove message removes an object from a collection. The message is valid only for collections that can grow or shrink in size. It returns the object that was deleted. If the object is not found, the Debugger window will be displayed. Display the following example:

  |aCollection|   
  aCollection := Bag new.   
  aCollection add: 1.   
  aCollection remove: 1.   
  

A collection finds an object by performing an equality check, not an identity check. In the preceding example, the first object found in the collection with a value of 1 will be the object removed from the collection. Display the following example:

  |aCollection|   
  aCollection := Bag new.   
  aCollection add: 1.   
  aCollection add: 1.   
  aCollection remove: 1.   
  aCollection   
  

In this example, the two elements in the collection point to separate instances of SmallInteger, each with a value of 1. This message will remove the first object found with a value of 1.

at: aNumber put: anObject

The at:put: message places an object, specified by the anObject argument, in the collection at the index specified by the aNumber argument. It returns the object placed in the collection. This message is valid for collections that can be indexed. Display the following example:

  |aCollection|   
  aCollection := Array new: 10.   
  aCollection at: 5 put: 'string 5'.   
  aCollection   
  

The index must be a valid number or a Debugger window will be displayed. Display the following:

  |aCollection|    
  aCollection := Array new: 10.   
  aCollection at: 11 put: 'string 11'.   
  

at: aNumber

The at: message returns the object at the index specified by the aNumber argument. The message is valid for collections that can be indexed. Display the following:

  |aCollection|   
  aCollection := Array new: 10.   
  aCollection at: 5 put: 'string 5'.   
  aCollection at: 5   
  

removeAtIndex: aNumber

The removeAtIndex: message removes the object at the index position specified by the aNumber argument and reduces the collection by one element. This causes all the succeeding elements to shift down one index position. It returns the removed object.

This message is only valid for collections that can be indexed and can grow and shrink in size, such as OrderedCollection. Display the following example:

  |aCollection|   
  aCollection := OrderedCollection new.   
  aCollection add: 'string 1'.   
  aCollection add: 'string 2'.   
  aCollection removeAtIndex: 1.   
  aCollection  
   

, aCollection

The , message combines two collections to form a third collection by concatenation. The receiver of the message must be a collection that can be indexed. The argument aCollection can be any collection. The message returns a new collection of the same class. as the receiver.

The message is a good way to grow a fixed-size collection. For example, create an array and then add a new element to the array. Display the following example:

  |aCollection|   
  aCollection := Array new: 2.   
  aCollection at: 1 put: 'string 1'.   
  aCollection at: 2 put: 'string 2'.   
  aCollection := aCollection, (Array with: 'string 3').   
  

detect: aBlock

This message returns the first element in the receiver for which aBlock evaluates to true. For example, display the following:

  #( 4 7 10 3 7) detect: [ :each | each > 7] 
 

This will display the 10.

select: aBlock

This message returns a subset of the receiver containing all elements for which aBlock evaluates to true. For example, display the following:

  'now is the time' select: [ :each | each isVowel ] 
 

Returns the value: o i e i e.

reject: aBlock

This message returns a subset of the receiver containing all elements for which aBlock evaluates to false. For example, display the following:

  'now is the time' reject: [ :each | each isVowel ] 
 

Returns the value: nw s th tm.

collect: aBlock

This message creates and returns a new collection of the same size and type as the receiver. The elements are the result of performing aBlock on each element in the receiver. For example, display the following:

  #('now' 'is' 'the' 'time' 123) collect: [ :each | each isString ] 
 

Returns the string: now is the time.

Summary of collections and messages

The following table summarizes what methods each collection supports.


                                         METHODS 
COLLECTIONS do: add: remove: at: at:put: removeIndex: , detect: select: reject: collect:
Bag Y Y Y N N N N Y Y Y Y
Set Y Y Y N N N N Y Y Y Y
Array Y N N Y Y N Y Y Y Y Y
OrderedCollection Y Y Y Y Y Y Y Y Y Y Y
SortedCollection Y Y Y Y N Y Y Y Y Y Y
Dictionary Y Y N Y Y N N Y Y Y Y
String Y N N N Y N Y Y Y Y Y
Symbol Y N N N Y N Y Y Y Y Y


Special Collections

There are three other collections that are useful but that differ enough from the other collections that they merit a special discussion. The three classes are String, Symbol, and Dictionary.

String

Logically the String class is identical to the Array class except that it can contain only characters. For example, the string 'John' is shown in the following diagram:


               

Inspect the following string:

  'John'  
  

The String Inspector window opens and shows the contents of the string. Notice the string is made up of four elements, each pointing to a character. Close this window when you are done.

A String cannot grow or shrink in size, but it can be indexed. Thus, it supports the messages at: and at:put: but does not support the messages add: or remove:. Messages that appear to grow a String object create a new String object rather than grow an existing object. For example, display the following statement:

  'John', 'Doe'     "Join two strings to form a third string." 

It returns a new String object, 'John Doe'.

A String is an example of a class that is implemented using indexed instance variables that contain bytes rather than pointers. Each byte contains the character stored at that indexed position. However, the interface to String is still via pointers. That is, String expects a pointer to an instance of Character as it's input and returns a pointer to an instance of Character as output. Inspect the following example. You will see that an instance of Character is returned from the at: message.

  'abc' at: 1  
  

A String can replace an element unless is was created using a literal. Display two statements:

  (String with: $x) at: 1 put: $y.  
  
  'x' at: 1 put: $y.   
  

Both statements create a String object of one character, one using the with: message and the other using a literal. The at:put: message is then sent to each String object. The first statement will support the message, the second statement will cause an error. Close the Debugger window when it is displayed.

Strings have additional methods that make sense for a character string. Display the following examples:

  'string' trimBlanks  
       --"Return a string with the leading and trailing blanks removed."  
  
  'February 3, 1994'breakAt:','  
       --"Return an array of two strings." 
  
  'one two three' subStrings  
       --"Return an array of strings for each word in the string." 

Symbol

Symbols are used by classes to define the selectors for the methods defined for the class. For example, if Customer class has a method called name, then the symbol for the selector is #name. Symbols are also used to provide unique keys in dictionaries. However, atoms might also be used for this.

The Symbol class is similar to the String class, with some unique differences. It has the following behavior:

        |aCollection| 

aCollection := #Symbol

This means it does not support the messages new:, with:, with:with:, with:with:with:, and with:with:with:with:.

It does support the , (comma) message but returns a new string, not a new symbol. Display the following example:

  #name1, #name2   
  

Note: Remember that a symbol is preceded by the number, or pound, sign.

Try a simple example. Display the following:

  #name at: 1 put: $N.  
  

This will result in a Debugger window. Close the window when it appears.

Dictionary

A dictionary is like a simple database table. Dictionaries are unordered collections whose elements are accessed by an explicitly assigned external key. Keys are often strings or symbols, but in principle any object can be used as a Dictionary key. Dictionaries keep track of key/value pairs. Each key/value pair is an instance of a class called Association. An instance of Association points to an object that is the key and points to an object that is the value. An element in a dictionary contains a pointer to an instance of Association. There are no restrictions on the elements stored in a Dictionary.


          

Keys must be unique in a dictionary. No two associations in a dictionary can have keys with the same value.

Note: Key matching for storing or accessing elements is based on the equality operation =. Consequently, keys must be unique with respect to the equality operation, that is, two elements cannot be associated with the same key nor can two keys be equal. For example, if a String object with a value of 'name' is already stored as a key, another String object with the same value cannot be added as a key.

Because of the key/value pairing, several of the methods for Dictionary are slightly different from the methods for other collections. Dictionary also provides several additional methods specific to keys and values.

The paragraphs below briefly summarize some of the common messages for Dictionary.

Note: Some of the examples ask that you inspect them, not display them. The display command of a Dictionary object only shows its values, not its keys. An Inspector window is needed to see both the keys and values.

add: anAssociation

The add: message adds the anAssociation argument to the dictionary. It returns the association that was added. Inspect the following example:

  Dictionary new  
  add: (Association key: '257511');  
  yourself  
  

removeKey: aKey

The removeKey: message removes the association from the Dictionary with the key equal to aKey argument. It returns the value of the association that was removed. It displays a Debugger window if aKey cannot be found in the Dictionary. Inspect the following example:

  |aDictionary| 
  aDictionary := Dictionary new. 
  aDictionary add: (Association key: '27511' value: 'Cary, NC'). 
  aDictionary removeKey: '27511' 
  

at: aKey

The at: message returns the value associated with the aKey argument. It displays a Debugger window if the aKey argument cannot be found in the aDictionary. Display the following example:

  |aDictionary| 
  aDictionary := Dictionary new. 
  aDictionary add: (Association key:'27511' value:'Cary, NC'). 
  aDictionary at:'27511' 
  

keyAtValue: aValue

The keyAtValue: message returns the key for the first value that it finds equal to the aValue argument. It displays a Debugger window if aValue cannot be found in the aDictionary. Display the following example:

  |aDictionary| 
  aDictionary := Dictionary new. 
  aDictionary add: (Association key:'27511'value:'Cary'). 
  aDictionary keyAtValue:'Cary' 
  

at:aKey put:aValue

The at:put: message finds the aKey in the aDictionary and sets its value equal to the aValue argument. If the key is not found, it adds it to the dictionary. This message can be used instead of the add: message to add a new association to the dictionary. It returns aValue. Display the following example:

  |aDictionary| 
  aDictionary := Dictionary new. 
  aDictionary add: (Association key:'27511'value:'Cary'). 
  aDictionary at:'27511'put:'Raleigh'. 
  aDictionary at:'49255'put:'Montgomery'. 
  aDictionary 
  
Notice that the value for the key '27511' has been changed and a new association with a key of '49255' and a value of 'Montgomery' has been added.

do: aBlock

The do: message iterates through all the objects in the dictionary elements, running a one-argument block of code specified by the aBlock argument with the argument set to the value of each association. Display the following example:

  |aDictionary aCollection| 
  aDictionary := Dictionary new. 
  aCollection := OrderedCollection new. 
  aDictionary at:'27511'put:'Cary'. 
  aDictionary at:'49255'put:'Montgomery'. 
  aDictionary do:[:city|aCollection add:'city = ',city]. 
  aCollection 
  
This example returns an OrderedCollection with each element pointing to a String object with the name of a city preceded by the characters 'city = '.

keysDo: aBlock

The keysDo: message iterates through all the Dictionary elements, running a one-argument block of code specified by the aBlock argument with the argument set to the key of each association. Display the following example:

  |aDictionary aCollection| 
  aDictionary := Dictionary new. 
  aCollection := OrderedCollection new. 
  aDictionary at:'27511'put:'Cary'. 
  aDictionary at:'49255'put:'Montgomery'. 
  aDictionary keysDo:[:zip|aCollection add:'zip code = ',zip]. 
  aCollection 
  
This creates an OrderedCollection with each element pointing to a String object with a zip code preceded by the characters 'zip code = '.

keys

The keys message returns a set of all the keys in the dictionary. Display the following example:

  |aDictionary| 
  aDictionary := Dictionary new. 
  aCollection := OrderedCollection new. 
  aDictionary at:'27511'put:'Cary'.  
  aDictionary at:'49255'put:'Montgomery'. 
  aDictionary keys 
  

values

The values message returns an instance of the class Bag of all the values in the Dictionary. Display the following example:

  |aDictionary| 
  aDictionary := Dictionary new. 
  aCollection := OrderedCollection new. 
  aDictionary at:'27511'put:'Cary'. 
  aDictionary at:'49255'put:'Montgomery'. 
  aDictionary values 
  
Note: Why does Dictionary return keys in a Set and values in a Bag? A Set does not allow duplicates whereas a Bag does. A Dictionary object cannot have duplicate keys, so a Set is a good match. But because a Dictionary object can have duplicate values, they are placed in a Bag.


Summary

Let's review some of the important points covered in this chapter:


Return to [Top of the page]
Smalltalk Tutorial

Go to Chapter8:Streams

Return to Chapter6: Inheritance

Return to Main Page