Richard A. O'Keefe ok at cs.otago.ac.nz
Mon Mar 3 17:18:21 CET 2003

The thing which strikes me forcibly is how few changes there are between the freely available 1.9 draft and the official standard. This includes very little in the way of typo correction...

Here's a list of issues I have found in sections 5.9 "Stream Protocols" and 5.10 "File Stream Protocols". Note the word "ISSUE"; this is not the same as "ERROR".

1.  "Some stream classes will build sequenceable collections or report the values of a sequenceable collection." p255

    These are the only occurrences of "sequenceable collection"; SequenceableCollection is not an ANSI class or protocol and there is no "sequenceable collection" concept elsewhere in the ANSI ST standard.
    MINOR.

2.  "Other types of streams may operate on files, positive integers, random numbers, and so forth." p255

    There _are_ ANSI Smalltalk streams that operate on files, but not any that operate on random numbers. That's a pity, because most Smalltalk users will not be expert in random number generation.
    SCOPE issue; the standardisers no doubt had good reason for keeping RNG out and there are public algorithms one can adapt. However, a common protocol one could follow would have been nice. In particular, should one ensure that 0.0 _is_ a possible result of #next or that it is _not_?

3.  "Transcript is a stream that may be used to log textual message[s] generated by a Smalltalk program." p255
    TYPO.

4.  The protocol hierarchy lumps "gettable streams" (basically, things that can support #atEnd and #next) and "peekable streams" (basically, things that can in addition support #peek) into one protocol. As a means of specifying the streams that happen to be present in ANSI Smalltalk itself, that's OK. But it provides people with no guidance about which methods they should implement if they _can't_ implement #peek (efficiently). This matters to me because I have several times been in just that situation. #atEnd and #next, no problem.
    #peek, only if I implemented lookahead myself (an extra two instance variables 'peeked lookahead' and slowdown in #atEnd and #next checking/maintaining them).
    SCOPE. It is important for readers of the ANSI protocols to realise that they are primarily meant for specifying ANSI classes; if they are useful as models for your own classes, that's nice, but don't expect it.

5.  5.9.1.5 Definition, p257
    "move objects in sequence from the front of the receiver's future sequence values to the back of th[e] receiver's past..."

    This occurs twice. Surely a spelling checker would have found "the" written as "th"???
    TYPO

6.  5.9.1.5 Errors, p257
    "If the receiver has any sequence values and amount is greater than or equal to the total number of sequence values of the receiver".

    But this forbids you moving the position to the end of the sequence. I think this should read "If amount is greater than the total number of sequence values of the receiver".
    ERROR.

-   5.9.1.6 Definition
    I find "appended with" grating. Less contentiously, the two sentences are different in mood, for no apparent reason. Better as "Sets the receiver's future sequence values to the concatenation of its past sequence values and its future sequence values, and makes the receiver's past sequence values empty."
    WRITING.

-   #contents and #upTo: are there, why not #upToEnd? The irony here is that #upToEnd is implementable even in some streams where #contents is not. But we've had discussions about whether the standardisers were willing to require implementors to add even quite tiny methods, and the answer was that they weren't. This is an issue for me trying to put together an ANSI-like implementation I could use, but the standard is what the standardisers wanted.

7.  5.9.2.2 page 259
    "and, passed as the argument to an evaluation of operand."

    The comma should not be there, and the argument is called "operation" not "operand", as the very next sentence points out.
    TYPOS.

8.
    5.9.2.4 #next: p259
    "The result is undefined if amount is larger than the number of objects in the receiver's future sequence values".

    It really is not clear to me why the last block in a sequence of #next: calls could not be short. (That's how read(2) works in UNIX, after all.) This prompted me to look at Squeak. I find that in Squeak *some* implementations of #next: will return a short block and some will go rumble-rumble-CRASH. But it is easy to implement safely:

        next: amount
            "Answer the next amount elements, or fewer if the stream ends first."
            |result|
            result := self species new: amount.
            1 to: amount do: [:index |
                self atEnd ifTrue: [^result copyFrom: 1 to: index - 1].
                result at: index put: self next].
            ^result

    Why this matters is that if you have a stream that is not positionable, you don't KNOW whether there are that many items left. As it is, the ANSI definition of #next: pushes the responsibility on the user without providing the user with any means of carrying out that responsibility, other than to program his/her own version of #next:. This is particularly important because some ANSI Smalltalk streams are supposed to be positionable, but if you try it, you may be in for a nasty surprise. (See below.)

    The "style", so to speak, is a little inconsistent here, because #upTo: *will* accept a final block that doesn't end in the usual way.

    I'm not quite sure how to classify this, but a method you cannot use when you most need it (see below) sounds like an issue to me.

9.  5.9.2.6 p260
    "The results are undefined if there are no future sequence values in the receiver."

    But there is only one result, a .
    TYPO.

10. 5.9.4.4 p264
    "Has the effect of enumerating the (sic.) aCollection with the message #do: and adding each element to the receiver with #nextPut:."

    (a) The second "the" shouldn't be there.
    (b) Collection methods generally say that elements are enumerated _in the same order as_ they would be by #do: but don't actually require that #do: be used.
    Squeak's implementations of #nextPutAll: do *not* always use #do:, sometimes they use a block copy and it is a pity that the standard doesn't actually let them do this. (The difference is observable in Squeak.)
    TYPO, OVERSPECIFICATION

11. 5.9.10 Description p267
    "<ReadWriteStreamfactory> provides for the creation of objects conforming to the <WriteStream> protocol whose sequence values are supplied by a collection."

    (1) "ReadWriteStreamfactory" should be "ReadWriteStream factory"
    (2) "WriteStream" should be "ReadWriteStream"
    (3) "objects conforming to the ... protocol whose ... values ..." should be "objects, conforming to the ... protocol, whose ... values ..."
    TYPOS

-   ReadStreams are created with #on:. WriteStreams are created with #with:. Why is there no #on: for ReadWriteStreams? You can get the same effect by using

        (ReadWriteStream with: aCollection) reset; yourself

    but why should you have to?
    SCOPE

12. 5.10 pervasive

    This is possibly the single biggest problem. Explicit reference is made to POSIX, yet the strong assumption is made that every nameable file is positionable. To that I can only say '/dev/tty'. (Well, I can also say pipes, named pipes. On my machine, there's /dev/mouse and /dev/pty*. On Linux machines there's /dev/proc. Unix systems have lots of unseekable "files". Rather strangely, when I tested lseek() on /dev/tty or stdin, on one UNIX system it claimed to have succeeded, and on another it closed my connection. (Yes, really!) I think some people have interpreted ESPIPE too narrowly.)

    It's not just a matter of #position and #position: being in the interface, it's a matter of there being no provision at all for them failing.

    Oh yes, there's another way that positioning can fail. Consider VM/CMS. (My knowledge of VM/CMS ends with major version 6, and is a little fuzzy because I haven't used it for a while.) VM/CMS is record-oriented. It has fixed length records and variable length records.
    Unusually, if you have a disc file full of variable length records, you _can_ seek to an arbitrary record in the file. What you can't do is seek to an arbitrary byte/character. The only way I can see to implement #position: on such files (which are quite a nice way to represent text) is

        position: amount
            self reset.
            self skip: amount.

    which isn't very nice.

    Then there is a third way that positioning can fail, and I'm surprised that no-one who spent much time working on Xerox systems managed to get the point across. It was certainly an issue in Xerox Quintus Prolog, and the reason why I fought hard to keep byte-addressing for files out of the Prolog standard. Consider a file of 16-bit text (our original concern was the XNS 16-bit character set used in Xerox Lisp) or wider (such as Unicode), represented in "compressed" form (these days, think of UTF-8 or better still, UTR-6). With UTF-8, position _as measured in characters_ is not the same as position _as measured in bytes_, a problem that already exists in MS-DOS and Windows when you do CRLF->cr mapping. (That's why the C standard does not require byte addressing for text streams.) The simplest way out then is to define #position to return a magic cookie that increases monotonically with position in the file (as C does) and to define #position: to work only for magic cookies recorded by #position, or possibly with 0.

    But it gets worse. The "compressed" form used with XNS characters, the SBCS->DBCS/DBCS->SBCS shift codes commonly used in VM/CMS, the similar code set escapes used in ISO 2022, and the rather nice UTR-6 have the property that you cannot interpret bytes at some position in a file without knowing what "shift state" you are in, even if the position is known to be where a valid character starts. So a "position" has to be an object that may contain a shift state as well as a byte position. Indeed, it could encode position in character sequence, position in underlying byte sequence, and shift state (if any).
    The CRLF->cr problem is a serious one even for the file streams in ANSI Smalltalk. A further problem here is that #cr is defined to deposit the same sequence of characters for all streams, so that

        aStream cr

    is not the same as

        aStream nextPut: Character cr.

    Since Character cr is a single character, an ANSI Smalltalk implementation appears to be *forbidden* to convert it to CR+LF on writing.

13. Also, there seems to be an assumption that every file that can be opened for output can be opened in read-write mode. Well, in POSIX it is quite possible for a user to have permission to write to a file without having permission to read from it. So suppose there is a log file that I am supposed to append to:

        logStream := FileStream write: logFileName mode: #'append' check: true type: #'text'.

    This makes sense, does it not? But #contents is in the protocol, and it is absolutely unimplementable in this case, because the file system permissions flatly refuse to let me see the "past sequence values" of the file, yet there is no provision for #contents failing. Nor is there any way for a program to determine whether it _would_ fail without trying it and catching the entirely unspecified error that might (or then again, might not) be raised.

    There really _really_ need to be methods

        canChangePosition -> <boolean>
        canReportContents -> <boolean>

    Ah, you say, if someone is writing ANSI Smalltalk, it is up to them to ensure that they stay within the limits. But there is no way for a programmer to find out whether they are staying within the limits or not. Suppose you open a file (which _can_ be positioned), read a file name from it, and try to open _that_ file. You have no way of telling whether #position: or #contents can work, and in the case of a file with '-w--w----' permission, no way of telling even whether the attempt to open it will work, short of trying.

    Failure to adequately consider current file systems.

14.
    5.10.1 Description p270
    "translated from or two" should be "translated from or to"
    TYPO.

15. "is treated as a sequenced of 8-bit characters"
                             ^ lose the "d"
    TYPO.

16. 5.10.2.1 #next: Definition p271
    "The result is undefined if amount is larger than the number of objects in the receiver's future sequence values."

    Remember that we're talking about reading from file system objects here. The only way to know in advance whether there are that many characters left is to ask whether

        theStream contents size - theStream position >= amount

    but because #contents is unimplementable for many file system objects, the programmer cannot use this. Nor is it a cheap way to find the size of a file, especially in a 64-bit file system. So you might try

        t := theStream position.
        theStream setToEnd.
        size := theStream position - t.
        theStream position: t.

    and cache the size. (It's a little odd that there is no #size message for positionable streams so that this dance can be avoided.)

    However, in many systems, the size of a file may change, even be reduced. So between the time that you determine the amount of data left and the time that you ask for the next n items, the amount left may have changed, so even in a positionable stream, you CAN'T predict whether it will be safe to call #next:.

    So in the very case where #next: has the greatest payoff (from using block copies) it is least safe to use it.
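
For readers who want the short-read behaviour of issue 8 without redefining #next: itself, here is a minimal sketch. It assumes a hypothetical selector #nextAvailable: (not part of ANSI Smalltalk) added to a stream class of your own whose elements are characters, and it uses only #atEnd, #next, WriteStream with:, #nextPut: and #contents. It gives up the block-copy payoff, and it does nothing about a file shrinking between the #atEnd test and the #next call.

```smalltalk
nextAvailable: amount
    "Hypothetical helper, not an ANSI selector.
     Answer up to amount characters from the receiver; the final
     block may be shorter than amount, as with read(2)."
    | out count |
    "Assumption: the receiver yields characters, so a String buffer fits."
    out := WriteStream with: String new.
    count := 0.
    [count < amount and: [self atEnd not]] whileTrue: [
        out nextPut: self next.
        count := count + 1].
    ^out contents
```

A caller would then loop on something like (aStream nextAvailable: 512) and stop when the answer is empty, instead of having to know the remaining length in advance.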