Smalltalk

Computing Streams

March 22, 2006 21:07:35.583

Here's an interesting tweak on existing Smalltalk ideas that Martin Kobetic and I wrote on the plane on the way back from our last planning meeting. Martin was wishing that streams were more composable, so that you could implement more complex encodings by just wrapping streams on top of each other. This is tricky to do with the existing streams and encodings. I was also talking about implementing some of the collection iteration methods on stream. VisualWorks already has do: implemented for readStreams, but things like collect: and select: can be very useful.

So Martin pointed out that those methods act on the stream like it was a collection, and you'd like an operation that acted on it like it was a stream. We started calling these "ing" methods - collecting:, selecting:, injecting:into:, and so forth. So,

#( 1 2 3 4 5) readStream selecting: [:each | each odd]. 
 

gives you back a readStream which, if asked repeatedly for its next element, will give you 1, then 3 then 5, then nil.

This is handy enough by itself, but gets more interesting. For one thing, you can stack these streams on top of each other. One of the standard ugly bits in Smalltalk is to do a collect: and select: together in order to filter out some elements, then apply a transformation to the other. You can just wrap one in the other, but then you end up creating an intermediate collection that you never use, and it's inefficient. Alternatively, you can implement a method that does them both at once, which I've seen called #collect:when:. But then you have to wonder how many of these special case combinations you'll end up writing. Using streams, you can wrap them, and no intermediate collection is required. So

(#( 1 2 3 4 5) readStream selecting: [:each | each odd]) 
      collecting: [:each | each squared].
 

gives you back a readStream whose elements would be 1, 9 and 25, but without the overhead of creating the intermediate collection.

For starting out as a thought experiment, this turns out to be a very nice and natural metaphor. Not only that, but we we were able to implement it all during the plane ride. Most of the standard iteration methods map very simply to this, although we didn't come up with a sensible meaning for detect:. It's in a package called ComputingStreams, that's published in the public repository. Next post, we'll look at some of the hairier computations you can do with these.

Smalltalk

Hairy Computing Streams

March 22, 2006 21:27:19.083

The previous post talked about computing streams, and how you could make nicely composable streams parameterized by blocks. One big reason we wanted to do this is to use it for things like character encodings. For example, do Base64Encoding, or URL encoding putting invalid URL characters into UTF8-encoded %HH strings.

That's more complicated, because we need the output to have more or fewer characters than the input. So we added a more general method transforming:, which takes a two-argument block, with an output stream and the argument.

Suppose we wanted to turn a stream of hex digits into a stream of bytes. So '0A142B' becomes #(16r0A 16r14 16r2B). To do this, we can't just stream over the input. We need to know if we're at the start of a two-digit sequence, in which case we return nothing, but remember where we are, or if we're at the end, we need to build the byte and return it. This is relatively complex to do with normal streams, but can be expressed as

| previousByte| 
previousByte := nil. 
inputStream transforming: [:output :each | 
    | result|
    previousByte isNil 
        ifTrue: [previousByte := each] 
        ifFalse: [ 
            result:= (('0123456789ABCDEF' indexOf: previousByte) - 1) bitShift: 4.
            result:= result bitOr: ('1023456789ABCDEF' indexOf: each) - 1. 
            output nextPut: result. 
            previousByte := nil] ]. 
 

That's not trivial, but it's a lot simpler than some other ways of doing it.

Another example comes from Blaine Buxton. He had an example for collecting words from an input string and playing around with blocks to do it. This got me thinking. In that case we need not only to do something when we see a character, but something at the end. But it seems like a common pattern, grouping things from a collection together. So I defined a method accumulating:into: which takes an initial value and a block. The block can either return nil, indicating that we're done with this entry, or return any other value, indicating that that's the new thing being accumulated. It's kind of a twist on inject:into:. With that, the code for creating words becomes just

 ('one two three four' readStream 
    accumulating: String new 
    into: [:sum :each | each isAlphaNumeric 
        ifTrue: [sum , (String with: each)] 
        ifFalse: [nil]]) upToEnd. 
 

I'm not sure that as a general method this is worth having around all the time, but it was an interesting experiment, and since this is just written in terms of transforming:, you can always implement it directly. Blaine's example actually created word objects out of the result. That's a little bit messier, but the easiest thing to do is just wrap it in a collecting: operation that gets the Word objects.

One other wrinkle in this is to do with contentsSpecies. If we want to convert a string to a bytearray, using either collect: or a stream, we run into a problem that the system tries to automatically figure out the result collection type for us. So in this case, if we did an operation like upToEnd, we'd get an array, which might not be what we want. For character encodings, it's almost certainly not what we want. For other types of conversions, we might get an error, e.g. if streaming over a bytearray and creating characters. But now we have these conversion operations visible as objects - the streams that provide the answers. So it's very easy to implement a contentsSpecies: method that lets us tell it what kind of collection we want, and to get back a byte array as a result.

This is really just the way general streams of values work, if you go back to something like LISP/Scheme and Structure and Interpretation of Computer Programming. Smalltalk streams, though, had too much of a history of being tied to character processing, and didn't provide these kinds of transformations. There seem to be a lot of other applications too. One of the first things I did was look at URL-encoding, which is incredibly messy right now, and has nested streams, but with code at one layer that has to poke into the internals of layers inside it. It turned the main definition into about eight lines of code. It's a little bit slower, but not that much, and it's way more maintainable.

This is still very much an experiment. Right now, these only work for readStreams, for one thing. We still have to figure out the corresponding metaphors for writing. But so far I'm very happy with it.

Computers

Crashed Hard Drive

March 23, 2006 9:51:14.329

Last week I had a hard disk soft crash. Over the weekend, the laptop seemed to have died, then when I rebooted I would start getting intermittent errors, and some very ugly noises from the inside. My backups not being everything you could want, I started copying stuff off, but it only got part way through before it was consistently crashing.

Then I got a wonderful suggestion off the #smalltalk IRC channel from Andres Valloud. He suggested turning the laptop on its side in order to boot it. I don't know what the logic is - I assume there's somehow less strain on the mechanism that way. Anyway, it booted beautifully, and I was able to copy off all the things that I hadn't considered important enough to back up until I was faced with the imminent prospect of losing them. This included my Smalltalk Solutions slides, which I'd finished (only slightly past deadline) but hadn't yet uploaded. Thanks Andres!