Computing Streams
Here's an interesting tweak on existing Smalltalk ideas that Martin Kobetic and I wrote on the plane on the way back from our last planning meeting. Martin was wishing that streams were more composable, so that you could implement more complex encodings by just wrapping streams on top of each other. This is tricky to do with the existing streams and encodings. I was also talking about implementing some of the collection iteration methods on streams. VisualWorks already has do: implemented for readStreams, but things like collect: and select: would be very useful too.
So Martin pointed out that those methods act on the stream like it was a collection, and you'd like an operation that acted on it like it was a stream. We started calling these "ing" methods - collecting:, selecting:, injecting:into:, and so forth. So,
#( 1 2 3 4 5) readStream selecting: [:each | each odd].
gives you back a readStream which, if asked repeatedly for its next element, will give you 1, then 3, then 5, then nil.
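To make the lazy behaviour concrete, here is a minimal workspace sketch of what a selecting stream has to do each time it is asked for its next element. It is not the ComputingStreams implementation: the filtering stream is modelled as a plain block, and the names source and nextOdd are made up for illustration. It also assumes the underlying stream never holds nil, since nil is used to signal the end.

```smalltalk
"Hypothetical sketch, not the ComputingStreams source: a filtered stream
 modelled as a block that answers the next matching element, or nil at the end."
| source nextOdd |
source := #(1 2 3 4 5) readStream.
nextOdd := [
    | found |
    found := nil.
    "Pull from the underlying stream only until one element passes the test."
    [found isNil and: [source atEnd not]] whileTrue: [
        | candidate |
        candidate := source next.
        candidate odd ifTrue: [found := candidate]].
    found].
nextOdd value.  "1"
nextOdd value.  "3"
nextOdd value.  "5"
nextOdd value.  "nil"
```

Nothing is read from the underlying stream until an element is requested, and rejected elements are simply skipped rather than stored anywhere.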
This is handy enough by itself, but it gets more interesting. For one thing, you can stack these streams on top of each other. One of the standard ugly bits in Smalltalk is doing a collect: and a select: together in order to filter out some elements, then apply a transformation to the rest. You can just wrap one in the other, but then you end up creating an intermediate collection that you never use, which is inefficient. Alternatively, you can implement a method that does them both at once, which I've seen called #collect:when:. But then you have to wonder how many of these special case combinations you'll end up writing. Using streams, you can wrap them, and no intermediate collection is required. So
(#( 1 2 3 4 5) readStream selecting: [:each | each odd]) collecting: [:each | each squared].
gives you back a readStream whose elements would be 1, 9 and 25, but without the overhead of creating the intermediate collection.
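The stacking can be sketched the same way, again with plain blocks standing in for the stream classes. This is only an illustration under the same assumptions as before (no nil elements; nextOdd, nextSquare, results and item are hypothetical names), not the package's actual code. Each layer pulls one element at a time from the layer beneath it, so the odd numbers are never gathered into a collection of their own.

```smalltalk
"Hypothetical sketch: two lazy layers, each pulling from the one below."
| source nextOdd nextSquare results item |
source := #(1 2 3 4 5) readStream.
"Filtering layer: answers the next odd element, or nil at the end."
nextOdd := [
    | found |
    found := nil.
    [found isNil and: [source atEnd not]] whileTrue: [
        | candidate |
        candidate := source next.
        candidate odd ifTrue: [found := candidate]].
    found].
"Transforming layer: squares whatever the filtering layer hands over."
nextSquare := [
    | each |
    each := nextOdd value.
    each isNil ifTrue: [nil] ifFalse: [each squared]].
"Drain the top layer. This collection exists only to show the result."
results := OrderedCollection new.
[(item := nextSquare value) notNil] whileTrue: [results add: item].
results  "holds 1, 9 and 25"
```

The only collection built here is the final one used to display the answer; a caller who just wants to process each element in turn would not need even that.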
For something that started out as a thought experiment, this turns out to be a very nice and natural metaphor. Not only that, but we were able to implement it all during the plane ride. Most of the standard iteration methods map very simply to this, although we didn't come up with a sensible meaning for detect:. It's in a package called ComputingStreams, which is published in the public repository. Next post, we'll look at some of the hairier computations you can do with these.
Comments
Why the -ing:?
[Brian Rice] March 23, 2006 12:19:36.000
First, this is great to see. I've also made similar improvements in Slate's stream system, and they're really fun to work with. However, why use the suffix? I think there's an advantage to being able to use collections and streams interchangeably, just thinking of streams as lazily-allocated collections. Then you could e.g. ask #contents of a Stream and get all of the results when you need to, or #contents of a Collection to get the collection itself (actually we do "as: Collection" but that's beside the point). There is a similar selector #readStream which does what Smalltalk does and then on ReadStreams, it returns the stream itself. Slate has many places where collections and streams are taken as arguments and used interchangeably. Also, from a type-inferencing standpoint, having #collect: etc. work on both Collections and Streams yields a simple rule: the result will have the character of the original (as much as makes sense anyway), so Collection->Collection and Stream->Stream. I would actually prefer it if the -ing: suffix were used for selectors on Collections that produced Streams, so that the sense of the thing would be distinguishable. What do you think?
Why the -ing:?
[Alan Knight] March 23, 2006 15:25:24.000
Well, the biggest practical reason was that I already had an implementation of collect: on streams which would return a collection. And I did implement e.g. collecting: on collections to return a stream. So the invariant I had was on the return type: collect:->Collection and collecting:->Stream, regardless of the receiver. I hadn't thought too much about type inference, but if I had the methods that return streams as #collecting: on Collection, and as #collect: on a stream, then I wouldn't have a way to polymorphically get a stream.
Having #readStream on Streams and contents on Collections seem like good things.
I see
[Brian Rice] March 23, 2006 21:33:59.000
I'll probably stick with the idioms the way I have them. Incidentally, to get a collection out of a ReadStream, I have >> defined on it (a pipe) which will grab a writeStream out of the argument (so it could be anything responding to writeStream or a WriteStream for such a thing), run a block over all the elements to "transfer" them, and then return the contents of the target. The target doesn't even need to be a collection or a stream, just something with a default writeStream that it will provide when asked. It has the feel of shell programming to an extent - perhaps another selector would be more suitable for Smalltalk-80 just to avoid confusion.
Fun with Streams
[Ralph Johnson] March 27, 2006 22:49:49.000
I've had collect: and select: defined on Stream for a long time. They return streams; a CollectStream and a SelectStream, to be precise. There are lots of nice tricks you can do with them.
See http://wiki.cs.uiuc.edu/PatternStories/FunWithStreams