ObjectStudio8

String Concatenation vs. Streams

October 11, 2009 14:43:33.678

I've seen several ObjectStudio applications that do heavy String manipulation. Most of the time, String>>+ is used to concatenate two strings.
This method is pretty expensive. Like Array>>add:, it creates a new String and copies both operands into it.
Especially inside a loop, the overhead of creating and discarding these temporary instances adds up.
Instead of String>>+ or String>>, (the comma operator), I recommend using Streams.

IMHO, the easiest way to create a WriteStream is:
writeStream := String new writeStream.
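
As a minimal sketch of how such a stream replaces concatenation in a loop (the variable name and the text being built are made up for illustration, and it assumes the usual stream protocol nextPutAll:, nextPut: and contents is available):
| stream |
stream := String new writeStream.
1 to: 3 do:
        [:i |
        "Append to the stream's buffer instead of creating a new String on every pass"
        stream
            nextPutAll: 'line ';
            nextPutAll: i printString;
            nextPut: $,].
"Only here is the final String actually built"
stream contents

The stream grows its internal buffer as needed, so the intermediate Strings that String>>+ would create on each pass never come into existence.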

To back up this claim, I rewrote String>>breakUsing: to use a Stream.
Before:
oldBreakUsing: aString
    | ans subpart |
    ans := OrderedCollection new.
    subpart := ''.
    self do:
            [:ch |
            (aString includes: ch)
                ifTrue:
                    [subpart notEmpty ifTrue: [ans add: subpart].
                    subpart := '' + ch]
                ifFalse: [subpart := subpart + ch]].
    subpart notEmpty ifTrue: [ans add: subpart].
    ^ans asArray


After:
breakUsing: aString

    | ans subpart |
    ans := OrderedCollection new.
    subpart := String new writeStream.
    self do:
            [:ch |
            ((aString includes: ch) and: [subpart notEmpty])
                ifTrue:
                    [ans add: subpart contents.
                    subpart reset].
            subpart nextPut: ch].
    subpart notEmpty ifTrue: [ans add: subpart contents].
    ^ans asArray


The test is very simple:
Time millisecondsToRun: [100000 timesRepeat: [
 'abbbbbbbbbbc-dfffffffffffef-gggggggggggggggghi-jjjjjjjjjjjjkl-mmmmmmmmmmmmno-ppppppppppqr-stu-vw-xzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzyz' breakUsing: '-' ]].


Time millisecondsToRun: [100000 timesRepeat: [
 'abbbbbbbbbbc-dfffffffffffef-gggggggggggggggghi-jjjjjjjjjjjjkl-mmmmmmmmmmmmno-ppppppppppqr-stu-vw-xzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzyz' oldBreakUsing: '-' ]].


The new code takes 1683ms, vs. the 6148ms the old implementation took for the same run, which makes it roughly 3.7 times faster.
I know that ObjectStudio itself still uses String>>+ all over the place, but when something catches our eye, we try to change it.
Maybe you can tell us which areas are especially painful for you.


posted by Andreas Hiltner