Memory Management

It's the Allocation Stupid

November 10, 2004 16:24:38.386

With a release rolling out and the build servers frozen, I've had some time to "play" with some pet projects. I've learned some interesting thing about memory management. This may be long.

I maintain a large legacy soft real time processing application written in C. It really really really wants to be much more dynamic than it is. Currently it follows the age-old strategy of just pre-allocating everything against worse case scenarios. This only scales so far.

One of the thoughts I've had is to migrate it towards Objective-C. Objective-C is an elegant superset of C that gives you a very Smalltalk like object paradigm. The one glaring difference is that it has no memory management facilities. The community seems to eschew automatic memory management, preferring instead to use a system of macros which manage reference counters. I found in simple prototypes, that half of my time and code was spent in memory management issues. This is frustrating for a Smalltalker.

There's this thing out there called the Boehm Collector that purports to be a garbage collector for C. Given what it has to work with, it's an incredibly novel implementation. Basically, you replace your alloc/free/etc calls with variants of the same called GCmalloc, etc (or if you're really crazy, you'll just re#define them), and then you link in this libgc library. Happily, for versions of GCC prior to 3.4.2, they actually build an objective-c runtime with this thing. Which means, if you can live without the weighty gnustep libraries, you indeed can have an automatic memory managed objective-C.

I thought I'd test this out. I put together a Coordinate object (time based point basically), and a simple Sequence object (thing OrderedCollection here). The basic test was to fill the Sequence with 1000 of these Coordinates, and then loop 20000 times, replacing all of the slots over and over again with new Coordinates. Past the original fill, this means 2,000,000 objects created and stored, 1000 of which are reachable at any time, and the others being garbage.

On my box (dual 3.2Xeon@6300 bogomips), it took roughly 9.5 wall seconds to complete the task. I used top in a separate shell to observe that the programs memory consumption stayed nice and steady. For a baseline, I then recompiled with the non-gc runtime. I did not introduce any memory managment code. I ran it and it too took about 9.5 seconds to run! Of course, top showed that it consumed more than 500MB of RAM before it finished. This was very enlightening. The myth seems to be that manually manipulated reference countering is faster than an automatic collector. But if an automatic gc is not noticably different from none at all... then it's doubtful that the manual ref counting is going to be any different, except... you'll have to get it just write with every allocation, store, etc and you'll have more lines of code.

On a lark, I thought I'd give VisualWorks a chance to compete. The equivalent code went something like this:

seq := OrderedCollection new.
1000 timesRepeat: [seq add: Point zero].
20000 timesRepeat: [1 to: 1000 do: [:n | seq at: n put: Point zero]]

Mind you, Smalltalk is this really slow language. The looping is much slower than in C. And the bounds check and translation for every store is expensive too. So, it should go slower. Or at least about the same. It took ONE second. Give or take 50 or so milliseconds. Yeah, that's right about 9x faster. And yes, at the end of the return, all of the memory had been reclaimed, not just allocated.

While I think understand why, I must say I didn't realize it would be so marked of a difference. The VisualWorks VM allocates large blocks of memory, and then as it allocates each new object, it just advances a pointer. Whereas a C alloc does quite a bit more. This is what allows the VisualWorks VM to really shoot for the "everything is an object" point. I must say, I am once again impressed.

Two other benchmarks I ran were one with ST/X which scored about 4.5 seconds (expected it to be faster), and one with just raw C (no objective-C) which took 4 seconds. I am curious how Java or C# would fair in similiar cases? If anyone can concoct a test for either of those platforms, I'd be willing in seeing how fast they can chew through objects. Remember, at the end of your test run, you have to verify that all of the transient allocations have been reclaimed. IOW, it's not just an allocation test, but reclaimation as well. Any one else want to run any of the other Smalltalks, that would be cool too. If you can, run the VW snippet above and report that as the baseline measurement.

Comments

It's the Allocation Stupid

[ Martin Kobetic] November 10, 2004 18:34:47.396

Comment by Martin Kobetic

A thousand points is about 20K, my guess is that they never tenured out of eden, i.e. very cheap reclamation

Did you miss...?

[Travis Griggs] November 10, 2004 20:19:22.257

Martin, did you miss the 20000 timesRepeat:? The initial population of 1000 points is indeed around 20K. But then 20,000,000 more points are chewed throw. So if 1000 points is 20K, and we create 20000 * 1000 more points, we get 20000 * 20K or 20K squared, which is roughly 400 MB. Which is well over the reclamation threshold.

Performance and memory management

[Pensieri di un lunatico minore] November 11, 2004 13:28:37.982

Trackback from Pensieri di un lunatico minore

Performance and memory management

Travis Griggs talks about memory management and performance, and this reminded me of something we ran into at work. A......

VSE speed

[Giorgio Ferraris] November 11, 2004 13:46:13.414

Hi, Travis, I tested VSEW 3.1.3 (old and on the way)

VW: around 2 seconds VSE: around 4.2 seconds (so half the speed). Not bad for a virtual machine dated 1996.

(PS: VSEW is VisualSmalltalk Enterprise Window edition, the old Digitalk Smalltalk)

[] November 11, 2004 14:03:33.088

Travis, it's still only 20K of Points that is referenced at any given point in time. When the active eden space is filling up (1/2 of 300K ?), there's still only 20K of survivors that get copied over and the rest is simply left behind, no additional reclamation cost. I think this is the typical scenario where this strategy should shine. An interesting experiment might be collecting results for different sizes of the collection. I'd expect that as the collection size starts to come close to the eden treshold, the scavenges will be too frequent making noticeable difference. I'd be curious to see what happens when the Points start to tenure. Would it be just few points here and there or would there be a lot of them tenuring at once.